Text mining - Taking a random sample of a structural topic model object list
I would like to take a random sample of a structural topic model (stm) object list for experimentation before running on the full sample.
My current solution is the following:
    library(stm)
    library(quanteda)

    # Use data available in the stm package
    df <- gadarian

    # Convert the response vector to a corpus
    myCorpus <- corpus(df$open.ended.response)

    # Convert to a document-feature matrix
    dfm <- dfm(myCorpus, remove = c(stopwords("english")), ngrams = 1L, stem = FALSE,
               remove_numbers = TRUE, remove_punct = TRUE, remove_symbols = TRUE)

    # Use the quanteda converter to turn our dfm into an stm object
    stmObject <- convert(dfm, to = "stm", docvars = docvars(myCorpus))

    # Work on a smaller sample for experimentation
    set.seed(10)
    small.df.rows <- sample(1:nrow(df), 10)

    # Subsample the data
    stmdf.sm <- list(documents = stmObject$documents[small.df.rows],
                     vocab = stmObject$vocab)   # dtm
    df.sm <- df[small.df.rows, ]                # meta-data

    # Preprocess
    ## out <- prepDocuments(stmObject$documents, stmObject$vocab, stmObject$meta,
    ##                      lower.thresh = 5)

    # Create the prevalence variable and place it in the global environment
    ## treatment <- df.sm$treatment

    # Run
    ## stmFit.sm <- stm(out$documents, out$vocab, K = 0, prevalence = ~ treatment,
    ##                  max.em.its = 150, init.type = "Spectral", seed = 300,
    ##                  verbose = TRUE, ngroups = 5)
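One variant of the subsampling step I have considered is to subset the dfm itself by the same row indices before converting, so the stm list and the separately kept covariate data frame describe the same documents. This is only a sketch, not something I have tested thoroughly; dfm.sm and stmObject.sm are names I am introducing here:

    # Sketch: restrict the dfm to the sampled documents, then convert,
    # keeping the covariates in a separate data frame with the same rows.
    set.seed(10)
    small.df.rows <- sample(seq_len(nrow(df)), 10)
    dfm.sm <- dfm[small.df.rows, ]               # dfm restricted to the sampled documents
    stmObject.sm <- convert(dfm.sm, to = "stm")  # stm-style list for the subsample
    df.sm <- df[small.df.rows, ]                 # covariates kept as a separate object
    # Terms that no longer occur in the subsample can be dropped later
    # with prepDocuments().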
I know one can create held-out data in stm by adding the covariates (df in the example above) to the dfm object; however, I prefer to keep them (the corpus and the covariates) as separate objects for post-estimation calculations.
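To make that concrete, here is a hedged sketch of how I imagine the fitting step on the subsample, with the covariate data frame entering the model only through stm()'s data argument (K = 3 and lower.thresh = 1 are arbitrary toy values for 10 documents):

    # Toy fitting step on the subsample; df.sm stays a separate data frame
    # and is only passed in via the data argument.
    out.sm <- prepDocuments(stmdf.sm$documents, stmdf.sm$vocab,
                            meta = df.sm, lower.thresh = 1)
    stmFit.sm <- stm(out.sm$documents, out.sm$vocab, K = 3,
                     prevalence = ~ treatment, data = out.sm$meta,
                     max.em.its = 150, init.type = "Spectral", seed = 300)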
I would appreciate any advice.
PS: The background to this question is that my stm estimations crash, and I have been unable to identify the reason. One issue might be the way I am creating the random samples (sometimes stm works, sometimes it does not). Or the issue might be converting the document-term matrix of the tm package into an stm corpus object with the following function: readCorpus(dtm, type = "slam"). I have filed a separate and more detailed issue report on GitHub (https://github.com/bstewart/stm/issues/89). If you have additional advice on that, I would appreciate it a lot too. DS
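For completeness, the tm-to-stm path referred to above, as I understand it (dtm here is assumed to be a tm::DocumentTermMatrix; the prepDocuments() step is my own addition, since documents that end up empty after preprocessing seem to be one common cause of failures):

    # tm's DocumentTermMatrix is a slam simple_triplet_matrix internally,
    # hence type = "slam" for the converter.
    stm.in <- readCorpus(dtm, type = "slam")
    # Drop rare terms and any documents left empty before fitting.
    prep <- prepDocuments(stm.in$documents, stm.in$vocab, lower.thresh = 5)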