Text Mining Barack Obama using R #rstats

DECISION STATS

  • We copy and paste President Barack Obama’s “Yes We Can” speech in a text document and read it in. For a word cloud we need a dataframe with two columns, one with words and the the other with frequency.We read in the transcript from http://www.nytimes.com/2008/01/08/us/politics/08text-obama.html?pagewanted=all&_r=0  and paste in the file located in the local directory- /home/ajay/Desktop/new. Note tm is a powerful package and will read ALL the text documents within the particular folder

library(tm)

library(wordcloud)

txt2=”/home/ajay/Desktop/new”

b=Corpus(DirSource(txt2), readerControl = list(language = “eng”))

> b b b tdm m1 v1 d1 wordcloud(d1$word,d1$freq)

Now it seems we need to remove some of the very commonly occuring words like “the” and “and”. We are not using the standard stopwords in english (the tm package provides that see Chapter 13 Text Mining case studies), as the words “we” and “can” are also included .

> b tdm m1 v1 d1 wordcloud(d1$word,d1$freq)

But let’s see…

View original post 112 more words

Advertisements