    # sotu_meta: assumed name of the metadata data frame (this part of the line was lost in the source)
    sotu_whole <- sotu_meta %>%
      arrange(president) %>%     # sort metadata
      bind_cols(sotu_texts) %>%  # combine with texts
      as_tibble()                # convert to tibble for better screen viewing

    glimpse(sotu_whole)
    #> Rows: 240
    #> $ president <chr> "Abraham Lincoln", "Abraham Lincoln", "Abraham Lincoln", …
    #> $ party     <chr> "Republican", "Republican", "Republican", "Republican", "…
    #> $ sotu_type <chr> "written", "written", "written", "written", "written", "w…
    #> $ text      <chr> "\n\n Fellow-Citizens of the Senate and House of Represen…

Now that we have our data combined, we can start looking at the text. Typically, quite a bit of effort goes into pre-processing the text for further analysis. Depending on the quality of your data and your goal, you might, for example, need to remove URLs or certain numbers, such as phone numbers. There are several ways to handle this sort of cleaning; we'll show a few examples below.

    sotu_whole %>%
      mutate(n_citizen = str_count(text, "citizen"))
    #> # A tibble: 240 × 9
    #>   X president year years_active party sotu_…¹ doc_id text n_cit…²
    #> # … with 230 more rows, and abbreviated variable names ¹sotu_type, ²n_citizen
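The cleaning steps mentioned above (removing URLs or phone numbers) could be sketched with stringr roughly as follows. The sample strings and both regular expressions are assumptions made for illustration. They are not taken from the sotu data and are deliberately simple rather than exhaustive.

```r
library(stringr)

# Hypothetical sample text, not taken from the sotu data
raw <- c(
  "Read more at https://example.com/speech or call 555-123-4567.",
  "Fellow-Citizens   of the Senate"
)

# Assumed patterns: simple illustrations, not complete URL or phone matchers
url_pattern   <- "https?://\\S+"
phone_pattern <- "\\b\\d{3}[-.]\\d{3}[-.]\\d{4}\\b"

cleaned <- str_remove_all(raw, url_pattern)        # drop URLs
cleaned <- str_remove_all(cleaned, phone_pattern)  # drop phone-like numbers
cleaned <- str_squish(cleaned)                     # collapse repeated whitespace, trim ends

# str_count() is case sensitive by default, so "Citizens" is not counted
# unless the pattern is wrapped in regex(..., ignore_case = TRUE)
n_exact  <- str_count(cleaned, "citizen")
n_nocase <- str_count(cleaned, regex("citizen", ignore_case = TRUE))
```

The same calls can be used inside `mutate()` on the `text` column. Whether a case-insensitive count is the right choice depends on what you want to measure.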