Provide cleaning function. #721

llrs · 2022-08-08T11:16:09Z

Many text analyses remove URLs, hashtags, cashtags and user mentions.
It would be nice if there were a function to remove these from the information provided by the API.
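
In the meantime, a rough sketch of the kind of cleaning meant here, done with regular expressions on plain text. `strip_entities` is a hypothetical name, not an rtweet function, and it assumes the text is an ordinary character vector; it ignores the entity positions the API returns, so it can over-match (for example, a price such as $5 would be removed as if it were a cashtag).

```r
# Hypothetical helper (not part of rtweet): strip URLs, mentions,
# hashtags and cashtags from a character vector using regexes only.
strip_entities <- function(text) {
  # Remove http/https links
  text <- gsub("https?://\\S+", "", text, perl = TRUE)
  # Remove @mentions, #hashtags and $cashtags that start a word,
  # keeping the whitespace (or start of string) that preceded them
  text <- gsub("(^|\\s)[@#$]\\w+", "\\1", text, perl = TRUE)
  # Collapse the leftover runs of whitespace and trim the ends
  text <- gsub("\\s+", " ", text, perl = TRUE)
  trimws(text)
}

strip_entities("Check https://example.com @user #rstats $AAPL now")
# [1] "Check now"
```

Requiring whitespace or the start of the string before the symbol is a deliberate choice so that e-mail addresses such as me@example.com are left alone.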

meier-flo · 2022-09-23T09:03:12Z

Great idea! I feel like working with the new data structure is quite tricky.
For example, with the entities list column I can easily access the hashtags, but I can't figure out how to unnest the user_mentions:

search_object_result %>%
  select(id_str, entities) %>%
  unnest_auto(entities) %>%
  unnest(hashtags)

However, this breaks:

search_object_result %>%
  select(id_str, entities) %>%
  unnest_auto(entities) %>%
  unnest(user_mentions)

Using unnest_wider(entities); elements have 5 names in common
Error: ! Can't combine ..1$indices <data.frame> and ..22$indices .
Run rlang::last_error() to see where the error occurred.

A cleaning function for both text and maybe the tricky list columns would be nice!

llrs · 2022-09-23T13:13:48Z

The entities column is itself a list: search_object_result$entities[[1]]$user_mentions, search_object_result$entities[[2]]$user_mentions, ...
So to extract each user_mentions one would need to do something like lapply(search_object_result$entities, function(x){x$user_mentions}).
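
A self-contained sketch of that lapply() approach follows. The data here is a mock: the two-row `search_object_result`, its field names and its values are assumptions made only for illustration, and the real entities column holds more fields than shown.

```r
# Mock stand-in for search_object_result (assumed structure, invented values)
search_object_result <- data.frame(id_str = c("1", "2"))
search_object_result$entities <- list(
  list(
    hashtags = data.frame(text = "rstats"),
    user_mentions = data.frame(screen_name = c("alice", "bob"))
  ),
  list(
    hashtags = data.frame(text = NA_character_),
    user_mentions = data.frame(screen_name = "carol")
  )
)

# One user_mentions data frame per tweet, in the same order as the rows
mentions <- lapply(search_object_result$entities, function(x) x$user_mentions)

# Stack them into one long data frame, keeping the id of the originating tweet
mentions_long <- do.call(
  rbind,
  Map(function(id, m) data.frame(id_str = id, m),
      search_object_result$id_str, mentions)
)
rownames(mentions_long) <- NULL
```

With real data the per-tweet data frames may not share column types (the error above points at indices), so the stacking step may need some per-column coercion before rbind() succeeds.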

I suppose that unnest_auto is from tidyr or a similar package. I'm not sure how it handles lists, but I hope this is helpful.

llrs · 2022-09-27T09:39:03Z

In the devel branch I implemented a function that removes URLs, media links, mentions and hashtags from the text and returns the remaining text. You can test it from the devel branch (I recommend activating dev mode for this testing):

devtools::dev_mode()
remotes::install_github("ropensci/rtweet@devel")
library("rtweet")
clean_tweets(search_object_result)

Let me know if this works well or if you think a different approach would be better.

llrs · 2022-10-04T14:58:27Z

I'm closing the issue, but any feedback will be appreciated.

llrs · 2022-12-09T00:29:03Z

In the latest version in devel (1.0.2.9014) there are some helpers to extract these. After installing it, check ?helpers. @meier-flo, please let me know if they help or whether I should better document the output of each helper.

llrs closed this as completed Oct 4, 2022

llrs added the enhancement label Oct 4, 2022
