Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Twitter ingestion #1225

Open
1 task done
shatanikmahanty opened this issue Oct 4, 2024 · 9 comments
Open
1 task done

Twitter ingestion #1225

shatanikmahanty opened this issue Oct 4, 2024 · 9 comments
Assignees

Comments

@shatanikmahanty
Copy link
Contributor

🔖 Feature description

Add new remote ingestion method from Twitter

🎤 Why is this feature needed ?

It will allow users to ingest data from Twitter

✌️ How do you aim to achieve this?

I plan to use

https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.twitter.TwitterTweetLoader.html

🔄️ Additional Information

No response

👀 Have you spent some time to check if this feature request has been raised before?

  • I checked and didn't find similar issue

Are you willing to submit PR?

Yes I am willing to submit a PR!

@shatanikmahanty
Copy link
Contributor Author

@dartpain may I work on this. I will try to achieve this by following examples and testing it out over the coming week!

@dartpain
Copy link
Contributor

dartpain commented Oct 5, 2024

Seems like a cool idea.
I do have one suggestion to make this a killer feature.

Most people might scrape X/twitter once in a while. But what if we do it similarly to
https://github.com/arc53/DocsGPT/blob/main/application/retriever/brave_search.py

Such that instead of ingesting data into similarity search vectordb we can create a search query to X/twitter and analyze current data.

@shatanikmahanty
Copy link
Contributor Author

@dartpain seems like a really cool addition. Will investigate the suggested integration and start on this by mid next week. Will keep updating the status here!

@shatanikmahanty
Copy link
Contributor Author

@dartpain after careful review of the requirements I found out that langchain doesn't have twitter search. They had an open issue in which they mentioned it won't be implemented because of pricing related concerns. Attaching the link to the same: langchain-ai/langchain#11538

Although search can be integrated through using the Twitter search API, I have one concern is how will we process the question put forward by the user as a prompt. In case of LangChain we use the run method on the search result. If we go with the twitter API, is there anything similar we can do?

@dartpain
Copy link
Contributor

I suggest you even use llm to genrate a search query and then use it in the search api

@shatanikmahanty
Copy link
Contributor Author

I suggest you even use llm to generate a search query and then use it in the search api

I see, thanks for the suggestion. I will use it accordingly and generate search queries. Once we are done with generating search results, I plan to pass that to the LLM again and summarise search results to give a readable answer

@shatanikmahanty
Copy link
Contributor Author

shatanikmahanty commented Oct 20, 2024

@dartpain I was trying to use the classic rag to generate a twitter query in my local, but it kept on generating the same output of project contribution guide and some other stuff that pointed to github of DocsGPT. By using LLM did you mean something else?

@dartpain
Copy link
Contributor

  1. Check out this, https://github.com/arc53/DocsGPT/tree/main/application/retriever
    you will need to create a separate file here. while testing / experimenting I suggest you change classic rag.
  2. You will see that it uses LLM abstract class there, thats what I meant.

thank you!

@shatanikmahanty
Copy link
Contributor Author

shatanikmahanty commented Oct 23, 2024

@dartpain thanks for the additional context on LLMs. I was able to generate a search term for Twitter using the LLM, but on trying to access the twitter api I found out that it can only be used by paid plan subscribers. If anyone is willing to provide me an api key to test with I can create a PR. Meanwhile I will draft a PR with my current work and highlight the blockers so that in case anyone with access to paid api wants to continue with the rest of the PR they can go ahead. Thanks again for letting me work on this 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backlog
Development

No branches or pull requests

2 participants