"Feature Request: Bulk Term Checking Script for Contributors to Add Multiple Words Efficiently" #125

Open
Anshumanv28 opened this issue Aug 30, 2024 · 11 comments

Comments

@Anshumanv28
Contributor

I propose implementing a feature that allows bulk checking of terms. The script would let developers input multiple terms and check their presence in the encyclopedia all at once.

The goal is to enhance the developer experience for those looking to contribute by simplifying the process of verifying multiple terms. Instead of checking each term individually, contributors could use this script to quickly see which terms are already present and which are not.

This feature aims to streamline the contribution process, making it easier and more efficient for developers to suggest and add multiple terms to the encyclopedia at once.

@Anshumanv28
Contributor Author

Anshumanv28 commented Aug 30, 2024

Let me know your thoughts on this.
It can start as a simple script in the repo for any contributor to use, and might serve more use cases in the future with improvements.

@RayMathew
Contributor

I'm not clear on what the script would look like. Could you break down the steps, please?

@Anshumanv28
Contributor Author

1. Add your terms to a text file, say myterms.txt.
2. Run the script.
3. The script matches the titles of all the existing JSON files in content/terms (after stripping and case-normalizing the file titles) against the terms in myterms.txt.

It's a simple approach, but open to improvements. It makes it possible to check the existence of multiple terms at once.
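
The steps above could be sketched roughly as follows. This is only an illustration, not the project's actual tooling: the function names are hypothetical, and it assumes each term lives in its own JSON file under content/terms whose file name (minus the .json extension) is the term title, with hyphens or underscores standing in for spaces.

```python
from __future__ import annotations

from pathlib import Path


def normalize(term: str) -> str:
    """Lowercase, trim, and treat hyphens/underscores as spaces (assumed convention)."""
    return " ".join(term.replace("-", " ").replace("_", " ").lower().split())


def existing_terms(terms_dir: Path) -> set[str]:
    """Collect normalized titles from the JSON file names in terms_dir."""
    return {normalize(path.stem) for path in terms_dir.glob("*.json")}


def check_terms(candidates: list[str], terms_dir: Path) -> tuple[list[str], list[str]]:
    """Split candidates into (already present, missing), keeping input order."""
    known = existing_terms(terms_dir)
    present: list[str] = []
    missing: list[str] = []
    for term in candidates:
        cleaned = term.strip()
        if not cleaned:
            continue  # skip blank lines in the terms file
        if normalize(cleaned) in known:
            present.append(cleaned)
        else:
            missing.append(cleaned)
    return present, missing


def main(terms_file: Path, terms_dir: Path) -> None:
    """Read one term per line from terms_file and print a short report."""
    candidates = terms_file.read_text(encoding="utf-8").splitlines()
    present, missing = check_terms(candidates, terms_dir)
    print("Already present:", ", ".join(present) or "(none)")
    print("Not found:", ", ".join(missing) or "(none)")
```

A contributor would call `main(Path("myterms.txt"), Path("content/terms"))` from the repository root. Matching on file names only is cheap, but it would miss terms whose file name differs from the displayed title; reading a title field from each JSON file would be more robust if the schema has one.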

@RayMathew
Contributor

Ok. And you're thinking of adding it as a utility file in the project?

@Sudharshaun
Contributor

@Anshumanv28 but isn't the search feature already doing this? Correct me if I've misunderstood what you are trying to do.

@Anshumanv28
Contributor Author

Anshumanv28 commented Aug 31, 2024

@RayMathew yes, exactly.
@Sudharshaun the keyword is "bulk". I dropped this idea earlier thinking the same, but I care about improving things for contributors who want to push many words at once.
Also, the script can be improved to provide more functionality; it could help with data acquisition via web scraping and LLMs in the future.

@Anshumanv28
Contributor Author

Example use case:
Let's suppose you have about 20-30 terms you want to contribute. Do you go to the website and check each and every word to see whether it already exists in the database? Just automate it.
We could also have a script that makes JSON files in the required format with content generated by LLMs, which would help.

@Anshumanv28
Contributor Author

@Buzzpy let me know if this would be useful. As you know, Hacktoberfest is approaching, and contributors could find this useful.

@KC900201
Contributor

KC900201 commented Sep 2, 2024

Example use case: Let's suppose you have about 20-30 terms you want to contribute. Do you go to the website and check each and every word to see whether it already exists in the database? Just automate it. We could also have a script that makes JSON files in the required format with content generated by LLMs, which would help.

@Anshumanv28 wouldn't this bring us back to the cost bottleneck for the LLM implementation that is mentioned in #95?

@Anshumanv28
Contributor Author

Yes, use case 2 (LLM response to JSON file) would raise the quality issue that comes with using LLMs, but that assumes contributors choose not to review their entries and just push low-quality content. Even if they do, we are not merging any entries to master without review first.
The bulk search use case is still free of bottlenecks.
If we can define some sort of flow to verify the quality of content, then the second (LLM) use case will be more suitable. But as @RayMathew mentioned earlier, this is going to sit as tools in the utils dir for voluntary use by contributors who want to add terms, until we find a better use for it.

@Buzzpy
Owner

Buzzpy commented Sep 15, 2024

Hello everyone!

Thanks for all the input and ideas! The concept of bulk-checking terms sounds great and could really streamline contributions for everyone. However, since this project is still in its early stages and there are some bottlenecks to consider (like the LLM-related challenges and cost), I’ll pause the bulk-checking feature for now.

That said, the idea is definitely valuable! In the meantime, contributors can still add multiple terms one by one using the existing methods. And if anyone finds it tricky or needs help, they can always open an issue and ask. I’ll soon update the issue template with info on how to request help.

Thanks again for pushing forward such helpful ideas, and I'm excited to keep evolving this together! Cheers! 🥂

And feel free to continue the chat if needed, I won't close the issue for now.


5 participants