I love this package and noticed that it has no Albanian stopwords. I would like to contribute them, and I have gathered them from two sources:
https://en.wikipedia.org/wiki/Albanian_morphology
https://huggingface.co/datasets/Kushtrim/Kosovo-Parliament-Transcriptions
The first link covers the fundamental stopwords. The second link is an extensive dataset of Kosovo Parliament speeches, which makes it easy to extract the most common stopwords.
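As a rough illustration of how common words could be pulled from a set of transcripts, here is a minimal sketch. It assumes the speeches are already available as plain Python strings; the `top_words` helper and the tokenizing pattern are hypothetical and are not part of this package or of the dataset's tooling.

```python
import re
from collections import Counter

# Matches runs of Unicode letters, so words containing "ë" or "ç" stay whole.
TOKEN = re.compile(r"[^\W\d_]+")

def top_words(texts, n):
    """Return the n most frequent lowercase words across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN.findall(text.lower()))
    return [word for word, _ in counts.most_common(n)]
```

The most frequent words are only candidates; they would still need a manual pass to drop content words that happen to be common in parliamentary speech.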
I intend to include each stopword both with and without the special Albanian characters, since those characters are often replaced by plain ones in practice. For instance, both "tanë" and "tane" will be included.
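A minimal sketch of how the plain variants could be generated, assuming that only "ë" and "ç" need to be folded to "e" and "c". The `with_plain_variants` helper is hypothetical and only illustrates the idea; it is not code from this package.

```python
# Assumed folding: "ë" -> "e" and "ç" -> "c" (plus uppercase forms).
FOLD = str.maketrans({"ë": "e", "ç": "c", "Ë": "E", "Ç": "C"})

def with_plain_variants(words):
    """Return a sorted, de-duplicated list holding each word and its folded form."""
    result = set()
    for word in words:
        result.add(word)
        result.add(word.translate(FOLD))
    return sorted(result)
```

For example, `with_plain_variants(["tanë"])` yields both "tane" and "tanë", while words without special characters are kept once.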
Here is the issue: Albanian stopwords missing #204
Here is the list I came up with: albanian.txt
tyre rreth le atyre këta megjithëse kemi per ndonëse dytë pse tha aty ndaj ke këtë duhet pa perket veç ndonje një keshtu s janë jane ti ia megjithese prej ishte tjerë ai se tillë do si ja tonë keta pastaj ndersa siç unë gjate di kësaj cilin kjo dhënë da teper ketij ama pasi fjalë kanë vetem za d.m.th. ose pas ndonjë cila ndodhur dyte ardhur kësi nga vete atij ta jenë rendit tane keso deri tone të prandaj bëjë domethënë dhe qi mirepo tona që u këtu cilet jene tjere gjë së gjatë duhej t dhene thuhet po une dy cfare ndërsa sepse edhe cilen to meqenese meje tij qene jeni them përket keto ni këso asaj ajo sic vetëm ketyre andaj na sa kesaj cili këtyre domethene mirëpo cilën mos madh qenë cilët thënë jemi fjale soje neve gjitha kështu vet kur ty meqë meqenëse jush ketë para kush i mua dite ate për tepër nesh meqe ketu ku disa ato mbi gje ne është tille teje megjithate ju nese saj ashtu më mbasi te thene jo ditë nuk gjithe shume nje tanë mund aqsa sot këto tjera tjetër tjeter atë kisha megjithatë këtij nëse dimë eshte vazhdojmë ka kam kesi je vazhdojme duke dime kinse por kane pika keni beje ky parasysh apo gjithë me ata çfarë jam juve kete a pra qe tash në vetë vec as ndonese tani pak e shumë