
Word Vector

Wannaphong Phatthiyaphaibun edited this page Jul 14, 2022 · 2 revisions

LaoNLP supports Word2Vec word vectors.

We trained Lao Word2Vec models on the OSCAR corpus using gensim. You can see the training notebook at https://github.com/wannaphong/LaoNLP-Notebook/blob/main/Lao_Word2Vec.ipynb
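LaoNLP provides both a CBOW and a skip-gram model, selected with the `model` argument in the example below. These are the two Word2Vec training architectures, and they differ in how a tokenized sentence becomes training examples: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context. In gensim this choice is the `sg` parameter of `Word2Vec` (`sg=1` for skip-gram, `sg=0` for CBOW). The sketch below only illustrates how examples are framed. The helper `training_pairs`, the window size, and the placeholder tokens are hypothetical and do not reproduce the notebook's actual settings.

```python
def training_pairs(tokens, window=2, mode="skip-gram"):
    """Build Word2Vec-style training examples from one tokenized sentence.

    skip-gram: one (center, context_word) pair per context word.
    cbow:      one (context_words, center) pair per position.
    """
    examples = []
    for i, center in enumerate(tokens):
        # Context = up to `window` tokens on each side, excluding the center itself.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if mode == "skip-gram":
            examples.extend((center, c) for c in context)
        elif mode == "cbow":
            examples.append((tuple(context), center))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return examples


# Placeholder tokens; real input would be word-segmented Lao text.
sentence = ["A", "B", "C", "D"]
print(training_pairs(sentence, window=1, mode="skip-gram"))
# [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B'), ('C', 'D'), ('D', 'C')]
print(training_pairs(sentence, window=1, mode="cbow"))
# [(('B',), 'A'), (('A', 'C'), 'B'), (('B', 'D'), 'C'), (('C',), 'D')]
```

Skip-gram produces one example per (center, context) pair, so it yields more updates per sentence than CBOW, which averages the whole context into a single example per position.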

Example

from laonlp.word_vector import Word2Vec

wv = Word2Vec(model="skip-gram") # cbow or skip-gram

print(wv.similarity("ວຽງຈັນ", "ເມືອງ"))
# output: 0.46474797

print(wv.most_similar_cosmul(positive=["ວຽງຈັນ", "ເມືອງ"], negative=[]))
# output: [('ສຸຂຸມາ', 0.6676176190376282), ('ແຂວ', 0.6541932821273804), ('ທຸລະຄົມ', 0.6540694832801819), ('ຫ້ອງການຍຸຕິທຳ', 0.6540253758430481), ('ສີສັດຕະນາກ', 0.6531381607055664), ('ພະລານໄຊ', 0.6501346230506897), ('ພັດທະນາກວມລວມ', 0.6448683738708496), ('ກະລຶມ', 0.6448098421096802), ('ຍົມມະລາດ', 0.6435081958770752), ('ປົກຄອງເມືອງ', 0.6423164010047913)]
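The two calls above rely on standard Word2Vec similarity measures. Assuming LaoNLP's `Word2Vec` wraps a gensim `KeyedVectors` model (this page does not say so explicitly), `similarity` is the cosine similarity of the two word vectors, and `most_similar_cosmul` ranks words with the multiplicative 3CosMul objective (Levy and Goldberg, 2014): each cosine is shifted to [0, 1] as (1 + cos) / 2, the shifted similarities to the positive words are multiplied, and that product is divided by the product over the negative words plus a small epsilon. The pure-Python sketch below reimplements both on hypothetical toy vectors; the function names, the placeholder words, and `eps=1e-6` are illustrative choices, not LaoNLP's API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar_cosmul(vectors, positive, negative=(), topn=3, eps=1e-6):
    """Rank vocabulary words by 3CosMul against the query words (sketch)."""
    def shifted(a, b):
        # Map cosine from [-1, 1] to [0, 1] so the products stay non-negative.
        return (1 + cosine_similarity(vectors[a], vectors[b])) / 2

    scores = {}
    for word in vectors:
        if word in positive or word in negative:
            continue  # query words are excluded from the results
        num = math.prod(shifted(word, p) for p in positive)
        den = math.prod(shifted(word, n) for n in negative)  # 1 when negative is empty
        scores[word] = num / (den + eps)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:topn]


# Hypothetical 2-dimensional vectors for placeholder words.
toy = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
    "d": [-1.0, 0.0],
    "e": [1.0, 0.1],
}
print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))  # 0.96
print([w for w, _ in most_similar_cosmul(toy, positive=["a", "b"], negative=[])])  # ['c', 'e', 'd']
```

With `negative=[]`, as in the example above, the denominator is just `1 + eps`, so the ranking depends only on how close each candidate is to all of the positive words at once.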