
CorCenCC/Thesawrws-Welsh



Thesawrws Ar-lein Cymraeg Cyfoes (ThACC) / Welsh Online Thesaurus

Mae'r ystorfa hon yn cyflwyno thesawrws Cymraeg ar-lein sy'n fynediad agored ac yn hawdd ei ddefnyddio. Nod y prosiect oedd cyfoethogi adnoddau digidol yn y Gymraeg. Gan fanteisio ar ddatblygiadau ym maes Prosesu Iaith Naturiol (NLP), mae ein dull yn cyfuno mewnblaniadau geiriau sy'n bodoli eisoes, tagiwr semanteg Cymraeg, a gwerthusiadau gan fodau dynol er mwyn dod o hyd i gyfystyron.

This repository introduces an open-access, user-friendly online thesaurus for the Welsh language, aimed at enriching digital resources for Welsh speakers and learners. Building on advances in Natural Language Processing (NLP), our approach combines pre-existing word embeddings, a Welsh semantic tagger, and human evaluation to identify synonyms and related terms.
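The combination described above can be pictured with a toy sketch: rank candidate words by cosine similarity between their embedding vectors, keep only those sharing the query word's semantic tag, and hand the shortlist to human evaluators. This illustrates the general idea only and is not the ThACC pipeline; the 3-dimensional vectors, the tag labels, the `min_sim` threshold and the function names are all invented for the example.

```python
import math

# Hypothetical toy vectors, invented for illustration. Real word embeddings
# have hundreds of dimensions and come from a pre-trained model.
TOY_VECTORS = {
    "pobl": [0.9, 0.1, 0.0],
    "gwerin": [0.85, 0.15, 0.05],
    "cymuned": [0.7, 0.3, 0.1],
    "ysgol": [0.1, 0.9, 0.2],
}

# Hypothetical tags standing in for the output of a semantic tagger.
TOY_TAGS = {
    "pobl": "S2",
    "gwerin": "S2",
    "cymuned": "S5",
    "ysgol": "P1",
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_synonyms(word, vectors, tags, k=5, min_sim=0.8):
    """Return up to k (word, similarity) pairs that share `word`'s tag and
    reach `min_sim`, best first. The result is a shortlist for human review,
    not a final list of synonyms. Raises KeyError if `word` has no vector."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word and tags.get(other) == tags.get(word)
    ]
    scored = [(w, s) for w, s in scored if s >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    print(candidate_synonyms("pobl", TOY_VECTORS, TOY_TAGS))
```

In this toy data "cymuned" is close to "pobl" in vector space but is dropped because its tag differs, which is the kind of filtering a semantic tagger can add on top of raw embedding similarity.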

Gosodiad / Installation

I osod y wefan Flask hon, agorwch eich terfynell a gweithredwch y gorchmynion canlynol:
To install this Flask website, open your terminal and run the following commands:

$ sudo apt-get install python3 python3-venv python3-dev build-essential git
$ git clone https://github.com/Nouran-Khallaf/Thesawrws-website.git
$ cd Thesawrws-website
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
# Install fastText from source (it builds a C++ extension, hence build-essential and python3-dev)
$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ pip install .
$ cd ..
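Before starting the site, a short Python check can confirm that the packages are importable from the activated virtual environment. This is a sketch: the import names `flask`, `fasttext` and `requests` are assumptions drawn from this README (Flask for the website, the fastText Python bindings, and `requests` for the API example), and `requirements.txt` may list more packages than these.

```python
import importlib.util

# Assumed import names; extend this list to match requirements.txt.
REQUIRED = ["flask", "fasttext", "requests"]


def missing_packages(names=REQUIRED):
    """Return the top-level import names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```

Run it with `python` while `venv` is active; any name it reports needs to be installed again with `pip`.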

Dechrau gwefan y Thesawrws / Start the Thesawrws website

Gweithredwch y gorchmynion canlynol / Execute the following commands:

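$ cd Thesawrws-website  # skip if you are already in the project directory
$ source venv/bin/activate
$ export FLASK_APP=main.py
$ flask run --host=0.0.0.0  # default port 5000; 0.0.0.0 listens on all network interfaces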
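Once the server is running, a small standard-library client can check that it responds. This sketch rests on two assumptions that this README does not state: that `flask run` is using Flask's default port 5000, and that the local app serves the same `/api/synonyms` route as the public server. The helper name `local_synonyms` is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: Flask's default port (5000) and the /api/synonyms route
# used by the public server.
LOCAL_BASE = "http://127.0.0.1:5000"


def local_synonyms(word, base=LOCAL_BASE, timeout=5):
    """Return the parsed JSON response for `word`, or None if the request fails."""
    query = urllib.parse.urlencode({"word": word})
    url = f"{base}/api/synonyms?{query}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # urllib.error.URLError (refused connection, HTTP errors) and socket
        # timeouts are OSError subclasses; ValueError covers a body that is
        # not valid UTF-8 JSON.
        return None


if __name__ == "__main__":
    result = local_synonyms("pobl")
    if result is None:
        print("Server not reachable")
    else:
        print(json.dumps(result, indent=2, ensure_ascii=False))
```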

I ddefnyddio'r RESTful API / To use the RESTful API

Llinell orchymyn / Command line:

$ curl "http://148.88.72.60:8010/api/synonyms?word=pobl"
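Welsh words often contain characters such as ŵ and ŷ, which must be percent-encoded as UTF-8 in a query string. The sketch below builds the request URL with Python's `urllib.parse.urlencode`; the helper name `synonyms_url` is hypothetical, and it assumes the API decodes UTF-8 query parameters, as a Flask app does by default.

```python
from urllib.parse import urlencode

API_URL = "http://148.88.72.60:8010/api/synonyms"


def synonyms_url(word):
    """Build the API request URL, percent-encoding `word` as UTF-8."""
    return f"{API_URL}?{urlencode({'word': word})}"


print(synonyms_url("pobl"))  # http://148.88.72.60:8010/api/synonyms?word=pobl
print(synonyms_url("tŷ"))    # http://148.88.72.60:8010/api/synonyms?word=t%C5%B7
```

From the shell, `curl -G --data-urlencode "word=tŷ" http://148.88.72.60:8010/api/synonyms` applies the same encoding, provided the terminal uses UTF-8.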

Neu sgript Python / Or a Python script:

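import json
import requests

response = requests.get("http://148.88.72.60:8010/api/synonyms",
                        params={"word": "school"}, timeout=10)
response.raise_for_status()  # stop on HTTP 4xx/5xx errors
data = response.json()
print(json.dumps(data, indent=2, ensure_ascii=False))  # keep Welsh characters (e.g. ŵ, ŷ) readable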

Os defnyddiwch unrhyw un o'r adnoddau hyn yn eich gwaith, cyfeiriwch at y papur hwn:
If you use any of these resources in your work, please cite this paper:


@inproceedings{khallaf2023thesaurus,
    title = "Open-Source Thesaurus Development for Under-Resourced Languages: a {W}elsh Case Study",
    author = "Khallaf, Nouran and Arfon, Elin and El-Haj, Mo and Morris, Jonathan and Knight, Dawn and Rayson, Paul and Hammouda, Tymaa and Jarrar, Mustafa",
    month = sep,
    year = "2023",
    booktitle = "Proceedings of the 4th Conference on Language, Data and Knowledge (LDK 2023)",
    address = "Vienna, Austria",
}

Cysylltiadau / Contacts

Creative Commons Licence
