Releases: yomidevs/kaikki-to-yomitan
24.08.30.10
What's Changed
- Update downloads.md with list of .zip files by @github-actions in #133
- vary canonical form behavior by language by @StefanVukovic99 in #136
- add sicilian by @StefanVukovic99 in #138
- add hindi by @StefanVukovic99 in #140
Full Changelog: v2024-08-22-10-39-17-00-00...v2024-08-30-10-34-43-00-00
24.08.22.10
What's Changed
- Update downloads.md with list of .zip files by @github-actions in #123
- [de] add past and present participle to multiword by @StefanVukovic99 in #124
- use regular space in multiword inflections, [de-de] parse more inflection glosses by @StefanVukovic99 in #126
- [de-en] fix missing prepositions by @StefanVukovic99 in #127
- [ja] add readings by @StefanVukovic99 in #129
- Blacklist romanization by @StefanVukovic99 in #130
- add middle english by @StefanVukovic99 in #132
Full Changelog: v2024-07-29-18-48-46-00-00...v2024-08-22-10-39-17-00-00
24.07.29.18
What's Changed
- [de] add tests for Herz and Fahrer, fix Fahrer by @StefanVukovic99 in #118
- [*-de] use deinflections from german edition by @StefanVukovic99 in #120
- [lv] add latvian by @StefanVukovic99 in #121
- [de] update Herz test with fixed kaikki entry by @StefanVukovic99 in #122
Full Changelog: v2024-07-20-13-53-30-00-00...v2024-07-29-18-48-46-00-00
24.07.20.13
What's Changed
- Update downloads.md with list of .zip files by @github-actions in #101
- [cs] update example with newest kaikki format by @StefanVukovic99 in #102
- separate downloads page generation from release workflow by @StefanVukovic99 in #103
- Update downloads.md with list of .zip files by @github-actions in #104
- [tl] add tagalog by @StefanVukovic99 in #106
- Remove tl diacritics by @Casheeew in #108
- use translations section to create more dictionaries by @StefanVukovic99 in #107
- Update downloads.md with list of .zip files by @github-actions in #109
- add new dictionaries to downloads page, split tables by @StefanVukovic99 in #110
- fix gloss links by @StefanVukovic99 in #111
- [tl] Remove hyphen and apostrophe by @Casheeew in #113
- [it] remove italian diacritics by @StefanVukovic99 in #115
- make dictionaries updatable by @StefanVukovic99 in #117
New Contributors
- @github-actions made their first contribution in #101
- @Casheeew made their first contribution in #108
Full Changelog: v2024-07-02-12-21-57-00-00...v2024-07-20-13-53-30-00-00
24.07.13.14
What's Changed
- Update downloads.md with list of .zip files by @github-actions in #101
- [cs] update example with newest kaikki format by @StefanVukovic99 in #102
- separate downloads page generation from release workflow by @StefanVukovic99 in #103
- Update downloads.md with list of .zip files by @github-actions in #104
- [tl] add tagalog by @StefanVukovic99 in #106
- Remove tl diacritics by @cashewnuttynuts in #108
New Contributors
- @github-actions made their first contribution in #101
- @cashewnuttynuts made their first contribution in #108
Full Changelog: v2024-07-02-12-21-57-00-00...v2024-07-13-09-59-22-00-00
24.07.02.12
What's Changed
- generate downloads table on release by @StefanVukovic99 in #96
- fix the word 'constructor' crashing tidy-up script by @StefanVukovic99 in #98
- fix downloads table pull request by @StefanVukovic99 in #100
Full Changelog: v2024-06-28-18-58-12-00-00...v2024-07-02-12-21-57-00-00
24.06.28.0
What's Changed
- add romanian language by @tiberiuiurco in #66
- romanian diacritics normalization by @tiberiuiurco in #67
- [cs] add some multiword inflections by @StefanVukovic99 in #65
- refactor tidy-up script by @StefanVukovic99 in #70
- [la] add baseline test for domus by @StefanVukovic99 in #71
- fix duplicate first line on some nested entries by @StefanVukovic99 in #72
- add Ukrainian by @dgisser in #74
- fix write-test bug when no ipa by @StefanVukovic99 in #76
- [th] add test case by @StefanVukovic99 in #77
- fix test bug when there is no ipa by @StefanVukovic99 in #78
- [kn] add Kannada by @StefanVukovic99 in #82
- Update french tag_bank_term.json by @Ceynou in #80
- add mongolian by @StefanVukovic99 in #89
- match new kaikki jsonl filenames by @StefanVukovic99 in #90
- use existing and custom css to color some tags by @StefanVukovic99 in #91
New Contributors
- @tiberiuiurco made their first contribution in #66
- @dgisser made their first contribution in #74
- @Ceynou made their first contribution in #80
Full Changelog: v2024-06-11-12-58-48-00-00...v2024-06-28-18-58-12-00-00
24.06.11.0
What's Changed
- add Esperanto by @StefanVukovic99 in #52
- add Czech by @StefanVukovic99 in #54
- [sh] remove accent marks by @StefanVukovic99 in #58
- fix some forms not being written, [cs] add test cases by @StefanVukovic99 in #61
- [cs] keep past participles as multiword inflections by @StefanVukovic99 in #62
- add missing inflections that include 'nominative' by @StefanVukovic99 in #63
Full Changelog: v2024-05-31-09-23-25-00-00...v2024-06-11-12-58-48-00-00
General Info
Types of files:
- Bilingual dictionaries - kty-en-de.zip for example has English headwords and their definitions/translations in German.
- Monolingual dictionaries - kty-en-en.zip and such.
- IPA dictionaries from a single wiktionary - e.g. kty-en-en-ipa.zip
- Merged IPA dictionaries from all 6 wiktionary editions - e.g. kty-en-ipa.zip. These have more terms covered but not all the entries might be in the same format.
Note that Kaikki currently only supports 6 wiktionary editions, so only dictionaries in these languages are available. For a rough overview see the graph:
Some of the dictionaries are small; rather than decide on a lower bound for usefulness they are all included here.
If the language you want isn't here, or you would like to see an improvement to a dictionary, please open an issue.
Languages are referred to by their ISO code (ISO 639-1 where available, ISO 639-3 where not).
These dictionaries are converted from kaikki dictionaries extracted on 2024-?? from the enwiktionary dump dated 2024-??.
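The file naming scheme described under "Types of files" can be sketched as a small parser. This is only an illustration based on the four example names given above, not code from the project: the function name `parse_kty_filename`, the `KtyFile` type, and the `kind` labels are all hypothetical, and it assumes that a trailing `ipa` segment always marks an IPA dictionary rather than a language code.

```python
import re
from typing import NamedTuple, Optional

# ISO 639-1 codes have two letters, ISO 639-3 codes have three.
CODE = re.compile(r"^[a-z]{2,3}$")


class KtyFile(NamedTuple):
    kind: str                   # hypothetical labels: bilingual, monolingual, ipa, merged-ipa
    first_lang: str             # for bilingual files: language of the headwords
    second_lang: Optional[str]  # for bilingual files: language of the definitions


def parse_kty_filename(name: str) -> KtyFile:
    """Classify a release file name such as 'kty-en-de.zip'."""
    if not (name.startswith("kty-") and name.endswith(".zip")):
        raise ValueError(f"not a kty archive: {name!r}")
    parts = name[len("kty-"):-len(".zip")].split("-")

    # Assumption: a trailing 'ipa' segment is a marker, never a language code.
    is_ipa = len(parts) > 1 and parts[-1] == "ipa"
    if is_ipa:
        parts = parts[:-1]

    if not 1 <= len(parts) <= 2 or not all(CODE.match(p) for p in parts):
        raise ValueError(f"unexpected language codes in {name!r}")

    if len(parts) == 1:
        # A single code is only described for merged IPA dictionaries.
        if not is_ipa:
            raise ValueError(f"missing second language code in {name!r}")
        return KtyFile("merged-ipa", parts[0], None)

    first, second = parts
    if is_ipa:
        return KtyFile("ipa", first, second)
    kind = "monolingual" if first == second else "bilingual"
    return KtyFile(kind, first, second)


if __name__ == "__main__":
    for example in ("kty-en-de.zip", "kty-en-en.zip",
                    "kty-en-en-ipa.zip", "kty-en-ipa.zip"):
        print(example, parse_kty_filename(example))
```

Splitting on hyphens rather than using one regular expression avoids misreading `ipa` as a three-letter language code in names like kty-en-ipa.zip.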
24.05.31.0
What's Changed
- add gulf arabic, reorder languages.json by @StefanVukovic99 in #17
- fix some undefined errors by @StefanVukovic99 in #18
- prevent redundant redownloading of archives for non-english target languages by @StefanVukovic99 in #19
- add old irish by @StefanVukovic99 in #20
- add migaku tool by @StefanVukovic99 in #22
- improve performance to handle Finnish by @StefanVukovic99 in #24
- add basic CI by @StefanVukovic99 in #26
- populate dictionary metadata by @StefanVukovic99 in #27
- add dutch by @StefanVukovic99 in #33
- reduce memory use further by @StefanVukovic99 in #36
- add baseline test for french "avatar" by @StefanVukovic99 in #42
- start including raw tags, and french tort tag by @StefanVukovic99 in #43
- Added Old English; Removed Diacritics in SGA, GRC and ANG by @martholomew in #45
- allow wildcard format for language names when running auto.sh by @StefanVukovic99 in #47
- add korean by @StefanVukovic99 in #49
- add merged IPA dicts to auto release by @StefanVukovic99 in #50
New Contributors
- @martholomew made their first contribution in #45
Full Changelog: 24.04.08.0...v2024-05-31-09-23-25-00-00
General Info
Types of files:
- Bilingual dictionaries - kty-en-de.zip for example has English headwords and their definitions/translations in German.
- Monolingual dictionaries - kty-en-en.zip and such.
- IPA dictionaries from a single wiktionary - e.g. kty-en-en-ipa.zip
- Merged IPA dictionaries from all 6 wiktionary editions - e.g. kty-en-ipa.zip. These have more terms covered but not all the entries might be in the same format.
Note that Kaikki currently only supports 6 wiktionary editions, so only dictionaries in these languages are available. For a rough overview see the graph:
Some of the dictionaries are small; rather than decide on a lower bound for usefulness they are all included here.
If the language you want isn't here, or you would like to see an improvement to a dictionary, please open an issue.
Languages are referred to by their ISO code (ISO 639-1 where available, ISO 639-3 where not).
These dictionaries are converted from kaikki dictionaries extracted on 2024-05-29 from the enwiktionary dump dated 2024-05-02.
24.04.08.0
What's Changed
- The format for parts of speech has changed from "noun", "verb"... to "n", "v"... so it should now work better with yomitan's algorithmic deinflection (a754d93)
- nested entries with structured definitions now have cleaner tags (8d3a254)
- IPA tags are now all shown (1f74b4f)
- The "reading" is now used to display text with diacritics above the term, disambiguating words in some languages (Latin, Arabic, Farsi etc)
- 🇫🇷 add missing inflection tag, add tests by @seth-js in #13
- remove junk text from some headwords by @seth-js in #14
- support freq dict conversion by @seth-js in #16
- other improvements to deinflections
Full Changelog: beta...24.04.08.0
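The note above about using the "reading" to show diacritics can be illustrated with a short sketch. It assumes that the term is the headword with combining marks removed and that the reading is the original, fully marked headword. The helper names `strip_diacritics` and `term_and_reading` are hypothetical, and the project's actual rules differ per language, so this is not its implementation.

```python
import unicodedata
from typing import Tuple


def strip_diacritics(text: str) -> str:
    """Remove combining marks (Unicode category Mn) from text."""
    # NFD splits precomposed letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left so the result is in a stable form.
    return unicodedata.normalize("NFC", kept)


def term_and_reading(headword: str) -> Tuple[str, str]:
    """Return (term, reading): a plain lookup form and the marked display form."""
    return strip_diacritics(headword), headword


if __name__ == "__main__":
    # Latin example: the macron survives only in the reading.
    print(term_and_reading("dom\u016bs"))
```

Stripping every combining mark is too aggressive for languages where the marks are part of the ordinary spelling, which is presumably why the changes listed in these notes handle diacritics language by language.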
General Info
Types of files:
- Bilingual dictionaries - kty-en-de.zip for example has English headwords and their definitions/translations in German.
- Monolingual dictionaries - kty-en-en.zip and such.
- IPA dictionaries from a single wiktionary - e.g. kty-en-en-ipa.zip
- Merged IPA dictionaries from all 6 wiktionary editions - e.g. kty-en-ipa.zip. These have more terms covered but not all the entries might be in the same format.
Note that Kaikki currently only supports 6 wiktionary editions, so only dictionaries in these languages are available. For a rough overview see the graph:
Some of the dictionaries are small; rather than decide on a lower bound for usefulness they are all included here.
If the language you want isn't here, or you would like to see an improvement to a dictionary, please open an issue.
Languages are referred to by their ISO code (ISO 639-1 where available, ISO 639-3 where not).
These dictionaries are converted from kaikki dictionaries extracted on 2024-04-06 from the enwiktionary dump dated 2024-04-01.