A project combining NLP, AI and linguistics to build a Danish grammar assistant.
Text could be corrected at GrammatikTAK.com, which is no longer live (see the code for it here). Models and datasets are not included in this repo.
The backend is no longer hosted. You can run it locally and change the website code to point to your locally hosted backend.
The backend uses trained models. To run the backend without the models, change the first line in GrammatiktakBackend/main.py to `local_models_avaliable = False`,
then `cd GrammatiktakBackend`
and host with `flask --app main run`.
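Once the backend is running, a client can query it over HTTP. The sketch below shows one way to do that; the endpoint path (`/correct`) and JSON field names are assumptions, so check the routes in GrammatiktakBackend/main.py for the actual API.

```python
# Minimal sketch of querying a locally hosted GrammatikTAK backend.
# The endpoint path ("/correct") and payload shape are assumptions;
# see GrammatiktakBackend/main.py for the real route and field names.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default address used by `flask run`

def build_request(text: str, endpoint: str = "/correct") -> urllib.request.Request:
    """Build a POST request carrying the text to be checked."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def check_text(text: str) -> dict:
    """Send the text to the backend and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Only the standard library is used here, so the snippet runs without extra dependencies once the backend is up.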
This project could definitely be better documented. If you need any assistance, want a walkthrough of the project, or want my datasets/models for further experiments, feel free to contact me.
The rise of NLP and AI has greatly benefited popular languages, their grammar assistants and NLP tooling. The Nordic languages, especially Danish, are sadly far behind. This repo will hopefully help cover some basic NLP needs and make a great Danish, and potentially Nordic, grammar assistant.
I focus on making GrammatikTAK:
- Simple: Built in modules; a module can easily be replaced, reworked or even deleted without affecting other modules.
- Adaptable: Although speed is important, I have prioritized adaptability and readability over speed.
- Well-tested: I have tested my models extensively to ensure high accuracy.
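The modular design above can be sketched as a simple pipeline of interchangeable checkers. The names here are illustrative, not the actual classes in GrammatiktakBackend; the point is that each module shares one interface, so any of them can be swapped or dropped without touching the others.

```python
# Hypothetical sketch of the modular design: every checker is a function
# with the same signature, collected into a pipeline. These toy modules
# are illustrative only, not the real GrammatiktakBackend modules.
from typing import Callable, List

# A module takes a sentence and returns a list of correction messages.
Module = Callable[[str], List[str]]

def punctuation_module(sentence: str) -> List[str]:
    """Toy module: flag a missing final period."""
    return [] if sentence.endswith(".") else ["Missing final period."]

def capitalization_module(sentence: str) -> List[str]:
    """Toy module: flag a lowercase first letter."""
    return [] if sentence[:1].isupper() else ["Sentence should start with a capital letter."]

def run_pipeline(sentence: str, modules: List[Module]) -> List[str]:
    """Run every module independently and collect their corrections."""
    corrections: List[str] = []
    for module in modules:
        corrections.extend(module(sentence))
    return corrections
```

Replacing or deleting a module is then just editing the list passed to `run_pipeline`, which is the property the "Simple" goal describes.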
Here is a small overview of the most important directories within the Development folder:
- BackendAssistants: Scripts for analysing the backend performance & complexity.
- DataProcessing: Scripts & notebooks for converting text to datasets.
- FineTuneModels: Scripts for finetuning models and logging performance.
- GoogleDocsAddOn: Scripts for the GrammatikTAK Google Docs Add-on
- GoogleExtension: Scripts for the GrammatikTAK Google Extension (not finished)
- GrammatiktakBackend: Development of backend 2.0; main.py is the backend entry point. Currently in use.
- Other: PowerPoint presentations.
- TestingOtherModels: Scripts for testing third-party models to use or compare against.
This is the frontend script for the website at GrammatikTAK.com.
- See measurements based on each school; maybe this should be available for schools to check at all times. @https://grammatiktak.com/data
- Be able to enter a code in the frontend to unlock better features.
- Send school id to backend
- School code should work (green if correct, red if not and then reset, should load from cookie)
- Authentication token (GitHub secret) to backend
A collection of human-annotated datasets for everyone to use, distributed under the CC BY-SA 4.0 license.
This is not a set of training data. There already exist a number of huge Nordic datasets (1, 2 & 3) with different kinds of data for training.
I felt that the Danish NLP community could benefit from having a high-quality, human-annotated dataset to use when testing NLP models, or for other purposes.
I strive to make the datasets:
- Of extremely high quality (if you find a mistake, let us know).
- Large, so that the test size is big enough for you to get a reasonably accurate estimate of your model's performance.
- Broad, to capture different themes, sentence constructions and lengths.
You are free to use whatever file you want. Here is a description of how they are constructed:
- Every file is focused on one specific error in the selected language (currently only Danish; we would like to expand this to other Nordic languages).
- To learn more about each file and how to load it, see the README in the specific folder.
Data sources:
- Wikipedia
- Made up text
- Internal documents
- Danish gigaword
- Danish Named Entity Recognition
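A typical use of these files is scoring a model against the annotated sentences. The sketch below shows one way to do that; the CSV layout and the column names (`sentence`, `correction`) are assumptions made for illustration, so check the README in each dataset folder for the actual format.

```python
# Sketch of evaluating a model on one of the test files. The column names
# ("sentence", "correction") are assumptions; see each folder's README
# for the real file format.
import csv
from typing import Callable, List, Tuple

def load_dataset(path: str) -> List[Tuple[str, str]]:
    """Load (input sentence, expected correction) pairs from a CSV file."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        return [(row["sentence"], row["correction"]) for row in reader]

def accuracy(pairs: List[Tuple[str, str]], model: Callable[[str], str]) -> float:
    """Fraction of sentences the model corrects exactly as annotated."""
    if not pairs:
        return 0.0
    hits = sum(1 for sentence, expected in pairs if model(sentence) == expected)
    return hits / len(pairs)
```

Since the datasets are meant for testing rather than training, an exact-match accuracy like this gives a quick, reproducible benchmark across models.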