Masakhane - A living collection of NLP projects for Africans, by Africans

MASAKHANE is a research effort for NLP for African languages that is OPEN SOURCE, CONTINENT-WIDE, DISTRIBUTED and ONLINE. This GitHub repository houses the data, code, results and research for building open baseline NLP results for African languages.

Website: masakhane.io

Our Mission

Masakhane is a grassroots organisation whose mission is to strengthen and spur NLP research in African languages, for Africans, by Africans. Despite the fact that 2000 of the world’s languages are African, African languages are barely represented in technology. The tragic past of colonialism has been devastating for African languages in terms of their support, preservation and integration. This has resulted in a technological space that does not understand our names, our cultures, our places, our history.

Masakhane roughly translates to “We build together” in isiZulu. Our goal is for Africans to shape and own these technological advances towards human dignity, well-being and equity, through inclusive community building, open participatory research and multidisciplinarity.

Our Values

  • Umuntu Ngumuntu Ngabantu - loosely translated from isiZulu, this means “a person is a person through another person” or “I am because you are”. This philosophy calls for collaboration, participation and community. It proposes relationality over individualism, for stronger social cohesion towards sustainable communities. It holds that we share our successes and that one’s personhood is evaluated based on their contributions to the community.

  • African-centricity - We centralize the narratives of Africans as a remedy to the effects of Eurocentrism on our beliefs. This way we reassert a new way of looking at information from an African perspective and shun any attempts to devalue our knowledge and stories.

  • Ownership - We believe that Africans should be in charge of owning, driving and participating in the NLP research process, rather than being only observers or data providers.

  • Openness - We believe in sharing our ideas and progress openly, especially on the African continent, for Africans. We’re against research that takes African contributions or data and puts them behind a paywall that is infeasible for Africans to access.

  • Multidisciplinarity - We truly believe in participation from all fields and experiences, and that multidisciplinarity leads to a more robust and more inclusive society.

  • Everyone has valuable knowledge - We believe that each person’s individual experiences have value, and that each person is worth listening to and has something to contribute.

  • Kindness - We believe that being considerate, friendly and generous within our community is the best way to support it and encourage more inclusivity.

  • Responsibility - We believe that each person in the technology process has an ethical responsibility for what they produce in the world. For this reason, we actively reckon with the ethical impacts of our work.

  • Data sovereignty - We believe Africans should be able to decide what data represents our communities globally, retain ultimate ownership of that data, and know how it is used.

  • Reproducibility - We believe in reproducible research. As a result, we publish our code and data from our research so that others can reproduce and build upon it.

  • Sustainability - We believe that sustainability is necessary for societal change - that small daily efforts over a long time are what truly change the world. To that end, we aim for sustainability of our work by being fully integrated with technological stakeholders, to ensure the community continues to thrive into the future.

Goals

  • For Africa: To build and facilitate a community of NLP researchers, connect and grow it, spurring and sharing further research, build helpful tools for applications in government, medicine, science and education, to enable language preservation and increase its global visibility and relevance.

  • For NLP Research: To build data sets and tools to facilitate NLP research on African languages, and to pose new research problems to enrich the NLP research landscape.

  • For the global research community: To discover best practices for distributed research, to be applied by other emerging research communities.

Progress

How can I contribute?

There are many ways to contribute to MASAKHANE.

  1. TRAIN A MODEL - Contribute a trained model and related code for your language
  2. ANALYSIS - Contribute analysis of data/models for any African languages. You do not need any technical experience for this! If you're a linguist, we can pair you up with an NLP practitioner and you can help contribute analysis.
  3. DATA - Help build or find datasets for your language
  4. DOCUMENTATION - Help document our discussions and progress. This is VERY much needed. Or contribute to documentation of the base "notebook" to improve the experience of others
  5. MENTORSHIP - Provide advice or help tune models for their languages and datasets, or help people get started
  6. ADMIN - Working with so many researchers can be quite a challenge! Help out with administrative tasks
  7. COMPUTE - Help with infrastructure and compute! Do you have spare compute to donate? Let us know! We're always looking for more!
  8. BRAINSTORM - Join our weekly meetings, provide advice or ideas
  9. STORY-TELLING - Tell our stories to the world by doing talks about the community, contributing to our Medium publication, or engaging with media outlets
  10. MLOps & ML Engineering - Do you enjoy delving into the MLOps side of machine learning? Are you a software developer looking to hone your ML engineering skills? Join us to help build tools to support our reproducibility, data gathering, and model sharing!

Want more details? Check out our current initiatives

How do I join?

  1. Join our Slack

  2. Request to join our Google Group - this will add you to our weekly meetings

  3. So we can feature you on our webpage masakhane.io, please fill in our membership form HERE:

Please be patient when waiting for a response via our email address; we're very behind on our administration in the time of COVID-19.

Where do I start?

  • If you're on Slack, you'll see a number of channels which reflect our initiatives (described below). Join them and start engaging.
  • Every week, we have an open meeting for our members. These are described on our meeting agenda, where you can learn about the format, and add and vote on topics. Make sure you've joined our Google Group.
  • If you're not sure what value you can add, check out our growing message board to see if there are any tasks you can pick up!

Initiatives

Every week we have more ideas, and more impromptu projects emerge. Keen on any initiatives? Join our Slack and find the respective group.

Working on a Masakhane initiative that is not listed here? Please add it with a PR ❤️

Keen to help on any of these initiatives? Please see our message board

| Initiative | Description | Slack Channel | Repository |
| --- | --- | --- | --- |
| Machine Translation Benchmarks | Continued expansion and iterations on our language benchmarks as documented on the main GitHub README | #benchmarks | HERE |
| NER Datasets and Benchmarks | We're busy releasing datasets and research around NER | #ner | HERE |
| Dataset Creation | We never have enough data. More is always needed. We have a number of members finding creative ways to build datasets. | #datasetcreation | |
| Reproducibility | The goal is to ensure reproducibility and comparability of models and results. | #reproducibility | |
| Takalani NLP | Development of Language Models for South African languages | #takalani-nlp | |
| Wazobia | Yoruba, Igbo, Hausa and Nigerian languages NMT | #wazobia | |
| Multilingual Chatbot | Developing multilingual chatbots | #multilingual-dialogue | |
| Transfer Learning | Transfer Learning & Multilingual Expansion of Benchmarks | #transfer-learning | |
| Evaluation of Masakhane Models | How good are the Masakhane models? How can we measure it, besides looking at BLEU scores? | #evaluation | |
| Text-to-speech | Corpora and models for text to speech synthesis (TTS) from audio bibles in Ewe, Hausa, Lingala, Asante Twi, Akuapem Twi and Yoruba | #bible-speech | HERE |

Code of Conduct

See Code of Conduct