Open Large Language Model infused with LA data
The long-term goal of this folder is to include the code, data, and everything else needed to train a large language model similar to Llama, Mistral, Gemini, or GPT.
Many companies are racing towards AGI; LA wants to build the necessary pipeline in a fully open way, where everything from data curation down to every individual training run is transparent.
Data is provided by LA itself and scraped from the internet (as all SOTA models do). Compute is funded by LA together with the community, using something like BitTensor to aggregate the funds.
The near-term goal of this task is to replace the use of external LLMs in LA with our own models, be it for chatbots, search, or other purposes.