CodegebraGPT

Finetuning multimodal LLMs on STEM datasets.

Planned Procedure

  • Compile and preprocess multiple datasets
  • Finetune SOLAR-10.7B-Instruct-v1.0 on this data using QLoRA (a minimal sketch appears after this list)
  • Release on Huggingface so that anybody can use it!
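
Below is a minimal sketch of what the QLoRA finetune could look like with the Hugging Face stack (transformers, peft, bitsandbytes, datasets). The training file stem_100k.jsonl, the hyperparameters, the adapter target modules, and the Hub repo name are assumptions for illustration, not the project's actual configuration:

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "upstage/SOLAR-10.7B-Instruct-v1.0"

# Load the base model in 4-bit (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters (the "LoRA" half); only these weights are trained.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))

# Hypothetical preprocessed training file with a "text" column.
ds = load_dataset("json", data_files="stem_100k.jsonl", split="train")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    train_dataset=ds,
    args=TrainingArguments(
        output_dir="codegebragpt-qlora",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
    ),
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()

# Step 3: publish the trained adapter on the Hub (repo id is a placeholder).
model.push_to_hub("your-username/codegebragpt-qlora")
```

Because only the LoRA adapter weights are trained on top of a 4-bit base model, the finetune fits on a single consumer or cloud GPU, which is the main reason QLoRA keeps costs down.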

Datasets

I will train this model on about 100k samples. Combined, the source datasets contain roughly 1 million samples, but I only use about 100k of them to save costs. Those samples all come from the datasets listed below:
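
As an illustration of the subsampling step, the sketch below draws a fixed 100k-sample subset from a combined pool using the Hugging Face datasets library; the dataset names and column handling are placeholders, not the actual sources:

```python
from datasets import concatenate_datasets, load_dataset

# Placeholder dataset names; in practice each source would first be mapped
# to a common schema (e.g. a single "text" column) before concatenation.
sources = ["placeholder/stem-dataset-a", "placeholder/stem-dataset-b"]
pool = concatenate_datasets([load_dataset(name, split="train") for name in sources])

# Draw a fixed 100k-sample subset from the ~1M-sample pool to keep training cheap.
subset = pool.shuffle(seed=42).select(range(100_000))
subset.to_json("stem_100k.jsonl")
```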

Name

This LLM is named after Codegebra, a program I made to solve equations, perform Fourier transforms, and more. It is intended to be Codegebra's successor, with a more natural interface and expanded abilities.
