I ran some exploratory experiments to figure out whether filler tokens / IC tokens / computation tokens / pause tokens (many people have had this idea over time, hence the many names) help transformers. This is far from completed research. The experiments are all done with small GPT-2-style transformers in the range of 5-40k parameters, to check whether there's any reason to run this in bigger LLMs, although someone concurrently did run that at scale: https://arxiv.org/abs/2310.02226.
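For concreteness, here is a minimal sketch of what the filler/pause-token idea looks like at the data level. It assumes a hypothetical extra <pause> token id and a simple prompt/answer split; it is not the exact setup used in these experiments, just an illustration of the general technique.

```python
# Minimal sketch of the filler/pause-token idea (assumptions: a hypothetical
# <pause> token id appended past GPT-2's vocabulary, and sequences that split
# cleanly into prompt and answer portions).
from typing import List

PAUSE_TOKEN_ID = 50257  # hypothetical: one extra id beyond GPT-2's 50257-token vocab


def insert_pause_tokens(prompt_ids: List[int],
                        answer_ids: List[int],
                        n_pause: int = 8) -> List[int]:
    """Place n_pause filler tokens between prompt and answer, giving the model
    extra forward passes of 'computation' before it has to produce the answer."""
    return prompt_ids + [PAUSE_TOKEN_ID] * n_pause + answer_ids


# Example: 8 pause positions between a toy prompt and its answer.
# During training, the loss is typically masked on the pause positions so the
# model is only supervised on the answer tokens.
sequence = insert_pause_tokens([101, 102, 103], [201, 202], n_pause=8)
```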
A more detailed log book of the experiments is here: https://wandb.ai/reasoning/think-hard/reports/Experiment-log-book--Vmlldzo1NDMwODg1?accessToken=l9091dz4i0vrvp1bdbfj0ui8wat3c4b1cbc5p9wcdwbjz6qmojlhqeqo3vrihpyu The summary is at the top, but if you feel adventurous, feel free to look through the experiments in the log book right beneath it, which contain a lot more detail.