A faster, compatible implementation of LM-SYS FastChat's llm_judge and derivatives.
We switch to the fastest inference engine depending on the model format (see the sketch after this list):
- HF: vLLM
- AWQ: vLLM
- GPTQ: ExLlamaV2
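A minimal sketch of how that dispatch could look; `detect_format()` and the engine names below are illustrative, not the repo's actual code:

```python
# Illustrative only: guess the quantization format from the model path/name
# and map it to the inference engine we want to use.
def detect_format(model_path: str) -> str:
    name = model_path.lower()
    if name.endswith(".gguf"):
        return "gguf"
    if "awq" in name:
        return "awq"
    if "gptq" in name:
        return "gptq"
    return "hf"

ENGINE_FOR_FORMAT = {
    "hf": "vllm",         # full-precision HF checkpoints -> vLLM
    "awq": "vllm",        # AWQ quantized -> vLLM (quantization="awq")
    "gptq": "exllamav2",  # GPTQ quantized -> ExLlamaV2
}

def pick_engine(model_path: str) -> str:
    return ENGINE_FOR_FORMAT.get(detect_format(model_path), "vllm")
```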
We use mamba (install instructions) to run:
mamba create -n llm-judge python=3.11
git clone https://github.com/AUGMXNT/llm-judge
cd llm-judge
# Install base requirements
pip install -r requirements.txt

# For Qwen
pip install -r requirements.qwen.txt
pip install csrc/layer_norm
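Once installed, a quick smoke test that vLLM loads; the model id here is just an example:

```python
from vllm import LLM, SamplingParams

# Example model id only; swap in whatever you are evaluating.
llm = LLM(model="Qwen/Qwen-14B-Chat", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=64)
out = llm.generate(["Write a haiku about judging LLMs."], params)
print(out[0].outputs[0].text)
```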
# 4090+3090:
* Original FastChat llm_judge: ~8h for a 13B model... wtf
# V0: Just faster inference
[x] Rip out the stock HF generation loop and swap in vLLM first (see the sketch below)
* real 16m35.112s
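The win mostly comes from handing vLLM the whole prompt list at once instead of looping over per-question `model.generate()` calls; roughly (prompt templating omitted, the function name is ours):

```python
from vllm import LLM, SamplingParams

def generate_answers(model_path: str, prompts: list[str], temperature: float = 0.0) -> list[str]:
    llm = LLM(model=model_path)
    params = SamplingParams(temperature=temperature, max_tokens=1024)
    # vLLM batches and schedules all prompts internally (continuous batching)
    outputs = llm.generate(prompts, params)
    return [o.outputs[0].text for o in outputs]
```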
# AutoAWQ vs vLLM
* vLLM uses more memory than it should (see the sketch below for capping it)
* How does the speed compare to AutoAWQ? https://github.com/casper-hansen/AutoAWQ
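For the memory side, vLLM preallocates most of the GPU by default and `gpu_memory_utilization` caps that. A hedged sketch of timing a fixed prompt set through vLLM's AWQ path (the AutoAWQ side of the comparison would time the same prompts through its own generate loop; the model path and prompts are placeholders):

```python
import time
from vllm import LLM, SamplingParams

prompts = ["Judge the following answer..."] * 32  # placeholder prompt set

llm = LLM(
    model="TheBloke/Llama-2-13B-chat-AWQ",  # example AWQ checkpoint
    quantization="awq",
    gpu_memory_utilization=0.70,  # default is ~0.90; lower if vLLM grabs too much VRAM
)
params = SamplingParams(temperature=0.0, max_tokens=256)

start = time.time()
outputs = llm.generate(prompts, params)
print(f"{len(prompts)} prompts in {time.time() - start:.1f}s")
```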
# Add GGUF support
[ ] llama-cpp-python (see the sketch below)
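A hedged sketch of what the llama-cpp-python path could look like; the model path and settings are placeholders:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/judge-13b.Q4_K_M.gguf",  # hypothetical GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)
out = llm.create_completion(
    "Judge the following answer...",
    max_tokens=256,
    temperature=0.0,
)
print(out["choices"][0]["text"])
```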
# Batching
First Pass:
[x] Organize queries by temperature
* We should actually thread our queries, since we have multi-turn conversations to deal with
* We can easily batch the choices together (but maybe shouldn't, for seeding purposes)
* Anything fancier is a real PITA which we don't need (rough sketch below)
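A rough sketch of that first pass: bucket questions by temperature, batch all first turns in a bucket, then batch the follow-up turns once the first answers are back. Field names are made up, and plain string concatenation stands in for proper chat templating:

```python
from collections import defaultdict
from vllm import LLM, SamplingParams

def judge_two_turns(llm: LLM, questions: list[dict]) -> dict[str, list[str]]:
    """questions: [{"id": ..., "temperature": ..., "turns": [t1, t2]}, ...]"""
    answers: dict[str, list[str]] = defaultdict(list)

    # Group by temperature so each batch shares one SamplingParams.
    by_temp: dict[float, list[dict]] = defaultdict(list)
    for q in questions:
        by_temp[q["temperature"]].append(q)

    for temp, group in by_temp.items():
        params = SamplingParams(temperature=temp, max_tokens=1024)

        # Turn 1: every question in the bucket goes out in one batch.
        first = llm.generate([q["turns"][0] for q in group], params)
        for q, out in zip(group, first):
            answers[q["id"]].append(out.outputs[0].text)

        # Turn 2 depends on turn 1, so build follow-up prompts and batch again.
        follow_ups = [
            q["turns"][0] + answers[q["id"]][0] + q["turns"][1] for q in group
        ]
        second = llm.generate(follow_ups, params)
        for q, out in zip(group, second):
            answers[q["id"]].append(out.outputs[0].text)

    return answers
```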
# Better UI
[ ] InquirerPy - prompt for anything missing, let you pick options, and generate a command line for batching or run directly
[ ] Add Config Files (rough loader sketch after this list)
[ ] Look at https://github.com/AUGMXNT/gpt4-autoeval
[ ] Run logging
[ ] Run autoresume
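For the config-file and autoresume items, something as small as a JSON run spec plus a check of which question ids already have answers would do. The schema and the `question_id` field here are assumptions based on FastChat-style JSONL answer files:

```python
import json
from pathlib import Path

def load_run_config(path: str) -> dict:
    """Hypothetical run config: model list, bench name, resume flag, etc."""
    cfg = json.loads(Path(path).read_text())
    cfg.setdefault("resume", True)
    return cfg

def already_answered(answer_file: str) -> set[str]:
    """Question ids that already have answers, so a rerun can skip them."""
    done: set[str] = set()
    p = Path(answer_file)
    if p.exists():
        for line in p.read_text().splitlines():
            if line.strip():
                done.add(json.loads(line)["question_id"])
    return done
```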