
Pr139 dev #8

Merged · 32 commits merged into main on Aug 28, 2024
Conversation

risingsunomi (Owner)

Merging in PR139 development

  • Tested with Qwen2 Instruct, TinyLlama, and Llama3 8B models and successfully generated tokens.
  • Output quality is currently poor, often producing gibberish. I am investigating top_p, top_k, temperature settings, and the logit sampling method for improvements (see the sampling sketch after this list).
  • Successfully tested the setup on an NVIDIA A10 with enough resources for model generation.
  • Tested on local NVIDIA-based servers, achieving cross-computer generation on Linux systems. Encountered out-of-memory (OOM) errors even with smaller models; will explore memory management on low-VRAM GPUs.
  • Generalized the ShardedHuggingFaceModel forward function to be compatible with various HuggingFace transformer model classes, though specific handling for different model types is still required.
  • Reformatted the interface test for PyTorch.
  • Updated generate_deps.py to update the tinychat model-select dropdown.
  • Fixed the fast check for loading tokenizers, which was comparing a string against a pathlib.Path (see the path-normalization sketch after this list).
  • Updated ShardedHuggingFaceModel to always use caching.
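
For context on the sampling investigation mentioned above, here is a minimal sketch of temperature / top-k / top-p (nucleus) logit sampling in PyTorch. The function name `sample_logits` and its defaults are illustrative and not the PR's actual code; it only shows the general technique being tuned, including a defensive fallback for the "top_p too low or high" case.

```python
import torch
import torch.nn.functional as F

def sample_logits(logits: torch.Tensor,
                  temperature: float = 0.8,
                  top_k: int = 50,
                  top_p: float = 0.9) -> torch.Tensor:
    """Sample one token id from a 1-D [vocab_size] logits tensor.

    Hypothetical helper, not the repository's code: applies temperature
    scaling, then top-k, then nucleus (top-p) filtering, and finally
    multinomial sampling over the surviving tokens.
    """
    # Temperature scaling: lower values sharpen the distribution.
    logits = logits / max(temperature, 1e-5)

    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        k = min(top_k, logits.size(-1))
        kth_value = torch.topk(logits, k).values[-1]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability exceeds top_p, always retaining the single best token.
    if 0.0 < top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cumprobs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        remove = cumprobs > top_p
        remove[1:] = remove[:-1].clone()  # shift so the crossing token stays
        remove[0] = False
        logits[sorted_idx[remove]] = float("-inf")

    probs = F.softmax(logits, dim=-1)
    # Defensive fallback: if filtering left no usable probability mass,
    # fall back to a uniform random pick over the vocabulary.
    if not torch.isfinite(probs).all() or probs.sum() <= 0:
        return torch.randint(0, logits.size(-1), (1,))
    return torch.multinomial(probs, num_samples=1)
```

In use, this would be called on the last-position logits of a causal LM forward pass (e.g. `outputs.logits[0, -1, :]`), with `use_cache=True` so `past_key_values` can be reused across steps, which is the standard HuggingFace mechanism behind the "always use caching" item above.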
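
On the tokenizer fast-check fix: the underlying bug class is that a pathlib.Path never compares equal to a str, so an equality or membership check against a model identifier silently fails. A minimal sketch, assuming a hypothetical `use_fast_tokenizer` helper and model list (not the repository's code):

```python
from pathlib import Path

# Hypothetical allow-list for illustration only.
FAST_TOKENIZER_MODELS = {
    "Qwen/Qwen2-7B-Instruct",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
}

def use_fast_tokenizer(model_path) -> bool:
    """Illustrative fast-check, not the PR's exact code.

    Wrong:  Path("TinyLlama/...") in FAST_TOKENIZER_MODELS  -> always False,
            because a Path never equals a str.
    Right:  normalize to str before comparing.
    """
    model_id = model_path.as_posix() if isinstance(model_path, Path) else str(model_path)
    return model_id in FAST_TOKENIZER_MODELS
```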

Truncated commit messages from the merge:

…ference_engine.py, adding a PyTorch option for the inference engine
… along with adding torch topk; added better random distribution selection for when top_p is too low or high; started work on forward_layer_cached, but the infer functions need to be changed to accept any type rather than a string
… where data from infer_prompt is not complete
…ken/logit coming from infer_tensor to infer_prompt; running into OOM issues when trying on the server
risingsunomi merged commit 46667b6 into main on Aug 28, 2024.