Enable Llama2 70B to run with hqt on single card (#50) #762

Closed
HolyFalafel wants to merge 63 commits into huggingface:main from HabanaAI:dev/dsemiat/llama_single_card_pre_1.10.0


@HolyFalafel
Contributor

Add a disk_offload flag that controls device_map=auto. Setting this flag enables offloading weights to disk when CPU memory runs out.
Add a const serialization path flag that takes a path where const sections are serialized, so that if the device has no space to hold all const sections they are offloaded to disk.
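
The two flags described above could be wired up roughly as follows. This is a minimal sketch, not the code in this PR: the flag name `--const_serialization_path`, the helper names, and the offload folder are assumptions made for illustration. Only `device_map="auto"` and `offload_folder` are real keyword arguments of the transformers `from_pretrained` method.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the two flags from the PR description (names assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of disk offload flags")
    parser.add_argument(
        "--disk_offload",
        action="store_true",
        help="Load with device_map='auto' so weights can spill to disk.",
    )
    parser.add_argument(
        "--const_serialization_path",  # hypothetical flag name
        type=str,
        default=None,
        help="Directory where const sections are serialized.",
    )
    return parser


def model_load_kwargs(args: argparse.Namespace, offload_folder: str) -> dict:
    """Build keyword arguments for from_pretrained; empty unless --disk_offload is set."""
    kwargs = {}
    if args.disk_offload:
        # The offload folder must exist before weights are written to it.
        os.makedirs(offload_folder, exist_ok=True)
        kwargs["device_map"] = "auto"
        kwargs["offload_folder"] = offload_folder
    return kwargs


def prepare_const_dir(path):
    """Create the const serialization directory if a path was given."""
    if path is None:
        return None
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    folder = os.path.join(tempfile.gettempdir(), "offload_folder")
    print(model_load_kwargs(cli_args, folder))
    print(prepare_const_dir(cli_args.const_serialization_path))
```

The returned dictionary would be passed on as `AutoModelForCausalLM.from_pretrained(model_name, **kwargs)`. How the const serialization path is handed to hqt is not shown in this thread, so the sketch stops at creating the directory.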

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HolyFalafel HolyFalafel requested a review from regisss as a code owner March 5, 2024 11:24
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@libinta libinta left a comment

@HolyFalafel can you provide a command in the README for running with hqt on a single card?

Comment thread examples/text-generation/utils.py
Collaborator

@libinta libinta left a comment

Please add instructions for running hqt on a single card to the README.

Collaborator

@regisss regisss left a comment

Agree with Libin, please provide an example.

Can you also rebase your branch on main and run the following please?

pip install -U ruff
make style

nngokhale and others added 23 commits March 11, 2024 11:19
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Sayantan Sarkar <sasarkar@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: Jimin Ha <jha@habana.ai>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Sayantan Sarkar <supersarkar@gmail.com>
yeonsily and others added 21 commits March 11, 2024 11:19
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: ZhaiFeiyue <80079571+ZhaiFeiyue@users.noreply.github.com>
Co-authored-by: Yaser Afshar <ya.afshar@gmail.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
…ace#704)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: edward.mascarenhas <emascare@gaudi-user-hf-1.amr.corp.intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Fix Llama-70B-FSDP model loading issue (#63)
Expose Llama Fused OPs control from run_lora_clm.py (#23)

* Expose Llama Fused OPs control from run_lora_clm.py

* Update as per review comments

Co-authored-by: Vivek Goel <vgoel@habana.ai>
)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>