
Pad the examples for QLoRa finetuning test #1941

Merged: regisss merged 1 commit into huggingface:main from HabanaAI:auto-pr-976a202 on Apr 22, 2025


@ckvermaAI
Contributor

  1. Pad the examples up to a max_seq_len of 1024.
  2. Increase max_steps (from 5 to 50) and eval_steps (from 3 to 10). Set the throughput-related arguments (adjust_throughput, throughput_warmup_steps).
  3. Update the fine-tuning test name.
  4. Adjust the reference "eval loss" value for the fine-tuning test and the reference "output" for the inference test.

Additional updates:
  5. Enable eager mode for the test (torch.compile mode is disabled for now).
  6. Add a new requirement to install bitsandbytes (from https://github.com/bitsandbytes-foundation/bitsandbytes/tree/multi-backend-refactor).
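
For readers unfamiliar with item 1, the following is a minimal sketch of what padding an example to a fixed length can look like. It is not the code from this PR: the function name `pad_example`, the default `pad_token_id` of 0, the truncation of over-long inputs, and the masking of padded label positions with -100 are all assumptions made for illustration.

```python
from typing import Dict, List

# -100 is the default ignore_index of PyTorch's cross-entropy loss, so padded
# label positions do not contribute to the loss. Using it here is an assumption.
IGNORE_INDEX = -100


def pad_example(
    input_ids: List[int],
    max_seq_len: int = 1024,
    pad_token_id: int = 0,  # hypothetical default; real tokenizers define their own
) -> Dict[str, List[int]]:
    """Right-pad (or truncate) one tokenized example to exactly max_seq_len."""
    ids = input_ids[:max_seq_len]  # assumption: over-long examples are truncated
    n_pad = max_seq_len - len(ids)
    return {
        "input_ids": ids + [pad_token_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,
        "labels": ids + [IGNORE_INDEX] * n_pad,
    }


if __name__ == "__main__":
    example = pad_example([5, 6, 7], max_seq_len=5)
    print(example["input_ids"])
    print(example["attention_mask"])
```

With every example padded to the same length, each batch has the same shape. The PR description does not state why padding was introduced, so any motivation (such as keeping shapes static on the accelerator) should be treated as a guess.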

* [SW-226132] Pad the examples

* update test name

---------

Co-authored-by: Vivek Goel <vgoel@habana.ai>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@vivekgoe vivekgoe left a comment


LGTM. @regisss can we add these tests to the slow_tests you run for G2/G3? This will help us avoid regressions. We can publish QLoRA support as an experimental/beta feature on G2/G3 (it works for limited configurations and performance is not great). Note that this PR depends on the Synapse 1.21.0 release (not backward compatible).

@vivekgoe vivekgoe requested a review from libinta April 21, 2025 07:59
@libinta libinta added the run-test Run CI for PRs from external contributors label Apr 22, 2025
@ckvermaAI
Contributor Author

Support for NF4 quantization/dequantization using Intel Gaudi hardware: bitsandbytes-foundation/bitsandbytes#1592

@libinta libinta changed the title from "[SW-226132] Pad the examples for QLoRa finetuning test" to "Pad the examples for QLoRa finetuning test" Apr 22, 2025
Comment thread setup.py
Collaborator

@regisss regisss left a comment


LGTM

@regisss regisss merged commit f52d8cd into huggingface:main Apr 22, 2025
4 checks passed
@uartie
Contributor

uartie commented Apr 22, 2025

We're starting to see OSError: /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory when running some of the tests (e.g. test_diffusers.py)

We've run python -m pip install .[tests] and it installed bitsandbytes 1.0.0... so not sure why we're seeing this error.

@uartie
Contributor

uartie commented Apr 22, 2025

...

pip install -r examples/stable-diffusion/requirements.txt
pip install -r examples/stable-diffusion/training/requirements.txt

... appears to cause the error to show up during test run.

@uartie
Contributor

uartie commented Apr 22, 2025

Possibly related to old peft version in examples/stable-diffusion/training/requirements.txt... please fix the requirements file.

@ckvermaAI
Contributor Author

This error comes from bitsandbytes; it should not be related to peft. I ran the other tests locally (test_bnb_qlora.py, test_bnb_inference.py) and did not hit this issue.

Let me check it again.

@ckvermaAI
Contributor Author

ckvermaAI commented Apr 23, 2025

On HPU, bitsandbytes still tries to load the CPU binaries (https://github.com/bitsandbytes-foundation/bitsandbytes/blob/multi-backend-refactor/bitsandbytes/cextension.py#L73), which are not required.
If nothing else goes wrong in the test/topology, you will not see the OSError about the missing *.so file. If a failure occurs for any other reason, however, you will also see the OSError about the missing *.so file alongside it.
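
As a generic illustration of the failure mode described above, the sketch below shows how loading a shared library that is absent raises OSError, and how a loader can treat such a library as optional. This is not the bitsandbytes code; the helper name `load_optional_library` and the example path are hypothetical.

```python
import ctypes
from typing import Optional


def load_optional_library(path: str) -> Optional[ctypes.CDLL]:
    """Try to load a native library; return None instead of raising if it is missing."""
    try:
        return ctypes.CDLL(path)
    except OSError:
        # A missing .so surfaces as OSError ("cannot open shared object file").
        # Treating the library as optional avoids failing on backends
        # that do not need it.
        return None


if __name__ == "__main__":
    # Hypothetical path that is not expected to exist on any machine.
    lib = load_optional_library("/nonexistent/dir/libexample_cpu.so")
    print("loaded" if lib is not None else "not available, continuing without it")
```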

In short,

  1. I'll move the bitsandbytes installation from setup.py to bitsandbytes tests (test_bnb_qlora.py, test_bnb_inference.py). This should resolve any issue you're seeing.
  2. Later, we'll try to upstream the fix to the bitsandbytes repo.
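
One way the first step could look is sketched below: a helper that a test module calls to install the dependency only when it is not already importable. The actual change is in the follow-up PR and may differ. The helper names and the exact pip requirement string are assumptions; only the repository and branch come from the link in the PR description.

```python
import importlib.util
import subprocess
import sys
from typing import Callable, List

# Assumption: a pip-installable spec pointing at the branch linked in the PR
# description. The exact string used by the real fix is not shown in this thread.
BNB_REQUIREMENT = (
    "git+https://github.com/bitsandbytes-foundation/bitsandbytes.git"
    "@multi-backend-refactor"
)


def build_install_command(requirement: str) -> List[str]:
    """Build a pip command that installs into the interpreter running the tests."""
    return [sys.executable, "-m", "pip", "install", requirement]


def ensure_installed(
    module_name: str,
    requirement: str,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> bool:
    """Install `requirement` only if `module_name` cannot be imported.

    Returns True if an install was triggered, False if the module was already present.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    run(build_install_command(requirement))
    return True
```

A bitsandbytes test module could call `ensure_installed("bitsandbytes", BNB_REQUIREMENT)` before importing the package, so that unrelated test suites never pull in the dependency.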

@ckvermaAI
Contributor Author

Fix for the above issue:
Move bitsandbytes requirements from setup.py to bnb tests #1946


Labels

run-test Run CI for PRs from external contributors synapse 1.21


6 participants