fea(gemma2): cleaned up old codes by imangohari1 · Pull Request #2212 · huggingface/optimum-habana

imangohari1 · 2025-08-19T16:34:38Z

What does this PR do?

This PR updates Gemma2 by cleaning up its old, local implementations of KVCache, MLP, Rotary, etc. and replacing them with the unified ones.
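The KV cache is one of the components this PR switches to a shared implementation. As background only, here is a minimal, framework-free sketch of a preallocated (fixed-size) KV cache. `StaticKVCache` and `update` are hypothetical names, not the optimum-habana class; a real cache typically holds torch tensors shaped (batch, heads, seq_len, head_dim) rather than Python lists.

```python
from typing import List, Tuple

Vector = List[float]


class StaticKVCache:
    """Hypothetical sketch of a preallocated (fixed-size) key/value cache."""

    def __init__(self, max_len: int, dim: int) -> None:
        self.max_len = max_len
        self.dim = dim
        # Buffers are allocated once at full size and overwritten in place,
        # so their size never changes from one decoding step to the next.
        self.keys: List[Vector] = [[0.0] * dim for _ in range(max_len)]
        self.values: List[Vector] = [[0.0] * dim for _ in range(max_len)]
        self.length = 0  # one past the highest position written so far

    def update(self, pos: int, k: Vector, v: Vector) -> Tuple[List[Vector], List[Vector]]:
        """Store one key/value pair at `pos` and return the filled prefix."""
        if not 0 <= pos < self.max_len:
            raise IndexError(f"position {pos} is outside a cache of size {self.max_len}")
        if len(k) != self.dim or len(v) != self.dim:
            raise ValueError("key/value width does not match the cache dimension")
        self.keys[pos] = list(k)
        self.values[pos] = list(v)
        self.length = max(self.length, pos + 1)
        return self.keys[: self.length], self.values[: self.length]


cache = StaticKVCache(max_len=4, dim=2)
cache.update(0, [1.0, 2.0], [3.0, 4.0])  # prefill position 0
keys, values = cache.update(1, [5.0, 6.0], [7.0, 8.0])  # one decode step
print(len(keys), len(cache.keys))  # 2 4
```

For readability the sketch returns only the filled prefix; an implementation that must keep shapes fixed between steps would instead return the whole buffer and mask the unused positions.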

Conducted tests

CI (only Gemma2)

PT_HPU_LAZY_MODE=1  RUN_SLOW=true python -m pytest tests/test_text_generation_example.py::test_text_generation_bf16_1x -s -v

==================================================================================================== 2 passed in 437.09s (0:07:17) =====================================================================================================
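The CI run reports 2 passing tests in 437.09 s, which pytest also shows as 0:07:17 (7 min 17 s = 437 s). When collecting such results from many logs, a small parser can help; the sketch below is a hypothetical helper keyed to the summary format shown above, not a pytest API.

```python
import re
from datetime import timedelta
from typing import Tuple

# Matches e.g. "2 passed in 437.09s (0:07:17)" inside a pytest summary line.
SUMMARY_RE = re.compile(r"(\d+) passed in ([\d.]+)s \((\d+):(\d{2}):(\d{2})\)")


def parse_pytest_summary(line: str) -> Tuple[int, float, timedelta]:
    """Return (number passed, duration in seconds, h:mm:ss clock) from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no 'N passed in Xs (h:mm:ss)' summary found")
    passed = int(match.group(1))
    seconds = float(match.group(2))
    hours, minutes, secs = (int(g) for g in match.group(3, 4, 5))
    return passed, seconds, timedelta(hours=hours, minutes=minutes, seconds=secs)


line = "===== 2 passed in 437.09s (0:07:17) ====="
passed, seconds, clock = parse_pytest_summary(line)
# 437.09 s truncates to 437 s = 7 min 17 s, matching the clock in parentheses.
print(passed, seconds, clock)  # 2 437.09 0:07:17
```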

README tests

PT_HPU_LAZY_MODE=1 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-2-2b --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --sdp_on_bf16
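The same README command can also be launched from Python, for example inside a larger benchmark script. This sketch assumes it is run from the repository root so the relative script path resolves; `build_generation_command`, `build_env` and `run_readme_example` are hypothetical helper names, and the flags are copied verbatim from the command above.

```python
import os
import subprocess
import sys
from typing import Dict, List


def build_generation_command(model: str, prompt: str, max_new_tokens: int = 100) -> List[str]:
    """Argument list equivalent to the README command (flags copied from it)."""
    return [
        sys.executable,
        "examples/text-generation/run_generation.py",
        "--model_name_or_path", model,
        "--use_hpu_graphs",
        "--use_kv_cache",
        "--max_new_tokens", str(max_new_tokens),
        "--do_sample",
        "--prompt", prompt,
        "--sdp_on_bf16",
    ]


def build_env() -> Dict[str, str]:
    """Copy the current environment and enable lazy mode, as the shell prefix does."""
    env = dict(os.environ)
    env["PT_HPU_LAZY_MODE"] = "1"
    return env


def run_readme_example() -> None:
    # Requires a Gaudi machine with optimum-habana installed; not called here.
    cmd = build_generation_command("google/gemma-2-2b", "Here is my prompt")
    subprocess.run(cmd, env=build_env(), check=True)
```

Passing the argument list directly (rather than a shell string) avoids quoting issues with prompts that contain spaces.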


Input/outputs:
input 1: ('Here is my prompt',)
output 1.1: ("Here is my prompt request! \nYou don't have to read the short story, that isn't important. just do your own original piece about the boy getting kicked out of school for being 'weird', later on it states that Mr. Handley and Miss Hottens and their mother are all going to choir practice. So have fun! Please comment if you do do it, I want to read that story when I am done with mine! Good luck!!\n\nP.S. I will try",)


Stats:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 116.26774185544866 tokens/second
Average first token latency         = 13.424454599589808 ms
Average rest token latency          = 7.379330329297433 ms
Average end to end latency          = 859.5175283997378 ms
Memory allocated                    = 12.59 GB
Max memory allocated                = 12.61 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 3.5712405099984608 seconds
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
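The stats can be cross-checked with simple arithmetic. This is a rough sketch, not how run_generation.py computes its metrics: it assumes a single prompt (batch size 1) and that all 100 requested tokens were generated.

```python
# Values copied from the Stats block above.
max_new_tokens = 100  # --max_new_tokens in the README command
first_token_ms = 13.424454599589808
rest_token_ms = 7.379330329297433
end_to_end_ms = 859.5175283997378
reported_throughput = 116.26774185544866  # tokens/second, includes tokenization

# Assumption: one prompt and all 100 new tokens generated.
implied_throughput = max_new_tokens / (end_to_end_ms / 1000.0)

# First token, then (n - 1) further tokens at the average rest-token latency.
decode_estimate_ms = first_token_ms + (max_new_tokens - 1) * rest_token_ms

print(f"implied throughput: {implied_throughput:.2f} tokens/s")  # 116.34
print(f"per-token estimate: {decode_estimate_ms:.2f} ms")  # 743.98
print(f"unaccounted time:   {end_to_end_ms - decode_estimate_ms:.2f} ms")  # 115.54
```

The implied throughput is slightly above the reported 116.27 tokens/s, which is consistent with the reported figure also counting tokenization time. The per-token estimate leaves roughly 115 ms of the end-to-end latency that these stats do not break down.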

LM_eval with 8k max new tokens

 PT_HPU_LAZY_MODE=1 python examples/text-generation/run_lm_eval.py --model_name_or_path google/gemma-2-2b  --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1  --max_new_tokens 8192 --tasks piqa -o tmp.json

{
  "results": {
    "piqa": {
      "alias": "piqa",
      "acc,none": 0.7856365614798694,
      "acc_stderr,none": 0.009574842136050941,
      "acc_norm,none": 0.7932535364526659,
      "acc_norm_stderr,none": 0.009448665514183273
    }
  },
  "group_subtasks": {
    "piqa": []
  },
  "configs": {
    "piqa": {
      "task": "piqa",
      "dataset_path": "piqa",
      "dataset_kwargs": {
        "trust_remote_code": true
      },
      "training_split": "train",
      "validation_split": "validation",
      "doc_to_text": "Question: {{goal}}\nAnswer:",
      "doc_to_target": "label",
      "doc_to_choice": "{{[sol1, sol2]}}",
      "description": "",
      "target_delimiter": " ",
      "fewshot_delimiter": "\n\n",
      "num_fewshot": 0,
      "metric_list": [
        {
          "metric": "acc",
          "aggregation": "mean",
          "higher_is_better": true
        },
        {
          "metric": "acc_norm",
          "aggregation": "mean",
          "higher_is_better": true
        }
      ],
      "output_type": "multiple_choice",
      "repeats": 1,
      "should_decontaminate": true,
      "doc_to_decontamination_query": "goal",
      "metadata": {
        "version": 1.0
      }
    }
  },
  "versions": {
    "piqa": 1.0
  },
  "n-shot": {
    "piqa": 0
  },
  "higher_is_better": {
    "piqa": {
      "acc": true,
      "acc_norm": true
    }
  },
  "n-samples": {
    "piqa": {
      "original": 1838,
      "effective": 1838
    }
  },
  "config": {
    "model": "google/gemma-2-2b",
    "model_args": null,
    "model_num_parameters": 3204165888,
    "model_dtype": "torch.bfloat16",
    "model_revision": "main",
    "model_sha": "",
    "batch_size": null,
    "batch_sizes": [],
    "device": null,
    "use_cache": null,
    "limit": null,
    "bootstrap_iters": 100000,
    "gen_kwargs": null,
    "random_seed": 0,
    "numpy_seed": 1234,
    "torch_seed": 1234,
    "fewshot_seed": 1234
  },
  "git_hash": "ci_13032024-890-g10a17f75",
  "date": 1755555273.21805,
  "pretty_env_info": "PyTorch version: 2.6.0+hpu_1.21.0-555.gitabf798b\nIs debug build: False\nCUDA used to build PyTorch: None\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 22.04.5 LTS (x86_64)\nGCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0\nClang version: Could not collect\nCMake version: version 3.22.1\nLibc version: glibc-2.35\n\nPython version: 3.10.12 (main, Feb  4 2025, 14:57:36) [GCC 11.4.0] (64-bit runtime)\nPython platform: Linux-6.8.0-71-generic-x86_64-with-glibc2.35\nIs CUDA available: False\nCUDA runtime version: No CUDA\nCUDA_MODULE_LOADING set to: N/A\nGPU models and configuration: No CUDA\nNvidia driver version: No CUDA\ncuDNN version: No CUDA\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture:                         x86_64\nCPU op-mode(s):                       32-bit, 64-bit\nAddress sizes:                        46 bits physical, 57 bits virtual\nByte Order:                           Little Endian\nCPU(s):                               20\nOn-line CPU(s) list:                  0-19\nVendor ID:                            GenuineIntel\nBIOS Vendor ID:                       QEMU\nModel name:                           Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz\nBIOS Model name:                      pc-q35-4.2\nCPU family:                           6\nModel:                                106\nThread(s) per core:                   2\nCore(s) per socket:                   5\nSocket(s):                            2\nStepping:                             6\nBogoMIPS:                             4589.21\nFlags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq dtes64 vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor 
lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear flush_l1d arch_capabilities\nVirtualization:                       VT-x\nHypervisor vendor:                    KVM\nVirtualization type:                  full\nL1d cache:                            480 KiB (10 instances)\nL1i cache:                            320 KiB (10 instances)\nL2 cache:                             12.5 MiB (10 instances)\nL3 cache:                             120 MiB (2 instances)\nNUMA node(s):                         1\nNUMA node0 CPU(s):                    0-19\nVulnerability Gather data sampling:   Not affected\nVulnerability Itlb multihit:          Not affected\nVulnerability L1tf:                   Not affected\nVulnerability Mds:                    Not affected\nVulnerability Meltdown:               Not affected\nVulnerability Mmio stale data:        Mitigation; Clear CPU buffers; SMT Host state unknown\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed:               Not affected\nVulnerability Spec rstack overflow:   Not affected\nVulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl\nVulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop\nVulnerability Srbds:                  Not affected\nVulnerability Tsx async abort:        Mitigation; TSX disabled\n\nVersions of relevant libraries:\n[pip3] 
habana-torch-dataloader==1.21.0.555\n[pip3] habana-torch-plugin==1.21.0.555\n[pip3] numpy==1.26.4\n[pip3] pytorch-lightning==2.5.1.post0\n[pip3] torch==2.6.0+hpu.1.21.0.555.gitabf798b\n[pip3] torch_tb_profiler==0.4.0\n[pip3] torchaudio==2.6.0+cpu\n[pip3] torchdata==0.10.1+cpu\n[pip3] torchmetrics==1.7.1\n[pip3] torchsde==0.2.6\n[pip3] torchtext==0.18.0+cpu\n[pip3] torchvision==0.21.0+cpu\n[conda] Could not collect",
  "transformers_version": "4.51.3",
  "upper_git_hash": null,
  "tokenizer_pad_token": [
    "<pad>",
    "0"
  ],
  "tokenizer_eos_token": [
    "<eos>",
    "1"
  ],
  "tokenizer_bos_token": [
    "<bos>",
    "2"
  ],
  "eot_token_id": 1,
  "max_length": 985,
  "args": {
    "buckets": [
      16,
      32,
      64,
      128,
      189,
      284,
      384,
      985
    ],
    "output_file": "tmp.json",
    "tasks": [
      "piqa"
    ],
    ...
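The headline numbers in tmp.json can be pulled out and sanity-checked with a few lines of Python. The sketch relies only on the results and n-samples keys shown above; load_report and summarize_task are hypothetical helpers, and the standard-error formula is an assumption that reproduces the reported values for this run.

```python
import json
import math
from typing import Any, Dict


def load_report(path: str = "tmp.json") -> Dict[str, Any]:
    """Load the results file written by run_lm_eval.py via -o."""
    with open(path) as fh:
        return json.load(fh)


def summarize_task(report: Dict[str, Any], task: str = "piqa") -> Dict[str, float]:
    """Recover correct-answer counts and recompute standard errors for one task."""
    results = report["results"][task]
    n = report["n-samples"][task]["effective"]
    summary: Dict[str, float] = {"n": n}
    for metric in ("acc", "acc_norm"):
        p = results[f"{metric},none"]
        summary[metric] = p
        # Accuracy is a mean of 0/1 scores, so p * n recovers the count of correct items.
        summary[f"{metric}_correct"] = round(p * n)
        # Assumed formula: sample std of 0/1 scores (n - 1 denominator) divided by sqrt(n).
        summary[f"{metric}_stderr_recomputed"] = math.sqrt(p * (1.0 - p) / (n - 1))
        summary[f"{metric}_stderr_reported"] = results[f"{metric}_stderr,none"]
    return summary
```

For this run, summarize_task(load_report()) gives 1444 of 1838 correct for acc and 1458 of 1838 for acc_norm, and the recomputed standard errors match the reported ones.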

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

yafshar

@imangohari1 please address the comment. Other than that, everything looks good to me. Thanks for the cleanup :)

LGTM!

@regisss please do the final check on this PR.

imangohari1 · 2025-08-19T17:08:14Z

Done.

regisss

LGTM!

HuggingFaceDocBuilderDev · 2025-08-21T08:34:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

fea(gemma2): cleaned up old codes

17653f6

imangohari1 marked this pull request as ready for review August 19, 2025 16:34

imangohari1 requested a review from regisss as a code owner August 19, 2025 16:34

imangohari1 mentioned this pull request Aug 19, 2025

Added the SWA to Gemma2. #2210

Merged

3 tasks

yafshar reviewed Aug 19, 2025

Comment thread optimum/habana/transformers/models/gemma2/modeling_gemma2.py Outdated

yafshar approved these changes Aug 19, 2025

fea(pr): pr review

d153ac7

regisss approved these changes Aug 21, 2025

regisss merged commit 5877b21 into huggingface:main Aug 21, 2025
2 of 4 checks passed

astachowiczhabana pushed a commit that referenced this pull request Aug 29, 2025

fea(gemma2): cleaned up old codes (#2212)

ec66d96

astachowiczhabana pushed a commit that referenced this pull request Sep 17, 2025

fea(gemma2): cleaned up old codes (#2212)

70f13ee

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025

fea(gemma2): cleaned up old codes (huggingface#2212) (huggingface#608)

3abe391

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

regisss merged 2 commits into huggingface:main from imangohari1:ig/gemma2-cleanups

4 participants
