Adding memory and graph stats (#156) #1858

Merged: regisss merged 3 commits into huggingface:main from HabanaAI:auto-pr-1a05f01 on Apr 17, 2025

@jaygala223

  • Add memory, graph stats

  • fix import formatting issues

  • sort imports

  • sort imports

What does this PR do?

Prints statistics such as graph compilation duration, number of HPU graphs, and memory usage at the end of the run.

Contributor

@vidyasiv vidyasiv left a comment

Thanks for your PR @jaygala223 !
Could you run inference on 1 HPU and on 8 HPUs, following https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text#inference-with-mixed-precision-bf16, with your changes and paste the results, please?

@jaygala223
Author

jaygala223 commented Mar 21, 2025

Hi @vidyasiv, thanks for reviewing my PR. I will attach the screenshots and address your comments.

@jaygala223
Author

Here is a screenshot of what it looks like

image

@vidyasiv
Contributor

1 HPU README testing

python3 examples/image-to-text/run_pipeline.py \
    --model_name_or_path meta-llama/Llama-3.2-11B-Vision-Instruct \
    --use_hpu_graphs \
    --bf16 \
    --sdp_on_bf16

Output

============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_HPU_RECIPE_CACHE_CONFIG = ,false,1024
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
 PT_HPU_EAGER_PIPELINE_ENABLE = 1
 PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
 PT_HPU_ENABLE_LAZY_COLLECTIVES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM       : 1056398140 KB
------------------------------------------------------------------------------
The model 'GaudiMllamaForConditionalGeneration' is not supported for image-to-text. Supported models are ['BlipForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'GitForCausalLM', 'Idefics2ForConditionalGeneration', 'InstructBlipForConditionalGeneration', 'InstructBlipVideoForConditionalGeneration', 'Kosmos2ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MllamaForConditionalGeneration', 'PaliGemmaForConditionalGeneration', 'Pix2StructForConditionalGeneration', 'Qwen2VLForConditionalGeneration', 'VideoLlavaForConditionalGeneration', 'VipLlavaForConditionalGeneration', 'VisionEncoderDecoderModel'].
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:601: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:606: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
03/28/2025 17:09:19 - INFO - __main__ - result = [[{'generated_text': 'user\n\nWhat is shown in this image?assistant\n\nThe image depicts a serene lake scene, featuring a long wooden dock extending into the water, surrounded by lush trees and mountains in the background. The dock is made of weathered wooden planks and stretches out into the calm, reflective water, creating a sense of depth and tranquility. The surrounding landscape is characterized by dense green trees and rolling hills, with a majestic mountain range visible in the distance. The sky above is overcast, adding to the peaceful ambiance of the scene. Overall, the image'}]]
03/28/2025 17:09:19 - INFO - __main__ - time = 2097.766735998448ms, Throughput (including tokenization) = 44.33286046731785 tokens/second

Stats:
--------------------------------------------------------------------------------------------------------------

Throughput (including tokenization) = 44.33286046731785 tokens/second
Number of HPU graphs                = 0
Memory allocated                    = 23.42 GB
Max memory allocated                = 23.45 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 10.795492552977521 seconds
--------------------------------------------------------------------------------------------------------------

8 HPU README testing

PT_HPU_ENABLE_LAZY_COLLECTIVES=true python examples/gaudi_spawn.py --use_deepspeed --world_size 8 examples/image-to-text/run_pipeline.py \
    --model_name_or_path meta-llama/Llama-3.2-90B-Vision-Instruct \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16 \
    --use_flash_attention \
    --flash_attention_recompute

Output

<snip>
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_HPU_RECIPE_CACHE_CONFIG = ,false,1024
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 0
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
 PT_HPU_EAGER_PIPELINE_ENABLE = 1
 PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
 PT_HPU_ENABLE_LAZY_COLLECTIVES = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM       : 1056398140 KB
------------------------------------------------------------------------------

<snip>
The model 'GaudiMllamaForConditionalGeneration' is not supported for image-to-text. Supported models are ['BlipForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'GitForCausalLM', 'Idefics2ForConditionalGeneration', 'InstructBlipForConditionalGeneration', 'InstructBlipVideoForConditionalGeneration', 'Kosmos2ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MllamaForConditionalGeneration', 'PaliGemmaForConditionalGeneration', 'Pix2StructForConditionalGeneration', 'Qwen2VLForConditionalGeneration', 'VideoLlavaForConditionalGeneration', 'VipLlavaForConditionalGeneration', 'VisionEncoderDecoderModel'].
The model 'GaudiMllamaForConditionalGeneration' is not supported for image-to-text. Supported models are ['BlipForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'GitForCausalLM', 'Idefics2ForConditionalGeneration', 'InstructBlipForConditionalGeneration', 'InstructBlipVideoForConditionalGeneration', 'Kosmos2ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MllamaForConditionalGeneration', 'PaliGemmaForConditionalGeneration', 'Pix2StructForConditionalGeneration', 'Qwen2VLForConditionalGeneration', 'VideoLlavaForConditionalGeneration', 'VipLlavaForConditionalGeneration', 'VisionEncoderDecoderModel'].
<snip>

/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:606: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
....
<snip>
03/28/2025 17:24:19 - INFO - __main__ - result = [[{'generated_text': 'user\n\nWhat is shown in this image?assistant\n\nThe image depicts a serene lake scene, with a wooden dock extending into the water. The dock is made of light-colored wood and features a railing on either side, although it appears to be missing in some areas. It stretches out from the foreground towards the background, where it meets a larger platform or dock.\n\nIn the background, there are trees lining the shore, and a mountain range can be seen in the distance. The sky above is overcast, with clouds covering most of the sun. The'}]]
03/28/2025 17:24:19 - INFO - __main__ - time = 4694.892791402526ms, Throughput (including tokenization) = 19.8087590349891 tokens/second

Stats:
-------------------------------------------------------------------------------------------------------------

Throughput (including tokenization) = 19.8087590349891 tokens/second
Number of HPU graphs                = 0
Memory allocated                    = 27.34 GB
Max memory allocated                = 28.22 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 20.29410206899047 seconds
-------------------------------------------------------------------------------------------------------------
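As a quick sanity check on the two logged runs, the reported time and throughput can be multiplied back together to see how many tokens each figure implies. This is a minimal sketch that assumes throughput was computed as tokens divided by elapsed time; `implied_tokens` is a hypothetical helper, not part of `run_pipeline.py`. Under that assumption both runs work out to roughly 93 tokens. The two runs use different models (11B and 90B), so their throughput values are not directly comparable.

```python
def implied_tokens(time_ms: float, tokens_per_second: float) -> float:
    """Back out the token count implied by a logged time and throughput.

    Assumes throughput = tokens / elapsed seconds (an assumption, not
    confirmed by the script source).
    """
    return tokens_per_second * time_ms / 1000.0


# (time in ms, throughput in tokens/second), copied from the logs above.
runs = {
    "1 HPU": (2097.766735998448, 44.33286046731785),
    "8 HPU": (4694.892791402526, 19.8087590349891),
}

for name, (time_ms, tps) in runs.items():
    print(f"{name}: ~{implied_tokens(time_ms, tps):.2f} tokens")
```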

Contributor

@vidyasiv vidyasiv left a comment

@regisss please review if output is acceptable.

stats = ""
stats = stats + f"\nThroughput (including tokenization) = {throughput} tokens/second"
stats = stats + f"\nNumber of HPU graphs = {count_hpu_graphs()}"
separator = "-" * len(stats)
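In the snippet as quoted, `len(stats)` measures the whole multi-line string, so the separator grows with every line appended to `stats`. One possible alternative, shown here only as a sketch, is to pad the labels to a common width and size the separator to the longest single line. `format_stats` and the `rows` structure are hypothetical names for illustration; the values are copied from the 1 HPU log above.

```python
def format_stats(rows: list[tuple[str, str]]) -> str:
    """Align labels in one column and frame the block with separators."""
    # Pad every label to the width of the longest one so the "=" signs line up.
    width = max(len(label) for label, _ in rows)
    lines = [f"{label:<{width}} = {value}" for label, value in rows]
    # Size the separator to the longest rendered line, not to the whole string.
    separator = "-" * max(len(line) for line in lines)
    return "\n".join(["Stats:", separator, *lines, separator])


rows = [
    ("Throughput (including tokenization)", "44.33 tokens/second"),
    ("Number of HPU graphs", "0"),
    ("Memory allocated", "23.42 GB"),
]
print(format_stats(rows))
```

With this layout, adding another row changes the separator length only if the new line is the longest one.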
Collaborator

Author

@jaygala223 jaygala223 Apr 16, 2025

Hi @libinta, thanks for the review. I have not used this before. For this PR, I took the following as a reference:

stats = f"Throughput (including tokenization) = {throughput} tokens/second"

image

@regisss
Collaborator

regisss commented Apr 17, 2025

@jaygala223 Can you merge the main branch into yours to make sure it's up to date, please? The doc build workflow failed because of this.

@jaygala223
Author

Sure

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@regisss regisss left a comment

LGTM

@regisss regisss merged commit 2e30261 into huggingface:main Apr 17, 2025