
Mark scale as const and remove --fp8 flag usage#156

Merged
MrGeva merged 3 commits into habana-main from dev/ytoms111 on Apr 16, 2024.


Conversation

@Yantom1 commented Apr 10, 2024

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@Yantom1 changed the title from "Dev/ytoms111" to "Mark scale as const and remove --fp8 flag usage" on Apr 10, 2024
@Yantom1 requested review from dudilester and ulivne on April 10, 2024 13:30
set_seed(args.seed)
get_repo_root(args.model_name_or_path, local_rank=args.local_rank, token=args.token)
use_deepspeed = args.world_size > 0
if use_deepspeed or args.bf16 or args.fp8:

This changes behavior compared to what we had before. Was it tested on 7B? (That seems like the only case where this condition can be false.)


We always set --bf16, even in fp8 runs, so it should work.
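The reply can be checked by enumerating the flag combinations. This is a minimal sketch, not code from the PR: the "before" condition is copied from the snippet above, the "after" condition assumes the change simply drops `args.fp8` (as the PR title suggests), and `types.SimpleNamespace` is a hypothetical stand-in for the parsed `args`. What the branch actually does is not shown in the thread.

```python
from itertools import product
from types import SimpleNamespace


def branch_taken_before(args) -> bool:
    """Condition as it appears in the snippet, before this PR."""
    use_deepspeed = args.world_size > 0
    return use_deepspeed or args.bf16 or args.fp8


def branch_taken_after(args) -> bool:
    """Assumed condition once --fp8 is no longer consulted."""
    use_deepspeed = args.world_size > 0
    return use_deepspeed or args.bf16


def differing_cases():
    """Return every (world_size, bf16, fp8) combination where the two conditions disagree."""
    diffs = []
    # world_size 0 means no DeepSpeed per the snippet; 8 is an illustrative multi-card value.
    for world_size, bf16, fp8 in product([0, 8], [False, True], [False, True]):
        args = SimpleNamespace(world_size=world_size, bf16=bf16, fp8=fp8)
        if branch_taken_before(args) != branch_taken_after(args):
            diffs.append((world_size, bf16, fp8))
    return diffs


print(differing_cases())  # [(0, False, True)]
```

The only disagreement is a non-DeepSpeed run with --fp8 but without --bf16, which is exactly the case ruled out by always passing --bf16.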

import habana_frameworks.torch.core as htcore

-if args.fp8:
+if args.quant_config:

This is also true when performing a measurement. Was that tested?

@Yantom1 (author): Yes.

if args.const_serialization_path:
    setup_const_serialization(args.const_serialization_path)
-if args.fp8:
+if args.quant_config:

Same here: this is also true in measurement.

@Yantom1 (author): Yes.

Comment thread on examples/text-generation/utils.py (outdated)
const_marking = os.getenv("ENABLE_CONST_MARKING", "True")
if const_marking == "True":
-    htcore.hpu_initialize(model)
+    htcore.hpu_initialize(model, mark_only_scales_as_const=True)

If we mark only the scales as const, we may (in theory) lose some constant-folding optimizations. For example, we may lose the transpose on constant weights; I think this happens in SDXL.

Anyway, you tested it for performance, right?


@ulivne It was tested on Llama 70B and 7B. The transpose on the weights now happens in the HQT patched module's init, so it should be fine.
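The reply's argument is that if the weight transpose is done once when the module is built, the model no longer relies on the graph compiler constant-folding that transpose. A minimal pure-Python sketch of that pattern follows; `PatchedLinear`, `transpose`, and `matmul` are hypothetical names, and this is an assumption about the general idea, not HQT's actual patched module.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class PatchedLinear:
    """Hypothetical stand-in for a patched linear module.

    The constant weight (out_features x in_features) is transposed once here,
    at init time, so forward() never transposes it and nothing depends on the
    graph compiler folding that transpose away.
    """

    def __init__(self, weight: Matrix):
        self.weight_t = transpose(weight)  # in_features x out_features

    def forward(self, x: Matrix) -> Matrix:
        # x is batch x in_features; the result is batch x out_features
        return matmul(x, self.weight_t)


layer = PatchedLinear([[1, 2], [3, 4], [5, 6]])  # 3 outputs, 2 inputs
print(layer.forward([[1, 1]]))  # [[3, 7, 11]]
```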

@Yantom1 requested review from HolyFalafel and ulivne on April 11, 2024 15:48
Comment thread on examples/text-generation/utils.py (outdated)
@@ -102,7 +102,7 @@ def setup_inference(args, model):
print("Initializing inference mode")
const_marking = os.getenv("ENABLE_CONST_MARKING", "True")

@Yantom1 You should remove it. We should not use this variable anymore; we should always call hpu_initialize in this function. Since QA sets this flag to false, did you test it correctly?
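A minimal sketch of the change the reviewer is asking for, contrasting the version under review with one that ignores the environment variable. It assumes `hpu_initialize` is passed in as a parameter so the example runs without `habana_frameworks` installed; `fake_hpu_initialize` is a hypothetical recorder standing in for `htcore.hpu_initialize`, and both `setup_inference_*` names are illustrative, not from the PR.

```python
import os
from types import SimpleNamespace


def setup_inference_under_review(args, model, hpu_initialize):
    """Version under review: ENABLE_CONST_MARKING != "True" skips initialization."""
    print("Initializing inference mode")
    const_marking = os.getenv("ENABLE_CONST_MARKING", "True")
    if const_marking == "True":
        hpu_initialize(model, mark_only_scales_as_const=True)
    return model


def setup_inference_requested(args, model, hpu_initialize):
    """Requested version: no environment variable, always initialize."""
    print("Initializing inference mode")
    hpu_initialize(model, mark_only_scales_as_const=True)
    return model


# Hypothetical stand-in for htcore.hpu_initialize that only records its calls.
calls = []


def fake_hpu_initialize(model, mark_only_scales_as_const=False):
    calls.append((model, mark_only_scales_as_const))


os.environ["ENABLE_CONST_MARKING"] = "False"  # what QA sets, per the comment
args = SimpleNamespace()
setup_inference_under_review(args, "model", fake_hpu_initialize)
print(len(calls))  # 0: initialization silently skipped
setup_inference_requested(args, "model", fake_hpu_initialize)
print(len(calls))  # 1: always initialized
```

This is why a QA run with the flag set to false would not have exercised the new `mark_only_scales_as_const=True` path at all.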

Change-Id: I6dba8691d842fc62d09da5202ea1e61a111f5f18
@MrGeva dismissed HolyFalafel's stale review on April 16, 2024 17:37

It was tested.

@MrGeva merged commit 8bfd6ef into habana-main on Apr 16, 2024
@MrGeva deleted the dev/ytoms111 branch on April 16, 2024 17:38
astachowiczhabana pushed a commit that referenced this pull request Apr 19, 2024
* Mark only scales as const

* remove --fp8 flag usage from llama

* removed usage of ENABLE_CONST_MARKING

Change-Id: I6dba8691d842fc62d09da5202ea1e61a111f5f18

---------

Co-authored-by: Eran Geva <egeva@habana.ai>
astachowiczhabana pushed a commit that referenced this pull request Apr 22, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
Yantom1 added a commit that referenced this pull request May 8, 2024
@astachowiczhabana commented: huggingface#962

astachowiczhabana pushed a commit that referenced this pull request Mar 11, 2025
* Add memory, graph stats

* fix import formatting issues

* sort imports

* sort imports
astachowiczhabana pushed a commit that referenced this pull request Mar 31, 2025
5 participants