[SW-191023][PyTorch][Optimum-Habana-fork]: Add flag to run inference with partial dataset #281

Merged: pramodkumar-habanalabs merged 1 commit into HabanaAI:habana-main from pramodkumar-habanalabs:run-partial-dataset on Jul 11, 2024
Conversation

@pramodkumar-habanalabs
What does this PR do?

When running inference on a dataset (e.g., Alpaca, which has 52,000 prompts), processing the full dataset takes a long time. For many experiments, running inference on the full dataset is not desirable. This PR adds a command-line argument to run inference on only part of the dataset.

The flag --run_partial_dataset runs inference on the dataset for the number of iterations given by --n_iterations (default: 5).
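
The behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names `build_parser` and `select_batches` are hypothetical, the dataset is replaced by a plain `range`, and it assumes that one iteration consumes one item from the data source.

```python
import argparse
from itertools import islice


def build_parser():
    # Hypothetical parser; only the flag names and the default of 5 come from the PR text.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_name", default=None, type=str)
    parser.add_argument("--n_iterations", default=5, type=int)
    parser.add_argument("--run_partial_dataset", action="store_true")
    return parser


def select_batches(batches, args):
    # Assumption: when the flag is set, stop after n_iterations items;
    # otherwise walk the whole dataset.
    if args.run_partial_dataset:
        return islice(batches, args.n_iterations)
    return iter(batches)


all_batches = range(52000)  # stand-in for a dataset with 52,000 prompts

partial_args = build_parser().parse_args(["--dataset_name", "alpaca", "--run_partial_dataset"])
partial_run = list(select_batches(all_batches, partial_args))

full_args = build_parser().parse_args(["--dataset_name", "alpaca"])
full_run = list(select_batches(all_batches, full_args))
```

Keeping the switch as a separate boolean means the full-dataset path stays the default and does not depend on which dataset is loaded.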

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@ghost ghost left a comment

Instead of creating a new argument --run_partial_dataset, can you use existing parameters by combining args.dataset_name & args.n_iterations?

@ssarkar2
ssarkar2 commented Jul 8, 2024

Regarding "can you use existing parameters by combining args.dataset_name & args.n_iterations":

Using dataset_name is a bit fragile. This change is meant to make Alpaca run fewer prompts, but tomorrow there might be some other dataset. Is it feasible/clean to maintain a list of datasets?

@pramodkumar-habanalabs
Author

I agree that using 'args.dataset_name' doesn't look like a clean approach, and '--n_iterations' alone can't be used for partial execution. @ssarkar2, why do we need to maintain a list of datasets, and how does it help with the current problem?

Also, why does a new arg '--run_partial_dataset' seem problematic (maintainability?). BTW, it works well with any dataset, doesn't it?

@ghost
ghost commented Jul 9, 2024

My point was to check whether args.dataset_name is not None and then combine that condition with args.n_iterations.

I didn't intend to check for the Alpaca dataset specifically. We should not check dataset names against any fixed list.
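
One possible reading of this suggestion, sketched under assumptions: the helper `limit_prompts` and the placeholder dataset name are hypothetical, and the thread does not show how the script actually consumes `args.n_iterations`.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def limit_prompts(
    prompts: Iterable[str], dataset_name: Optional[str], n_iterations: int
) -> Iterator[str]:
    # Assumption: whenever a dataset is supplied, cap the run at n_iterations prompts.
    # No dataset name is compared against a fixed list.
    if dataset_name is not None:
        return islice(prompts, n_iterations)
    return iter(prompts)


prompts = [f"prompt {i}" for i in range(20)]  # stand-in data

with_dataset = list(limit_prompts(prompts, "some_dataset", 5))
without_dataset = list(limit_prompts(prompts, None, 5))
```

Under this reading, every run that passes a dataset name is capped at n_iterations, so there is no separate way to request the full dataset. That may be related to the author's remark that --n_iterations alone can't be used for partial execution.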

@ghost ghost self-requested a review July 9, 2024 12:10
@pramodkumar-habanalabs pramodkumar-habanalabs merged commit 25b24f8 into HabanaAI:habana-main Jul 11, 2024
@pramodkumar-habanalabs pramodkumar-habanalabs deleted the run-partial-dataset branch July 11, 2024 06:14
kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 15, 2024
astachowiczhabana pushed a commit that referenced this pull request Aug 6, 2024
…with partial dataset (#281)

Change-Id: Ia60d09b595b5a320157025410dc157a4c8c862e3
astachowiczhabana pushed a commit that referenced this pull request Oct 15, 2024
…with partial dataset (#281)

Change-Id: Ia60d09b595b5a320157025410dc157a4c8c862e3
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
…with partial dataset (#281)

Change-Id: Ia60d09b595b5a320157025410dc157a4c8c862e3