[SW-191023][PyTorch][Optimum-Habana-fork]: Add flag to run inference with partial dataset #281
ghost left a comment:
Instead of creating a new argument, --run_partial_dataset, can you use the existing parameters by combining args.dataset_name and args.n_iterations?
Regarding "can you use existing parameters by combining args.dataset_name & args.n_iterations": using dataset_name is a bit fragile. This change is meant to make Alpaca run fewer prompts, but tomorrow there might be some other dataset. Is it feasible, or clean, to maintain a list of datasets?
I agree that using args.dataset_name doesn't look like a clean approach, and --n_iterations alone can't be used for partial execution. @ssarkar2, why would we need to maintain a list of datasets, and how does that help with the current problem? Also, why does adding a new argument, --run_partial_dataset, seem problematic (maintainability?). In any case, it works with any dataset, doesn't it?
My point was to check whether args.dataset_name is not None and then combine that condition with args.n_iterations. I didn't intend to check for the Alpaca dataset; we should not check dataset names against any fixed list.
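A minimal sketch of the reviewer's suggestion, assuming args is the parsed argparse namespace and that both attributes exist on it; the helper name should_run_partial is hypothetical, not part of the script. It only checks that a dataset and an iteration count are present and never compares the dataset name against a list.

```python
from types import SimpleNamespace


def should_run_partial(args) -> bool:
    """Hypothetical helper: derive partial execution from existing arguments
    instead of a dedicated --run_partial_dataset flag."""
    # Only presence is checked; the dataset name is never matched
    # against a fixed list of known datasets.
    return args.dataset_name is not None and args.n_iterations is not None


# With a dataset and the default iteration count, the run is partial.
print(should_run_partial(SimpleNamespace(dataset_name="alpaca", n_iterations=5)))  # True
# Without a dataset, the condition does not apply.
print(should_run_partial(SimpleNamespace(dataset_name=None, n_iterations=5)))  # False
```

Note that if --n_iterations always has a non-None default (the PR text gives 5), this condition is true for every dataset run, so a full-dataset run would need some other signal. That may be why --n_iterations alone is considered insufficient for selecting partial execution.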
…with partial dataset (HabanaAI#281)
…with partial dataset (#281) Change-Id: Ia60d09b595b5a320157025410dc157a4c8c862e3
What does this PR do?
When running inference on a dataset (e.g., Alpaca, which has 52,000 prompts), executing the full dataset takes a long time, and for many experiments running inference on the full dataset is not desirable. This PR adds a command-line argument to run inference on a partial dataset.
The flag --run_partial_dataset runs inference on the dataset for the number of iterations specified by --n_iterations (default: 5).
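A minimal sketch of how such a flag could gate the inference loop. Only --run_partial_dataset, --n_iterations (default 5) and --dataset_name come from the PR. The --batch_size argument, the iterate_batches helper, and the assumption that one iteration processes one batch are all hypothetical, not the actual script's implementation.

```python
import argparse
from itertools import islice


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the script's CLI; only the flag names from the PR are real.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_name", type=str, default=None)
    parser.add_argument("--batch_size", type=int, default=1)  # assumed argument
    parser.add_argument("--n_iterations", type=int, default=5)
    parser.add_argument("--run_partial_dataset", action="store_true")
    return parser


def iterate_batches(prompts, batch_size, n_iterations, run_partial_dataset):
    """Yield batches of prompts; when run_partial_dataset is set, stop after
    n_iterations batches (assumes one iteration == one batch)."""
    batches = (prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size))
    if run_partial_dataset:
        # Lazily truncate the batch stream instead of materializing the full dataset.
        batches = islice(batches, n_iterations)
    yield from batches


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--dataset_name", "alpaca", "--batch_size", "4", "--run_partial_dataset"]
    )
    prompts = [f"prompt {i}" for i in range(52000)]  # stand-in for the Alpaca prompts
    batches = list(iterate_batches(prompts, args.batch_size, args.n_iterations, args.run_partial_dataset))
    print(len(batches), sum(len(b) for b in batches))  # 5 20
```

Using an explicit store_true flag keeps the default behavior (full dataset) unchanged, and truncation works the same way for any dataset without inspecting its name.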
Before submitting