add sample_idx in InputRequest for debugging #32
The branch was force-pushed from f8ad178 to d283ee8, then from d283ee8 to 741d9e7.
tokenized_dataset = tokenize_dataset(dataset, tokenizer)
sampled_dataset = []
Just curious: NumPy has numpy.take. Could that be used here?
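A minimal sketch of the reviewer's suggestion, assuming the tokenized dataset is a Python list and that samples are drawn without replacement using NumPy's random Generator. The helper name sample_with_indices is hypothetical and not code from this PR. numpy.take gathers the selected items, and the drawn indices are returned alongside them so each sample's position in the original file is preserved.

```python
import numpy as np


def sample_with_indices(dataset, num_samples, seed=0):
    """Hypothetical helper: draw num_samples distinct items and keep their original indices."""
    rng = np.random.default_rng(seed)
    # Distinct positions into the original dataset, in random order.
    indices = rng.choice(len(dataset), size=num_samples, replace=False)
    # Copy items into a 1-D object array so tuples are not split into an extra dimension.
    items = np.empty(len(dataset), dtype=object)
    for i, item in enumerate(dataset):
        items[i] = item
    # numpy.take gathers the items at the drawn positions in a single call.
    sampled = np.take(items, indices)
    return list(sampled), indices.tolist()


# Hypothetical (prompt, reference output) pairs standing in for the tokenized dataset.
dataset = [("prompt 0", "ref 0"), ("prompt 1", "ref 1"), ("prompt 2", "ref 2"), ("prompt 3", "ref 3")]
sampled, sample_indices = sample_with_indices(dataset, num_samples=2)
for idx, (prompt, ref) in zip(sample_indices, sampled):
    print(idx, prompt, ref)
```

Plain list indexing ([dataset[i] for i in indices]) does the same job; numpy.take mainly pays off when the data already lives in a NumPy array.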
@@ -98,6 +98,7 @@ class InputRequest:
   prompt_len: int = 0
   output: str = ""
   output_len: int = 0
+  sample_idx: int = -1
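To illustrate how the new field might be populated, here is a sketch assuming InputRequest is a dataclass. Only the four fields visible in the hunk are reproduced; the prompt field is an assumption added for illustration, the real class may have others, and build_requests is a hypothetical helper. The default of -1 presumably marks a request that has not been linked to a dataset sample.

```python
from dataclasses import dataclass


@dataclass
class InputRequest:
    # Hypothetical subset of the real class: the prompt field is assumed,
    # the remaining fields are the ones shown in the diff.
    prompt: str = ""
    prompt_len: int = 0
    output: str = ""
    output_len: int = 0
    sample_idx: int = -1  # -1: not linked to a dataset sample


def build_requests(sampled, sample_indices):
    """Hypothetical helper: one InputRequest per sample, tagged with its original index."""
    requests = []
    for idx, (prompt, prompt_len, output, output_len) in zip(sample_indices, sampled):
        requests.append(
            InputRequest(prompt, prompt_len, output, output_len, sample_idx=idx)
        )
    return requests


sampled = [("What is 2+2?", 6, "4", 1)]
requests = build_requests(sampled, sample_indices=[7])
print(requests[0].sample_idx)  # 7
```

Tagging the index at request-construction time means it travels with the request through the server, regardless of the order in which responses come back.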
The benchmark already keeps a correct one-to-one mapping between each request and its output. What is the purpose of sample_idx? Will it help you find the related request more easily?
+1. Is your goal to be able to print the input prompt alongside each output for debugging?
My goal is to be able to recover each sample's original position (index) in the dataset file.
Say the original dataset file contains 10k samples. After random sampling, the selected samples are sent to the server as input requests, and the results come back in the order in which decoding completes. Currently the request output file stores the prompt, the original (reference) result, and the generated result. If I want to check other metadata for a sample in the original dataset file, how can I locate the corresponding entry?
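The lookup described above can be sketched as follows, assuming each saved result carries sample_idx and that the original dataset file is a JSON list whose positions match the indices used during sampling. The file layout, the "id" and "source" keys, and attach_original_metadata are all hypothetical. Because results arrive in decode-completion order, it is the stored index, not the position in the output file, that links a result back to its source entry.

```python
import json


def attach_original_metadata(results, dataset):
    """Hypothetical helper: pair each result with its original dataset entry via sample_idx."""
    joined = []
    for result in results:
        idx = result["sample_idx"]
        if idx < 0:
            # -1 (the default) means the request was never tagged; nothing to look up.
            joined.append((result, None))
        else:
            joined.append((result, dataset[idx]))
    return joined


# Hypothetical dataset file contents (normally read from disk with json.load).
dataset = json.loads('[{"id": "a", "source": "x"}, {"id": "b", "source": "y"}, {"id": "c", "source": "z"}]')
# Results arrive out of order, e.g. in decode-completion order.
results = [{"sample_idx": 2, "generated": "..."}, {"sample_idx": 0, "generated": "..."}]
for result, original in attach_original_metadata(results, dataset):
    print(result["sample_idx"], original["id"], original["source"])
```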
Agreed, it's hard to locate the corresponding entry in the original dataset. Please also feel free to add any important metadata to the result.
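Following the suggestion to record important metadata in the result, one option is to copy selected fields from the original dataset entry into each output record when the output file is written. This is a sketch under assumptions: the output is JSON, and the record field names (prompt, original_output, generated_output, metadata), the metadata_keys parameter, and build_output_records are hypothetical, not the benchmark's actual schema.

```python
import json


def build_output_records(results, dataset, metadata_keys):
    """Hypothetical helper: one JSON-serializable record per result, with selected metadata."""
    records = []
    for result in results:
        idx = result["sample_idx"]
        entry = dataset[idx] if idx >= 0 else {}
        records.append({
            "sample_idx": idx,
            "prompt": result["prompt"],
            "original_output": result["original_output"],
            "generated_output": result["generated_output"],
            # Copy only the requested metadata keys that the entry actually has.
            "metadata": {k: entry[k] for k in metadata_keys if k in entry},
        })
    return records


dataset = [{"id": "q0", "category": "math"}, {"id": "q1", "category": "code"}]
results = [{"sample_idx": 1, "prompt": "p1", "original_output": "r1", "generated_output": "g1"}]
records = build_output_records(results, dataset, metadata_keys=["id", "category"])
print(json.dumps(records, indent=2))
```

Keeping sample_idx in each record alongside the copied fields makes the output self-describing for common checks while still allowing a full lookup in the dataset file later.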