Summary:
Made it so quantization can run on CPU rather than requiring CUDA. Also added options to change batch_size and max_length, and a -q flag to select the quantization technique.
Test Plan:
python hf_eval.py --limit 8 -q int8wo --batch_size 8 --max_length 20 --compile
python hf_eval.py --limit 8 -q int8wo --batch_size 8 --max_length 200 --compile
Reviewers:
Subscribers:
Tasks:
Tags:
parser.add_argument('--compile', action='store_true', help='Whether to compile the model.')
+parser.add_argument('--batch_size', type=int, default=1, help='Batch size to use for evaluation; note: int8wo and int4wo work best with small batch sizes, while int8dq works better with large batch sizes')
+parser.add_argument('--max_length', type=int, default=None, help='Length of text to process at one time')
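Below is a minimal sketch of how the new -q flag could be wired up to torchao's quantize_ API so that quantization runs on CPU as well as CUDA. quantize_, int8_weight_only, and int8_dynamic_activation_int8_weight are real torchao entry points, but the apply_quantization helper, the dispatch dict, and the model choice are illustrative assumptions, not the script's actual internals.

import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import (
    quantize_,
    int8_weight_only,
    int8_dynamic_activation_int8_weight,
)

def apply_quantization(model, technique: str):
    # Hypothetical helper: map the -q value to a torchao quantization config.
    configs = {
        "int8wo": int8_weight_only,                     # weight-only int8
        "int8dq": int8_dynamic_activation_int8_weight,  # dynamically quantized int8
    }
    if technique not in configs:
        raise ValueError(f"unknown quantization technique: {technique}")
    # quantize_ mutates the model in place and runs on whatever device the
    # model currently lives on, so this works on CPU without a CUDA GPU.
    quantize_(model, configs[technique]())
    return model

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)  # model name is an assumption
model = apply_quantization(model, "int8wo")

If --compile is also passed, the quantized model would then be wrapped with torch.compile(model) before evaluation, matching the commands in the Test Plan.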