@@ -332,7 +332,7 @@ def test_sglang_moe_parity_strict(self):

ref = REFERENCE_STATS[i]
# Epsilon to allow room for different, but correct, implementations
- eps = 1e-4
+ eps = 2e-4
Contributor review comment (medium):

This epsilon value is hardcoded. To improve readability and make it easier to adjust in the future, consider defining it as a constant at the module level, e.g., LOGPROB_DIFF_EPSILON = 2e-4.
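
A minimal sketch of this suggestion, assuming the tolerance is only needed inside this test module. `LOGPROB_DIFF_EPSILON` is the name proposed in the comment; the `within_tolerance` helper and the sample values are hypothetical and do not come from the PR.

```python
# Module-level tolerance, named as proposed in the review comment.
# Epsilon to allow room for different, but correct, implementations.
LOGPROB_DIFF_EPSILON = 2e-4


def within_tolerance(value: float, reference: float,
                     eps: float = LOGPROB_DIFF_EPSILON) -> bool:
    """Hypothetical helper: True when value is within eps of reference."""
    return abs(value - reference) <= eps


# Illustrative values only; they are not taken from REFERENCE_STATS.
close_enough = within_tolerance(-1.23450, -1.23465)  # difference of about 1.5e-4
too_far = within_tolerance(-1.2345, -1.2350)         # difference of about 5e-4
print(close_enough, too_far)
```

With the constant at module level, a later change such as the `1e-4` to `2e-4` bump in this hunk would touch a single named line instead of a local variable inside the test body.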


# Assertions
self.assertEqual(v_text, s_text, f"String mismatch on prompt {i}")
12 changes: 6 additions & 6 deletions test/registered/quant/test_deepseek_v32_fp4_4gpu.py
@@ -58,8 +58,8 @@ def test_a_gsm8k(
args = SimpleNamespace(
num_shots=20,
data_path=None,
- num_questions=1319,
- parallel=1319,
+ num_questions=500,
+ parallel=500,
max_new_tokens=512,
host="http://127.0.0.1",
port=int(self.base_url.split(":")[-1]),
@@ -72,7 +72,7 @@ def test_a_gsm8k(
f"### test_gsm8k (deepseek-v3-fp4)\n" f'{metrics["accuracy"]=:.3f}\n'
)

- self.assertGreater(metrics["accuracy"], 0.935)
+ self.assertGreater(metrics["accuracy"], 0.93)

def test_bs_1_speed(self):
args = BenchArgs(port=int(self.base_url.split(":")[-1]), max_new_tokens=2048)
@@ -125,8 +125,8 @@ def test_a_gsm8k(
args = SimpleNamespace(
num_shots=20,
data_path=None,
- num_questions=1319,
- parallel=1319,
+ num_questions=500,
+ parallel=500,
Comment on lines +128 to +129
Contributor review comment (medium):

The test_a_gsm8k method is duplicated from TestDeepseekV32FP4DP. To improve maintainability, consider refactoring the common logic into a helper function or a mixin.

Within this duplicated method, several values are hardcoded:

  • num_questions and parallel here.
  • The accuracy threshold on line 142.

Defining these as module-level constants would avoid magic numbers and make future adjustments easier.
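
One possible shape for that refactor, as a rough sketch and not the PR's actual code. The constant names, `GSM8KAccuracyMixin`, its two methods, `ExampleTest`, and the port number are all hypothetical; only the numeric values and the visible `SimpleNamespace` fields are taken from the diff, and any arguments hidden outside the hunk are omitted.

```python
import unittest
from types import SimpleNamespace

# Values taken from the diff; the constant names are hypothetical.
GSM8K_NUM_QUESTIONS = 500
GSM8K_ACCURACY_THRESHOLD = 0.93


class GSM8KAccuracyMixin:
    """Hypothetical mixin holding logic shared by both test classes."""

    def build_gsm8k_args(self, port: int) -> SimpleNamespace:
        # Only the fields visible in the hunk are reproduced here.
        return SimpleNamespace(
            num_shots=20,
            data_path=None,
            num_questions=GSM8K_NUM_QUESTIONS,
            parallel=GSM8K_NUM_QUESTIONS,
            max_new_tokens=512,
            host="http://127.0.0.1",
            port=port,
        )

    def check_gsm8k_accuracy(self, metrics: dict) -> None:
        # Relies on unittest.TestCase being a base of the concrete class.
        self.assertGreater(metrics["accuracy"], GSM8K_ACCURACY_THRESHOLD)


class ExampleTest(GSM8KAccuracyMixin, unittest.TestCase):
    """Stand-in for the real test classes, which also launch a server."""

    def test_a_gsm8k(self):
        args = self.build_gsm8k_args(port=30000)  # port chosen arbitrarily
        self.assertEqual(args.num_questions, args.parallel)
        # Stand-in metrics; a real test would get these from the benchmark run.
        self.check_gsm8k_accuracy({"accuracy": 0.95})
```

Listing the mixin before `unittest.TestCase` in the base-class list keeps its methods first in the method resolution order while still giving them access to `assertGreater`.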

max_new_tokens=512,
host="http://127.0.0.1",
port=int(self.base_url.split(":")[-1]),
@@ -139,7 +139,7 @@ def test_a_gsm8k(
f"### test_gsm8k (deepseek-v3-fp4)\n" f'{metrics["accuracy"]=:.3f}\n'
)

- self.assertGreater(metrics["accuracy"], 0.935)
+ self.assertGreater(metrics["accuracy"], 0.93)

def test_bs_1_speed(self):
args = BenchArgs(port=int(self.base_url.split(":")[-1]), max_new_tokens=2048)
4 changes: 2 additions & 2 deletions test/registered/quant/test_deepseek_v32_fp4_mtp_4gpu.py
@@ -95,7 +95,7 @@ def test_a_gsm8k(
f'{metrics["accuracy"]=:.3f}\n'
f"{avg_spec_accept_length=:.2f}\n"
)
- self.assertGreater(metrics["accuracy"], 0.94)
+ self.assertGreater(metrics["accuracy"], 0.93)
self.assertGreater(avg_spec_accept_length, 2.7)

def test_bs_1_speed(self):
@@ -185,7 +185,7 @@ def test_a_gsm8k(
f'{metrics["accuracy"]=:.3f}\n'
f"{avg_spec_accept_length=:.2f}\n"
)
- self.assertGreater(metrics["accuracy"], 0.94)
+ self.assertGreater(metrics["accuracy"], 0.93)
Contributor review comment (medium):

The test_a_gsm8k method is duplicated from TestDeepseekV32FP4DPSpecV2. To improve maintainability, this duplicated logic should be refactored into a shared helper or mixin.

Also, this hardcoded accuracy threshold (and the one on line 98) should be defined as a module-level constant.
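
For this file, the shared-helper option could look roughly like the following. It is a sketch under assumptions: the function name `assert_gsm8k_spec_results` and both constant names are hypothetical, and turning the `2.7` accept-length bound into a constant goes one step beyond what the comment asks for. The numeric values are the ones visible in the diff.

```python
import unittest

# Values taken from the diff; the names are hypothetical.
GSM8K_ACCURACY_THRESHOLD = 0.93
MIN_AVG_SPEC_ACCEPT_LENGTH = 2.7


def assert_gsm8k_spec_results(test_case: unittest.TestCase,
                              metrics: dict,
                              avg_spec_accept_length: float) -> None:
    """Hypothetical helper: the two assertions repeated in both classes."""
    test_case.assertGreater(metrics["accuracy"], GSM8K_ACCURACY_THRESHOLD)
    test_case.assertGreater(avg_spec_accept_length, MIN_AVG_SPEC_ACCEPT_LENGTH)
```

Each `test_a_gsm8k` would then end with a single call such as `assert_gsm8k_spec_results(self, metrics, avg_spec_accept_length)`, so a threshold change like `0.94` to `0.93` is made once.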

self.assertGreater(avg_spec_accept_length, 2.7)

def test_bs_1_speed(self):