Conversation
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
📝 Walkthrough

This PR introduces a comprehensive Lean 4 prover system, including new utilities for Lean4 text processing and error handling, a ProverTask class implementing multi-stage proof generation with refinement capabilities, a CLI command for distributed task execution, and associated prompt configurations.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI as Pipeline CLI
    participant Task as ProverTask
    participant LLM
    participant Sandbox as Sandbox Executor
    participant Logger as Logger
    User->>CLI: invoke generate command
    CLI->>CLI: resolve generation module<br/>(ProverTask)
    CLI->>CLI: compute seeds & chunks
    CLI->>Task: create ProverTask instance<br/>with ProverConfig
    loop for each datapoint
        Task->>Task: _single_data_point_generate()
        loop for each pass (pass_at_N)
            Task->>LLM: get initial generation<br/>(with adaptive reasoning)
            LLM-->>Task: raw proof generation
            Task->>Task: extract & parse<br/>Lean4 code
            alt parsing succeeds
                Task->>Sandbox: execute proof code
                Sandbox-->>Task: execution result
                alt proof succeeds
                    Task->>Logger: log success
                    Note over Task: Save result
                else execution error/timeout
                    Task->>Logger: log error
                    Task->>Task: parse error message
                    alt refinement enabled
                        Task->>LLM: generate refinement<br/>(with error context)
                        LLM-->>Task: refined proof
                        Task->>Task: replace proof in statement
                        Task->>Sandbox: re-execute refined proof
                        Sandbox-->>Task: execution result
                    end
                end
            else parsing fails
                Task->>Logger: log parse error
                alt refinement enabled
                    Task->>LLM: request correction<br/>(with parse error)
                    LLM-->>Task: corrected proof
                end
            end
        end
        Task->>Logger: finalize datapoint results
    end
    CLI->>CLI: run experiment via pipeline_utils
    CLI-->>User: return results/tasks
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–60 minutes
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🧹 Nitpick comments (6)
nemo_skills/inference/lean4_utils.py (2)
101-117: Consider returning `None` instead of the string `"None"`.

The function returns the literal string `"None"` when no code block is found, which can be confusing and error-prone for callers who might check for Python's `None`. Consider returning `None` instead:

```diff
- return "None"
+ return None
```

Callers would then use `if result is None:` instead of `if result == "None":`.

Also note: the hardcoded `import_head` may not suit all Lean4 use cases. Consider making it configurable if needed for different proof contexts.
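If it helps, the suggested `Optional`-returning variant could look like this. This is a minimal sketch, assuming extraction is a regex over `` ```lean4 `` fences; the real `extract_code` in `lean4_utils.py` may differ:

````python
import re
from typing import Optional


def extract_code(output: str) -> Optional[str]:
    """Return the last ```lean4 ...``` block in the output, or None when absent."""
    matches = re.findall(r"```lean4\n(.*?)```", output, re.DOTALL)
    if not matches:
        return None  # instead of the literal string "None"
    return matches[-1].strip()
````

Callers then test `if code is None:` rather than comparing against a string.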
127-130: Hardcoded path `/lean4/my_project/` may limit portability.

The error pattern assumes Lean4 errors come from a specific `/lean4/my_project/` path. This works in a containerized environment but may fail in other setups. Consider making the path prefix configurable or using a more flexible pattern if this utility needs to work outside the expected container environment.

nemo_skills/pipeline/prover.py (1)
253-262: Variable shadowing: `chunk_ids` reused in loop.

At line 254, `chunk_ids` from the outer scope (computed at lines 198-200) is shadowed by iterating over `remaining_jobs.items()`. This works but could cause confusion. Consider renaming:

```diff
-    for seed_idx, (seed, chunk_ids) in enumerate(remaining_jobs.items()):
+    for seed_idx, (seed, seed_chunk_ids) in enumerate(remaining_jobs.items()):
         if wandb_parameters:
             # no need for chunks as it will run after merging
             wandb_parameters["samples_file"] = pipeline_utils.get_chunked_rs_filename(
                 output_dir,
                 random_seed=seed,
                 chunk_id=None,
             )
-        for chunk_id in chunk_ids:
+        for chunk_id in seed_chunk_ids:
```

nemo_skills/inference/prover.py (3)
104-106: Add validation for tokenizer path.

If both `cfg.tokenizer` is `None` and `cfg.server.get("model")` returns `None`, `AutoTokenizer.from_pretrained(None)` will raise an unclear error. Consider adding explicit validation:

```diff
 # Initialize tokenizer for chat template application
 tokenizer_path = self.cfg.tokenizer or self.cfg.server.get("model")
+if tokenizer_path is None:
+    raise ValueError("tokenizer or server.model must be specified for chat template application")
 self.hf_tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
```
286-304: Dead code: unreachable `else` branch.

The `else` branch at lines 299-304 is unreachable because the preceding conditions (`code == "None"` and `"**Error**" in full_code`) are exhaustive for the outer `if` condition.

````diff
 if code == "None" or "**Error**" in full_code:
     if code == "None":
         execution_result = {
             "process_status": "failed",
             "stderr": "",
             "stdout": "Parsing error. Cannot parse the code from output. Please try again and write the code in the format of ```lean4\n<code>\n```",
         }
-    elif "**Error**" in full_code:
+    else:  # "**Error**" in full_code
         execution_result = {
             "process_status": "failed",
             "stderr": "",
             "stdout": full_code,
         }
-    else:
-        execution_result = {
-            "process_status": "failed",
-            "stderr": "",
-            "stdout": "Unknown error when parsing code.",
-        }
````
312-313: Consider making sandbox timeout configurable.

The timeout of `600.0` seconds is hardcoded. For long-running proofs or faster iteration during development, this could be a configuration parameter:

```diff
 @nested_dataclass(kw_only=True)
 class ProverConfig(GenerateSolutionsConfig):
     max_tokens: int = 40960  # model max tokens
     n_pass: int = 1  # number of passes to run the prover
+    sandbox_timeout: float = 600.0  # timeout for Lean4 code execution in seconds
```

Then use `self.cfg.sandbox_timeout` instead of the hardcoded value.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)

- nemo_skills/inference/factory.py (1 hunks)
- nemo_skills/inference/lean4_utils.py (1 hunks)
- nemo_skills/inference/prover.py (1 hunks)
- nemo_skills/pipeline/prover.py (1 hunks)
- nemo_skills/prompt/config/lean4/formal-proof-deepseek-prover-v2-nemotron.yaml (1 hunks)
- nemo_skills/prompt/config/lean4/goedel-prover-v2-nemotron.yaml (1 hunks)
- nemo_skills/prompt/config/lean4/goedel-prover-v2-refinement-nemotron.yaml (1 hunks)
- nemo_skills/prompt/config/lean4/goedel-prover-v2-refinement.yaml (1 hunks)
- nemo_skills/prompt/config/lean4/goedel-prover-v2.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
nemo_skills/inference/prover.py (5)

- nemo_skills/inference/model/base.py (1): `EndpointType` (38-41)
- nemo_skills/prompt/utils.py (1): `get_prompt` (410-443)
- nemo_skills/utils.py (4): `get_help_message` (341-361), `get_logger_name` (39-43), `nested_dataclass` (69-102), `parse_reasoning` (49-66)
- nemo_skills/pipeline/prover.py (1): `generate` (39-328)
- nemo_skills/inference/lean4_utils.py (5): `extract_code` (101-117), `get_error_str` (149-205), `parse_error` (120-146), `refine_by_sorry` (80-98), `replace_statement_in_proof` (64-77)

nemo_skills/inference/lean4_utils.py (2)

- nemo_skills/utils.py (1): `get_logger_name` (39-43)
- nemo_skills/inference/patch_litellm_logging.py (1): `start` (37-38)

nemo_skills/pipeline/prover.py (10)

- nemo_skills/dataset/utils.py (1): `import_from_path` (60-67)
- nemo_skills/inference/factory.py (1): `GenerationType` (18-22)
- nemo_skills/pipeline/app.py (1): `typer_unpacker` (25-53)
- nemo_skills/utils.py (2): `compute_chunk_ids` (456-484), `str_ids_to_list` (428-453)
- nemo_skills/pipeline/utils/server.py (2): `should_get_random_port` (62-63), `get_server_command` (114-227)
- nemo_skills/pipeline/utils/cluster.py (1): `get_cluster_config` (314-368)
- nemo_skills/pipeline/utils/mounts.py (1): `resolve_mount_paths` (317-359)
- nemo_skills/inference/generate.py (2): `get_generation_default_args` (243-251), `get_server_command_fn` (254-264)
- nemo_skills/pipeline/utils/generation.py (1): `get_remaining_jobs` (60-175)
- nemo_skills/pipeline/utils/exp.py (3): `get_exp` (753-769), `add_task` (416-702), `run_exp` (705-750)
🪛 Ruff (0.14.7)
nemo_skills/inference/prover.py
75-75: Avoid specifying long messages outside the exception class
(TRY003)
83-83: Avoid specifying long messages outside the exception class
(TRY003)
86-86: Avoid specifying long messages outside the exception class
(TRY003)
114-114: Unused method argument: data
(ARG002)
119-119: Avoid specifying long messages outside the exception class
(TRY003)
205-205: Unused method argument: data
(ARG002)
268-273: Consider iterable unpacking instead of concatenation
(RUF005)
384-384: Loop control variable i not used within loop body
(B007)
nemo_skills/pipeline/prover.py
58-58: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
69-69: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
110-112: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
160-162: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
191-191: Avoid specifying long messages outside the exception class
(TRY003)
221-221: Avoid specifying long messages outside the exception class
(TRY003)
230-233: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (16)
nemo_skills/inference/factory.py (1)
22-22: LGTM!

The new `prover` enum member and module mapping follow the established patterns for other generation types in this file.

Also applies to: 29-29
nemo_skills/prompt/config/lean4/goedel-prover-v2.yaml (1)
19-27: LGTM!

The prompt configuration is well-structured with a clear workflow: present the problem, request a proof plan, then generate the formal proof. The `{problem}` placeholder follows the expected templating convention.

nemo_skills/prompt/config/lean4/goedel-prover-v2-refinement.yaml (1)
19-24: Verify: Missing `{proof_attempt}` placeholder.

This refinement prompt includes `{error_message}` but not `{proof_attempt}`. Compare with `goedel-prover-v2-refinement-nemotron.yaml`, which includes both placeholders. If the proof attempt is meant to be provided via conversation history rather than template substitution, this is fine. Otherwise, users won't see the original proof that failed.

nemo_skills/prompt/config/lean4/goedel-prover-v2-nemotron.yaml (1)
19-28: LGTM!

The nemotron variant provides clear two-phase instructions (proof plan followed by code block) with explicit output formatting requirements. This aligns well with the `extract_code` function in `lean4_utils.py`, which expects `` ```lean4 `` markers.

nemo_skills/prompt/config/lean4/goedel-prover-v2-refinement-nemotron.yaml (1)
19-28: LGTM!

The refinement prompt correctly includes both the previous `{proof_attempt}` and `{error_message}` placeholders, giving the model full context for correction. The explicit output formatting aligns with the code extraction logic.

nemo_skills/prompt/config/lean4/formal-proof-deepseek-prover-v2-nemotron.yaml (1)
19-29: LGTM!

The deepseek-style prompt uses appropriate placeholders (`{header}`, `{informal_prefix}`, `{formal_statement}`) with a `sorry` placeholder indicating where the proof needs to be completed. The two-phase instruction structure is consistent with other nemotron variants.

nemo_skills/pipeline/prover.py (2)
222-223: Default to `GenerationType.prover` for this prover-specific command?

The fallback defaults to `GenerationType.generate` when neither `generation_module` nor `generation_type` is specified. Since this is `prover.py`, consider defaulting to `GenerationType.prover` instead:

```diff
 if generation_module is None:
-    generation_module = GENERATION_MODULE_MAP[generation_type or GenerationType.generate]
+    generation_module = GENERATION_MODULE_MAP[generation_type or GenerationType.prover]
```

Alternatively, if this is intentionally a generic generation pipeline, the file might be better named to reflect that generality.
37-163: Static analysis notes: B008 and TRY003 are acceptable here.

The Ruff B008 warnings about `typer.Option` calls in default arguments are false positives; this is the standard Typer pattern required for CLI option definitions. The TRY003 warnings about long exception messages are minor style preferences that don't affect functionality.

nemo_skills/inference/lean4_utils.py (1)
182-187: Potential issue with loop variable `j` scope.

At line 186, `j` is used outside the for-loop to calculate `leading_spaces`, but if `end_line <= start_line + 1`, the loop body never executes and `j` would be undefined (`NameError`). This edge case could occur when the error spans exactly 2 lines.

```diff
 else:
     show_line = 6
+    j = start_line  # Initialize in case loop doesn't execute
     for j in range(start_line + 1, min(end_line, start_line + show_line)):
         error_code += f"{code_lines[j]}\n"
     if end_line > start_line + show_line:
         leading_spaces = len(code_lines[j]) - len(code_lines[j].lstrip(" "))
```

Likely an incorrect or invalid review comment.
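Whether this edge case is reachable depends on the callers, but a simplified, standalone reconstruction (function and variable names are hypothetical, taken from the diff) shows the suggested guard keeps `j` defined even when the loop body never runs:

```python
def error_excerpt(code_lines, start_line, end_line):
    """Build a short excerpt of the lines covered by an error span."""
    error_code = ""
    show_line = 6
    j = start_line  # guard from the diff: defined even if the loop is empty
    for j in range(start_line + 1, min(end_line, start_line + show_line)):
        error_code += f"{code_lines[j]}\n"
    if end_line > start_line + show_line:
        # without the guard this line raises NameError when the loop never ran
        leading_spaces = len(code_lines[j]) - len(code_lines[j].lstrip(" "))
        error_code += " " * leading_spaces + "...\n"
    return error_code
```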
nemo_skills/inference/prover.py (7)
15-51: Imports and module setup look appropriate.

The imports are well-organized, and the `reasoning_effort_list` constant is appropriately scoped at module level for use in the adaptive reasoning logic.

114-123: Method overrides are appropriate.

The empty `log_example_prompt` and the `data` parameter are inherited interface requirements. The `setup_llm` correctly differentiates between generic code execution (rejected) and Lean4-specific sandbox usage.

150-188: Adaptive reasoning logic is well-structured.

The method correctly handles the inference config as both dataclass and dict, overrides endpoint type for pre-templated prompts, and implements a proper fallback loop for adaptive reasoning. The loop properly bounds at `reasoning_effort_index > 0`.

379-394: pass_at_N logic is correct.

The loop variable `i` is used after the loop to compute the `n_pass` count. The `results_dict` correctly captures the last attempt's results. The static analysis warning about unused `i` is a false positive.

396-428: Entry points and exports are well-structured.

The `process_single_datapoint` properly wraps results, `GENERATION_TASK_CLASS` enables factory pattern registration, and the Hydra main provides proper CLI integration with help message support.

54-90: ProverConfig validation logic is sound.

The configuration class properly extends `GenerateSolutionsConfig` and validates Lean4-specific parameters. The `_get_disallowed_params()` method is correctly inherited from the parent class, and the validation in `_post_init_validate_params` ensures proper configuration state.

138-148: No action needed: tokenizer is not required for refinement prompts.

Both `refine_prompt` and `nemotron_refine_prompt` are only used via the `fill()` method, which operates correctly without a tokenizer by returning a list of message dictionaries. The main prompt uses the tokenizer for string formatting via `apply_chat_template`, while refinement prompts use the list-based message format. This is an intentional design pattern, not an oversight.
```python
    async def _single_data_point_generate(self, data_point, data):
        formal_statement = (
            (data_point["header"].strip() + "\n")
            + data_point["informal_prefix"].strip()
            + ("\n" + data_point["formal_statement"].strip())
        )
        formal_statement = refine_by_sorry(formal_statement)
        prompt_turn_list = self.prompt.fill({"problem": formal_statement.strip()})
```
Potential `AttributeError` when `prompt_format == "openai"`.

`setup_prompt()` returns `None` when `prompt_format == "openai"` (line 127), but `_single_data_point_generate` calls `self.prompt.fill()` unconditionally at line 212. This will cause an `AttributeError`.
Either guard the prompt access or ensure _post_init_validate_params enforces that prompt_format != "openai" when this code path is used. If OpenAI format is not intended for this prover workflow, consider failing earlier with a clear message.
🧰 Tools
🪛 Ruff (0.14.7)
205-205: Unused method argument: data
(ARG002)
🤖 Prompt for AI Agents
In nemo_skills/inference/prover.py around lines 205 to 212, self.prompt may be
None when setup_prompt() returned None for prompt_format == "openai", but
_single_data_point_generate calls self.prompt.fill() unconditionally which will
raise an AttributeError; fix by either (A) adding a guard at the start of
_single_data_point_generate that checks if self.prompt is None and raises a
clear, early ValueError/RuntimeError describing that this prover does not
support prompt_format "openai" (or falls back to an alternative prompt
creation), or (B) enforce during initialization in _post_init_validate_params
that prompt_format != "openai" (raise a clear exception at init time) so
self.prompt is guaranteed non-None when this method runs; implement one of these
two fixes and include a descriptive error message for easier debugging.
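Option (B) above might be sketched roughly like this. The class, config attribute, and method names are assumed from the review and not verified against the actual `ProverTask`:

```python
class ProverTask:
    """Stub illustrating only the init-time guard from option (B)."""

    def __init__(self, cfg):
        self.cfg = cfg
        self._post_init_validate_params()

    def _post_init_validate_params(self):
        # Fail fast: this prover fills prompts itself, so the "openai"
        # pass-through format (which leaves self.prompt as None) is unsupported.
        if getattr(self.cfg, "prompt_format", None) == "openai":
            raise ValueError(
                "ProverTask does not support prompt_format='openai': "
                "self.prompt would be None when filling the proof prompt."
            )
```

With this in place, `self.prompt` is guaranteed non-`None` by the time `_single_data_point_generate` runs.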
```python
        feedback = self.refine_prompt.fill({"error_message": last_error_message})
        results_dict["feedback"] = feedback[0]["content"]
    else:
        # execute_code returns (result_dict, session_id) tuple
        execution_result, _ = await self.sandbox.execute_code(
            full_code, language="lean4", timeout=600.0, max_output_characters=1000000
        )
```
Missing null checks for `refine_prompt` and `sandbox`.

- `self.refine_prompt` is accessed at line 308, but it's only initialized when `cfg.refinement` is `True` (lines 108-109). If refinement is disabled but this code path is reached (parsing error), this will raise `AttributeError`.
- `self.sandbox.execute_code()` is called at line 312, but `self.sandbox` can be `None` if `cfg.sandbox` is `None` (line 121).
Consider adding guards or ensuring these paths are only reachable when the required objects are initialized:

```diff
+            if not self.cfg.refinement or self.refine_prompt is None:
+                results_dict["feedback"] = None
+            else:
                 feedback = self.refine_prompt.fill({"error_message": last_error_message})
                 results_dict["feedback"] = feedback[0]["content"]
```

For the sandbox, consider failing early if a sandbox is required but not configured.
🤖 Prompt for AI Agents
In nemo_skills/inference/prover.py around lines 308 to 314, add null checks
before using self.refine_prompt and self.sandbox: guard access to
self.refine_prompt before calling fill() (if refinement is disabled, either skip
setting feedback or raise a clear RuntimeError/ValueError indicating refinement
is required for this code path), and validate self.sandbox before calling
execute_code (if sandbox is None and execution is required, raise a clear
configuration error early with a descriptive message); ensure these guards
produce deterministic behavior (either skip refinement safely or fail fast) so
AttributeError/NoneType errors cannot occur.
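A fail-fast guard along the lines the comment asks for might look like the following. This is a free function with illustrative names; the real check would live in `ProverTask` setup:

```python
def check_execution_deps(sandbox, refine_prompt, refinement_enabled):
    """Raise early if objects required later in the proof loop are missing."""
    if sandbox is None:
        raise RuntimeError("cfg.sandbox must be configured to execute Lean4 proofs")
    if refinement_enabled and refine_prompt is None:
        raise RuntimeError("refinement is enabled but no refine prompt was initialized")
```

Calling this once during initialization makes the later `AttributeError`/`NoneType` failures impossible by construction.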
nemo_skills/pipeline/prover.py (outdated)
```python
    """Generate LLM completions for a given input file.

    Run `python -m nemo_skills.inference.generate --help` for other supported arguments
    (need to be prefixed with ++, since we use Hydra for that script).
    """
```
Docstring references incorrect module.

The docstring mentions `nemo_skills.inference.generate`, but this is `prover.py`. Consider updating the help reference:

```diff
 """Generate LLM completions for a given input file.

-Run `python -m nemo_skills.inference.generate --help` for other supported arguments
+Run `python -m nemo_skills.inference.prover --help` for other supported arguments
 (need to be prefixed with ++, since we use Hydra for that script).
 """
```
"""🤖 Prompt for AI Agents
In nemo_skills/pipeline/prover.py around lines 164 to 168, the docstring
incorrectly references nemo_skills.inference.generate; update the help reference
to point to the correct module/script for this file (e.g.,
nemo_skills.pipeline.prover or the actual entrypoint that runs this prover) and
adjust the example command accordingly (preserve the note about ++-prefixed
Hydra args if applicable). Replace the outdated module name with the correct
module/CLI name and ensure the surrounding text remains accurate and concise.
gwarmstrong
left a comment
@stephencge is it possible to rewrite this to use the ns generate pipeline and work by using a generation module, rather than standalone pipeline and inference script?
I see--this actually uses the generate pipeline already, but
gwarmstrong
left a comment
Requesting changes for concurrency control, as well as a variety of housekeeping items.
nemo_skills/inference/factory.py
Outdated
| GenerationType.generate: "nemo_skills.inference.generate", | ||
| GenerationType.math_judge: "nemo_skills.inference.llm_math_judge", | ||
| GenerationType.check_contamination: "nemo_skills.inference.check_contamination", | ||
| GenerationType.prover: "nemo_skills.inference.prover", |
Do you think this warrants its own generation type? I worry that having this as an option may be confusing because (a) as far as I understand, this is really only for formal proofs, and (b) this is a fairly specific proving workflow that might not apply to all proving. If we give it a generic type name like "prover", people may turn it on when they are doing proof generation that doesn't necessarily need this specific workflow.
If we remove it from here, you just have to call the generation module specifically, e.g.,

```python
generate(
    ...
    generation_module="nemo_skills.inference.prover",
    ...
)
```
yeah, we don't need it in main script, let's just use as an explicit option when calling eval
removing from factory.py
nemo_skills/inference/lean4_utils.py (outdated)
```python
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
Can we move to a file like nemo_skills/code_execution/proof_utils.py?
nemo_skills/inference/lean4_utils.py (outdated)

```diff
@@ -0,0 +1,205 @@
-# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
nemo_skills/inference/lean4_utils.py (outdated)

```python
# warning_pattern = re.compile(
#     r"(/lean4/my_project/.*?:\d+:\d+: warning:.*?)(?=\n/lean4/my_project|\Z)",
#     re.DOTALL,
# )
```

Suggested change (remove the commented-out pattern):

```diff
-# warning_pattern = re.compile(
-#     r"(/lean4/my_project/.*?:\d+:\d+: warning:.*?)(?=\n/lean4/my_project|\Z)",
-#     re.DOTALL,
-# )
```
nemo_skills/inference/lean4_utils.py
Outdated
| re.DOTALL, | ||
| ) | ||
| # Find all warnings and errors | ||
| # warnings = warning_pattern.findall(log_string) |
There was a problem hiding this comment.
| # warnings = warning_pattern.findall(log_string) |
nemo_skills/inference/prover.py (outdated)

```python
        llm = get_model(**self.cfg.server, tokenizer=self.tokenizer)
        return llm

    def setup_prompt(self):
```
to me, this looks the same as the super method, can you confirm and delete if redundant?
nemo_skills/inference/prover.py (outdated)

```python
        generation_params["endpoint_type"] = EndpointType.text
        for key, value in kwargs.items():
            generation_params[key] = value
        generation = await self.llm.generate_async(**generation_params)
```
It looks like we're currently missing any kind of concurrency control. I think if you're working with large datasets, this could put huge memory and network pressure on the nemo-skills process as well as the server.

We have built-in functionality in GenerationTask to handle this called `generate_with_sempahore`:

Skills/nemo_skills/inference/generate.py, lines 592 to 600 in fcb6874

I fear that that implementation might not be totally right for you though, since you make multiple LLM calls per sample. You would end up round-robining across all samples, rather than running some samples to completion before starting the next one... so it's probably better to roll your own concurrency control? I would take a similar approach to `generate_with_sempahore`, but probably use the semaphore in `process_single_datapoint`--this will be a little quality-of-life improvement in case there are any failures during generation--you will have more completed generations saved.
Actually--after seeing that pass_at_N is independent, you probably want to do it around _single_data_point_generate.
```python
        new_results_dict = {"success": False}
        for i in range(N):
            results_dict = await self._single_data_point_generate(data_point, data)
```
these are independent, but they are run serially. Use a loop like this to run them concurrently:
Skills/nemo_skills/inference/generate.py, lines 667 to 674 in fcb6874
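In the spirit of that suggestion, the serial loop could be replaced by something like the following sketch. `run_passes` and `generate_one` are hypothetical names, and this uses plain `asyncio.gather` rather than the actual nemo-skills loop:

```python
import asyncio


async def run_passes(generate_one, data_point, data, n_pass):
    """Run all pass_at_N attempts concurrently; prefer the first success."""
    results = await asyncio.gather(
        *(generate_one(data_point, data) for _ in range(n_pass))
    )
    for result in results:
        if result.get("success"):
            return result
    return results[-1]  # all failed: keep the last attempt for logging
```

Note this multiplies in-flight LLM requests by `n_pass`, so it only makes sense alongside some cap on overall concurrency.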
```python
            }
        )

    async def _single_data_point_generate(self, data_point, data):
```
per this comment, wrap this whole method in `with self.semaphore:` https://github.com/NVIDIA-NeMo/Skills/pull/1078/files#r2593972374
You can make a helper to not have to indent this whole block if you want.
```python
    def _transform_for_nemotron_refinement(self, proof_attempt: str, error_message: str) -> list[dict]:
        """Transform multi-turn refinement into single-turn nemotron-style prompt."""
        return self.nemotron_refine_prompt.fill(
```
consider if there are any of the config args you want to pass through here (or for the other prompts, e.g., turning on thinking in chat template):
Skills/nemo_skills/inference/generate.py, lines 511 to 516 in fcb6874
I realize it could get tricky here since there are multiple prompts--might have to give each of them separate args.
Kipok
left a comment
If it takes too long to fix issues, we can also merge an intermediate version early next week and refine later as long as it doesn't break any other functionality. But this is a little time sensitive to get in
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
gwarmstrong
left a comment
Have a few comments unresolved about concurrency behavior. Don't need to be addressed now. Have an issue to track in #1083
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Summary by CodeRabbit
Release Notes
- `generate` command to orchestrate and distribute proof generation tasks

✏️ Tip: You can customize this high-level summary in your review settings.