Conversation
📝 Walkthrough
This change reorganizes tools into a proper Python package structure with UV, updates documentation for the invoke command with optional flags, refactors error handling in data sync modules, adds a new equity-details sync module, updates artifact and training paths, and introduces comprehensive test coverage for multiple tool modules.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Greptile Summary
Restructured the tools package.
All changes are mechanical refactoring with no logic modifications. The restructuring follows uv package conventions and maintains backward compatibility.
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| maskfile.md | Updated command invocations from direct script paths to module (-m) form, aligned with uv package structure |
| pyproject.toml | Added polars dependency, sagemaker version constraint, tools/tests to testpaths, and tools/** to coverage omit |
| tools/pyproject.toml | Configured as uv package with src layout, removed massive dependency (no longer used) |
| tools/src/tools/prepare_training_data.py | Moved to src layout, removed module docstring |
Last reviewed commit: d21ad98
Pull request overview
This PR restructures the tools/ Python package to a src/ layout for uv packaging, restores/adjusts tooling tests, and expands the datamanager service to support syncing equity “details” (sector/industry) into S3 via a new POST endpoint.
Changes:
- Restructure `tools/` into `tools/src/tools` (uv `src` layout) and update invocation paths (`python -m tools...`), plus vulture whitelist path updates.
- Add/restore Python unit tests for tools and adjust repo-level pytest + coverage configuration.
- Add `POST /equity-details` sync flow in `datamanager` (fetch Massive tickers w/ pagination, filter, store CSV in S3) and update infra/workflow knobs (logging/env).
Reviewed changes
Copilot reviewed 22 out of 28 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/tests/test_sync_equity_details_data.py | Adds unit test coverage for the equity-details sync client script. |
| tools/tests/test_sync_equity_bars_data.py | Adds unit tests for date parsing and equity-bars sync request formatting. |
| tools/tests/test_run_training_job.py | Adds unit test for SageMaker estimator .fit() invocation (module mocking). |
| tools/tests/test_prepare_training_data.py | Adds unit tests for S3 reads/writes and Polars transforms in training data prep. |
| tools/tests/test_download_model_artifacts.py | Adds unit test coverage for artifact download/extract logic. |
| tools/sync_equity_categories.py | Removes legacy standalone categories sync script (now handled in service). |
| tools/src/tools/vulture_whitelist.py | Adds vulture whitelist for type-only imports under new src layout. |
| tools/src/tools/sync_equity_details_data.py | New CLI module to trigger datamanager /equity-details sync. |
| tools/src/tools/sync_equity_bars_data.py | Adjusts error handling/messages for date-range parsing; retains bars backfill logic. |
| tools/src/tools/run_training_job.py | New CLI module for launching SageMaker training. |
| tools/src/tools/prepare_training_data.py | New CLI module to build consolidated training parquet and write to S3. |
| tools/src/tools/download_model_artifacts.py | New CLI module to list/download/extract model artifacts from S3. |
| tools/src/tools/__init__.py | Establishes the tools package under src/. |
| tools/pyproject.toml | Marks tools as a uv package with src layout. |
| pyproject.toml | Updates pytest discovery paths and coverage omit patterns; constrains SageMaker version. |
| maskfile.md | Updates commands to python -m tools..., adds datamanager invoke flags and vulture path. |
| infrastructure/main.py | Adds RUST_LOG env to deployed service configuration. |
| applications/datamanager/tests/test_storage.rs | Adds test coverage for writing equity details CSV to S3; expands bars query test range. |
| applications/datamanager/tests/test_state_and_health.rs | Updates env behavior expectation (required vars now panic). |
| applications/datamanager/tests/test_handlers.rs | Adds extensive handler tests for equity-details sync and pagination/failure cases. |
| applications/datamanager/src/storage.rs | Adds write_equity_details_dataframe_to_s3 and centralizes the details key constant. |
| applications/datamanager/src/state.rs | Makes MASSIVE + S3 env vars required (panic if missing). |
| applications/datamanager/src/startup.rs | Adjusts tracing default filter behavior; improves env-guard test utilities. |
| applications/datamanager/src/router.rs | Adds POST route for /equity-details sync handler. |
| applications/datamanager/src/main.rs | Ensures tests set additional env vars (SENTRY_DSN/ENVIRONMENT/RUST_LOG). |
| applications/datamanager/src/equity_details.rs | Implements equity-details sync via Massive API with pagination + S3 upload. |
| .github/workflows/sync_data.yaml | Updates scheduled sync command to pass --data-type equity-bars. |
Comments suppressed due to low confidence (2)
tools/src/tools/sync_equity_bars_data.py:16
`RuntimeError` is raised from `JSONDecodeError` without any message, which yields an empty/opaque error (and you removed the prior logger context). Include a descriptive message (e.g., invalid JSON + the expected shape) when rethrowing so callers/CLI logs are useful.
tools/src/tools/sync_equity_bars_data.py:30
`RuntimeError` is raised from `ValueError` here without a message, which obscures whether the issue was start_date, end_date, or format mismatch. Raise with a message that mentions the required format (YYYY-MM-DD) and the received values to make CLI failures actionable.
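A minimal sketch of the pattern these two comments ask for. The function and field names mirror the review text but are illustrative, not the repository's actual code:

```python
import json
from datetime import datetime


def validate_and_parse_dates(raw: str) -> tuple[datetime, datetime]:
    """Parse a date-range JSON payload, rethrowing with descriptive messages."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as e:
        # Descriptive message instead of a bare `raise RuntimeError from e`.
        msg = "Date range must be valid JSON with 'start_date' and 'end_date' fields"
        raise RuntimeError(msg) from e
    try:
        start = datetime.strptime(payload["start_date"], "%Y-%m-%d")
        end = datetime.strptime(payload["end_date"], "%Y-%m-%d")
    except (KeyError, ValueError) as e:
        # Mention the required format and echo the received input.
        msg = f"Dates must be in YYYY-MM-DD format; received {raw!r}"
        raise RuntimeError(msg) from e
    return start, end
```

With these messages, a `logger.exception` call at the CLI boundary surfaces the actual failure cause instead of an empty error string.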
Actionable comments posted: 15
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@applications/datamanager/src/equity_details.rs`:
- Around line 219-243: This loop in equity_details.rs duplicates normalization
(uppercasing and filling missing with "NOT AVAILABLE") that's also done in
create_equity_details_dataframe in data.rs; extract a shared normalization
helper and constant (e.g., pub const NOT_AVAILABLE: &str and a function like
normalize_optional_uppercase(option: Option<String>) -> String) into a common
module, replace the match logic in the for loop (and the corresponding logic in
create_equity_details_dataframe) to call that helper, and ensure EQUITY_TYPES
check still uses ticker_type.unwrap_or_default() before passing through or
alongside the helper so behavior is preserved.
- Around line 277-284: The error response currently returns internal S3 SDK
details by interpolating err into the response body; update the Err branch that
contains the warn!("Failed to upload to S3: {}", err) and the
StatusCode::BAD_GATEWAY response so the HTTP response body uses a generic
message (e.g., "DataFrame created but S3 upload failed") without including err,
while keeping the warn! log with err for diagnostics; reference the existing
warn! call and the StatusCode::BAD_GATEWAY response construction to locate and
change only the response string interpolation.
In `@pyproject.toml`:
- Line 37: The comment next to the testpaths entry "tools/tests" is misleading:
it says tests are being excluded but the "testpaths" list (containing
"tools/tests") actually includes them; update the comment adjacent to
"tools/tests" in the testpaths array to accurately state that these tests are
being included/discovered, or if the intent was to exclude them instead, move
"tools/tests" from testpaths into the coverage omit pattern (the "tools/*" entry
in the omit list) and adjust the comment accordingly; reference the testpaths
array and the omit list when making the change.
In `@tools/pyproject.toml`:
- Line 6: Remove the duplicate boto3 declaration and consolidate it at the
workspace root using the higher constraint: delete the dependency entry
"boto3>=1.35.0" from the equitypricemodel pyproject and remove "boto3>=1.40.74"
from tools/pyproject.toml, then add or update the root pyproject.toml
dependencies to include "boto3>=1.40.74" (keeping "massive>=2.0.2" where
relevant) so the workspace uses the single, higher-version boto3 spec.
In `@tools/src/tools/sync_equity_bars_data.py`:
- Around line 19-20: The RuntimeError raised in the JSONDecodeError and
ValueError handlers currently use bare "raise RuntimeError from e" so the
RuntimeError has no message and caller logs show error=""; update those raises
to include a descriptive message (e.g., "Invalid date range JSON" or "Date range
values invalid, must include 'start_date' and 'end_date'") so the original
exception text appears in caller.logger.exception; locate the two bare raises in
the date-range parsing routine (the JSONDecodeError except block and the date
parsing/ValueError except block) and change them to raise
RuntimeError("<descriptive message>") from e, matching the pattern used where
message = "Date range JSON must contain 'start_date' and 'end_date' fields".
In `@tools/src/tools/sync_equity_details_data.py`:
- Line 20: Remove the unnecessary "# noqa: PLR2004" directives from the
conditional checks that compare status_code (the lines containing "if
status_code >= 400:"), and any other identical occurrences (e.g., the second
occurrence around line 32) so the file no longer contains unused noqa comments;
simply delete the " # noqa: PLR2004" suffixes from those "if status_code >=
400:" statements in sync_equity_details_data.py.
- Around line 41-51: Remove the unnecessary arguments dict and loop: instead of
building arguments = {"base_url": base_url} and iterating for argument in
[base_url], perform a direct check on base_url (if not base_url) and call
logger.error("Missing required positional argument(s)", base_url=base_url)
followed by sys.exit(1); update the check around the base_url variable where
it's currently validated to simplify the logic in sync_equity_details_data.py.
In `@tools/tests/test_download_model_artifacts.py`:
- Line 45: The test currently uses mock_tar.extractall.assert_called_once(),
which doesn't verify the path or filter arguments; update the assertion in
tools/tests/test_download_model_artifacts.py to assert that extractall was
called with the expected arguments (i.e.,
assert_called_once_with(path=destination_directory, filter="data") or the
equivalent kwarg form) so the test explicitly checks the destination_directory
and the "data" filter passed to mock_tar.extractall.
- Line 12: Update the mocked S3 key in tests to match the Prefix used by the
implementation and verify extractall args: change the test key from
"artifacts/run_20250601/output/model.tar.gz" to
"artifacts/equitypricemodel/run_20250601/output/model.tar.gz" (since
list_objects_v2 is called with Prefix="artifacts/equitypricemodel"), and adjust
the parsing in the implementation that currently uses parts[1] to extract the
run/version to use parts[2] (refer to the code that splits S3 key and constructs
target_path). Also replace the extractall assertion to
assert_called_once_with(path=destination_directory, filter="data") so the test
verifies the exact arguments.
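The difference between the loose and tightened assertions can be shown in miniature; `mock_tar` and `destination_directory` are hypothetical stand-ins for the test's fixtures:

```python
from unittest.mock import MagicMock

mock_tar = MagicMock()
destination_directory = "/tmp/model_artifacts"

# The code under test would call:
mock_tar.extractall(path=destination_directory, filter="data")

# Loose assertion (current): only checks that the call happened at all.
mock_tar.extractall.assert_called_once()

# Tightened assertion (suggested): also verifies the destination path and the "data" filter.
mock_tar.extractall.assert_called_once_with(path=destination_directory, filter="data")
```

The tightened form catches regressions where the extraction target or the tarfile `filter` argument silently changes.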
In `@tools/tests/test_prepare_training_data.py`:
- Around line 98-115: The test test_read_equity_bars_from_s3_returns_dataframe
only checks that mock_s3_client.get_object was called once but not the
arguments; update the assertion to verify Bucket and Key just like the category
test: replace mock_s3_client.get_object.assert_called_once() with
mock_s3_client.get_object.assert_called_once_with(Bucket="test-bucket",
Key=f"equity_bars/{_TARGET_DATE}.parquet") (or the exact key format used by
read_equity_bars_from_s3) so the S3 key construction is validated; reference:
test_read_equity_bars_from_s3_returns_dataframe, read_equity_bars_from_s3, and
mock_s3_client.get_object.
In `@tools/tests/test_run_training_job.py`:
- Line 12: Remove the unused noqa directive on the import of run_training_job:
in the import statement that reads "from tools.run_training_job import
run_training_job # noqa: E402" delete the " # noqa: E402" suffix so the line
becomes a normal import; this removes the unused directive causing the RUF100
warning while leaving the run_training_job import intact.
In `@tools/tests/test_sync_equity_bars_data.py`:
- Around line 19-83: Add three new unit tests to cover the error paths of
validate_and_parse_dates: implement
test_validate_and_parse_dates_raises_on_invalid_json which passes a malformed
JSON string and asserts validate_and_parse_dates raises (ValueError) with a
helpful message, test_validate_and_parse_dates_raises_on_missing_fields which
passes JSON missing start_date or end_date and asserts a raise with a message
indicating missing fields, and
test_validate_and_parse_dates_raises_when_start_exceeds_end_after_clamping which
supplies dates that clamp so start_date > end_date and asserts a raise;
reference the validate_and_parse_dates function in these tests and assert the
raised exception type and that the exception message contains the expected hint
text so regressions in the new error branches are detected.
- Line 60: Remove the unused noqa directive from the assertion so the line reads
simply "assert status_code == 200"; specifically edit the assertion in
tools/tests/test_sync_equity_bars_data.py (the failing assertion "assert
status_code == 200 # noqa: PLR2004") and delete the trailing "# noqa: PLR2004"
token.
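The three suggested error-path tests could take roughly this shape. The `validate_and_parse_dates` below is a compact stand-in mirroring the error branches described above (the real function lives in `tools.sync_equity_bars_data`, and its clamping logic is simplified away here):

```python
import json
from datetime import datetime

import pytest


def validate_and_parse_dates(raw: str):
    """Stand-in mirroring the error branches under test."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as e:
        raise RuntimeError("Date range must be valid JSON") from e
    if not {"start_date", "end_date"} <= payload.keys():
        raise RuntimeError("Date range JSON must contain 'start_date' and 'end_date' fields")
    start = datetime.strptime(payload["start_date"], "%Y-%m-%d")
    end = datetime.strptime(payload["end_date"], "%Y-%m-%d")
    if start > end:
        raise RuntimeError("start_date must not exceed end_date")
    return start, end


def test_validate_and_parse_dates_raises_on_invalid_json():
    with pytest.raises(RuntimeError, match="valid JSON"):
        validate_and_parse_dates("{not json")


def test_validate_and_parse_dates_raises_on_missing_fields():
    with pytest.raises(RuntimeError, match="start_date"):
        validate_and_parse_dates('{"end_date": "2025-01-31"}')


def test_validate_and_parse_dates_raises_when_start_exceeds_end():
    with pytest.raises(RuntimeError, match="exceed"):
        validate_and_parse_dates('{"start_date": "2025-02-01", "end_date": "2025-01-01"}')
```

Using `pytest.raises(..., match=...)` asserts both the exception type and that the message contains the expected hint text, so regressions in the new error branches are detected.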
In `@tools/tests/test_sync_equity_details_data.py`:
- Around line 25-33: Add a test that covers the error branch of
sync_equity_details_data by mocking requests.post to return a response with
status_code >= 400 (e.g., 500) and text (e.g., "internal server error"), then
assert that calling sync_equity_details_data(base_url="http://localhost:8080")
raises a RuntimeError with the expected message using pytest.raises; use
MagicMock for the response and patch
"tools.sync_equity_details_data.requests.post" to return it so the test (e.g.,
test_sync_equity_details_data_raises_on_error_status) verifies the error path.
- Line 18: Remove the unnecessary noqa directive on the assertion line: edit the
test assertion `assert status_code == 200 # noqa: PLR2004` and delete the `#
noqa: PLR2004` comment so the line is simply `assert status_code == 200`; this
removes a directive for a rule (PLR2004) that isn't enabled (Ruff RUF100) and
avoids dead noqa comments in `test_sync_equity_details_data.py`.
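A sketch of the suggested error-branch test. To keep this self-contained and stdlib-only, the HTTP `post` callable is injected rather than patched; the real test would patch `tools.sync_equity_details_data.requests.post` as described above, and this stand-in only approximates the client's shape:

```python
from unittest.mock import MagicMock


def sync_equity_details_data(base_url: str, post) -> int:
    """Stand-in for the client under test; raises on HTTP error statuses."""
    response = post(f"{base_url}/equity-details", timeout=30)
    if response.status_code >= 400:
        raise RuntimeError(f"Equity details sync failed ({response.status_code}): {response.text}")
    return response.status_code


def test_sync_equity_details_data_raises_on_error_status():
    # Mock an HTTP 500 response from the datamanager service.
    mock_response = MagicMock()
    mock_response.status_code = 500
    mock_response.text = "internal server error"
    mock_post = MagicMock(return_value=mock_response)

    try:
        sync_equity_details_data("http://localhost:8080", post=mock_post)
    except RuntimeError as err:
        assert "500" in str(err)
        assert "internal server error" in str(err)
    else:
        raise AssertionError("expected RuntimeError")

    mock_post.assert_called_once_with("http://localhost:8080/equity-details", timeout=30)
```

This covers the branch the happy-path test misses: that a failing sync surfaces the status code and body to the caller instead of returning silently.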
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
Lines 6-16: ⚠️ Potential issue | 🔴 Critical
Move `sagemaker` to `tools/pyproject.toml` and remove unused `uvicorn` dependency from root. Per the coding guideline to scan and remove unused dependencies and move duplicates to workspace root:
- `sagemaker>=2.256.0,<3`: Used only in `tools/src/tools/run_training_job.py`. This dependency should live exclusively in `tools/pyproject.toml` rather than the workspace root, since it is not shared across multiple members.
- `uvicorn>=0.35.0`: No imports found anywhere in the codebase. This dependency is unused and should be removed.
- `structlog`, `tinygrad`, `numpy`: These are duplicated; they appear in both root `pyproject.toml` and `applications/equitypricemodel/pyproject.toml`. Remove these from the equitypricemodel member to consolidate them at the workspace root as per the guideline.
The `sagemaker>=2.256.0,<3` upper bound constraint is a good defensive practice, but the dependency belongs in the member package, not at the workspace root.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 6 - 16, Remove the unused and misplaced dependencies from the workspace root pyproject.toml and consolidate duplicates: delete uvicorn>=0.35.0 from the root dependencies list; remove sagemaker>=2.256.0,<3 from the root and add that exact requirement to the tools package pyproject.toml where run_training_job.py uses it; and remove structlog>=25.5.0, tinygrad>=0.10.3, and numpy>=1.26.4 from the applications/equitypricemodel member so those packages remain only at the workspace root (keep their versions as currently specified in root).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 56: The coverage omit pattern "tools/*" is too broad and excludes source
files under tools/; update the omit list in pyproject.toml to target only tests
(e.g., replace "tools/*" with "tools/tests/*" or a pattern like
"tools/**/test_*.py") so only test files are omitted; locate the omit entry
(omit = ["*/__init__.py", "**/test_*.py", "tools/*"]) and change the "tools/*"
element to a more specific tests-only pattern.
---
Outside diff comments:
In `@pyproject.toml`:
- Around line 6-16: Remove the unused and misplaced dependencies from the
workspace root pyproject.toml and consolidate duplicates: delete uvicorn>=0.35.0
from the root dependencies list; remove sagemaker>=2.256.0,<3 from the root and
add that exact requirement to the tools package pyproject.toml where
run_training_job.py uses it; and remove structlog>=25.5.0, tinygrad>=0.10.3, and
numpy>=1.26.4 from the applications/equitypricemodel member so those packages
remain only at the workspace root (keep their versions as currently specified in
root).
---
Duplicate comments:
In `@pyproject.toml`:
- Around line 34-38: The inline comment and exclusion are correct — leave the
testpaths array as-is and keep the "tools/tests" entry commented out so it
remains excluded from test discovery; specifically ensure the testpaths list
(the testpaths variable) still contains "applications/*/tests",
"libraries/python/tests", and the commented-out "tools/tests" line with its
explanatory comment.
The base branch was changed.
Actionable comments posted: 3
♻️ Duplicate comments (1)
pyproject.toml (1)
Line 53: `"tools/**"` still omits all tools source files from coverage; original concern unresolved.
Changing `"tools/*"` → `"tools/**"` makes the exclusion explicitly recursive, but it continues to silence coverage for source modules under `tools/src/` in addition to tests. The earlier suggestion was to narrow this to `"tools/tests/*"` so that source coverage is still measured. If the intent genuinely is to exclude all tools code temporarily, the pattern is functional but source coverage gaps will be invisible.
🔧 Restore source-file coverage (original suggestion)
-omit = ["*/__init__.py", "**/test_*.py", "tools/**"]
+omit = ["*/__init__.py", "**/test_*.py", "tools/tests/*"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 53, The omit pattern in pyproject.toml currently uses "tools/**" which recursively excludes all files under tools (including source like tools/src/) from coverage; update the omit list entry for the coverage omit variable to "tools/tests/*" (or another tests-only pattern) so only test files under tools are excluded and source modules (e.g., tools/src/*) are still measured by coverage; if the intent truly is to hide all tools code, keep "tools/**" but add a clear comment explaining that decision.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 16: The dependency "polars" in pyproject.toml currently has no version
specifier; change it to include a minimum version (e.g., "polars>=1.38.1") to
prevent resolving to incompatible newer minors, then regenerate the lockfile
(e.g., poetry lock / pip-compile) and verify CI uses the updated lockfile;
update any dependency list that references "polars" to the same spec if present.
In `@tools/pyproject.toml`:
- Around line 8-10: The [tool.uv] table contains an undocumented key src =
["src"] which is likely ignored; remove the src = ["src"] line from the
[tool.uv] block and either leave only package = true (since src/ is the default)
or, if you intentionally want to override defaults, move the src declaration
into the [tool.uv.build-backend] table using the documented keys (e.g.,
module-root/module-name) instead of src; update the pyproject.toml so [tool.uv]
only contains valid keys like package = true or relocate/rename the source-root
setting under [tool.uv.build-backend].
In `@tools/src/tools/download_model_artifacts.py`:
- Around line 46-53: The current logic treats any S3 key with ≥3 path segments
as an artifact; tighten validation by requiring the object key to match the
expected artifact suffix (e.g., endswith "output/model.tar.gz") or a larger
minimum segment count before calling options.add and appending to
file_objects_with_timestamps. Update the block around file_object_name_parts,
options.add, and file_objects_with_timestamps to first verify file_object_name
(or file_object_name_parts) has the expected suffix (or at least the expected
number of segments) and skip/continue for keys that don't match so only real
artifact runs are selected.
---
Duplicate comments:
In `@pyproject.toml`:
- Line 53: The omit pattern in pyproject.toml currently uses "tools/**" which
recursively excludes all files under tools (including source like tools/src/)
from coverage; update the omit list entry for the coverage omit variable to
"tools/tests/*" (or another tests-only pattern) so only test files under tools
are excluded and source modules (e.g., tools/src/*) are still measured by
coverage; if the intent truly is to hide all tools code, keep "tools/**" but add
a clear comment explaining that decision.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (5)
- pyproject.toml
- tools/pyproject.toml
- tools/src/tools/download_model_artifacts.py
- tools/src/tools/prepare_training_data.py
- tools/tests/test_download_model_artifacts.py
💤 Files with no reviewable changes (1)
- tools/src/tools/prepare_training_data.py
Pull request overview
Copilot reviewed 10 out of 16 changed files in this pull request and generated no new comments.
Dismissing and re-requesting because re-requesting doesn't seem to be triggering.
Overview
Changes
- Restructured `tools/` to include `src/` to align with `uv` specifications
- Updated `maskfile.md` commands with `uv` package references

Context
Another tiny bit of cleanup. I bet most of the comments from the bots can be ignored because this is basically just a move.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores