Fix inter-service communication patterns #585

Conversation

Walkthrough

The changes introduce a new Flyte workflow for training a prediction engine model, update several Python modules for consistency and error handling, add new configuration options and dependencies, and make minor refactoring adjustments. The workflow includes tasks for fetching data, training a model, and storing the resulting artifact, using updated data handling and model interfaces.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Flyte
    participant fetch_data
    participant train_model
    participant store_model
    User->>Flyte: Start training_workflow(start_date, end_date, epochs)
    Flyte->>fetch_data: fetch_data(start_date, end_date)
    fetch_data-->>Flyte: returns list of dicts (data)
    Flyte->>train_model: train_model(data, epochs)
    train_model-->>Flyte: returns model bytes
    Flyte->>store_model: store_model(model bytes)
    store_model-->>Flyte: returns artifact path
    Flyte-->>User: returns artifact path
```
Pull Request Overview
This PR addresses inter-service communication patterns by modifying the data retrieval and processing flow in the Position Manager client and updating configuration settings.
- Changed the API call from POST to GET with updated request parameters and timeout for fetching equity bars.
- Switched from JSON parsing to using Apache Arrow for processing binary data.
- Updated configuration settings in .claude/settings.local.json.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| application/positionmanager/src/positionmanager/clients.py | Updated client communication logic and data transformation using Arrow. |
| .claude/settings.local.json | Added configuration for disabling all project MCP servers. |
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`.flox/env/manifest.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (2)
- `.claude/settings.local.json` (1 hunks)
- `application/positionmanager/src/positionmanager/clients.py` (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
application/positionmanager/src/positionmanager/clients.py
[error] 4-4: Unable to import 'pyarrow'
(E0401)
[refactor] 85-90: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[warning] 88-90: Raising too general exception: Exception
(W0719)
🔇 Additional comments (2)
application/positionmanager/src/positionmanager/clients.py (2)
75-81: ✅ Verification successful

Breaking change: Verify API compatibility.

The request method has been changed from POST to GET with URL parameters. This is a significant breaking change that affects the inter-service communication contract. The timeout increase to 30 seconds is reasonable for processing larger Arrow data streams.

🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify the datamanager service supports the new GET endpoint with parameters
# Expected: Find the endpoint definition and verify it accepts GET requests

# Search for the equity-bars endpoint definition
rg -A 10 -B 5 "equity-bars" --type py

# Look for any existing tests that might be affected by this change
rg -A 5 -B 5 "equity-bars|get_data" --type py | grep -i test
```

Length of output: 11471

GET /equity-bars endpoint available – no breaking change

I confirmed that the datamanager service now defines a GET handler for `/equity-bars` which accepts `start_date` and `end_date` as query parameters. Positionmanager's switch from POST to GET aligns with this and won't break the inter-service contract. The existing POST route remains intact for the summary use case, so no further changes are required.
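The request shape this verifies can be sketched as a URL builder. This is a hypothetical helper for illustration only; the real client passes `params=` to `requests.get` rather than building the URL by hand:

```python
from urllib.parse import urlencode


def build_equity_bars_url(base_url: str, start_date: str, end_date: str) -> str:
    """Build the GET request URL with the date window as query parameters."""
    query = urlencode({"start_date": start_date, "end_date": end_date})
    return f"{base_url}/equity-bars?{query}"


print(build_equity_bars_url("http://datamanager", "2023-01-01", "2023-01-31"))
# http://datamanager/equity-bars?start_date=2023-01-01&end_date=2023-01-31
```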
92-106: ✅ Verification successful

Verify the Arrow data processing logic.

The new Apache Arrow IPC stream processing looks functionally correct, but the column mapping changes are significant:

- `"t"` → `"date"` (timestamp to date conversion)
- Pivot on `"T"` instead of `"ticker"`
- Values from `"c"` instead of `"close_price"`

Please ensure the datamanager service now returns data with the new column schema (`t`, `T`, `c`) instead of the previous schema (`timestamp`, `ticker`, `close_price`).

🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify the new column names match the data schema from datamanager
# Expected: Find the data schema definition and verify column mappings

# Search for data schema or model definitions
rg -A 10 -B 5 '"t"|"T"|"c"' --type py

# Look for any data transformation or schema definitions
rg -A 5 -B 5 "timestamp|ticker|close_price" --type py

# Check for any existing tests that validate data structure
ast-grep --pattern 'def test_$_($$$) { $$$ get_data($$$) $$$ }'
```

Length of output: 65884

🏁 Script executed:

```shell
#!/bin/bash
# Inspect the datamanager HTTP endpoint and source of `bars`
sed -n '1,200p' application/datamanager/src/datamanager/main.py
```

Length of output: 5854

No action needed: Arrow field mappings are correct

I've confirmed that the datamanager service writes and exposes the same JSON payload fields used by the positionmanager client:

- `t` (timestamp in ms) is returned and cast to a `date`
- `T` (ticker symbol) becomes the pivot column
- `c` (close price) supplies the pivot values

The pivot-on-`T` / values-from-`c` logic in application/positionmanager/src/positionmanager/clients.py aligns with the Parquet schema produced in application/datamanager/src/datamanager/main.py. All mappings are consistent.
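The pivot described above can be illustrated with a pure-Python stand-in. The real client operates on an Arrow IPC table with pyarrow; `pivot_bars` and the sample prices here are hypothetical, showing only the `t`/`T`/`c` column semantics under discussion:

```python
from collections import defaultdict
from datetime import datetime, timezone


def pivot_bars(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Pivot flat bar records into {date: {ticker: close}}.

    Column names follow the short schema described above:
    t = epoch-millisecond timestamp, T = ticker symbol, c = close price.
    """
    table: dict[str, dict[str, float]] = defaultdict(dict)
    for row in rows:
        # "t" is milliseconds since the epoch; cast it down to a calendar date
        day = datetime.fromtimestamp(row["t"] / 1000, tz=timezone.utc).date().isoformat()
        table[day][row["T"]] = row["c"]
    return dict(table)


bars = [
    {"t": 1672531200000, "T": "AAPL", "c": 130.0},
    {"t": 1672531200000, "T": "MSFT", "c": 240.0},
]
print(pivot_bars(bars))
```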
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…nengine_into_the_workflows Integrate predictionengine into the workflows
Actionable comments posted: 6
♻️ Duplicate comments (2)
application/predictionengine/src/predictionengine/gated_residual_network.py (1)

46-46: Same cast string literal concern as previous file. This change should be verified for consistency with the intended behavior across the codebase.

application/positionmanager/src/positionmanager/clients.py (1)

4-4: ⚠️ Potential issue: pyarrow dependency still not declared.

This duplicates a previous review comment - pyarrow is imported but not declared in the positionmanager's dependencies. This will cause import errors in production.
🧹 Nitpick comments (7)
application/predictionengine/tests/test_loss_function.py (2)
18-18: Simplify shape assertion and remove noqa

You can collapse the dual comparison into a membership check and drop the linter suppression:

```diff
- assert loss.shape == () or loss.shape == (1,)  # noqa: PLR1714
+ assert loss.shape in ((), (1,))
```

🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 18-18: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.
(R1714)

29-29: Simplify shape assertion in multiple-samples test

As above, use a single membership test to make it clearer and eliminate the `# noqa` hack:

```diff
- assert loss.shape == () or loss.shape == (1,)  # noqa: PLR1714
+ assert loss.shape in ((), (1,))
```

🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 29-29: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.
(R1714)
application/predictionengine/src/predictionengine/main.py (1)

48-48: Question: Using SEQUENCE_LENGTH for HTTP timeout.

Using `SEQUENCE_LENGTH` (30) as the HTTP timeout seems semantically incorrect. The sequence length is a model parameter, while timeout is an operational parameter. Consider using a dedicated `HTTP_TIMEOUT` constant instead.

```diff
+HTTP_TIMEOUT = 30
+
 SEQUENCE_LENGTH = 30

-    response = requests.get(url, params=parameters, timeout=SEQUENCE_LENGTH)
+    response = requests.get(url, params=parameters, timeout=HTTP_TIMEOUT)
```

workflows/train_predctionengine.py (3)

43-65: Document the column mapping for clarity

The column renaming from short names to descriptive names is good for readability, but consider adding a comment to document the mapping for future maintainers.

```diff
+    # Map Polygon.io column names to descriptive names
+    # t: timestamp, o: open, h: high, l: low, c: close, v: volume,
+    # vw: volume weighted average price, T: ticker symbol
     data = data.with_columns(
```

81-101: Consider making hyperparameters configurable

The model training uses several hardcoded values that could be made configurable through task parameters or environment variables for better flexibility.

```diff
 @task
 def train_model(
     data: list[dict[str, Any]],
     epochs: int = 100,
+    batch_size: int = 32,
+    sequence_length: int = 30,
+    hidden_size: int = 128,
+    attention_head_count: int = 4,
+    dropout_rate: float = 0.1,
 ) -> bytes:
```

135-138: Remove unnecessary type casts

The explicit type casts are redundant as Flyte handles type conversions automatically between tasks.

```diff
 def training_workflow(
     start_date: datetime,
     end_date: datetime,
     epochs: int = 100,
 ) -> str:
     data = fetch_data(start_date=start_date, end_date=end_date)
-    model_bytes = train_model(data=cast("list[dict[str, Any]]", data), epochs=epochs)
-    artifact_path = store_model(model_bytes=cast("bytes", model_bytes))
-    return cast("str", artifact_path)
+    model_bytes = train_model(data=data, epochs=epochs)
+    artifact_path = store_model(model_bytes=model_bytes)
+    return artifact_path
```

application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (1)

156-158: Remove unnecessary noqa comment

The `# noqa: RET504` comment is unnecessary here as the code is clear and readable.

```diff
-    return average_loss  # noqa: RET504
+    return average_loss
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
- `application/datamanager/src/datamanager/config.py` (1 hunks)
- `application/datamanager/src/datamanager/main.py` (2 hunks)
- `application/positionmanager/src/positionmanager/clients.py` (4 hunks)
- `application/positionmanager/src/positionmanager/main.py` (1 hunks)
- `application/predictionengine/src/predictionengine/dataset.py` (8 hunks)
- `application/predictionengine/src/predictionengine/gated_residual_network.py` (2 hunks)
- `application/predictionengine/src/predictionengine/long_short_term_memory.py` (4 hunks)
- `application/predictionengine/src/predictionengine/loss_function.py` (2 hunks)
- `application/predictionengine/src/predictionengine/main.py` (7 hunks)
- `application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py` (6 hunks)
- `application/predictionengine/src/predictionengine/models.py` (1 hunks)
- `application/predictionengine/src/predictionengine/multi_head_self_attention.py` (4 hunks)
- `application/predictionengine/src/predictionengine/post_processor.py` (3 hunks)
- `application/predictionengine/tests/test_dataset.py` (2 hunks)
- `application/predictionengine/tests/test_gated_residual_network.py` (5 hunks)
- `application/predictionengine/tests/test_long_short_term_memory.py` (4 hunks)
- `application/predictionengine/tests/test_loss_function.py` (4 hunks)
- `application/predictionengine/tests/test_multi_head_self_attention.py` (4 hunks)
- `application/predictionengine/tests/test_post_processor.py` (2 hunks)
- `application/predictionengine/tests/test_ticker_embedding.py` (1 hunks)
- `workflows/backfill_datamanager.py` (1 hunks)
- `workflows/prediction_model.py` (0 hunks)
- `workflows/pyproject.toml` (1 hunks)
- `workflows/train_predctionengine.py` (1 hunks)
💤 Files with no reviewable changes (1)
- workflows/prediction_model.py
✅ Files skipped from review due to trivial changes (12)
- application/predictionengine/tests/test_ticker_embedding.py
- application/predictionengine/tests/test_dataset.py
- workflows/pyproject.toml
- application/predictionengine/tests/test_post_processor.py
- application/predictionengine/tests/test_gated_residual_network.py
- application/datamanager/src/datamanager/config.py
- application/predictionengine/tests/test_long_short_term_memory.py
- application/positionmanager/src/positionmanager/main.py
- application/predictionengine/src/predictionengine/models.py
- application/predictionengine/src/predictionengine/loss_function.py
- application/predictionengine/tests/test_multi_head_self_attention.py
- workflows/backfill_datamanager.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
application/datamanager/src/datamanager/main.py (1)
- application/datamanager/src/datamanager/models.py (1): `BarsSummary` (49-51)

application/predictionengine/src/predictionengine/main.py (3)
- application/predictionengine/src/predictionengine/dataset.py (1): `DataSet` (18-216)
- application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (2): `MiniatureTemporalFusionTransformer` (22-188), `load` (167-172)
- application/predictionengine/src/predictionengine/models.py (1): `PredictionResponse` (4-5)

workflows/train_predctionengine.py (2)
- application/predictionengine/src/predictionengine/dataset.py (3): `DataSet` (18-216), `load_data` (145-155), `get_preprocessors` (157-175)
- application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (2): `MiniatureTemporalFusionTransformer` (22-188), `train` (110-141)
🪛 Pylint (3.3.7)
application/predictionengine/src/predictionengine/main.py
[error] 14-14: Attempted relative import beyond top-level package
(E0402)
[error] 15-15: Attempted relative import beyond top-level package
(E0402)
[error] 16-16: Attempted relative import beyond top-level package
(E0402)
[refactor] 95-95: Too many local variables (17/15)
(R0914)
application/positionmanager/src/positionmanager/clients.py
[error] 90-90: Instance of 'LookupDict' has no 'not_found' member
(E1101)
[error] 92-92: Instance of 'LookupDict' has no 'ok' member
(E1101)
application/predictionengine/src/predictionengine/gated_residual_network.py
[refactor] 7-7: Too many instance attributes (8/7)
(R0902)
[refactor] 7-7: Too few public methods (1/2)
(R0903)
application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py
[error] 13-13: Attempted relative import beyond top-level package
(E0402)
[error] 14-14: Attempted relative import beyond top-level package
(E0402)
[error] 15-15: Attempted relative import beyond top-level package
(E0402)
[error] 16-16: Attempted relative import beyond top-level package
(E0402)
[error] 17-17: Attempted relative import beyond top-level package
(E0402)
[error] 18-18: Attempted relative import beyond top-level package
(E0402)
[error] 19-19: Attempted relative import beyond top-level package
(E0402)
[refactor] 23-23: Too many arguments (12/5)
(R0913)
[refactor] 23-23: Too many positional arguments (12/5)
(R0917)
application/predictionengine/src/predictionengine/post_processor.py
[refactor] 9-9: Too few public methods (1/2)
(R0903)
application/predictionengine/tests/test_loss_function.py
[refactor] 18-18: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.
(R1714)
[refactor] 29-29: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.
(R1714)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run Python tests
🔇 Additional comments (27)
application/predictionengine/tests/test_loss_function.py (1)
1-4: Approve import reordering

The imports are now grouped and ordered per PEP 8 (stdlib, third-party, local). Moving `Tensor` after `pytest` improves readability.

application/predictionengine/src/predictionengine/multi_head_self_attention.py (5)
1-5: LGTM: Import reordering and modernization looks good. The import simplification and reordering improves code organization.

15-16: LGTM: Improved error handling pattern. Assigning the error message to a variable before raising the exception improves readability and debugging.

32-38: LGTM: Parameter renaming from `input` to `features` is excellent. This change avoids shadowing the built-in `input()` function and makes the parameter name more descriptive.

33-33: LGTM: Type annotation modernization is appropriate. Using lowercase `tuple` instead of `Tuple` aligns with Python 3.9+ best practices.
54-54: ❌ Incorrect review comment

Verify the cast string literal change doesn't break functionality.

The change from `cast(Tensor, ...)` to `cast("Tensor", ...)` modifies behavior - the former provides runtime type checking while the latter is only for static analysis tools.

🏁 Script executed:

```shell
#!/bin/bash
# Search for other cast usages to check consistency
rg "cast\(" --type py -A 2 -B 1
```

Length of output: 4580

No action needed: `cast` is a no-op at runtime

The `typing.cast` function always returns its second argument without any runtime type checking, regardless of whether you pass a type object or a string literal. Static analyzers (mypy, pyright) support string literals as forward references under PEP 484/563, so this change has no impact on functionality or type checking.

Likely an incorrect or invalid review comment.
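The no-op claim is easy to demonstrate in a few lines; both the type-object and string-literal forms hand back the same object:

```python
from typing import cast

value = [1, 2, 3]

# cast() exists purely for type checkers: at runtime it returns its
# second argument unchanged, whether the first argument is a type
# object or a string forward reference.
as_type = cast(list, value)
as_string = cast("list[int]", value)

assert as_type is value
assert as_string is value
```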
application/predictionengine/src/predictionengine/gated_residual_network.py (3)

2-3: LGTM: Import formatting improvement. The blank line separation between typing and tinygrad imports improves readability.

12-12: LGTM: Modern union syntax is cleaner. Using `int | None` instead of `Optional[int]` aligns with Python 3.10+ best practices.

33-45: LGTM: Consistent parameter renaming and usage. The renaming from `input` to `features` is applied consistently throughout the method.

application/predictionengine/src/predictionengine/long_short_term_memory.py (5)
2-2: LGTM: Import reordering maintains logical grouping. Moving Tensor import after LSTMCell keeps related imports together.

17-17: LGTM: Type annotation modernization. Using lowercase `list` instead of `List` follows modern Python conventions.

24-26: LGTM: Consistent parameter renaming. The `input` to `features` renaming is applied consistently and improves code clarity.

Also applies to: 38-38

61-63: LGTM: Improved error handling pattern. Assigning the error message before raising maintains consistency with other files.
64-67: ✅ Verification successful

Verify the conditional logic and tensor stacking changes.

Two modifications need verification:

- Changing `elif` to `if` for the single output case - this alters the logical flow structure
- The `Tensor.stack` argument unpacking change from `*outputs` to `outputs[0], *outputs[1:]` - ensure this doesn't affect tensor stacking behavior

🏁 Script executed:

```shell
#!/bin/bash
# Check if Tensor.stack behavior differs between these two forms
rg "Tensor\.stack" --type py -A 2 -B 1
```

Length of output: 885

Conditional flow and `Tensor.stack` usage are correct

I've verified that:

- The `if ... else` structure at lines 64-67 cleanly handles the single- vs. multi-output cases.
- Calling `Tensor.stack(outputs[0], *outputs[1:], dim=1)` is equivalent to `Tensor.stack(*outputs, dim=1)` and matches the pattern used elsewhere (e.g. in dataset.py).

No changes required here.
application/predictionengine/src/predictionengine/post_processor.py (3)

1-6: LGTM: Import organization improvement. The blank line separation between different import groups and reordering improves code organization.

12-13: LGTM: Type annotation modernization. Using lowercase `dict` and `tuple` instead of `Dict` and `Tuple` follows modern Python conventions.

Also applies to: 24-24

44-45: LGTM: Consistent error handling pattern. The error message assignment before raising maintains consistency with the modernization effort across the codebase.
application/datamanager/src/datamanager/main.py (1)

140-141: Excellent fix for variable shadowing. The renaming from `summary_date` to `request_summary_date` properly avoids shadowing the parameter name and makes the code more readable by clearly distinguishing between the date object parameter and the formatted string.

Also applies to: 181-181
application/positionmanager/src/positionmanager/clients.py (2)

79-82: LGTM: HTTP method change and timeout increase. The change from POST with JSON payload to GET with URL parameters is more RESTful for data retrieval operations. The timeout increase from 10 to 30 seconds accommodates the larger Apache Arrow response format.

Also applies to: 85-85

90-96: Improved error handling with proper status codes. The explicit handling of 404 status and use of `requests.codes` constants improves readability. The HTTPError exception provides better context than the generic Exception.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 90-90: Instance of 'LookupDict' has no 'not_found' member
(E1101)
[error] 92-92: Instance of 'LookupDict' has no 'ok' member
(E1101)
application/predictionengine/src/predictionengine/dataset.py (2)

1-2: Excellent type hint modernizations. The changes follow Python 3.9+ best practices:

- Using `collections.abc.Generator` instead of `typing.Generator`
- Using built-in `dict` and `tuple` instead of `typing.Dict` and `typing.Tuple`
- Proper handling of mutable default arguments with a None check

These improvements enhance code clarity and follow modern Python conventions.

Also applies to: 24-24, 26-27, 32-32, 112-112, 124-124, 157-157, 177-177
139-140: Improved error message handling. The refactoring to assign error messages to variables before raising exceptions improves readability and follows good practices for error handling.

Also applies to: 159-160, 201-202
application/predictionengine/src/predictionengine/main.py (4)

18-18: Good introduction of SEQUENCE_LENGTH constant. Defining this constant improves maintainability and eliminates magic numbers throughout the codebase.

3-6: Excellent import modernizations. The changes follow Python 3.9+ best practices:

- Using `collections.abc.AsyncGenerator` instead of `typing.AsyncGenerator`
- Adding proper imports for Path and UTC
- Clean import organization

Also applies to: 9-12

83-83: Good modernizations for file operations and timezone handling.

- Using `Path.exists()` is more modern and type-safe than `os.path.exists()`
- Using `datetime.now(tz=UTC)` ensures consistent timezone handling

Also applies to: 99-99
94-94: ✅ Verification successful

Consider impact of removing response model annotation.

Removing the response model annotation from the route decorator may affect OpenAPI documentation generation. Verify that the API documentation still properly describes the response schema.

🏁 Script executed:

```shell
#!/bin/bash
# Description: Check if response model is used elsewhere for OpenAPI documentation
# and verify if the removal affects API documentation

echo "=== Searching for PredictionResponse usage in OpenAPI/docs context ==="
rg -A 5 -B 5 "PredictionResponse|response.*model" --type py

echo -e "\n=== Searching for FastAPI response model patterns ==="
ast-grep --pattern '@$_.post($$$, response_model=$_)'
ast-grep --pattern '@$_.get($$$, response_model=$_)'
```

Length of output: 3734

Maintain response schema via return type annotation

FastAPI will infer the Pydantic model from the function's `-> PredictionResponse` return type, so you don't need an explicit `response_model` on the decorator. The OpenAPI docs will still include the correct response schema as long as the return annotation is present.
74-77: Good addition of parameter accessor methodThe
get_parametersmethod provides a clean API for accessing model parameters, which improves encapsulation.
…_datamanager Add basic tests for datamanager
Actionable comments posted: 1
🧹 Nitpick comments (4)
application/datamanager/tests/test_datamanager_models.py (2)
1-4: Consider standardizing on a single test framework.

The file mixes `unittest` and `pytest` frameworks. While functional, this creates inconsistency in testing patterns and assertions. Consider using only `pytest` for consistency:

```diff
-import unittest
 from datetime import date

 import pytest
 from pydantic import ValidationError
```

Then convert test classes to simple functions or use pytest class structure.
89-93: Consider validating the business logic for negative counts.

The test accepts negative counts for `BarsSummary`, which may not make business sense for counting bars/data points. Consider adding validation to the `BarsSummary` model to ensure non-negative counts:

```python
def test_bars_summary_negative_count_validation(self) -> None:
    with pytest.raises(ValidationError, match="Count must be non-negative"):
        BarsSummary(date="2023-05-15", count=-1)
```

This would require updating the model with appropriate validation.
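One way the suggested validation could look, sketched with a dataclass stand-in since the real `BarsSummary` is a Pydantic model (which would express the same rule with a `field_validator` or `Field(ge=0)` instead):

```python
from dataclasses import dataclass


@dataclass
class BarsSummarySketch:
    # Hypothetical dataclass mirror of the Pydantic BarsSummary model
    date: str
    count: int

    def __post_init__(self) -> None:
        # Reject counts that make no business sense for a bar tally
        if self.count < 0:
            msg = "Count must be non-negative"
            raise ValueError(msg)


assert BarsSummarySketch(date="2023-05-15", count=0).count == 0
```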
application/datamanager/tests/test_datamanager_main.py (2)

19-36: Consider consolidating duplicate model tests.

The `TestDataManagerModels` class duplicates tests already covered in test_datamanager_models.py. For integration testing, focus on testing the models within the API context rather than duplicating unit tests. Consider replacing these with API-specific integration tests:

```python
def test_summary_date_in_api_context(self) -> None:
    # Test SummaryDate when used in actual API endpoints
    response = client.post("/equity-bars", json={"date": "2023-01-01"})
    # Assert the API processes the date correctly
```

69-89: Complex mocking setup requires careful maintenance.

The database error test uses extensive mocking that could become brittle. Consider simplifying or documenting the mock setup clearly, for example by extracting it into a helper method or fixture:

```python
def _setup_database_error_mock(self, mock_duckdb: MagicMock):
    """Helper to setup database error scenario."""
    mock_connection = MagicMock()
    mock_connection.execute.side_effect = IOException("Database error")
    mock_duckdb.connect.return_value = mock_connection
    mock_settings = MagicMock()
    mock_settings.gcp.bucket.name = "test-bucket"
    return mock_connection, mock_settings
```

This improves readability and reusability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `application/datamanager/tests/test_datamanager_main.py` (1 hunks)
- `application/datamanager/tests/test_datamanager_models.py` (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
application/datamanager/tests/test_datamanager_models.py (1)
- application/datamanager/src/datamanager/models.py (2): `BarsSummary` (49-51), `SummaryDate` (7-28)

application/datamanager/tests/test_datamanager_main.py (1)
- application/datamanager/src/datamanager/models.py (2): `BarsSummary` (49-51), `SummaryDate` (7-28)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run Python tests
🔇 Additional comments (3)
application/datamanager/tests/test_datamanager_models.py (1)
14-44: Excellent test coverage and structure.

The test suite provides comprehensive coverage of:
- Default initialization and explicit value setting
- String parsing in multiple formats
- Validation error handling for invalid inputs
- JSON serialization/deserialization
- Model integration scenarios
The tests properly verify both success and failure paths with appropriate assertions.
Also applies to: 47-73, 76-106, 109-127
application/datamanager/tests/test_datamanager_main.py (2)
14-16: Well-structured endpoint testing.

The endpoint tests effectively cover:
- Missing required parameters (422 responses)
- Invalid data formats (422 responses)
- Different HTTP methods (GET, POST, DELETE)
- Proper status code assertions
This provides good coverage of the API's validation behavior.
Also applies to: 40-67
71-71: ✅ Verification successful

Verify the IOException import location.

The `IOException` is imported inside the test method from `duckdb`. Ensure this import path is correct and stable.

🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify duckdb.IOException exists and is the correct exception type
# Expected: Should show the exception class and its module

python3 << 'EOF'
try:
    import duckdb
    print(f"duckdb version: {duckdb.__version__}")
    print(f"IOException available: {hasattr(duckdb, 'IOException')}")
    if hasattr(duckdb, 'IOException'):
        print(f"IOException type: {type(duckdb.IOException)}")
        print(f"IOException MRO: {duckdb.IOException.__mro__}")
except ImportError as e:
    print(f"Import error: {e}")
except Exception as e:
    print(f"Other error: {e}")
EOF
```

Length of output: 48

🏁 Script executed:

```shell
#!/bin/bash
# Search for duckdb mentions in the repository to verify dependency declarations
grep -R "duckdb" -n .
```

Length of output: 3588

🏁 Script executed:

```shell
#!/bin/bash
# Print the relevant section of the test file to understand the context around IOException import
sed -n '1,200p' application/datamanager/tests/test_datamanager_main.py
```

Length of output: 3599

Import Path Verified: `IOException` Is Correctly Imported from `duckdb`

The `IOException` class is exposed at the top level of the `duckdb` package, and you're already declaring `duckdb>=1.2.2` in your pyproject.toml. Both the application code (main.py, line 13) and the test (test_datamanager_main.py, line 71) import `IOException` the same way, so this import is stable and consistent. No changes needed.
fix linting
fix linting
Fix/linting
Actionable comments posted: 4
♻️ Duplicate comments (1)
application/positionmanager/src/positionmanager/clients.py (1)

4-4: pyarrow dependency still needs proper declaration.

The pyarrow import was flagged in previous reviews as missing from the module's dependencies. This issue remains unresolved and could cause import errors in production.
🧹 Nitpick comments (2)
application/predictionengine/tests/test_dataset.py (1)

16-19: Inconsistent testing patterns within the file.

While the simplification from NamedTuple to direct assertions is functionally correct, it creates inconsistency with `test_dataset_batches()` (lines 108-116) which still uses the `Expected` NamedTuple pattern. Consider applying the same simplification approach consistently across all test methods for better maintainability.

For consistency, you could either:

- Apply the same direct assertion pattern to `test_dataset_batches()`
- Or revert to using constants/NamedTuple if you prefer more descriptive test data

```diff
 # Option 1: Apply direct assertions to test_dataset_batches
-    class Expected(NamedTuple):
-        batch_size: int = 1
-        sequence_length: int = 2
-        sample_count: int = 3
-        observations: int = 2
-        features: int = 6
-        target: int = 1
-
-    expected = Expected()
-
     batch_count = 0
     for tickers, features, targets in dataset.batches():
         batch_count += 1
-        assert tickers.shape[0] == expected.batch_size
+        assert tickers.shape[0] == 1  # noqa: PLR2004
         assert features.shape == (
-            expected.batch_size,
-            expected.sequence_length,
-            expected.features,
+            1,  # noqa: PLR2004
+            2,  # noqa: PLR2004
+            6,  # noqa: PLR2004
         )
-        assert targets.shape == (expected.batch_size, expected.target)
+        assert targets.shape == (1, 1)  # noqa: PLR2004
```

pyproject.toml (1)

5-5: Consider the implications of exact Python version pinning.

Exact version pinning (`==3.12.10`) provides reproducibility but prevents automatic security and bug fixes from newer patch versions. For a production application, consider whether this level of strictness is necessary or if a range like `>=3.12.10,<3.13` would be more appropriate.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- `application/datamanager/Dockerfile` (1 hunks)
- `application/datamanager/pyproject.toml` (1 hunks)
- `application/positionmanager/pyproject.toml` (2 hunks)
- `application/positionmanager/src/positionmanager/clients.py` (4 hunks)
- `application/predictionengine/Dockerfile` (1 hunks)
- `application/predictionengine/pyproject.toml` (1 hunks)
- `application/predictionengine/src/predictionengine/main.py` (6 hunks)
- `application/predictionengine/src/predictionengine/post_processor.py` (1 hunks)
- `application/predictionengine/tests/test_dataset.py` (1 hunks)
- `application/predictionengine/tests/test_long_short_term_memory.py` (1 hunks)
- `application/predictionengine/tests/test_multi_head_self_attention.py` (1 hunks)
- `infrastructure/monitoring.py` (2 hunks)
- `infrastructure/pyproject.toml` (1 hunks)
- `linter.nu` (1 hunks)
- `pyproject.toml` (1 hunks)
- `workflows/prediction_model.py` (0 hunks)
- `workflows/pyproject.toml` (1 hunks)
💤 Files with no reviewable changes (1)
- workflows/prediction_model.py
✅ Files skipped from review due to trivial changes (7)
- application/predictionengine/src/predictionengine/post_processor.py
- application/datamanager/pyproject.toml
- application/predictionengine/Dockerfile
- application/predictionengine/tests/test_multi_head_self_attention.py
- application/predictionengine/tests/test_long_short_term_memory.py
- infrastructure/monitoring.py
- application/datamanager/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (3)
- application/positionmanager/pyproject.toml
- workflows/pyproject.toml
- application/predictionengine/src/predictionengine/main.py
🧰 Additional context used
🪛 Pylint (3.3.7)
application/positionmanager/src/positionmanager/clients.py
[error] 92-92: Instance of 'LookupDict' has no 'not_found' member
(E1101)
[error] 94-94: Instance of 'LookupDict' has no 'ok' member
(E1101)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run Python tests
🔇 Additional comments (7)
infrastructure/pyproject.toml (1)
4-4: Details
❓ Verification inconclusive
Verify compatibility with the Python version downgrade.
The change from `>=3.13` to `==3.12.10` represents a significant downgrade. Ensure that all infrastructure code and dependencies are compatible with Python 3.12.10, particularly any features that may have been designed for Python 3.13.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check for Python 3.13+ specific features in infrastructure code
# Expected: No usage of Python 3.13+ exclusive features

# Search for potential Python 3.13+ features
rg -A 3 "match.*case" infrastructure/
rg -A 3 "typing\.override" infrastructure/
rg -A 3 "pathlib\.Path\.walk" infrastructure/
rg -A 3 "itertools\.batched" infrastructure/
```

Length of output: 179
🏁 Script executed:
```bash
#!/bin/bash
# Display the contents of the infrastructure pyproject.toml for dependency review
sed -n '1,200p' infrastructure/pyproject.toml
```

Length of output: 284
Verify compatibility with Python 3.12.10
We’ve scanned the `infrastructure/` code for Python 3.13+-only features (enhanced `match … case`, `typing.override`, `pathlib.Path.walk`, `itertools.batched`) and found no usages. Before approving the pin from `>=3.13` to `==3.12.10`, please confirm:
- All declared dependencies in `infrastructure/pyproject.toml` support Python 3.12.10:
  - `pulumi>=3.169.0`
  - `pulumi-gcp>=8.30.1`
  - `pulumi-docker>=3.0.0`
  - `pulumi-docker-build>=0.0.12`
  - `loguru>=0.7.3`
- Your CI/CD and deployment pipelines have been validated under Python 3.12.10.
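One way to make the second point enforceable is an explicit interpreter guard in the pipeline's entry point. A minimal sketch (the function name and warning behavior are illustrative, not from this repository):

```python
import sys

PINNED = (3, 12, 10)


def matches_pin(version_info, pin=PINNED):
    """Return True when the first three version components match the pin."""
    return tuple(version_info[:3]) == tuple(pin)


if not matches_pin(sys.version_info):
    # Warn rather than abort so local development on other patch releases
    # is still possible; CI can promote this to a hard failure instead.
    print(f"expected Python {'.'.join(map(str, PINNED))}, "
          f"got {sys.version.split()[0]}")
```

CI can call `matches_pin(sys.version_info)` and fail the job on `False`, which catches a drifting base image before deployment.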
linter.nu (2)
4-8: LGTM - Improved file discovery pattern.

The change from `fd`/`find` to the `ls` pipeline with filtering is more idiomatic for nushell and should be more reliable.
13-13: Good improvement to version parsing.

Using `split row ":" | get 1 | str trim` is more explicit and robust than regex replacement for extracting the Python version from Docker FROM statements.

application/positionmanager/src/positionmanager/clients.py (4)
22-23: Consistent error message pattern applied correctly.

The changes to use intermediate `message` variables for error strings improve consistency and readability across the codebase.

Also applies to: 34-35, 76-77, 89-90
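The pattern the review praises can be sketched as follows; `fetch_portfolio` and `registry` are hypothetical names, not code from this repository:

```python
def fetch_portfolio(portfolio_id: str, registry: dict) -> dict:
    """Look up a portfolio, raising with a pre-built message on a miss."""
    if portfolio_id not in registry:
        # Build the string first (the EM101/TRY003 style enforced by ruff),
        # so the raise statement stays short and the same message can be
        # reused for logging before the exception propagates.
        message = f"portfolio {portfolio_id!r} not found"
        raise KeyError(message)
    return registry[portfolio_id]
```

Besides satisfying the linter, the intermediate variable keeps the f-string out of the `raise` line, which makes tracebacks and diffs easier to read.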
81-84: HTTP method change from POST to GET with appropriate timeout increase.

The shift from POST with JSON body to GET with URL parameters is appropriate for data retrieval. The timeout increase from 10 to 30 seconds accommodates the potentially larger Apache Arrow IPC responses.
Also applies to: 87-87
100-104: Apache Arrow IPC stream processing implementation looks correct.

The implementation properly creates a buffer from response content, reads it as an IPC stream, and converts to a Polars DataFrame. This aligns with the datamanager service's Arrow stream output.
110-110: Details
✅ Verification successful
Pivot operation updated to use new column names.
The pivot now uses "T" for ticker symbols and "c" for close prices, which matches the abbreviated Arrow schema. Verify these column names are consistently used across the data pipeline.
Verify the column name consistency across the data pipeline:
🏁 Script executed:
```bash
#!/bin/bash
# Description: Verify that "T" and "c" column names are used consistently
# Expected: Find consistent usage in datamanager and other consumers

echo "=== Searching for ticker column usage ==="
rg -B 2 -A 2 '"T"' --type py

echo -e "\n=== Searching for close price column usage ==="
rg -B 2 -A 2 '"c"' --type py

echo -e "\n=== Checking for any remaining old column names ==="
rg -B 2 -A 2 '"ticker"|"close_price"' --type py
```

Length of output: 18370
Column names “T” and “c” correctly applied and aliased for downstream use
The abbreviated Arrow schema columns “T” and “c” are only used at the ingestion/pivot stage and immediately aliased back to `ticker` and `close_price`. All downstream code (including workflows, dataset logic, main/post-processor, and tests) consistently operates on the aliased names. No further changes are required.

Overview
Changes
Comments
This is what Claude caught. It also added the MCP thing so I'll just include that too.
Summary by CodeRabbit
New Features
Improvements
Dependency Updates
`pyarrow` and `loguru`.

Style
Tests
Chores