
Fix inter-service communication patterns#585

Merged
chrisaddy merged 24 commits into master from 06-02-fix_inter-service_communication_patterns
Jun 6, 2025

Conversation

@forstmeier (Collaborator) commented Jun 3, 2025

Overview

Changes

  • Update service request/response parsing logic

Comments

This is what Claude caught. It also added the MCP server setting, so I'll include that here too.

Summary by CodeRabbit

  • New Features

    • Introduced a workflow to train prediction engine models, including tasks for data fetching, model training, and model storage.
    • Added a method to retrieve model parameters in the miniature temporal fusion transformer.
  • Improvements

    • Updated data fetching to use Apache Arrow IPC format and enhanced error handling.
    • Unified and clarified variable naming for better consistency.
    • Increased HTTP request timeout for data retrieval.
    • Added new configuration option to control project MCP servers.
  • Dependency Updates

    • Added new dependencies: pyarrow and loguru.
  • Style

    • Improved code readability by renaming variables and reorganizing imports.
  • Tests

    • Added comprehensive unit and integration tests for datamanager models and API endpoints.
  • Chores

    • Updated Python version requirements to 3.12.10 across projects and Dockerfiles.
    • Minor adjustments to type annotations and configuration files.
    • Removed obsolete environment variable declarations and deprecated workflows.

@coderabbitai bot (Contributor) commented Jun 3, 2025

Walkthrough

The changes introduce a new Flyte workflow for training a prediction engine model, update several Python modules for consistency and error handling, add new configuration options and dependencies, and make minor refactoring adjustments. The workflow includes tasks for fetching data, training a model, and storing the resulting artifact, utilizing updated data handling and model interfaces.

Changes

File(s) Change Summary
.claude/settings.local.json Added "enableAllProjectMcpServers": false as a top-level property.
application/positionmanager/pyproject.toml Added pyarrow>=20.0.0 to dependencies.
workflows/pyproject.toml Added loguru>=0.7.3 to dependencies.
application/positionmanager/src/positionmanager/clients.py Refactored DataClient.get_data to use HTTP GET with params, Arrow IPC parsing, and updated error/message handling.
application/positionmanager/src/positionmanager/main.py Cleaned up and reordered imports; no logic changes.
application/datamanager/src/datamanager/main.py Renamed local variable to avoid shadowing; no logic changes.
application/predictionengine/src/predictionengine/dataset.py Renamed error message variable from msg to message in several methods.
application/predictionengine/src/predictionengine/long_short_term_memory.py Renamed error message variable from msg to message.
application/predictionengine/src/predictionengine/loss_function.py Reordered imports, assigned error messages to variables before raising, updated type hints.
application/predictionengine/src/predictionengine/main.py Replaced LOOKBACK_DAYS with SEQUENCE_LENGTH constant throughout; broadened exception handling; minor decorator change.
application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py Added public method get_parameters to return model parameters.
application/predictionengine/src/predictionengine/multi_head_self_attention.py Renamed error message variable from msg to message.
workflows/backfill_datamanager.py Removed type annotation from results variable.
workflows/train_predctionengine.py Added new Flyte workflow and tasks for data fetching, model training, and artifact storage.
application/datamanager/tests/test_datamanager_main.py Added comprehensive tests for datamanager FastAPI endpoints and models, including error handling and mocking.
application/datamanager/tests/test_datamanager_models.py Added comprehensive unit tests for datamanager Pydantic models and integration tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Flyte
    participant fetch_data
    participant train_model
    participant store_model

    User->>Flyte: Start training_workflow(start_date, end_date, epochs)
    Flyte->>fetch_data: fetch_data(start_date, end_date)
    fetch_data-->>Flyte: returns list of dicts (data)
    Flyte->>train_model: train_model(data, epochs)
    train_model-->>Flyte: returns model bytes
    Flyte->>store_model: store_model(model bytes)
    store_model-->>Flyte: returns artifact path
    Flyte-->>User: returns artifact path
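The workflow in the diagram can be sketched as plain Python, ignoring the Flyte @task/@workflow decorators. Function names follow the diagram; the bodies here are hypothetical stand-ins, not the PR's implementation:

```python
from datetime import datetime
from typing import Any


def fetch_data(start_date: datetime, end_date: datetime) -> list[dict[str, Any]]:
    # Stand-in: the real task queries the datamanager service for bars.
    return [{"t": 1717000000000, "T": "AAPL", "c": 195.1}]


def train_model(data: list[dict[str, Any]], epochs: int = 100) -> bytes:
    # Stand-in: the real task trains the transformer and serializes it.
    return b"model-bytes"


def store_model(model_bytes: bytes) -> str:
    # Stand-in: the real task persists the artifact and returns its path.
    return "/tmp/model.bin"


def training_workflow(
    start_date: datetime,
    end_date: datetime,
    epochs: int = 100,
) -> str:
    # The workflow wires the three tasks into a linear DAG.
    data = fetch_data(start_date, end_date)
    model_bytes = train_model(data, epochs)
    return store_model(model_bytes)
```

In the actual Flyte workflow, each function carries a @task decorator and the orchestrator handles serialization between tasks.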

Suggested reviewers

  • chrisaddy

Poem

In the warren of code where the data flows free,
New models are trained—oh, what joy for a bunny like me!
With Arrow and Polars, the bytes swiftly stream,
And workflows now sparkle with a Flyte-powered gleam.
Dependencies grow, configs gain a new key—
This rabbit hops on, in code jubilee!

((\
( -.-)
o_(")(")


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70739d6 and e0bbcb6.

📒 Files selected for processing (1)
  • application/positionmanager/src/positionmanager/clients.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • application/positionmanager/src/positionmanager/clients.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Python tests


@forstmeier (Collaborator, Author) commented Jun 3, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@forstmeier forstmeier requested review from chrisaddy and Copilot June 3, 2025 01:30
@graphite-app bot commented Jun 3, 2025

Graphite Automations

"Assign author to pull request" took an action on this PR • (06/03/25)

1 assignee was added to this PR based on John Forstmeier's automation.

@forstmeier forstmeier added this to the Rebuild milestone Jun 3, 2025
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR addresses inter-service communication patterns by modifying the data retrieval and processing flow in the Position Manager client and updating configuration settings.

  • Changed the API call from POST to GET with updated request parameters and timeout for fetching equity bars.
  • Switched from JSON parsing to using Apache Arrow for processing binary data.
  • Updated configuration settings in .claude/settings.local.json.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
application/positionmanager/src/positionmanager/clients.py Updated client communication logic and data transformation using Arrow.
.claude/settings.local.json Added configuration for disabling all project MCP servers.

Comment thread application/positionmanager/src/positionmanager/clients.py Outdated
Comment thread .claude/settings.local.json
@coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6bf51c and 8373092.

⛔ Files ignored due to path filters (1)
  • .flox/env/manifest.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • .claude/settings.local.json (1 hunks)
  • application/positionmanager/src/positionmanager/clients.py (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
application/positionmanager/src/positionmanager/clients.py

[error] 4-4: Unable to import 'pyarrow'

(E0401)


[refactor] 85-90: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[warning] 88-90: Raising too general exception: Exception

(W0719)

🔇 Additional comments (2)
application/positionmanager/src/positionmanager/clients.py (2)

75-81:

Details

✅ Verification successful

Breaking change: Verify API compatibility.

The request method has been changed from POST to GET with URL parameters. This is a significant breaking change that affects the inter-service communication contract.

The timeout increase to 30 seconds is reasonable for processing larger Arrow data streams.


🏁 Script executed:

#!/bin/bash
# Description: Verify the datamanager service supports the new GET endpoint with parameters
# Expected: Find the endpoint definition and verify it accepts GET requests

# Search for the equity-bars endpoint definition
rg -A 10 -B 5 "equity-bars" --type py

# Look for any existing tests that might be affected by this change
rg -A 5 -B 5 "equity-bars|get_data" --type py | grep -i test

Length of output: 11471


GET /equity-bars endpoint available – no breaking change
I confirmed that the datamanager service now defines a GET handler for /equity-bars which accepts start_date and end_date as query parameters. Positionmanager’s switch from POST to GET aligns with this and won’t break the inter-service contract. The existing POST route remains intact for the summary use case, so no further changes are required.


92-106:

Details

✅ Verification successful

Verify the Arrow data processing logic.

The new Apache Arrow IPC stream processing looks functionally correct, but the column mapping changes are significant:

  • "t" → "date" (timestamp to date conversion)
  • Pivot on "T" instead of "ticker"
  • Values from "c" instead of "close_price"

Please ensure the datamanager service now returns data with the new column schema (t, T, c) instead of the previous schema (timestamp, ticker, close_price).


🏁 Script executed:

#!/bin/bash
# Description: Verify the new column names match the data schema from datamanager
# Expected: Find the data schema definition and verify column mappings

# Search for data schema or model definitions
rg -A 10 -B 5 '"t"|"T"|"c"' --type py

# Look for any data transformation or schema definitions
rg -A 5 -B 5 "timestamp|ticker|close_price" --type py

# Check for any existing tests that validate data structure
ast-grep --pattern 'def test_$_($$$) {
  $$$
  get_data($$$)
  $$$
}'

Length of output: 65884


🏁 Script executed:

#!/bin/bash
# Inspect the datamanager HTTP endpoint and source of `bars`
sed -n '1,200p' application/datamanager/src/datamanager/main.py

Length of output: 5854


No action needed: Arrow field mappings are correct

I’ve confirmed that the datamanager service writes and exposes the same JSON payload fields used by the positionmanager client:

  • t (timestamp in ms) is returned and cast to a date
  • T (ticker symbol) becomes the pivot column
  • c (close price) supplies the pivot values

The pivot-on-T/values-from-c logic in application/positionmanager/src/positionmanager/clients.py aligns with the Parquet schema produced in application/datamanager/src/datamanager/main.py. All mappings are consistent.

Comment thread .claude/settings.local.json
Comment thread application/positionmanager/src/positionmanager/clients.py
Comment thread application/positionmanager/src/positionmanager/clients.py Outdated
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
chrisaddy
chrisaddy previously approved these changes Jun 3, 2025
…nengine_into_the_workflows

Integrate predictionengine into the workflows
@coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (2)
application/predictionengine/src/predictionengine/gated_residual_network.py (1)

46-46: Same cast string literal concern as previous file.

This change should be verified for consistency with the intended behavior across the codebase.

application/positionmanager/src/positionmanager/clients.py (1)

4-4: ⚠️ Potential issue

Pyarrow dependency still not declared.

This duplicates a previous review comment - pyarrow is imported but not declared in the positionmanager's dependencies. This will cause import errors in production.

🧹 Nitpick comments (7)
application/predictionengine/tests/test_loss_function.py (2)

18-18: Simplify shape assertion and remove noqa
You can collapse the dual comparison into a membership check and drop the linter suppression:

-    assert loss.shape == () or loss.shape == (1,)  # noqa: PLR1714
+    assert loss.shape in ((), (1,))
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 18-18: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.

(R1714)


29-29: Simplify shape assertion in multiple-samples test
As above, use a single membership test to make it clearer and eliminate the # noqa hack:

-    assert loss.shape == () or loss.shape == (1,)  # noqa: PLR1714
+    assert loss.shape in ((), (1,))
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 29-29: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.

(R1714)

application/predictionengine/src/predictionengine/main.py (1)

48-48: Question: Using SEQUENCE_LENGTH for HTTP timeout.

Using SEQUENCE_LENGTH (30) as the HTTP timeout seems semantically incorrect. The sequence length is a model parameter, while timeout is an operational parameter. Consider using a dedicated HTTP_TIMEOUT constant instead.

+HTTP_TIMEOUT = 30
+
 SEQUENCE_LENGTH = 30

-    response = requests.get(url, params=parameters, timeout=SEQUENCE_LENGTH)
+    response = requests.get(url, params=parameters, timeout=HTTP_TIMEOUT)
workflows/train_predctionengine.py (3)

43-65: Document the column mapping for clarity

The column renaming from short names to descriptive names is good for readability, but consider adding a comment to document the mapping for future maintainers.

+    # Map Polygon.io column names to descriptive names
+    # t: timestamp, o: open, h: high, l: low, c: close, v: volume, 
+    # vw: volume weighted average price, T: ticker symbol
     data = data.with_columns(

81-101: Consider making hyperparameters configurable

The model training uses several hardcoded values that could be made configurable through task parameters or environment variables for better flexibility.

 @task
 def train_model(
     data: list[dict[str, Any]],
     epochs: int = 100,
+    batch_size: int = 32,
+    sequence_length: int = 30,
+    hidden_size: int = 128,
+    attention_head_count: int = 4,
+    dropout_rate: float = 0.1,
 ) -> bytes:

135-138: Remove unnecessary type casts

The explicit type casts are redundant as Flyte handles type conversions automatically between tasks.

 def training_workflow(
     start_date: datetime,
     end_date: datetime,
     epochs: int = 100,
 ) -> str:
     data = fetch_data(start_date=start_date, end_date=end_date)
-    model_bytes = train_model(data=cast("list[dict[str, Any]]", data), epochs=epochs)
-    artifact_path = store_model(model_bytes=cast("bytes", model_bytes))
-    return cast("str", artifact_path)
+    model_bytes = train_model(data=data, epochs=epochs)
+    artifact_path = store_model(model_bytes=model_bytes)
+    return artifact_path
application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (1)

156-158: Remove unnecessary noqa comment

The # noqa: RET504 comment is unnecessary here as the code is clear and readable.

-        return average_loss  # noqa: RET504
+        return average_loss
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8373092 and 49b392f.

📒 Files selected for processing (24)
  • application/datamanager/src/datamanager/config.py (1 hunks)
  • application/datamanager/src/datamanager/main.py (2 hunks)
  • application/positionmanager/src/positionmanager/clients.py (4 hunks)
  • application/positionmanager/src/positionmanager/main.py (1 hunks)
  • application/predictionengine/src/predictionengine/dataset.py (8 hunks)
  • application/predictionengine/src/predictionengine/gated_residual_network.py (2 hunks)
  • application/predictionengine/src/predictionengine/long_short_term_memory.py (4 hunks)
  • application/predictionengine/src/predictionengine/loss_function.py (2 hunks)
  • application/predictionengine/src/predictionengine/main.py (7 hunks)
  • application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (6 hunks)
  • application/predictionengine/src/predictionengine/models.py (1 hunks)
  • application/predictionengine/src/predictionengine/multi_head_self_attention.py (4 hunks)
  • application/predictionengine/src/predictionengine/post_processor.py (3 hunks)
  • application/predictionengine/tests/test_dataset.py (2 hunks)
  • application/predictionengine/tests/test_gated_residual_network.py (5 hunks)
  • application/predictionengine/tests/test_long_short_term_memory.py (4 hunks)
  • application/predictionengine/tests/test_loss_function.py (4 hunks)
  • application/predictionengine/tests/test_multi_head_self_attention.py (4 hunks)
  • application/predictionengine/tests/test_post_processor.py (2 hunks)
  • application/predictionengine/tests/test_ticker_embedding.py (1 hunks)
  • workflows/backfill_datamanager.py (1 hunks)
  • workflows/prediction_model.py (0 hunks)
  • workflows/pyproject.toml (1 hunks)
  • workflows/train_predctionengine.py (1 hunks)
💤 Files with no reviewable changes (1)
  • workflows/prediction_model.py
✅ Files skipped from review due to trivial changes (12)
  • application/predictionengine/tests/test_ticker_embedding.py
  • application/predictionengine/tests/test_dataset.py
  • workflows/pyproject.toml
  • application/predictionengine/tests/test_post_processor.py
  • application/predictionengine/tests/test_gated_residual_network.py
  • application/datamanager/src/datamanager/config.py
  • application/predictionengine/tests/test_long_short_term_memory.py
  • application/positionmanager/src/positionmanager/main.py
  • application/predictionengine/src/predictionengine/models.py
  • application/predictionengine/src/predictionengine/loss_function.py
  • application/predictionengine/tests/test_multi_head_self_attention.py
  • workflows/backfill_datamanager.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
application/datamanager/src/datamanager/main.py (1)
application/datamanager/src/datamanager/models.py (1)
  • BarsSummary (49-51)
application/predictionengine/src/predictionengine/main.py (3)
application/predictionengine/src/predictionengine/dataset.py (1)
  • DataSet (18-216)
application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (2)
  • MiniatureTemporalFusionTransformer (22-188)
  • load (167-172)
application/predictionengine/src/predictionengine/models.py (1)
  • PredictionResponse (4-5)
workflows/train_predctionengine.py (2)
application/predictionengine/src/predictionengine/dataset.py (3)
  • DataSet (18-216)
  • load_data (145-155)
  • get_preprocessors (157-175)
application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (2)
  • MiniatureTemporalFusionTransformer (22-188)
  • train (110-141)
🪛 Pylint (3.3.7)
application/predictionengine/src/predictionengine/main.py

[error] 14-14: Attempted relative import beyond top-level package

(E0402)


[error] 15-15: Attempted relative import beyond top-level package

(E0402)


[error] 16-16: Attempted relative import beyond top-level package

(E0402)


[refactor] 95-95: Too many local variables (17/15)

(R0914)

application/positionmanager/src/positionmanager/clients.py

[error] 90-90: Instance of 'LookupDict' has no 'not_found' member

(E1101)


[error] 92-92: Instance of 'LookupDict' has no 'ok' member

(E1101)

application/predictionengine/src/predictionengine/gated_residual_network.py

[refactor] 7-7: Too many instance attributes (8/7)

(R0902)


[refactor] 7-7: Too few public methods (1/2)

(R0903)

application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py

[error] 13-13: Attempted relative import beyond top-level package

(E0402)


[error] 14-14: Attempted relative import beyond top-level package

(E0402)


[error] 15-15: Attempted relative import beyond top-level package

(E0402)


[error] 16-16: Attempted relative import beyond top-level package

(E0402)


[error] 17-17: Attempted relative import beyond top-level package

(E0402)


[error] 18-18: Attempted relative import beyond top-level package

(E0402)


[error] 19-19: Attempted relative import beyond top-level package

(E0402)


[refactor] 23-23: Too many arguments (12/5)

(R0913)


[refactor] 23-23: Too many positional arguments (12/5)

(R0917)

application/predictionengine/src/predictionengine/post_processor.py

[refactor] 9-9: Too few public methods (1/2)

(R0903)

application/predictionengine/tests/test_loss_function.py

[refactor] 18-18: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.

(R1714)


[refactor] 29-29: Consider merging these comparisons with 'in' by using 'loss.shape in ((), (1, ))'. Use a set instead if elements are hashable.

(R1714)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Python tests
🔇 Additional comments (27)
application/predictionengine/tests/test_loss_function.py (1)

1-4: Approve import reordering
The imports are now grouped and ordered per PEP8 (stdlib, third-party, local). Moving Tensor after pytest improves readability.

application/predictionengine/src/predictionengine/multi_head_self_attention.py (5)

1-5: LGTM: Import reordering and modernization looks good.

The import simplification and reordering improves code organization.


15-16: LGTM: Improved error handling pattern.

Assigning the error message to a variable before raising the exception improves readability and debugging.
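A minimal illustration of the pattern (hypothetical function, not the PR's code):

```python
def check_head_count(embedding_size: int, head_count: int) -> None:
    # Build the message in a variable first: linters stop flagging long
    # strings inside raise statements, and the text is easy to inspect
    # in a debugger before the exception propagates.
    if embedding_size % head_count != 0:
        message = (
            f"embedding_size {embedding_size} is not divisible "
            f"by head_count {head_count}"
        )
        raise ValueError(message)
```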


32-38: LGTM: Parameter renaming from input to features is excellent.

This change avoids shadowing the built-in input() function and makes the parameter name more descriptive.


33-33: LGTM: Type annotation modernization is appropriate.

Using lowercase tuple instead of Tuple aligns with Python 3.9+ best practices.


54-54:

Details

❌ Incorrect review comment

Verify the cast string literal change doesn't break functionality.

The change from cast(Tensor, ...) to cast("Tensor", ...) modifies behavior - the former provides runtime type checking while the latter is only for static analysis tools.


🏁 Script executed:

#!/bin/bash
# Search for other cast usages to check consistency
rg "cast\(" --type py -A 2 -B 1

Length of output: 4580


No action needed: cast is a no-op at runtime
The typing.cast function always returns its second argument without any runtime type checking, regardless of whether you pass a type object or a string literal. Static analyzers (mypy, pyright) support string literals as forward references under PEP 484/563, so this change has no impact on functionality or type checking.

Likely an incorrect or invalid review comment.
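This is easy to confirm: `typing.cast` returns its second argument unchanged at runtime, whether the first argument is a type object or a string literal.

```python
from typing import cast

value = [1, 2, 3]

# Both forms are runtime no-ops; cast() simply returns its second
# argument. The string form only matters to static analyzers, which
# treat it as a forward reference.
same_a = cast(list, value)
same_b = cast("list", value)

assert same_a is value
assert same_b is value
```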

application/predictionengine/src/predictionengine/gated_residual_network.py (3)

2-3: LGTM: Import formatting improvement.

The blank line separation between typing and tinygrad imports improves readability.


12-12: LGTM: Modern union syntax is cleaner.

Using int | None instead of Optional[int] aligns with Python 3.10+ best practices.


33-45: LGTM: Consistent parameter renaming and usage.

The renaming from input to features is applied consistently throughout the method.

application/predictionengine/src/predictionengine/long_short_term_memory.py (5)

2-2: LGTM: Import reordering maintains logical grouping.

Moving Tensor import after LSTMCell keeps related imports together.


17-17: LGTM: Type annotation modernization.

Using lowercase list instead of List follows modern Python conventions.


24-26: LGTM: Consistent parameter renaming.

The input to features renaming is applied consistently and improves code clarity.

Also applies to: 38-38


61-63: LGTM: Improved error handling pattern.

Assigning the error message before raising maintains consistency with other files.


64-67:

Details

✅ Verification successful

Verify the conditional logic and tensor stacking changes.

Two modifications need verification:

  1. Changing elif to if for the single output case - this alters the logical flow structure
  2. The Tensor.stack argument unpacking change from *outputs to outputs[0], *outputs[1:] - ensure this doesn't affect tensor stacking behavior

🏁 Script executed:

#!/bin/bash
# Check if Tensor.stack behavior differs between these two forms
rg "Tensor\.stack" --type py -A 2 -B 1

Length of output: 885


Conditional flow and Tensor.stack usage are correct

I’ve verified that:

  • The if ... else structure at lines 64–67 cleanly handles the single- vs. multi-output cases.
  • Calling Tensor.stack(outputs[0], *outputs[1:], dim=1) is equivalent to Tensor.stack(*outputs, dim=1) and matches the pattern used elsewhere (e.g. in dataset.py).

No changes required here.

application/predictionengine/src/predictionengine/post_processor.py (3)

1-6: LGTM: Import organization improvement.

The blank line separation between different import groups and reordering improves code organization.


12-13: LGTM: Type annotation modernization.

Using lowercase dict and tuple instead of Dict and Tuple follows modern Python conventions.

Also applies to: 24-24


44-45: LGTM: Consistent error handling pattern.

The error message assignment before raising maintains consistency with the modernization effort across the codebase.

application/datamanager/src/datamanager/main.py (1)

140-141: Excellent fix for variable shadowing.

The renaming from summary_date to request_summary_date properly avoids shadowing the parameter name and makes the code more readable by clearly distinguishing between the date object parameter and the formatted string.

Also applies to: 181-181

application/positionmanager/src/positionmanager/clients.py (2)

79-82: LGTM: HTTP method change and timeout increase.

The change from POST with JSON payload to GET with URL parameters is more RESTful for data retrieval operations. The timeout increase from 10 to 30 seconds accommodates the larger Apache Arrow response format.

Also applies to: 85-85


90-96: Improved error handling with proper status codes.

The explicit handling of 404 status and use of requests.codes constants improves readability. The HTTPError exception provides better context than the generic Exception.

🧰 Tools
🪛 Pylint (3.3.7)

[error] 90-90: Instance of 'LookupDict' has no 'not_found' member

(E1101)


[error] 92-92: Instance of 'LookupDict' has no 'ok' member

(E1101)

application/predictionengine/src/predictionengine/dataset.py (2)

1-2: Excellent type hint modernizations.

The changes follow Python 3.9+ best practices:

  • Using collections.abc.Generator instead of typing.Generator
  • Using built-in dict and tuple instead of typing.Dict and typing.Tuple
  • Proper handling of mutable default arguments with None check

These improvements enhance code clarity and follow modern Python conventions.

Also applies to: 24-24, 26-27, 32-32, 112-112, 124-124, 157-157, 177-177


139-140: Improved error message handling.

The refactoring to assign error messages to variables before raising exceptions improves readability and follows good practices for error handling.

Also applies to: 159-160, 201-202

application/predictionengine/src/predictionengine/main.py (4)

18-18: Good introduction of SEQUENCE_LENGTH constant.

Defining this constant improves maintainability and eliminates magic numbers throughout the codebase.


3-6: Excellent import modernizations.

The changes follow Python 3.9+ best practices:

  • Using collections.abc.AsyncGenerator instead of typing.AsyncGenerator
  • Adding proper imports for Path and UTC
  • Clean import organization

Also applies to: 9-12


83-83: Good modernizations for file operations and timezone handling.

  • Using Path.exists() is more modern and type-safe than os.path.exists()
  • Using datetime.now(tz=UTC) ensures consistent timezone handling

Also applies to: 99-99


94-94:

Details

✅ Verification successful

Consider impact of removing response model annotation.

Removing the response model annotation from the route decorator may affect OpenAPI documentation generation. Verify that the API documentation still properly describes the response schema.


🏁 Script executed:

#!/bin/bash
# Description: Check if response model is used elsewhere for OpenAPI documentation
# and verify if the removal affects API documentation

echo "=== Searching for PredictionResponse usage in OpenAPI/docs context ==="
rg -A 5 -B 5 "PredictionResponse|response.*model" --type py

echo -e "\n=== Searching for FastAPI response model patterns ==="
ast-grep --pattern '@$_.post($$$, response_model=$_)'
ast-grep --pattern '@$_.get($$$, response_model=$_)'

Length of output: 3734


Maintain response schema via return type annotation
FastAPI will infer the Pydantic model from the function’s -> PredictionResponse return type, so you don’t need an explicit response_model on the decorator. The OpenAPI docs will still include the correct response schema as long as the return annotation is present.
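FastAPI reads the return annotation through standard type introspection; a stdlib sketch of the mechanism (`PredictionResponse` here is a plain stand-in, not the project's Pydantic model):

```python
from typing import get_type_hints


class PredictionResponse:  # stand-in for the real response model
    pass


def predict() -> PredictionResponse:  # the annotation alone carries the schema
    return PredictionResponse()


# FastAPI inspects the handler's annotation much like this:
return_type = get_type_hints(predict)["return"]
```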

application/predictionengine/src/predictionengine/miniature_temporal_fusion_transformer.py (1)

74-77: Good addition of parameter accessor method

The get_parameters method provides a clean API for accessing model parameters, which improves encapsulation.

…_datamanager

Add basic tests for datamanager
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
application/datamanager/tests/test_datamanager_models.py (2)

1-4: Consider standardizing on a single test framework.

The file mixes unittest and pytest frameworks. While functional, this creates inconsistency in testing patterns and assertions.

Consider using only pytest for consistency:

-import unittest
 from datetime import date

 import pytest
 from pydantic import ValidationError

Then convert test classes to simple functions or use pytest class structure.


89-93: Consider validating the business logic for negative counts.

The test accepts negative counts for BarsSummary, which may not make business sense for counting bars/data points.

Consider adding validation to the BarsSummary model to ensure non-negative counts:

def test_bars_summary_negative_count_validation(self) -> None:
    with pytest.raises(ValidationError, match="Count must be non-negative"):
        BarsSummary(date="2023-05-15", count=-1)

This would require updating the model with appropriate validation.
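A stdlib sketch of the check the suggested test implies (the real model is Pydantic, where this would be a `field_validator` or `Field(ge=0)`; the dataclass below only illustrates the intent):

```python
from dataclasses import dataclass


@dataclass
class BarsSummary:
    """Stand-in for the project's Pydantic model; field names assumed."""

    date: str
    count: int

    def __post_init__(self) -> None:
        # Reject counts that make no business sense for bar tallies.
        if self.count < 0:
            message = "Count must be non-negative"
            raise ValueError(message)
```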

application/datamanager/tests/test_datamanager_main.py (2)

19-36: Consider consolidating duplicate model tests.

The TestDataManagerModels class duplicates tests already covered in test_datamanager_models.py. For integration testing, focus on testing the models within the API context rather than duplicating unit tests.

Consider replacing these with API-specific integration tests:

def test_summary_date_in_api_context(self) -> None:
    # Test SummaryDate when used in actual API endpoints
    response = client.post("/equity-bars", json={"date": "2023-01-01"})
    # Assert the API processes the date correctly

69-89: Complex mocking setup requires careful maintenance.

The database error test uses extensive mocking that could become brittle. Consider simplifying or documenting the mock setup clearly.

Consider extracting the mock setup into a helper method or fixture:

def _setup_database_error_mock(self, mock_duckdb: MagicMock):
    """Helper to setup database error scenario."""
    mock_connection = MagicMock()
    mock_connection.execute.side_effect = IOException("Database error")
    mock_duckdb.connect.return_value = mock_connection
    
    mock_settings = MagicMock()
    mock_settings.gcp.bucket.name = "test-bucket"
    
    return mock_connection, mock_settings

This improves readability and reusability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b05fa4 and 3e4f77c.

📒 Files selected for processing (2)
  • application/datamanager/tests/test_datamanager_main.py (1 hunks)
  • application/datamanager/tests/test_datamanager_models.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
application/datamanager/tests/test_datamanager_models.py (1)
application/datamanager/src/datamanager/models.py (2)
  • BarsSummary (49-51)
  • SummaryDate (7-28)
application/datamanager/tests/test_datamanager_main.py (1)
application/datamanager/src/datamanager/models.py (2)
  • BarsSummary (49-51)
  • SummaryDate (7-28)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Python tests
🔇 Additional comments (3)
application/datamanager/tests/test_datamanager_models.py (1)

14-44: Excellent test coverage and structure.

The test suite provides comprehensive coverage of:

  • Default initialization and explicit value setting
  • String parsing in multiple formats
  • Validation error handling for invalid inputs
  • JSON serialization/deserialization
  • Model integration scenarios

The tests properly verify both success and failure paths with appropriate assertions.

Also applies to: 47-73, 76-106, 109-127

application/datamanager/tests/test_datamanager_main.py (2)

14-16: Well-structured endpoint testing.

The endpoint tests effectively cover:

  • Missing required parameters (422 responses)
  • Invalid data formats (422 responses)
  • Different HTTP methods (GET, POST, DELETE)
  • Proper status code assertions

This provides good coverage of the API's validation behavior.

Also applies to: 40-67


71-71:

Details

✅ Verification successful

Verify the IOException import location.

The IOException is imported inside the test method from duckdb. Ensure this import path is correct and stable.


🏁 Script executed:

#!/bin/bash
# Description: Verify duckdb.IOException exists and is the correct exception type
# Expected: Should show the exception class and its module

python3 << 'EOF'
try:
    import duckdb
    print(f"duckdb version: {duckdb.__version__}")
    print(f"IOException available: {hasattr(duckdb, 'IOException')}")
    if hasattr(duckdb, 'IOException'):
        print(f"IOException type: {type(duckdb.IOException)}")
        print(f"IOException MRO: {duckdb.IOException.__mro__}")
except ImportError as e:
    print(f"Import error: {e}")
except Exception as e:
    print(f"Other error: {e}")
EOF

Length of output: 48


🏁 Script executed:

#!/bin/bash
# Search for duckdb mentions in the repository to verify dependency declarations
grep -R "duckdb" -n .

Length of output: 3588


🏁 Script executed:

#!/bin/bash
# Print the relevant section of the test file to understand the context around IOException import
sed -n '1,200p' application/datamanager/tests/test_datamanager_main.py

Length of output: 3599


Import Path Verified: IOException Is Correctly Imported from duckdb

The IOException class is exposed at the top level of the duckdb package, and you’re already declaring duckdb>=1.2.2 in your pyproject.toml. Both the application code (main.py, line 13) and the test (test_datamanager_main.py, line 71) import IOException the same way, so this import is stable and consistent. No changes needed.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
application/positionmanager/src/positionmanager/clients.py (1)

4-4: pyarrow dependency still needs proper declaration.

The pyarrow import was flagged in previous reviews as missing from the module's dependencies. This issue remains unresolved and could cause import errors in production.

🧹 Nitpick comments (2)
application/predictionengine/tests/test_dataset.py (1)

16-19: Inconsistent testing patterns within the file.

While the simplification from NamedTuple to direct assertions is functionally correct, it creates inconsistency with test_dataset_batches() (lines 108-116) which still uses the Expected NamedTuple pattern. Consider applying the same simplification approach consistently across all test methods for better maintainability.

For consistency, you could either:

  1. Apply the same direct assertion pattern to test_dataset_batches()
  2. Or revert to using constants/NamedTuple if you prefer more descriptive test data
# Option 1: Apply direct assertions to test_dataset_batches
-    class Expected(NamedTuple):
-        batch_size: int = 1
-        sequence_length: int = 2
-        sample_count: int = 3
-        observations: int = 2
-        features: int = 6
-        target: int = 1
-
-    expected = Expected()
-
     batch_count = 0
     for tickers, features, targets in dataset.batches():
         batch_count += 1
-        assert tickers.shape[0] == expected.batch_size
+        assert tickers.shape[0] == 1  # noqa: PLR2004
         assert features.shape == (
-            expected.batch_size,
-            expected.sequence_length,
-            expected.features,
+            1,  # noqa: PLR2004
+            2,  # noqa: PLR2004
+            6,  # noqa: PLR2004
         )
-        assert targets.shape == (expected.batch_size, expected.target)
+        assert targets.shape == (1, 1)  # noqa: PLR2004
pyproject.toml (1)

5-5: Consider the implications of exact Python version pinning.

Exact version pinning (==3.12.10) provides reproducibility but prevents automatic security and bug fixes from newer patch versions. For a production application, consider whether this level of strictness is necessary or if a range like >=3.12.10,<3.13 would be more appropriate.
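If the team prefers patch-level flexibility, the suggested range looks like this in `pyproject.toml` (a sketch; surrounding keys are assumed):

```toml
[project]
requires-python = ">=3.12.10,<3.13"
```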

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f872aae and 70739d6.

📒 Files selected for processing (17)
  • application/datamanager/Dockerfile (1 hunks)
  • application/datamanager/pyproject.toml (1 hunks)
  • application/positionmanager/pyproject.toml (2 hunks)
  • application/positionmanager/src/positionmanager/clients.py (4 hunks)
  • application/predictionengine/Dockerfile (1 hunks)
  • application/predictionengine/pyproject.toml (1 hunks)
  • application/predictionengine/src/predictionengine/main.py (6 hunks)
  • application/predictionengine/src/predictionengine/post_processor.py (1 hunks)
  • application/predictionengine/tests/test_dataset.py (1 hunks)
  • application/predictionengine/tests/test_long_short_term_memory.py (1 hunks)
  • application/predictionengine/tests/test_multi_head_self_attention.py (1 hunks)
  • infrastructure/monitoring.py (2 hunks)
  • infrastructure/pyproject.toml (1 hunks)
  • linter.nu (1 hunks)
  • pyproject.toml (1 hunks)
  • workflows/prediction_model.py (0 hunks)
  • workflows/pyproject.toml (1 hunks)
💤 Files with no reviewable changes (1)
  • workflows/prediction_model.py
✅ Files skipped from review due to trivial changes (7)
  • application/predictionengine/src/predictionengine/post_processor.py
  • application/datamanager/pyproject.toml
  • application/predictionengine/Dockerfile
  • application/predictionengine/tests/test_multi_head_self_attention.py
  • application/predictionengine/tests/test_long_short_term_memory.py
  • infrastructure/monitoring.py
  • application/datamanager/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (3)
  • application/positionmanager/pyproject.toml
  • workflows/pyproject.toml
  • application/predictionengine/src/predictionengine/main.py
🧰 Additional context used
🪛 Pylint (3.3.7)
application/positionmanager/src/positionmanager/clients.py

[error] 92-92: Instance of 'LookupDict' has no 'not_found' member

(E1101)


[error] 94-94: Instance of 'LookupDict' has no 'ok' member

(E1101)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Python tests
🔇 Additional comments (7)
infrastructure/pyproject.toml (1)

4-4:

Details

❓ Verification inconclusive

Verify compatibility with the Python version downgrade.

The change from >=3.13 to ==3.12.10 represents a significant downgrade. Ensure that all infrastructure code and dependencies are compatible with Python 3.12.10, particularly any code that relies on Python 3.13-only features.


🏁 Script executed:

#!/bin/bash
# Description: Check for Python 3.13+ specific features in infrastructure code
# Expected: No usage of Python 3.13+ exclusive features

# Search for potential Python 3.13+ features
rg -A 3 "match.*case" infrastructure/
rg -A 3 "typing\.override" infrastructure/ 
rg -A 3 "pathlib\.Path\.walk" infrastructure/
rg -A 3 "itertools\.batched" infrastructure/

Length of output: 179


🏁 Script executed:

#!/bin/bash
# Display the contents of the infrastructure pyproject.toml for dependency review
sed -n '1,200p' infrastructure/pyproject.toml

Length of output: 284


Verify compatibility with Python 3.12.10
We’ve scanned the infrastructure/ code for Python 3.13+-only features (enhanced match…case, typing.override, pathlib.Path.walk, itertools.batched) and found no usages. Before approving the pin from >=3.13 to ==3.12.10, please confirm:

  • All declared dependencies in infrastructure/pyproject.toml support Python 3.12.10:
    • pulumi>=3.169.0
    • pulumi-gcp>=8.30.1
    • pulumi-docker>=3.0.0
    • pulumi-docker-build>=0.0.12
    • loguru>=0.7.3
  • Your CI/CD and deployment pipelines have been validated under Python 3.12.10.
linter.nu (2)

4-8: LGTM - Improved file discovery pattern.

The change from fd/find to the ls pipeline with filtering is more idiomatic for nushell and should be more reliable.


13-13: Good improvement to version parsing.

Using split row ":" | get 1 | str trim is more explicit and robust than regex replacement for extracting the Python version from Docker FROM statements.

application/positionmanager/src/positionmanager/clients.py (4)

22-23: Consistent error message pattern applied correctly.

The changes to use intermediate message variables for error strings improve consistency and readability across the codebase.

Also applies to: 34-35, 76-77, 89-90


81-84: HTTP method change from POST to GET with appropriate timeout increase.

The shift from POST with JSON body to GET with URL parameters is appropriate for data retrieval. The timeout increase from 10 to 30 seconds accommodates the potentially larger Apache Arrow IPC responses.

Also applies to: 87-87


100-104: Apache Arrow IPC stream processing implementation looks correct.

The implementation properly creates a buffer from response content, reads it as an IPC stream, and converts to Polars DataFrame. This aligns with the datamanager service's Arrow stream output.


110-110:

Details

✅ Verification successful

Pivot operation updated to use new column names.

The pivot now uses "T" for ticker symbols and "c" for close prices, which matches the abbreviated Arrow schema. Verify these column names are consistently used across the data pipeline.

Verify the column name consistency across the data pipeline:


🏁 Script executed:

#!/bin/bash
# Description: Verify that "T" and "c" column names are used consistently
# Expected: Find consistent usage in datamanager and other consumers

echo "=== Searching for ticker column usage ==="
rg -B 2 -A 2 '"T"' --type py

echo -e "\n=== Searching for close price column usage ==="
rg -B 2 -A 2 '"c"' --type py

echo -e "\n=== Checking for any remaining old column names ==="
rg -B 2 -A 2 '"ticker"|"close_price"' --type py

Length of output: 18370


Column names “T” and “c” correctly applied and aliased for downstream use

The abbreviated Arrow schema columns “T” and “c” are only used at the ingestion/pivot stage and immediately aliased back to ticker and close_price. All downstream code—including workflows, dataset logic, main/post-processor, and tests—consistently operates on the aliased names. No further changes are required.

@chrisaddy chrisaddy merged commit 6908d3c into master Jun 6, 2025
6 checks passed
@chrisaddy chrisaddy deleted the 06-02-fix_inter-service_communication_patterns branch June 6, 2025 15:17