Create initial datamanager helper classes #618
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Datamanager Alpaca and S3 clients with unit tests and packaging updates; updates the test Dockerfile. Replaces the large Pulumi/EKS-based infra with a Docker Swarm/Lightsail stack and Nu orchestration, removes many Pulumi modules, and adds a Docker Swarm stack, Prometheus config, and helper scripts/settings.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant AC as AlpacaClient
    participant AssetsAPI as Alpaca Assets API
    participant SnapAPI as Alpaca Snapshot API
    participant TZ as Timezone Normalizer
    participant DF as Polars
    Caller->>AC: fetch_latest_data(current_date)
    AC->>AssetsAPI: Get active US equity assets
    AssetsAPI-->>AC: symbols[]
    loop for each chunk (≤100)
        AC->>SnapAPI: Request snapshots(chunk, feed=iex)
        SnapAPI-->>AC: snapshots
        AC->>TZ: Normalize daily_bar timestamp → America/New_York
        TZ-->>AC: local_date
        alt daily_bar present AND local_date == current_date
            AC->>DF: Append record (ticker, ts_ms, o/h/l/c, vol, vwap)
        else
            AC-->>AC: Skip / log
        end
        AC-->>AC: Sleep (rate limit)
    end
    AC-->>Caller: Polars DataFrame
```
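The snapshot loop in the diagram requests symbols in chunks of at most 100. A minimal stdlib sketch of that chunking step (the `chunked` helper and symbol names are illustrative, not the PR's actual code):

```python
def chunked(items: list[str], chunk_size: int = 100) -> list[list[str]]:
    """Split a symbol list into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]


symbols = [f"SYM{i}" for i in range(250)]
chunks = chunked(symbols)
print(len(chunks))       # 3 chunks: 100 + 100 + 50
print(len(chunks[-1]))   # 50
```

Each chunk would then be passed to the snapshot request, with a rate-limit sleep between iterations as the diagram shows.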
```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant S3C as S3Client
    participant DuckDB as DuckDB+HTTPFS
    participant S3 as S3
    participant DF as Polars
    rect rgb(230,245,255)
        note right of S3C: Write flow
        Caller->>S3C: write_equity_bars_data(df)
        S3C->>DF: add dt, extract year/month/day
        S3C->>S3: write_parquet(path=s3://.../equity/bars/, partition_by=[year,month,day])
    end
    rect rgb(240,255,240)
        note right of S3C: Read flow
        Caller->>S3C: read_equity_bars_data(start_date,end_date)
        S3C->>DuckDB: read_parquet('s3://.../**/*.parquet', hive_partitioning=1)
        DuckDB->>DuckDB: apply WHERE year/month/day range filter
        DuckDB-->>S3C: results (Pandas/Arrow)
        S3C-->>Caller: Polars DataFrame
    end
```
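The read flow's compound year/month/day WHERE predicate can be exercised against any SQL engine; a sketch using stdlib sqlite3 in place of DuckDB (table name, columns, and sample rows are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bars (ticker TEXT, year INT, month INT, day INT)")
rows = [("AAA", 2023, 12, 31), ("BBB", 2024, 1, 15), ("CCC", 2024, 2, 1)]
con.executemany("INSERT INTO bars VALUES (?, ?, ?, ?)", rows)

# Same compound predicate shape as the diagram's range filter:
# keep rows whose (year, month, day) lies between start and end inclusive.
query = """
    SELECT ticker FROM bars
    WHERE (year > ? OR (year = ? AND month > ?) OR (year = ? AND month = ? AND day >= ?))
      AND (year < ? OR (year = ? AND month < ?) OR (year = ? AND month = ? AND day <= ?))
"""
start, end = (2024, 1, 1), (2024, 1, 31)
params = (start[0], start[0], start[1], start[0], start[1], start[2],
          end[0], end[0], end[1], end[0], end[1], end[2])
print([r[0] for r in con.execute(query, params)])  # ['BBB']
```

Only the row inside January 2024 survives; the DuckDB version adds `read_parquet(..., HIVE_PARTITIONING=1)` on top of the same predicate.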
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Pull Request Overview
This PR introduces initial datamanager helper classes for collecting and managing financial market data. The implementation creates client classes for integrating with Alpaca API for market data fetching and AWS S3 for data storage.
- Adds AlpacaClient class for fetching equity market data from Alpaca API
- Adds S3Client class for reading/writing equity bars data to/from S3 storage
- Updates project dependencies and Docker configuration to support the new functionality
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| applications/datamanager/src/datamanager/alpaca_client.py | Implements AlpacaClient for fetching latest equity market data with rate limiting and error handling |
| applications/datamanager/src/datamanager/s3_client.py | Implements S3Client for reading/writing partitioned equity bars data using DuckDB and Polars |
| applications/datamanager/tests/test_alpaca_client.py | Comprehensive test suite covering AlpacaClient functionality including edge cases |
| applications/datamanager/tests/test_s3_client.py | Test suite for S3Client covering write/read operations and empty dataframe handling |
| applications/datamanager/pyproject.toml | Updates dependencies to include required packages for Alpaca API, AWS, and data processing |
| Dockerfile.tests | Updates Python version and sync configuration for the test environment |
Graphite Automations: "Assign author to pull request" took an action on this PR (08/21/25). 1 assignee was added to this PR based on John Forstmeier's automation.
Actionable comments posted: 8
🧹 Nitpick comments (11)
applications/datamanager/pyproject.toml (1)
5-12: Consider Python version consistency across the project.

The pyproject.toml specifies Python 3.12.10, while Dockerfile.tests uses Python 3.12 (which resolves to the latest 3.12.x). This could lead to subtle differences between local development and CI/CD environments.

Consider aligning the Python version specifications. Either:
- Update the Dockerfile to use `python:3.12.10` for exact version matching, or
- Relax the pyproject.toml requirement to `>=3.12,<3.13` for flexibility

```diff
-requires-python = "==3.12.10"
+requires-python = ">=3.12,<3.13"
```

applications/datamanager/tests/test_alpaca_client.py (2)
56-56: Consider using a constant for the `is_paper` parameter.

The `FBT003` noqa comment suggests the linter is warning about boolean positional arguments. Consider defining a constant or using keyword-only arguments for better readability.

```diff
- alpaca_client = AlpacaClient("test-key", "test-secret", True)  # noqa: FBT003
+ alpaca_client = AlpacaClient("test-key", "test-secret", is_paper=True)
```
1-157: Consider adding tests for error handling and chunk processing.

While the current tests cover the main scenarios well, consider adding tests for:
- Exception handling when `get_all_assets` fails
- Chunk processing with more than 100 tickers
- Partial chunk failures (when some chunks succeed and others fail)
- VWAP being None scenario

Would you like me to generate additional test cases to cover these scenarios?
applications/datamanager/src/datamanager/alpaca_client.py (2)

26-26: Consider making rate limit configurable.

The hardcoded rate limit of 0.5 seconds might not be optimal for all API tiers. Consider making this configurable via an environment variable or constructor parameter.

```diff
 def __init__(
     self,
     api_key: str,
     api_secret: str,
     is_paper: bool,  # noqa: FBT001
+    rate_limit_sleep: float = 0.5,
 ) -> None:
-    self.rate_limit_sleep = 0.5  # seconds
+    self.rate_limit_sleep = rate_limit_sleep  # seconds
```
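The injectable-sleep idea can be shown end to end with a self-contained sketch (the class and method names here are illustrative stand-ins, not the PR's `AlpacaClient`):

```python
import time


class RateLimitedFetcher:
    """Illustrative client whose per-request pause is injected, not hardcoded."""

    def __init__(self, rate_limit_sleep: float = 0.5) -> None:
        if rate_limit_sleep < 0:
            raise ValueError("rate_limit_sleep must be non-negative")
        self.rate_limit_sleep = rate_limit_sleep

    def fetch_chunks(self, chunks: list[list[str]]) -> int:
        fetched = 0
        for chunk in chunks:
            fetched += len(chunk)            # stand-in for the real API call
            time.sleep(self.rate_limit_sleep)
        return fetched


# Tests (and tight loops) can pass 0.0 instead of patching time.sleep.
fetcher = RateLimitedFetcher(rate_limit_sleep=0.0)
print(fetcher.fetch_chunks([["AAPL", "MSFT"], ["GOOG"]]))  # 3
```

Making the interval a constructor argument also lets production code tune it per API tier without code changes.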
62-63: Consider validating chunk_size parameter.

The chunk size of 100 is hardcoded. Consider making it configurable and validate it's a positive integer.

Add chunk_size as a class attribute or method parameter:

```diff
 class AlpacaClient:
+    DEFAULT_CHUNK_SIZE = 100
+
     def __init__(
         self,
         api_key: str,
         api_secret: str,
         is_paper: bool,  # noqa: FBT001
+        chunk_size: int = DEFAULT_CHUNK_SIZE,
     ) -> None:
         self.rate_limit_sleep = 0.5  # seconds
+        if chunk_size <= 0:
+            raise ValueError("chunk_size must be positive")
+        self.chunk_size = chunk_size
```

Then use `self.chunk_size` instead of the hardcoded value.

applications/datamanager/src/datamanager/s3_client.py (1)
self.chunk_sizeinstead of the hardcoded value.applications/datamanager/src/datamanager/s3_client.py (1)
68-70: Consider memory efficiency for large datasets.The
fetchdf()method loads all data into memory as a pandas DataFrame before converting to Polars. For large date ranges, this could cause memory issues.Consider using DuckDB's arrow interface for more efficient memory usage:
- result = self.duckdb_connection.execute(query).fetchdf() - - return pl.from_pandas(result) + result = self.duckdb_connection.execute(query).arrow() + return pl.from_arrow(result)applications/datamanager/tests/test_s3_client.py (5)
11-14: Patch at the point of use to make mocks reliable and future-proof.

Patch where the symbols are looked up (datamanager.s3_client.*) to avoid leaky mocks if import bindings change.

```diff
-    patch("boto3.Session") as mock_session,
-    patch("duckdb.connect") as mock_duckdb_connect,
+    patch("datamanager.s3_client.boto3.Session") as mock_session,
+    patch("datamanager.s3_client.duckdb.connect") as mock_duckdb_connect,
     patch("polars.DataFrame.write_parquet") as mock_write_parquet,
```

```diff
-    patch("boto3.Session") as mock_session,
-    patch("duckdb.connect") as mock_duckdb_connect,
+    patch("datamanager.s3_client.boto3.Session") as mock_session,
+    patch("datamanager.s3_client.duckdb.connect") as mock_duckdb_connect,
```

```diff
-    patch("boto3.Session") as mock_session,
-    patch("duckdb.connect") as mock_duckdb_connect,
+    patch("datamanager.s3_client.boto3.Session") as mock_session,
+    patch("datamanager.s3_client.duckdb.connect") as mock_duckdb_connect,
```

Also applies to: 47-49, 86-88
39-43: Strengthen assertions: verify partition columns exist on the DataFrame being written and that DuckDB init sets the region.

This ensures we don't just check call shape but also the transformed data and initialization side effects.

```diff
 mock_write_parquet.assert_called_once()
 call_args = mock_write_parquet.call_args
 assert call_args[1]["file"] == "s3://test-bucket/equity/bars/"
 assert call_args[1]["partition_by"] == ["year", "month", "day"]
+# also assert that the DF passed into write_parquet contains the partition columns
+df_written = call_args[0][0]  # 'self' of the bound method
+assert set(["year", "month", "day"]).issubset(set(df_written.columns))
+
+# verify DuckDB init SQL applied the region from boto3
+init_sql = mock_duckdb_conn.execute.call_args_list[0].args[0]
+assert "SET s3_region='us-east-1'" in init_sql
```
71-79: Make the read test robust: avoid brittle call_count and assert SQL intent; distinguish init vs. query with side_effect.

Relying on exact call counts is fragile if init is refactored. Assert on SQL content instead and use side_effect to model two different returns.

```diff
-    mock_duckdb_conn.execute.return_value = mock_result
+    # first execute() for init returns a dummy, second for query returns a result with fetchdf()
+    mock_duckdb_conn.execute.side_effect = [MagicMock(), mock_result]
@@
-    assert mock_duckdb_conn.execute.call_count == 2  # noqa: PLR2004
-
-    assert isinstance(result, pl.DataFrame)
+    # Assert init SQL and query SQL contents instead of exact counts
+    calls = mock_duckdb_conn.execute.call_args_list
+    assert any("INSTALL httpfs" in str(c.args[0]) for c in calls)
+    assert any("SET s3_region='us-east-1'" in str(c.args[0]) for c in calls)
+    assert "HIVE_PARTITIONING=1" in calls[-1].args[0]
+    assert "s3://test-bucket/equity/bars/*/*/*/*" in calls[-1].args[0]
+
+    # Sanity-check conversion to Polars and expected columns propagated
+    assert isinstance(result, pl.DataFrame)
+    expected_cols = {
+        "ticker",
+        "timestamp",
+        "open_price",
+        "high_price",
+        "low_price",
+        "close_price",
+        "volume",
+        "volume_weighted_average_price",
+        "year",
+        "month",
+        "day",
+    }
+    assert expected_cols.issubset(set(result.columns))
```
84-98: Empty DataFrame test should assert no write occurs.

Currently it only ensures "does not raise." Tighten the contract by asserting write_parquet is not called.

```diff
-    with (
-        patch("boto3.Session") as mock_session,
-        patch("duckdb.connect") as mock_duckdb_connect,
-    ):
+    with (
+        patch("datamanager.s3_client.boto3.Session") as mock_session,
+        patch("datamanager.s3_client.duckdb.connect") as mock_duckdb_connect,
+        patch("polars.DataFrame.write_parquet") as mock_write_parquet,
+    ):
@@
     s3_client.write_equity_bars_data(empty_data)
+    mock_write_parquet.assert_not_called()
```
9-37: Optional: Add edge-case tests for default-region fallback and error propagation.

These will harden behavior around AWS env variability and logging on failures.

```python
# New tests (can be appended to this file)
def test_s3_client_defaults_region_when_boto3_none() -> None:
    with (
        patch("datamanager.s3_client.boto3.Session") as mock_session,
        patch("datamanager.s3_client.duckdb.connect") as mock_duckdb_connect,
    ):
        mock_session.return_value.region_name = None
        mock_conn = MagicMock()
        mock_duckdb_connect.return_value = mock_conn

        S3Client("test-bucket")

        init_sql = mock_conn.execute.call_args_list[0].args[0]
        assert "SET s3_region='us-east-1'" in init_sql


def test_s3_client_write_equity_bars_data_propagates_exceptions() -> None:
    with (
        patch("datamanager.s3_client.boto3.Session") as mock_session,
        patch("datamanager.s3_client.duckdb.connect") as mock_duckdb_connect,
        patch("datamanager.s3_client.logger") as mock_logger,
        patch("polars.DataFrame.write_parquet", side_effect=RuntimeError("boom")),
    ):
        mock_session.return_value.region_name = "us-east-1"
        mock_duckdb_connect.return_value = MagicMock()

        s3_client = S3Client("test-bucket")
        df = pl.DataFrame({"timestamp": [1]})

        try:
            s3_client.write_equity_bars_data(df)
            assert False, "Expected exception"
        except RuntimeError:
            pass

        mock_logger.error.assert_called()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- Dockerfile.tests (2 hunks)
- applications/datamanager/pyproject.toml (1 hunks)
- applications/datamanager/src/datamanager/alpaca_client.py (1 hunks)
- applications/datamanager/src/datamanager/s3_client.py (1 hunks)
- applications/datamanager/tests/test_alpaca_client.py (1 hunks)
- applications/datamanager/tests/test_s3_client.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
applications/datamanager/tests/test_s3_client.py (1)
- applications/datamanager/src/datamanager/s3_client.py (3): `S3Client` (9-70), `write_equity_bars_data` (21-44), `read_equity_bars_data` (46-70)

applications/datamanager/tests/test_alpaca_client.py (1)
- applications/datamanager/src/datamanager/alpaca_client.py (2): `AlpacaClient` (19-141), `fetch_latest_data` (40-141)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Python tests
🔇 Additional comments (5)
applications/datamanager/tests/test_alpaca_client.py (3)
9-76: Test coverage looks comprehensive for the happy path.

The test effectively validates the core functionality of `fetch_latest_data` with proper mocking of external dependencies and assertion of all expected fields in the resulting DataFrame.
78-110: Good edge case coverage for missing daily bar data.

The test properly validates that the client handles missing daily bar data gracefully by returning an empty DataFrame.
112-156: Well-structured test for date mismatch scenario.

The test effectively validates that the client filters out equity bars that don't match the requested date.
applications/datamanager/src/datamanager/alpaca_client.py (1)
19-39: Well-structured initialization with proper client setup.

The initialization properly sets up both historical and trading clients with appropriate sandbox/paper mode configuration.
applications/datamanager/tests/test_s3_client.py (1)
41-42: No action needed: Polars and DuckDB already align on Hive-style partitions

Polars' `DataFrame.write_parquet(..., partition_by=[…])` uses `hive_style=True` by default, emitting directories like `year=2022/month=1/day=2`, which matches DuckDB's `HIVE_PARTITIONING=1` expectation. Since your code doesn't override `hive_style`, it will produce the correct key=value layout end-to-end, and the existing test correctly verifies the `partition_by` setting.

If you'd like extra clarity or to guard against any future default changes, you can optionally:
- Pass `hive_style=True` explicitly in your `write_parquet` call.
- Add an assertion in `test_s3_client.py` to ensure the parameter is present, for example:

```diff
 assert call_args[1]["partition_by"] == ["year", "month", "day"]
+assert call_args[1].get("hive_style", True) is True
```

Otherwise, this concern can be considered resolved.
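The key=value directory layout the hive-style write produces can be sketched with the stdlib (bucket name, prefix, and helper are illustrative):

```python
from datetime import date


def hive_partition_path(prefix: str, d: date) -> str:
    """Build the key=value directory layout produced by hive-style partitioning."""
    return f"{prefix}/year={d.year}/month={d.month}/day={d.day}"


path = hive_partition_path("s3://test-bucket/equity/bars", date(2022, 1, 2))
print(path)  # s3://test-bucket/equity/bars/year=2022/month=1/day=2
```

This is the layout DuckDB's `HIVE_PARTITIONING=1` expects when reading the dataset back.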
Force-pushed 8fdec8f to 98fca58 (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (4)
applications/datamanager/src/datamanager/alpaca_client.py (1)
54-57: Log full exception context; avoid bare Exception when possible.

Switch to `logger.exception` to capture the stack trace, and consider narrowing the exception type (HTTP/network/API error) if the SDK exposes them.

```diff
- except Exception as e:
-     logger.error(f"Error fetching Alpaca assets: {e}")
+ except Exception as e:  # consider narrowing to the SDK's error types
+     logger.exception(f"Error fetching Alpaca assets: {e}")
      raise
```

applications/datamanager/src/datamanager/s3_client.py (3)
34-41: Nice: context manager and explicit close implemented.

The connection lifecycle improvements address earlier feedback about leaks and make usage with with-statements straightforward.
65-65: Path pattern fix looks good.

Switching to `s3://…/equity/bars/**/*.parquet` resolves the earlier mismatch with the hive-style year/month/day partitioning.
67-84: Simplify and harden the date filter; cast partition columns or build a DATE.

With HIVE_PARTITIONING=1, year/month/day arrive as strings. Explicit casts avoid accidental lexical comparisons; using make_date + BETWEEN is simpler and often improves partition pruning.

```diff
-        query = """
-            SELECT *
-            FROM read_parquet(
-                ?,
-                HIVE_PARTITIONING=1
-            )
-            WHERE
-                (year > ? OR
-                (year = ? AND month > ?) OR
-                (year = ? AND month = ?
-                AND day >= ?))
-                AND
-                (year < ? OR
-                (year = ? AND month < ?) OR
-                (year = ?
-                AND month = ?
-                AND day <= ?))
-        """
+        query = """
+            SELECT *
+            FROM read_parquet(?, HIVE_PARTITIONING=1)
+            WHERE make_date(CAST(year AS INTEGER),
+                            CAST(month AS INTEGER),
+                            CAST(day AS INTEGER))
+                  BETWEEN ? AND ?
+        """
```

```diff
-        params = (
-            path_pattern,
-            start_date.year,
-            start_date.year,
-            start_date.month,
-            start_date.year,
-            start_date.month,
-            start_date.day,
-            end_date.year,
-            end_date.year,
-            end_date.month,
-            end_date.year,
-            end_date.month,
-            end_date.day,
-        )
+        params = (path_pattern, start_date, end_date)
```

Also applies to: 86-101
🧹 Nitpick comments (14)
applications/datamanager/src/datamanager/alpaca_client.py (8)
1-5: Add missing import for math (used in failure-threshold calc suggestion).

You'll need math for a safer failure-threshold calculation below.

```diff
+import math
 import time
 from datetime import date
 from typing import TYPE_CHECKING, cast
 from zoneinfo import ZoneInfo
```
46-51: Use enum for asset attributes instead of a raw string.

`attributes="has_options"` is brittle. Prefer the enum list to align with the SDK's request model (and future-proof parsing). Also import the enum.

```diff
-from alpaca.trading.enums import AssetClass, AssetStatus
+from alpaca.trading.enums import AssetClass, AssetStatus, AssetAttributes
@@
-            GetAssetsRequest(
-                status=AssetStatus.ACTIVE,
-                asset_class=AssetClass.US_EQUITY,
-                attributes="has_options",
-            )
+            GetAssetsRequest(
+                status=AssetStatus.ACTIVE,
+                asset_class=AssetClass.US_EQUITY,
+                attributes=[AssetAttributes.OPTIONABLE],
+            )
```

If your intent was to include all equities (optionable or not), drop the `attributes` filter entirely.
65-66: Define UTC alongside market timezone for consistent normalization.

We'll need UTC for safe timestamp normalization below.

```diff
-        timezone_filter = ZoneInfo("America/New_York")
+        timezone_filter = ZoneInfo("America/New_York")
+        UTC = ZoneInfo("UTC")
```
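The normalization this comment is driving at — treat naive timestamps as UTC, then compare calendar dates in America/New_York — can be sketched with stdlib zoneinfo (the helper name is illustrative):

```python
from datetime import datetime, date
from zoneinfo import ZoneInfo

MARKET_TZ = ZoneInfo("America/New_York")
UTC = ZoneInfo("UTC")


def local_bar_date(ts: datetime) -> date:
    """Assume UTC for naive timestamps, then take the New York calendar date."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=UTC)
    return ts.astimezone(MARKET_TZ).date()


# 21:00 UTC on Jan 15 is 16:00 in New York: same calendar date.
print(local_bar_date(datetime(2024, 1, 15, 21, 0)))  # 2024-01-15
# 01:00 UTC on Jan 16 is still the evening of Jan 15 in New York.
print(local_bar_date(datetime(2024, 1, 16, 1, 0)))   # 2024-01-15
```

The second case is why normalizing before the `local_date == current_date` check matters: comparing the raw UTC date would wrongly skip late-session bars.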
81-83: Prefer enum member over string for DataFeed.

Use `DataFeed.IEX` instead of constructing via string; it's clearer and safer across SDK versions.

```diff
-                feed=DataFeed("iex"),
+                feed=DataFeed.IEX,
```
111-115: Reduce log noise for expected skips.

Skipping non-matching dates is normal; demote to debug to avoid overwhelming logs with large universes.

```diff
-                        logger.info(
+                        logger.debug(
                             f"Skipping equity bar for {snapshot.symbol} on {daily_equity_bar_date}"  # noqa: E501
                         )
```
149-150: Ensure rate limiting even on failures (sleep in finally).

Right now you sleep only on success. If a chunk fails, you immediately try the next chunk, risking rate limits. Move the sleep into a `finally`.

```diff
-                time.sleep(self.rate_limit_sleep)
-
-            except Exception as e:
+            except Exception as e:
                 logger.error(
                     f"Error fetching Alpaca snapshots for chunk {i // chunk_size + 1}: {e}"  # noqa: E501
                 )
                 failed_chunks += 1
                 if failed_chunks > maximum_failed_chunks:
                     message = f"Too many chunk failures: {failed_chunks} chunks failed"
                     raise RuntimeError(message) from e
                 continue  # continue with next chunk instead of raising
+            finally:
+                time.sleep(self.rate_limit_sleep)
```

Also applies to: 151-160
161-165: Optional: stabilize output schema when no records.

If nothing is collected, `pl.DataFrame(equity_bars)` yields a 0x0 frame. Consider returning an empty DataFrame with a fixed schema so downstream code doesn't branch on shape.

I can provide a tiny helper to build an empty Polars frame with the expected dtypes if you want to standardize this.
68-69: Optional: make chunk size configurable.

Consider promoting `chunk_size` to an instance attribute or constructor arg to tune throughput vs. rate limits without code changes.

applications/datamanager/src/datamanager/s3_client.py (6)
24-28: Parameterize DuckDB SET and split multi-statement SQL.

Avoid string interpolation in SQL and prefer one statement per execute for clearer errors and safety. Parameterization also makes the prior regex validation less critical.

Apply this diff:

```diff
-        self.duckdb_connection.execute(f"""
-            INSTALL httpfs;
-            LOAD httpfs;
-            SET s3_region='{region}';
-        """)
+        # Run one statement per execute; parameterize the region setting.
+        self.duckdb_connection.execute("INSTALL httpfs")
+        self.duckdb_connection.execute("LOAD httpfs")
+        self.duckdb_connection.execute("SET s3_region = ?", [region])
```
30-33: Make close() idempotent and avoid stale handle after closing.

Set the connection to None after closing so repeated close() or exit calls are safe and references don't point to a closed handle.

```diff
 def close(self) -> None:
-    if self.duckdb_connection:
-        self.duckdb_connection.close()
+    con = getattr(self, "duckdb_connection", None)
+    if con is not None:
+        try:
+            con.close()
+        finally:
+            self.duckdb_connection = None
```

Also applies to: 40-41
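The idempotent-close pattern in a self-contained form (class names are illustrative; a stub stands in for the DuckDB connection so the behavior is observable):

```python
class _StubConnection:
    """Stand-in for a DuckDB connection; raises if closed twice."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        if self.closed:
            raise RuntimeError("already closed")
        self.closed = True


class Client:
    def __init__(self) -> None:
        self.connection = _StubConnection()

    def close(self) -> None:
        # Drop the reference before returning so repeated calls are no-ops.
        con = self.connection
        if con is not None:
            try:
                con.close()
            finally:
                self.connection = None

    def __enter__(self) -> "Client":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.close()


with Client() as client:
    pass
client.close()  # second close is safe: no "already closed" error
```

Without nulling the reference, the second `close()` would hit the underlying handle again, which is exactly the double-close hazard the review flags.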
57-61: Prefer logger.exception to capture stack traces.

You already include rich context. logger.exception preserves the traceback automatically; no need to interpolate the exception.

```diff
-        except Exception as e:
-            logger.error(
-                f"Error writing equity bars data to bucket '{self.data_bucket_name}' "  # noqa: E501
-                f"at path '{self.daily_equity_bars_path}' with {count} rows: {e}"
-            )
+        except Exception:
+            logger.exception(
+                f"Error writing equity bars data to bucket '{self.data_bucket_name}' "  # noqa: E501
+                f"at path '{self.daily_equity_bars_path}' with {count} rows"
+            )
             raise
```
64-66: Validate date bounds and reuse the path constant.

Guard against start_date > end_date early, and reuse the existing path attribute for consistency.

```diff
-        path_pattern = f"s3://{self.data_bucket_name}/equity/bars/**/*.parquet"
+        if start_date > end_date:
+            raise ValueError("start_date must be on or before end_date")
+        path_pattern = f"{self.daily_equity_bars_path}**/*.parquet"
```
102-104: Return Polars directly from DuckDB; avoid pandas hop.

DuckDB's Python API can yield a Polars DataFrame directly, reducing dependencies and conversion overhead.

```diff
-        result = self.duckdb_connection.execute(query, params).fetchdf()
-
-        return pl.from_pandas(result)
+        return self.duckdb_connection.execute(query, params).pl()
```
34-41: Optional: add precise type hints for context manager dunder methods.

If you decide to remove noqa suppressions, these hints keep mypy/ruff happy and improve readability.

```diff
+from typing import Optional, Type
+from types import TracebackType
@@
-    def __enter__(self):  # noqa: ANN204
-        return self
+    def __enter__(self) -> "S3Client":
+        return self
@@
-    def __exit__(self, exc_type, exc_val, exc_tb):  # noqa: ANN001, ANN204
-        _ = exc_type, exc_val, exc_tb
-
-        self.close()
+    def __exit__(
+        self,
+        exc_type: Optional[Type[BaseException]],
+        exc_val: Optional[BaseException],
+        exc_tb: Optional[TracebackType],
+    ) -> None:
+        _ = exc_type, exc_val, exc_tb
+        self.close()
```
✅ Files skipped from review due to trivial changes (1)
- applications/datamanager/tests/test_s3_client.py
🚧 Files skipped from review as they are similar to previous changes (3)
- applications/datamanager/tests/test_alpaca_client.py
- Dockerfile.tests
- applications/datamanager/pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Python tests
🔇 Additional comments (3)
applications/datamanager/src/datamanager/alpaca_client.py (2)
35-39: TradingClient initialization looks correct.

Passing `paper=is_paper` is the intended knob for live vs. paper trading.
29-33: StockHistoricalDataClient sandbox parameter is supported

I've confirmed that as of the July 2025 release, `StockHistoricalDataClient`'s constructor includes a `sandbox: bool = False` parameter (and does not use a `paper` flag). The existing code using `sandbox=is_paper` is therefore correct; no changes are needed.

applications/datamanager/src/datamanager/s3_client.py (1)
52-55: Remove the `LazyFrame.sink_parquet` refactor; use `DataFrame.write_parquet` for partitioned writes

The `LazyFrame.sink_parquet` API does not support a `partition_by` parameter as of August 2025; partitioned ("Hive-style") Parquet datasets, especially to S3, must be written with `DataFrame.write_parquet`.

File: applications/datamanager/src/datamanager/s3_client.py
Lines: 52-55

Suggested update:

```diff
- ).drop("dt").write_parquet(
-     file=self.daily_equity_bars_path,
-     partition_by=["year", "month", "day"],
- )
+ ).drop("dt").write_parquet(
+     file=self.daily_equity_bars_path,
+     partition_by=["year", "month", "day"],
+     use_pyarrow=True,  # ensure pyarrow dataset write
+     compression="snappy",
+     storage_options=self.s3_storage_options,  # pass AWS creds/options
+ )
```

Verification:
- Confirm that your project's Polars version (in `requirements.txt` or `pyproject.toml`) is ≥ the release where `DataFrame.write_parquet(..., partition_by=…)` on S3 was stabilized.

Likely an incorrect or invalid review comment.
Force-pushed 98fca58 to fc5e48f (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (3)
applications/datamanager/src/datamanager/s3_client.py (3)
30-33: Make close() idempotent and drop unused variables in exit.

Set the connection to None after closing to avoid double-closing; exit doesn't need to consume its args.

```diff
 def close(self) -> None:
-    if self.duckdb_connection:
-        self.duckdb_connection.close()
+    if self.duckdb_connection is not None:
+        try:
+            self.duckdb_connection.close()
+        finally:
+            self.duckdb_connection = None
@@
-    def __exit__(self, exc_type, exc_val, exc_tb):  # noqa: ANN001, ANN204
-        _ = exc_type, exc_val, exc_tb
-
-        self.close()
+    def __exit__(self, exc_type, exc_val, exc_tb) -> None:  # noqa: ANN001
+        self.close()
```

Also applies to: 37-41
65-65: Path pattern looks correct for recursive partitions.

Good move to a recursive glob with explicit Parquet suffix; matches the partitioned dataset layout and avoids the prior 4-wildcard mismatch.
67-84: Date filtering can return wrong rows; cast partitions and compare as a proper DATE.

year/month/day come from Hive partitions and are strings; comparing them to integers is fragile and can be lexicographic in some engines. Build a DATE from the partitions and compare using BETWEEN with date-typed parameters.

Apply these diffs to simplify and harden the query and parameters:

```diff
-        query = """
-            SELECT *
-            FROM read_parquet(
-                ?,
-                HIVE_PARTITIONING=1
-            )
-            WHERE
-                (year > ? OR
-                (year = ? AND month > ?) OR
-                (year = ? AND month = ?
-                AND day >= ?))
-                AND
-                (year < ? OR
-                (year = ? AND month < ?) OR
-                (year = ?
-                AND month = ?
-                AND day <= ?))
-        """
+        query = """
+            SELECT *
+            FROM read_parquet(
+                ?,
+                HIVE_PARTITIONING=1
+            )
+            WHERE make_date(CAST(year AS INT), CAST(month AS INT), CAST(day AS INT))
+                  BETWEEN ? AND ?
+        """
```

```diff
-        params = (
-            path_pattern,
-            start_date.year,
-            start_date.year,
-            start_date.month,
-            start_date.year,
-            start_date.month,
-            start_date.day,
-            end_date.year,
-            end_date.year,
-            end_date.month,
-            end_date.year,
-            end_date.month,
-            end_date.day,
-        )
+        params = (path_pattern, start_date, end_date)
```

Also applies to: 86-100
🧹 Nitpick comments (7)
applications/datamanager/tests/test_alpaca_client.py (5)
15-16: Assert the rate-limit sleeps to lock in behavior.

You already patch time.sleep; asserting the call count makes the tests guard the rate limiting contract (once after assets, once after snapshots).

Apply this diff to add an explicit assertion and drop the noqa:

```diff
-        patch("time.sleep") as mock_sleep,  # noqa: F841
+        patch("time.sleep") as mock_sleep,
@@
     assert row["timestamp"] == expected_timestamp
+
+    # rate-limiting: one sleep after assets call and one after snapshots
+    assert mock_sleep.call_count == 2
```

Also applies to: 60-76
37-41: Nit: clarify timezone variable name for readability.

Avoid shadowing the stdlib symbol name "timezone" and make intent explicit.

```diff
-    timezone = ZoneInfo("America/New_York")
+    ny_tz = ZoneInfo("America/New_York")
@@
-    mock_daily_bar.timestamp = datetime(2024, 1, 15, 16, 0, 0, tzinfo=timezone)
+    mock_daily_bar.timestamp = datetime(2024, 1, 15, 16, 0, 0, tzinfo=ny_tz)
@@
-    expected_timestamp = int(
-        datetime(2024, 1, 15, 16, 0, 0, tzinfo=timezone).timestamp() * 1000
-    )
+    expected_timestamp = int(datetime(2024, 1, 15, 16, 0, 0, tzinfo=ny_tz).timestamp() * 1000)
```

Also applies to: 72-75
84-85: Replicate the sleep assertion in the "no daily_bar" path.

Even when skipping rows, the client still rate-limits. This assertion protects against accidental changes.

```diff
-        patch("time.sleep") as mock_sleep,  # noqa: F841
+        patch("time.sleep") as mock_sleep,
@@
     assert isinstance(result, pl.DataFrame)
     assert len(result) == 0
+    assert mock_sleep.call_count == 2
```

Also applies to: 108-110
118-119: Replicate the sleep assertion in the "wrong date" path.

Keeps rate limiting guarantees consistent across branches.

```diff
-        patch("time.sleep") as mock_sleep,  # noqa: F841
+        patch("time.sleep") as mock_sleep,
@@
     assert isinstance(result, pl.DataFrame)
     assert len(result) == 0
+    assert mock_sleep.call_count == 2
```

Also applies to: 155-156
1-157: Add coverage for naive timestamps and missing VWAP.

fetch_latest_data treats naive timestamps as UTC and allows vwap=None. A focused test will prevent regressions.

Proposed test to append to this file:

```python
def test_alpaca_client_fetch_latest_data_naive_timestamp_and_none_vwap() -> None:
    from datetime import date, datetime

    with (
        patch("datamanager.alpaca_client.StockHistoricalDataClient") as mock_hist,
        patch("datamanager.alpaca_client.TradingClient") as mock_trading,
        patch("time.sleep"),
    ):
        trading = MagicMock()
        mock_trading.return_value = trading
        asset = MagicMock()
        asset.symbol = "NAIVE"
        trading.get_all_assets.return_value = [asset]

        hist = MagicMock()
        mock_hist.return_value = hist

        # naive timestamp -> interpreted as UTC in implementation
        mock_daily_bar = MagicMock()
        mock_daily_bar.timestamp = datetime(2024, 1, 15, 21, 0, 0)  # naive; 21:00 UTC
        mock_daily_bar.open = 1.0
        mock_daily_bar.high = 2.0
        mock_daily_bar.low = 0.5
        mock_daily_bar.close = 1.5
        mock_daily_bar.volume = 123
        mock_daily_bar.vwap = None  # explicit None path

        snap = MagicMock()
        snap.symbol = "NAIVE"
        snap.daily_bar = mock_daily_bar
        hist.get_stock_snapshot.return_value = {"NAIVE": snap}

        client = AlpacaClient("k", "s", True)  # noqa: FBT003
        df = client.fetch_latest_data(date(2024, 1, 15))

    assert len(df) == 1
    row = df.row(0, named=True)
    assert row["ticker"] == "NAIVE"
    assert row["volume_weighted_average_price"] is None
    # naive 2024-01-15 21:00:00 interpreted as UTC -> epoch ms matches timestamp()
    assert row["timestamp"] == int(mock_daily_bar.timestamp.timestamp() * 1000)
```
64-66: Validate input dates early. Defensive check improves error messages and avoids surprising empty results.
```diff
 def read_equity_bars_data(self, start_date: date, end_date: date) -> pl.DataFrame:
-    path_pattern = f"s3://{self.data_bucket_name}/equity/bars/**/*.parquet"
+    if start_date > end_date:
+        raise ValueError(f"start_date {start_date} must be on or before end_date {end_date}")
+    path_pattern = f"s3://{self.data_bucket_name}/equity/bars/**/*.parquet"
```
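For context on the range filter, the glob plus `hive_partitioning=1` resolves against a `year=/month=/day=` directory layout produced by the partitioned write. A tiny sketch of the prefix for a single day — the helper itself is illustrative, not part of `S3Client`:

```python
from datetime import date


def hive_prefix(bucket: str, d: date) -> str:
    """Hive-style partition prefix for one day of equity bars (illustrative helper)."""
    return f"s3://{bucket}/equity/bars/year={d.year}/month={d.month}/day={d.day}/"
```

An early `start_date > end_date` check as suggested above guarantees such prefixes describe a non-empty range before DuckDB ever touches S3.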
24-29: Minor: prefer separate executes for clarity and error locality. Splitting the multi-statement SQL helps isolate failures and improves readability when diagnosing connection/setup issues.
```diff
-        self.duckdb_connection.execute(f"""
-            INSTALL httpfs;
-            LOAD httpfs;
-            SET s3_region='{region}';
-        """)
+        self.duckdb_connection.execute("INSTALL httpfs;")
+        self.duckdb_connection.execute("LOAD httpfs;")
+        self.duckdb_connection.execute(f"SET s3_region='{region}';")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- `Dockerfile.tests` (2 hunks)
- `applications/datamanager/pyproject.toml` (1 hunks)
- `applications/datamanager/src/datamanager/alpaca_client.py` (1 hunks)
- `applications/datamanager/src/datamanager/s3_client.py` (1 hunks)
- `applications/datamanager/tests/test_alpaca_client.py` (1 hunks)
- `applications/datamanager/tests/test_s3_client.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- applications/datamanager/tests/test_s3_client.py
- Dockerfile.tests
- applications/datamanager/src/datamanager/alpaca_client.py
- applications/datamanager/pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (1)
applications/datamanager/tests/test_alpaca_client.py (1)
applications/datamanager/src/datamanager/alpaca_client.py (2)
- `AlpacaClient` (21-169)
- `fetch_latest_data` (63-169)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Python tests
…hub.com:pocketsizefund/pocketsizefund into 08-20-create_initial_datamanager_helper_classes
Actionable comments posted: 11
♻️ Duplicate comments (1)
Dockerfile.tests (1)
28-35: Remove the redundant second `uv sync`; install local packages in editable mode instead, and make the first sync deterministic and faster.

Running `uv sync --all-packages --all-groups` again after copying sources re-resolves the entire environment and lengthens builds. Use a single, cached, frozen sync for third-party deps, then install the local packages you need in editable mode. This preserves layer caching and avoids full re-resolution. This mirrors the prior feedback on the same lines.

```diff
-RUN uv sync --all-packages --all-groups
+RUN --mount=type=cache,target=/root/.cache/uv uv sync --all-packages --all-groups --frozen
@@
-COPY libraries/ ./libraries/
-
-RUN uv sync --all-packages --all-groups
+COPY libraries/ ./libraries/
+RUN uv pip install -e ./applications/datamanager
+# If this is a first-party package consumed by tests, keep the editable install;
+# otherwise, drop the line below.
+RUN uv pip install -e ./libraries/python
```

Notes:

- `--frozen` ensures the environment matches `uv.lock` exactly, failing fast on drift.
- The BuildKit cache mount speeds up repeat builds considerably.
- If you truly need a second `uv sync` (e.g., to pick up workspace resolution across multiple local packages), prefer `uv sync --frozen` and scope it to the specific project; otherwise the editable installs above should suffice.
🧹 Nitpick comments (19)
Dockerfile.tests (1)
1-1: Good call aligning tests to Python 3.12.x.

Pinning to 3.12.10 matches the repo-wide 3.12 standard and keeps test runs reproducible. As an optional tweak to shrink image size, consider switching to the slim variant and using `--no-install-recommends` if the toolchain allows.

Apply if desired:

```diff
-FROM python:3.12.10
+FROM python:3.12.10-slim
@@
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     build-essential \
     clang \
     libc6-dev \
     linux-libc-dev \
     && rm -rf /var/lib/apt/lists/*
```

infrastructure/Pulumi.yaml (2)
7-10: Good move to config-based tags; consider adding a minimal standard tag set for discoverability.

The `pulumi:template` tag is fine. To replace the prior code-based tagging and improve search/ownership in the Pulumi Service, add stable tags (non-reserved keys) like `repo`, `owner`, and `service`. These won’t affect resources, only project metadata.

Apply if desired:

```diff
 config:
   pulumi:tags:
     value:
       pulumi:template: aws-python
+      repo: pocketsizefund/pocketsizefund
+      service: infrastructure
+      owner: datamanager
```

Note: environment-specific tags (e.g., `environment`) are better attached at the stack level rather than project-level.
2-2: Nit: make the description clearer. Minor clarity/readability tweak.

```diff
-description: pocketsizefund infra
+description: Pocket Size Fund infrastructure
```

.claude/settings.local.json (4)
19-19: `ln`: allow only safe symlink creation within the repo. Unrestricted `ln` can create hardlinks/symlinks to system locations. Restrict to symlinks and relative targets in the workspace.
Apply:
- "Bash(ln:*)", + "Bash(ln -s:./*)",Optional: if supported by the matcher, also add a deny to block absolute paths:
- "Bash(ln -s:/*)" in the deny list.
21-21: `docker build`: restrict to infra contexts to avoid arbitrary code execution. Building arbitrary Dockerfiles can execute RUN steps over the network. Scope it to known-safe paths.
Apply:
- "Bash(docker build:*)", + "Bash(docker build:infrastructure/**)",Optional hardening:
- Prefer pinned builders/targets (e.g., buildx bake) if applicable.
- If supported, add deny entries for parent-directory contexts and absolute paths (e.g., "Bash(docker build:../)", "Bash(docker build:/)").
- Consider requiring an approval mode for docker build even if other commands are auto-approved.
22-22: Add a CI step to verify `uv lock` consistency.

It looks like none of your CI definitions (e.g. `.github/workflows/…` or `tests.yaml`) currently run `uv lock --frozen` before executing tests. To ensure that your lockfile remains up-to-date and reproducible, add a pre-test check in your pipeline:

• In your CI YAML (e.g. `tests.yaml`), insert before the coverage or test step:

```yaml
- name: Verify uv lock consistency
  run: |
    set -euo pipefail
    uv lock --frozen || { echo "Lockfile is out of date"; exit 1; }
```

• If you ever migrate to GitHub Actions, you can similarly add in your workflow `.github/workflows/ci.yml`:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: pip install uv
      - name: Verify uv lock consistency
        run: |
          set -euo pipefail
          uv lock --frozen || { echo "Lockfile is out of date"; exit 1; }
      - name: Run tests
        run: uv run coverage run --parallel-mode -m pytest
```

This will cause CI to fail whenever the lockfile drifts from your declared dependencies, ensuring updates are deliberate.
20-20: Prefer allowlisted WebFetch domains over broad WebSearch. WebSearch broadens exfiltration surface and introduces non-deterministic behavior. Either replace it with explicit WebFetch rules for only the domains you trust, or if you must keep WebSearch, lock down its default approval mode to ensure you’re prompted before any search.
• Option A — Replace WebSearch with explicit WebFetch entries (add alongside your existing allow rules):
"permissions": { "allow": [ "WebFetch(domain:docs.github.com)", "WebFetch(domain:docs.astral.sh)", "WebFetch(domain:alpaca.markets)", "WebFetch(domain:docs.aws.amazon.com)" ] }• Option B — Keep WebSearch but require human approval by adjusting
permissions.defaultModein.claude/settings.local.json. Valid modes are:
– `default` – prompts on first use of each tool (standard behavior) (docs.anthropic.com)
– `acceptEdits` – automatically accepts file-edit permissions for the session (docs.anthropic.com)
– `plan` – analysis-only mode; Claude can inspect code but cannot modify files or execute commands (docs.anthropic.com)
– `bypassPermissions` – skips all permission prompts (use only in isolated/safe environments) (docs.anthropic.com)

Command patterns follow these rules (all case-sensitive):

• `Bash(cmd)` exactly matches the entire shell invocation.
• `Bash(prefix:*)` matches any command beginning with that prefix (e.g. `Bash(docker build:*)` matches `docker build …`) but does not span chained operators like `&&`. (docs.anthropic.com)

Example to require a prompt before any WebSearch:

```json
"permissions": {
  "allow": [
    "WebSearch"
  ],
  "defaultMode": "default"
}
```

.gitignore (1)
18-18: Good call ignoring the private key; also ignore the pub/known_hosts counterparts. Swarm key material often generates a `.pub` and SSH writes `known_hosts`. Prevent accidental commits.
Apply this diff:
```diff
 infrastructure/swarm.pem
+infrastructure/swarm.pem.pub
+infrastructure/known_hosts
+infrastructure/.ssh/
```

infrastructure/prometheus.yml (2)
1-3: LGTM; consider setting a scrape_timeout to bound slow targets. A small timeout improves scheduler behavior when a target hangs.
Apply this diff:
```diff
 global:
   scrape_interval: 15s
   evaluation_interval: 15s
+  scrape_timeout: 10s
```
10-16: Optional: future-proof service discovery. Static targets work in Swarm if service VIPs resolve, but DNS SD avoids hardcoding. If you later expose exporters with DNS names, consider dns_sd_configs or file_sd_configs. No change required now.
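If you do go that route later, a sketch of what DNS SD could look like — the `tasks.node-exporter` service name and port are assumptions based on Swarm's per-task DNS, not the current stack:

```yaml
scrape_configs:
  - job_name: node-exporter
    dns_sd_configs:
      - names:
          - tasks.node-exporter  # Swarm publishes one A record per running task
        type: A
        port: 9100
```

file_sd_configs works similarly but reads targets from a JSON/YAML file that Prometheus re-reads on change, which is handy when another process manages the target list.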
infrastructure/__main__.py (3)
114-131: Quote the join token to avoid accidental shell interpretation. Defensive quoting of the token avoids edge cases if whitespace/newlines ever sneak in.
Apply this diff:
```diff
 create_cmd = pulumi.Output.all(mgr_ip.ip_address, worker_token).apply(
     lambda vals: (
-        "bash -lc 'for i in {1..120}; do sudo docker info >/dev/null 2>&1 && break || sleep 3; done && "
-        f"sudo docker swarm join --token {vals[1]} {vals[0]}:2377'"
+        "bash -lc 'for i in {1..120}; do sudo docker info >/dev/null 2>&1 && break || sleep 3; done && "
+        f"sudo docker swarm join --token \"{vals[1]}\" {vals[0]}:2377'"
     )
 )
```

Also applies to: 143-149
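The same defensive idea is available on the Python side via the standard library: `shlex.quote` makes the interpolated value safe even if it contains whitespace or shell metacharacters. The token value below is fabricated for illustration:

```python
import shlex

# Hypothetical join token containing a space to show why quoting matters.
token = "SWMTKN-1-abc def"
cmd = f"sudo docker swarm join --token {shlex.quote(token)} 10.0.0.1:2377"
# shlex.quote wraps the token in single quotes, so the remote shell sees one argument
```

This avoids hand-rolled escaped quotes in f-strings, which is easy to get wrong inside an already-quoted `bash -lc '…'` wrapper.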
11-16: Avoid cross-stack name collisions by suffixing resource names with the stack. Lightsail KeyPair and Instance names are region-unique. Appending the Pulumi stack name prevents clashes when deploying multiple stacks.
Apply this diff:
```diff
-ssh_key = tls.PrivateKey("swarm-key", algorithm="RSA", rsa_bits=4096)
-ls_key = aws.lightsail.KeyPair(
-    "swarm-ls-key",
-    name="swarm-ls-key",
+ssh_key = tls.PrivateKey("swarm-key", algorithm="RSA", rsa_bits=4096)
+stack_name = pulumi.get_stack()
+ls_key = aws.lightsail.KeyPair(
+    f"swarm-ls-key-{stack_name}",
+    name=f"swarm-ls-key-{stack_name}",
     public_key=ssh_key.public_key_openssh,
 )
@@
-    inst = aws.lightsail.Instance(
-        name,
-        name=name,
+    inst = aws.lightsail.Instance(
+        f"{name}-{stack_name}",
+        name=f"{name}-{stack_name}",
@@
-    ip = aws.lightsail.StaticIp(f"{name}-ip", name=f"{name}-ip")
+    ip = aws.lightsail.StaticIp(f"{name}-ip-{stack_name}", name=f"{name}-ip-{stack_name}")
@@
-mgr_inst, mgr_ip = mk_instance("swarm-mgr-1", bundle_mgr)
-w1_inst, w1_ip = mk_instance("swarm-wkr-1", bundle_wkr)
-w2_inst, w2_ip = mk_instance("swarm-wkr-2", bundle_wkr)
+mgr_inst, mgr_ip = mk_instance("swarm-mgr-1", bundle_mgr)
+w1_inst, w1_ip = mk_instance("swarm-wkr-1", bundle_wkr)
+w2_inst, w2_ip = mk_instance("swarm-wkr-2", bundle_wkr)
```

Note: Instance logical names passed to mk_instance remain stable; the AWS-side names gain the stack suffix.
Also applies to: 52-61, 91-96, 101-103
1-5: Optional: add typing for return values and role. Improves readability and IDE support; small ergonomics boost.
Apply this diff:
```diff
 import pulumi
 import pulumi_aws as aws
 import pulumi_tls as tls
 from pulumi_command import remote
+from typing import NamedTuple, Literal
@@
-def mk_instance(name: str, bundle_id: str):  # noqa: ANN201
+class InstanceResources(NamedTuple):
+    inst: aws.lightsail.Instance
+    ip: aws.lightsail.StaticIp
+    attach: aws.lightsail.StaticIpAttachment
+
+def mk_instance(name: str, bundle_id: str) -> InstanceResources:
```

.mise.toml (1)
48-48: Yamllint invocation looks good; consider centralizing config. Inline config is fine, but a repo-level .yamllint keeps local IDE/CI consistent and easier to tweak without touching tasks. Not a blocker.
infrastructure/main.nu (1)
46-49: Minor hardening suggestion: prefer ED25519 and avoid accepting new host keys implicitly. You already remove old keys and repopulate via ssh-keyscan. Consider pinning the key type and keeping StrictHostKeyChecking default/“yes” to avoid TOFU surprises. Not blocking given current workflow.
Example tweak:
```
ssh-keyscan -t ed25519 -H $manager_ip | save --append $"($env.HOME)/.ssh/known_hosts"
```

And in the SSH block:
```
StrictHostKeyChecking yes
```

infrastructure/stack.yml (4)
45-52: Enable ACME/TLS and persist certificates; you already declared a traefik_letsencrypt volume. Right now Grafana is HTTP-only and Traefik has no cert resolver. Recommend wiring Let’s Encrypt and switching Grafana to websecure.
Apply this diff in traefik service:
```diff
       - --metrics.prometheus.addServicesLabels=true
+      - --certificatesresolvers.letsencrypt.acme.email=ops@pocketsizefund.com
+      - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
+      - --certificatesresolvers.letsencrypt.acme.httpchallenge=true
+      - --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
@@
     volumes:
-      - /var/run/docker.sock:/var/run/docker.sock:ro
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+      - traefik_letsencrypt:/letsencrypt
```

And in grafana labels (Lines 134-137), prefer TLS:
```diff
-      - traefik.http.routers.grafana.entrypoints=web
+      - traefik.http.routers.grafana.entrypoints=websecure
+      - traefik.http.routers.grafana.tls=true
+      - traefik.http.routers.grafana.tls.certresolver=letsencrypt
```
55-56: Avoid floating tags for Portainer.
`latest` can introduce breaking changes unexpectedly. Pin to a known-good version used in your environment.

Example:

```diff
-    image: portainer/portainer-ce:latest
+    image: portainer/portainer-ce:2.20.3
```

If you prefer, create a version variable and inject via CI.
135-137: Grafana router host is a placeholder; make it configurable.

Hardcoding `grafana.example.com` may rot. Consider an env or compose variable, e.g., `${GRAFANA_HOST}` from a `.env`.

Apply this diff:

```diff
-      - traefik.http.routers.grafana.rule=Host(`grafana.example.com`)
+      - traefik.http.routers.grafana.rule=Host(`${GRAFANA_HOST}`)
```
1-138: Add missing newline at end of file. Yamllint flagged this. Add a trailing newline to keep tooling happy.
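A quick way to guard against this class of issue in a pre-commit hook or CI script, sketched with the standard library (the check itself is illustrative, not existing tooling in this repo):

```python
from pathlib import Path


def ends_with_newline(path: Path) -> bool:
    """Return True if the file's last byte is a newline."""
    data = path.read_bytes()
    return data.endswith(b"\n")
```

Running this over `**/*.yml` and failing on any `False` result catches the problem before yamllint does.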
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`.flox/env/manifest.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (25)
- `.claude/settings.local.json` (1 hunks)
- `.flox/env/manifest.toml` (1 hunks)
- `.gitignore` (1 hunks)
- `.mise.toml` (1 hunks)
- `Dockerfile.tests` (2 hunks)
- `infrastructure/.claude/settings.local.json` (1 hunks)
- `infrastructure/.gitignore` (1 hunks)
- `infrastructure/.python-version` (1 hunks)
- `infrastructure/Pulumi.yaml` (1 hunks)
- `infrastructure/__main__.py` (1 hunks)
- `infrastructure/api.py` (0 hunks)
- `infrastructure/cluster.py` (0 hunks)
- `infrastructure/grafana-dashboard.json` (0 hunks)
- `infrastructure/ingress.py` (0 hunks)
- `infrastructure/keys.py` (0 hunks)
- `infrastructure/main.nu` (1 hunks)
- `infrastructure/monitors.py` (0 hunks)
- `infrastructure/prometheus.yml` (1 hunks)
- `infrastructure/pyproject.toml` (1 hunks)
- `infrastructure/requirements.txt` (1 hunks)
- `infrastructure/services.py` (0 hunks)
- `infrastructure/stack.yml` (1 hunks)
- `infrastructure/tags.py` (0 hunks)
- `infrastructure/upload_grafana_dashboard.nu` (0 hunks)
- `infrastructure/vpc.py` (0 hunks)
💤 Files with no reviewable changes (10)
- infrastructure/tags.py
- infrastructure/keys.py
- infrastructure/ingress.py
- infrastructure/vpc.py
- infrastructure/grafana-dashboard.json
- infrastructure/upload_grafana_dashboard.nu
- infrastructure/services.py
- infrastructure/cluster.py
- infrastructure/monitors.py
- infrastructure/api.py
✅ Files skipped from review due to trivial changes (3)
- infrastructure/.gitignore
- infrastructure/.python-version
- infrastructure/.claude/settings.local.json
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-15T13:54:49.230Z
Learnt from: forstmeier
PR: pocketsizefund/pocketsizefund#613
File: infrastructure/Pulumi.yaml:1-10
Timestamp: 2025-08-15T13:54:49.230Z
Learning: The user forstmeier prefers Pulumi stack naming convention as `pocketsizefund/pocketsizefund/production` to maintain organization/repository/environment mapping pattern for the pocketsizefund project.
Applied to files:
infrastructure/Pulumi.yaml
📚 Learning: 2025-08-15T13:54:45.910Z
Learnt from: forstmeier
PR: pocketsizefund/pocketsizefund#613
File: infrastructure/main.nu:10-13
Timestamp: 2025-08-15T13:54:45.910Z
Learning: The team typically runs Nu shell scripts in infrastructure/main.nu through Mise commands, which provides a controlled execution environment where relative paths like "stack.yml" work reliably.
Applied to files:
- infrastructure/main.nu
- .mise.toml
🪛 YAMLlint (1.37.1)
infrastructure/stack.yml
[error] 138-138: no new line character at the end of file
(new-line-at-end-of-file)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Python tests
🔇 Additional comments (4)
.flox/env/manifest.toml (1)
14-14: ✅ AWS CLI v2 migration verified. The automated scan completed without detecting any legacy v1-only commands:
- No usages of `aws ecr get-login` (v1) were found.
- No commands with `--payload` patterns that typically require `--cli-binary-format raw-in-base64-out` were detected.

All existing scripts appear compatible with `awscli2`. Feel free to proceed with the migration.

infrastructure/Pulumi.yaml (1)
3-6: Confirm `uv` toolchain availability in all environments. We’ve verified the following:
- All expected `pyproject.toml` files are present at the repo root and within each sub-package (`applications/datamanager`, `applications/models`, `applications/portfoliomanager`, `infrastructure`, `libraries/python`), so `uv sync` has the manifests it needs.
- The tests’ Docker image installs `uv` via `COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv` and runs `uv sync --all-packages --all-groups` in `Dockerfile.tests`, ensuring containerized CI builds succeed.

Next steps to ensure smooth operation outside that Docker context:
- Install `uv` in local and non-Docker CI runners.
  Add an installation step to your setup scripts or CI workflows, for example:

  ```shell
  # Install uv globally via pipx
  pipx install uv
  ```

- Verify the lockfile is committed.
  Ensure `uv.lock` exists at the repository root for reproducible installs:

  ```shell
  fd -t f uv.lock || echo "Missing uv.lock – lockfile for reproducible installs"
  ```

- (Optional) Fallback to pip if needed.
  If any environment can’t use `uv`, you can temporarily switch to the default `pip` toolchain:

  ```diff
   runtime:
     name: python
     options:
  -    toolchain: uv
  +    toolchain: pip
  ```

Finally, consider adding a short note to your README or `docs/` summarizing how to install and use `uv` for local development and CI setups.

infrastructure/main.nu (1)
10-13: Invalid flag on docker stack deploy.
`docker stack deploy` doesn’t support `--detach`. This will error out and the stack won’t deploy.

Apply this diff:
```diff
-    docker stack deploy -c stack.yml infrastructure --detach=false
+    docker stack deploy -c stack.yml infrastructure
```

Likely an incorrect or invalid review comment.
infrastructure/stack.yml (1)
25-35: Traefik provider flags are wrong for v3; Docker provider must be used with swarmMode=true.
`providers.swarm.*` no longer applies. With v3, use the Docker provider and enable swarm mode; otherwise Traefik won’t pick up any services.

Apply this diff to the command block:
```diff
-      - --providers.swarm.endpoint=unix:///var/run/docker.sock
-      - --providers.swarm.exposedByDefault=false
+      - --providers.docker.endpoint=unix:///var/run/docker.sock
+      - --providers.docker.swarmMode=true
+      - --providers.docker.exposedByDefault=false
```

Likely an incorrect or invalid review comment.
…hub.com:pocketsizefund/pocketsizefund into 08-20-create_initial_datamanager_helper_classes
I have no idea why stuff that was deleted in
Overview
Changes
- `S3Client` helper class
- `AlpacaClient` helper class

Comments
Probably gonna change a bit but this was basically a slimmed down port of the logic we had in the earlier `datamanager` iteration.

Summary by CodeRabbit
New Features
Tests
Chores