
bunch a stuff #593

Merged
chrisaddy merged 9 commits into master from fix/linting
Jun 6, 2025

Conversation


@chrisaddy chrisaddy commented Jun 6, 2025

This pull request introduces a variety of changes across multiple areas, including infrastructure updates, Dockerfile adjustments, code refactoring, and workflow enhancements. The most significant updates focus on improving cloud infrastructure management, updating dependencies, and enhancing functionality for data fetching and processing.

Infrastructure Updates:

  • Added a new IAM role platform-service-account-owner to grant owner permissions to the platform service account in infrastructure/project.py.
  • Introduced HMAC key generation for DUCKDB_ACCESS_KEY and DUCKDB_SECRET in infrastructure/environment_variables.py, replacing the previous use of secrets from configuration.
  • Exported DATAMANAGER_BASE_URL and HMAC key details (duckdb_access_key, duckdb_secret) to Pulumi outputs for easier access. [1] [2]
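The HMAC key approach described above can be sketched with the pulumi-gcp provider. This is a hypothetical reconstruction based on the summary, not the actual code in infrastructure/environment_variables.py; resource and variable names are assumptions.

```python
import pulumi
import pulumi_gcp as gcp

# Assumed stand-in for the existing platform service account resource.
platform_sa = gcp.serviceaccount.Account("platform", account_id="platform")

# Generate an HMAC key tied to the platform service account so DuckDB can
# authenticate to Google Cloud Storage without statically configured secrets.
duckdb_hmac_key = gcp.storage.HmacKey(
    "duckdb-hmac-key",
    service_account_email=platform_sa.email,
)

# Expose the key pair as stack outputs; wrapping the secret in
# pulumi.Output.secret makes Pulumi redact it in plain-text CLI output.
pulumi.export("duckdb_access_key", duckdb_hmac_key.access_id)
pulumi.export("duckdb_secret", pulumi.Output.secret(duckdb_hmac_key.secret))
```

Generating the key at deploy time keeps the credential out of stack configuration files; consumers read it from stack outputs instead.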

Dockerfile Adjustments:

  • Updated Python base image to 3.12.10 in application/datamanager/Dockerfile and application/positionmanager/Dockerfile for consistency and compatibility. [1] [2]
  • Adjusted the ENTRYPOINT in application/datamanager/Dockerfile to simplify the module path for uvicorn.

Code Refactoring:

  • Replaced requests.codes.not_found and requests.codes.ok with dictionary-style access (requests.codes["not_found"], requests.codes["ok"]) in application/positionmanager/src/positionmanager/clients.py for improved readability.
  • Removed the deprecated backfill_datamanager.py workflow, replacing it with a new fetch_data.py workflow that uses httpx, polars, and Google Cloud Run APIs for fetching and processing data. [1] [2]
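The two access styles on requests.codes resolve to the same integers, so the refactor is behavior-preserving. A minimal demonstration (is_missing is an illustrative helper, not from the PR):

```python
import requests

# requests.codes maps HTTP status names to integer codes; dictionary-style
# access behaves the same as attribute access.
assert requests.codes["not_found"] == requests.codes.not_found == 404
assert requests.codes["ok"] == requests.codes.ok == 200

# Typical usage mirroring the refactored client checks:
def is_missing(status_code: int) -> bool:
    return status_code == requests.codes["not_found"]

print(is_missing(404))  # → True
```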

Workflow Enhancements:

  • Added a new fetch_dates task in workflows/fetch_data.py for fetching data from the datamanager service using Google Cloud Run and Polars for data processing.
  • Updated dependencies in workflows/pyproject.toml to include google-auth, google-cloud-run, polars, and pyarrow for the new workflow.

Miscellaneous:

  • Updated the version in infrastructure/pyproject.toml to 20250606.4 and in workflows/pyproject.toml to 0.1.0 for version tracking. [1] [2]
  • Commented out the tool.ruff.lint section in pyproject.toml to disable specific linting rules temporarily.

Summary by CodeRabbit

  • New Features

    • Introduced a new workflow task to fetch equity bar data directly from the datamanager service, supporting authenticated requests and returning results as Polars data structures.
  • Improvements

    • Enhanced infrastructure health checks with more robust assertions and clearer output.
    • Updated environment variable handling for DuckDB credentials to use dynamically generated Google Cloud Storage HMAC keys.
    • Exported key environment variables and service URLs as stack outputs for easier access.
    • Elevated permissions for the platform service account and production data bucket for improved management capabilities.
    • Improved Dockerfile configurations for reliability and compatibility, including precise Python versioning and environment variable setup.
    • Added owner-level permissions to the platform service account.
    • Updated infrastructure task to run an additional health check command.
  • Dependency Updates

    • Added new dependencies for Google Cloud integration, data processing, and workflow management.
  • Bug Fixes

    • Corrected service health check logic and improved validation steps.
  • Chores

    • Removed the backfill workflow for equity bar data.
    • Updated project version numbers and streamlined package management.
    • Updated environment package configurations and added activation hooks.

Copilot AI review requested due to automatic review settings June 6, 2025 20:01
@chrisaddy chrisaddy self-assigned this Jun 6, 2025

coderabbitai Bot commented Jun 6, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This update introduces a new data-fetching workflow using Google Cloud Run and Polars, updates infrastructure scripts and Dockerfiles, modifies environment variable management, and adjusts IAM roles and permissions. Several dependency and configuration files are updated, and a previous Flyte workflow for backfilling data is removed.

Changes

Files/Group Change Summary
.flox/env/manifest.toml Removed fd package, added google-cloud-sdk, and introduced an empty [hook] section.
.mise.toml Added nu ping.nu command to the infrastructure:up task's run script.
application/datamanager/Dockerfile Set PYTHONPATH and simplified ENTRYPOINT import path.
application/positionmanager/Dockerfile Updated base image from python:3.12 to python:3.12.10.
infrastructure/main.py Exported DATAMANAGER_BASE_URL as a Pulumi stack output.
infrastructure/buckets.py Renamed IAM member and updated role from object creator to object admin.
infrastructure/environment_variables.py Replaced config secrets with a GCS HMAC key for DuckDB credentials; exported as stack outputs.
infrastructure/ping.nu Enhanced health checks, added assertions, updated service key names, changed POST date, and improved output.
infrastructure/project.py Added IAMMember resource granting owner role to the platform service account.
infrastructure/pyproject.toml Updated project version string.
workflows/backfill_datamanager.py Deleted Flyte workflow and task for backfilling equity bar data.
workflows/fetch_data.py Added new module with a Flyte task to fetch equity bar data using Cloud Run and Polars.
workflows/pyproject.toml Added dependencies: google-auth, google-cloud-run, polars, pyarrow, and unionai.

Sequence Diagram(s)

sequenceDiagram
    participant Workflow as Flyte Workflow
    participant GCloud as gcloud CLI
    participant CloudRun as Google Cloud Run
    participant Datamanager as Datamanager Service
    participant Polars as Polars/pyarrow

    Workflow->>CloudRun: List services in project/region
    CloudRun-->>Workflow: Return service URLs
    Workflow->>GCloud: Get identity token for Datamanager
    GCloud-->>Workflow: Return identity token
    Workflow->>Datamanager: HTTP GET /equity-bars (with token, date range)
    Datamanager-->>Workflow: Return Arrow IPC stream
    Workflow->>Polars: Read Arrow data
    Polars-->>Workflow: Return DataFrame/Series

Possibly related PRs

  • bunch a stuff #593: Makes identical changes to .flox/env/manifest.toml by removing the "fd" package, adding "google-cloud-sdk," and introducing a new [hook] section.
  • Rebuild full application #485: Related infrastructure and environment changes complement the comprehensive infrastructure rebuild and cleanup in that PR.

Suggested labels

application, python

Suggested reviewers

  • forstmeier

Poem

In code we hop, from cloud to sky,
Fetching data as days go by.
Ducks and buckets, keys anew,
Polars and arrows, slicing through.
The workflow’s fresh, the backfill’s gone—
A rabbit’s work is never done!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c4de5a and 3abf921.

📒 Files selected for processing (3)
  • infrastructure/__main__.py (2 hunks)
  • infrastructure/environment_variables.py (2 hunks)
  • workflows/fetch_data.py (1 hunks)


Copilot AI left a comment


Pull Request Overview

This pull request introduces a variety of improvements across infrastructure, Docker configurations, and data management workflows. Key changes include new IAM role additions with HMAC key generation for DUCKDB access, Dockerfile and dependency updates for consistency and enhanced functionality, and code refactoring to remove deprecated workflows and improve error handling.

  • Updated cloud infrastructure management and Pulumi outputs
  • Enhanced data fetching and processing via a new workflow using httpx, polars, and Cloud Run APIs
  • Improved Dockerfile configurations and dependency updates across multiple services

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
workflows/pyproject.toml Added new dependencies for improved workflow functionality
workflows/fetch_data.py Introduced a new data fetching workflow using httpx and polars
workflows/backfill_datamanager.py Removed the deprecated backfill workflow
pyproject.toml Commented out specific linting rules temporarily
infrastructure/pyproject.toml Updated project version
infrastructure/project.py Added a new IAM member for platform service account owner
infrastructure/ping.nu Updated health check script and assertions
infrastructure/environment_variables.py Switched DUCKDB keys to use generated HMAC keys and export outputs
infrastructure/buckets.py Adjusted bucket IAM member role and identifier
infrastructure/main.py Exported DATAMANAGER_BASE_URL for easier access
application/positionmanager/clients.py Refactored status code handling with dictionary-style access
application/positionmanager/Dockerfile Updated Python base image version
application/datamanager/Dockerfile Set PYTHONPATH and simplified the uvicorn module path for ENTRYPOINT
.mise.toml Updated run command to invoke the new ping test
.flox/env/manifest.toml Added google-cloud-sdk package to the manifest
Comments suppressed due to low confidence (1)

infrastructure/ping.nu:41

  • [nitpick] The variable 'datamanager_fetch' is used to hold the HTTP status code from the GET request. Renaming it (e.g., to 'fetch_status') would improve clarity regarding its purpose.
assert equal $datamanager_fetch 200

Comment thread infrastructure/buckets.py Outdated
@graphite-app graphite-app Bot requested a review from forstmeier June 6, 2025 20:03

graphite-app Bot commented Jun 6, 2025

Graphite Automations

"Assign author to pull request" took an action on this PR • (06/06/25)

1 assignee was added to this PR based on John Forstmeier's automation.


@forstmeier forstmeier left a comment


Copilot caught a typo, but otherwise this looks good.

chrisaddy and others added 2 commits June 6, 2025 16:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (4)
.flox/env/manifest.toml (1)

16-19: Empty on-activate hook is a no-op.
If activation commands aren’t needed yet, consider removing this block or documenting its future purpose to avoid confusion.

infrastructure/ping.nu (1)

24-24: Updated test date.

The date change from "2025-01-04" to "2025-01-07" appears to be for testing purposes. Ensure this date is valid for your test data.

workflows/fetch_data.py (2)

4-4: Consider the import style suggestion from static analysis.

The static analysis tool suggests using from pyarrow import ipc instead of import pyarrow.ipc as ipc for consistency.

-import pyarrow.ipc as ipc
+from pyarrow import ipc
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 4-4: Use 'from pyarrow import ipc' instead

(R0402)


22-52: Consider adding comprehensive error handling and logging.

The function would benefit from more robust error handling to provide better debugging information when issues occur.

Consider wrapping the entire function logic in try-except blocks and adding logging for better observability:

import logging

@task
def fetch_dates(start_date: str, end_date: str) -> pl.DataFrame:
    try:
        logging.info(f"Fetching data for date range: {start_date} to {end_date}")
        # ... existing logic with error handling improvements
        logging.info(f"Successfully fetched {len(data)} records")
        return data
    except Exception as e:
        logging.error(f"Failed to fetch data: {str(e)}")
        raise
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6908d3c and 074a73b.

⛔ Files ignored due to path filters (2)
  • .flox/env/manifest.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .flox/env/manifest.toml (1 hunks)
  • .mise.toml (1 hunks)
  • application/datamanager/Dockerfile (2 hunks)
  • application/positionmanager/Dockerfile (2 hunks)
  • application/positionmanager/src/positionmanager/clients.py (1 hunks)
  • infrastructure/__main__.py (2 hunks)
  • infrastructure/buckets.py (1 hunks)
  • infrastructure/environment_variables.py (2 hunks)
  • infrastructure/ping.nu (1 hunks)
  • infrastructure/project.py (2 hunks)
  • infrastructure/pyproject.toml (1 hunks)
  • pyproject.toml (1 hunks)
  • workflows/backfill_datamanager.py (0 hunks)
  • workflows/fetch_data.py (1 hunks)
  • workflows/pyproject.toml (1 hunks)
💤 Files with no reviewable changes (1)
  • workflows/backfill_datamanager.py
🧰 Additional context used
🪛 Pylint (3.3.7)
workflows/fetch_data.py

[refactor] 4-4: Use 'from pyarrow import ipc' instead

(R0402)


[error] 7-7: No name 'run_v2' in module 'google.cloud'

(E0611)

🔇 Additional comments (20)
application/positionmanager/Dockerfile (2)

1-1: Pin to a specific Python patch version.
Upgrading from the generic python:3.12 to python:3.12.10 ensures consistent builds across environments.


17-17: The ENTRYPOINT remains functionally unchanged; no review comments needed.

application/positionmanager/src/positionmanager/clients.py (2)

92-94: Conditional 404 handling is clear.
Switching to requests.codes["not_found"] maintains behavior and improves consistency with other HTTP status checks.


95-98: Non-OK response error handling is appropriate.
Raising a custom HTTPError with the response text and status code is consistent and informative.

.flox/env/manifest.toml (1)

13-13: Add Google Cloud SDK to environment.
Including google-cloud-sdk aligns with the new GCP workflows and CLI requirements.

infrastructure/pyproject.toml (1)

3-3: Bump infrastructure version to 20250606.4.
This aligns the project version with the latest Pulumi exports and IAM changes.

infrastructure/__main__.py (2)

4-4: Import export from Pulumi.
Bringing in export is necessary to expose stack outputs; this import correctly enables the subsequent call.


74-74: Verify stack output duplication.
Confirm whether create_environment_variable("DATAMANAGER_BASE_URL",…) already exports this value. A second export("DATAMANAGER_BASE_URL",…) may produce duplicate outputs or conflicts.

workflows/pyproject.toml (1)

8-14:

Details

❓ Verification inconclusive

Verify dependency versions are current and secure.

The new dependencies support the Google Cloud Run integration and data processing workflow. The additions look appropriate for the stated functionality.

Please verify these dependency versions are current and free from known vulnerabilities:


🌐 Web query:

Are the following Python package versions current and secure: google-auth 2.40.2, google-cloud-run 0.10.18, polars 1.30.0, pyarrow 20.0.0, unionai 0.1.183?

💡 Result:

Based on the latest available information, the following Python package versions are not the most current:

  • google-auth 2.40.2: The latest version is 2.38.0, released on January 23, 2025. (pypi.org)

  • google-cloud-run 0.10.18: The latest version is 0.10.10. (pypistats.org)

  • pyarrow 20.0.0: The latest version is 16.1.0. (cloud.google.com)

Regarding polars 1.30.0 and unionai 0.1.183, I couldn't find specific information on their latest versions.

Using outdated versions may expose your project to known vulnerabilities and compatibility issues. It's recommended to update these packages to their latest versions to ensure security and functionality.



Verify and update pinned dependency versions.

The added dependencies align with the new workflow, but please confirm each minimum version matches the latest stable release on PyPI to avoid security or compatibility issues:

  • google-auth (>=2.40.2)
  • google-cloud-run (>=0.10.18)
  • polars (>=1.30.0)
  • pyarrow (>=20.0.0)
  • unionai (>=0.1.183)

If any of these are outdated, bump to the newest compatible version.

infrastructure/buckets.py (1)

18-18:

Details

✅ Verification successful

Verify the permission elevation is intentional and necessary.

The role was elevated from roles/storage.objectCreator (create-only) to roles/storage.objectAdmin (full object management including delete, update, and administration). This significantly expands permissions.

Please confirm this permission elevation is necessary for the new functionality and follows the principle of least privilege.


🏁 Script executed:

#!/bin/bash
# Description: Check if the elevated permissions are used in the codebase
# Expected: Find references to delete/update operations that require objectAdmin role

echo "Searching for Google Cloud Storage operations that require objectAdmin permissions..."
rg -i "delete|remove|update.*bucket|bucket.*delete" --type py
rg "storage\..*\.(delete|remove|update)" --type py

Length of output: 2176


Confirmed: roles/storage.objectAdmin required for object deletion

  • In application/datamanager/src/datamanager/main.py, the code calls bucket.delete_blobs(blobs), which requires delete permissions not granted by roles/storage.objectCreator.
  • No other bucket delete/update operations were found that would contradict this need.

Elevation to roles/storage.objectAdmin is intentional and necessary for the new delete functionality.

application/datamanager/Dockerfile (2)

5-5: PYTHONPATH configuration looks correct.

Setting PYTHONPATH=/app/src properly configures the Python module search path for the container environment.


17-17:

Details

✅ Verification successful

Verify the simplified module path is correct.

The ENTRYPOINT changed from datamanager.src.datamanager.main:application to datamanager.main:application. Ensure this simplified path correctly references the application module after the recent refactoring.


🏁 Script executed:

#!/bin/bash
# Description: Verify the datamanager module structure supports the new import path
# Expected: Find main.py with application object in the correct location

echo "Checking datamanager module structure..."
fd -t f "main.py" application/datamanager/
echo ""
echo "Looking for application object in main.py files..."
rg -A 5 "application.*=" application/datamanager/ || echo "No application object found"

Length of output: 868


ENTRYPOINT module path is valid

The main.py file under application/datamanager/src/datamanager/ defines the application instance, so using datamanager.main:application with --app-dir src correctly locates and loads your FastAPI app. No changes needed here.

infrastructure/environment_variables.py (2)

1-5: Import additions look correct.

The new imports for HMAC key management and project dependencies are appropriate for the updated functionality.


36-48: HMAC key generation implementation is sound.

The approach of generating HMAC keys instead of using static secrets is a good security practice. The implementation correctly creates the key using the platform service account and exports the values as stack outputs for external access.

Note that this depends on the platform service account having sufficient permissions, which is addressed by the owner role assignment in infrastructure/project.py.

infrastructure/ping.nu (5)

1-1: Good addition of assertion module.

Importing the std/assert module enables proper validation of the script's operations.


9-9: Improved service name extraction.

The change from service to name key provides better clarity in the service object structure.


14-18: Enhanced health check loop with better feedback.

The updated loop provides clearer output by explicitly accessing service properties and confirming successful health checks.


29-29: Good addition of data validation.

The assertion ensures the datamanager returns adequate data (≥100 elements), which is important for validating service functionality.


31-41:

Details

✅ Verification successful

Verify manual URL construction logic.

The manual URL construction for the GET request with explicit scheme, host, path, and params is more verbose but explicit. Ensure the URL joining and parameter handling work correctly with nushell's url join command.


🏁 Script executed:

#!/bin/bash
# Description: Check if nushell url join command works as expected with the structure used
# Expected: Confirm the URL construction pattern is valid

echo "Checking nushell documentation for url join usage..."

Length of output: 114


🌐 Web query:

How does nushell's url join command work with scheme, host, path, and params?

💡 Result:

How Nushell's url join Command Works

Nushell's url join command constructs a URL string from a structured record by combining fields such as scheme, host, path, and query parameters (params or query). This allows for programmatic and composable URL building directly from Nushell's pipeline.

Core Fields Used:

  • scheme: The URL scheme (e.g., "http", "https").
  • username and password: For user authentication (optional).
  • host: The domain or IP address.
  • port: The port number (optional).
  • path: The path after the host (can be a string; leading slash is optional).
  • query or params: The query string. You can provide this either as a URL-encoded string via query, or a record/list/table via params.
  • fragment: The "hash" part after # (optional).

Example Usages

Below are common patterns and behaviors for url join:

Minimal Example

{
  scheme: "http",
  host: "www.example.com"
} | url join
# Output: "http://www.example.com"

Adding Path and Params

{
  scheme: "http",
  host: "www.example.com",
  path: "/search",
  params: { q: "nushell", sort: "asc" }
} | url join
# Output: "http://www.example.com/search?q=nushell&sort=asc"
  • The params field, if set as a record, creates the query string with proper percent-encoding[1][5].

Combining Username, Password, and Port

{
  scheme: "http",
  username: "user",
  password: "pwd",
  host: "example.com",
  port: "8080"
} | url join
# Output: "http://user:pwd@example.com:8080"
  • Username and password are included before the host, and port is appended to the host[1].

Path and Fragment

{
  scheme: "http",
  host: "www.example.com",
  path: "page",
  fragment: "section1"
} | url join
# Output: "http://www.example.com/page#section1"
  • The fragment appears at the end, separated by #[1].

Handling Query vs Params

Both params and query are accepted for the query string. Their usage:

  • query: A pre-encoded string (e.g., "a=1&b=2").
  • params: A record, list, or even a table that will be encoded by the command. For lists or tables, multiple parameters with the same key are supported:
{
  scheme: "http",
  host: "example.com",
  params: { a: ["one", "two"], b: "three" }
} | url join
# Output: "http://example.com?a=one&a=two&b=three"

or with a table of key-value pairs:

{
  scheme: "http",
  host: "localhost",
  params: [
    ["key", "value"],
    ["par_1", "aaa"],
    ["par_2", "bbb"],
    ["par_1", "ccc"],
  ]
} | url join
# Output: "http://localhost?par_1=aaa&par_2=bbb&par_1=ccc"

[1][5]

Summary Table

Field Function
scheme URL scheme (e.g., http, https)
username Username (optional, before '@')
password Password (optional, before '@')
host Domain or IP
port Port number (optional)
path Path after host (optional)
query Prebuilt query string (optional)
params Record/list/table for query params
fragment Fragment after '#' (optional)

Key Points

  • url join expects a record where each field corresponds to a logical URL component.
  • Both params (structured record, list, table) and query (raw string) can be used for the query string; params is generally preferred for programmatic construction[1][5].
  • If both params and query are present, params is encoded and used.
  • Supports all parts of standard URLs including authentication, port, and fragments.

For more details and examples, see the official documentation[1][5][2].



URL construction with url join is correct

Nushell’s url join command will correctly assemble the URL from the structured record—combining scheme, host, path, and the params record into a properly encoded query string. The code in infrastructure/ping.nu (lines 31–41) is valid as written; no further changes are needed.

workflows/fetch_data.py (1)

11-19: LGTM! Good error handling in the subprocess call.

The function properly uses check=True to raise exceptions on command failure and strips the output to remove whitespace.

Comment thread pyproject.toml Outdated
Comment thread infrastructure/buckets.py Outdated
Comment thread .mise.toml
Comment thread infrastructure/project.py
Comment thread workflows/fetch_data.py
Comment thread workflows/fetch_data.py
Comment thread workflows/fetch_data.py Outdated
Comment thread workflows/fetch_data.py
Comment thread workflows/fetch_data.py Outdated
chrisaddy and others added 5 commits June 6, 2025 16:10
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
workflows/fetch_data.py (1)

1-9: ⚠️ Potential issue

Add missing os import to fix runtime errors.

The code uses os.getenv() on lines 25-26 but the os module is not imported, which will cause a NameError at runtime.

Apply this diff to fix the import:

 import subprocess
+import os
 import httpx
 import polars as pl

Additionally, consider addressing the import sorting and style issues flagged by static analysis tools for better code consistency.

🧰 Tools
🪛 GitHub Check: Run Python quality checks

[failure] 4-4: Ruff (PLR0402)
workflows/fetch_data.py:4:8: PLR0402 Use from pyarrow import ipc in lieu of alias


[failure] 1-8: Ruff (I001)
workflows/fetch_data.py:1:1: I001 Import block is un-sorted or un-formatted

🪛 Pylint (3.3.7)

[refactor] 4-4: Use 'from pyarrow import ipc' instead

(R0402)


[error] 7-7: No name 'run_v2' in module 'google.cloud'

(E0611)

🧹 Nitpick comments (1)
workflows/fetch_data.py (1)

11-19: Security flags are acceptable for this use case, but consider improvements.

Static analysis tools flag the subprocess usage as a security concern, but calling gcloud auth print-identity-token is the intended and secure way to obtain identity tokens for Google Cloud services in this context.

Consider this optional improvement for better subprocess handling:

 def get_identity_token() -> str:
     result = subprocess.run(
         ["gcloud", "auth", "print-identity-token"],
-        stdout=subprocess.PIPE,
-        stderr=subprocess.PIPE,
+        capture_output=True,
         check=True,
         text=True,
     )
     return result.stdout.strip()
🧰 Tools
🪛 GitHub Check: Run Python quality checks

[failure] 13-13: Ruff (S607)
workflows/fetch_data.py:13:9: S607 Starting a process with a partial executable path


[failure] 12-12: Ruff (S603)
workflows/fetch_data.py:12:14: S603 subprocess call: check for execution of untrusted input


[failure] 12-18: Ruff (UP022)
workflows/fetch_data.py:12:14: UP022 Prefer capture_output over sending stdout and stderr to PIPE

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 074a73b and 5c4de5a.

📒 Files selected for processing (2)
  • infrastructure/buckets.py (1 hunks)
  • workflows/fetch_data.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • infrastructure/buckets.py
🧰 Additional context used
🪛 Ruff (0.11.9)
workflows/fetch_data.py

25-25: Undefined name os

(F821)


26-26: Undefined name os

(F821)

🪛 GitHub Check: Run Python quality checks
workflows/fetch_data.py

[failure] 39-41: Ruff (TRY003)
workflows/fetch_data.py:39:15: TRY003 Avoid specifying long messages outside the exception class


[failure] 26-26: Ruff (F821)
workflows/fetch_data.py:26:14: F821 Undefined name os


[failure] 25-25: Ruff (F821)
workflows/fetch_data.py:25:15: F821 Undefined name os


[failure] 13-13: Ruff (S607)
workflows/fetch_data.py:13:9: S607 Starting a process with a partial executable path


[failure] 12-12: Ruff (S603)
workflows/fetch_data.py:12:14: S603 subprocess call: check for execution of untrusted input


[failure] 12-18: Ruff (UP022)
workflows/fetch_data.py:12:14: UP022 Prefer capture_output over sending stdout and stderr to PIPE


[failure] 4-4: Ruff (PLR0402)
workflows/fetch_data.py:4:8: PLR0402 Use from pyarrow import ipc in lieu of alias


[failure] 1-8: Ruff (I001)
workflows/fetch_data.py:1:1: I001 Import block is un-sorted or un-formatted

🪛 Pylint (3.3.7)
workflows/fetch_data.py

[refactor] 4-4: Use 'from pyarrow import ipc' instead

(R0402)


[error] 7-7: No name 'run_v2' in module 'google.cloud'

(E0611)


[error] 25-25: Undefined variable 'os'

(E0602)


[error] 26-26: Undefined variable 'os'

(E0602)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Python tests
🔇 Additional comments (1)
workflows/fetch_data.py (1)

22-59: Good implementation with proper error handling.

The function correctly implements the data fetching workflow with appropriate error handling for:

  • Missing datamanager service (lines 38-39)
  • HTTP request failures (line 49)
  • Data conversion failures (lines 56-57)

The configurable project/region approach using environment variables is well implemented once the os import is added.

🧰 Tools
🪛 Ruff (0.11.9)

25-25: Undefined name os

(F821)


26-26: Undefined name os

(F821)

🪛 GitHub Check: Run Python quality checks

[failure] 39-41: Ruff (TRY003)
workflows/fetch_data.py:39:15: TRY003 Avoid specifying long messages outside the exception class


[failure] 26-26: Ruff (F821)
workflows/fetch_data.py:26:14: F821 Undefined name os


[failure] 25-25: Ruff (F821)
workflows/fetch_data.py:25:15: F821 Undefined name os

🪛 Pylint (3.3.7)

[error] 25-25: Undefined variable 'os'

(E0602)


[error] 26-26: Undefined variable 'os'

(E0602)

@chrisaddy chrisaddy merged commit cc2d585 into master Jun 6, 2025
2 checks passed
@chrisaddy chrisaddy deleted the fix/linting branch June 6, 2025 20:31
