@muhammad-ali-e
Contributor

What

  • Fixed BigQuery database insertion failure caused by float precision issues in PARSE_JSON
  • Added BigQuery-specific float sanitization with IEEE 754 double precision safe zone (15 significant figures)
  • Consolidated duplicate float sanitization logic into shared utilities

Why

  • BigQuery's PARSE_JSON() is stricter than Python's json.loads() and requires floats that can "round-trip" through string representation
  • Unix timestamps with microsecond precision (e.g., 1760509016.282637) have 16 significant figures, exceeding the IEEE 754 double-precision safe zone (15 digits)
  • This caused insertion errors such as: Invalid input: Input number: 1760509016.282637 cannot round-trip through string representation
  • Multiple duplicate implementations of float sanitization existed across the codebase, causing maintenance issues
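
As a rough illustration of the digit counts quoted above, the sketch below counts the significant digits in Python's shortest round-trip `repr` of a float. The helper name `significant_digits` is hypothetical and not part of this PR. It only shows why the example timestamp lands at 16 digits while its rounded form lands at 15, and it makes no claim about how BigQuery performs its own check.

```python
from decimal import Decimal


def significant_digits(value: float) -> int:
    """Count significant digits in Python's shortest repr of a finite float.

    Hypothetical helper for illustration only; not part of the PR.
    """
    # repr() yields the shortest decimal string that parses back to the same float.
    digits = Decimal(repr(value)).normalize().as_tuple().digits
    return len(digits)


timestamp = 1760509016.282637
print(significant_digits(timestamp))         # 16
print(significant_digits(1760509016.28264))  # 15
print(float(repr(timestamp)) == timestamp)   # True: Python itself round-trips the value
```

The last line reflects the point made in the first bullet: Python accepts the 16-digit value without complaint, so the failure only surfaces once the value reaches PARSE_JSON.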

How

  • Created common sanitize_floats_for_database() utility in unstract/connectors/databases/utils.py that handles NaN/Inf for all databases
  • Added BigQuery-specific _sanitize_for_bigquery() method that limits total significant figures to 15 using magnitude-based calculation: safe_decimals = max(0, 15 - magnitude)
  • Updated BigQuery connector to use database-specific sanitization at 3 call sites:
    • JSON columns with PARSE_JSON
    • STRING columns with JSON serialization
    • Parsed JSON values from strings
  • Removed duplicate _sanitize_floats_for_database() method from workers/shared/infrastructure/database/utils.py and updated 4 call sites to use common utility
  • Applied sanitization before json.dumps() to ensure clean binary representation for BigQuery's PARSE_JSON
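
The magnitude-based rounding described above might look roughly like the following. This is a minimal sketch reconstructed from the description and the review comments (magnitude = floor(log10(abs(x))) + 1, fixed-decimal formatting), not the merged code; the standalone function name `sanitize_for_bigquery` and the constant `MAX_SIG_FIGS` are assumptions made for illustration.

```python
import math
from typing import Any

MAX_SIG_FIGS = 15  # the "safe zone" named in the PR description


def sanitize_for_bigquery(data: Any) -> Any:
    """Illustrative stand-in for the PR's _sanitize_for_bigquery(); not the merged code."""
    if isinstance(data, float):
        if math.isnan(data) or math.isinf(data):
            return None  # NaN/Inf have no valid JSON representation
        if data == 0:
            return 0.0
        # Number of digits before the decimal point (zero or negative for |x| < 1).
        magnitude = math.floor(math.log10(abs(data))) + 1
        safe_decimals = max(0, MAX_SIG_FIGS - magnitude)
        # Round by formatting to a fixed number of decimals, then parse back.
        return float(f"{data:.{safe_decimals}f}")
    if isinstance(data, dict):
        return {key: sanitize_for_bigquery(value) for key, value in data.items()}
    if isinstance(data, list):
        return [sanitize_for_bigquery(item) for item in data]
    return data  # ints, strings, bools and None pass through unchanged


row = {"ts": 1760509016.282637, "cost": 0.001228, "bad": float("nan")}
print(sanitize_for_bigquery(row))
# {'ts': 1760509016.28264, 'cost': 0.001228, 'bad': None}
```

For the timestamp, magnitude is 10, so 5 decimals are kept; for the cost value, magnitude is -2, so 17 decimals are kept and the value is unchanged.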

Can this PR break any existing features. If yes, please list possible items. If no, please explain why.

No breaking changes expected:

  • Changes are defensive and only affect BigQuery connector behavior
  • For BigQuery: Floats are limited to 15 significant figures, which is within IEEE 754 double precision guarantees
  • For large numbers (timestamps): Slightly reduces decimal precision (1760509016.282637 → 1760509016.28264), but maintains sufficient accuracy for millisecond-level timing
  • For small numbers (costs): Full precision preserved (0.001228 remains 0.001228)
  • Other databases (PostgreSQL, MySQL, Snowflake) unaffected - they only receive minimal NaN/Inf sanitization without precision limiting
  • Worker utils now use the shared common utility instead of a duplicate implementation - behavior remains functionally identical

Database Migrations

  • None required

Env Config

  • None required

Relevant Docs

  • IEEE 754 Double Precision: 15-17 decimal digits of precision
  • BigQuery PARSE_JSON documentation
  • Related issue: UN-2882

Related Issues or PRs

  • Fixes UN-2882

Dependencies Versions

  • No dependency version changes

Notes on Testing

  • Tested with workflow execution that inserts metadata containing timestamps to BigQuery destination
  • Verified that timestamps with 16+ significant figures are correctly sanitized to 15 figures
  • Confirmed no insertion errors during BigQuery PARSE_JSON operations
  • Worker image rebuilt and deployed successfully
  • All pre-commit hooks passed

Screenshots

N/A - Backend fix with no UI changes

Checklist

I have read and understood the Contribution Guidelines.

🤖 Generated with Claude Code

…tadata serialization

- Added BigQuery-specific float sanitization with IEEE 754 double precision safe zone
- Consolidated duplicate float sanitization logic into shared utilities
- Fixed insertion errors caused by floats with >15 significant figures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Oct 15, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of special float values in database writes, preventing errors and malformed JSON.
    • Ensures NaN/Infinity are converted to null and large/small numbers round-trip accurately (up to 15 significant digits).
    • More reliable serialization for JSON and string fields in BigQuery.
  • Refactor

    • Centralized float sanitization into a shared utility used across components for consistent behavior and easier maintenance.

Walkthrough

Adds a shared float-sanitization utility and integrates it into BigQuery JSON/STRING handling and worker database utilities; BigQuery now sanitizes nested dicts/lists and floats (NaN/±Inf → None, large floats rounded) before JSON serialization and when parsing SQL values.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| BigQuery connector float/JSON sanitization<br>unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py | Adds BigQuery._sanitize_for_bigquery(data: Any) -> Any (recursive: NaN/±Inf → None, round large floats to 15 significant digits) and uses it before json.dumps for JSON and STRING column paths in execute_query; also sanitizes parsed JSON in get_sql_values_for_query. |
| Shared DB float sanitizer (new module)<br>unstract/connectors/src/unstract/connectors/databases/utils.py | New public sanitize_floats_for_database(data: Any) -> Any that recursively replaces NaN/±Inf with None across dicts/lists; intended as a shared helper for connectors and workers. |
| Workers refactor to shared sanitizer<br>workers/shared/infrastructure/database/utils.py | Removes local _sanitize_floats_for_database implementation; imports and uses sanitize_floats_for_database from the new connectors utils and updates call sites accordingly. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant BigQuery as BigQuery.execute_query
  participant Sanit as _sanitize_for_bigquery / sanitize_floats_for_database
  participant JSON as json.dumps
  participant DB as BigQuery Service

  Caller->>BigQuery: execute_query(values including dict/list/JSON)
  rect rgba(220,235,255,0.4)
    BigQuery->>Sanit: sanitize(value)\n- recurse dict/list\n- NaN/±Inf -> None\n- round to 15 sig figs
    Sanit-->>BigQuery: sanitized_value
    alt sanitized_value is not None
      BigQuery->>JSON: json.dumps(sanitized_value)
      JSON-->>BigQuery: json_string
      BigQuery->>DB: send query with json_string
    else sanitized_value is None
      BigQuery->>DB: send query with NULL representation
    end
  end
  DB-->>BigQuery: query result
  BigQuery-->>Caller: result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title clearly references the related issue, indicates that it is a fix, and concisely describes the main change as addressing BigQuery float precision in PARSE_JSON for metadata serialization; it directly summarizes the core purpose of the PR. |
| Description Check | ✅ Passed | The description fully follows the repository template by providing clear “What,” “Why,” and “How” sections, addressing potential breaking changes, database migrations, environment configuration, relevant documentation, related issues, dependency versions, testing notes, screenshots, and the checklist; all required sections are present and comprehensively filled out. |

✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)

✅ Unit Test PR creation complete.

  • Create PR with unit tests
  • Commit unit tests in branch fix/UN-2882-bigquery-float-precision
  • Post copyable unit tests in a comment

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7ca14cf and 76e2365.

📒 Files selected for processing (3)
  • unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py (4 hunks)
  • unstract/connectors/src/unstract/connectors/databases/utils.py (1 hunks)
  • workers/shared/infrastructure/database/utils.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (5)
unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py (2)

66-109: LGTM! Solid implementation of magnitude-based precision limiting.

The algorithm correctly limits total significant figures to 15 for IEEE 754 compatibility. The magnitude calculation floor(log10(abs(data))) + 1 properly handles both large numbers (like Unix timestamps) and small numbers (like costs) by adjusting decimal precision based on the number's scale.

Edge cases are well handled:

  • Zero returns 0.0 directly
  • NaN/Inf converted to None
  • Small numbers (< 1) get extra decimal places, but this doesn't violate the 15 significant figure limit since leading zeros aren't significant

370-373: LGTM! Correct application of sanitization after parsing.

The sanitization is correctly applied after json.loads() without problematic conditional checks. The parsed value is sanitized and assigned directly.

Minor note: The comment "json.loads() creates new float objects that may have binary precision problems" is slightly imprecise—it's the original JSON string's precision that may exceed binary representation limits, not json.loads() itself causing the problem. However, this is a documentation nitpick and doesn't affect correctness.

unstract/connectors/src/unstract/connectors/databases/utils.py (1)

11-51: LGTM! Clean shared utility with clear scope.

The implementation correctly provides minimal NaN/Inf sanitization for all databases, with clear documentation that precision handling should be implemented in specific connectors. The recursive approach properly handles nested structures.

The docstring examples effectively demonstrate the utility's limited scope—showing that regular floats (like 1760509016.282637) are preserved unchanged, delegating precision handling to database-specific implementations like BigQuery's _sanitize_for_bigquery().
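
The limited scope described here could be sketched as follows. This is an assumption-labeled illustration rather than the contents of utils.py: the function name `sanitize_floats_sketch` is hypothetical, and the only behavior it models is the NaN/±Inf → None replacement, with ordinary floats left untouched.

```python
import json
import math
from typing import Any


def sanitize_floats_sketch(data: Any) -> Any:
    """Replace NaN/Inf with None recursively; leave every other value untouched.

    Hypothetical stand-in for the shared utility; not the code in utils.py.
    """
    if isinstance(data, float) and (math.isnan(data) or math.isinf(data)):
        return None
    if isinstance(data, dict):
        return {key: sanitize_floats_sketch(value) for key, value in data.items()}
    if isinstance(data, list):
        return [sanitize_floats_sketch(item) for item in data]
    return data


payload = {"score": float("nan"), "ts": 1760509016.282637}
print(json.dumps(payload))                          # {"score": NaN, "ts": 1760509016.282637}
print(json.dumps(sanitize_floats_sketch(payload)))  # {"score": null, "ts": 1760509016.282637}
```

The first output is not valid JSON because Python's json.dumps emits the bare token NaN by default; the second is valid and keeps the timestamp at full precision, leaving any rounding to the database-specific layer.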

workers/shared/infrastructure/database/utils.py (2)

17-17: LGTM! Correct import of shared sanitizer.

The import properly brings in the centralized sanitize_floats_for_database utility, replacing the internal implementation that was removed.


297-297: LGTM! Consistent usage of shared sanitizer across all call sites.

All four call sites correctly use sanitize_floats_for_database() to handle NaN/Inf values before database operations. The refactoring properly centralizes the sanitization logic without changing behavior.

Also applies to: 365-365, 377-377, 384-384

muhammad-ali-e and others added 3 commits October 15, 2025 19:50
- Changed 'if sanitized_value' to 'if sanitized_value is not None'
- Prevents empty dicts {}, empty lists [], and zero values from becoming None
- Addresses CodeRabbit AI feedback on PR #1593
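
The difference between the two checks can be shown with a small sketch. Both function names below are hypothetical and exist only for comparison; the actual call sites in bigquery.py are not reproduced on this page.

```python
import json
from typing import Any, Optional


def to_json_truthy(value: Any) -> Optional[str]:
    """Buggy variant: a truthiness check drops falsy-but-valid values."""
    return json.dumps(value) if value else None


def to_json_not_none(value: Any) -> Optional[str]:
    """Fixed variant: only a real None is treated as missing."""
    return json.dumps(value) if value is not None else None


for candidate in ({}, [], 0, None):
    print(repr(candidate), to_json_truthy(candidate), to_json_not_none(candidate))
# {} None {}
# [] None []
# 0 None 0
# None None None
```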

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@github-actions
Contributor

| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_for_sidecar | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_sidecar_container | 1 | 1 |
| TOTAL | | 11 | 11 |

@github-actions
Contributor

| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| tests/test_platform.py | TestPlatformHelperRetry.test_success_on_first_attempt | 2 | 2 |
| tests/test_platform.py | TestPlatformHelperRetry.test_retry_on_connection_error | 2 | 2 |
| tests/test_platform.py | TestPlatformHelperRetry.test_non_retryable_http_error | 1 | 1 |
| tests/test_platform.py | TestPlatformHelperRetry.test_retryable_http_errors | 3 | 3 |
| tests/test_platform.py | TestPlatformHelperRetry.test_post_method_retry | 1 | 1 |
| tests/test_platform.py | TestPlatformHelperRetry.test_retry_logging | 1 | 1 |
| tests/test_prompt.py | TestPromptToolRetry.test_success_on_first_attempt | 1 | 1 |
| tests/test_prompt.py | TestPromptToolRetry.test_retry_on_errors | 2 | 2 |
| tests/test_prompt.py | TestPromptToolRetry.test_wrapper_methods_retry | 4 | 4 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_connection_error_is_retryable | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_timeout_is_retryable | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_http_error_retryable_status_codes | 3 | 3 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_http_error_non_retryable_status_codes | 5 | 5 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_http_error_without_response | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_os_error_retryable_errno | 5 | 5 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_os_error_non_retryable_errno | 1 | 1 |
| tests/utils/test_retry_utils.py | TestIsRetryableError.test_other_exception_not_retryable | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_exponential_backoff_without_jitter | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_exponential_backoff_with_jitter | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_max_delay_cap | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCalculateDelay.test_max_delay_cap_with_jitter | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_successful_call_first_attempt | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_retry_after_transient_failure | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_max_retries_exceeded | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_max_time_exceeded | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_retry_with_custom_predicate | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_no_retry_with_predicate_false | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_exception_not_in_tuple_not_retried | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryWithExponentialBackoff.test_delay_would_exceed_max_time | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_default_configuration | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_environment_variable_configuration | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_max_retries | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_max_time | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_base_delay | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_invalid_multiplier | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_jitter_values | 2 | 2 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_custom_exceptions_only | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_custom_predicate_only | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_both_exceptions_and_predicate | 1 | 1 |
| tests/utils/test_retry_utils.py | TestCreateRetryDecorator.test_exceptions_match_but_predicate_false | 1 | 1 |
| tests/utils/test_retry_utils.py | TestPreconfiguredDecorators.test_retry_platform_service_call_exists | 1 | 1 |
| tests/utils/test_retry_utils.py | TestPreconfiguredDecorators.test_retry_prompt_service_call_exists | 1 | 1 |
| tests/utils/test_retry_utils.py | TestPreconfiguredDecorators.test_platform_service_decorator_retries_on_connection_error | 1 | 1 |
| tests/utils/test_retry_utils.py | TestPreconfiguredDecorators.test_prompt_service_decorator_retries_on_timeout | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryLogging.test_warning_logged_on_retry | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryLogging.test_info_logged_on_success_after_retry | 1 | 1 |
| tests/utils/test_retry_utils.py | TestRetryLogging.test_exception_logged_on_giving_up | 1 | 1 |
| TOTAL | | 66 | 66 |

@sonarqubecloud
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py (1)

89-102: Consider refining magnitude calculation for numbers < 1.

The magnitude-based calculation works correctly for large numbers (timestamps), but for small numbers (< 1), the formula safe_decimals = max(0, 15 - magnitude) allows more than 15 decimal places since magnitude is negative. For example, 0.001228 has magnitude = -2, yielding safe_decimals = 17.

While the f-string format .{N}f still rounds naturally and this likely isn't a practical issue (small numbers rarely exceed float precision), the implementation doesn't strictly enforce the "15 significant figures" claim for all inputs. Consider using a library like decimal for precise significant figure handling if strict enforcement is needed:

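from decimal import Decimal, ROUND_HALF_UP

# Example: strictly limit a finite float to 15 significant figures.
# Wrapped in a helper (the name is illustrative) so the snippet runs on its own.
def round_to_sig_figs(data: float, sig_figs: int = 15) -> float:
    d = Decimal(str(data))
    # adjusted() gives the exponent of the most significant digit
    exponent = d.adjusted()
    # Quantum that keeps exactly sig_figs digits starting at that position
    quantize_exp = Decimal(10) ** (exponent - sig_figs + 1)
    return float(d.quantize(quantize_exp, rounding=ROUND_HALF_UP))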

That said, given that the primary use case (large timestamps) is handled correctly and small numbers with many significant figures are rare, the current implementation is acceptable for this defensive fix.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 76e2365 and 58ce1a6.

📒 Files selected for processing (1)
  • unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (5)
unstract/connectors/src/unstract/connectors/databases/bigquery/bigquery.py (5)

66-86: LGTM: Clear documentation and purpose.

The method signature and docstring clearly explain the BigQuery PARSE_JSON compatibility requirements and provide helpful examples.


104-109: LGTM: Recursive handling is correct.

The recursive processing of dicts and lists ensures nested structures are thoroughly sanitized, and the default passthrough preserves non-float types appropriately.


250-256: LGTM: Truthiness issue resolved.

The code now correctly uses is not None instead of a truthy check, preserving empty dicts, empty lists, and zeros as requested in previous reviews. The sanitization is properly applied before JSON serialization.


267-273: LGTM: Truthiness issue resolved.

Identical to lines 250-256, the code now correctly uses is not None to avoid dropping valid empty values and zeros. The sanitization is properly applied before JSON serialization.


374-377: LGTM: Sanitization after parsing is appropriate.

Applying sanitization after json.loads() is a good defensive measure, as parsed float objects may have binary precision issues. This ensures consistent handling across all three code paths where floats enter the BigQuery pipeline.

@coderabbitai
Contributor

coderabbitai bot commented Oct 15, 2025

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

@coderabbitai
Contributor

coderabbitai bot commented Oct 15, 2025

UTG Post-Process Complete

No new issues were detected in the generated code and all check runs have completed. The unit test generation process has completed successfully.

@coderabbitai
Contributor

coderabbitai bot commented Oct 15, 2025

Creating a PR to put the unit tests in...

The changes have been created in this pull request: View PR

@muhammad-ali-e muhammad-ali-e merged commit 464fbb3 into main Oct 16, 2025
7 checks passed
@muhammad-ali-e muhammad-ali-e deleted the fix/UN-2882-bigquery-float-precision branch October 16, 2025 05:09
4 participants