Skip to content

Conversation

@coderfender
Copy link
Contributor

@coderfender coderfender commented Dec 1, 2025

Which issue does this PR close?

Closes #326 (part 1 - String to Float casts)

Rationale for this change

Comet currently supports some basic string -> non int casts and this PR essentially extends on that to support all possible casts from string to float/double (decimal casts are covered in a different PR)

What changes are included in this PR?

Updates to cast logic to support exhaustive casts from string to numeric float

Float inputs (which should include DoubleType)

  1. Added logic to support all float types along with scientific notation handling
  2. Added unit tests to support double and float casts

How are these changes tested?

Unit tests for specific edge cases like infinity, scientific notations etc (covering the edge cases mentioned in the linked issue) and fuzz tests to make sure all valid/invalid inputs are verified

cc : @andygrove , @martin-g

@codecov-commenter
Copy link

codecov-commenter commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.58%. Comparing base (f09f8af) to head (a03bbd1).
⚠️ Report is 799 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2835      +/-   ##
============================================
+ Coverage     56.12%   59.58%   +3.45%     
- Complexity      976     1365     +389     
============================================
  Files           119      167      +48     
  Lines         11743    15490    +3747     
  Branches       2251     2567     +316     
============================================
+ Hits           6591     9230    +2639     
- Misses         4012     4962     +950     
- Partials       1140     1298     +158     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderfender coderfender marked this pull request as draft December 2, 2025 16:00
@coderfender coderfender changed the title feat: Support string non int numeric types feat: Support casting string non int numeric types Dec 6, 2025
@coderfender coderfender force-pushed the support_string_non_int_numeric_types branch from e7a1144 to 79d0ea9 Compare December 7, 2025 19:36
@coderfender coderfender marked this pull request as ready for review December 7, 2025 19:53
@coderfender
Copy link
Contributor Author

Fixed issues with formatting

@coderfender
Copy link
Contributor Author

Fixed failing tests and added some more . Also modularized to replicate spark's behavior

@coderfender
Copy link
Contributor Author

coderfender commented Dec 11, 2025

Reasons for the tests to be failing :

  1. Test name mismatch (Comet is checking for cast {dt} to {dt} while the actual test name has ANSI support cast StringType to FloatType .
  2. Err msg Mismatches between Spark and Comet outputs

@coderfender
Copy link
Contributor Author

The failures were largely semantic caused due to mismatch in the error message (whitespace) and test nomenclature

@coderfender
Copy link
Contributor Author

The cast suite was specifically testing for Decimal(10,2) data type only. Changed to make make sure all casts are now supported. Barring that no semantic / incorrect results are observed

@coderfender
Copy link
Contributor Author

coderfender commented Dec 16, 2025

The tests are all passing. PR is ready for review

@coderfender
Copy link
Contributor Author

Splitting this into multiple PRs for better readability

@coderfender coderfender changed the title feat: Support casting string non int numeric types feat: Support casting string float types Dec 17, 2025
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @coderfender

@coderfender
Copy link
Contributor Author

Thank you @andygrove

@andygrove
Copy link
Member

@coderfender There is a clippy warning:

    Checking datafusion-comet-spark-expr v0.13.0 (/__w/datafusion-comet/datafusion-comet/native/spark-expr)
error: unused variable: `allow_incompat`
    --> spark-expr/src/conversion_funcs/cast.rs:1214:5
     |
1214 |     allow_incompat: bool,
     |     ^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_allow_incompat`

@coderfender
Copy link
Contributor Author

@andygrove thank you very much . Let me fix the warning and push a commit right away

@coderfender
Copy link
Contributor Author

@andygrove pushed a commit to fix clippy errors. Please kickoff CI whenever you get a chance

@coderfender
Copy link
Contributor Author

@andygrove the CI is passing now . Please merge it whenever you get a chance . Thank you

@andygrove andygrove merged commit a951da9 into apache:main Dec 24, 2025
118 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Spark-compatible CAST from String to Floating Point

4 participants