Commit f7d3865
Refactor decimal conversion in PyArrow tables to use direct casting (#544)
This PR replaces the previous implementation of convert_decimals_in_arrow_table() with a more efficient approach that uses PyArrow's native casting operation instead of going through pandas conversion and array creation.
- Remove conversion to pandas DataFrame via to_pandas() and apply() methods
- Remove intermediate steps of creating array from decimal column and setting it back
- Replace with direct type casting using PyArrow's cast() method
- Build a new table with transformed columns rather than modifying the original table
- Create a new schema based on the modified fields
The new approach is faster because it avoids the pandas conversion overhead. The table below highlights substantial performance improvements when retrieving all rows from a table containing decimal columns, particularly when compression is disabled. Even greater gains were observed with compression enabled: approximately an 84% improvement (6 seconds compared to 39 seconds). Benchmarking was performed against e2-dogfood, with the client located in the us-west-2 region.

Signed-off-by: Jayant Singh <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
1 parent 8f7754b commit f7d3865
1 file changed: +19 -9 lines changed (original lines 611-631, new lines 611-641)