@kevintang2022 kevintang2022 commented Jul 1, 2025

Summary: Queries currently fail with index out of range errors when a struct column is read but none of its subfields are referenced.

Reviewed By: rschlussel

Differential Revision: D76869401

Test Plan

with shaped as (SELECT fb_reshape_row(person,CAST(NULL AS ROW(age INTEGER, city VARCHAR))) AS pcol FROM tangk_struct_table),
raw as (select person as pcol from prism.di.tangk_struct_table)
select pcol from shaped

When none of the struct column's subfields are referenced, the fb_reshape_row expression itself is passed into the toSubfield method.

When a subfield is actually read in the query, a DEREFERENCE expression is passed in instead.

In the DEREFERENCE case, the elements ImmutableList builder contains at least one element. In the first case (no subfield referenced), the builder is empty.
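
To make the difference between the two cases concrete, here is a minimal sketch of why an empty elements list is hazardous and one way to guard against it. It is not the actual toSubfield code: the class SubfieldPathSketch and both of its methods are hypothetical names made up for illustration, and a plain List stands in for the built ImmutableList.

```java
import java.util.List;

// Hypothetical illustration only; these names do not exist in Presto.
final class SubfieldPathSketch {
    private SubfieldPathSketch() {}

    // Unguarded access: assumes at least one path element was collected.
    // With an empty list this throws IndexOutOfBoundsException, which is the
    // kind of failure described above for the whole-struct case.
    static String leadingFieldUnguarded(List<String> elements) {
        return elements.get(0);
    }

    // Guarded variant: an empty list is treated as "the whole struct column".
    static String toPath(String column, List<String> elements) {
        if (elements.isEmpty()) {
            return column; // no subfield referenced, e.g. "person"
        }
        return column + "." + String.join(".", elements); // e.g. "person.age"
    }
}
```

With this sketch, a DEREFERENCE on age yields the path person.age, matching the [person.age] notation in the query plans below, while the whole-struct case yields just person instead of throwing.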

Manual tests:
Verifier query that caught the empty-subfields bug:
0.294-edge2: 20250618_024807_00003_snxkc failed with index out of range
0.294-20250618.023458-171: 20250618_025504_00004_29vsz succeeded

Arg pushdown
20250618_025745_00008_29vsz: correct query plan with the subfield pushed down

Fragment 0 [SINGLE]
    CPU: 884.46us, Scheduled: 982.69us, Input: 1 row (812B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (812B), 1 tasks
    Output layout: [field]
    Output partitioning: SINGLE []
    Output encoding: COLUMNAR
    Stage Execution Strategy: UNGROUPED_EXECUTION
    - Output[PlanNodeId 6][Query Plan] => [field:varchar(807)]
            CPU: 0.00ns (?%), Scheduled: 0.00ns (?%), Output: 1 row (812B)
            Input avg.: 1.00 rows, Input std.dev.: 0.00%
            Query Plan := field
        - Values[PlanNodeId 0] => [field:varchar(807)]
                CPU: 0.00ns (?%), Scheduled: 0.00ns (?%), Output: 1 row (812B)
                Input avg.: 1.00 rows, Input std.dev.: 0.00%
                (VARCHAR'- Output[PlanNodeId 10][age] => [expr_3:integer]
                        age := expr_3 (3:8)
                    - RemoteStreamingExchange[PlanNodeId 222][GATHER - COLUMNAR] => [expr_3:integer]
                        - ScanProject[PlanNodeId 0,6][table = TableHandle {connectorId=''prism'', connectorHandle=''PrismTableHandle{schemaName=di, tableName=tangk_struct_table, analyzePartitionValues=Optional.empty, sideTableFeatureIds=[]}'', layout=''Optional[di.tangk_struct_table{}]''}, projectLocality = LOCAL] => [expr_3:integer]
                                expr_3 := DEREFERENCE(fb_reshape_row(person, null), INTEGER''0'') (1:114)
                                LAYOUT: di.tangk_struct_table{}
                                person := person:struct<age:int,city:string>:0:REGULAR:[person.age] (1:113)
                                id:bigint:-13:PARTITION_KEY
                                    :: [["1"], ["2"], ["3"], ["4"], ["5"]]
                ')

20250618_025948_00010_29vsz: query plan with a non-relevant function (fb_reshape_row_old), where no subfield is pushed down

Fragment 0 [SINGLE]
    CPU: 990.72us, Scheduled: 1.05ms, Input: 1 row (805B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (805B), 1 tasks
    Output layout: [field]
    Output partitioning: SINGLE []
    Output encoding: COLUMNAR
    Stage Execution Strategy: UNGROUPED_EXECUTION
    - Output[PlanNodeId 6][Query Plan] => [field:varchar(800)]
            CPU: 0.00ns (?%), Scheduled: 0.00ns (?%), Output: 1 row (805B)
            Input avg.: 1.00 rows, Input std.dev.: 0.00%
            Query Plan := field
        - Values[PlanNodeId 0] => [field:varchar(800)]
                CPU: 0.00ns (?%), Scheduled: 0.00ns (?%), Output: 1 row (805B)
                Input avg.: 1.00 rows, Input std.dev.: 0.00%
                (VARCHAR'- Output[PlanNodeId 10][age] => [expr_3:integer]
                        age := expr_3 (1:209)
                    - RemoteStreamingExchange[PlanNodeId 222][GATHER - COLUMNAR] => [expr_3:integer]
                        - ScanProject[PlanNodeId 0,6][table = TableHandle {connectorId=''prism'', connectorHandle=''PrismTableHandle{schemaName=di, tableName=tangk_struct_table, analyzePartitionValues=Optional.empty, sideTableFeatureIds=[]}'', layout=''Optional[di.tangk_struct_table{}]''}, projectLocality = LOCAL] => [expr_3:integer]
                                expr_3 := DEREFERENCE(fb_reshape_row_old(person, null), INTEGER''0'') (1:124)
                                LAYOUT: di.tangk_struct_table{}
                                person := person:struct<age:int,city:string>:0:REGULAR (1:123)
                                id:bigint:-13:PARTITION_KEY
                                    :: [["1"], ["2"], ["3"], ["4"], ["5"]]
                ')


20250618_030040_00011_29vsz: expected plan

Fragment 0 [SINGLE]
    CPU: 986.39us, Scheduled: 1.05ms, Input: 1 row (792B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (792B), 1 tasks
    Output layout: [field]
    Output partitioning: SINGLE []
    Output encoding: COLUMNAR
    Stage Execution Strategy: UNGROUPED_EXECUTION
    - Output[PlanNodeId 6][Query Plan] => [field:varchar(787)]
            CPU: 0.00ns (?%), Scheduled: 0.00ns (?%), Output: 1 row (792B)
            Input avg.: 1.00 rows, Input std.dev.: 0.00%
            Query Plan := field
        - Values[PlanNodeId 0] => [field:varchar(787)]
                CPU: 0.00ns (?%), Scheduled: 0.00ns (?%), Output: 1 row (792B)
                Input avg.: 1.00 rows, Input std.dev.: 0.00%
                (VARCHAR'- Output[PlanNodeId 10][age] => [expr_4:integer]
                        age := expr_4 (1:205)
                    - RemoteStreamingExchange[PlanNodeId 215][GATHER - COLUMNAR] => [expr_4:integer]
                        - ScanProject[PlanNodeId 0,6][table = TableHandle {connectorId=''prism'', connectorHandle=''PrismTableHandle{schemaName=di, tableName=tangk_struct_table, analyzePartitionValues=Optional.empty, sideTableFeatureIds=[]}'', layout=''Optional[di.tangk_struct_table{}]''}, projectLocality = LOCAL] => [expr_4:integer]
                                expr_4 := DEREFERENCE(person, INTEGER''0'') (1:158)
                                LAYOUT: di.tangk_struct_table{}
                                person := person:struct<age:int,city:string>:0:REGULAR:[person.age] (1:177)
                                id:bigint:-13:PARTITION_KEY
                                    :: [["1"], ["2"], ["3"], ["4"], ["5"]]
                ')

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==
General Changes
* Fix subfield pushdown arg index for scalar functions to support selecting a whole struct column.

@kevintang2022 kevintang2022 requested a review from a team as a code owner July 1, 2025 17:17
@prestodb-ci prestodb-ci added the from:Meta (PR from Meta) label Jul 1, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D76869401

@kevintang2022 kevintang2022 changed the title from "Fix subfield pushdown arg index for fb reshape row" to "Fix subfield pushdown arg index for selecting entire struct column" Jul 1, 2025
@kevintang2022 kevintang2022 requested a review from rschlussel July 1, 2025 17:20
@rschlussel rschlussel left a comment

Can you also add a release note?

@kevintang2022 kevintang2022 merged commit d8bb991 into prestodb:master Jul 2, 2025
109 checks passed
@prestodb-ci prestodb-ci mentioned this pull request Jul 28, 2025
6 tasks