-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Describe the bug
The DataFusion SQL Unparser does not correctly roundtrip SQL queries that use UNION rather than UNION ALL. For example the following query:
SELECT col1 FROM footable
UNION
SELECT col1 FROM bartableShould result in a query that filters out duplicate rows from the final result. DataFusion handles this by adding a LogicalPlan::Distinct node as the parent of the LogicalPlan::Union node.
Distinct:
Union:
TableScan
TableScanHowever, this is currently unparsed to the following SQL:
SELECT col1 FROM footable
UNION ALL
SELECT col1 FROM bartableThat will cause incorrect results when executed, because the duplicate rows will not be filtered out.
To Reproduce
Parse the following query into a DataFusion LogicalPlan and then immediately unparse it and note that it unparses to a UNION ALL instead of a UNION:
SELECT j1_string AS col1, j1_id AS id FROM j1
UNION
SELECT j2_string AS col1, j2_id AS id FROM j2
UNION
SELECT j3_string AS col1, j3_id AS id FROM j3Expected behavior
The UNION is correctly preserved in the unparsed SQL from a LogicalPlan that adds a Distinct node directly above a Union node.
Additional context
I've already fixed this and will submit a PR shortly.