Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Dec 22, 2023

What changes were proposed in this pull request?

Retain plan_id in ResolveUnpivot and UnpivotCoercion

Why are the changes needed?

they drop the plan id and cause:

In [1]: df = spark.createDataFrame([(1, 2, 3, 4, 5, 6)], ["f1", "f2", "label", "pred", "model_version", "ts"])
   ...: df.melt("model_version", ["label", "f2"], "f1", "f2").groupby("f1").count().count()
23/12/22 11:37:52 WARN CheckAllocator: More than one DefaultAllocationManager on classpath. Choosing first found
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)

...

AnalysisException: When resolving 'f1, fail to find subplan with plan_id=2 in 'Aggregate ['f1], ['f1, count(1) AS count#125L]
+- Expand [[model_version#117L, label, label#115L], [model_version#117L, f2, f2#114L]], [model_version#117L, f1#126, f2#127L]
   +- Project [f1#101L AS f1#113L, f2#102L AS f2#114L, label#103L AS label#115L, pred#104L AS pred#116L, model_version#105L AS model_version#117L, ts#106L AS ts#118L]
      +- LocalRelation [f1#101L, f2#102L, label#103L, pred#104L, model_version#105L, ts#106L]


JVM stacktrace:
org.apache.spark.sql.AnalysisException
	at org.apache.spark.sql.catalyst.analysis.ColumnResolutionHelper.$anonfun$resolveUnresolvedAttributeByPlanId$3(ColumnResolutionHelper.scala:521)
	at scala.Option.getOrElse(Option.scala:201)
	at org.apache.spark.sql.catalyst.analysis.ColumnResolutionHelper.$anonfun$resolveUnresolvedAttributeByPlanId$2(ColumnResolutionHelper.scala:516)
	at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:469)
	at org.apache.spark.sql.catalyst.analysis.ColumnResolutionHelper.resolveUnresolvedAttributeByPlanId(ColumnResolutionHelper.scala:511)
	at org.apache.spark.sql.catalyst.analysis.ColumnResolutionHelper.tryResolveColumnByPlanId(ColumnResolutionHelper.scala:494)

Does this PR introduce any user-facing change?

yes, bug fix

How was this patch tested?

added ut

Was this patch authored or co-authored using generative AI tooling?

no

@zhengruifeng zhengruifeng force-pushed the connect_unpivot_plan_id branch from ed4bb07 to ae4a2f3 Compare December 22, 2023 12:07
zhengruifeng added a commit that referenced this pull request Dec 29, 2023
… keep the plan id

### What changes were proposed in this pull request?
1, make following helper functions keep the plan id in transformation:

- `resolveOperatorsDownWithPruning`
- `resolveOperatorsUpWithNewOutput`

2, change the way to keep plan id in `ResolveNaturalAndUsingJoin`:
before:
```
Project <- tag hiddenOutputTag
  - Join <- tag PLAN_ID_TAG
```

after:
```
Project <- tag hiddenOutputTag & PLAN_ID_TAG
  - Join
```

3, to verify this fix, this PR also reverts previous tags copying changes in the rules

### Why are the changes needed?
we had make following rules keep the plan id:
1, `ResolveNaturalAndUsingJoin` in 167bbca
- using `resolveOperatorsUpWithPruning`, it set the tag `Project.hiddenOutputTag` internally, so `copyTagsFrom` (only works if `tags.isEmpty`) in `resolveOperatorsUpWithPruning` takes no effect

2, `ExtractWindowExpressions` in 185a0a5
- using `resolveOperatorsDownWithPruning`, which doesn't copy tags

3, `WidenSetOperationTypes` in 17c206f
- using `resolveOperatorsUpWithNewOutput -> transformUpWithNewOutput`, which doesn't copy tags

4, `ResolvePivot` in 1a89bdc
- using `resolveOperatorsWithPruning -> resolveOperatorsDownWithPruning`, which doesn't copy tags

5, `CTESubstitution` in 79d1cde
- using both `resolveOperatorsDownWithPruning` and `resolveOperatorsUp -> resolveOperatorsUpWithPruning`, the former does't copy tags

But plan id missing issue still keep popping up (see #44454), so this PR attempt to cover more cases by fixing the helper functions which are used to build the rules

6, `ResolveUnpivot`
- using `resolveOperatorsWithPruning -> resolveOperatorsDownWithPruning`, which doesn't copy tags

7, `UnpivotCoercion`
- using `resolveOperators -> resolveOperatorsWithPruning -> resolveOperatorsDownWithPruning`, which doesn't copy tags

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44462 from zhengruifeng/sql_res_op_keep.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
@zhengruifeng
Copy link
Contributor Author

close in favor of #44462

@zhengruifeng zhengruifeng deleted the connect_unpivot_plan_id branch December 29, 2023 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant