Closed

This file was deleted.

90 changes: 40 additions & 50 deletions ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/PlanMapper.java
@@ -21,7 +21,6 @@
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
@@ -200,54 +199,46 @@ public void merge(Object o1, Object o2) {
}

private void link(Object o1, Object o2, boolean mayMerge) {

Set<Object> keySet = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
keySet.add(o1);
keySet.add(o2);
keySet.add(getKeyFor(o1));
keySet.add(getKeyFor(o2));

Set<EquivGroup> mGroups = Collections.newSetFromMap(new IdentityHashMap<EquivGroup, Boolean>());

for (Object object : keySet) {
EquivGroup group = objectMap.get(object);
if (group != null) {
mGroups.add(group);
}
// Caches signatures on the first access. A signature of an Operator could change as optimizers could mutate it,
// keeping its semantics. The current implementation caches the signature before optimizations so that we can
// link Operators with their signatures at consistent timing.
Comment on lines +202 to +204
Member

Please remove this comment, and don't pre-heat caches unless it gives real benefits.
The purpose of the cache is not something like this, but to prevent O(N^2) computation (and also to reduce storage size by making it more serialization-friendly...).
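To make the point about the cache's purpose concrete, here is a minimal sketch under stated assumptions: `Node` is a hypothetical stand-in for an Operator, and its tree signature is assumed to fold in the signatures of all its parents. Signing every operator of an N-operator pipeline without a cache re-signs each upstream subtree (about N²/2 node computations), while an identity-keyed cache needs only N. This illustrates the complexity argument only; it is not Hive's signature code.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SignatureCacheSketch {

  /** Hypothetical stand-in for an Operator: a label plus its parent operators. */
  static final class Node {
    final String label;
    final List<Node> parents = new ArrayList<>();

    Node(String label) {
      this.label = label;
    }
  }

  /** Number of node-level signature computations performed so far. */
  static int computations = 0;

  /** Tree signature without caching: re-signs the whole parent subtree on every call. */
  static int signUncached(Node n) {
    computations++;
    int h = n.label.hashCode();
    for (Node p : n.parents) {
      h = 31 * h + signUncached(p);
    }
    return h;
  }

  /** Same tree signature, but every node is signed at most once (identity-keyed cache). */
  static int signCached(Node n, Map<Node, Integer> cache) {
    Integer cached = cache.get(n);
    if (cached != null) {
      return cached;
    }
    computations++;
    int h = n.label.hashCode();
    for (Node p : n.parents) {
      h = 31 * h + signCached(p, cache);
    }
    cache.put(n, h);
    return h;
  }

  /** Builds a linear pipeline OP0 -> OP1 -> ... -> OP(length-1). */
  static List<Node> chain(int length) {
    List<Node> nodes = new ArrayList<>();
    for (int i = 0; i < length; i++) {
      Node n = new Node("OP" + i);
      if (i > 0) {
        n.parents.add(nodes.get(i - 1));
      }
      nodes.add(n);
    }
    return nodes;
  }

  /** Signs every node of a fresh chain and returns the number of computations needed. */
  static int countComputations(int length, boolean useCache) {
    computations = 0;
    Map<Node, Integer> cache = new IdentityHashMap<>();
    for (Node n : chain(length)) {
      if (useCache) {
        signCached(n, cache);
      } else {
        signUncached(n);
      }
    }
    return computations;
  }

  public static void main(String[] args) {
    System.out.println("uncached: " + countComputations(100, false)); // 5050 = 100 * 101 / 2
    System.out.println("cached:   " + countComputations(100, true)); // 100
  }
}
```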

registerSignature(o1);
registerSignature(o2);
Comment on lines +205 to +206
Member

Linking operators implicitly (using signatures) was added by @kgyrtkirk in HIVE-18926. This change reverts that logic and takes signatures out of the equation for determining equivalent groups.
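The implicit linking mentioned here can be illustrated with a small, hypothetical model loosely based on the removed `keySet`/`mGroups` lookup in the diff: groups are found both through the objects themselves and through their signatures (as `getKeyFor` does for operators). `Op`, `Group` and the string signatures are invented for this sketch; it is not Hive's `PlanMapper`. Two distinct operators that merely share a signature end up in the same group, although neither was ever linked to the other.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ImplicitLinkSketch {

  /** Hypothetical operator: no equals() override, so it is compared by identity. */
  static final class Op {
    final String name;
    final String signature; // stand-in for a tree signature, compared by value

    Op(String name, String signature) {
      this.name = name;
      this.signature = signature;
    }
  }

  /** Hypothetical equivalence group. */
  static final class Group {
    final Set<Object> members = new LinkedHashSet<>();
  }

  // Op keys hash by identity, String signature keys hash by value.
  private final Map<Object, Group> objectMap = new HashMap<>();

  private static Object keyFor(Object o) {
    return o instanceof Op ? ((Op) o).signature : o;
  }

  /** Adds an object to a group and indexes the group under the object and its signature. */
  private void add(Group g, Object o) {
    g.members.add(o);
    objectMap.put(o, g);
    objectMap.put(keyFor(o), g);
  }

  Group groupOf(Object o) {
    return objectMap.get(o);
  }

  /** Links o1 and o2; existing groups are also found through the objects' signatures. */
  void link(Object o1, Object o2, boolean mayMerge) {
    Set<Group> found = new LinkedHashSet<>();
    for (Object key : List.of(o1, o2, keyFor(o1), keyFor(o2))) {
      Group g = objectMap.get(key);
      if (g != null) {
        found.add(g);
      }
    }
    if (found.size() > 1 && !mayMerge) {
      throw new IllegalStateException("equivalence mapping violation");
    }
    Group target = new Group();
    for (Group g : found) {
      for (Object member : g.members) {
        add(target, member);
      }
    }
    add(target, o1);
    add(target, o2);
  }

  /** Two distinct operators with equal signatures, each linked only to its own stats object. */
  static boolean distinctOpsEndUpTogether() {
    ImplicitLinkSketch mapper = new ImplicitLinkSketch();
    Op first = new Op("FIL_1", "FIL[key <> '100'] <- TS[src]");
    Op second = new Op("FIL_7", "FIL[key <> '100'] <- TS[src]");
    mapper.link(first, "statsForFirst", false);
    mapper.link(second, "statsForSecond", false); // finds first's group via the shared signature
    return mapper.groupOf(first) == mapper.groupOf(second);
  }

  public static void main(String[] args) {
    System.out.println(distinctOpsEndUpTogether()); // true
  }
}
```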

My understanding is that if this change goes in, operators can be linked with signatures only explicitly. I don't see test failures or .q.out changes due to this, so it seems that the change does not have a big impact on existing use cases. Moreover, it fixes some known compilation crashes, so I consider this an improvement over the current situation.

I don't see a significant risk in merging this change but I will let @kgyrtkirk comment in case I am missing something.

Assuming that we do not allow implicit linking via signatures, I don't see the point of registering the signature at this stage. My understanding is that the cache is meant to improve performance, not to serve as a means of performing equivalence checks. If we don't make use of the signature here, then I don't think we should compute it. @okumin, although you added some comments justifying the registration, I would appreciate some additional clarification regarding these additions.

Member

> This change reverts that logic and takes signatures out of the equation for determining equivalent groups.

I haven't read all of it, but if that's true, please don't do that...
Looking at the volume of changes, it must be changing something fundamental...

What's the issue? Is it an equivalence violation? Can you put the stack trace somewhere?
I think that should be fixed without altering this part; it might be a missing link or a missing signature annotation somewhere. You should not change this at all...

All these things could make it possible to back-map runtime statistics up to the Calcite join planning phase... but I think that feature was never completed/enabled.

Member

To get a failing test for these changes, you would need 2 ops with conflicting signatures stored in the metastore; when the stats are loaded back, they might get applied incorrectly...
Possibly pretty hard to do...

Contributor Author

> Assuming that we do not allow implicit linking via signatures I don't know what's the point of registering the signature at this stage

That's because some optimizations can mutate Operators, changing their true signatures. One example I remember is TableScanPPD. I'm not 100% sure which signatures should be used in that case, but it sounds more reasonable and predictable to me to use the ones from before optimization than the ones from the middle of optimization. I would say Operators should not be mutable for signing purposes, but that is too fundamental a change.
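The effect of caching a signature on first access can be shown with a minimal sketch, assuming a lazily populated, identity-keyed cache. `ScanOp` and `pushDownPredicate` are hypothetical stand-ins for an operator and a TableScanPPD-like rewrite that mutates it in place. The cache returns whatever was computed at first access, so touching it before optimization pins the pre-optimization signature, while touching it only afterwards pins the mutated one.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class FirstAccessCacheSketch {

  /** Hypothetical mutable operator, e.g. a table scan that a pushdown rule rewrites. */
  static final class ScanOp {
    String pushedFilter = "none";

    String computeSignature() {
      return "TS[filter=" + pushedFilter + "]";
    }
  }

  /** Lazily computes a signature on first access and never recomputes it. */
  static final class SignatureCache {
    private final Map<ScanOp, String> cache = new IdentityHashMap<>();

    String getSignature(ScanOp op) {
      return cache.computeIfAbsent(op, ScanOp::computeSignature);
    }
  }

  /** Simulates an optimizer pushing a predicate into the scan (mutating it in place). */
  static void pushDownPredicate(ScanOp op) {
    op.pushedFilter = "key <> '100'";
  }

  /** Returns the signature the cache reports after optimization has run. */
  static String cachedSignatureAfterOptimization(boolean registerBeforeOptimization) {
    SignatureCache cache = new SignatureCache();
    ScanOp scan = new ScanOp();
    if (registerBeforeOptimization) {
      cache.getSignature(scan); // pins the pre-optimization signature
    }
    pushDownPredicate(scan);
    return cache.getSignature(scan);
  }

  public static void main(String[] args) {
    System.out.println(cachedSignatureAfterOptimization(true)); // TS[filter=none]
    System.out.println(cachedSignatureAfterOptimization(false)); // TS[filter=key <> '100']
  }
}
```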

I will reply to @kgyrtkirk in another thread, as I guess those comments were written before reading my gist.

Contributor Author

I forgot to mention what is affected by such optimizations. Many test cases on the following revision failed with an equivalence mapping violation.
217b26d

That's because it tries to LINK a pre-optimized operator with a post-optimized operator.

  1. Before optimization
    • Operator A has a signature SA and is grouped as GA
    • Operator B has a signature SB and is grouped as GB
  2. AuxSignatureLinker doesn't MERGE GA and GB, as A and B have different signatures. Note that non-aux signatures are not cached here.
  3. After optimization
    • Operator B is optimized, and its signature becomes SA
  4. StatsRulesProcFactory tries to LINK GA and GB, as A and B share the same signature SA -> equivalence mapping violation

Potentially, GA and GB are actually mergeable, though the current implementation doesn't merge them.
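The four steps above can be replayed in a minimal, hypothetical model: group membership is recorded under both the operator and its signature, and a non-merging LINK fails when those two keys lead to different groups. `Op`, the group names and the `pinSignatures` switch are assumptions made for this sketch, not Hive code. With pinning enabled, the signature frozen at step 1 is used, which loosely corresponds to the idea of caching signatures before optimization.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ViolationScenarioSketch {

  /** Hypothetical mutable operator whose signature reflects its current state. */
  static final class Op {
    String state;

    Op(String state) {
      this.state = state;
    }

    String currentSignature() {
      return "SIG(" + state + ")";
    }
  }

  private final boolean pinSignatures;
  private final Map<Op, String> pinned = new IdentityHashMap<>();
  // Op keys hash by identity, String signature keys hash by value; values are group names.
  private final Map<Object, String> objectMap = new HashMap<>();

  ViolationScenarioSketch(boolean pinSignatures) {
    this.pinSignatures = pinSignatures;
  }

  /** Either the signature frozen at first access or the operator's current one. */
  private String signatureOf(Op op) {
    return pinSignatures ? pinned.computeIfAbsent(op, Op::currentSignature) : op.currentSignature();
  }

  /** Step 1: put an operator and its signature into a group. */
  void register(Op op, String group) {
    objectMap.put(op, group);
    objectMap.put(signatureOf(op), group);
  }

  /** Step 4: a non-merging LINK fails if the op and its signature lead to different groups. */
  void linkWithoutMerge(Op op) {
    Set<String> groups = new LinkedHashSet<>();
    for (Object key : List.of(op, signatureOf(op))) {
      String g = objectMap.get(key);
      if (g != null) {
        groups.add(g);
      }
    }
    if (groups.size() > 1) {
      throw new IllegalStateException("equivalence mapping violation: " + groups);
    }
  }

  /** Runs steps 1-4; returns true if the LINK in step 4 hits a violation. */
  static boolean violates(boolean pinSignatures) {
    ViolationScenarioSketch mapper = new ViolationScenarioSketch(pinSignatures);
    Op a = new Op("x");
    Op b = new Op("y");
    mapper.register(a, "GA"); // A with SA = SIG(x)
    mapper.register(b, "GB"); // B with SB = SIG(y); GA and GB stay separate (step 2)
    b.state = "x"; // step 3: an optimization makes B's signature equal to SA
    try {
      mapper.linkWithoutMerge(b);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("current signatures: " + violates(false)); // true
    System.out.println("pinned signatures:  " + violates(true)); // false
  }
}
```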

final EquivGroup linkedGroup1 = objectMap.get(o1);
final EquivGroup linkedGroup2 = objectMap.get(o2);
Member

These changes will weaken this: 2 OPs which have the same signature will be allowed, and that's not good...
It's not causing many issues yet, because almost all existing operators have good signatures; but if that degrades, runtime stats will be applied incorrectly.
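A toy model of how runtime stats could be misapplied if signatures degrade, assuming persisted stats are looked up by operator signature. `FilterOp`, `weakSignature`, `fullSignature` and the row count are hypothetical; this is not Hive's runtime-stats API. When the signature ignores the predicate, an unrelated filter on the same table silently picks up another operator's observed row count.

```java
import java.util.HashMap;
import java.util.Map;

public class RuntimeStatsSketch {

  /** Hypothetical filter operator with two possible signature schemes. */
  static final class FilterOp {
    final String table;
    final String predicate;

    FilterOp(String table, String predicate) {
      this.table = table;
      this.predicate = predicate;
    }

    /** A degraded signature that forgets the predicate. */
    String weakSignature() {
      return "FIL <- TS[" + table + "]";
    }

    /** A signature that captures everything affecting the output. */
    String fullSignature() {
      return "FIL[" + predicate + "] <- TS[" + table + "]";
    }
  }

  /** Returns the persisted row count for a signature, or the compile-time fallback. */
  static long estimateRows(Map<String, Long> store, String signature, long fallback) {
    return store.getOrDefault(signature, fallback);
  }

  /** Returns {estimate with weak signatures, estimate with full signatures}. */
  static long[] estimatesForUnrelatedFilter() {
    FilterOp executed = new FilterOp("src", "key <> '100'");
    FilterOp planned = new FilterOp("src", "key = '100'");
    long observedRows = 1000L; // hypothetical runtime row count of `executed`
    long noStats = -1L; // marker meaning "no runtime stats found"

    Map<String, Long> weakStore = new HashMap<>();
    Map<String, Long> fullStore = new HashMap<>();
    weakStore.put(executed.weakSignature(), observedRows);
    fullStore.put(executed.fullSignature(), observedRows);

    return new long[] {
      estimateRows(weakStore, planned.weakSignature(), noStats),
      estimateRows(fullStore, planned.fullSignature(), noStats)
    };
  }

  public static void main(String[] args) {
    long[] estimates = estimatesForUnrelatedFilter();
    System.out.println("weak signature: " + estimates[0]); // 1000 (another operator's stats)
    System.out.println("full signature: " + estimates[1]); // -1 (no stats, falls back)
  }
}
```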


if (linkedGroup1 == null && linkedGroup2 == null) {
final EquivGroup group = new EquivGroup();
group.add(o1);
group.add(o2);
groups.add(group);
return;
}
if (mGroups.size() > 1) {
if (!mayMerge) {
throw new RuntimeException("equivalence mapping violation");
}
EquivGroup newGrp = new EquivGroup();
newGrp.add(o1);
newGrp.add(o2);
for (EquivGroup g : mGroups) {
for (Object o : g.members) {
newGrp.add(o);
}
}
groups.add(newGrp);
groups.removeAll(mGroups);
} else {
EquivGroup targetGroup = mGroups.isEmpty() ? new EquivGroup() : mGroups.iterator().next();
groups.add(targetGroup);
targetGroup.add(o1);
targetGroup.add(o2);
if (linkedGroup1 == null || linkedGroup2 == null || linkedGroup1 == linkedGroup2) {
Member

What do all these changes do better than the old code?

final EquivGroup group = linkedGroup1 != null ? linkedGroup1 : linkedGroup2;
group.add(o1);
group.add(o2);
return;
}

if (!mayMerge) {
throw new RuntimeException(String.format(
"Failed to link %s and %s. This error mostly means a bug of Hive",
Member

nit: An uncaught RuntimeException almost always denotes a bug in the application. I don't think we need to make the exception message that verbose by stating the obvious.

Adding the operators which led to the exception to the message is a good idea; that makes sense.

Member

I find this error message pretty unclear about what the issue is...

I think the sentence "Equivalence mapping violation" should be there, as that's what the problem is... we have 2 groups that are not allowed to be put into the same equiv group -> big trouble...

o1, o2
));
}
EquivGroup newGrp = new EquivGroup();
newGrp.add(o1);
newGrp.add(o2);
linkedGroup1.members.forEach(newGrp::add);
linkedGroup2.members.forEach(newGrp::add);
groups.add(newGrp);
groups.remove(linkedGroup1);
groups.remove(linkedGroup2);
}

private OpTreeSignatureFactory signatureCache = OpTreeSignatureFactory.newCache();

private Object getKeyFor(Object o) {
if (o instanceof Operator) {
Operator<?> operator = (Operator<?>) o;
return signatureCache.getSignature(operator);
}
return o;
}

public <T> List<T> getAll(Class<T> clazz) {
List<T> ret = new ArrayList<>();
for (EquivGroup g : groups) {
@@ -256,12 +247,6 @@ public <T> List<T> getAll(Class<T> clazz) {
return ret;
}

public void runMapper(GroupTransformer mapper) {
for (EquivGroup equivGroup : groups) {
mapper.map(equivGroup);
}
}

public <T> List<T> lookupAll(Class<T> clazz, Object key) {
EquivGroup group = objectMap.get(key);
if (group == null) {
@@ -286,8 +271,13 @@ public Iterator<EquivGroup> iterateGroups() {
}

public OpTreeSignature getSignatureOf(Operator<?> op) {
OpTreeSignature sig = signatureCache.getSignature(op);
return sig;
return signatureCache.getSignature(op);
}

private void registerSignature(Object o) {
if (o instanceof Operator) {
getSignatureOf((Operator<?>) o);
}
}
Comment on lines +277 to 281
Member

What's the point of this method? The cache will compute it if needed... what's the point of pre-registering? If that's needed, it's not a cache anymore!


public void clearSignatureCache() {
27 changes: 27 additions & 0 deletions ql/src/test/queries/clientpositive/cbo_cte_materialization.q
@@ -0,0 +1,27 @@
--! qt:dataset:src

set hive.optimize.cte.materialize.threshold=1;
set hive.optimize.cte.materialize.full.aggregate.only=false;

EXPLAIN CBO
WITH materialized_cte AS (
SELECT key, value FROM src WHERE key != '100'
),
another_materialized_cte AS (
SELECT key, value FROM src WHERE key != '100'
)
SELECT a.key, a.value, b.key, b.value
FROM materialized_cte a
JOIN another_materialized_cte b ON a.key = b.key
ORDER BY a.key;
Comment on lines +6 to +16
Member

Consider adding (or replacing this with) a traditional EXPLAIN where we can see the effect of hive.optimize.cte.materialize.threshold and the full plan for the materialized CTEs.


WITH materialized_cte AS (
SELECT key, value FROM src WHERE key != '100'
),
another_materialized_cte AS (
SELECT key, value FROM src WHERE key != '100'
)
SELECT a.key, a.value, b.key, b.value
FROM materialized_cte a
JOIN another_materialized_cte b ON a.key = b.key
ORDER BY a.key;
Comment on lines +18 to +27
Member

The problem is in the compilation phase, so I don't think we need to actually run the query. Consider dropping the execution. If you opt to keep it, I would suggest crafting a much simpler test (without 1K lines in the output). It would be nice to keep the execution time of our test suite as low as possible. Moreover, it is not easy to see what the src table contains, so verifying that the result is indeed correct is cumbersome.

1 change: 0 additions & 1 deletion ql/src/test/queries/clientpositive/perf/cbo_query14.q
@@ -1,4 +1,3 @@
--! qt:disabled:HIVE-24167
set hive.mapred.mode=nonstrict;
-- start query 1 in stream 0 using template query14.tpl and seed 1819994127
explain cbo
1 change: 0 additions & 1 deletion ql/src/test/queries/clientpositive/perf/query14.q
@@ -1,4 +1,3 @@
--! qt:disabled:HIVE-24167
set hive.mapred.mode=nonstrict;
-- start query 1 in stream 0 using template query14.tpl and seed 1819994127
explain