Conversation
Will add test later when we can convert the column expression back to SQL expression
…ng of all subclasses
22e0466 to
7d0df09
Compare
highker
left a comment
There was a problem hiding this comment.
Typo "funtion" in commit title.
Finished reviewing commit "Move funtion signature to SPI". Minor comments only. I think currently it makes sense to move Signature to spi.function rather than a new package given most function dependencies won't go to the new package and Signature is a common helper that will be used by connectors.
| return new Signature(name, FunctionKind.SCALAR, ImmutableList.of(), ImmutableList.of(), returnType, argumentTypes, false); | ||
| } | ||
|
|
||
| public static SignatureBuilder builder() |
There was a problem hiding this comment.
Moving Signature to presto-spi but leaving SignatureBuilder in presto-main reminds me that we have Type in presto-spi but all type coercion in presto-main. This weakens the power of the moved class.
We can
- move
SignatureBuilder+OperatorSignatureUtilstopresto-spi - rename
OperatorSignatureUtilstoSignatureUtils.
There was a problem hiding this comment.
I think SignatureBuilder depends on guava.
presto-spi/src/main/java/com/facebook/presto/spi/function/Signature.java
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/function/Signature.java
Show resolved
Hide resolved
| @@ -229,9 +209,4 @@ public static LongVariableConstraint longVariableExpression(String variable, Str | |||
| { | |||
| return new LongVariableConstraint(variable, expression); | |||
| } | |||
|
|
|||
| public static SignatureBuilder builder() | |||
highker
left a comment
There was a problem hiding this comment.
Finished reviewing "Move RowExpression to SPI".
Maybe add commit body "Move RowExpression to SPI and rename RowExpression to ColumnExpression".
This commit fails to compile. There are uncleaned RowExpression existing in the codebase. For example, SqlToRowExpressionTranslator is barely a file move without altering the class name.
presto-spi/src/main/java/com/facebook/presto/spi/relation/column/CallExpression.java
Show resolved
Hide resolved
...to-spi/src/main/java/com/facebook/presto/spi/relation/column/LambdaDefinitionExpression.java
Show resolved
Hide resolved
highker
left a comment
There was a problem hiding this comment.
Finished commit "Add ColumnReferenceExpression". Please update the commit body to say: ColumnReferenceExpression is a new expression for table source blah blah. The only concern of this commit is the translate() interface. That may need some more discussion. Otherwise looks good to me.
| } | ||
|
|
||
| @Override | ||
| public Void visitColumnReference(ColumnReferenceExpression columnReferenceExpression, Void context) |
There was a problem hiding this comment.
columnReferenceExpression -> reference
| import static java.util.stream.Collectors.toMap; | ||
|
|
||
| public final class SqlToRowExpressionTranslator | ||
| public final class SqlToColumnExpressionTranslator |
There was a problem hiding this comment.
Either squash the current commit with the previous one ("Move RowExpression to SPI") or move the renaming change to the previous one in order to not to fail the compilation.
| return translate(expression, functionKind, types, ImmutableMap.of(), functionRegistry, typeManager, session, optimize); | ||
| } | ||
|
|
||
| public static ColumnExpression translate( |
There was a problem hiding this comment.
Users can be confused by these interfaces. Just keep the most expressive one and remove the other two. Update the callers' parameters with ImmutableList.of() if necessary.
| if (columnHandles.containsKey(node.getName())) { | ||
| return new ColumnReferenceExpression(columnHandles.get(node.getName()), getType(node)); | ||
| } | ||
| else if (inputChannels.containsKey(node.getName())) { |
| public ColumnExpression visitCall(CallExpression call, Void context) | ||
| { | ||
| return call.replaceChildren(call.getArguments().stream() | ||
| .map(argument -> argument.accept(this, context)).collect(toImmutableList())); |
There was a problem hiding this comment.
we usually do
something
.func1()
.func2()
.func3();| } | ||
| } | ||
|
|
||
| private static class InlineInputChannelVistor |
There was a problem hiding this comment.
Typo Vistor and call it InlineInputChannelsVisitor
| this.inputs = requireNonNull(inputs, "input is null"); | ||
| } | ||
|
|
||
| public static ColumnExpression inlineInputs(ColumnExpression columnExpression, List<ColumnExpression> inputs) |
| public ColumnExpression visitInputReference(InputReferenceExpression reference, Void context) | ||
| { | ||
| int field = reference.getField(); | ||
| checkArgument(field >= 0 && field < inputs.size(), "Unknown input field"); |
There was a problem hiding this comment.
format("Unknown input field %s", field)
| FunctionKind functionKind, | ||
| Map<NodeRef<Expression>, Type> types, | ||
| Map<String, ColumnHandle> columnHandles, | ||
| List<String> inputChannelNames, |
There was a problem hiding this comment.
Not a big fan passing these two parameters in. They are too specific for table source. But before we address them, I find columnHandles is always empty passed from the callers (even in your last commit). Is that true? If it is, let's just remove it. If not, I'm thinking maybe leveraging context to tell the visitor what exactly to translate: a filter, a project, ... in order to hide the parameters.
| protected RowExpression visitSymbolReference(SymbolReference node, Void context) | ||
| protected ColumnExpression visitSymbolReference(SymbolReference node, Void context) | ||
| { | ||
| if (columnHandles.containsKey(node.getName())) { |
There was a problem hiding this comment.
This condition uses the implication of column names are symbol names. I feel a bit risky here. Also, these if statements kinda break abstraction because they are basically serving two different callers. One possibility is to leverage Context as stated above. But let's figure out whether columnHandles is used or not...
highker
left a comment
There was a problem hiding this comment.
Skip the review of "Convert ColumnExpression back to Expression" for now. I will come back to this commit after going through all following commits to see if there is way to refactor PlanNode to avoid dependency on sql.tree.
highker
left a comment
There was a problem hiding this comment.
Finished review "Add translation from planNode to tableExpression". Most are minor issues. TableExpression is a much cleaner IR than the current PlanNode. It's very unfortunate we cannot use PlanNode for connectors.
The biggest concern is the translation, not the correctness per se but consistency. Especially if someone alters the PlanNode in the future. Currently, the tests with reflection can avoid this to some extent. But still, I'm thinking if we can move the translation to some place (e.g., RelationPlanner) to have a consistent translation.
|
|
||
| public TableExpressionToPlanNodeTranslator(PlanNodeIdAllocator idAllocator, SymbolAllocator symbolAllocator, LiteralEncoder literalEncoder, Metadata metadata) | ||
| { | ||
| requireNonNull(metadata, "metadata is null"); |
There was a problem hiding this comment.
Move private final Metadata metadata; as the first member variable. Make this line this.metadata = requireNonNull(metadata, "metadata is null");
| return tableExpression.accept(new Visitor(session, connectorId), new Context(outputSymbols)); | ||
| } | ||
|
|
||
| private static class Context |
| } | ||
| } | ||
|
|
||
| private class Visitor |
| return plan.accept(new PlanRewriter(), null); | ||
| } | ||
|
|
||
| private class PlanRewriter |
| private ColumnExpression toColumnExpression(Expression expression, List<Symbol> inputs, FunctionKind type) | ||
| { | ||
| Map<NodeRef<Expression>, Type> expressionTypes = | ||
| getExpressionTypes(session, metadata, parser, types, expression, ImmutableList.of(), WarningCollector.NOOP, false); |
| @Override | ||
| public String toString() | ||
| { | ||
| return "FilterExpression{" + |
There was a problem hiding this comment.
Use com.google.common.base.MoreObjects.toStringHelper
| @Override | ||
| public String toString() | ||
| { | ||
| return "AggregateExpression{" + |
There was a problem hiding this comment.
Use com.google.common.base.MoreObjects.toStringHelper
| @Override | ||
| public String toString() | ||
| { | ||
| return "ProjectExpression{" + |
|
|
||
| import java.util.List; | ||
|
|
||
| public abstract class TableExpression |
| @Override | ||
| public String toString() | ||
| { | ||
| return "TableScanExpression{" + |
highker
left a comment
There was a problem hiding this comment.
Finished review "Add connector based optimizer". Pretty clean patch % minor comments
| PageSourceManager pageSourceManager, | ||
| IndexManager indexManager, | ||
| NodePartitioningManager nodePartitioningManager, | ||
| ConnectorOptimizationRuleManager optimizationRuleManager, NodePartitioningManager nodePartitioningManager, |
| // index manager | ||
| binder.bind(IndexManager.class).in(Scopes.SINGLETON); | ||
|
|
||
| binder.bind(ConnectorOptimizationRuleManager.class).in(Scopes.SINGLETON); |
There was a problem hiding this comment.
Abstract ConnectorOptimizationRuleManager out with RuleManager as the interface. Create NoopRuleManager for worker. Bind different implementation for coordinator and worker.
|
|
||
| public class ConnectorOptimizationRuleManager | ||
| { | ||
| private final ConcurrentMap<ConnectorId, ConnectorRuleProvider> providers = new ConcurrentHashMap<>(); |
| } | ||
|
|
||
| @Override | ||
| public PlanNode optimize(PlanNode plan, Session session, TypeProvider types, SymbolAllocator symbolAllocator, PlanNodeIdAllocator idAllocator, WarningCollector warningCollector) |
There was a problem hiding this comment.
Add requireNonNulls in the function body.
| return progress; | ||
| } | ||
|
|
||
| private static class Context |
| { | ||
| requireNonNull(node, "node is null"); | ||
| TraitGroup merged = TraitGroup.emptyTraitGroup(); | ||
| node.getSources().stream() |
There was a problem hiding this comment.
new line stream(). Same for the one below.
| traitCollectors.stream() | ||
| .filter(traitCollector -> traitCollector.canApplyTo(node)) | ||
| .map(traitCollector -> traitCollector.exploreTraits(node)) | ||
| .forEach(traitGroup -> merged.addAll(traitGroup)); |
|
|
||
| public void addAll(TraitGroup other) | ||
| { | ||
| other.traitGroups.entrySet().stream() |
There was a problem hiding this comment.
The following statement is good enough
other.traitGroups.forEach((type, traits) -> traits.getValues().forEach(value -> addTrait(type, value)));|
|
||
| boolean match(ConnectorSession session, TableExpression tableExpression); | ||
|
|
||
| Optional<TableExpression> optimize(ConnectorSession session, TableExpression tableExpression); |
There was a problem hiding this comment.
Why Optional? If there is nothing to optimize, we can return the given tableExpression.
highker
left a comment
There was a problem hiding this comment.
Finished review "Prevent overriding picked layout and move connector optimzation to later stage". LGTM. Refine the title as "Move connector optimization after picking table layout".
| { | ||
| List<PlanNode> possiblePlans = PickTableLayout.listTableLayouts(node, predicate, true, session, types, idAllocator, metadata, parser, domainTranslator); | ||
| List<PlanWithProperties> possiblePlansWithProperties = possiblePlans.stream() | ||
| List<PlanWithProperties> possiblePlansWithProperties = Stream.concat(node.getLayout().isPresent() ? Stream.of(node) : Stream.empty(), possiblePlans.stream()) |
| .addAll(new ExtractSpatialJoins(metadata, splitManager, pageSourceManager, sqlParser).rules()) | ||
| .add(new InlineProjections()) | ||
| .build())); | ||
| builder.add(new ConnectorOptimizer(metadata, sqlParser, connectorOptimizationRuleManager, new LiteralEncoder(metadata.getBlockEncodingSerde()))); |
There was a problem hiding this comment.
Add a comment saying why we put the connector optimizer here.
highker
left a comment
There was a problem hiding this comment.
Finished review "Add checks to prevent uncleaned relation be returned from connector". Alter the title to "Add checks to prevent uncleaned relation returned from connectors"
BTW, the commit timestamps are messed up. Use git rebase master --ignore-date -x 'git commit --amend -C HEAD --date=\"$(date -R)\" && sleep 1.05' to clean up timestamps before sending out a PR.
|
|
||
| private void checkNoColumnInputs(UnaryTableExpression node) | ||
| { | ||
| int numColumnReferences = node.getOutput().stream() |
| import static java.util.stream.Collectors.toMap; | ||
|
|
||
| public final class SqlToColumnExpressionTranslator | ||
| public final class |
| public static class InputCollector | ||
| implements ColumnExpressionVisitor<Set<ColumnExpression>, Void> | ||
| { | ||
| private InputCollector() |
| @Override | ||
| public Set<ColumnExpression> visitCall(CallExpression call, Void context) | ||
| { | ||
| return call.getArguments().stream() |
highker
left a comment
There was a problem hiding this comment.
Reviewed "Allow use intermediate type as input of aggregation functin" but has not finished yet. I will pause it here for a moment given the logic is quite hard to follow. This commit may need some refactoring. But let's don't worry about this for now and focus on the translation part first.
| .anyMatch(this::isSpecializedType); | ||
| } | ||
|
|
||
| private boolean isSpecializedType(TypeSignature typeSignature) |
|
|
||
| private boolean isSpecializedType(TypeSignature typeSignature) | ||
| { | ||
| return !typeSignature.getParameters() |
There was a problem hiding this comment.
inline and use
return typeSignature.getParameters().stream().noneMatch(TypeSignatureParameter::isVariable);| try { | ||
| return specializedAggregationCache.getUnchecked(getSpecializedFunctionKey(signature)); | ||
| } | ||
| catch (UncheckedExecutionException e) { |
| throw new PrestoException(FUNCTION_NOT_FOUND, message); | ||
| } | ||
|
|
||
| private boolean isSpecializedFunction(Signature signature) |
|
|
||
| private boolean isSpecializedFunction(Signature signature) | ||
| { | ||
| return isSpecializedType(signature.getReturnType()) && signature.getArgumentTypes().stream() |
There was a problem hiding this comment.
return isSpecializedType(signature.getReturnType()) &&
signature.getArgumentTypes().stream().anyMatch(FunctionRegistry::isSpecializedType);| ConnectorPageSource source = pageSourceProvider.createPageSource(operatorContext.getSession(), split, columns); | ||
| if (source instanceof RecordPageSource) { | ||
| cursor = ((RecordPageSource) source).getCursor(); | ||
| pageSource = source; |
|
Did a first round review. Two high-level comments:
|
|
@hellium01, as a first step, I think it is safe to separate the first commit ("move signature to SPI") as a separate PR to prevent future rebase. That one looks clean. |
|
@highker @hellium01 Except that it should be |
highker
left a comment
There was a problem hiding this comment.
High-level comments regarding commit "Convert ColumnExpression back to Expression":
ColumnExpressionis not expressive enough as the IR representation forExpression. Indeed, I feel there is a need to introduce the concept of IR fully. An IR is the augmented representation for AST with metadata info in Reverse Polish notation.- With IR introduced, the refactoring can be much easier. Most
Expressionoperations are just doing disjunction/conjunction decomposition, which can be easily achieve by IR. I was also thinking directly useExpressiongiven it has been deeply rooted in our codebase doing the job that originally belongs to IR.
|
@hellium01 Add up to the previous comment: |
|
I think I might have a way to avoid explicit translation. The ultimate goal is to
Proposing annotation-based translation
@Immutable
@PlanNodeMapping(FilterExpression.class)
public class FilterNode
extends PlanNode
{
private final PlanNode source;
private final Expression predicate;
@JsonCreator
@PlanNodeCreator
public FilterNode(
@JsonProperty("id") PlanNodeId id,
@JsonProperty("source") @PlanNodeProperty("source") PlanNode source,
@JsonProperty("predicate") @PlanNodeProperty("predicate") Expression predicate)
{
super(id);
this.source = source;
this.predicate = predicate;
}
@PlanNodeProperty // with name or not is optional; it can be inferred from getter.
@JsonProperty("predicate")
public Expression getPredicate()
{
return predicate;
}
@PlanNodeProperty
@JsonProperty("source")
public PlanNode getSource()
{
return source;
}
}
Both The benefit of this approach is to shift node-oriented translation to type-oriented translation, which can easily scale up in the future. |
|
Some more example code as a starting point @Target({ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface PlanNodeMapping
{
Class<? extends TableExpression> value() default UnmappedTableExpression.class;
final class UnmappedTableExpression
extends TableExpression
{
@Override
public List<ColumnExpression> getOutput()
{
throw new UnsupportedOperationException();
}
}
}
public class TestTranslationManager
{
@Test
public void testTranslation()
{
ConnectorPlanNodeManager manager = new ConnectorPlanNodeManager();
manager.register(FilterNode.class);
manager.toTableExprssion(new FilterNode(new PlanNodeId("1"), null, new LogicalBinaryExpression(AND, new BooleanLiteral("False"), new BooleanLiteral("true"))));
}
public class ConnectorPlanNodeManager
{
private final BiMap<Class<? extends PlanNode>, Class<? extends TableExpression>> mapping = HashBiMap.create();
public ConnectorPlanNodeManager()
{
}
public void register(Class<? extends PlanNode> clazz)
{
PlanNodeMapping planNodeMapping = clazz.getAnnotation(PlanNodeMapping.class);
mapping.put(clazz, planNodeMapping.value());
// TODO: sanity check to make sure the translation is (1) complete and (2) loss-less
}
public TableExpression toTableExprssion(PlanNode planNode)
{
System.out.println(mapping.get(planNode.getClass()));
Method[] methods = planNode.getClass().getMethods();
for (Method method : methods) {
PlanNodeProperty planNodeProperty = method.getAnnotation(PlanNodeProperty.class);
if (planNodeProperty != null) {
String property = planNodeProperty.value();
if (property.equals(USE_DEFAULT_NAME)) {
String methodName = method.getName();
if (methodName.startsWith("is")) {
property = methodName.substring(2);
property = Character.toLowerCase(property.charAt(0)) + property.substring(1);
}
else if (methodName.startsWith("get")) {
property = methodName.substring(3);
property = Character.toLowerCase(property.charAt(0)) + property.substring(1);
}
else {
throw new IllegalArgumentException();
}
}
Class<?> fromType = method.getReturnType();
// TODO: translate
}
}
return null;
}
// TODO: put type-to-type translation here
private ColumnExpression toColumnExpression(Session session, Expression projectionExpression)
{
// TODO: put translation here
return null;
}
}
} |
|
Thanks for the review, I will start addressing it in following days once I am back. About the type to type translation, the difficulty is we sometimes have more than 1 node to represent a relation. |
|
Expression has too much semantic information that we actually never use later in planning phase. The only reason we convert columnExpression (or whatever name we decide) back to expression is because we are currently using expression in planNode. That being said, ColumnExpression should be able to fully converted back to expression given enough care but this means we should always attempt to make this mapping work (we today have to manage conversion from expression to columnExpression anyway). I think we should provide only the simple relation information to connector and leave all the implementation related information in the engine. Simpler representation should always be better since it is easier to be operated on. Rewrite columnExpression into connector logic will be much easier than rewrite expression. Assuming in the future we want to match if a subtree has corresponding materialized table, matching column/table expression (simple relation) will be much easier than matching planNode mapped expression which has more information that connector don't care but has to handle (so even we don't do it in engine level, connector will most likely eventually do a translation itself so that it can reduce complexity so that it can operate and rewrite). Simpler representation (tableExpression vs planNode) also prevents connector from seeing changing node types or node structure, which makes connector optimization rules more stable across versions: the only care we need is when we have a change in planNode, we need to make sure the translation back/forth does not break which is much easier than making sure all different connectors have no broken rule. |
|
Nah, I'm not object to a simple representation, especially given I have proposed the annotation-based translator. I'm totally cool with a "view" concept on top of |
|
superseded by #12920 |
Related to: #12368