
Conversation

@jackiehanyang (Contributor) commented Feb 1, 2022:

Signed-off-by: Jackie Han [email protected]

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jackiehanyang jackiehanyang requested a review from a team as a code owner February 1, 2022 22:38
@jackiehanyang (Contributor, Author) commented:
The build-windows32 and build-windows64 checks failed due to actions/runner-images#4856.

/** commands */
commands
: whereCommand | fieldsCommand | renameCommand | statsCommand | dedupCommand | sortCommand | evalCommand | headCommand
| topCommand | rareCommand;
Collaborator:

Have you considered a more generic command name for ML? e.g. apply model-name?

Contributor (Author):

We discussed how to construct the command so that it can be as generic as possible, and it seems that just passing in the algorithm name is the best solution. The model name or model id is hidden inside the whole train-and-predict process; we don't want customers to have to take note of model attributes and pass them into the command.

Comment on lines 145 to 148
Map<String, Object> items = new HashMap<>();
input.next().tupleValue().forEach((key, value) -> {
  items.put(key, value.value());
});
Collaborator:

Could you use ExprTupleValue::value()?

@jackiehanyang (Contributor, Author) replied Feb 7, 2022:

I'm not quite sure what you mean; I don't think we are able to reference ExprTupleValue here. Could you elaborate?

public void open() {
  super.open();
  DataFrame inputDataFrame = generateInputDataset();
  MLAlgoParams mlAlgoParams = convertArgumentToMLParameter(arguments.get(0), algorithm);
Collaborator:

Does it make sense to parse the Argument in the Analyzer?

Contributor (Author):

I prefer to put Argument parsing logic in MLCommonsOperator, because I don't see any argument parsing logic in Analyzer for other logical plans. Also, it looks like the main purpose of Analyzer class is to construct the logical plan, so I prefer to leave the argument parsing work for the actual operator, which is MLCommonsOperator.

Contributor (Author):

Parsing the Argument in the Analyzer would also introduce new dependencies into the Core module, since the Analyzer class sits in Core.
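
For illustration, a minimal sketch of what that operator-side parsing could look like, assuming the kmeans algorithm and the KMeansParams builder from the ml-commons client. The method name comes from the snippet above; the argument accessors and builder fields are assumptions for illustration, not necessarily the exact code in this PR.

// Hedged sketch: convert the PPL Argument into ml-commons parameters inside the
// physical operator, so the Core module never needs to know about ml-commons types.
protected MLAlgoParams convertArgumentToMLParameter(Argument argument, String algorithm) {
  switch (algorithm.toLowerCase()) {
    case "kmeans":
      // Assumes the single argument carries the number of centroids.
      return KMeansParams.builder()
          .centroids((Integer) argument.getValue().getValue())
          .build();
    default:
      throw new IllegalArgumentException("unsupported algorithm: " + algorithm);
  }
}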

@penghuo (Collaborator) commented Feb 3, 2022:

Thanks for making the change! I'm not finished reviewing yet; two high-level comments:

  1. In general, the Core module should know nothing about the storage engine, which means each storage engine should decide how to convert a LogicalPlan into a PhysicalPlan. So MLCommonsOperator should be implemented in the OpenSearch module instead of the Core module (see the sketch after this list).
  2. Could you also add documentation for the ML command? e.g. https://github.com/penghuo/os-sql/blob/issue-396-pr/docs/user/ppl/index.rst
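
To make the first point concrete, here is a rough sketch of how the OpenSearch module could own the LogicalPlan-to-PhysicalPlan conversion for the ML command. The node and visitor names (LogicalMLCommons, OpenSearchDefaultImplementor) and the operator constructor shown are assumptions for illustration, not necessarily what this PR ends up with.

// Hedged sketch: the storage-engine-side implementor builds MLCommonsOperator,
// so the Core module never references ml-commons or OpenSearch node clients.
public class OpenSearchDefaultImplementor extends DefaultImplementor<OpenSearchIndexScan> {

  private final NodeClient nodeClient;

  public OpenSearchDefaultImplementor(NodeClient nodeClient) {
    this.nodeClient = nodeClient;
  }

  @Override
  public PhysicalPlan visitMLCommons(LogicalMLCommons node, OpenSearchIndexScan context) {
    // Convert the child first, then wrap it with the ML physical operator.
    return new MLCommonsOperator(
        node.getChild().get(0).accept(this, context),
        node.getAlgorithm(),
        node.getArguments(),
        nodeClient);
  }
}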

@codecov-commenter commented Feb 7, 2022:

Codecov Report

Merging #407 (c703f46) into feature/ppl-ml (3ffa1ca) will decrease coverage by 33.60%.
The diff coverage is n/a.

Impacted file tree graph

@@                  Coverage Diff                  @@
##             feature/ppl-ml     #407       +/-   ##
=====================================================
- Coverage             96.52%   62.91%   -33.61%     
=====================================================
  Files                   266       10      -256     
  Lines                  7191      658     -6533     
  Branches                540      118      -422     
=====================================================
- Hits                   6941      414     -6527     
+ Misses                  196      191        -5     
+ Partials                 54       53        -1     
Flag Coverage Δ
query-workbench 62.91% <ø> (ø)
sql-engine ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/opensearch/sql/analysis/Analyzer.java
...ch/sql/planner/logical/LogicalPlanNodeVisitor.java
.../sql/planner/physical/PhysicalPlanNodeVisitor.java
...ch/sql/opensearch/client/OpenSearchNodeClient.java
...ch/sql/opensearch/client/OpenSearchRestClient.java
...ecutor/protector/OpenSearchExecutionProtector.java
...search/sql/opensearch/storage/OpenSearchIndex.java
...java/org/opensearch/sql/ppl/parser/AstBuilder.java
.../org/opensearch/sql/ppl/utils/ArgumentFactory.java
.../opensearch/sql/protocol/response/QueryResult.java
... and 246 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ffa1ca...c703f46. Read the comment docs.

@jackiehanyang (Contributor, Author) commented:

I will send out a separate PR for the documentation update, along with documentation for the AD integration.

@ylwu-amzn (Contributor) left a comment:

LGTM. Thanks for the change!

@penghuo (Collaborator) left a comment:

Some minor things; otherwise it looks good.

Comment on lines 168 to 171
Map<String, Object> items = new HashMap<>();
input.next().tupleValue().forEach((key, value) ->
    items.put(key, value.value()));
inputData.add(items);
Collaborator:

Could you simplify this as inputData.add((Map<String, Object>) value.value())?
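
As a side note, one way to condense the copy loop without the intermediate mutable map, assuming tupleValue() returns a Map<String, ExprValue> as the snippet above implies. This is an editor illustration, not necessarily the exact simplification the reviewer had in mind.

// Hedged sketch: unwrap each ExprValue while collecting, instead of mutating a
// HashMap inside forEach. Requires java.util.Map and java.util.stream.Collectors.
// Note: Collectors.toMap rejects null values, which the HashMap version tolerates.
Map<String, Object> items = input.next().tupleValue().entrySet().stream()
    .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().value()));
inputData.add(items);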

* Get ml-commons client.
* @return ml-commons client
*/
MachineLearningClient mlCommonsClient();
Collaborator:

I don't think we should add a new mlCommonsClient interface, because OpenSearchClient is not designed as a factory class.
Instead, could you add a new NodeClient getNodeClient() interface?
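
A rough sketch of the suggested interface change (surrounding methods are elided; whether the operator then builds the ml-commons client itself, for example via something like new MachineLearningNodeClient(nodeClient), depends on the ml-commons client API and is an assumption here):

// Hedged sketch: OpenSearchClient exposes only the raw NodeClient; the ML operator
// constructs its own ml-commons client from it, so OpenSearchClient does not turn
// into a factory for other clients.
public interface OpenSearchClient {
  // ... existing search/index methods elided ...

  /**
   * Get the underlying OpenSearch node client.
   * @return node client
   */
  NodeClient getNodeClient();
}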

*/
@RequiredArgsConstructor
@EqualsAndHashCode(callSuper = false)
public class MLCommonsOperator extends PhysicalPlan {
Collaborator:

Wrap MLCommonsOperator with OpenSearchExecutionProtector.

Contributor (Author):

Why does MLCommonsOperator need to be wrapped by OpenSearchExecutionProtector? Extending two classes would create a nested structure.
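
For context, the wrapping the reviewer asks for happens at plan-rewriting time through the protector's visitor, not through inheritance, so MLCommonsOperator still extends only PhysicalPlan. Below is a hedged sketch of the kind of visit method involved; doProtect and the monitored wrapper follow the pattern used for other operators in this codebase, but the exact method signature and constructor arguments here are assumptions.

// Hedged sketch: the protector rebuilds the operator inside a resource-monitored
// wrapper while visiting the physical plan tree.
@Override
public PhysicalPlan visitMLCommons(MLCommonsOperator node, Object context) {
  return doProtect(
      new MLCommonsOperator(
          visitInput(node.getInput(), context),
          node.getAlgorithm(),
          node.getArguments(),
          node.getNodeClient()));
}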

return new HashMap<String, Literal>() {{
    put("shingle_size", (ctx.shingle_size != null)
        ? getArgumentValue(ctx.shingle_size)
        : new Literal(8, DataType.INTEGER));
Contributor:

We already have these default values in MLCommons. How about we just set these values to null in PPL, meaning we let MLCommons decide the default value? It may conflict in the future if both PPL and MLCommons set default values.
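
In other words, something along these lines (an editor sketch of the suggestion, assuming the downstream handling accepts a null entry for an omitted parameter):

// Hedged sketch: pass null when the user did not supply the argument, so
// ml-commons applies its own default instead of PPL hard-coding 8.
put("shingle_size", (ctx.shingle_size != null)
    ? getArgumentValue(ctx.shingle_size)
    : null);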

//STRING_LITERAL: DQUOTA_STRING | SQUOTA_STRING | BQUOTA_STRING;
ID: ID_LITERAL;
INTEGER_LITERAL: DEC_DIGIT+;
DOUBLE_LITERAL: (DEC_DIGIT+)? '.' DEC_DIGIT+;
Contributor:

It seems DOUBLE_LITERAL is the same as DECIMAL_LITERAL. Do we really need it?

.build();
}

private Map<String, ExprValue> convertRowIntoExprValue(ColumnMeta[] columnMetas, Row row) {
Contributor:

Minor: this duplicates code in MLCommonsOperator; this part can be refactored.
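
For example, the shared row conversion could live in one place that both operators reuse. A minimal sketch follows; the MLCommonsOperatorActions name, the ExprValueUtils.fromObjectValue mapping, and the Row/ColumnValue accessors are assumptions for illustration.

// Hedged sketch: define the DataFrame row conversion once and let both the ML and
// the AD physical operators extend this helper instead of copying the loop.
public abstract class MLCommonsOperatorActions extends PhysicalPlan {

  protected Map<String, ExprValue> convertRowIntoExprValue(ColumnMeta[] columnMetas, Row row) {
    Map<String, ExprValue> result = new LinkedHashMap<>();
    for (int i = 0; i < columnMetas.length; i++) {
      // Same per-column mapping both operators contain today, now defined once.
      result.put(columnMetas[i].getName(),
          ExprValueUtils.fromObjectValue(row.getValue(i).getValue()));
    }
    return result;
  }
}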

compile group: 'org.json', name: 'json', version:'20180813'
compileOnly group: 'org.opensearch.client', name: 'opensearch-rest-high-level-client', version: "${opensearch_version}"
compile group: 'org.opensearch.ml', name:'opensearch-ml-client', version: '1.3.0.0'
compile group: 'org.opensearch', name: 'opensearch', version: "1.3.0-SNAPSHOT"
Collaborator:

Is this required?

@ylwu-amzn (Contributor) left a comment:

LGTM

@penghuo (Collaborator) left a comment:

Thanks for the change!

@penghuo penghuo merged commit 9768bd9 into opensearch-project:feature/ppl-ml Mar 3, 2022