
Conversation

@jackiehanyang (Contributor) commented Feb 1, 2022:

Signed-off-by: Jackie Han [email protected]

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jackiehanyang jackiehanyang requested a review from a team as a code owner February 1, 2022 22:38
@jackiehanyang (Contributor, Author) commented:
The build-windows32 and build-windows64 checks failed due to actions/runner-images#4856.

/** commands */
commands
: whereCommand | fieldsCommand | renameCommand | statsCommand | dedupCommand | sortCommand | evalCommand | headCommand
| topCommand | rareCommand;
Collaborator:

Have you considered a more generic command name for ML? e.g. apply model-name?

Contributor (Author):

We discussed how to construct the command so that it can be as generic as possible, and it seems that just passing in the algorithm name is the best solution. The model name or model id is hidden inside the whole train-and-predict process; we don't want customers to have to take note of model attributes and pass them into the command.

Comment on lines 145 to 148
Map<String, Object> items = new HashMap<>();
input.next().tupleValue().forEach((key, value) -> {
  items.put(key, value.value());
});
Collaborator:

Could you use ExprTupleValue::value()?

@jackiehanyang (Contributor, Author) replied Feb 7, 2022:

I'm not quite sure what you mean; I don't think we are able to reference ExprTupleValue here. Could you elaborate?

public void open() {
  super.open();
  DataFrame inputDataFrame = generateInputDataset();
  MLAlgoParams mlAlgoParams = convertArgumentToMLParameter(arguments.get(0), algorithm);
Collaborator:

Does it make sense to parse the Argument in the Analyzer?

Contributor (Author):

I prefer to put Argument parsing logic in MLCommonsOperator, because I don't see any argument parsing logic in Analyzer for other logical plans. Also, it looks like the main purpose of Analyzer class is to construct the logical plan, so I prefer to leave the argument parsing work for the actual operator, which is MLCommonsOperator.

Contributor (Author):

Parsing the Argument in the Analyzer would also introduce new dependencies into the Core module, since the Analyzer class sits in Core.
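
For illustration, a minimal sketch of what that operator-side parsing could look like, assuming the kmeans algorithm and the KMeansParams builder from the ml-commons client. The method name comes from the snippet above; the argument accessors and builder fields are assumptions for illustration, not necessarily the exact code in this PR.

// Hedged sketch: convert the PPL Argument into ml-commons parameters inside the
// physical operator, so the Core module never needs to know about ml-commons types.
protected MLAlgoParams convertArgumentToMLParameter(Argument argument, String algorithm) {
  switch (algorithm.toLowerCase()) {
    case "kmeans":
      // Assumes the single argument carries the number of centroids.
      return KMeansParams.builder()
          .centroids((Integer) argument.getValue().getValue())
          .build();
    default:
      throw new IllegalArgumentException("unsupported algorithm: " + algorithm);
  }
}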

@penghuo (Collaborator) commented Feb 3, 2022:

Thanks for making the change! I'm not finished reviewing yet; two high-level comments:

  1. In general, the Core module should know nothing about the storage engine, which means each storage engine should decide how to convert a LogicalPlan into a PhysicalPlan. So MLCommonsOperator should be implemented in the OpenSearch module instead of the Core module (see the sketch after this list).
  2. Could you also add documentation for the ML command? e.g. https://github.com/penghuo/os-sql/blob/issue-396-pr/docs/user/ppl/index.rst
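
To make the first point concrete, here is a rough sketch of how the OpenSearch module could own the LogicalPlan-to-PhysicalPlan conversion for the ML command. The node and visitor names (LogicalMLCommons, OpenSearchDefaultImplementor) and the operator constructor shown are assumptions for illustration, not necessarily what this PR ends up with.

// Hedged sketch: the storage-engine-side implementor builds MLCommonsOperator,
// so the Core module never references ml-commons or OpenSearch node clients.
public class OpenSearchDefaultImplementor extends DefaultImplementor<OpenSearchIndexScan> {

  private final NodeClient nodeClient;

  public OpenSearchDefaultImplementor(NodeClient nodeClient) {
    this.nodeClient = nodeClient;
  }

  @Override
  public PhysicalPlan visitMLCommons(LogicalMLCommons node, OpenSearchIndexScan context) {
    // Convert the child first, then wrap it with the ML physical operator.
    return new MLCommonsOperator(
        node.getChild().get(0).accept(this, context),
        node.getAlgorithm(),
        node.getArguments(),
        nodeClient);
  }
}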

@codecov-commenter commented Feb 7, 2022:

Codecov Report

Merging #407 (c703f46) into feature/ppl-ml (3ffa1ca) will decrease coverage by 33.60%.
The diff coverage is n/a.

Impacted file tree graph

@@                  Coverage Diff                  @@
##             feature/ppl-ml     #407       +/-   ##
=====================================================
- Coverage             96.52%   62.91%   -33.61%     
=====================================================
  Files                   266       10      -256     
  Lines                  7191      658     -6533     
  Branches                540      118      -422     
=====================================================
- Hits                   6941      414     -6527     
+ Misses                  196      191        -5     
+ Partials                 54       53        -1     
Flag Coverage Δ
query-workbench 62.91% <ø> (ø)
sql-engine ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/opensearch/sql/analysis/Analyzer.java
...ch/sql/planner/logical/LogicalPlanNodeVisitor.java
.../sql/planner/physical/PhysicalPlanNodeVisitor.java
...ch/sql/opensearch/client/OpenSearchNodeClient.java
...ch/sql/opensearch/client/OpenSearchRestClient.java
...ecutor/protector/OpenSearchExecutionProtector.java
...search/sql/opensearch/storage/OpenSearchIndex.java
...java/org/opensearch/sql/ppl/parser/AstBuilder.java
.../org/opensearch/sql/ppl/utils/ArgumentFactory.java
.../opensearch/sql/protocol/response/QueryResult.java
... and 246 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ffa1ca...c703f46. Read the comment docs.

@jackiehanyang (Contributor, Author) commented:

I will send out a separate PR for the documentation update, along with documentation for the AD integration.

@ylwu-amzn (Contributor) left a comment:

LGTM. Thanks for the change!

@penghuo (Collaborator) left a comment:

Some minor things; otherwise it looks good.

Comment on lines 168 to 171
Map<String, Object> items = new HashMap<>();
input.next().tupleValue().forEach((key, value) ->
    items.put(key, value.value()));
inputData.add(items);
Collaborator:

Could you simplify this as inputData.add((Map<String, Object>) value.value())?
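
As a side note, one way to condense the copy loop without the intermediate mutable map, assuming tupleValue() returns a Map<String, ExprValue> as the snippet above implies. This is an editor illustration, not necessarily the exact simplification the reviewer had in mind.

// Hedged sketch: unwrap each ExprValue while collecting, instead of mutating a
// HashMap inside forEach. Requires java.util.Map and java.util.stream.Collectors.
// Note: Collectors.toMap rejects null values, which the HashMap version tolerates.
Map<String, Object> items = input.next().tupleValue().entrySet().stream()
    .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().value()));
inputData.add(items);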

* Get ml-commons client.
* @return ml-commons client
*/
MachineLearningClient mlCommonsClient();
Collaborator:

I don't think we should add a new mlCommonsClient interface, because OpenSearchClient is not designed as a factory class.
Instead, could you add a new NodeClient getNodeClient() interface?
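
A rough sketch of the suggested interface change (surrounding methods are elided; whether the operator then builds the ml-commons client itself, for example via something like new MachineLearningNodeClient(nodeClient), depends on the ml-commons client API and is an assumption here):

// Hedged sketch: OpenSearchClient exposes only the raw NodeClient; the ML operator
// constructs its own ml-commons client from it, so OpenSearchClient does not turn
// into a factory for other clients.
public interface OpenSearchClient {
  // ... existing search/index methods elided ...

  /**
   * Get the underlying OpenSearch node client.
   * @return node client
   */
  NodeClient getNodeClient();
}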

*/
@RequiredArgsConstructor
@EqualsAndHashCode(callSuper = false)
public class MLCommonsOperator extends PhysicalPlan {
Collaborator:

Wrap MLCommonsOperator with OpenSearchExecutionProtector.

Contributor (Author):

Why does MLCommonsOperator need to be wrapped by OpenSearchExecutionProtector? Extending two classes would create a nested structure.
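
For context, the wrapping the reviewer asks for happens at plan-rewriting time through the protector's visitor, not through inheritance, so MLCommonsOperator still extends only PhysicalPlan. Below is a hedged sketch of the kind of visit method involved; doProtect and the monitored wrapper follow the pattern used for other operators in this codebase, but the exact method signature and constructor arguments here are assumptions.

// Hedged sketch: the protector rebuilds the operator inside a resource-monitored
// wrapper while visiting the physical plan tree.
@Override
public PhysicalPlan visitMLCommons(MLCommonsOperator node, Object context) {
  return doProtect(
      new MLCommonsOperator(
          visitInput(node.getInput(), context),
          node.getAlgorithm(),
          node.getArguments(),
          node.getNodeClient()));
}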

return new HashMap<String, Literal>() {{
    put("shingle_size", (ctx.shingle_size != null)
        ? getArgumentValue(ctx.shingle_size)
        : new Literal(8, DataType.INTEGER));
Contributor:

We already have these default values in MLCommons. How about we just set these values to null in PPL, meaning we let MLCommons decide the default value? It may conflict in the future if both PPL and MLCommons set default values.
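
In other words, something along these lines (an editor sketch of the suggestion, assuming the downstream handling accepts a null entry for an omitted parameter):

// Hedged sketch: pass null when the user did not supply the argument, so
// ml-commons applies its own default instead of PPL hard-coding 8.
put("shingle_size", (ctx.shingle_size != null)
    ? getArgumentValue(ctx.shingle_size)
    : null);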

//STRING_LITERAL: DQUOTA_STRING | SQUOTA_STRING | BQUOTA_STRING;
ID: ID_LITERAL;
INTEGER_LITERAL: DEC_DIGIT+;
DOUBLE_LITERAL: (DEC_DIGIT+)? '.' DEC_DIGIT+;
Contributor:

It seems DOUBLE_LITERAL is the same as DECIMAL_LITERAL. Do we really need it?

.build();
}

private Map<String, ExprValue> convertRowIntoExprValue(ColumnMeta[] columnMetas, Row row) {
Contributor:

Minor: this duplicates code in MLCommonsOperator; this part can be refactored.
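
For example, the shared row conversion could live in one place that both operators reuse. A minimal sketch follows; the MLCommonsOperatorActions name, the ExprValueUtils.fromObjectValue mapping, and the Row/ColumnValue accessors are assumptions for illustration.

// Hedged sketch: define the DataFrame row conversion once and let both the ML and
// the AD physical operators extend this helper instead of copying the loop.
public abstract class MLCommonsOperatorActions extends PhysicalPlan {

  protected Map<String, ExprValue> convertRowIntoExprValue(ColumnMeta[] columnMetas, Row row) {
    Map<String, ExprValue> result = new LinkedHashMap<>();
    for (int i = 0; i < columnMetas.length; i++) {
      // Same per-column mapping both operators contain today, now defined once.
      result.put(columnMetas[i].getName(),
          ExprValueUtils.fromObjectValue(row.getValue(i).getValue()));
    }
    return result;
  }
}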

compile group: 'org.json', name: 'json', version:'20180813'
compileOnly group: 'org.opensearch.client', name: 'opensearch-rest-high-level-client', version: "${opensearch_version}"
compile group: 'org.opensearch.ml', name:'opensearch-ml-client', version: '1.3.0.0'
compile group: 'org.opensearch', name: 'opensearch', version: "1.3.0-SNAPSHOT"
Collaborator:

Is this required?

@ylwu-amzn (Contributor) left a comment:

LGTM

@penghuo (Collaborator) left a comment:

Thanks for the change!

@penghuo penghuo merged commit 9768bd9 into opensearch-project:feature/ppl-ml Mar 3, 2022