PPL Integration - Add implementation for KMeans algorithm #407
Signed-off-by: Jackie Han <[email protected]>
    /** commands */
    commands
        : whereCommand | fieldsCommand | renameCommand | statsCommand | dedupCommand | sortCommand | evalCommand | headCommand
        | topCommand | rareCommand;
Have you considered a more generic command name for ML, e.g. apply model-name?
We discussed how to construct the command so that it is as generic as possible, and passing in just the algorithm name seems to be the best solution. The model name or model ID is hidden inside the whole train-and-predict process; we don't want customers to have to take note of model attributes and pass them in the command.
core/src/main/java/org/opensearch/sql/planner/physical/MLCommonsOperator.java
    Map<String, Object> items = new HashMap<>();
    input.next().tupleValue().forEach((key, value) -> {
      items.put(key, value.value());
    });
Could you use ExprTupleValue::value()?
I'm not quite sure what you mean. I don't think we can reference ExprTupleValue here. Could you elaborate?
    public void open() {
      super.open();
      DataFrame inputDataFrame = generateInputDataset();
      MLAlgoParams mlAlgoParams = convertArgumentToMLParameter(arguments.get(0), algorithm);
Does it make sense to parse the Argument in the Analyzer?
I prefer to put the argument-parsing logic in MLCommonsOperator, because I don't see any argument-parsing logic in the Analyzer for other logical plans. Also, the main purpose of the Analyzer class appears to be constructing the logical plan, so I prefer to leave argument parsing to the actual operator, MLCommonsOperator.
If we parse the Argument in the Analyzer, it will create new dependencies in the core module, since the Analyzer class sits in the core module.
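To illustrate keeping argument parsing inside the operator, here is a minimal sketch that dispatches on the algorithm name and converts raw PPL argument values into a typed parameter object. KMeansParams, convertArgumentToParams, and the argument names centroids and iterations are hypothetical stand-ins, not the ml-commons API or the PR's actual code; unspecified arguments are simply left as null.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: operator-side conversion of PPL arguments into typed algorithm params. */
class ArgumentParsingSketch {

  /** Hypothetical stand-in for a k-means parameter object; not the ml-commons class. */
  static final class KMeansParams {
    final Integer centroids;  // null means "not specified by the user"
    final Integer iterations; // null means "not specified by the user"

    KMeansParams(Integer centroids, Integer iterations) {
      this.centroids = centroids;
      this.iterations = iterations;
    }
  }

  /** Dispatches on the algorithm name and converts raw argument values into typed params. */
  static Object convertArgumentToParams(String algorithm, Map<String, Object> args) {
    switch (algorithm.toLowerCase()) {
      case "kmeans":
        return new KMeansParams(toInteger(args.get("centroids")), toInteger(args.get("iterations")));
      default:
        throw new IllegalArgumentException("unsupported algorithm: " + algorithm);
    }
  }

  /** Accepts numbers or numeric strings; passes null through unchanged. */
  private static Integer toInteger(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Number) {
      return ((Number) value).intValue();
    }
    return Integer.valueOf(value.toString());
  }

  public static void main(String[] args) {
    Map<String, Object> pplArgs = new HashMap<>();
    pplArgs.put("centroids", 3);
    KMeansParams params = (KMeansParams) convertArgumentToParams("kmeans", pplArgs);
    System.out.println(params.centroids + " " + params.iterations); // prints "3 null"
  }
}
```

Because the Analyzer never sees anything beyond the algorithm name and an opaque argument map, only the operator's module needs to know about algorithm-specific parameter types.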
Thanks for making the change! Not finished yet; two high-level comments.
core/src/main/java/org/opensearch/sql/planner/physical/MLCommonsOperator.java
Codecov Report
    @@                 Coverage Diff                  @@
    ##           feature/ppl-ml     #407        +/-   ##
    =====================================================
    - Coverage        96.52%     62.91%    -33.61%
    =====================================================
      Files              266         10       -256
      Lines             7191        658      -6533
      Branches           540        118       -422
    =====================================================
    - Hits              6941        414      -6527
    + Misses             196        191         -5
    + Partials            54         53         -1
Signed-off-by: jackieyanghan <[email protected]>
I will send out a separate PR for the documentation update, along with documentation for the AD integration.
ylwu-amzn left a comment
LGTM. Thanks for the change!
penghuo left a comment
Some minor things; otherwise looks good.
    Map<String, Object> items = new HashMap<>();
    input.next().tupleValue().forEach((key, value) ->
        items.put(key, value.value()));
    inputData.add(items);
Could you simplify as inputData.add((Map<String, Object>)value.value())?
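The suggestion relies on a tuple value's value() already returning a map of unwrapped field values. The sketch below models that assumption with hypothetical stand-in types (Value, ScalarValue, TupleValue), not the plugin's real ExprValue classes, and shows that the hand-written copy loop and the one-line cast produce equal maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TupleToMapSketch {

  /** Hypothetical stand-in for ExprValue. */
  interface Value {
    Object value();
  }

  /** Hypothetical stand-in for a scalar value. */
  static final class ScalarValue implements Value {
    private final Object raw;
    ScalarValue(Object raw) { this.raw = raw; }
    @Override public Object value() { return raw; }
  }

  /** Hypothetical stand-in for a tuple value whose value() unwraps every field. */
  static final class TupleValue implements Value {
    private final Map<String, Value> fields;
    TupleValue(Map<String, Value> fields) { this.fields = fields; }
    Map<String, Value> tupleValue() { return fields; }
    @Override public Object value() {
      Map<String, Object> result = new LinkedHashMap<>();
      fields.forEach((k, v) -> result.put(k, v.value()));
      return result;
    }
  }

  /** Original approach: copy each field into a new map by hand. */
  static Map<String, Object> manualCopy(TupleValue row) {
    Map<String, Object> items = new HashMap<>();
    row.tupleValue().forEach((key, value) -> items.put(key, value.value()));
    return items;
  }

  /** Suggested simplification: reuse the tuple's own value(). */
  @SuppressWarnings("unchecked")
  static Map<String, Object> viaValue(TupleValue row) {
    return (Map<String, Object>) row.value();
  }

  public static void main(String[] args) {
    Map<String, Value> fields = new LinkedHashMap<>();
    fields.put("x", new ScalarValue(1.5));
    fields.put("y", new ScalarValue(2.0));
    TupleValue row = new TupleValue(fields);
    List<Map<String, Object>> inputData = new ArrayList<>();
    inputData.add(viaValue(row));
    System.out.println(manualCopy(row).equals(inputData.get(0))); // prints "true"
  }
}
```

The cast is unchecked, so it is only safe when every row reaching this point is known to be a tuple.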
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/MLCommonsOperator.java
     * Get ml-commons client.
     * @return ml-commons client
     */
    MachineLearningClient mlCommonsClient();
I don't think we should add a new mlCommonsClient interface, because OpenSearchClient is not designed as a factory class.
Instead, could you add a new NodeClient getNodeClient() method to the interface?
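A minimal sketch of the design being suggested, assuming the generic client only exposes the node client and the ML code builds its own client from it. NodeClient, OpenSearchClient, MlClient, and buildMlClient here are hypothetical stand-ins for illustration, not the real OpenSearch or ml-commons classes.

```java
class NodeClientAccessorSketch {

  /** Hypothetical stand-in for the OpenSearch node-level client. */
  static final class NodeClient {
    final String nodeName;
    NodeClient(String nodeName) { this.nodeName = nodeName; }
  }

  /** Hypothetical slice of the plugin's client interface: an accessor, not a factory. */
  interface OpenSearchClient {
    NodeClient getNodeClient();
  }

  /** Hypothetical ML client that its consumer constructs from the node client. */
  static final class MlClient {
    private final NodeClient nodeClient;
    MlClient(NodeClient nodeClient) { this.nodeClient = nodeClient; }
    String describe() { return "ml client on " + nodeClient.nodeName; }
  }

  /** The ML operator, not the generic client, decides which ML client to build. */
  static MlClient buildMlClient(OpenSearchClient client) {
    return new MlClient(client.getNodeClient());
  }

  public static void main(String[] args) {
    NodeClient node = new NodeClient("node-1");
    OpenSearchClient client = () -> node;
    System.out.println(buildMlClient(client).describe()); // prints "ml client on node-1"
  }
}
```

Keeping the interface as a plain accessor means adding another downstream client later does not require another method on OpenSearchClient.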
     */
    @RequiredArgsConstructor
    @EqualsAndHashCode(callSuper = false)
    public class MLCommonsOperator extends PhysicalPlan {
Wrap MLCommonsOperator with OpenSearchExecutionProtector.
Why does MLCommonsOperator need to be wrapped by OpenSearchExecutionProtector? Extending both classes would create a nested structure.
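One reading of "wrap" that avoids inheritance entirely: the protector acts as a visitor that rebuilds each operator with protected children and then decorates the result, so the operator itself extends nothing new. The sketch below assumes that visitor-based design; Plan, Visitor, Scan, MlCommonsOperator, ResourceMonitor, and Protector are hypothetical stand-ins, not the plugin's actual classes.

```java
class ProtectorSketch {

  /** Hypothetical minimal physical plan with a visitor hook. */
  interface Plan {
    <R> R accept(Visitor<R> visitor);
    String describe();
  }

  interface Visitor<R> {
    R visitScan(Scan node);
    R visitMlCommons(MlCommonsOperator node);
  }

  static final class Scan implements Plan {
    public <R> R accept(Visitor<R> v) { return v.visitScan(this); }
    public String describe() { return "Scan"; }
  }

  /** Stand-in operator: it only holds its input and does not extend any protector class. */
  static final class MlCommonsOperator implements Plan {
    final Plan input;
    final String algorithm;
    MlCommonsOperator(Plan input, String algorithm) { this.input = input; this.algorithm = algorithm; }
    public <R> R accept(Visitor<R> v) { return v.visitMlCommons(this); }
    public String describe() { return "MlCommons(" + algorithm + ", " + input.describe() + ")"; }
  }

  /** Stand-in for a resource-monitoring decorator around any plan. */
  static final class ResourceMonitor implements Plan {
    final Plan delegate;
    ResourceMonitor(Plan delegate) { this.delegate = delegate; }
    public <R> R accept(Visitor<R> v) { return delegate.accept(v); }
    public String describe() { return "Monitor(" + delegate.describe() + ")"; }
  }

  /** Protector as a visitor: protects the children, rebuilds the node, then decorates it. */
  static final class Protector implements Visitor<Plan> {
    public Plan visitScan(Scan node) { return new ResourceMonitor(node); }
    public Plan visitMlCommons(MlCommonsOperator node) {
      Plan protectedInput = node.input.accept(this);
      return new ResourceMonitor(new MlCommonsOperator(protectedInput, node.algorithm));
    }
  }

  public static void main(String[] args) {
    Plan plan = new MlCommonsOperator(new Scan(), "kmeans");
    System.out.println(plan.accept(new Protector()).describe());
    // prints "Monitor(MlCommons(kmeans, Monitor(Scan)))"
  }
}
```

With this shape the nesting lives in the plan tree (decorator around operator around decorated child) rather than in a class hierarchy.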
Signed-off-by: jackieyanghan <[email protected]>
    return new HashMap<String, Literal>() {{
      put("shingle_size", (ctx.shingle_size != null)
          ? getArgumentValue(ctx.shingle_size)
          : new Literal(8, DataType.INTEGER));
We already have these default values in MLCommons. How about we just set these values to null in PPL, meaning we let MLCommons decide the default values? If both PPL and MLCommons set defaults, they may conflict in the future.
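A minimal sketch of the proposal, assuming the PPL side records only what the user typed (leaving absent options as null) and a single downstream component owns the default. The method names are hypothetical, and the value 8 is borrowed from the snippet above purely for illustration; in practice the ML side's own default would apply.

```java
import java.util.HashMap;
import java.util.Map;

class DefaultDelegationSketch {

  /** PPL side: record only what the user typed; an absent option stays null. */
  static Map<String, Integer> parseOptions(Integer shingleSize) {
    Map<String, Integer> options = new HashMap<>();
    options.put("shingle_size", shingleSize); // HashMap permits null values
    return options;
  }

  /** Hypothetical ML side: the single place where the default lives. */
  static final int DEFAULT_SHINGLE_SIZE = 8;

  static int resolveShingleSize(Map<String, Integer> options) {
    Integer value = options.get("shingle_size");
    return value != null ? value : DEFAULT_SHINGLE_SIZE;
  }

  public static void main(String[] args) {
    System.out.println(resolveShingleSize(parseOptions(null))); // prints 8
    System.out.println(resolveShingleSize(parseOptions(4)));    // prints 4
  }
}
```

Because the default is defined in exactly one place, changing it on the ML side cannot silently disagree with a stale copy in the PPL grammar code.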
    //STRING_LITERAL: DQUOTA_STRING | SQUOTA_STRING | BQUOTA_STRING;
    ID: ID_LITERAL;
    INTEGER_LITERAL: DEC_DIGIT+;
    DOUBLE_LITERAL: (DEC_DIGIT+)? '.' DEC_DIGIT+;
DOUBLE_LITERAL seems to be the same as DECIMAL_LITERAL. Do we really need it?
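To see exactly what the DOUBLE_LITERAL rule accepts, it can be translated into an equivalent regular expression, assuming DEC_DIGIT is [0-9]. If, as the review suggests, DECIMAL_LITERAL has the same body, both rules match exactly the same strings; ANTLR then resolves equal-length matches in favor of the rule defined first, so one of the two tokens would never be emitted. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class DoubleLiteralSketch {

  // Assumes DEC_DIGIT is [0-9]; (DEC_DIGIT+)? '.' DEC_DIGIT+ becomes [0-9]*\.[0-9]+
  static final Pattern DOUBLE_LITERAL = Pattern.compile("[0-9]*\\.[0-9]+");

  /** True if the whole string would lex as a DOUBLE_LITERAL token. */
  static boolean isDoubleLiteral(String text) {
    return DOUBLE_LITERAL.matcher(text).matches();
  }

  public static void main(String[] args) {
    for (String s : new String[] {"3.14", ".5", "3.", "42"}) {
      System.out.println(s + " -> " + isDoubleLiteral(s));
    }
  }
}
```

Running it prints true for "3.14" and ".5", and false for "3." (no digit after the dot) and "42" (no dot, which lexes as INTEGER_LITERAL instead).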
        .build();
      }

      private Map<String, ExprValue> convertRowIntoExprValue(ColumnMeta[] columnMetas, Row row) {
Minor: this duplicates code in MLCommonsOperator; this part could be refactored.
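One way to remove the duplication is a single static helper that both operators call. The sketch below assumes a row is converted by pairing each column's name with the value at the same index; ColumnMeta, Row, and RowConverter are hypothetical stand-ins, and the real method returns ExprValue entries rather than the plain Objects used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RowConversionSketch {

  /** Hypothetical stand-in for a data-frame column descriptor. */
  static final class ColumnMeta {
    final String name;
    ColumnMeta(String name) { this.name = name; }
  }

  /** Hypothetical stand-in for a data-frame row. */
  static final class Row {
    final Object[] values;
    Row(Object... values) { this.values = values; }
  }

  /** Shared helper that both operators could call instead of keeping private copies. */
  static final class RowConverter {
    private RowConverter() {}

    static Map<String, Object> convert(ColumnMeta[] columnMetas, Row row) {
      if (columnMetas.length != row.values.length) {
        throw new IllegalArgumentException("column count does not match row width");
      }
      // LinkedHashMap keeps the column order of the data frame.
      Map<String, Object> result = new LinkedHashMap<>();
      for (int i = 0; i < columnMetas.length; i++) {
        result.put(columnMetas[i].name, row.values[i]);
      }
      return result;
    }
  }

  public static void main(String[] args) {
    ColumnMeta[] cols = {new ColumnMeta("x"), new ColumnMeta("y")};
    System.out.println(RowConverter.convert(cols, new Row(1.5, 0))); // prints {x=1.5, y=0}
  }
}
```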
Signed-off-by: jackieyanghan <[email protected]>
opensearch/build.gradle
    compile group: 'org.json', name: 'json', version:'20180813'
    compileOnly group: 'org.opensearch.client', name: 'opensearch-rest-high-level-client', version: "${opensearch_version}"
    compile group: 'org.opensearch.ml', name:'opensearch-ml-client', version: '1.3.0.0'
    compile group: 'org.opensearch', name: 'opensearch', version: "1.3.0-SNAPSHOT"
Is this required?
Signed-off-by: jackieyanghan <[email protected]>
ylwu-amzn left a comment
LGTM
penghuo left a comment
Thanks for the change!
Signed-off-by: Jackie Han [email protected]
Description
[Describe what this change achieves]
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.