Skip to content

Conversation

@aalva500-prog
Copy link
Contributor

@aalva500-prog aalva500-prog commented Jan 10, 2026

Description

Implement a convert command in PPL for numeric and memory data type conversions. This command provides a streamlined interface for converting field values to numeric types, reducing the need for multiple eval statements and improving query readability.

Key Features:

  • Numeric Conversions: auto(), num(), rmcomma(), rmunit() , memk() functions
  • Field Preservation: AS clause support for creating new fields while preserving originals
  • Multiple Conversions: Support for multiple conversion functions in a single command

Syntax:

... | convert <convert-function>(<field>) [AS <field>] [<convert-function>(<field>) [AS <field>]]...

Example Usage:

source=accounts | convert auto(balance)
source=data | convert rmcomma(price) AS clean_price
source=metrics | convert rmunit(duration) num(count)
source=processor | convert memk(virtual_memory)

Future Enhancements:
This initial implementation focuses on numeric and memory conversions. Time conversions (ctime(), dur2sec(), etc.) and wildcard support will be implemented in subsequent PRs.

This PR partially addresses the current limitation where users must chain multiple evaluation functions or write complex expressions for common conversion operations.

Related Issues

Resolves partially (only numeric and memory conversions in this PR) #5001

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 10, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added PPL Convert command to apply one-or-more conversion functions to fields.
  • Functions

    • New conversion framework and built-in conversion functions: auto, num, rmcomma, rmunit, memk, none (including a memory-aware AUTO).
  • Language

    • Lexer/parser updated to recognize the convert command and TIMEFORMAT token; AST and planner handle convert semantics.
  • Documentation

    • Comprehensive convert command docs with syntax, examples, behavior, and limitations.
  • Testing

    • Extensive unit, integration, and explain tests plus anonymization coverage for convert.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds a new PPL convert command: lexer/parser and AST node, AST builder support, analyzer and field-resolution handling, Calcite planner integration to emit conversion projections, a shared BaseConversionUDF with multiple concrete conversion functions, function registration, docs, and extensive tests.

Changes

Cohort / File(s) Summary
Lexer & Parser
ppl/src/main/antlr/OpenSearchPPLLexer.g4, ppl/src/main/antlr/OpenSearchPPLParser.g4
New tokens CONVERT, TIMEFORMAT and grammar rules convertCommand / convertFunction; integrated into commands/commandName.
AST & Visitor
core/src/main/java/org/opensearch/sql/ast/tree/Convert.java, core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java, ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
New Convert AST node, visitor hook visitConvert, and AstBuilder logic to construct Convert with Let conversions and target resolution.
Analysis / Field Resolution / Analyzer
core/src/main/java/org/opensearch/sql/analysis/Analyzer.java, core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java
Analyzer visit for Convert and FieldResolutionVisitor handling to derive input/output fields and push/pop requirements.
Calcite Planning
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
New visitConvert and helpers: process conversions (field-copy vs function), enforce single-source-per-conversion, and build conversion projection with replacements and aliases.
UDF Base & Implementations
core/src/main/java/org/opensearch/sql/expression/function/udf/BaseConversionUDF.java, .../AutoConvertFunction.java, .../NumConvertFunction.java, .../RmcommaConvertFunction.java, .../RmunitConvertFunction.java, .../MemkConvertFunction.java
New abstract BaseConversionUDF with Calcite implementor and parsing helpers; concrete conversion UDFs implementing conversion logic and static convert entry points.
Function Names & Registration
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java, .../PPLBuiltinOperators.java, .../PPLFuncImpTable.java
New builtin function names and PPL operator registrations for convert functions (AUTO, NUM, MEMK, RMCOMMA, RMUNIT, NONE, etc.) and wiring into function implementation table.
PPL Utilities & Anonymizer
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
Anonymizer support for Convert node to mask function names/arguments in query strings.
Docs & Index
docs/user/ppl/cmd/convert.md, docs/user/ppl/index.md
Added detailed convert command documentation, examples, and index entry.
Tests & Resources
ppl/src/test/..., core/src/test/..., integ-test/src/test/..., integ-test/src/test/resources/expectedOutput/**
Extensive unit and integration tests for conversion functions and PPL Convert (Calcite-enabled and disabled), explain YAMLs, cross-cluster tests, anonymizer tests; test suite registrations updated.

Sequence Diagram(s)

sequenceDiagram
    participant Parser
    participant AstBuilder
    participant Analyzer
    participant FieldResolver
    participant Calcite
    participant UDFs
    participant Store

    Parser->>AstBuilder: parse "CONVERT ..." -> Convert node
    AstBuilder->>AstBuilder: build Let conversions (function, source field, target/alias)
    AstBuilder-->>Analyzer: emit Convert AST
    Analyzer->>FieldResolver: visitConvert(Convert)
    FieldResolver->>FieldResolver: compute inputFields & outputFields
    Analyzer->>Calcite: visitConvert(Convert)
    Calcite->>Calcite: iterate conversions
    alt field-copy conversion
        Calcite->>Calcite: record replacement or alias mapping
    else function conversion
        Calcite->>UDFs: reference conversion UDF (AUTO/NUM/...)
        UDFs-->>Calcite: provide Rex expression / conversion logic
    end
    Calcite->>Calcite: build projection applying replacements + appended converted aliases
    Calcite-->>Store: execute planned query
    Store-->>Calcite: return rows
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

enhancement, PPL, calcite, v3.5.0

Suggested reviewers

  • penghuo
  • LantaoJin
  • dai-chen
  • anirudha
  • kavithacm
  • joshuali925
  • GumpacG
  • Yury-Fridlyand
  • MaxKsyunz
  • ps48
  • vamsimanohar
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.74% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Adding PPL convert command with numeric and memory conversion functions' clearly and specifically summarizes the main change: introducing a new PPL convert command with specific conversion capabilities.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the convert command's features, syntax, examples, and future enhancements relevant to the implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Comment @coderabbitai help to get the list of available commands and usage tips.

@aalva500-prog aalva500-prog force-pushed the feature/ppl-convert-command branch 2 times, most recently from ea04424 to c283b39 Compare January 12, 2026 20:03
- Added convert command syntax and AST nodes (Convert, ConvertFunction)
- Implemented 5 conversion functions: auto, num, rmcomma, rmunit, none
- Full Calcite pushdown support via CalciteRelNodeVisitor
- Logical and physical operators (LogicalConvert, ConvertOperator)
- Comprehensive test coverage (26 tests total):
  * 13 unit tests (CalcitePPLConvertTest)
  * 8 integration tests with pushdown (ConvertCommandIT)
  * 8 non-pushdown tests (CalciteConvertCommandIT)
  * 3 explain tests (ExplainIT)
  * 2 cross-cluster tests (CrossClusterSearchIT)
  * Anonymizer and v2 unsupported tests
- User documentation (docs/user/ppl/cmd/convert.md)
- Code cleanup: removed dead code, simplified javadocs
- Version: 3.5 (experimental)

All tests compile and pass successfully.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Centralized common conversion function logic into BaseConversionUDF base class,
reducing code from 219 to 162 lines (26% reduction) while maintaining all functionality.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
@aalva500-prog aalva500-prog force-pushed the feature/ppl-convert-command branch from c379ad2 to 27e4bb9 Compare January 14, 2026 07:18
dai-chen
dai-chen previously approved these changes Jan 26, 2026
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteConvertCommandIT.java`:
- Around line 20-21: Update CalciteConvertCommandIT to add
NULL/empty/boundary/multi-doc coverage for the new convert functions: add test
cases that exercise rmcomma, rmunit, memk and other convert functions with (1)
NULL/missing fields, (2) empty and whitespace-only strings, (3) very large
numeric values (boundary/overflow candidates), (4) invalid units specifically
for memk (e.g., unsupported unit strings), and (5) multi-document scenarios
where some docs contain valid values and others contain null/empty/invalid
values to ensure aggregate/row behavior; implement these as new test methods in
the CalciteConvertCommandIT class using the existing PPLIntegTestCase helpers
and name them to reflect the case (e.g., testRmcommaNullEmptyBoundary,
testRmunitMultiDoc, testMemkInvalidUnitAndBoundary) so they run with the
existing test suite.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java`:
- Around line 138-159: Add unit tests in ConversionFunctionsTest to cover null,
empty, and whitespace-only inputs for RmunitConvertFunction: call
RmunitConvertFunction.convert(null), RmunitConvertFunction.convert(""), and
RmunitConvertFunction.convert("   ") and assert they return null. Place these
assertions alongside the existing rmunit tests (e.g., in testRmunitConvert or a
new test method) so the test suite verifies the function correctly handles null
and empty-string edge cases.
- Around line 81-111: Add unit tests for null and empty-string inputs to the
NumConvertFunction coverage: in ConversionFunctionsTest add assertions that
NumConvertFunction.convert(null) returns null and that
NumConvertFunction.convert("") (and optionally strings with only whitespace like
"   ") also return null; implement these as new `@Test` cases or extend an
existing test (e.g., testNumConvertWithUnits) to call NumConvertFunction.convert
and assertNull on the results so the behavior matches AutoConvertFunction and
RmcommaConvertFunction tests.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteConvertCommandIT.java`:
- Around line 63-73: The test method testConvertWithAlias in
CalciteConvertCommandIT currently only selects balance_num and thus doesn't
verify that convert auto(balance) AS balance_num preserves the original balance
field; update the query string to include both the original field and the
aliased field in the fields clause (e.g., fields balance, balance_num), then
adjust verifySchema to assert both columns (balance with its original type and
balance_num as double) and update verifyDataRows to include corresponding paired
values for each row so the test asserts preservation of the original balance
alongside the converted balance_num.

…tCommand

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
@aalva500-prog aalva500-prog force-pushed the feature/ppl-convert-command branch from 5452d31 to f7d2f1b Compare January 27, 2026 03:13
aalva500-prog and others added 4 commits January 26, 2026 19:13
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Add testNumConvertNullAndEmpty() to ensure NumConvertFunction.convert()
properly returns null for null, empty string, and whitespace-only inputs,
matching the coverage pattern of other conversion functions.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Add testRmunitConvertNullAndEmpty() to ensure RmunitConvertFunction.convert()
properly returns null for null, empty string, and whitespace-only inputs,
completing the test coverage pattern across all conversion functions.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Update testConvertWithAlias to verify that using AS clause preserves
the original field alongside the converted field. Test now checks both
balance (bigint) and balance_num (double) are present in results.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java`:
- Around line 205-214: The test testConvertCommand only calls verifyQuery when
executeQuery throws ResponseException, so when executeQuery succeeds the result
is never asserted; update the testConvertCommand to capture the successful
response as a JSONObject and call verifyQuery in the success path as well: call
executeQuery(String.format("search source=%s | convert auto(balance)",
TEST_INDEX_BANK)) and convert its response body to a JSONObject (same as done
for TestUtils.getResponseBody(e.getResponse()) in the catch), assign it to
result, and invoke verifyQuery(result) in both the try and catch branches so
both success and error paths are validated; use the same variable names result,
executeQuery, ResponseException, and verifyQuery to locate the change.
- Around line 205-214: Extend NewAddedCommandsIT (e.g., testConvertCommand) to
add focused integration tests that assert successful per-document conversions
and edge cases when Calcite is enabled: add new `@Test` cases that call
executeQuery and on success parse the JSON response (instead of only catching
ResponseException) to verify converted values (correct types/values), handle
null/empty source fields, run multi-document queries to ensure per-document
behavior, test boundary values (large/small/precision cases) and invalid inputs
that should return errors; reference existing helpers like executeQuery,
verifyQuery, TEST_INDEX_BANK and ConvertCommandIT to structure tests and reuse
setup/teardown so each scenario validates actual conversion correctness rather
than only the Calcite-disabled error path.

In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 531-533: The convertCommand grammar currently enforces mandatory
commas between convertFunction entries; update the rule for convertCommand to
allow optional commas so space-separated function lists are accepted by changing
the repetition from (COMMA convertFunction)* to (COMMA? convertFunction)*;
locate the convertCommand rule and modify that repetition (keeping the existing
TIMEFORMAT EQUAL timeformatValue = stringLiteral optional part and
convertFunction references intact) to match the same COMMA? pattern used by
wcFieldList for consistency.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 1204-1217: handleNoneConversion currently ignores extra arguments;
add a validation that fieldArgs has exactly one element and fail fast if not. In
handleNoneConversion(List<UnresolvedExpression> fieldArgs, Field targetField)
check if (fieldArgs.size() != 1) and throw an appropriate exception (e.g.,
IllegalArgumentException or a project-specific parse/semantic exception) with a
message like "none() requires exactly one field argument"; keep the subsequent
logic (source/target name check and returning new Let(targetField,
fieldArgs.get(0))) unchanged.
🧹 Nitpick comments (1)
ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)

535-537: Consider using fieldExpression instead of fieldList for single-field semantics.

The convertFunction rule uses fieldList which allows multiple fields, but all current usage patterns show single-field conversions: auto(balance), rmcomma(price), memk(virtual_memory). While the implementation correctly handles multiple fields, restricting to fieldExpression at the grammar level would enforce single-field constraint and provide clearer intent. Compare with takeAggFunction (line 815) which explicitly accepts fieldExpression for its field argument.

♻️ Suggested change if single-field is intended
 convertFunction
-   : functionName = ident LT_PRTHS fieldList RT_PRTHS (AS alias = fieldExpression)?
+   : functionName = ident LT_PRTHS fieldExpression RT_PRTHS (AS alias = fieldExpression)?
    ;

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@core/src/test/java/org/opensearch/sql/expression/function/udf/ConversionFunctionsTest.java`:
- Around line 217-260: Add explicit min/max boundary tests in
ConversionFunctionsTest covering zero and numeric extremes for each UDF
(NumConvertFunction, AutoConvertFunction, RmcommaConvertFunction,
RmunitConvertFunction): add assertions for "0", "+0", "-0" and for extreme
values such as String.valueOf(Double.MAX_VALUE), "-" + Double.MAX_VALUE,
String.valueOf(Double.MIN_VALUE), and an overflow string like "1e309" (or
similarly huge value) and assert the actual behavior of the implementation
(e.g., assertEquals(Double.MAX_VALUE, ...), assertEquals(0.0, ...), or
assertNull/assertEquals(Double.POSITIVE_INFINITY, ...) to document
overflow/precision behavior) so tests explicitly capture min/max/overflow
semantics for each function.

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
aaarone90 and others added 4 commits January 27, 2026 14:18
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java`:
- Around line 638-658: Add JavaDoc to the public override method visitConvert in
FieldResolutionVisitor: document the purpose (resolving field requirements
across Convert nodes), include `@param` for the Convert node parameter and the
FieldResolutionContext parameter, an `@return` describing that the same Node is
returned, and an `@throws` if the method can propagate any runtime exceptions (or
explicitly state none are thrown). Mention key behavior: that it computes
inputFields/outputFields, adjusts upstream FieldResolutionResult (using
FieldResolutionResult) on the context, pushes and pops requirements, and visits
children. Ensure the JavaDoc follows project style and is placed directly above
the visitConvert method signature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants