
Conversation

@kylehounslow (Contributor) commented Dec 6, 2025

Description

This PR converts all PPL documentation under docs/user/ppl/ from reStructuredText to Markdown format, enabling automated export to the main OpenSearch documentation website at opensearch-project/documentation-website.

Important Note: All existing doctest coverage has been migrated successfully and all existing GitHub-based documentation remains intact and fully functional. This change enables PPL documentation to appear on the main OpenSearch docs site while preserving the existing GitHub-based documentation experience. See demo below.

Live Demo:

Why?

  • Enables automatic export of PPL docs to the main OpenSearch documentation site. Currently, changes are made via manual copy/paste, resulting in stale, inconsistent docs.
  • Improves discoverability of new/existing PPL commands and functionality.
  • Improves developer experience with clean, copyable PPL code snippets.

Related Issues

Summary of Changes

Documentation Format Migration

  • Converted 70+ RST files to Markdown across all PPL documentation sections
  • Updated docs/category.json to reflect new file structure
  • Removed shell prefixes and output from code blocks for clean copy-paste
  • GitHub documentation experience unchanged - same content, same navigation, rendered by GitHub's native Markdown support

Doctest Changes

  • Added markdown_parser.py to support doctest execution on Markdown code blocks (a minimal parsing sketch follows this list)
  • Extended existing doctest framework to handle both RST and Markdown formats
  • All existing tests pass with new parser
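
For reference, here is a minimal sketch of how a parser of this kind might pair fenced `ppl` blocks with the expected-output block that follows them. The function name, pairing rules, and sample text below are illustrative assumptions, not the actual markdown_parser.py implementation:

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # fenced code block delimiter

# Match a fenced block: optional language tag, then the body up to the closing fence.
FENCE_RE = re.compile(FENCE + r"(\w+)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_doctest_pairs(markdown: str) -> List[Tuple[str, str]]:
    """Pair each ppl input block with the text/output block that follows it."""
    blocks = [(lang or "", body.strip()) for lang, body in FENCE_RE.findall(markdown)]
    pairs = []
    for i, (lang, body) in enumerate(blocks):
        # Treat a ppl block followed by a text/output block as one doctest case.
        if lang == "ppl" and i + 1 < len(blocks) and blocks[i + 1][0] in ("text", "output"):
            pairs.append((body, blocks[i + 1][1]))
    return pairs


if __name__ == "__main__":
    sample = (
        f"{FENCE}ppl\nsource=accounts | fields firstname\n{FENCE}\n"
        "Expected output:\n"
        f"{FENCE}text\n| firstname |\n{FENCE}\n"
    )
    for query, expected in extract_doctest_pairs(sample):
        print(query, "->", expected)
```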

Export Tooling

  • export_to_docs_website.py - Jekyll-compatible export that injects the proper front matter, etc., while preserving the exact docs structure from docs/user/ppl (a minimal front-matter sketch follows this list).
  • Conversion scripts (run only once; kept for reference/reuse on the remaining SQL docs):
    • convert_rst_to_md.py - Automated RST to Markdown conversion
    • fix_markdown_formatting.py - Post-conversion cleanup and standardization to ensure proper Jekyll rendering
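
For illustration, here is a minimal sketch of the kind of Jekyll front-matter injection the export step can perform. The front-matter keys (layout, title, nav_order) and the directory path are assumptions for demonstration only; see export_to_docs_website.py in this PR for the actual logic:

```python
from pathlib import Path


def add_front_matter(md_path: Path, nav_order: int) -> str:
    """Prepend Jekyll front matter to a Markdown page, leaving the body unchanged."""
    body = md_path.read_text(encoding="utf-8")
    if body.startswith("---"):
        return body  # page already carries front matter; leave it alone
    # Derive the title from the first heading, falling back to the file name.
    first_line = body.splitlines()[0] if body else ""
    title = first_line.lstrip("# ").strip() or md_path.stem
    front_matter = (
        "---\n"
        "layout: default\n"
        f"title: {title}\n"
        f"nav_order: {nav_order}\n"
        "---\n\n"
    )
    return front_matter + body


if __name__ == "__main__":
    cmd_dir = Path("docs/user/ppl/cmd")
    if cmd_dir.is_dir():
        for order, page in enumerate(sorted(cmd_dir.glob("*.md")), start=1):
            print(page.name, "->", add_front_matter(page, nav_order=order).splitlines()[1:4])
```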

Future PR

  • Migrate the remaining docs/user/sql to Markdown.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • [n/a] New functionality has javadoc added.
  • [n/a] New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • [n/a] API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

coderabbitai bot (Contributor) commented Dec 6, 2025

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (1)
  • prometheus/src/test/java/org/opensearch/sql/prometheus/storage/PrometheusStorageFactoryTest.java

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Summary by CodeRabbit

  • Documentation
    • Migrated PPL reference documentation from reStructuredText to Markdown format for improved readability.
    • Added documentation for Prometheus, S3Glue, and Security Lake connectors with configuration and usage examples.
    • Added comprehensive guides for datasource management, security settings, monitoring, and cross-cluster search.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Migrate a large set of PPL documentation from reStructuredText (.rst) to Markdown (.md), adding new Markdown docs for admin, connectors, commands, and functions while removing the corresponding .rst files and updating developer/doctest guidance and docs/category.json.

Changes

Cohort / File(s) Summary
Admin docs
docs/user/ppl/admin/datasources.md, docs/user/ppl/admin/settings.md, docs/user/ppl/admin/security.md, docs/user/ppl/admin/monitoring.md, docs/user/ppl/admin/cross_cluster_search.md
Added Markdown admin guides for datasources, settings, security, monitoring, and cross-cluster search; removed corresponding .rst files.
Connector docs
docs/user/ppl/admin/connectors/prometheus_connector.md, docs/user/ppl/admin/connectors/s3glue_connector.md, docs/user/ppl/admin/connectors/security_lake_connector.md
Added Markdown connector guides (Prometheus, S3Glue, Security Lake); deleted their .rst counterparts.
Command docs (bulk)
Commands: docs/user/ppl/cmd/*.md (e.g., ad.md, append.md, appendcol.md, appendpipe.md, bin.md, chart.md, dedup.md, describe.md, eval.md, eventstats.md, expand.md, explain.md, fields.md, fillnull.md, flatten.md, grok.md, head.md, join.md, kmeans.md, lookup.md, ml.md, multisearch.md, parse.md, patterns.md, rare.md, regex.md, rename.md, replace.md, reverse.md, rex.md, search.md, showdatasources.md, sort.md, spath.md, stats.md, streamstats.md, subquery.md, syntax.md, table.md, timechart.md, top.md, trendline.md, where.md)
Added Markdown versions for 50+ PPL command docs with syntax, examples, and limitations; removed corresponding .rst pages.
Function docs
docs/user/ppl/functions/aggregations.md, docs/user/ppl/functions/collection.md
Added Markdown references for aggregation and collection functions; removed .rst equivalents.
Developer & tooling
docs/category.json, DEVELOPER_GUIDE.rst, docs/dev/testing-doctest.md, docs/dev/ppl-commands.md, README.md, docs/dev/intro-v3-engine.md
Updated category config and README links to point to .md pages; updated developer guide/doctest instructions to include Markdown format and added MD doctest guidance; small docs updates.
Removals (paired with above adds)
docs/user/ppl/admin/*.rst, docs/user/ppl/admin/connectors/*.rst, docs/user/ppl/cmd/*.rst, docs/user/ppl/functions/*.rst, docs/user/ppl/*/*.rst
Removed the original reStructuredText files across admin, connectors, commands, and functions (many files deleted).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Large number of files but highly homogeneous (format conversion); review focuses on content fidelity.
  • Files/areas to spot-check:
    • docs/category.json for correct path entries and ordering
    • Representative command pages with complex examples (e.g., search.md, stats.md, join.md, subquery.md)
    • Connector docs with config snippets (e.g., prometheus_connector.md, s3glue_connector.md)
    • docs/dev/testing-doctest.md and DEVELOPER_GUIDE.rst for doctest instructions and tooling changes
    • Cross-links and README updates for broken links

Suggested labels

PPL, enhancement

Suggested reviewers

  • penghuo
  • ps48
  • kavithacm
  • yuancu
  • joshuali925
  • GumpacG
  • forestmvey
  • noCharger
  • anirudha
  • Yury-Fridlyand

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title 'Migrate PPL Documentation from RST to Markdown' accurately and concisely summarizes the primary change—converting PPL documentation format from RST to Markdown.
  • Description check (✅ Passed): The description comprehensively explains the rationale, changes, and impact of converting PPL documentation from RST to Markdown, directly corresponding to the changeset.

Comment @coderabbitai help to get the list of available commands and usage tips.

@kylehounslow kylehounslow force-pushed the feat/markdown-doctests branch from f268760 to 2384f48 Compare December 8, 2025 19:11
@kylehounslow kylehounslow changed the title [DRAFT] Migrate PPL Documentation from RST to Markdown Migrate PPL Documentation from RST to Markdown Dec 8, 2025
@kylehounslow kylehounslow marked this pull request as ready for review December 8, 2025 19:11
@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 4

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (24)
docs/user/ppl/admin/security.md-5-5 (1)

5-5: Fix minor typos and grammar issues.

  • Line 5: "User needs" should be "Users need" (or rephrase as "A user needs")
  • Line 9: "test_user. then test_user" → capitalize as "test_user. Then test_user"
  • Line 10: "grand permission" → should be "grant permission"

Apply this diff:

-User needs `cluster:admin/opensearch/ppl` permission to use PPL plugin. User also needs indices level permission `indices:admin/mappings/get` to get field mappings, `indices:monitor/settings/get` to get cluster settings, and `indices:data/read/search*` to search index.
+Users need `cluster:admin/opensearch/ppl` permission to use PPL plugin. They also need indices level permission `indices:admin/mappings/get` to get field mappings, `indices:monitor/settings/get` to get cluster settings, and `indices:data/read/search*` to search index.
-Example: Create the ppl_role for test_user. then test_user could use PPL to query `ppl-security-demo` index.
+Example: Create the ppl_role for test_user. Then test_user could use PPL to query `ppl-security-demo` index.
-1. Create the ppl_role and grand permission to access PPL plugin and access ppl-security-demo index
+1. Create the ppl_role and grant permission to access PPL plugin and access ppl-security-demo index

Also applies to: 9-10

docs/user/ppl/cmd/dedup.md-10-10 (1)

10-10: Fix grammatical error in parameter description.

Line 10 has "if the any field" which should be "if any field".

Apply this diff:

-* keepempty: optional. If set to true, keep the document if the any field in the field-list has NULL value or field is MISSING. **Default:** false.  
+* keepempty: optional. If set to true, keep the document if any field in the field-list has NULL value or field is MISSING. **Default:** false.  
docs/user/ppl/cmd/describe.md-8-8 (1)

8-8: Fix markdown linting error: reference-style link syntax in syntax notation.

The square brackets around optional parameters are interpreted as Markdown reference-style links. Since [schema.] lacks a reference definition, the linter flags this as an error. Update the syntax notation to avoid this ambiguity.

Apply this diff to fix the syntax notation:

-describe [dataSource.][schema.]\<tablename\>
+describe [dataSource.][schema.]<tablename>

Alternatively, if you prefer to emphasize the optional nature of the parameters more clearly, consider using code formatting:

-describe [dataSource.][schema.]\<tablename\>
+`describe [dataSource.][schema.]<tablename>`

Also note that the backslashes before angle brackets (\< and \>) are unnecessary in Markdown and can be removed.

docs/user/ppl/cmd/timechart.md-170-174 (1)

170-174: Fix Example 5 description—text contradicts query.

Line 170 states "counts events for each second" but line 174 uses span=1h (1 hour). This inconsistency creates confusion about what the example demonstrates.

Apply this diff to correct the description:

-This example counts events for each second and groups them by category
+This example counts events for each hour and groups them by category
docs/user/ppl/cmd/timechart.md-8-8 (1)

8-8: Remove unnecessary backslash escaping in syntax line.

Line 8 contains escaped angle brackets (\<field_name\>) that are not needed in Markdown and reduce readability. Standard angle brackets should be used directly in regular text.

Apply this diff to clean up the syntax line:

-timechart [timefield=\<field_name\>] [span=\<time_interval\>] [limit=\<number\>] [useother=\<boolean\>] \<aggregation_function\> [by \<field\>]
+timechart [timefield=<field_name>] [span=<time_interval>] [limit=<number>] [useother=<boolean>] <aggregation_function> [by <field>]
docs/user/ppl/cmd/flatten.md-10-10 (1)

10-10: Remove escaped angle brackets—these are RST syntax artifacts.

Lines 10 and 12 contain backslash-escaped angle brackets (\<field\>, \<alias-list\>) that are relics from the RST-to-Markdown conversion. In Markdown, these backslashes render literally, producing \<field\> instead of <field>. Remove the backslashes to properly display placeholders.

Apply this diff to fix the escaped angle brackets:

-flatten \<field\> [as (\<alias-list\>)]
+flatten <field> [as (<alias-list>)]
 * field: mandatory. The field to be flattened. Only object and nested fields are supported.  
-* alias-list: optional. The names to use instead of the original key names. Names are separated by commas. It is advised to put the alias-list in parentheses if there is more than one alias. The length must match the number of keys in the struct field. The provided alias names **must** follow the lexicographical order of the corresponding original keys in the struct.
+* alias-list: optional. The names to use instead of the original key names. Names are separated by commas. It is advised to put the alias-list in parentheses if there is more than one alias. The length must match the number of keys in the struct field. The provided alias names **must** follow the lexicographical order of the corresponding original keys in the struct.

Also applies to: 12-12

docs/user/ppl/cmd/flatten.md-85-92 (1)

85-92: Fix awkward line breaking in the Limitations section.

Lines 85–87 have unnecessary line breaks that disrupt readability. The phrase "when its flattened fields are invisible" is split across lines, creating a jarring reading experience. Reformat to keep the sentence intact.

Apply this diff to improve readability:

 ## Limitations  
 
-* `flatten` command may not work as expected when its flattened fields are  
-  
-  invisible.
-  For example in query
+* `flatten` command may not work as expected when its flattened fields are invisible.
+  For example in query
docs/user/ppl/cmd/top.md-102-123 (1)

102-123: Fix duplicate example numbering.

Line 102 and line 123 are both titled "Example 5". The second occurrence should be "Example 6" to maintain sequential numbering.

Apply this diff:

-## Example 5: Specify the usenull field option
+## Example 6: Specify the usenull field option
docs/user/ppl/cmd/fields.md-218-218 (1)

218-218: Fix backtick formatting in note.

Line 218 has confusing backtick escaping. Clarify the formatting to properly show the backtick syntax.

Apply this diff to fix the formatting:

-Note: The `*` wildcard selects fields based on the index schema, not on data content. Fields with null values are included in the result set. Use backticks `` `*` ` if the plain `*`` doesn't return all expected fields.
+Note: The `*` wildcard selects fields based on the index schema, not on data content. Fields with null values are included in the result set. Use backticks around `*` (e.g., `` `*` ``) if the plain `*` doesn't return all expected fields.
docs/user/ppl/cmd/fillnull.md-12-12 (1)

12-12: Fix sentence fragment in parameter description.

Line 12 starts with a description fragment that lacks a clear subject. The LanguageTool hint flagged missing subject structure.

Apply this diff:

-* field-list: optional. List of fields to apply the replacement to. Can be comma-delimited (with `with` or `using` syntax) or space-delimited (with `value=` syntax). **Default:** all fields.
+* field-list: optional. The list of fields to apply the replacement to. Can be comma-delimited (with `with` or `using` syntax) or space-delimited (with `value=` syntax). **Default:** all fields.
docs/user/ppl/cmd/ad.md-5-5 (1)

5-5: Fix hyphenation of compound adjectives.

The LanguageTool grammar hint indicates "fixed in time RCF" should use hyphens to form a proper compound adjective. Similarly, verify hyphenation of time-related terms throughout.

Apply this diff to fix the hyphenation:

-The `ad` command applies Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search result returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed in time RCF for processing time-series data, batch RCF for processing non-time-series data.
+The `ad` command applies Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search result returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed-in-time RCF for processing time-series data, batch RCF for processing non-time-series data.
docs/user/ppl/cmd/fillnull.md-8-10 (1)

8-10: Use plain angle brackets for syntax placeholders instead of backslash escaping.

Lines 8-10 use backslash-escaped angle brackets (\<replacement\>, \<field-list\>), which is non-standard for Markdown. Official OpenSearch PPL documentation uses plain angle brackets (e.g., <null-replacement>, <nullable-field>). Remove the backslash prefixes to match the documented syntax standard.

-fillnull with \<replacement\> [in \<field-list\>]
-fillnull using \<field\> = \<replacement\> [, \<field\> = \<replacement\>]
-fillnull value=\<replacement\> [\<field-list\>]
+fillnull with <replacement> [in <field-list>]
+fillnull using <field> = <replacement> [, <field> = <replacement>]
+fillnull value=<replacement> [<field-list>]
docs/user/ppl/cmd/subquery.md-110-135 (1)

110-135: Specify language for fenced code block.

Lines 110–135 contain an unspecified code block with ScalarSubquery and RelationSubquery examples. Add a language specifier.

-```
+```ppl
 //Uncorrelated scalar subquery in Select
docs/user/ppl/cmd/multisearch.md-30-34 (1)

30-34: Specify language for fenced code block.

Line 30 opens a code block without a language identifier. Add a language specifier (e.g., ppl or remove the block if it's meant to be literal syntax).

-```
+```ppl
| multisearch [search source=table | where condition1] [search source=table | where condition2]
docs/user/ppl/cmd/subquery.md-92-106 (1)

92-106: Specify language for fenced code block.

Lines 92–106 contain an unspecified code block with ExistsSubquery examples. Add a language specifier.

-```
+```ppl
 // Assumptions: `a`, `b` are fields of table outer, `c`, `d` are fields of table inner,  `e`, `f` are fields of table nested
docs/user/ppl/cmd/patterns.md-26-26 (1)

26-26: Apply hyphenation to compound adjectives.

Lines 26 and 95 use unhyphenated compound adjectives. These should be hyphenated for clarity:

  • Line 26: "low frequency words" → "low-frequency words"
  • Line 95: "user defined patterns" → "user-defined patterns"

Also applies to: 95-95

docs/user/ppl/cmd/patterns.md-32-43 (1)

32-43: Specify language for fenced code block.

Line 32 opens a code block without a language identifier. Since this is a cluster settings command, use a language specifier like bash or json.

-```
+```bash
PUT _cluster/settings
docs/user/ppl/cmd/subquery.md-77-88 (1)

77-88: Specify language for fenced code block.

Lines 77–88 contain an unspecified code block with InSubquery examples. Add a language specifier.

-```
+```ppl
 source = outer | where a in [ source = inner | fields b ]
docs/user/ppl/cmd/join.md-63-63 (1)

63-63: Specify language for code blocks.

Lines 63 and 82 have fenced code blocks without language identifiers. These appear to be plain text/shell examples.

Apply this diff:

-```
+```text
 source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c

And line 82:

-```
+```text
 source = table1 | join type=outer left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c

Also applies to: 82-82

docs/user/ppl/cmd/eventstats.md-64-64 (1)

64-64: Specify language for code block.

Line 64 has a fenced code block without a language identifier. Based on the content (PPL commands), use `ppl` or `text`.

Apply this diff:

-```
+```ppl
 source = table | eventstats avg(a)
docs/user/ppl/cmd/explain.md-87-97 (1)

87-97: Specify code block language.

Line 87 starts a fenced code block without a language identifier. Based on the content (plain output with JSON-like structure), add a language spec.

Apply this diff:

-```
+```text

This prevents linter warnings and enables syntax highlighting.

docs/user/ppl/admin/monitoring.md-25-26 (1)

25-26: Replace hard tab with spaces.

Line 25 contains a hard tab character. Markdown linters expect spaces for indentation consistency.

Apply this diff:

-	>> curl -H 'Content-Type: application/json' -X GET localhost:9200/_plugins/_ppl/stats
+  >> curl -H 'Content-Type: application/json' -X GET localhost:9200/_plugins/_ppl/stats
docs/user/ppl/admin/connectors/s3glue_connector.md-9-18 (1)

9-18: Fix list indentation to be consistent.

Multiple list items have incorrect indentation levels. Markdown linters expect consistent 2-space indentation for items at the same level.

Apply this diff to fix indentation:

-* `EMRServerless Spark Execution Engine Config Setting`:  Since we execute s3Glue queries on top of spark execution engine, we require this configuration.  
-  
-  More details: [ExecutionEngine Config](../../../interfaces/asyncqueryinterface.md#id2)
-* `S3`: This is where the data lies.  
-* `Glue` Metadata store: Glue takes care of table metadata.  
-* `Opensearch IndexStore`: Index for s3 data lies in opensearch and also acts as temporary buffer for query results.  
-  
-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+* `EMRServerless Spark Execution Engine Config Setting`:  Since we execute s3Glue queries on top of spark execution engine, we require this configuration. More details: [ExecutionEngine Config](../../../interfaces/asyncqueryinterface.md#id2)
+* `S3`: This is where the data lies.
+* `Glue` Metadata store: Glue takes care of table metadata.
+* `Opensearch IndexStore`: Index for s3 data lies in opensearch and also acts as temporary buffer for query results.
+
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.
docs/user/ppl/cmd/eventstats.md-28-29 (1)

28-29: Fix list indentation consistency.

Lines 28-29 have sub-list items that are indented incorrectly. They should be at the same indentation level as other sub-list items under the bucket_nullable parameter.

Apply this diff:

 * bucket_nullable: optional. Controls whether the eventstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
- * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
- * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
+ * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`
+ * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`
🧹 Nitpick comments (33)
docs/user/ppl/cmd/parse.md (1)

95-129: Add language identifiers to code blocks in the Limitations section.

For consistency with Examples 1–3 and to resolve static analysis warnings, specify the ppl language identifier for all code blocks in the Limitations section.

Apply this diff to add language specifications:

- ```
+ ```ppl
  source=accounts | parse address '\d+ (?<street>.+)' | parse street '\w+ (?<road>\w+)' ;
- ```
+ ```

Apply the same change (add ppl after the opening triple backticks) to lines 103, 111, 119, and 127.

docs/user/ppl/cmd/rare.md (1)

9-9: Clean up RST escape sequences in Markdown syntax.

Lines 9 and 12 contain \< and \> escape sequences (\<field-list\>, \<string\>), which are RST artifacts. In Markdown, these should be written as plain angle brackets or wrapped in backticks.

Apply this diff to use standard Markdown syntax:

-rare [rare-options] \<field-list\> [by-clause]
+rare [rare-options] <field-list> [by-clause]
 * field-list: mandatory. Comma-delimited list of field names.  
 * by-clause: optional. One or more fields to group the results by.  
-* rare-options: optional. Options for the rare command. Supported syntax is [countfield=\<string\>] [showcount=\<bool\>].  
+* rare-options: optional. Options for the rare command. Supported syntax is [countfield=<string>] [showcount=<bool>].  

Also applies to: 12-12

docs/user/ppl/cmd/showdatasources.md (2)

1-1: Minor: Remove trailing whitespace from section headings.

Lines 1, 3, 6, 9, and 30 contain trailing spaces (2 spaces each) after the heading text. These should be removed for cleaner formatting.

-# show datasources  
+# show datasources

-## Description  
+## Description

-## Syntax  
+## Syntax

-## Example 1: Fetch all PROMETHEUS datasources  
+## Example 1: Fetch all PROMETHEUS datasources

-## Limitations  
+## Limitations

Also applies to: 3-3, 6-6, 9-9, 30-30


11-17: Consider integrating the descriptive text as a code block comment or caption.

Line 12 ("PPL query for all PROMETHEUS DATASOURCES") reads as a standalone label between the example intro and the code block. For better flow, consider either:

  • Moving it inline as part of line 11's explanation, or
  • Placing it as a comment inside the code block (if the markdown parser supports it)

This is a minor readability improvement.

 This example shows fetching all the datasources of type prometheus.
-PPL query for all PROMETHEUS DATASOURCES
   
 ```ppl
 show datasources
 | where CONNECTOR_TYPE='PROMETHEUS'
(Then integrate into line 11, e.g.: "This example shows how to fetch all datasources of type prometheus using the following PPL query:")

docs/user/ppl/cmd/timechart.md (1)

36-55: Consider elevating aggregation function sub-sections to H3 level for clearer hierarchy.

Lines 36–55 define aggregation functions (PER_SECOND, PER_MINUTE, PER_HOUR, PER_DAY) as H2 headers (##), placing them at the same structural level as major sections like "Syntax" and "Notes." For better document hierarchy and readability, these could be H3 (###) headers or grouped under a single H2 section.



This is a low-priority organizational improvement. If you'd like to refactor, consider moving these under a parent "## Aggregation Functions" section with H3 sub-headers, or simply change to H3 directly.

docs/user/ppl/cmd/chart.md (1)

8-8: Consider breaking the long syntax line for improved source readability.

The syntax line (line 8) is very long and spans the full width. While technically valid Markdown, it could be reformatted for better readability in the raw source—for example, using line breaks within a code block or restructuring with separate syntax variants.

docs/user/ppl/cmd/streamstats.md (4)

33-43: Remove unnecessary backslash escaping from syntax parameters.

The backslash characters escaping angle brackets (e.g., `\<bool\>`, `\<int\>`) appear to be artifacts from the RST-to-Markdown conversion. Markdown doesn't require escaping angle brackets in parameter syntax. Removing these will improve readability and align with standard Markdown conventions.


Apply this diff to clean up the syntax section:

-streamstats [bucket_nullable=bool] [current=\<bool\>] [window=\<int\>] [global=\<bool\>] [reset_before="("\<eval-expression\>")"] [reset_after="("\<eval-expression\>")"] \<function\>... [by-clause]
+streamstats [bucket_nullable=bool] [current=<bool>] [window=<int>] [global=<bool>] [reset_before="("<eval-expression>")"] [reset_after="("<eval-expression>")"] <function>... [by-clause]

Also update parameter descriptions that reference these types:

-* bucket_nullable: optional. Controls whether the streamstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
- * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
- * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
-* current: optional. If true, the search includes the given, or current, event in the summary calculations. If false, the search uses the field value from the previous event. Syntax: current=\<boolean\>. **Default:** true.  
-* window: optional. Specifies the number of events to use when computing the statistics. Syntax: window=\<integer\>. **Default:** 0, which means that all previous and current events are used.  
-* global: optional. Used only when the window argument is set. Defines whether to use a single window, global=true, or to use separate windows based on the by clause. If global=false and window is set to a non-zero value, a separate window is used for each group of values of the field specified in the by clause. Syntax: global=\<boolean\>. **Default:** true.  
-* reset_before: optional. Before streamstats calculates for an event, reset_before resets all accumulated statistics when the eval-expression evaluates to true. If used with window, the window is also reset. Syntax: reset_before="("\<eval-expression\>")". **Default:** false.  
-* reset_after: optional. After streamstats calculations for an event, reset_after resets all accumulated statistics when the eval-expression evaluates to true. This expression can reference fields returned by streamstats. If used with window, the window is also reset. Syntax: reset_after="("\<eval-expression\>")". **Default:** false.
+* bucket_nullable: optional. Controls whether the streamstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
+ * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
+ * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
+* current: optional. If true, the search includes the given, or current, event in the summary calculations. If false, the search uses the field value from the previous event. Syntax: current=<boolean>. **Default:** true.  
+* window: optional. Specifies the number of events to use when computing the statistics. Syntax: window=<integer>. **Default:** 0, which means that all previous and current events are used.  
+* global: optional. Used only when the window argument is set. Defines whether to use a single window, global=true, or to use separate windows based on the by clause. If global=false and window is set to a non-zero value, a separate window is used for each group of values of the field specified in the by clause. Syntax: global=<boolean>. **Default:** true.  
+* reset_before: optional. Before streamstats calculates for an event, reset_before resets all accumulated statistics when the eval-expression evaluates to true. If used with window, the window is also reset. Syntax: reset_before="("<eval-expression>")". **Default:** false.  
+* reset_after: optional. After streamstats calculations for an event, reset_after resets all accumulated statistics when the eval-expression evaluates to true. This expression can reference fields returned by streamstats. If used with window, the window is also reset. Syntax: reset_after="("<eval-expression>")". **Default:** false.

43-43: Remove backslash from by-clause span-expression syntax.

The description on line 43 also contains escaped angle brackets that should be cleaned up for consistency.

-* by-clause: optional. The by clause could be the fields and expressions like scalar functions and aggregation functions. Besides, the span clause can be used to split specific field into buckets in the same interval, the stats then does the aggregation by these span buckets. Syntax: by [span-expression,] [field,]... **Default:** If no \<by-clause\> is specified, all events are processed as a single group and running statistics are computed across the entire event stream.  
-* span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets.
+* by-clause: optional. The by clause could be the fields and expressions like scalar functions and aggregation functions. Besides, the span clause can be used to split specific field into buckets in the same interval, the stats then does the aggregation by these span buckets. Syntax: by [span-expression,] [field,]... **Default:** If no <by-clause> is specified, all events are processed as a single group and running statistics are computed across the entire event stream.  
+* span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). For example, `span(age, 10)` creates 10-year age buckets, `span(timestamp, 1h)` creates hourly buckets.

77-90: Add language identifier to usage examples code block for consistency.

The initial usage examples block (lines 77-90) lacks the ppl language identifier, while all subsequent examples (lines 96-122, 167-192, 217-220, 242-265) correctly use ```ppl. This inconsistency should be resolved.

-```
+```ppl
 source = table | streamstats avg(a)
 source = table | streamstats current = false avg(a)
 source = table | streamstats window = 5 sum(b)
 source = table | streamstats current = false window = 2 max(a)
 source = table | streamstats where a < 50 | streamstats count(c)
 source = table | streamstats min(c), max(c) by b
 source = table | streamstats count(c) as count_by by b | where count_by > 1000
 source = table | streamstats dc(field) as distinct_count
 source = table | streamstats distinct_count(category) by region
 source = table | streamstats current=false window=2 global=false avg(a) by b
 source = table | streamstats window=2 reset_before=a>31 avg(b)
 source = table | streamstats current=false reset_after=a>31 avg(b) by c
-```
+```ppl

149-163: Clarify the "original data" table formatting in Example 3.

The presentation of original data (lines 149-163) mixes plain text labels with incomplete table markup, which disrupts readability. Consider restructuring this section with a clear heading and properly formatted table or code block, or reformatting the explanation to avoid the incomplete table structure.

One approach is to present the original data as a properly formatted code block or table:

-This example shows how to calculate the running average of age across accounts by country, using global argument.
-original data
-    +-------+---------+------------+-------+------+-----+
-    | name  | country | state      | month | year | age |
-  
-    |-------+---------+------------+-------+------+-----+
-    | Jake  | USA     | California | 4     | 2023 | 70  |
+This example shows how to calculate the running average of age across accounts by country, using global argument.
+
+**Original data:**
+
+```text
+| name  | country | state      | month | year | age |
+|-------+---------+------------+-------+------+-----|
+| Jake  | USA     | California | 4     | 2023 | 70  |
docs/user/ppl/cmd/rex.md (1)

6-20: Clarify pattern argument syntax and quoting requirement.

Line 8 shows the pattern as an unquoted placeholder (\<pattern\>), but all 9 examples below require the pattern to be quoted. The syntax should explicitly show that the pattern argument must be a quoted string.

Additionally, the use of \<placeholder\> syntax throughout this section is valid Markdown but reflects RST conventions (backslash-escaping of angle brackets). For idiomatic Markdown, prefer omitting backslashes where angle brackets are used only as placeholders, or use backticks to denote code placeholders.

Apply this diff to clarify the syntax and modernize the Markdown style:

-rex [mode=\<mode\>] field=\<field\> \<pattern\> [max_match=\<int\>] [offset_field=\<string\>]
-* field: mandatory. The field must be a string field to extract data from.  
-* pattern: mandatory string. The regular expression pattern with named capture groups used to extract new fields. Pattern must contain at least one named capture group using `(?<name>pattern)` syntax.  
-* mode: optional. Either `extract` or `sed`. **Default:** extract
+rex [mode=<mode>] field=<field> "<pattern>" [max_match=<int>] [offset_field=<string>]
+* field: mandatory. The field must be a string field to extract data from.  
+* pattern: mandatory string (quoted). The regular expression pattern with named capture groups used to extract new fields. Pattern must contain at least one named capture group using `(?<name>...)` syntax.  
+* mode: optional. Either `extract` or `sed`. **Default:** extract
docs/user/ppl/admin/cross_cluster_search.md (1)

49-50: Hyphenate compound adjectives "cross-cluster".

The terms "cross cluster" (lines 49–50) should be hyphenated as "cross-cluster" when used as adjectives before a noun, per standard English grammar conventions.

Apply this diff to fix the hyphenation:

-Since OpenSearch does not support cross cluster index metadata retrieval, field mapping of a remote cluster index is not available to the local cluster.
-([[Feature] Cross cluster field mappings query #6573](https://github.com/opensearch-project/OpenSearch/issues/6573))
+Since OpenSearch does not support cross-cluster index metadata retrieval, field mapping of a remote cluster index is not available to the local cluster.
+([[Feature] Cross-cluster field mappings query #6573](https://github.com/opensearch-project/OpenSearch/issues/6573))
docs/user/ppl/cmd/lookup.md (1)

20-28: Specify language for code block.

The code block at line 20 contains syntax examples without a language identifier. Add a language tag (e.g., text or ppl) to comply with Markdown linting standards.

Apply this diff:

-```
+```text
 source = table1 | lookup table2 id
 source = table1 | lookup table2 id, name
 source = table1 | lookup table2 id as cid, name
 source = table1 | lookup table2 id as cid, name replace dept as department
 source = table1 | lookup table2 id as cid, name replace dept as department, city as location
 source = table1 | lookup table2 id as cid, name append dept as department
 source = table1 | lookup table2 id as cid, name append dept as department, city as location
-```
+```
docs/user/ppl/cmd/rename.md (1)

10-10: Improve clarity and variety of phrasing in parameter descriptions.

Line 10 uses "The name you want to rename to," which is somewhat awkward. The LanguageTool hint flags repetitive phrasing. Consider more direct language.

Apply this diff for improved clarity:

-* target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source.
+* target-field: mandatory. The new field name. Must have same number of wildcards as the source field.

This improves readability and adds consistency by clarifying the relationship between source and target wildcards.

docs/user/ppl/cmd/expand.md (1)

50-53: Clean up line break in limitations text.

Lines 50-52 have an awkward line break that affects readability. Reflow the text for clarity:

* The `expand` command currently only supports nested arrays. Primitive  
  
  fields storing arrays are not supported. E.g. a string field storing an array
-of strings cannot be expanded with the current implementation.
+of strings cannot be expanded with the current implementation.

Could be simplified to:

* The `expand` command currently only supports nested arrays. Primitive array fields are not supported. For example, a string field storing an array of strings cannot be expanded with the current implementation.
docs/user/ppl/admin/connectors/prometheus_connector.md (1)

98-103: Minor grammar refinements needed.

Two small grammar issues should be addressed:

  • Line 101: "endtime set to now()" should be "end time set to now()" (unless referring to a parameter name, keep as-is)
  • Line 102: "resolution is auto determined" should be "resolution is auto-determined" (hyphenate compound modifier)

Apply this diff to fix grammar:

- * Time range is determined through filter clause on `@timestamp`. If there is no such filter clause, time range will be set to 1h with endtime set to now().  
- * In case of stats, resolution is determined by `span(@timestamp,15s)` expression. For normal select queries, resolution is auto determined from the time range set.  
+ * Time range is determined through filter clause on `@timestamp`. If there is no such filter clause, time range will be set to 1h with end time set to now().  
+ * In case of stats, resolution is determined by `span(@timestamp,15s)` expression. For normal select queries, resolution is auto-determined from the time range set.  
docs/user/ppl/admin/connectors/security_lake_connector.md (2)

18-33: Fix nested list indentation throughout the document.

The markdownlint rules flag inconsistent indentation in nested lists (MD007). OpenSearch docs typically prefer 2-space indentation for first-level nested items. Standardize all nested bullet points to follow consistent spacing.

The primary issue is inconsistent indentation. Most nested items use 4 spaces where 2 spaces is expected. Example:

- `glue.auth.type` [Required]  
-    * This parameters provides the authentication type information required for execution engine to connect to glue.  
-    * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
-        * `glue.auth.role_arn`  
+ `glue.auth.type` [Required]  
+   * This parameters provides the authentication type information required for execution engine to connect to glue.  
+   * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
+     * `glue.auth.role_arn`  

63-63: Wrap bare URL in markdown link format.

Line 63 contains a bare URL that should be properly formatted as a markdown link for better documentation rendering.

-Documentation for Index Queries: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md
+Documentation for Index Queries: [https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md)
docs/user/ppl/admin/settings.md (4)

178-184: Replace hard tabs with spaces.

Lines 180-183 contain hard tab characters (MD010) which should be replaced with spaces for consistent markdown formatting.

- 	  "transient" : {
- 	    "plugins.query.buckets" : 1000
- 	  }
- 	}
+ 	  "transient" : {
+ 	    "plugins.query.buckets" : 1000
+ 	  }
+ 	}

Replace all tab characters with spaces (typically 2 or 4 spaces per indentation level).


240-245: Remove extra spaces inside code spans.

Lines 242-244 have extra spaces around content within backticks (MD038). Code spans should not have spaces inside the backticks.

- - The default value of argument [`bucket_nullable` in `stats`` command. Check `stats command](../cmd/stats.md) for details.  
- - The return value of [`divide` and `/`` operator. Check `expressions](../functions/expressions.md) for details.  
- - The default value of argument [`usenull` in `top` and `rare`` commands. Check `top command](../cmd/top.md)  and [rare command](../cmd/rare.md) for details.  
+ - The default value of argument `bucket_nullable` in `stats` command. Check [stats command](../cmd/stats.md) for details.  
+ - The return value of `divide` and `/` operator. Check [expressions](../functions/expressions.md) for details.  
+ - The default value of argument `usenull` in `top` and `rare` commands. Check [top command](../cmd/top.md) and [rare command](../cmd/rare.md) for details.  

1-10: Minor phrasing improvement.

Line 5 uses "able to change" which can be simplified to "can change" for better readability.

-When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings are able to change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.
+When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most settings can change dynamically, allowing you to control PPL plugin behavior without restarting your cluster.

205-210: Hyphenate compound modifier.

Line 209 uses "performance sensitive" which should be hyphenated when used as a compound modifier before a noun.

-Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
+Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance-sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
docs/user/ppl/admin/datasources.md (4)

38-48: Use hyphenated compound modifiers for consistency.

Lines 38 and 48 use "security disabled" where "security-disabled" would be more consistent with standard English grammar when used as a compound modifier before a noun.

- * In case of security disabled domains, authorization is disbaled.  
+ * In case of security-disabled domains, authorization is disabled.  

Note: Also corrected typo "disbaled" → "disabled" on the same line.

- we can remove authorization and other details in case of security disabled domains.
+ we can remove authorization and other details in case of security-disabled domains.

150-152: Hyphenate compound modifier.

Line 151 uses "24 character" which should be hyphenated when used as a compound modifier.

- * Sample python script to generate a 24 character master key  
+ * Sample Python script to generate a 24-character master key  

Also capitalize "Python" as a proper name.


203-207: Simplify wordy phrasing.

Line 205 uses "prior to" which is wordier than necessary. Consider using "before" for better readability.

- * In versions prior to 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.  
+ * Before version 2.7, the plugins.query.federation.datasources.config keystore setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.  

Also changed "key store" to "keystore" (one word).


227-230: Use standard English phrasing.

Line 229 uses British English "in future" which should be "in the future" for consistency with American English conventions used in OpenSearch documentation.

- This schema will be extended for views, columns and other metadata info in future.
+ This schema will be extended for views, columns and other metadata info in the future.
docs/user/ppl/admin/monitoring.md (1)

5-5: Improve grammar: replace "able to" with "can" and add hyphens for compound adjectives.

Three style improvements:

  1. "able to collect" → "can collect" (more concise)
  2. "node level statistics" → "node-level statistics" (compound adjective)
  3. "Cluster level statistics" → "Cluster-level statistics" (compound adjective)
docs/user/ppl/admin/connectors/s3glue_connector.md (1)

16-16: Fix grammar: "in future" → "in the future".

Add the article "the" for grammatically correct phrasing.

docs/user/ppl/cmd/replace.md (1)

111-111: Use hyphen for compound adjective "pattern-matching".

Compound adjectives should be hyphenated when modifying a noun.

Apply this change:

-Since replace command only supports plain string literals, you can use LIKE command with replace for pattern matching needs.
+Since replace command only supports plain string literals, you can use LIKE command with replace for pattern-matching needs.
docs/user/ppl/cmd/join.md (1)

12-12: Use hyphens for compound adjectives "performance-sensitive".

Compound adjectives should be hyphenated when modifying a noun. This appears three times in the document.

Apply these diffs:

-* joinType: optional. The type of join to perform. Options: `left`, `semi`, `anti`, and performance sensitive types `right`, `full`, `cross`. **Default:** `inner`.
+* joinType: optional. The type of join to perform. Options: `left`, `semi`, `anti`, and performance-sensitive types `right`, `full`, `cross`. **Default:** `inner`.

-* type: optional. Join type using extended syntax. Options: `left`, `outer` (alias of `left`), `semi`, `anti`, and performance sensitive types `right`, `full`, `cross`. **Default:** `inner`.
+* type: optional. Join type using extended syntax. Options: `left`, `outer` (alias of `left`), `semi`, `anti`, and performance-sensitive types `right`, `full`, `cross`. **Default:** `inner`.

-Join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
+Join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance-sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.

Also applies to: 21-21, 214-214

docs/user/ppl/cmd/eventstats.md (1)

21-21: Rewrite incomplete sentence for clarity.

Line 21 starts with "Can be used mid-search..." but lacks a clear subject. Rephrase for better readability.

Apply this diff:

-  * `eventstats`: Useful when you need to enrich events with statistical context for further analysis or filtering. Can be used mid-search to add statistics that can be used in subsequent commands.
+  * `eventstats`: Useful when you need to enrich events with statistical context for further analysis or filtering. It can be used mid-search to add statistics that can be used in subsequent commands.
docs/user/ppl/cmd/search.md (1)

89-89: Use hyphen for compound adjective "multi-field".

"Multi field" should be hyphenated when used as a compound adjective.

Apply this diff:

-* Limitations: No wildcards for partial IP matching. For wildcard search use multi field with keyword: `search ip_address.keyword='1*' source=logs` or WHERE clause: `source=logs | where cast(ip_address as string) like '1%'`
+* Limitations: No wildcards for partial IP matching. For wildcard search use multi-field with keyword: `search ip_address.keyword='1*' source=logs` or WHERE clause: `source=logs | where cast(ip_address as string) like '1%'`
docs/user/ppl/cmd/ml.md (1)

32-37: KMEANS parameters duplicate kmeans.md; consider a maintenance strategy.

The KMEANS section here (lines 34–37) replicates parameter documentation from docs/user/ppl/cmd/kmeans.md (lines 8–11), with identical defaults and descriptions. Additionally, Example 5 (lines 133–149) duplicates the iris clustering example from kmeans.md (lines 18–33).

While duplication makes ml.md self-contained for readers interested only in the ml command, it creates a maintenance risk: if defaults or parameter semantics change, both files must be updated consistently.

Recommendation: Consider whether a cross-reference to kmeans.md is sufficient, or if a shared documentation pattern (e.g., an include mechanism or docstring-based generation) would be better suited for future PPL documentation work.

@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/dev/testing-doctest.md (1)

54-54: Fix duplicate section heading.

Section heading "1.4.2" appears twice (line 54 and line 59). Restructure to use distinct heading numbers for "How to run existing doctest?" and "How to write documentation with doctest?".

Apply this diff to fix the section numbering:

 ### 1.4.2 How to run existing doctest?
 Doctest runs with project build by `./gradlew build`. You can also only run doctest by `./gradlew doctest`. If a Prometheus instance isn't available locally, add `-DignorePrometheus` (or set the property to any value other than `false`) to skip Prometheus setup and the Prometheus-specific doctest scenarios.

 Make sure you don't have any OpenSearch instance running at `http://localhost:9200`

-### 1.4.2 How to write documentation with doctest?
+### 1.4.3 How to write documentation with doctest?

Also applies to: 59-59

🧹 Nitpick comments (3)
docs/user/ppl/cmd/patterns.md (2)

26-26: Hyphenate the compound adjective "low-frequency."

Per standard English grammar, compound adjectives modifying a noun should be hyphenated.

- * `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low frequency words. **Default:** 0.3.
+ * `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low-frequency words. **Default:** 0.3.

95-95: Hyphenate the compound adjective "user-defined."

Per standard English grammar, compound adjectives modifying a noun should be hyphenated.

- This example shows how to extract patterns from a raw log field using user defined patterns.
+ This example shows how to extract patterns from a raw log field using user-defined patterns.
docs/dev/testing-doctest.md (1)

76-96: Clarify expected formatting and edge cases in code block examples.

The examples show fenced code blocks with language identifiers (ppl, text, etc.), but guidance on edge cases is missing:

  • What happens if an input block lacks a corresponding output block?
  • Are output blocks required for all input blocks, or only for executable code?
  • How should blank lines, whitespace, or indentation within blocks be handled?

Consider adding a note about validation strictness and common mistakes.

Add a clarification note after line 95:

 - **Supported Output Languages**: `text`, `console`, `output`, `json`, `yaml`
+
+> **Note**: Each input code block must be immediately followed by "Expected output:" and an output block. Deviations (missing output, blank lines between blocks, or mismatched language identifiers) will cause validation errors. Ensure output language matches the expected response format.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2384f48 and 4cbc34e.

📒 Files selected for processing (5)
  • DEVELOPER_GUIDE.rst (1 hunks)
  • docs/category.json (2 hunks)
  • docs/dev/testing-doctest.md (1 hunks)
  • docs/user/ppl/cmd/patterns.md (1 hunks)
  • docs/user/ppl/cmd/timechart.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • DEVELOPER_GUIDE.rst
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/category.json
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
🪛 LanguageTool
docs/user/ppl/cmd/patterns.md

[grammar] ~26-~26: Use a hyphen to join words.
Context: ...e lower bound of frequency to ignore low frequency words. Default: 0.3. ...

(QB_NEW_EN_HYPHEN)


[grammar] ~95-~95: Use a hyphen to join words.
Context: ...patterns from a raw log field using user defined patterns. ```ppl source=apach...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/timechart.md

[grammar] ~31-~31: Ensure spelling is correct
Context: ...ore distinct values than the limit. * usenull: optional. Controls whether NULL values ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (6)
docs/user/ppl/cmd/patterns.md (2)

144-144: Resolve the "Configuration" heading placement to align with Example 5 context.

The "Configuration" heading at line 144 appears as a standalone second-level heading within the "Simple Pattern Example 5" section. This placement creates structural ambiguity and ties back to the past review comment about MD024 (duplicate/conflicting heading hierarchy). Rename it to clarify that it's part of Example 5 or integrate the content into the example's description.

Apply one of these fixes:

- ## Simple Pattern Example 5: Return log patterns aggregation result with detected variable tokens
- 
- This example shows how to get aggregated results with detected variable tokens.
- ## Configuration

+ ## Simple Pattern Example 5: Return log patterns aggregation result with detected variable tokens
+ 
+ This example shows how to get aggregated results with detected variable tokens. With option `show_numbered_token` enabled, the output can detect numbered variable tokens from the pattern field.

Or rename the Configuration heading to be example-specific:

- ## Configuration
+ ## Example 5: Configuration

1-260: Overall documentation quality is solid.

The converted Markdown documentation is well-structured, comprehensive, and clear. The patterns command description, syntax, and examples (both Simple Pattern and Brain variants) provide good coverage across label and aggregation modes. Expected outputs with tables and multiple configuration scenarios make the documentation practical for users. The Limitations section is helpful.

docs/dev/testing-doctest.md (1)

67-111: Verify Markdown doctest configuration against actual implementation.

The new Markdown format section references specific supported languages, category configurations, and syntax features (ignore attribute, paired input/output blocks). These must be verified to match the actual implementation in markdown_parser.py and docs/category.json to ensure documentation accuracy and prevent user confusion.

Specifically verify (a rough fence-inventory sketch follows this list):

  • Category names (lines 72-74): Confirm ppl_cli_calcite, bash_calcite, and bash_settings exist and are correctly configured for markdown-only content.
  • Supported input languages (line 94): Verify sql, ppl, bash, sh, bash ppl are all recognized.
  • Supported output languages (line 95): Verify text, console, output, json, yaml are supported.
  • Ignore syntax (line 99): Confirm that the ignore attribute syntax (e.g., ```ppl ignore) is correctly parsed and enforced.
  • Paired input/output requirement (line 93): Confirm that the framework validates that each input block is immediately followed by an output block.
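As a starting point for that verification, the fence info strings actually used under docs/user/ppl/ can be inventoried and compared by hand against the lists above. This is only a sketch under the assumption that the docs use triple-backtick fences; it does not read markdown_parser.py or docs/category.json.

```python
import pathlib
import re
from collections import Counter

# Sketch only: list every fence info string used in the PPL docs so the result
# can be compared by hand with the supported input/output languages, category
# names, and the `ignore` attribute claimed in testing-doctest.md.
FENCE = re.compile(r"^```(\S.*)?$")
counts = Counter()

for md in pathlib.Path("docs/user/ppl").rglob("*.md"):
    opening = True
    for line in md.read_text(encoding="utf-8").splitlines():
        m = FENCE.match(line.strip())
        if not m:
            continue
        if opening and m.group(1):
            counts[m.group(1).strip()] += 1
        opening = not opening

for info, n in counts.most_common():
    print(f"{n:4d}  {info}")
```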
docs/user/ppl/cmd/timechart.md (3)

41-69: Well-documented special functions and comprehensive notes.

The per-second/minute/hour/day functions are clearly defined with consistent formulas, and the notes section effectively covers data requirements, result formatting, and limit/null behaviors. The dual documentation of null handling (parameter + notes) aids clarity.


71-370: Examples are comprehensive and well-organized.

The 11 examples effectively demonstrate command features across a progression from basic (Examples 1–3) to advanced scenarios (Examples 6–10), with clear parameter combinations and corresponding ASCII-table outputs. Null handling and limit/useother behaviors are well-illustrated.

Confirm that the referenced datasets (events, events_many_hosts, events_null) are properly defined in the doctest fixture setup. These must be available when doctests run against these examples.


1-40: Previous review issues have been properly addressed.

The Markdown cross-reference to stats.md correctly converts from RST syntax, and the usenull and nullstr parameters are now documented in the syntax section (lines 31–35), resolving earlier feedback. The parameter documentation is comprehensive with clear descriptions, defaults, and constraints for the timechart command.

@kylehounslow kylehounslow force-pushed the feat/markdown-doctests branch from 4a5bb58 to c9d6355 Compare December 9, 2025 19:28
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

♻️ Duplicate comments (13)
docs/user/ppl/cmd/patterns.md (1)

26-26: Fix hyphenation for compound adjectives (unresolved from past review).

Two compound adjectives still lack required hyphens when modifying nouns:

-    * `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low frequency words. **Default:** 0.3.
+    * `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low-frequency words. **Default:** 0.3.
-This example shows how to extract patterns from a raw log field using user defined patterns.
+This example shows how to extract patterns from a raw log field using user-defined patterns.

Also applies to: 95-95

docs/user/ppl/cmd/parse.md (1)

95-97: Add language specifications to fenced code blocks in Limitations section.

The code blocks in the Limitations section lack ppl language identifiers, whereas all query examples earlier in the document consistently use ppl for syntax highlighting and clarity. For consistency and proper syntax highlighting, add the ppl language identifier to these blocks.

Apply this diff to add language specifications:

- ```
+ ```ppl
  source=accounts | parse address '\d+ (?<street>.+)' | parse street '\w+ (?<road>\w+)' ;
- ```
+ ```ppl
  
- ```
+ ```ppl
  source=accounts | parse address '\d+ (?<street>.+)' | eval street='1' | where street='1' ;
- ```
+ ```ppl
  
- ```
+ ```ppl
  source=accounts | parse address '\d+ (?<street>.+)' | eval address='1' ;
- ```
+ ```ppl
  
- ```
+ ```ppl
  source=accounts | parse email '.+@(?<host>.+)' | stats avg(age) by host | where host=pyrami.com ;
- ```
+ ```ppl
  
- ```
+ ```ppl
  source=accounts | parse email '.+@(?<host>.+)' | fields email, host ;
- ```
+ ```

Also applies to: 103-105, 111-113, 119-121, 127-129

docs/user/ppl/cmd/subquery.md (6)

6-10: Add blank line before numbered list for proper indentation.

This issue was previously flagged in the past review. Lines 6–10 have inconsistent indentation; a blank line is required between the introductory text and the numbered list per Markdown linting rules (MD005, MD007).

Apply this diff:

 Subqueries are useful for:
+
 1. Filtering data based on results from another query  
 2. Checking for the existence of related data  
 3. Performing calculations that depend on aggregated values from other tables  
 4. Creating complex joins with dynamic conditions

54-71: Remove shell prefix and JSON response output from configuration example.

This issue was previously flagged in the past review. Per the PR objectives, code examples must be stripped of shell prefixes and command output for clean copy-paste. Remove the sh$ prefix, the continuation characters and ellipsis (\, ...), and the entire JSON response block (lines 58–70).

Apply this diff:

 Change the subsearch.maxout to unlimited:  
   
 ```bash ignore
-sh$ curl -sS -H 'Content-Type: application/json' \
-... -X PUT localhost:9200/_plugins/_query/settings \
-... -d '{"persistent" : {"plugins.ppl.subsearch.maxout" : "0"}}'
-{
-  "acknowledged": true,
-  "persistent": {
-    "plugins": {
-      "ppl": {
-        "subsearch": {
-          "maxout": "-1"
-        }
-      }
-    }
-  },
-  "transient": {}
-}
+curl -sS -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{"persistent" : {"plugins.ppl.subsearch.maxout" : "0"}}'

77-88: Add language identifier to InSubquery code block.

This issue was previously flagged in the past review. The fenced code block lacks a language identifier, violating MD040 linting rules. Add the `ppl` language identifier to the opening fence.



Apply this diff:

 InSubquery:
   
-```
+```ppl
 source = outer | where a in [ source = inner | fields b ]
 source = outer | where (a) in [ source = inner | fields b ]
 source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]

92-106: Add language identifier to ExistsSubquery code block.

This issue was previously flagged in the past review. The fenced code block lacks a language identifier, violating MD040 linting rules. Add the ppl language identifier to the opening fence at line 92.

Apply this diff:

 ExistsSubquery: 
   
-```
+```ppl
 // Assumptions: `a`, `b` are fields of table outer, `c`, `d` are fields of table inner,  `e`, `f` are fields of table nested
 source = outer | where exists [ source = inner | where a = c ]

110-135: Add language identifier to ScalarSubquery/RelationSubquery code block.

This issue was previously flagged in the past review. The fenced code block lacks a language identifier, violating MD040 linting rules. Add the ppl language identifier to the opening fence at line 110.

Apply this diff:

 ScalarSubquery:
   
-```
+```ppl
 //Uncorrelated scalar subquery in Select
 source = outer | eval m = [ source = inner | stats max(c) ] | fields m, a

196-196: Replace hard tab with spaces.

This issue was previously flagged in the past review. Line 196 contains a hard tab character instead of spaces, violating the MD010 linting rule. Replace with appropriate spacing.

Apply this diff:

   """
-	}'
+}'

docs/user/ppl/cmd/search.md (1)

92-93: Fix list indentation for field type tips.

Lines 92–93 have incorrect indentation (3 spaces instead of 0 for top-level list items). These should align with the list above.



Apply this diff:

 **Field Type Performance Tips**:
-   * Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
-   * For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  
+* Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
+* For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  
docs/user/ppl/cmd/streamstats.md (1)

148-163: Replace malformed table markup with proper Markdown table syntax.

Lines 148–163 contain indented table blocks with inline markup that render poorly and are inconsistent with Markdown standards. The section claims to show "original data" but uses improper formatting. Replace with a standard Markdown table:

- This example shows how to calculate the running average of age across accounts by country, using global argument.
- original data
-     +-------+---------+------------+-------+------+-----+
-     | name  | country | state      | month | year | age |
-   
-     |-------+---------+------------+-------+------+-----+
-     | Jake  | USA     | California | 4     | 2023 | 70  |
-     | Hello | USA     | New York   | 4     | 2023 | 30  |
-     | John  | Canada  | Ontario    | 4     | 2023 | 25  |
-     | Jane  | Canada  | Quebec     | 4     | 2023 | 20  |
-     | Jim   | Canada  | B.C        | 4     | 2023 | 27  |
-     | Peter | Canada  | B.C        | 4     | 2023 | 57  |
-     | Rick  | Canada  | B.C        | 4     | 2023 | 70  |
-     | David | USA     | Washington | 4     | 2023 | 40  |
-   
-     +-------+---------+------------+-------+------+-----+
+ This example shows how to calculate the running average of age across accounts by country, using global argument.
+ 
+ **Original data:**
+ 
+ | name  | country | state      | month | year | age |
+ |-------|---------|------------|-------|------|-----|
+ | Jake  | USA     | California | 4     | 2023 | 70  |
+ | Hello | USA     | New York   | 4     | 2023 | 30  |
+ | John  | Canada  | Ontario    | 4     | 2023 | 25  |
+ | Jane  | Canada  | Quebec     | 4     | 2023 | 20  |
+ | Jim   | Canada  | B.C        | 4     | 2023 | 27  |
+ | Peter | Canada  | B.C        | 4     | 2023 | 57  |
+ | Rick  | Canada  | B.C        | 4     | 2023 | 70  |
+ | David | USA     | Washington | 4     | 2023 | 40  |
docs/user/ppl/cmd/eventstats.md (1)

28-29: Fix nested list indentation to align with Markdown standards.

Lines 28–29 use only 1 space of indentation for nested list items, but Markdown requires 2 spaces to properly nest items under a parent bullet. Adjust indentation:

 * bucket_nullable: optional. Controls whether the eventstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
- * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
- * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
+ * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
+  * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
docs/user/ppl/admin/connectors/prometheus_connector.md (1)

11-18: Critical: Nested list indentation still non-compliant despite prior review.

The nested list items at lines 11–18 are indented with 4 or 8 spaces when they should follow the 2-space increment standard. This issue was previously flagged and marked as addressed in commit f9b0c04, but the violations persist. This blocks the file from passing Markdown linting.

 * `prometheus.uri` [Required].  
-    * This parameters provides the URI information to connect to a prometheus instance.  
+  * This parameters provides the URI information to connect to a prometheus instance.  
 * `prometheus.auth.type` [Optional]  
-    * This parameters provides the authentication type information.  
-    * Prometheus connector currently supports `basicauth` and `awssigv4` authentication mechanisms.  
-    * If prometheus.auth.type is basicauth, following are required parameters.  
-        * `prometheus.auth.username` and `prometheus.auth.password`.  
-    * If prometheus.auth.type is awssigv4, following are required parameters.  
-        * `prometheus.auth.region`, `prometheus.auth.access_key` and `prometheus.auth.secret_key`
+  * This parameters provides the authentication type information.  
+  * Prometheus connector currently supports `basicauth` and `awssigv4` authentication mechanisms.  
+  * If prometheus.auth.type is basicauth, following are required parameters.  
+    * `prometheus.auth.username` and `prometheus.auth.password`.  
+  * If prometheus.auth.type is awssigv4, following are required parameters.  
+    * `prometheus.auth.region`, `prometheus.auth.access_key` and `prometheus.auth.secret_key`
docs/user/ppl/admin/connectors/s3glue_connector.md (1)

19-30: Critical: Nested list indentation still non-compliant despite prior review.

The nested list items at lines 19-30 are indented inconsistently (4 or 8 spaces) when they should follow the 2-space increment standard. This issue was previously flagged and marked as addressed in commit f9b0c04, but the violations persist in the current code. This blocks the file from passing Markdown linting.

Apply this diff to fix the list indentation:

 * `glue.auth.type` [Required]  
-    * This parameters provides the authentication type information required for execution engine to connect to glue.  
-    * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
-        * `glue.auth.role_arn`  
+  * This parameters provides the authentication type information required for execution engine to connect to glue.  
+  * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
+    * `glue.auth.role_arn`  
 * `glue.indexstore.opensearch.*` [Required]  
-    * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
-    * `glue.indexstore.opensearch.uri` [Required]  
-    * `glue.indexstore.opensearch.auth` [Required]  
-        * Accepted values include ["noauth", "basicauth", "awssigv4"]  
-        * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
-        * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region`  and `glue.auth.role_arn`  
-    * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]
+  * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
+  * `glue.indexstore.opensearch.uri` [Required]  
+  * `glue.indexstore.opensearch.auth` [Required]  
+    * Accepted values include ["noauth", "basicauth", "awssigv4"]  
+    * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
+    * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region` and `glue.auth.role_arn`  
+  * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]
🟡 Minor comments (21)
docs/user/ppl/cmd/lookup.md-20-28 (1)

20-28: Add language specifier to the fenced code block.

The Usage section code block is missing a language identifier. This should be labeled as bash or shell for consistency with the other code blocks in the document and to enable proper syntax highlighting.

Apply this diff to fix the issue:

-```
+```bash
 source = table1 | lookup table2 id
 source = table1 | lookup table2 id, name
 source = table1 | lookup table2 id as cid, name
docs/user/ppl/admin/connectors/security_lake_connector.md-39-39 (1)

39-39: Use correct code block language for JSON configuration.

The code block contains JSON configuration, not bash/shell script. Change the language identifier to ```json for proper syntax highlighting.

-```bash
+```json
docs/user/ppl/admin/connectors/security_lake_connector.md-17-17 (1)

17-17: Fix British English phrase to match US English convention.

Change "in future" to "in the future" to align with the documentation's English style.

-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.
docs/user/ppl/cmd/timechart.md-173-180 (1)

173-180: Fix description mismatch in Example 5.

Line 175 states the example "counts events for each second," but the query on line 179 uses span=1h, which groups by hour. Update the description to match the query span.

-This example counts events for each second and groups them by category
+This example counts events for each hour and groups them by category
docs/user/ppl/admin/cross_cluster_search.md-27-27 (1)

27-27: Complete the sentence fragment with proper punctuation.

Line 27 is an incomplete fragment. Add a colon or restructure for grammatical completeness.

-Example PPL query
+Example PPL query:
docs/user/ppl/admin/cross_cluster_search.md-26-26 (1)

26-26: Use backticks for inline code representation instead of escaped angle brackets.

Line 26 renders escaped angle brackets literally, which appears unintended. Format the cluster and index name placeholder as inline code using backticks.

-Perform cross-cluster search by using "\<cluster-name\>:\<index-name\>" as the index identifier.
+Perform cross-cluster search by using `` `<cluster-name>:<index-name>` `` as the index identifier.
docs/user/ppl/cmd/search.md-545-547 (1)

545-547: Add language specifier to fenced code block.

The output block around line 546–547 lacks a language specifier. Add text to indicate plain text output:

 Expected output:
 
-```
+```text
 fetched rows / total rows = 1/1
docs/user/ppl/cmd/search.md-666-666 (1)

666-666: Convert emphasized section titles to proper Markdown headings.

Lines 666 and 705 use bold emphasis for section titles instead of proper Markdown headings. Convert these to ### headings for consistency with document structure and accessibility:

-**Backslash in file paths**
+### Backslash in file paths
-**Text with special characters**
+### Text with special characters

Also applies to: 705-705

docs/user/ppl/cmd/search.md-89-89 (1)

89-89: Use hyphen in compound adjective "multi-field".

Line 89 should use "multi-field" (with hyphen) as a compound adjective modifying the noun:

-* Limitations: No wildcards for partial IP matching. For wildcard search use multi field with keyword: `search ip_address.keyword='1*' source=logs` or WHERE clause: `source=logs | where cast(ip_address as string) like '1%'`  
+* Limitations: No wildcards for partial IP matching. For wildcard search use a multi-field keyword: `search ip_address.keyword='1*' source=logs` or WHERE clause: `source=logs | where cast(ip_address as string) like '1%'`  
docs/user/ppl/cmd/streamstats.md-77-90 (1)

77-90: Add language identifier to usage examples code block.

The code block at line 77 is missing the ppl language identifier. Update the opening fence for proper syntax highlighting:

-```
+```ppl
 source = table | streamstats avg(a)
 source = table | streamstats current = false avg(a)
 source = table | streamstats window = 5 sum(b)
 source = table | streamstats current = false window = 2 max(a)
 source = table | where a < 50 | streamstats count(c)
 source = table | streamstats min(c), max(c) by b
 source = table | streamstats count(c) as count_by by b | where count_by > 1000
 source = table | streamstats dc(field) as distinct_count
 source = table | streamstats distinct_count(category) by region
 source = table | streamstats current=false window=2 global=false avg(a) by b
 source = table | streamstats window=2 reset_before=a>31 avg(b)
 source = table | streamstats current=false reset_after=a>31 avg(b) by c
 ```
docs/user/ppl/cmd/multisearch.md-30-34 (1)

30-34: Add language identifier to code block for syntax highlighting.

The code block at lines 30–34 lacks a language specification. Add ppl to the opening fence for proper PPL syntax highlighting, consistent with other examples in the document:

-```
+```ppl
 | multisearch [search source=table | where condition1] [search source=table | where condition2]
 | multisearch [search source=index1 | fields field1, field2] [search source=index2 | fields field1, field2]
 | multisearch [search source=table | where status="success"] [search source=table | where status="error"]
 ```
docs/user/ppl/cmd/eventstats.md-64-71 (1)

64-71: Fix code block language identifier for PPL syntax highlighting.

The code block at line 64 is labeled sql ignore, but these are PPL examples. Replace the label with ppl for proper syntax highlighting.

-```sql ignore
+```ppl
 source = table | eventstats avg(a)
 source = table | where a < 50 | eventstats count(c)
 source = table | eventstats min(c), max(c) by b
 source = table | eventstats count(c) as count_by by b | where count_by > 1000
 source = table | eventstats dc(field) as distinct_count
 source = table | eventstats distinct_count(category) by region
 ```
docs/user/ppl/cmd/streamstats.md-36-37 (1)

36-37: Fix nested list indentation to align with Markdown standards.

Lines 36–37 show inconsistent indentation for nested list items. Markdown requires 2 spaces to properly nest items under a parent bullet. Adjust indentation:

 * bucket_nullable: optional. Controls whether the streamstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
- * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
- * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
+ * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
+  * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  

Committable suggestion skipped: line range outside the PR's diff.

docs/user/ppl/cmd/join.md-82-92 (1)

82-92: Specify code block language for extended syntax examples.

Fenced code blocks should have a language identifier. Lines 82–91 contain extended join syntax examples without a language tag.

-```
+```text
 source = table1 | join type=outer left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
 source = table1 | join type=left left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
docs/user/ppl/cmd/join.md-63-78 (1)

63-78: Specify code block language for syntax examples.

Fenced code blocks should have a language identifier. Lines 63–78 contain join syntax examples without a language tag.

-```
+```text
 source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
 source = table1 | inner join left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
docs/user/ppl/admin/connectors/s3glue_connector.md-77-77 (1)

77-77: Wrap bare URL in Markdown link syntax.

Line 77 contains a bare URL that should be wrapped in Markdown link format for proper rendering.

-Documentation for Index Queries: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md
+Documentation for Index Queries: [opensearch-spark index documentation](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md)
docs/user/ppl/admin/connectors/prometheus_connector.md-229-230 (1)

229-230: Fix nested list indentation at lines 229–230.

These list items are indented with 4 spaces when they should use 2 spaces per the Markdown standard.

 * Prometheus connector offers `query_range` table function. This table function can be used to query metrics in a specific time range using promQL.  
 * The function takes inputs similar to parameters mentioned for query range api mentioned here: [Prometheus query_range API](https://prometheus.io/docs/prometheus/latest/querying/api/)  
-* Arguments should be either passed by name or positionArguments should be either passed by name or position.  
-    - `source=my_prometheus.query_range('prometheus_http_requests_total', 1686694425, 1686700130, 14)`  
-    - `source=my_prometheus.query_range(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130, step=14)`
+* Arguments should be either passed by name or position. 
+  - `source=my_prometheus.query_range('prometheus_http_requests_total', 1686694425, 1686700130, 14)`  
+  - `source=my_prometheus.query_range(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130, step=14)`

Committable suggestion skipped: line range outside the PR's diff.

docs/user/ppl/admin/connectors/prometheus_connector.md-260-261 (1)

260-261: Fix nested list indentation at lines 260–261.

These list items are indented with 4 spaces when they should use 2 spaces per the Markdown standard. Also note: line 259 appears to have a duplicated phrase "Arguments should be either passed by name or position" that should be deduplicated.

 * This table function can be used to fetch exemplars of a query in a specific time range.  
 * The function takes inputs similar to parameters mentioned for query exemplars api mentioned here: [Prometheus query_exemplars API](https://prometheus.io/docs/prometheus/latest/querying/api/)  
-* Arguments should be either passed by name or positionArguments should be either passed by name or position.  
-    - `source=my_prometheus.query_exemplars('prometheus_http_requests_total', 1686694425, 1686700130)`  
-    - `source=my_prometheus.query_exemplars(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130)`
+* Arguments should be either passed by name or position.  
+  - `source=my_prometheus.query_exemplars('prometheus_http_requests_total', 1686694425, 1686700130)`  
+  - `source=my_prometheus.query_exemplars(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130)`

Committable suggestion skipped: line range outside the PR's diff.

docs/user/ppl/admin/datasources.md-38-38 (1)

38-38: Fix grammar issues throughout the file.

Apply these diffs:

-* In case of security disabled domains, authorization is disbaled.  
+* In case of security-disabled domains, authorization is disabled.
-Datasource configuration can be managed using below REST APIs. All the examples below are for OpenSearch domains enabled with secure domain.
-we can remove authorization and other details in case of security disabled domains.
+Datasource configuration can be managed using the REST APIs below. All examples assume OpenSearch domains with security enabled.
+You can remove authorization and other details in case of security-disabled domains.
-* Sample python script to generate a 24 character master key  
+* Sample Python script to generate a 24-character master key
-* In versions prior to 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.  
+* Before version 2.7, the `plugins.query.federation.datasources.config` keystore setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.
-In the current state, `information_schema` only support metadata of tables.
-This schema will be extended for views, columns and other metadata info in future.
+Currently, `information_schema` only supports metadata of tables.
+This schema will be extended with views, columns, and other metadata in the future.

Also applies to: 48-48, 151-151, 205-205, 229-229
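As an aside on the 24-character master key referenced at line 151 of datasources.md: a throwaway way to generate one is sketched below. This is not the sample script from the docs, just an illustration that assumes an alphanumeric key is acceptable for the masterkey setting.

```python
import secrets
import string

# Sketch: generate a random 24-character alphanumeric key. Whether other
# characters are allowed by the masterkey setting is not verified here.
alphabet = string.ascii_letters + string.digits
print("".join(secrets.choice(alphabet) for _ in range(24)))
```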

docs/user/ppl/admin/datasources.md-93-105 (1)

93-105: Move explanatory text outside code block (lines 103-105).

Line 103 contains bold formatted text that should be outside the code block for proper rendering and clarity.

Apply this diff:

 PATCH https://localhost:9200/_plugins/_query/_datasources
 content-type: application/json
 Authorization: Basic {{username}} {{password}}
 
 {
     "name" : "my_prometheus",
     "allowedRoles" : ["all_access"]
 }
 
-**Name is required and must exist. Connector cannot be modified and will be ignored.**
-

+Name is required and must exist. Connector cannot be modified and will be ignored.


docs/user/ppl/admin/datasources.md-109-116 (1)

109-116: Move explanatory text outside code block (lines 114-116).

Line 114 contains bold formatted text inside the code block that should be moved outside for proper rendering.

Apply this diff:

 GET https://localhost:9200/_plugins/_query/_datasources/my_prometheus
 content-type: application/json
 Authorization: Basic {{username}} {{password}}
 
-**Authentication Information won't be vended out in GET API's response.**
-

+Authentication Information won't be vended out in GET API's response.


🧹 Nitpick comments (11)
docs/user/ppl/cmd/rename.md (2)

9-10: Address repetitive phrasing for better readability.

Lines 9-10 both use "you want," which reduces writing variety. Consider rephrasing one of these lines to improve clarity and flow.


- * source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns using `*`.  
- * target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source.  
+ * source-field: mandatory. The field name to rename. Supports wildcard patterns using `*`.  
+ * target-field: mandatory. The new field name. Must have the same number of wildcards as the source.  

8-10: Remove unnecessary backslash escaping in Markdown.

The syntax section uses backslash escaping (\<, \>) which is not needed in Markdown; these are leftover artifacts from RST conversion. Markdown renders these correctly without escaping.

-rename \<source-field\> AS \<target-field\>["," \<source-field\> AS \<target-field\>]...
-* source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns using `*`.  
-* target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source.  
+rename <source-field> AS <target-field>["," <source-field> AS <target-field>]...
+* source-field: mandatory. The field name to rename. Supports wildcard patterns using `*`.  
+* target-field: mandatory. The new field name. Must have the same number of wildcards as the source.  
docs/user/ppl/admin/monitoring.md (1)

5-5: Polish grammar and style for consistency.

The static analysis tool flagged three minor improvements in this sentence:

  • "able to" → "can" (more concise)
  • "node level" → "node-level" (compound adjective needs hyphenation)
  • "Cluster level" → "Cluster-level" (compound adjective needs hyphenation)
-By a stats endpoint, you are able to collect metrics for the plugin within the interval. Note that only node level statistics collecting is implemented for now. In other words, you only get the metrics for the node you're accessing. Cluster level statistics have yet to be implemented.
+By a stats endpoint, you can collect metrics for the plugin within the interval. Note that only node-level statistics collecting is implemented for now. In other words, you only get the metrics for the node you're accessing. Cluster-level statistics have yet to be implemented.
docs/user/ppl/admin/connectors/security_lake_connector.md (1)

18-18: Use proper markdown headings for section labels.

Lines 18 and 56 use plain text labels followed by periods instead of markdown headings, breaking the document hierarchy. Upgrade these to proper headings using ##.

-Glue Connector Properties.
+## Glue Connector Properties
-Sample Security Lake datasource queries APIS  
+## Sample Security Lake Datasource Queries

Also applies to: 56-56

docs/user/ppl/cmd/rex.md (1)

188-195: Minor: Consider rewording the label before Example 8's invalid query. [optional]

Line 189 reads "Invalid PPL query with underscores" as a standalone sentence fragment. While clear, it would align better with the pattern used in Example 9 if incorporated into the preceding explanation paragraph:

  This example demonstrates naming restrictions for capture groups. Group names cannot contain underscores due to Java regex limitations.
- Invalid PPL query with underscores
  
  ```ppl

Becomes:

  This example demonstrates naming restrictions for capture groups. Group names cannot contain underscores due to Java regex limitations. The following invalid PPL query illustrates this restriction:
  
  ```ppl

This is a minor readability suggestion; current formatting is acceptable.

docs/user/ppl/admin/cross_cluster_search.md (1)

54-54: Add blank line before heading for proper Markdown spacing.

Headings in Markdown should be preceded by a blank line for consistent formatting and proper parsing.

 the local cluster keep a field mapping system index with the same index name.
 This can be done by creating an index on the local cluster with the same name and schema as the remote cluster index.
+
 ## Authentication and Permission
docs/user/ppl/admin/settings.md (2)

5-5: Minor: Consider "can change" instead of "are able to change".

The phrase "are able to change" could be more concise as "can change." This is a stylistic refinement.

-When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings are able to change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.
+When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings can change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.

209-209: Minor: Hyphenate "performance-sensitive" when used as compound adjective.

The phrase "performance sensitive" should be hyphenated as "performance-sensitive" when modifying the noun "join types."

-Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
+Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance-sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
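For context, enabling the flag mentioned in that line follows the same settings endpoint used in the subquery.md example earlier in this review. A minimal sketch, assuming a local cluster on localhost:9200 without security:

```python
import json
import urllib.request

# Sketch only: PUT the cluster setting discussed above. The endpoint matches
# the _plugins/_query/settings path used elsewhere in these docs; the host and
# lack of auth are assumptions about a local dev cluster.
body = json.dumps(
    {"persistent": {"plugins.calcite.all_join_types.allowed": True}}
).encode()
req = urllib.request.Request(
    "http://localhost:9200/_plugins/_query/settings",
    data=body,
    method="PUT",
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
```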
docs/dev/testing-doctest.md (1)

61-61: Minor: Capitalize "Markdown" as a proper noun.

The word "markdown" should be capitalized as "Markdown" when referring to the markup language.

-#### RST Format (SQL docs only. On Deprecation path. Use markdown for PPL)
+#### RST Format (SQL docs only. On Deprecation path. Use Markdown for PPL)
docs/user/ppl/admin/connectors/prometheus_connector.md (1)

101-102: Minor: Use hyphens for compound adjectives and standard time abbreviations.

Line 101: "endtime" should have a space: "end time". Line 102: "auto determined" should be hyphenated as "auto-determined" when modifying "time range."

-* Time range is determined through filter clause on `@timestamp`. If there is no such filter clause, time range will be set to 1h with endtime set to now().  
-* In case of stats, resolution is determined by `span(@timestamp,15s)` expression. For normal select queries, resolution is auto determined from the time range set.
+* Time range is determined through filter clause on `@timestamp`. If there is no such filter clause, time range will be set to 1h with end time set to now().  
+* In case of stats, resolution is determined by `span(@timestamp,15s)` expression. For normal select queries, resolution is auto-determined from the time range set.
docs/user/ppl/admin/datasources.md (1)

39-42: Minor: Fix regex pattern formatting and verify list structure.

Line 39 has unusual bracket notation in the regex pattern: regex[[@*A-Za-z]+?[a-zA-Z_-0-9]]. The outer square brackets should be removed so the pattern reads as a proper inline code example. Additionally, lines 40-42 could be clearer: "Allowed Connectors" should likely flow as a single list item rather than introducing a sublist.

Apply this diff:

-* All the datasource names should be unique and match the following regex[`[@*A-Za-z]+?[*a-zA-Z_\-0-9]*`].  
+* All the datasource names should be unique and match the following regex: `[@*A-Za-z]+?[*a-zA-Z_\-0-9]*`.  
-* Allowed Connectors.  
-  * `prometheus` [More details: [Prometheus Connector](connectors/prometheus_connector.md)]  
+* Allowed Connectors: `prometheus` [More details: [Prometheus Connector](connectors/prometheus_connector.md)]  
 * All the allowed config parameters in `properties` are defined in individual connector pages mentioned above.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5bb58 and c9d6355.

📒 Files selected for processing (109)
  • DEVELOPER_GUIDE.rst (1 hunks)
  • docs/category.json (2 hunks)
  • docs/dev/ppl-commands.md (1 hunks)
  • docs/dev/testing-doctest.md (1 hunks)
  • docs/user/ppl/admin/connectors/prometheus_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/prometheus_connector.rst (0 hunks)
  • docs/user/ppl/admin/connectors/s3glue_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/s3glue_connector.rst (0 hunks)
  • docs/user/ppl/admin/connectors/security_lake_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/security_lake_connector.rst (0 hunks)
  • docs/user/ppl/admin/cross_cluster_search.md (1 hunks)
  • docs/user/ppl/admin/cross_cluster_search.rst (0 hunks)
  • docs/user/ppl/admin/datasources.md (1 hunks)
  • docs/user/ppl/admin/datasources.rst (0 hunks)
  • docs/user/ppl/admin/monitoring.md (1 hunks)
  • docs/user/ppl/admin/monitoring.rst (0 hunks)
  • docs/user/ppl/admin/security.md (1 hunks)
  • docs/user/ppl/admin/security.rst (0 hunks)
  • docs/user/ppl/admin/settings.md (1 hunks)
  • docs/user/ppl/admin/settings.rst (0 hunks)
  • docs/user/ppl/cmd/ad.md (1 hunks)
  • docs/user/ppl/cmd/ad.rst (0 hunks)
  • docs/user/ppl/cmd/append.md (1 hunks)
  • docs/user/ppl/cmd/append.rst (0 hunks)
  • docs/user/ppl/cmd/appendcol.md (1 hunks)
  • docs/user/ppl/cmd/appendcol.rst (0 hunks)
  • docs/user/ppl/cmd/appendpipe.md (1 hunks)
  • docs/user/ppl/cmd/appendpipe.rst (0 hunks)
  • docs/user/ppl/cmd/bin.md (1 hunks)
  • docs/user/ppl/cmd/bin.rst (0 hunks)
  • docs/user/ppl/cmd/chart.md (1 hunks)
  • docs/user/ppl/cmd/chart.rst (0 hunks)
  • docs/user/ppl/cmd/dedup.md (1 hunks)
  • docs/user/ppl/cmd/dedup.rst (0 hunks)
  • docs/user/ppl/cmd/describe.md (1 hunks)
  • docs/user/ppl/cmd/describe.rst (0 hunks)
  • docs/user/ppl/cmd/eval.md (1 hunks)
  • docs/user/ppl/cmd/eval.rst (0 hunks)
  • docs/user/ppl/cmd/eventstats.md (1 hunks)
  • docs/user/ppl/cmd/eventstats.rst (0 hunks)
  • docs/user/ppl/cmd/expand.md (1 hunks)
  • docs/user/ppl/cmd/expand.rst (0 hunks)
  • docs/user/ppl/cmd/explain.md (1 hunks)
  • docs/user/ppl/cmd/explain.rst (0 hunks)
  • docs/user/ppl/cmd/fields.md (1 hunks)
  • docs/user/ppl/cmd/fields.rst (0 hunks)
  • docs/user/ppl/cmd/fillnull.md (1 hunks)
  • docs/user/ppl/cmd/fillnull.rst (0 hunks)
  • docs/user/ppl/cmd/flatten.md (1 hunks)
  • docs/user/ppl/cmd/flatten.rst (0 hunks)
  • docs/user/ppl/cmd/grok.md (1 hunks)
  • docs/user/ppl/cmd/grok.rst (0 hunks)
  • docs/user/ppl/cmd/head.md (1 hunks)
  • docs/user/ppl/cmd/head.rst (0 hunks)
  • docs/user/ppl/cmd/join.md (1 hunks)
  • docs/user/ppl/cmd/join.rst (0 hunks)
  • docs/user/ppl/cmd/kmeans.md (1 hunks)
  • docs/user/ppl/cmd/kmeans.rst (0 hunks)
  • docs/user/ppl/cmd/lookup.md (1 hunks)
  • docs/user/ppl/cmd/lookup.rst (0 hunks)
  • docs/user/ppl/cmd/ml.md (1 hunks)
  • docs/user/ppl/cmd/ml.rst (0 hunks)
  • docs/user/ppl/cmd/multisearch.md (1 hunks)
  • docs/user/ppl/cmd/multisearch.rst (0 hunks)
  • docs/user/ppl/cmd/parse.md (1 hunks)
  • docs/user/ppl/cmd/parse.rst (0 hunks)
  • docs/user/ppl/cmd/patterns.md (1 hunks)
  • docs/user/ppl/cmd/patterns.rst (0 hunks)
  • docs/user/ppl/cmd/rare.md (1 hunks)
  • docs/user/ppl/cmd/rare.rst (0 hunks)
  • docs/user/ppl/cmd/regex.md (1 hunks)
  • docs/user/ppl/cmd/regex.rst (0 hunks)
  • docs/user/ppl/cmd/rename.md (1 hunks)
  • docs/user/ppl/cmd/rename.rst (0 hunks)
  • docs/user/ppl/cmd/replace.md (1 hunks)
  • docs/user/ppl/cmd/replace.rst (0 hunks)
  • docs/user/ppl/cmd/reverse.md (1 hunks)
  • docs/user/ppl/cmd/reverse.rst (0 hunks)
  • docs/user/ppl/cmd/rex.md (1 hunks)
  • docs/user/ppl/cmd/rex.rst (0 hunks)
  • docs/user/ppl/cmd/search.md (1 hunks)
  • docs/user/ppl/cmd/search.rst (0 hunks)
  • docs/user/ppl/cmd/showdatasources.md (1 hunks)
  • docs/user/ppl/cmd/showdatasources.rst (0 hunks)
  • docs/user/ppl/cmd/sort.md (1 hunks)
  • docs/user/ppl/cmd/sort.rst (0 hunks)
  • docs/user/ppl/cmd/spath.md (1 hunks)
  • docs/user/ppl/cmd/spath.rst (0 hunks)
  • docs/user/ppl/cmd/stats.md (1 hunks)
  • docs/user/ppl/cmd/stats.rst (0 hunks)
  • docs/user/ppl/cmd/streamstats.md (1 hunks)
  • docs/user/ppl/cmd/streamstats.rst (0 hunks)
  • docs/user/ppl/cmd/subquery.md (1 hunks)
  • docs/user/ppl/cmd/subquery.rst (0 hunks)
  • docs/user/ppl/cmd/syntax.md (1 hunks)
  • docs/user/ppl/cmd/syntax.rst (0 hunks)
  • docs/user/ppl/cmd/table.md (1 hunks)
  • docs/user/ppl/cmd/table.rst (0 hunks)
  • docs/user/ppl/cmd/timechart.md (1 hunks)
  • docs/user/ppl/cmd/timechart.rst (0 hunks)
  • docs/user/ppl/cmd/top.md (1 hunks)
  • docs/user/ppl/cmd/top.rst (0 hunks)
  • docs/user/ppl/cmd/trendline.md (1 hunks)
  • docs/user/ppl/cmd/trendline.rst (0 hunks)
  • docs/user/ppl/cmd/where.md (1 hunks)
  • docs/user/ppl/cmd/where.rst (0 hunks)
  • docs/user/ppl/functions/aggregations.md (1 hunks)
  • docs/user/ppl/functions/aggregations.rst (0 hunks)
  • docs/user/ppl/functions/collection.md (1 hunks)
💤 Files with no reviewable changes (52)
  • docs/user/ppl/cmd/dedup.rst
  • docs/user/ppl/cmd/replace.rst
  • docs/user/ppl/cmd/stats.rst
  • docs/user/ppl/cmd/multisearch.rst
  • docs/user/ppl/cmd/top.rst
  • docs/user/ppl/cmd/bin.rst
  • docs/user/ppl/cmd/timechart.rst
  • docs/user/ppl/cmd/subquery.rst
  • docs/user/ppl/admin/cross_cluster_search.rst
  • docs/user/ppl/cmd/table.rst
  • docs/user/ppl/admin/connectors/prometheus_connector.rst
  • docs/user/ppl/admin/connectors/s3glue_connector.rst
  • docs/user/ppl/cmd/fields.rst
  • docs/user/ppl/functions/aggregations.rst
  • docs/user/ppl/cmd/where.rst
  • docs/user/ppl/cmd/chart.rst
  • docs/user/ppl/admin/security.rst
  • docs/user/ppl/cmd/eval.rst
  • docs/user/ppl/cmd/explain.rst
  • docs/user/ppl/admin/monitoring.rst
  • docs/user/ppl/cmd/fillnull.rst
  • docs/user/ppl/cmd/rare.rst
  • docs/user/ppl/cmd/appendcol.rst
  • docs/user/ppl/cmd/appendpipe.rst
  • docs/user/ppl/cmd/describe.rst
  • docs/user/ppl/cmd/flatten.rst
  • docs/user/ppl/cmd/parse.rst
  • docs/user/ppl/cmd/sort.rst
  • docs/user/ppl/cmd/ml.rst
  • docs/user/ppl/admin/connectors/security_lake_connector.rst
  • docs/user/ppl/cmd/streamstats.rst
  • docs/user/ppl/cmd/search.rst
  • docs/user/ppl/cmd/head.rst
  • docs/user/ppl/cmd/join.rst
  • docs/user/ppl/cmd/append.rst
  • docs/user/ppl/cmd/kmeans.rst
  • docs/user/ppl/cmd/rex.rst
  • docs/user/ppl/cmd/spath.rst
  • docs/user/ppl/cmd/trendline.rst
  • docs/user/ppl/cmd/reverse.rst
  • docs/user/ppl/admin/datasources.rst
  • docs/user/ppl/cmd/lookup.rst
  • docs/user/ppl/cmd/grok.rst
  • docs/user/ppl/cmd/expand.rst
  • docs/user/ppl/cmd/regex.rst
  • docs/user/ppl/cmd/patterns.rst
  • docs/user/ppl/cmd/syntax.rst
  • docs/user/ppl/cmd/eventstats.rst
  • docs/user/ppl/cmd/ad.rst
  • docs/user/ppl/admin/settings.rst
  • docs/user/ppl/cmd/showdatasources.rst
  • docs/user/ppl/cmd/rename.rst
✅ Files skipped from review due to trivial changes (4)
  • docs/user/ppl/cmd/eval.md
  • docs/user/ppl/cmd/ad.md
  • docs/user/ppl/cmd/top.md
  • docs/user/ppl/cmd/syntax.md
🚧 Files skipped from review as they are similar to previous changes (23)
  • docs/user/ppl/cmd/trendline.md
  • docs/user/ppl/cmd/appendcol.md
  • docs/user/ppl/cmd/sort.md
  • docs/user/ppl/cmd/head.md
  • docs/user/ppl/cmd/append.md
  • docs/user/ppl/cmd/dedup.md
  • docs/user/ppl/cmd/stats.md
  • docs/user/ppl/cmd/table.md
  • docs/user/ppl/cmd/reverse.md
  • docs/user/ppl/cmd/rare.md
  • docs/user/ppl/functions/aggregations.md
  • docs/user/ppl/cmd/showdatasources.md
  • docs/user/ppl/cmd/grok.md
  • docs/user/ppl/cmd/appendpipe.md
  • DEVELOPER_GUIDE.rst
  • docs/user/ppl/cmd/expand.md
  • docs/user/ppl/cmd/fields.md
  • docs/user/ppl/cmd/flatten.md
  • docs/user/ppl/cmd/fillnull.md
  • docs/user/ppl/cmd/ml.md
  • docs/user/ppl/cmd/regex.md
  • docs/user/ppl/cmd/kmeans.md
  • docs/user/ppl/cmd/bin.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/dev/ppl-commands.md
  • docs/user/ppl/cmd/multisearch.md
  • docs/user/ppl/cmd/parse.md
🪛 LanguageTool
docs/user/ppl/admin/monitoring.md

[style] ~5-~5: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...ntroduction By a stats endpoint, you are able to collect metrics for the plugin within t...

(BE_ABLE_TO)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...within the interval. Note that only node level statistics collecting is implement...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...s for the node you're accessing. Cluster level statistics have yet to be implemen...

(QB_NEW_EN_HYPHEN)

docs/dev/testing-doctest.md

[uncategorized] ~61-~61: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...SQL docs only. On Deprecation path. Use markdown for PPL) 1. If you want to add a new do...

(MARKDOWN_NNP)

docs/user/ppl/admin/connectors/prometheus_connector.md

[grammar] ~101-~101: Ensure spelling is correct
Context: ...ause, time range will be set to 1h with endtime set to now(). * In case of stats, res...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~102-~102: Use a hyphen to join words.
Context: ...ormal select queries, resolution is auto determined from the time range set. ...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/functions/collection.md

[grammar] ~76-~76: Ensure spelling is correct
Context: ...------| | 3 | +--------+ ``` ## FORALL ### Description Usage: `forall(array, fun...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/user/ppl/admin/connectors/security_lake_connector.md

[locale-violation] ~17-~17: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

docs/user/ppl/admin/connectors/s3glue_connector.md

[locale-violation] ~16-~16: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

docs/user/ppl/admin/datasources.md

[grammar] ~38-~38: Use a hyphen to join words.
Context: ... secure domains. * In case of security disabled domains, authorization is disba...

(QB_NEW_EN_HYPHEN)


[grammar] ~48-~48: Use a hyphen to join words.
Context: ...on and other details in case of security disabled domains. * Datasource Creation ...

(QB_NEW_EN_HYPHEN)


[grammar] ~151-~151: Use a hyphen to join words.
Context: ... * Sample python script to generate a 24 character master key ```bash import...

(QB_NEW_EN_HYPHEN)


[style] ~205-~205: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...tasource configuration * In versions prior to 2.7, the plugins.query.federation.datas...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[locale-violation] ~229-~229: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ... views, columns and other metadata info in future. ### Syntax source = datasource.info...

(IN_FUTURE)

docs/user/ppl/admin/settings.md

[style] ~5-~5: As a shorter alternative for ‘able to’, consider using “can”.
Context: ... cluster settings. Most of the settings are able to change dynamically so you can control t...

(BE_ABLE_TO)


[grammar] ~209-~209: Use a hyphen to join words.
Context: ...right, full, cross are performance sensitive join types which are disabled ...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/chart.md

[grammar] ~19-~19: Ensure spelling is correct
Context: ...are more categories than the limit. * usenull: optional. Controls whether to group eve...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/user/ppl/cmd/patterns.md

[grammar] ~26-~26: Use a hyphen to join words.
Context: ...e lower bound of frequency to ignore low frequency words. Default: 0.3. ...

(QB_NEW_EN_HYPHEN)


[grammar] ~95-~95: Use a hyphen to join words.
Context: ...patterns from a raw log field using user defined patterns. ```ppl source=apach...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/rename.md

[style] ~10-~10: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...* target-field: mandatory. The name you want to rename to. Must have same number of wil...

(REP_WANT_TO_VB)

docs/user/ppl/cmd/replace.md

[style] ~10-~10: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... * replacement: mandatory. The text you want to replace with. * field-name: mandatory...

(REP_WANT_TO_VB)


[grammar] ~111-~111: Use a hyphen to join words.
Context: ...se LIKE command with replace for pattern matching needs. ```ppl source=account...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/search.md

[uncategorized] ~24-~24: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...rs), ? (exactly one character) Full Text Search: Unlike other PPL commands, se...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~89-~89: Use a hyphen to join words.
Context: ... matching. For wildcard search use multi field with keyword: `search ip_address.k...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/spath.md

[grammar] ~18-~18: Ensure spelling is correct
Context: ...Simple Field Extraction The simplest spath is to extract a single field. This exam...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/user/ppl/cmd/timechart.md

[grammar] ~31-~31: Ensure spelling is correct
Context: ...ore distinct values than the limit. * usenull: optional. Controls whether NULL values ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/streamstats.md

36-36: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


36-36: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


37-37: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


37-37: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


153-153: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/user/ppl/admin/connectors/prometheus_connector.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


16-16: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


17-17: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


18-18: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


229-229: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


230-230: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


260-260: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


261-261: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

docs/user/ppl/cmd/describe.md

8-8: Reference links and images should use a label that is defined
Missing link or image reference definition: "schema."

(MD052, reference-links-images)

docs/user/ppl/admin/connectors/s3glue_connector.md

20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


24-24: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


28-28: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


29-29: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


30-30: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


77-77: Bare URL used

(MD034, no-bare-urls)

docs/user/ppl/cmd/eventstats.md

28-28: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


28-28: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


29-29: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


29-29: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

docs/user/ppl/cmd/join.md

63-63: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


82-82: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/lookup.md

20-20: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/multisearch.md

30-30: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/parse.md

95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


103-103: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


111-111: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


119-119: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


127-127: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/search.md

92-92: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


93-93: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


547-547: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


666-666: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


705-705: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/user/ppl/cmd/subquery.md

77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


92-92: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


110-110: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


196-196: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: WhiteSource Security Check

@kylehounslow kylehounslow force-pushed the feat/markdown-doctests branch from 75d9cf4 to 95867e4 Compare December 9, 2025 20:09
Swiddis
Swiddis previously approved these changes Dec 9, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 16

♻️ Duplicate comments (6)
docs/user/ppl/cmd/subquery.md (1)

196-196: Replace hard tab with spaces to resolve MD010 linting error.

Line 196 contains a hard tab character that violates the no-hard-tabs rule. Replace it with spaces.

   """
-	}'
+}'

docs/user/ppl/cmd/streamstats.md (1)

`148-163`: **Unresolved: Fix table formatting in Example 3 explanatory section.**

This issue was flagged in a past review comment as addressed (commit f9b0c04), but the problematic formatting persists. Lines 149–163 still use indented ASCII-art table syntax instead of proper Markdown formatting. Replace the informal "original data" label with a bold header and convert the indented code block to a fenced code block.




Apply this diff:

```diff
  This example shows how to calculate the running average of age across accounts by country, using global argument.
- original data
-    +-------+---------+------------+-------+------+-----+
-    | name  | country | state      | month | year | age |
-  
-    |-------+---------+------------+-------+------+-----+
-    | Jake  | USA     | California | 4     | 2023 | 70  |
-    | Hello | USA     | New York   | 4     | 2023 | 30  |
-    | John  | Canada  | Ontario    | 4     | 2023 | 25  |
-    | Jane  | Canada  | Quebec     | 4     | 2023 | 20  |
-    | Jim   | Canada  | B.C        | 4     | 2023 | 27  |
-    | Peter | Canada  | B.C        | 4     | 2023 | 57  |
-    | Rick  | Canada  | B.C        | 4     | 2023 | 70  |
-    | David | USA     | Washington | 4     | 2023 | 40  |
-  
-    +-------+---------+------------+-------+------+-----+
+ **Original data:**
+
+ ```text
+ | name  | country | state      | month | year | age |
+ |-------|---------|------------|-------|------|-----|
+ | Jake  | USA     | California | 4     | 2023 | 70  |
+ | Hello | USA     | New York   | 4     | 2023 | 30  |
+ | John  | Canada  | Ontario    | 4     | 2023 | 25  |
+ | Jane  | Canada  | Quebec     | 4     | 2023 | 20  |
+ | Jim   | Canada  | B.C        | 4     | 2023 | 27  |
+ | Peter | Canada  | B.C        | 4     | 2023 | 57  |
+ | Rick  | Canada  | B.C        | 4     | 2023 | 70  |
+ | David | USA     | Washington | 4     | 2023 | 40  |
+ ```
docs/user/ppl/cmd/parse.md (1)

95-97: Add language specifications to fenced code blocks in Limitations section.

The code blocks in the Limitations section lack language specifications. For consistency with examples earlier in the document (which use ppl) and to enable proper syntax highlighting, add the ppl identifier to each block.

-```
+```ppl
 source=accounts | parse address '\d+ (?<street>.+)' | parse street '\w+ (?<road>\w+)' ;
-```
+```

-```
+```ppl
 source=accounts | parse address '\d+ (?<street>.+)' | eval street='1' | where street='1' ;
-```
+```

-```
+```ppl
 source=accounts | parse address '\d+ (?<street>.+)' | eval address='1' ;
-```
+```

-```
+```ppl
 source=accounts | parse email '.+@(?<host>.+)' | stats avg(age) by host | where host=pyrami.com ;
-```
+```

-```
+```ppl
 source=accounts | parse email '.+@(?<host>.+)' | fields email, host ;
-```
+```

Also applies to: 103-105, 111-113, 119-121, 127-129

docs/user/ppl/cmd/eventstats.md (1)

28-29: Fix list indentation for nested items.

The nested list items at lines 28-29 have inconsistent indentation. Align them with proper spacing to match Markdown nested list conventions:

 * bucket_nullable: optional. Controls whether the eventstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
- * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
- * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
+  * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
+  * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`
docs/user/ppl/cmd/search.md (2)

89-93: Fix list indentation and hyphenation in IP field tips section.

Lines 91–93 are still indented as nested bullets, and “multi field” at line 89 should be hyphenated.

-**IP Fields**: Exact matching, CIDR notation
-* `search client_ip="192.168.1.0/24" source=logs`  
-* Limitations: No wildcards for partial IP matching. For wildcard search use multi field with keyword: `search ip_address.keyword='1*' source=logs` or WHERE clause: `source=logs | where cast(ip_address as string) like '1%'`  
-
-**Field Type Performance Tips**:
-   * Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
-   * For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  
+**IP Fields**: Exact matching, CIDR notation
+* `search client_ip="192.168.1.0/24" source=logs`  
+* Limitations: No wildcards for partial IP matching. For wildcard search use multi-field with keyword: `search ip_address.keyword='1*' source=logs` or WHERE clause: `source=logs | where cast(ip_address as string) like '1%'`  
+
+**Field Type Performance Tips:**
+* Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
+* For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  

545-554: Add a language to the fenced code block under “Example 8: Complex Expressions”.

The code block at line 547 still lacks a language, which trips MD040 and reduces readability.

-Expected output:
-
-```
+Expected output:
+
+```text
 fetched rows / total rows = 1/1
 +---------------------------------------------------------+
 | body                                                    |
@@ -552,4 +553,4 @@
 | Payment failed: Insufficient funds for [email protected] |
 +---------------------------------------------------------+
-```
+```
🧹 Nitpick comments (24)
docs/user/ppl/cmd/eval.md (1)

8-10: Remove unnecessary backslash escaping in Markdown syntax definition.

The syntax section uses escaped angle brackets (\<field\>, \<expression\>), which are unnecessary in Markdown and appear to be a conversion artifact from the original RST format. In Markdown, you can safely use unescaped angle brackets or alternative formatting.

Consider simplifying this to either:

  • <field> and <expression> (unescaped)
  • field and expression (plain text, often clearer in documentation)
-eval \<field\>=\<expression\> ["," \<field\>=\<expression\> ]...
-* field: mandatory. If the field name does not exist, a new field is added. If the field name already exists, it will be overridden.  
-* expression: mandatory. Any expression supported by the system.
+eval <field>=<expression> ["," <field>=<expression> ]...
+* field: mandatory. If the field name does not exist, a new field is added. If the field name already exists, it will be overridden.
+* expression: mandatory. Any expression supported by the system.
docs/user/ppl/cmd/rex.md (1)

188-189: Clarify section structure for Example 8's invalid query subsection.

Lines 188–189 introduce "Invalid PPL query with underscores" but do not format it as a subheading. This creates ambiguity about the document structure. Consider reformatting as a subheading (e.g., ### Invalid PPL query with underscores) or restructuring the example narrative for clarity.

 This example demonstrates naming restrictions for capture groups. Group names cannot contain underscores due to Java regex limitations.
-Invalid PPL query with underscores
+
+### Invalid PPL query with underscores
docs/user/ppl/cmd/rename.md (1)

8-10: Reduce repetitive phrasing in parameter descriptions.

Lines 8–10 repeat "mandatory. The name..." structure. Vary the wording for clarity:

 * source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns using `*`.  
-* target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source.
+* target-field: mandatory. The new field name. Must contain the same number of wildcards as the source field.
docs/user/ppl/cmd/replace.md (2)

9-10: Reduce repetition in parameter descriptions.

Lines 9–10 both begin with "mandatory" and use similar phrasing ("want to replace" / "replace with"). Vary the wording for better readability.

 * pattern: mandatory. The text pattern you want to replace.  
-* replacement: mandatory. The text you want to replace with.  
+* replacement: mandatory. The text to substitute for the matched pattern.  

111-111: Hyphenate compound adjective "pattern-matching".

When "pattern matching" modifies a noun ("pattern matching needs"), it should be hyphenated as a compound adjective.

-Since replace command only supports plain string literals, you can use LIKE command with replace for pattern matching needs.
+Since replace command only supports plain string literals, you can use LIKE command with replace for pattern-matching needs.
docs/user/ppl/cmd/describe.md (1)

8-8: Consider escaping brackets in syntax notation to avoid linter warnings.

The markdownlint tool flags [schema.] as a potential reference link, though in context it's clearly syntax notation for an optional parameter. To satisfy the linter without changing meaning, you could escape the brackets:

-describe [dataSource.][schema.]\<tablename\>
+describe \[dataSource.\]\[schema.\]\<tablename\>

Alternatively, this is a false positive and can be safely ignored.

docs/user/ppl/admin/monitoring.md (1)

5-5: Minor grammar improvements for compound adjectives and conciseness.

The LanguageTool static analysis flags three minor improvements:

-By a stats endpoint, you are able to collect metrics for the plugin within the interval. Note that only node level statistics collecting is implemented for now. In other words, you only get the metrics for the node you're accessing. Cluster level statistics have yet to be implemented.
+By a stats endpoint, you can collect metrics for the plugin within the interval. Note that only node-level statistics collection is implemented for now. In other words, you only get the metrics for the node you're accessing. Cluster-level statistics have yet to be implemented.

Changes: "able to" → "can", hyphenate compound adjectives ("node-level", "cluster-level"), and improve verb form ("statistics collecting" → "statistics collection").

docs/user/ppl/cmd/patterns.md (2)

22-27: Fix hyphenation in compound adjectives (line 26).

Line 26 contains a compound adjective that needs hyphenation. This issue was flagged in a past review but remains unresolved:

-    * `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low frequency words. **Default:** 0.3.
+    * `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest-word combination. This sets the lower bound of frequency to ignore low-frequency words. **Default:** 0.3.

93-100: Fix hyphenation in compound adjective at line 95.

Line 95 contains a compound adjective missing a hyphen. This issue was flagged in a past review but remains unresolved:

-This example shows how to extract patterns from a raw log field using user defined patterns.
+This example shows how to extract patterns from a raw log field using user-defined patterns.
docs/user/ppl/admin/connectors/prometheus_connector.md (1)

101-102: Fix hyphenation in compound adjectives.

Lines 101–102 use unhyphenated compound adjectives that should be hyphenated when preceding nouns.

-* Time range is determined through filter clause on `@timestamp`. If there is no such filter clause, time range will be set to 1h with endtime set to now().  
-* In case of stats, resolution is determined by `span(@timestamp,15s)` expression. For normal select queries, resolution is auto determined from the time range set.  
+* Time range is determined through filter clause on `@timestamp`. If there is no such filter clause, time range will be set to 1h with end time set to now().  
+* In case of stats, resolution is determined by `span(@timestamp,15s)` expression. For normal select queries, resolution is auto-determined from the time range set.  
docs/user/ppl/admin/settings.md (2)

5-5: Simplify phrasing for clarity.

Consider using "can" instead of "able to" for a more concise phrasing.

-When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings are able to change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.
+When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings can change dynamically so you can control the behavior of PPL plugin without needing to bounce your cluster.

209-209: Hyphenate compound adjective.

Line 209 uses "performance sensitive" which should be hyphenated when preceding a noun.

-Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance sensitive join types which are disabled by default.
+Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance-sensitive join types which are disabled by default.
docs/user/ppl/admin/connectors/s3glue_connector.md (1)

16-16: Use American English phrasing for "in the future".

Line 16 uses British English "in future"; use the American English standard "in the future".

-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.

Note: Also capitalize the first letter of the sentence.

docs/user/ppl/admin/connectors/security_lake_connector.md (1)

17-17: Use American English phrasing and capitalize sentence.

Line 17 uses British English "in future" and starts with lowercase "we".

-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.
docs/dev/testing-doctest.md (1)

61-61: Capitalize "Markdown" as a proper noun.

Line 61 should capitalize "Markdown" to refer to the format name correctly.

-#### RST Format (SQL docs only. On Deprecation path. Use markdown for PPL)
+#### RST Format (SQL docs only. On Deprecation path. Use Markdown for PPL)
docs/user/ppl/admin/datasources.md (4)

38-38: Hyphenate compound adjectives.

Lines 38 and 48 use unhyphenated compound adjectives that should be hyphenated when preceding nouns.

-* In case of security disabled domains, authorization is disbaled.
+* In case of security-disabled domains, authorization is disabled.

and

-Datasource configuration can be managed using below REST APIs. All the examples below are for OpenSearch domains enabled with secure domain.
-we can remove authorization and other details in case of security disabled domains.
+Datasource configuration can be managed using below REST APIs. All the examples below are for OpenSearch domains enabled with secure domain.
+We can remove authorization and other details in case of security-disabled domains.

Note: Also fixed capitalization of "We" on line 48 and typo "disbaled" → "disabled" on line 38.

Also applies to: 48-48


148-148: Hyphenate compound adjective.

Line 148 uses "24 character" which should be hyphenated when preceding a noun.

-* Sample python script to generate a 24 character master key  
+* Sample python script to generate a 24-character master key  

202-202: Use more concise phrasing.

Line 202 uses "prior to" which is somewhat wordy; consider using "before" instead.

-* In versions prior to 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.  
+* In versions before 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.  

226-226: Use American English phrasing.

Line 226 uses British English "in future"; use the American English standard "in the future".

-In the current state, `information_schema` only support metadata of tables.
-This schema will be extended for views, columns and other metadata info in future.
+In the current state, `information_schema` only support metadata of tables.
+This schema will be extended for views, columns and other metadata info in the future.
docs/user/ppl/cmd/search.md (1)

666-706: Use proper headings instead of bold text for escaping subsections.

“Backslash in file paths”, “Quotes within strings”, and “Text with special characters” are acting as section headings but are formatted as bold text. Converting them to ### headings improves structure and fixes MD036.

-**Backslash in file paths**
+### Backslash in file paths
@@
-**Quotes within strings**
+### Quotes within strings
@@
-**Text with special characters**
+### Text with special characters
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

198-199: SPLIT registration and empty-delimiter handling look good; double‑check regex semantics.

The new SPLIT wiring (CHARACTER, CHARACTER) and the CASE between SqlLibraryOperators.SPLIT and REGEXP_EXTRACT_ALL for empty delimiters are consistent with the rest of this table and with the existing CASE usage (e.g., NULLIF).

One thing to verify is the behavior of:

RexNode dotPattern = builder.makeLiteral(".");
RexNode splitChars =
    builder.makeCall(SqlLibraryOperators.REGEXP_EXTRACT_ALL, str, dotPattern);

Some Calcite versions/functions return matches based on a capturing group and may expect a pattern like (.) for character‑by‑character extraction; using "." without a group might not yield the intended array of individual characters depending on the exact REGEXP_EXTRACT_ALL contract.

If tests don’t already cover this, can you confirm (or add a test) that split("abc", "") really produces ["a","b","c"] with your Calcite version? If not, the change would just be:

- RexNode dotPattern = builder.makeLiteral(".");
+ RexNode dotPattern = builder.makeLiteral("(.)");

Also applies to: 993-1020
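
If it helps, here is a minimal sketch of such a regression test for the empty-delimiter question above. The `evalPplExpression(...)` helper is hypothetical (it stands in for whatever expression-evaluation harness the existing function tests use), so treat this as an outline under that assumption rather than a drop-in test:

```java
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

public class SplitEmptyDelimiterTestSketch {

  @Test
  public void splitWithEmptyDelimiterReturnsIndividualCharacters() {
    // The empty-delimiter branch (REGEXP_EXTRACT_ALL with ".") should break
    // the input into single characters.
    List<String> result = evalPplExpression("split('abc', '')");
    assertEquals(List.of("a", "b", "c"), result);
  }

  // Hypothetical helper: delegate to the project's existing expression test harness.
  private List<String> evalPplExpression(String expression) {
    throw new UnsupportedOperationException("wire this up to the real harness");
  }
}
```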

.github/workflows/sql-cli-integration-test.yml (1)

35-96: Workflow wiring looks correct; consider enabling Gradle caching.

The job correctly builds SQL into Maven Local and runs sql-cli tests against it. To speed up repeated runs, you could optionally add Gradle caching to the setup-java step:

      - name: Set up JDK ${{ matrix.java }}
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: ${{ matrix.java }}
          cache: gradle

Not required, but it will typically shave time off CI for this workflow.

api/src/main/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspiler.java (1)

19-39: Enforce non‑null dialect (and optionally plan) to make failures clearer.

UnifiedQueryTranspiler fundamentally requires a non‑null SqlDialect, but the Lombok builder doesn’t enforce this and toSql will currently fail later inside Calcite if dialect is missing. Similarly, the Javadoc says plan “must not be null” but that isn’t checked.

You could make the preconditions explicit:

  public String toSql(RelNode plan) {
+   if (dialect == null) {
+     throw new IllegalStateException("SqlDialect must be configured on UnifiedQueryTranspiler");
+   }
+   if (plan == null) {
+     throw new IllegalArgumentException("Logical plan must not be null");
+   }
    try {
      RelToSqlConverter converter = new RelToSqlConverter(dialect);
      SqlNode sqlNode = converter.visitRoot(plan).asStatement();
      return sqlNode.toSqlString(dialect).getSql();

Optionally, you could also switch to a static factory instead of exposing the raw Lombok builder if you want to guarantee required fields at construction time.
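
For reference, one possible shape for that factory. The class and method names below are assumptions (only the `dialect` field and the Lombok builder are taken from the snippet above), so adjust to match the module's conventions:

```java
import java.util.Objects;
import org.apache.calcite.sql.SqlDialect;

public final class UnifiedQueryTranspilers {

  private UnifiedQueryTranspilers() {}

  /** Builds a transpiler and fails fast if the dialect is missing. */
  public static UnifiedQueryTranspiler forDialect(SqlDialect dialect) {
    // Assumes the Lombok-generated builder exposes a dialect(...) setter.
    Objects.requireNonNull(dialect, "SqlDialect must be configured on UnifiedQueryTranspiler");
    return UnifiedQueryTranspiler.builder().dialect(dialect).build();
  }
}
```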

api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java (1)

17-58: Dialect coverage in transpiler tests is solid; consider normalizing both sides for line endings.

The tests nicely validate both the plain SparkSqlDialect and OpenSearchSparkSqlDialect (including the SAFE_CAST → TRY_CAST mapping). For true cross‑platform robustness, you might want to apply `normalize` to actualSql as well as expectedSql, so differences in line-separator handling don’t cause spurious failures on non‑Linux platforms.

-    String actualSql = transpiler.toSql(plan);
-    String expectedSql = normalize("SELECT *\nFROM `catalog`.`employees`");
-    assertEquals(
+    String actualSql = normalize(transpiler.toSql(plan));
+    String expectedSql = normalize("SELECT *\nFROM `catalog`.`employees`");
+    assertEquals(
         "Transpiled SQL using SparkSqlDialect should match expected SQL", expectedSql, actualSql);

And similarly for the custom dialect test.
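
If the current `normalize` helper does not already handle line endings, a sketch of a line-ending-aware variant (assuming the method lives in the test class; the real implementation may differ) would be:

```java
// Sketch only: collapse Windows/old-Mac line endings to '\n' and trim
// surrounding whitespace so dialect output compares consistently across platforms.
private static String normalize(String sql) {
  return sql.replace("\r\n", "\n").replace('\r', '\n').trim();
}
```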

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9d6355 and 95867e4.

📒 Files selected for processing (118)
  • .github/workflows/sql-cli-integration-test.yml (1 hunks)
  • DEVELOPER_GUIDE.rst (1 hunks)
  • api/README.md (2 hunks)
  • api/build.gradle (2 hunks)
  • api/src/main/java/org/opensearch/sql/api/EmptyDataSourceService.java (1 hunks)
  • api/src/main/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspiler.java (1 hunks)
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java (9 hunks)
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java (1 hunks)
  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2 hunks)
  • docs/category.json (2 hunks)
  • docs/dev/ppl-commands.md (1 hunks)
  • docs/dev/testing-doctest.md (1 hunks)
  • docs/user/ppl/admin/connectors/prometheus_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/prometheus_connector.rst (0 hunks)
  • docs/user/ppl/admin/connectors/s3glue_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/s3glue_connector.rst (0 hunks)
  • docs/user/ppl/admin/connectors/security_lake_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/security_lake_connector.rst (0 hunks)
  • docs/user/ppl/admin/cross_cluster_search.md (1 hunks)
  • docs/user/ppl/admin/cross_cluster_search.rst (0 hunks)
  • docs/user/ppl/admin/datasources.md (1 hunks)
  • docs/user/ppl/admin/datasources.rst (0 hunks)
  • docs/user/ppl/admin/monitoring.md (1 hunks)
  • docs/user/ppl/admin/monitoring.rst (0 hunks)
  • docs/user/ppl/admin/security.md (1 hunks)
  • docs/user/ppl/admin/security.rst (0 hunks)
  • docs/user/ppl/admin/settings.md (1 hunks)
  • docs/user/ppl/admin/settings.rst (0 hunks)
  • docs/user/ppl/cmd/ad.md (1 hunks)
  • docs/user/ppl/cmd/ad.rst (0 hunks)
  • docs/user/ppl/cmd/append.md (1 hunks)
  • docs/user/ppl/cmd/append.rst (0 hunks)
  • docs/user/ppl/cmd/appendcol.md (1 hunks)
  • docs/user/ppl/cmd/appendcol.rst (0 hunks)
  • docs/user/ppl/cmd/appendpipe.md (1 hunks)
  • docs/user/ppl/cmd/appendpipe.rst (0 hunks)
  • docs/user/ppl/cmd/bin.md (1 hunks)
  • docs/user/ppl/cmd/bin.rst (0 hunks)
  • docs/user/ppl/cmd/chart.md (1 hunks)
  • docs/user/ppl/cmd/chart.rst (0 hunks)
  • docs/user/ppl/cmd/dedup.md (1 hunks)
  • docs/user/ppl/cmd/dedup.rst (0 hunks)
  • docs/user/ppl/cmd/describe.md (1 hunks)
  • docs/user/ppl/cmd/describe.rst (0 hunks)
  • docs/user/ppl/cmd/eval.md (1 hunks)
  • docs/user/ppl/cmd/eval.rst (0 hunks)
  • docs/user/ppl/cmd/eventstats.md (1 hunks)
  • docs/user/ppl/cmd/eventstats.rst (0 hunks)
  • docs/user/ppl/cmd/expand.md (1 hunks)
  • docs/user/ppl/cmd/expand.rst (0 hunks)
  • docs/user/ppl/cmd/explain.md (1 hunks)
  • docs/user/ppl/cmd/explain.rst (0 hunks)
  • docs/user/ppl/cmd/fields.md (1 hunks)
  • docs/user/ppl/cmd/fields.rst (0 hunks)
  • docs/user/ppl/cmd/fillnull.md (1 hunks)
  • docs/user/ppl/cmd/fillnull.rst (0 hunks)
  • docs/user/ppl/cmd/flatten.md (1 hunks)
  • docs/user/ppl/cmd/flatten.rst (0 hunks)
  • docs/user/ppl/cmd/grok.md (1 hunks)
  • docs/user/ppl/cmd/grok.rst (0 hunks)
  • docs/user/ppl/cmd/head.md (1 hunks)
  • docs/user/ppl/cmd/head.rst (0 hunks)
  • docs/user/ppl/cmd/join.md (1 hunks)
  • docs/user/ppl/cmd/join.rst (0 hunks)
  • docs/user/ppl/cmd/kmeans.md (1 hunks)
  • docs/user/ppl/cmd/kmeans.rst (0 hunks)
  • docs/user/ppl/cmd/lookup.md (1 hunks)
  • docs/user/ppl/cmd/lookup.rst (0 hunks)
  • docs/user/ppl/cmd/ml.md (1 hunks)
  • docs/user/ppl/cmd/ml.rst (0 hunks)
  • docs/user/ppl/cmd/multisearch.md (1 hunks)
  • docs/user/ppl/cmd/multisearch.rst (0 hunks)
  • docs/user/ppl/cmd/parse.md (1 hunks)
  • docs/user/ppl/cmd/parse.rst (0 hunks)
  • docs/user/ppl/cmd/patterns.md (1 hunks)
  • docs/user/ppl/cmd/patterns.rst (0 hunks)
  • docs/user/ppl/cmd/rare.md (1 hunks)
  • docs/user/ppl/cmd/rare.rst (0 hunks)
  • docs/user/ppl/cmd/regex.md (1 hunks)
  • docs/user/ppl/cmd/regex.rst (0 hunks)
  • docs/user/ppl/cmd/rename.md (1 hunks)
  • docs/user/ppl/cmd/rename.rst (0 hunks)
  • docs/user/ppl/cmd/replace.md (1 hunks)
  • docs/user/ppl/cmd/replace.rst (0 hunks)
  • docs/user/ppl/cmd/reverse.md (1 hunks)
  • docs/user/ppl/cmd/reverse.rst (0 hunks)
  • docs/user/ppl/cmd/rex.md (1 hunks)
  • docs/user/ppl/cmd/rex.rst (0 hunks)
  • docs/user/ppl/cmd/search.md (1 hunks)
  • docs/user/ppl/cmd/search.rst (0 hunks)
  • docs/user/ppl/cmd/showdatasources.md (1 hunks)
  • docs/user/ppl/cmd/showdatasources.rst (0 hunks)
  • docs/user/ppl/cmd/sort.md (1 hunks)
  • docs/user/ppl/cmd/sort.rst (0 hunks)
  • docs/user/ppl/cmd/spath.md (1 hunks)
  • docs/user/ppl/cmd/spath.rst (0 hunks)
  • docs/user/ppl/cmd/stats.md (1 hunks)
  • docs/user/ppl/cmd/stats.rst (0 hunks)
  • docs/user/ppl/cmd/streamstats.md (1 hunks)
  • docs/user/ppl/cmd/streamstats.rst (0 hunks)
  • docs/user/ppl/cmd/subquery.md (1 hunks)
  • docs/user/ppl/cmd/subquery.rst (0 hunks)
  • docs/user/ppl/cmd/syntax.md (1 hunks)
  • docs/user/ppl/cmd/syntax.rst (0 hunks)
  • docs/user/ppl/cmd/table.md (1 hunks)
  • docs/user/ppl/cmd/table.rst (0 hunks)
  • docs/user/ppl/cmd/timechart.md (1 hunks)
  • docs/user/ppl/cmd/timechart.rst (0 hunks)
  • docs/user/ppl/cmd/top.md (1 hunks)
  • docs/user/ppl/cmd/top.rst (0 hunks)
  • docs/user/ppl/cmd/trendline.md (1 hunks)
  • docs/user/ppl/cmd/trendline.rst (0 hunks)
  • docs/user/ppl/cmd/where.md (1 hunks)
  • docs/user/ppl/cmd/where.rst (0 hunks)
  • docs/user/ppl/functions/aggregations.md (1 hunks)
  • docs/user/ppl/functions/aggregations.rst (0 hunks)
💤 Files with no reviewable changes (52)
  • docs/user/ppl/cmd/dedup.rst
  • docs/user/ppl/cmd/multisearch.rst
  • docs/user/ppl/cmd/appendpipe.rst
  • docs/user/ppl/cmd/grok.rst
  • docs/user/ppl/cmd/rex.rst
  • docs/user/ppl/cmd/top.rst
  • docs/user/ppl/admin/datasources.rst
  • docs/user/ppl/cmd/rare.rst
  • docs/user/ppl/cmd/ml.rst
  • docs/user/ppl/cmd/replace.rst
  • docs/user/ppl/admin/connectors/security_lake_connector.rst
  • docs/user/ppl/cmd/streamstats.rst
  • docs/user/ppl/cmd/timechart.rst
  • docs/user/ppl/cmd/reverse.rst
  • docs/user/ppl/cmd/eval.rst
  • docs/user/ppl/cmd/fields.rst
  • docs/user/ppl/cmd/search.rst
  • docs/user/ppl/admin/connectors/prometheus_connector.rst
  • docs/user/ppl/cmd/describe.rst
  • docs/user/ppl/cmd/spath.rst
  • docs/user/ppl/cmd/eventstats.rst
  • docs/user/ppl/cmd/ad.rst
  • docs/user/ppl/cmd/sort.rst
  • docs/user/ppl/admin/connectors/s3glue_connector.rst
  • docs/user/ppl/cmd/explain.rst
  • docs/user/ppl/cmd/expand.rst
  • docs/user/ppl/cmd/append.rst
  • docs/user/ppl/cmd/syntax.rst
  • docs/user/ppl/cmd/trendline.rst
  • docs/user/ppl/cmd/kmeans.rst
  • docs/user/ppl/cmd/flatten.rst
  • docs/user/ppl/cmd/fillnull.rst
  • docs/user/ppl/cmd/lookup.rst
  • docs/user/ppl/cmd/showdatasources.rst
  • docs/user/ppl/cmd/stats.rst
  • docs/user/ppl/cmd/appendcol.rst
  • docs/user/ppl/admin/security.rst
  • docs/user/ppl/cmd/bin.rst
  • docs/user/ppl/cmd/chart.rst
  • docs/user/ppl/cmd/table.rst
  • docs/user/ppl/cmd/parse.rst
  • docs/user/ppl/functions/aggregations.rst
  • docs/user/ppl/cmd/head.rst
  • docs/user/ppl/cmd/rename.rst
  • docs/user/ppl/cmd/join.rst
  • docs/user/ppl/admin/cross_cluster_search.rst
  • docs/user/ppl/cmd/where.rst
  • docs/user/ppl/cmd/subquery.rst
  • docs/user/ppl/cmd/regex.rst
  • docs/user/ppl/admin/monitoring.rst
  • docs/user/ppl/admin/settings.rst
  • docs/user/ppl/cmd/patterns.rst
✅ Files skipped from review due to trivial changes (9)
  • docs/user/ppl/cmd/table.md
  • docs/user/ppl/cmd/head.md
  • api/README.md
  • docs/user/ppl/cmd/flatten.md
  • docs/user/ppl/admin/cross_cluster_search.md
  • docs/user/ppl/cmd/stats.md
  • docs/user/ppl/cmd/ad.md
  • api/src/main/java/org/opensearch/sql/api/EmptyDataSourceService.java
  • docs/user/ppl/cmd/appendcol.md
🚧 Files skipped from review as they are similar to previous changes (18)
  • docs/user/ppl/cmd/dedup.md
  • docs/user/ppl/cmd/append.md
  • docs/user/ppl/cmd/kmeans.md
  • docs/user/ppl/cmd/expand.md
  • docs/user/ppl/cmd/showdatasources.md
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/join.md
  • docs/user/ppl/cmd/fillnull.md
  • docs/user/ppl/cmd/top.md
  • docs/dev/ppl-commands.md
  • DEVELOPER_GUIDE.rst
  • docs/user/ppl/cmd/reverse.md
  • docs/user/ppl/cmd/rare.md
  • docs/user/ppl/cmd/regex.md
  • docs/user/ppl/cmd/where.md
  • docs/user/ppl/cmd/fields.md
  • docs/user/ppl/cmd/appendpipe.md
  • docs/user/ppl/cmd/ml.md
🧰 Additional context used
📓 Path-based instructions (3)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java
  • api/src/main/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspiler.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java
  • api/src/main/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspiler.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/user/ppl/cmd/syntax.md
  • docs/user/ppl/cmd/eval.md
  • docs/user/ppl/cmd/multisearch.md
  • docs/user/ppl/cmd/parse.md
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Maintain Java 11 compatibility when possible for OpenSearch 2.x

Applied to files:

  • api/build.gradle
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Use `./gradlew :integ-test:integTest` for integration testing in OpenSearch SQL

Applied to files:

  • .github/workflows/sql-cli-integration-test.yml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • .github/workflows/sql-cli-integration-test.yml
  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java
  • api/src/main/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspiler.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify changes with `./gradlew :integ-test:integTest` before merge

Applied to files:

  • .github/workflows/sql-cli-integration-test.yml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • .github/workflows/sql-cli-integration-test.yml
  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java
🧬 Code graph analysis (2)
api/src/test/java/org/opensearch/sql/api/transpiler/UnifiedQueryTranspilerTest.java (2)
api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java (1)
  • UnifiedQueryTestBase (20-60)
ppl/src/main/java/org/opensearch/sql/ppl/calcite/OpenSearchSparkSqlDialect.java (1)
  • OpenSearchSparkSqlDialect (19-71)
api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java (1)
api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java (1)
  • UnifiedQueryPlanner (45-227)
🪛 LanguageTool
docs/user/ppl/admin/connectors/s3glue_connector.md

[locale-violation] ~16-~16: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

docs/dev/testing-doctest.md

[uncategorized] ~61-~61: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...SQL docs only. On Deprecation path. Use markdown for PPL) 1. If you want to add a new do...

(MARKDOWN_NNP)

docs/user/ppl/cmd/search.md

[uncategorized] ~24-~24: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...rs), ? (exactly one character) Full Text Search: Unlike other PPL commands, se...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~89-~89: Use a hyphen to join words.
Context: ... matching. For wildcard search use multi field with keyword: `search ip_address.k...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/admin/datasources.md

[grammar] ~38-~38: Use a hyphen to join words.
Context: ... secure domains. * In case of security disabled domains, authorization is disba...

(QB_NEW_EN_HYPHEN)


[grammar] ~48-~48: Use a hyphen to join words.
Context: ...on and other details in case of security disabled domains. * Datasource Creation ...

(QB_NEW_EN_HYPHEN)


[grammar] ~148-~148: Use a hyphen to join words.
Context: ... * Sample python script to generate a 24 character master key ```bash import...

(QB_NEW_EN_HYPHEN)


[style] ~202-~202: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...tasource configuration * In versions prior to 2.7, the plugins.query.federation.datas...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[locale-violation] ~226-~226: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ... views, columns and other metadata info in future. ### Syntax source = datasource.info...

(IN_FUTURE)

docs/user/ppl/admin/connectors/prometheus_connector.md

[grammar] ~101-~101: Ensure spelling is correct
Context: ...ause, time range will be set to 1h with endtime set to now(). * In case of stats, res...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~102-~102: Use a hyphen to join words.
Context: ...ormal select queries, resolution is auto determined from the time range set. ...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/admin/connectors/security_lake_connector.md

[locale-violation] ~17-~17: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

docs/user/ppl/admin/monitoring.md

[style] ~5-~5: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...ntroduction By a stats endpoint, you are able to collect metrics for the plugin within t...

(BE_ABLE_TO)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...within the interval. Note that only node level statistics collecting is implement...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...s for the node you're accessing. Cluster level statistics have yet to be implemen...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/admin/settings.md

[style] ~5-~5: As a shorter alternative for ‘able to’, consider using “can”.
Context: ... cluster settings. Most of the settings are able to change dynamically so you can control t...

(BE_ABLE_TO)


[grammar] ~209-~209: Use a hyphen to join words.
Context: ...right, full, cross are performance sensitive join types which are disabled ...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/chart.md

[grammar] ~19-~19: Ensure spelling is correct
Context: ...are more categories than the limit. * usenull: optional. Controls whether to group eve...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/user/ppl/cmd/patterns.md

[grammar] ~26-~26: Use a hyphen to join words.
Context: ...e lower bound of frequency to ignore low frequency words. Default: 0.3. ...

(QB_NEW_EN_HYPHEN)


[grammar] ~95-~95: Use a hyphen to join words.
Context: ...patterns from a raw log field using user defined patterns. ```ppl source=apach...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/rename.md

[style] ~10-~10: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...* target-field: mandatory. The name you want to rename to. Must have same number of wil...

(REP_WANT_TO_VB)

docs/user/ppl/cmd/replace.md

[style] ~10-~10: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... * replacement: mandatory. The text you want to replace with. * field-name: mandatory...

(REP_WANT_TO_VB)


[grammar] ~111-~111: Use a hyphen to join words.
Context: ...se LIKE command with replace for pattern matching needs. ```ppl source=account...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/spath.md

[grammar] ~18-~18: Ensure spelling is correct
Context: ...Simple Field Extraction The simplest spath is to extract a single field. This exam...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/user/ppl/cmd/timechart.md

[grammar] ~31-~31: Ensure spelling is correct
Context: ...ore distinct values than the limit. * usenull: optional. Controls whether NULL values ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/explain.md

87-87: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/search.md

92-92: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


93-93: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


547-547: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


666-666: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


705-705: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/user/ppl/admin/connectors/prometheus_connector.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


14-14: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


15-15: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


16-16: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


17-17: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


18-18: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


229-229: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


230-230: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


260-260: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


261-261: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

docs/user/ppl/admin/connectors/security_lake_connector.md

21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


23-23: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


28-28: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


29-29: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


30-30: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


33-33: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


63-63: Bare URL used

(MD034, no-bare-urls)

docs/user/ppl/cmd/describe.md

8-8: Reference links and images should use a label that is defined
Missing link or image reference definition: "schema."

(MD052, reference-links-images)

docs/user/ppl/cmd/eventstats.md

28-28: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


28-28: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


29-29: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


29-29: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

docs/user/ppl/cmd/lookup.md

20-20: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/multisearch.md

30-30: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/parse.md

95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


103-103: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


111-111: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


119-119: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


127-127: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/patterns.md

32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/streamstats.md

36-36: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


36-36: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


37-37: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


37-37: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


153-153: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/user/ppl/cmd/subquery.md

77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


92-92: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


110-110: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


196-196: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (26)
docs/user/ppl/cmd/eval.md (2)

12-128: Examples are clear and well-documented.

All five examples are well-structured with clear descriptions and properly formatted expected output. The progression from simple field creation to more complex operations (chaining fields, string concatenation, type casting) provides good coverage of eval command capabilities.


1-132: Verify compliance with PPL command documentation checklist.

Per the learnings from prior reviews, PPL command PRs should refer to docs/dev/ppl-commands.md and verify the PR satisfies the command documentation checklist. Please confirm this documentation page meets all required standards for PPL command documentation.

docs/user/ppl/cmd/rex.md (2)

8-8: Verify Markdown rendering of angle-bracket placeholders in syntax line.

The syntax line uses RST-style escaping (\<mode\>, \<field\>, etc.) for placeholder angle brackets. Confirm this renders correctly in Markdown, or consider switching to inline code formatting (`<mode>`) or plain angle brackets (<mode>), depending on the project's documentation style guide.


1-290: Excellent documentation coverage and clarity.

The documentation is well-structured, comprehensive, and user-friendly. The nine examples effectively demonstrate core features (extraction, non-matching patterns, multiple matches, sed mode, offset tracking, chaining, error handling), each with clear expected outputs. The comparison table and limitations section provide helpful context. Naming restrictions, pattern requirements, max-match behavior, and system limits are explicitly documented with concrete error examples.

docs/user/ppl/cmd/rename.md (2)

1-141: Verify this rename command documentation against the PPL command checklist in docs/dev/ppl-commands.md.

Ensure the documentation includes all required sections and follows the format specified for PPL commands (e.g., Description, Syntax, Behavior, Examples, and Limitations).


8-8: Verify angle bracket escaping in the syntax line.

Line 8 contains \<source-field\> and \> which appear to be reStructuredText escape sequences. In standard Markdown, these will render as literal backslashes. Verify the intended rendering and update accordingly:

  • If using < and > for placeholder notation, remove the backslashes: <source-field> and <target-field>
  • If visual distinction is preferred, use inline code formatting: `<source-field>` and `<target-field>`
docs/user/ppl/cmd/timechart.md (1)

1-375: Documentation is comprehensive and well-structured.

The Markdown conversion is complete with clear sections, detailed parameter documentation, 11 illustrative examples with expected outputs, limitations, and constants. Cross-references use proper Markdown syntax, and previously flagged issues (RST syntax, missing parameter docs) have been resolved. Examples effectively demonstrate parameter behavior including edge cases.

docs/user/ppl/cmd/replace.md (1)

13-325: Examples and content are well-structured and semantically sound.

The 13 examples comprehensively cover single/multiple field replacement, wildcards (prefix, suffix, multi-capture), escape sequences, and edge cases. Spot-checked outputs (Examples 6, 8, 9, 13) are logically consistent with the described wildcard semantics. Examples clearly demonstrate both literal and pattern-based replacements, making the command's capabilities easy to understand.

docs/user/ppl/cmd/trendline.md (1)

1-114: Documentation is well-structured and properly formatted.

The trendline command documentation is comprehensive, with clear descriptions, syntax explanations, and four well-organized examples. All code blocks are properly marked with language identifiers. No issues identified.

api/build.gradle (1)

8-8: Build configuration changes are appropriate and well-integrated.

The addition of the Lombok plugin (line 8) and Spotless licenseHeader configuration (lines 29-32) are standard practices for reducing boilerplate and ensuring license compliance. The Apache-2.0 license header is properly formatted and will be automatically injected into source files.

Also applies to: 29-32

docs/user/ppl/cmd/grok.md (1)

1-86: Documentation is well-structured and properly formatted.

The grok command documentation is comprehensive with clear syntax, three illustrative examples, and proper cross-referencing to related parse command limitations. All code blocks are appropriately marked with language identifiers. No issues identified.

docs/user/ppl/cmd/syntax.md (1)

1-18: Documentation is clear and appropriately concise.

This syntax overview document effectively explains command ordering and argument notation conventions with a clear example. The code block is properly formatted. No issues identified.

docs/user/ppl/cmd/chart.md (1)

1-200: ✅ Well-structured chart command documentation.

The chart command documentation is comprehensive, with clear parameter descriptions, practical examples with expected outputs, and appropriate syntax highlighting on code blocks. The layout follows a consistent pattern that will serve users well.

docs/user/ppl/cmd/multisearch.md (1)

40-147: ✅ Well-structured examples and outputs.

Examples 1–4 have proper syntax highlighting with ppl language identifiers and clear expected outputs. The descriptions and use-case documentation are helpful.

docs/user/ppl/admin/monitoring.md (1)

11-34: ✅ Table and code examples are well-formatted.

The Markdown table is properly formatted, and code examples have appropriate language identifiers (bash ignore for cURL, json for response).

docs/user/ppl/cmd/patterns.md (1)

49-260: ✅ Comprehensive examples and consistent formatting.

The documentation includes diverse, well-documented examples for both simple_pattern and brain methods. Code blocks use proper PPL syntax highlighting, and the organization makes it easy for users to find relevant use cases.

docs/user/ppl/cmd/sort.md (1)

1-255: ✅ Excellent sort command documentation.

The sort command docs are thorough with clear syntax explanation, proper handling of edge cases (NULL ordering, count parameter), and diverse examples covering single-field, multi-field, ascending/descending, and field-type specification scenarios. Formatting is consistent throughout.

docs/category.json (3)

6-8: ✅ Proper endpoint and protocol references in bash_calcite.

The bash_calcite section correctly updated from RST to Markdown files (endpoint.md, protocol.md).


10-61: ✅ Comprehensive ppl_cli_calcite index with proper Markdown entries.

The ppl_cli_calcite section has been significantly expanded with 52 entries covering commands, functions, and general documentation—all properly referenced as .md files. This aligns well with the migration from RST to Markdown documentation format.


77-79: ✅ Updated bash_settings reference to Markdown.

The bash_settings section was correctly updated to reference settings.md instead of an RST equivalent.

docs/user/ppl/cmd/spath.md (1)

1-110: ✅ Clear spath command documentation with practical examples.

The spath documentation includes a helpful performance note about pushdown optimization, clear syntax explanation with a reference to json_extract, and diverse examples progressing from simple field extraction to complex nested structures and escaped paths. All code blocks are properly formatted.

docs/user/ppl/admin/security.md (1)

1-65: ✅ Well-organized security settings documentation.

The security documentation clearly outlines required permissions, provides REST API examples with version information, and includes Security Dashboard steps. Code blocks are properly formatted with bash language identifier. The past spelling issue ("grant" vs. "grand") has been corrected.

docs/user/ppl/functions/aggregations.md (1)

1-652: Well-structured aggregation functions documentation.

This comprehensive guide to PPL aggregation functions is well-organized with clear descriptions, consistent examples, and properly formatted output expectations. The documentation provides good context for each function including NULL/MISSING handling behavior and usage variations.

core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java (1)

68-85: SPLIT enum addition is consistent with existing collection functions.

Placing SPLIT(FunctionName.of("split")) alongside the other collection helpers (ARRAY, MVINDEX, etc.) looks correct and keeps naming consistent.

api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java (1)

19-59: Shared test schema and planner setup is straightforward and reusable.

The in‑memory employees table plus UnifiedQueryPlanner configuration under the catalog namespace is a good base for planner/transpiler tests and keeps individual tests focused on behavior instead of wiring.

api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerTest.java (1)

19-157: Planner namespace tests correctly cover single, multi‑level, and multi‑catalog setups.

Extending UnifiedQueryTestBase for the shared schema while building dedicated planners per test (including testDeepSchema for catalog.opensearch) gives good coverage of default namespace, multi‑catalog resolution, metadata caching, and error cases without overcomplicating the setup.

@kylehounslow kylehounslow dismissed Swiddis’s stale review December 9, 2025 20:18

The merge-base changed after approval.

@kylehounslow kylehounslow force-pushed the feat/markdown-doctests branch from 95867e4 to 3a2370e Compare December 9, 2025 20:18
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 16

♻️ Duplicate comments (9)
docs/user/ppl/cmd/lookup.md (1)

20-28: Add language specification to the usage examples code block.

This issue was flagged in the prior review. For consistency with other code blocks in the document (which specify bash, json, etc.) and to enable proper syntax highlighting, add a language identifier to this block.

-```
+```bash
 source = table1 | lookup table2 id
 source = table1 | lookup table2 id, name
 source = table1 | lookup table2 id as cid, name
 source = table1 | lookup table2 id as cid, name replace dept as department
 source = table1 | lookup table2 id as cid, name replace dept as department, city as location
 source = table1 | lookup table2 id as cid, name append dept as department
 source = table1 | lookup table2 id as cid, name append dept as department, city as location
-```
+```
docs/user/ppl/cmd/timechart.md (2)

69-69: Clarify how null values are represented in results.

Line 69 states that null values in the "by" field "appear as null in the results," but Example 10 (lines 337–347) demonstrates they are displayed using the nullstr label (default: "NULL"). Align the note with the actual behavior.

-* **Null values**: Documents with null values in the "by" field are treated as a separate category and appear as null in the results.
+* **Null values**: Documents with null values in the "by" field are treated as a separate category and are displayed with the label specified by the `nullstr` parameter (default: "NULL").

8-8: Syntax line missing usenull and nullstr parameters.

The syntax definition does not include the usenull and nullstr parameters that are documented in lines 31–35 and demonstrated in Example 10 (line 325). The syntax should reflect all supported parameters.

Update the syntax to include these optional parameters:

-timechart [timefield=\<field_name\>] [span=\<time_interval\>] [limit=\<number\>] [useother=\<boolean\>] \<aggregation_function\> [by \<field\>]
+timechart [timefield=\<field_name\>] [span=\<time_interval\>] [limit=\<number\>] [useother=\<boolean\>] [usenull=\<boolean\>] [nullstr=\<string\>] \<aggregation_function\> [by \<field\>]
docs/user/ppl/cmd/eventstats.md (1)

28-29: Fix list indentation inconsistency.

The nested list items require proper 2-space indentation. Adjust lines 28–29 to align uniformly with the parent list item at line 27:

 * bucket_nullable: optional. Controls whether the eventstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
-  * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
-  * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`
+   * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
+   * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`
docs/user/ppl/admin/connectors/security_lake_connector.md (2)

63-63: Wrap bare URL in Markdown link syntax.

Line 63 contains a bare URL that should be wrapped in Markdown link format for proper rendering and accessibility.

-Documentation for Index Queries: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md
+Documentation for Index Queries: [Index Queries Documentation](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md)

21-31: Fix list indentation to comply with Markdown standards.

Lines 21-31 contain nested list items with inconsistent indentation. Top-level items should use 0 spaces, and nested items should use 2 additional spaces per level.

 Glue Connector Properties.
 * `resultIndex` is a new parameter specific to glue connector. Stores the results of queries executed on the data source. If unavailable, it defaults to .query_execution_result.  
 * `glue.auth.type` [Required]  
-    * This parameters provides the authentication type information required for execution engine to connect to glue.  
-    * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
-        * `glue.auth.role_arn`  
+  * This parameters provides the authentication type information required for execution engine to connect to glue.  
+  * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
+    * `glue.auth.role_arn`  
 * `glue.indexstore.opensearch.*` [Required]  
-    * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
-    * `glue.indexstore.opensearch.uri` [Required]  
-    * `glue.indexstore.opensearch.auth` [Required]  
-        * Accepted values include ["noauth", "basicauth", "awssigv4"]  
-        * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
-        * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region`  and `glue.auth.role_arn`  
-    * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  
+  * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
+  * `glue.indexstore.opensearch.uri` [Required]  
+  * `glue.indexstore.opensearch.auth` [Required]  
+    * Accepted values include ["noauth", "basicauth", "awssigv4"]  
+    * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
+    * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region` and `glue.auth.role_arn`  
+  * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  
docs/user/ppl/cmd/subquery.md (1)

196-196: Replace hard tab with spaces.

Line 196 contains a hard tab character before the closing }' which violates MD010. Replace with spaces.

   """
-	}'
+}'

docs/user/ppl/cmd/search.md (1)

91-93: Fix list indentation for Field Type Performance Tips.

Lines 91-93 show list items indented with 3 spaces instead of 0 (top-level), making them appear as nested items when they should be at the top level.

-**Field Type Performance Tips**:
-   * Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
-   * For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  
+**Field Type Performance Tips:**
+* Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
+* For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  
docs/user/ppl/admin/connectors/s3glue_connector.md (1)

19-30: Fix list indentation to comply with Markdown standards.

Lines 19-30 contain nested list items with inconsistent indentation. Top-level items should use 0 spaces, and nested items should use 2 additional spaces per level. Currently using 4 and 8 spaces respectively.

 Glue Connector Properties.
 * `resultIndex` is a new parameter specific to glue connector. Stores the results of queries executed on the data source. If unavailable, it defaults to .query_execution_result.  
 * `glue.auth.type` [Required]  
-    * This parameters provides the authentication type information required for execution engine to connect to glue.  
-    * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
-        * `glue.auth.role_arn`  
+  * This parameters provides the authentication type information required for execution engine to connect to glue.  
+  * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
+    * `glue.auth.role_arn`  
 * `glue.indexstore.opensearch.*` [Required]  
-    * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
-    * `glue.indexstore.opensearch.uri` [Required]  
-    * `glue.indexstore.opensearch.auth` [Required]  
-        * Accepted values include ["noauth", "basicauth", "awssigv4"]  
-        * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
-        * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region`  and `glue.auth.role_arn`  
-    * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  
+  * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
+  * `glue.indexstore.opensearch.uri` [Required]  
+  * `glue.indexstore.opensearch.auth` [Required]  
+    * Accepted values include ["noauth", "basicauth", "awssigv4"]  
+    * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
+    * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region` and `glue.auth.role_arn`  
+  * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  
🧹 Nitpick comments (25)
docs/user/ppl/cmd/spath.md (2)

8-11: Consider standardizing parameter syntax formatting.

The syntax notation uses backslash escapes for angle brackets (\<field\>), which is unconventional in Markdown. Most Markdown documentation either wraps placeholders in backticks (`<field>`) or uses plain angle brackets depending on the flavor/style guide.

Verify this formatting approach is consistent with other command documentation in the same directory (e.g., other files in docs/user/ppl/cmd/).


18-18: Minor phrasing refinement.

The opening "The simplest spath is to extract a single field" is slightly awkward. Consider: "The simplest use case for spath is extracting a single field" or similar phrasing for better clarity.

docs/user/ppl/cmd/fillnull.md (2)

8-10: Inconsistent escaping of angle brackets in syntax declarations.

Lines 8–10 use escaped angle brackets (\<replacement\>, \<field\>), while lines 15–17 use unescaped angle brackets (<replacement>, <field>). Within Markdown code blocks or syntax examples, the unescaped version is more idiomatic. For consistency and clarity, align all syntax declarations to use unescaped angle brackets.

Apply this diff to unify the syntax notation:

-fillnull with \<replacement\> [in \<field-list\>]
-fillnull using \<field\> = \<replacement\> [, \<field\> = \<replacement\>]
-fillnull value=\<replacement\> [\<field-list\>]
+fillnull with <replacement> [in <field-list>]
+fillnull using <field> = <replacement> [, <field> = <replacement>]
+fillnull value=<replacement> [<field-list>]

Also applies to: 15-17


171-171: Incorrect code block language specifier and unsupported ignore flag.

Line 171 uses ```sql ignore but the content is pseudo-code/comments explaining a PPL fillnull command failure, not valid SQL. The language should be text or ppl, and the ignore flag is not standard Markdown.

Apply this diff:

-```sql ignore
+```text
docs/user/ppl/cmd/eval.md (1)

89-89: Minor: Remove trailing whitespace.

Line 89 has a trailing space after source=accounts .

-source=accounts 
+source=accounts
docs/user/ppl/cmd/rename.md (1)

9-10: Refine repetitive phrasing in parameter descriptions.

Lines 9–10 use similar phrasing ("name" and "rename to"). Consolidate for conciseness.

Apply this diff to improve clarity:

 * source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns using `*`.  
-* target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source.
+* target-field: mandatory. The new name for the field. Must have the same number of wildcards as the source.
docs/user/ppl/cmd/stats.md (1)

11-12: Align nested list indentation with standard Markdown formatting.

Lines 11–12 use 1-space indentation for nested list items, but proper Markdown requires 2-space indentation for consistent formatting. While this may render correctly in most viewers, it's worth aligning with the standard for maintainability:

 * bucket_nullable: optional. Controls whether the stats command includes null buckets in group-by aggregations. When set to `false`, the aggregation ignores records where the group-by field is null, resulting in faster performance by excluding null bucket. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
-  * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
-  * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`
+   * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
+   * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`
docs/user/ppl/cmd/join.md (1)

63-78: Add language tags to fenced code blocks.

Lines 63 and 82 contain fenced code blocks without language identifiers. These should be marked as ppl for proper syntax highlighting.

-```
+```ppl
 source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c

Apply the same fix to line 82.

Also applies to: 82-91

docs/user/ppl/admin/monitoring.md (1)

5-5: Fix grammar and hyphenation issues in introduction.

Line 5 has several minor issues:

  1. "able to collect" → "can collect" (clearer style)
  2. "node level statistics collecting" → "node-level statistics collection"
  3. "Cluster level statistics" → "Cluster-level statistics"
  4. "is implement" → "is implemented" (incomplete verb)
-By a stats endpoint, you are able to collect metrics for the plugin within the interval. Note that only node level statistics collecting is implement for now. In other words, you only get the metrics for the node you're accessing. Cluster level statistics have yet to be implemented.
+By a stats endpoint, you can collect metrics for the plugin within the interval. Note that only node-level statistics collection is implemented for now. In other words, you only get the metrics for the node you're accessing. Cluster-level statistics have yet to be implemented.
docs/user/ppl/cmd/replace.md (2)

8-8: Move syntax definition to a code block and remove backslash escaping.

The syntax line uses unnecessary backslash escaping (\<pattern\>) that doesn't render properly in Markdown. Angle brackets don't need escaping in code blocks.

-replace '\<pattern\>' WITH '\<replacement\>' [, '\<pattern\>' WITH '\<replacement\>']... IN \<field-name\>[, \<field-name\>]...
+```
+replace '<pattern>' WITH '<replacement>' [, '<pattern>' WITH '<replacement>']... IN <field-name>[, <field-name>]...
+```

111-111: Hyphenate the compound adjective "pattern-matching".

Line 111 uses "pattern matching" as a compound adjective before "needs" and should be hyphenated.

-Since replace command only supports plain string literals, you can use LIKE command with replace for pattern matching needs.
+Since replace command only supports plain string literals, you can use LIKE command with replace for pattern-matching needs.
docs/user/ppl/cmd/explain.md (1)

87-97: Add language tag to JSON code block.

Line 87 contains a fenced code block without a language identifier. Since it displays JSON output, it should be marked with the json language tag.

-```
+```json
 {
   "calcite": {
     "logical": """LogicalProject
docs/user/ppl/cmd/patterns.md (3)

32-43: Add language tag to configuration code block.

Line 32 contains a fenced code block without a language identifier. Since it displays JSON configuration, it should be marked with the json language tag (or bash if including the full PUT command).

-```
+```json
   PUT _cluster/settings
   {
     "persistent": {

Alternatively, if the curl/PUT command format is intended, use ```bash instead.


26-26: Hyphenate compound adjectives "low-frequency" and "longest-word".

Line 26 contains compound adjectives that should be hyphenated when preceding nouns.

-* `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low frequency words. **Default:** 0.3.
+* `frequency_threshold_percentage`: optional double. Brain's log pattern is selected based on longest-word combination. This sets the lower bound of frequency to ignore low-frequency words. **Default:** 0.3.

95-95: Hyphenate compound adjective "user-defined".

Line 95 uses "user defined" as a compound adjective and should be hyphenated.

-This example shows how to extract patterns from a raw log field using user defined patterns.
+This example shows how to extract patterns from a raw log field using user-defined patterns.
docs/user/ppl/admin/connectors/prometheus_connector.md (2)

16-18: Fix nested list indentation inconsistency.

Lines 16 and 18 have inconsistent indentation (8 spaces instead of 4 for nested items). Markdown lists should use 2 additional spaces per nesting level, not 4.

     * If prometheus.auth.type is basicauth, following are required parameters.  
-        * `prometheus.auth.username` and `prometheus.auth.password`.  
+      * `prometheus.auth.username` and `prometheus.auth.password`.  
     * If prometheus.auth.type is awssigv4, following are required parameters.  
-        * `prometheus.auth.region`, `prometheus.auth.access_key` and `prometheus.auth.secret_key`
+      * `prometheus.auth.region`, `prometheus.auth.access_key` and `prometheus.auth.secret_key`

102-102: Hyphenate the compound adjective "auto-determined".

Line 102 uses "auto determined" as a compound adjective and should be hyphenated for grammatical correctness.

-* In case of stats, resolution is auto determined from the time range set.
+* In case of stats, resolution is auto-determined from the time range set.
docs/user/ppl/admin/settings.md (2)

5-5: Simplify "able to" to "can".

Line 5 uses the phrase "are able to change" which can be simplified to "can change" for better readability.

-When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings are able to change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.
+When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings can change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.

209-209: Use hyphen for compound adjective "performance-sensitive".

Line 209 contains the compound adjective "performance sensitive" which should be hyphenated when modifying the noun "join types" that follows.

-Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
+Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance-sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
docs/user/ppl/admin/connectors/s3glue_connector.md (1)

16-16: Use "in the future" instead of "in future".

Line 16 uses British English phrasing "in future" which should be "in the future" for standard American English.

-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.

Also note the capitalization issue: "we will" should be "We will" at the start of the sentence.

docs/user/ppl/admin/connectors/security_lake_connector.md (1)

17-17: Use "in the future" instead of "in future" and fix capitalization.

Line 17 uses British English phrasing "in future" and has a capitalization issue at the sentence start.

-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.
docs/user/ppl/admin/datasources.md (3)

148-148: Use hyphen for compound adjective "24-character".

Line 148 contains the compound adjective "24 character" which should be hyphenated when modifying the noun "master key" that follows.

-* Sample python script to generate a 24 character master key
+* Sample python script to generate a 24-character master key

202-202: Replace wordy "prior to" with "before".

Line 202 uses the wordy phrase "prior to" which could be simplified to "before" for better readability.

-* In versions prior to 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.
+* In versions before 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.

226-226: Use "in the future" instead of "in future".

Line 226 uses British English phrasing "in future" which should be "in the future" for standard American English.

-In the current state, `information_schema` only support metadata of tables.
-This schema will be extended for views, columns and other metadata info in future.
+In the current state, `information_schema` only supports metadata of tables.
+This schema will be extended for views, columns and other metadata info in the future.

Also note: "only support" should be "only supports" to match the plural subject "information_schema".

docs/user/ppl/functions/aggregations.md (1)

27-27: Consider standardizing "Example" label formatting.

The "Example" labels (appearing at lines 27, 45, 68, 92, and throughout the file) use plain text rather than Markdown formatting like bold or subheadings. For consistency with documentation best practices and improved readability, consider using either **Example** or #### Example formatting. Verify if your docs style guide specifies a convention for example labels.

For example, change:

- Usage: SUM(expr). Returns the sum of expr.
- Example

To:

+ Usage: SUM(expr). Returns the sum of expr.
+
+ **Example**

Also applies to: 45-45, 68-68, 92-92, 117-117, 159-159, 200-200, 246-246, 269-269, 294-294, 318-318, 343-343, 365-365, 391-391, 413-413, 439-439, 465-465, 529-529, 556-556, 581-581, 610-610, 640-640

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95867e4 and 3a2370e.

📒 Files selected for processing (108)
  • DEVELOPER_GUIDE.rst (1 hunks)
  • docs/category.json (2 hunks)
  • docs/dev/ppl-commands.md (1 hunks)
  • docs/dev/testing-doctest.md (1 hunks)
  • docs/user/ppl/admin/connectors/prometheus_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/prometheus_connector.rst (0 hunks)
  • docs/user/ppl/admin/connectors/s3glue_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/s3glue_connector.rst (0 hunks)
  • docs/user/ppl/admin/connectors/security_lake_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/security_lake_connector.rst (0 hunks)
  • docs/user/ppl/admin/cross_cluster_search.md (1 hunks)
  • docs/user/ppl/admin/cross_cluster_search.rst (0 hunks)
  • docs/user/ppl/admin/datasources.md (1 hunks)
  • docs/user/ppl/admin/datasources.rst (0 hunks)
  • docs/user/ppl/admin/monitoring.md (1 hunks)
  • docs/user/ppl/admin/monitoring.rst (0 hunks)
  • docs/user/ppl/admin/security.md (1 hunks)
  • docs/user/ppl/admin/security.rst (0 hunks)
  • docs/user/ppl/admin/settings.md (1 hunks)
  • docs/user/ppl/admin/settings.rst (0 hunks)
  • docs/user/ppl/cmd/ad.md (1 hunks)
  • docs/user/ppl/cmd/ad.rst (0 hunks)
  • docs/user/ppl/cmd/append.md (1 hunks)
  • docs/user/ppl/cmd/append.rst (0 hunks)
  • docs/user/ppl/cmd/appendcol.md (1 hunks)
  • docs/user/ppl/cmd/appendcol.rst (0 hunks)
  • docs/user/ppl/cmd/appendpipe.md (1 hunks)
  • docs/user/ppl/cmd/appendpipe.rst (0 hunks)
  • docs/user/ppl/cmd/bin.md (1 hunks)
  • docs/user/ppl/cmd/bin.rst (0 hunks)
  • docs/user/ppl/cmd/chart.md (1 hunks)
  • docs/user/ppl/cmd/chart.rst (0 hunks)
  • docs/user/ppl/cmd/dedup.md (1 hunks)
  • docs/user/ppl/cmd/dedup.rst (0 hunks)
  • docs/user/ppl/cmd/describe.md (1 hunks)
  • docs/user/ppl/cmd/describe.rst (0 hunks)
  • docs/user/ppl/cmd/eval.md (1 hunks)
  • docs/user/ppl/cmd/eval.rst (0 hunks)
  • docs/user/ppl/cmd/eventstats.md (1 hunks)
  • docs/user/ppl/cmd/eventstats.rst (0 hunks)
  • docs/user/ppl/cmd/expand.md (1 hunks)
  • docs/user/ppl/cmd/expand.rst (0 hunks)
  • docs/user/ppl/cmd/explain.md (1 hunks)
  • docs/user/ppl/cmd/explain.rst (0 hunks)
  • docs/user/ppl/cmd/fields.md (1 hunks)
  • docs/user/ppl/cmd/fields.rst (0 hunks)
  • docs/user/ppl/cmd/fillnull.md (1 hunks)
  • docs/user/ppl/cmd/fillnull.rst (0 hunks)
  • docs/user/ppl/cmd/flatten.md (1 hunks)
  • docs/user/ppl/cmd/flatten.rst (0 hunks)
  • docs/user/ppl/cmd/grok.md (1 hunks)
  • docs/user/ppl/cmd/grok.rst (0 hunks)
  • docs/user/ppl/cmd/head.md (1 hunks)
  • docs/user/ppl/cmd/head.rst (0 hunks)
  • docs/user/ppl/cmd/join.md (1 hunks)
  • docs/user/ppl/cmd/join.rst (0 hunks)
  • docs/user/ppl/cmd/kmeans.md (1 hunks)
  • docs/user/ppl/cmd/kmeans.rst (0 hunks)
  • docs/user/ppl/cmd/lookup.md (1 hunks)
  • docs/user/ppl/cmd/lookup.rst (0 hunks)
  • docs/user/ppl/cmd/ml.md (1 hunks)
  • docs/user/ppl/cmd/ml.rst (0 hunks)
  • docs/user/ppl/cmd/multisearch.md (1 hunks)
  • docs/user/ppl/cmd/multisearch.rst (0 hunks)
  • docs/user/ppl/cmd/parse.md (1 hunks)
  • docs/user/ppl/cmd/parse.rst (0 hunks)
  • docs/user/ppl/cmd/patterns.md (1 hunks)
  • docs/user/ppl/cmd/patterns.rst (0 hunks)
  • docs/user/ppl/cmd/rare.md (1 hunks)
  • docs/user/ppl/cmd/rare.rst (0 hunks)
  • docs/user/ppl/cmd/regex.md (1 hunks)
  • docs/user/ppl/cmd/regex.rst (0 hunks)
  • docs/user/ppl/cmd/rename.md (1 hunks)
  • docs/user/ppl/cmd/rename.rst (0 hunks)
  • docs/user/ppl/cmd/replace.md (1 hunks)
  • docs/user/ppl/cmd/replace.rst (0 hunks)
  • docs/user/ppl/cmd/reverse.md (1 hunks)
  • docs/user/ppl/cmd/reverse.rst (0 hunks)
  • docs/user/ppl/cmd/rex.md (1 hunks)
  • docs/user/ppl/cmd/rex.rst (0 hunks)
  • docs/user/ppl/cmd/search.md (1 hunks)
  • docs/user/ppl/cmd/search.rst (0 hunks)
  • docs/user/ppl/cmd/showdatasources.md (1 hunks)
  • docs/user/ppl/cmd/showdatasources.rst (0 hunks)
  • docs/user/ppl/cmd/sort.md (1 hunks)
  • docs/user/ppl/cmd/sort.rst (0 hunks)
  • docs/user/ppl/cmd/spath.md (1 hunks)
  • docs/user/ppl/cmd/spath.rst (0 hunks)
  • docs/user/ppl/cmd/stats.md (1 hunks)
  • docs/user/ppl/cmd/stats.rst (0 hunks)
  • docs/user/ppl/cmd/streamstats.md (1 hunks)
  • docs/user/ppl/cmd/streamstats.rst (0 hunks)
  • docs/user/ppl/cmd/subquery.md (1 hunks)
  • docs/user/ppl/cmd/subquery.rst (0 hunks)
  • docs/user/ppl/cmd/syntax.md (1 hunks)
  • docs/user/ppl/cmd/syntax.rst (0 hunks)
  • docs/user/ppl/cmd/table.md (1 hunks)
  • docs/user/ppl/cmd/table.rst (0 hunks)
  • docs/user/ppl/cmd/timechart.md (1 hunks)
  • docs/user/ppl/cmd/timechart.rst (0 hunks)
  • docs/user/ppl/cmd/top.md (1 hunks)
  • docs/user/ppl/cmd/top.rst (0 hunks)
  • docs/user/ppl/cmd/trendline.md (1 hunks)
  • docs/user/ppl/cmd/trendline.rst (0 hunks)
  • docs/user/ppl/cmd/where.md (1 hunks)
  • docs/user/ppl/cmd/where.rst (0 hunks)
  • docs/user/ppl/functions/aggregations.md (1 hunks)
  • docs/user/ppl/functions/aggregations.rst (0 hunks)
💤 Files with no reviewable changes (52)
  • docs/user/ppl/cmd/kmeans.rst
  • docs/user/ppl/admin/security.rst
  • docs/user/ppl/cmd/head.rst
  • docs/user/ppl/cmd/fillnull.rst
  • docs/user/ppl/admin/connectors/security_lake_connector.rst
  • docs/user/ppl/cmd/eventstats.rst
  • docs/user/ppl/admin/datasources.rst
  • docs/user/ppl/cmd/search.rst
  • docs/user/ppl/cmd/bin.rst
  • docs/user/ppl/cmd/where.rst
  • docs/user/ppl/cmd/rename.rst
  • docs/user/ppl/cmd/multisearch.rst
  • docs/user/ppl/cmd/lookup.rst
  • docs/user/ppl/cmd/table.rst
  • docs/user/ppl/cmd/flatten.rst
  • docs/user/ppl/cmd/subquery.rst
  • docs/user/ppl/cmd/sort.rst
  • docs/user/ppl/cmd/expand.rst
  • docs/user/ppl/cmd/rex.rst
  • docs/user/ppl/cmd/regex.rst
  • docs/user/ppl/cmd/rare.rst
  • docs/user/ppl/cmd/replace.rst
  • docs/user/ppl/cmd/timechart.rst
  • docs/user/ppl/cmd/trendline.rst
  • docs/user/ppl/cmd/describe.rst
  • docs/user/ppl/cmd/showdatasources.rst
  • docs/user/ppl/cmd/dedup.rst
  • docs/user/ppl/cmd/append.rst
  • docs/user/ppl/cmd/patterns.rst
  • docs/user/ppl/cmd/grok.rst
  • docs/user/ppl/cmd/explain.rst
  • docs/user/ppl/cmd/streamstats.rst
  • docs/user/ppl/admin/settings.rst
  • docs/user/ppl/cmd/chart.rst
  • docs/user/ppl/cmd/ml.rst
  • docs/user/ppl/cmd/ad.rst
  • docs/user/ppl/admin/cross_cluster_search.rst
  • docs/user/ppl/cmd/eval.rst
  • docs/user/ppl/admin/connectors/prometheus_connector.rst
  • docs/user/ppl/cmd/stats.rst
  • docs/user/ppl/cmd/join.rst
  • docs/user/ppl/cmd/reverse.rst
  • docs/user/ppl/cmd/syntax.rst
  • docs/user/ppl/cmd/spath.rst
  • docs/user/ppl/cmd/parse.rst
  • docs/user/ppl/admin/connectors/s3glue_connector.rst
  • docs/user/ppl/cmd/top.rst
  • docs/user/ppl/cmd/fields.rst
  • docs/user/ppl/cmd/appendpipe.rst
  • docs/user/ppl/cmd/appendcol.rst
  • docs/user/ppl/functions/aggregations.rst
  • docs/user/ppl/admin/monitoring.rst
✅ Files skipped from review due to trivial changes (7)
  • docs/user/ppl/cmd/ml.md
  • docs/user/ppl/cmd/regex.md
  • docs/user/ppl/cmd/table.md
  • docs/user/ppl/cmd/appendcol.md
  • docs/user/ppl/cmd/parse.md
  • docs/user/ppl/cmd/sort.md
  • docs/user/ppl/cmd/ad.md
🚧 Files skipped from review as they are similar to previous changes (19)
  • docs/user/ppl/cmd/showdatasources.md
  • docs/user/ppl/cmd/expand.md
  • docs/user/ppl/cmd/kmeans.md
  • docs/user/ppl/cmd/syntax.md
  • docs/user/ppl/cmd/describe.md
  • docs/user/ppl/admin/cross_cluster_search.md
  • docs/user/ppl/cmd/multisearch.md
  • docs/user/ppl/cmd/where.md
  • docs/user/ppl/admin/security.md
  • docs/user/ppl/cmd/trendline.md
  • docs/user/ppl/cmd/top.md
  • docs/user/ppl/cmd/head.md
  • docs/user/ppl/cmd/dedup.md
  • docs/user/ppl/cmd/rare.md
  • docs/user/ppl/cmd/rex.md
  • docs/user/ppl/cmd/appendpipe.md
  • docs/dev/ppl-commands.md
  • docs/user/ppl/cmd/streamstats.md
  • DEVELOPER_GUIDE.rst
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/user/ppl/cmd/eval.md
🪛 LanguageTool
docs/user/ppl/cmd/timechart.md

[grammar] ~31-~31: Ensure spelling is correct
Context: ...ore distinct values than the limit. * usenull: optional. Controls whether NULL values ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/user/ppl/admin/datasources.md

[grammar] ~38-~38: Use a hyphen to join words.
Context: ... secure domains. * In case of security disabled domains, authorization is disba...

(QB_NEW_EN_HYPHEN)


[grammar] ~48-~48: Use a hyphen to join words.
Context: ...on and other details in case of security disabled domains. * Datasource Creation ...

(QB_NEW_EN_HYPHEN)


[grammar] ~148-~148: Use a hyphen to join words.
Context: ... * Sample python script to generate a 24 character master key ```bash import...

(QB_NEW_EN_HYPHEN)


[style] ~202-~202: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...tasource configuration * In versions prior to 2.7, the plugins.query.federation.datas...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


[locale-violation] ~226-~226: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ... views, columns and other metadata info in future. ### Syntax source = datasource.info...

(IN_FUTURE)

docs/user/ppl/admin/settings.md

[style] ~5-~5: As a shorter alternative for ‘able to’, consider using “can”.
Context: ... cluster settings. Most of the settings are able to change dynamically so you can control t...

(BE_ABLE_TO)


[grammar] ~209-~209: Use a hyphen to join words.
Context: ...right, full, cross are performance sensitive join types which are disabled ...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/admin/connectors/s3glue_connector.md

[locale-violation] ~16-~16: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

docs/dev/testing-doctest.md

[uncategorized] ~61-~61: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...SQL docs only. On Deprecation path. Use markdown for PPL) 1. If you want to add a new do...

(MARKDOWN_NNP)

docs/user/ppl/admin/connectors/prometheus_connector.md

[grammar] ~101-~101: Ensure spelling is correct
Context: ...ause, time range will be set to 1h with endtime set to now(). * In case of stats, res...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~102-~102: Use a hyphen to join words.
Context: ...ormal select queries, resolution is auto determined from the time range set. ...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/admin/connectors/security_lake_connector.md

[locale-violation] ~17-~17: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

docs/user/ppl/cmd/replace.md

[style] ~10-~10: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... * replacement: mandatory. The text you want to replace with. * field-name: mandatory...

(REP_WANT_TO_VB)


[grammar] ~111-~111: Use a hyphen to join words.
Context: ...se LIKE command with replace for pattern matching needs. ```ppl source=account...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/admin/monitoring.md

[style] ~5-~5: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...ntroduction By a stats endpoint, you are able to collect metrics for the plugin within t...

(BE_ABLE_TO)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...within the interval. Note that only node level statistics collecting is implement...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...s for the node you're accessing. Cluster level statistics have yet to be implemen...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/chart.md

[grammar] ~19-~19: Ensure spelling is correct
Context: ...are more categories than the limit. * usenull: optional. Controls whether to group eve...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/user/ppl/cmd/patterns.md

[grammar] ~26-~26: Use a hyphen to join words.
Context: ...e lower bound of frequency to ignore low frequency words. Default: 0.3. ...

(QB_NEW_EN_HYPHEN)


[grammar] ~95-~95: Use a hyphen to join words.
Context: ...patterns from a raw log field using user defined patterns. ```ppl source=apach...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/rename.md

[style] ~10-~10: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...* target-field: mandatory. The name you want to rename to. Must have same number of wil...

(REP_WANT_TO_VB)

docs/user/ppl/cmd/search.md

[uncategorized] ~24-~24: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...rs), ? (exactly one character) Full Text Search: Unlike other PPL commands, se...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~89-~89: Use a hyphen to join words.
Context: ... matching. For wildcard search use multi field with keyword: `search ip_address.k...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/spath.md

[grammar] ~18-~18: Ensure spelling is correct
Context: ...Simple Field Extraction The simplest spath is to extract a single field. This exam...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/admin/connectors/s3glue_connector.md

12-12: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/admin/connectors/prometheus_connector.md

12-12: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


147-147: Spaces inside code span elements

(MD038, no-space-in-code)


148-148: Bare URL used

(MD034, no-bare-urls)


159-159: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


191-191: Bare URL used

(MD034, no-bare-urls)


192-192: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


192-192: Bare URL used

(MD034, no-bare-urls)


201-201: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


224-224: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


262-262: Bare URL used

(MD034, no-bare-urls)


264-264: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/admin/connectors/security_lake_connector.md

21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


23-23: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


28-28: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


29-29: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


30-30: Unordered list indentation
Expected: 4; Actual: 8

(MD007, ul-indent)


31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


33-33: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


63-63: Bare URL used

(MD034, no-bare-urls)

docs/user/ppl/cmd/eventstats.md

28-28: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


28-28: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


29-29: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


29-29: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

docs/user/ppl/cmd/explain.md

87-87: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/join.md

63-63: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


82-82: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/lookup.md

30-30: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user/ppl/cmd/search.md

92-92: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


93-93: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


547-547: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


666-666: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


705-705: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/user/ppl/cmd/subquery.md

77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


92-92: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


110-110: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


196-196: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: WhiteSource Security Check
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (24)
docs/user/ppl/cmd/spath.md (2)

11-11: Verify the link to json_extract reference.

The cross-reference link points to ../functions/json.md#json_extract. Confirm that this file exists at the expected path and the anchor is valid in the Markdown-migrated version.


1-109: Solid documentation structure and content quality.

The documentation is well-organized with clear progression from basic to advanced scenarios. Examples are self-contained with expected outputs, and the performance note about pushdown limitations is valuable context. The migration to Markdown appears successful overall.

docs/user/ppl/cmd/fillnull.md (1)

12-12: Line 12 grammar may need further refinement.

The past review suggested a more comprehensive restructuring of this sentence, but the current version only partially implements the fix (adding "It" to the start). The clause "List of fields to apply the replacement to." still reads as a fragment. Consider adopting the originally suggested fix to fully restructure: "A comma-delimited or space-delimited list of fields to apply the replacement to. It can be comma-delimited..." for better flow.

docs/user/ppl/cmd/reverse.md (2)

1-13: Documentation structure and safety guidance are solid.

The command description is clear, syntax is straightforward, and the memory warning appropriately alerts users about resource consumption on large datasets—this is good safety communication for a command that materializes results.


14-133: All five examples are logically consistent and well-organized.

Each example follows a clear pattern (title, explanation, command, expected output) and demonstrates distinct use cases:

  • Example 1 (basic), Example 2 (with sort), Example 3 (with head), Example 4 (idempotent property), Example 5 (complex pipeline)
  • Example outputs correctly reflect expected behavior (e.g., Example 2 shows descending age order via reverse-after-sort; Example 4 demonstrates that double-reverse returns to original order)

Please confirm:

  1. Does the original reverse.rst file contain any doctest markers or executable examples that should be preserved/migrated to this Markdown version?
  2. Are these examples (lines 18–22, 42–47, 67–72, 90–95, 115–120) tested as part of the doctest framework mentioned in the PR objectives, or are they documentation-only examples?

If doctests exist in the original RST, they should be included here with appropriate doctest identifiers in code blocks. If they do not exist, this comment can be disregarded.

docs/user/ppl/cmd/eval.md (1)

12-132: Documentation structure and examples are well-organized and comprehensive.

The eval command documentation is clearly structured with a concise description, syntax specification, five progressively complex examples (each with expected output), and appropriate limitations. The examples cover key use cases: field creation, field overriding, field composition, string concatenation, and type casting.

Code blocks correctly use the ppl fence without shell prefixes or extraneous output, aligning with the PR objective to enable clean copy-paste. Documentation is ready for automated export to the main OpenSearch docs website.

docs/user/ppl/cmd/rename.md (1)

1-141: Comprehensive and well-structured documentation.

The documentation is clear, covers the command syntax and behavior thoroughly, includes five well-chosen examples demonstrating single-field, multi-field, wildcard, and edge-case scenarios, and appropriately notes limitations. The structure and content quality are strong.

docs/user/ppl/cmd/grok.md (3)

86-86: Verify the link to parse.md exists and anchor is correct.

Line 86 references ./parse.md#Limitations. Confirm that the parse.md file exists in the same directory and has a ## Limitations heading to match the anchor.


16-20: Doctest coverage—verify integration per PR objectives.

The PR summary states "all existing doctest coverage migrated" but this file contains example code blocks without doctest markers. Confirm whether:

  • Doctests are being auto-detected from ppl code blocks via the new markdown_parser.py
  • Manual doctest annotations are required here
  • Coverage is handled separately elsewhere

Also applies to: 40-44, 64-68
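
If auto-detection is the intended mechanism, a minimal sketch of how plain ` ```ppl ` fenced blocks could be pulled out of a Markdown file for doctest execution is shown below. This is illustrative only; the function name `extract_ppl_blocks` is hypothetical and the real `markdown_parser.py` API may differ.

```python
import re
from pathlib import Path

# Match ```ppl fenced blocks; "```ppl ignore" will not match because only
# whitespace may follow the language tag before the newline.
FENCE_RE = re.compile(r"```ppl[ \t]*\n(.*?)\n```", re.DOTALL)

def extract_ppl_blocks(md_path: str) -> list[str]:
    """Return the body of every plain ```ppl fenced block in a Markdown file."""
    text = Path(md_path).read_text(encoding="utf-8")
    return [m.group(1).strip() for m in FENCE_RE.finditer(text)]

# e.g. queries = extract_ppl_blocks("docs/user/ppl/cmd/grok.md")
```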


1-86: Documentation structure and clarity—well done.

The migration preserves clear structure: description, syntax, three progressively complex examples (field creation, field override, log parsing), and appropriate output tables. Examples are instructive and cover the command's key use cases effectively.

docs/user/ppl/cmd/chart.md (3)

1-200: Documentation is comprehensive and well-structured.

The chart command documentation is clear, detailed, and thorough. Parameter descriptions include defaults, constraints, and interdependencies; seven end-to-end examples cover basic to advanced usage with expected outputs; and the notes section captures important field-handling behavior. The migration from RST to Markdown is complete, and the Markdown formatting is consistent throughout.

Minor observation: The nested bullet hierarchy under the limit parameter (lines 9–14) is dense; consider whether additional line breaks between sub-bullets would improve scannability, though readability is acceptable as-is.


19-19: Static analysis hint is a false positive.

The LanguageTool hint flagging line 19 for spelling appears to be a false positive; the flagged token `usenull` is an intentional parameter name, not a misspelling. No action needed.


8-8: Backslash escaping in syntax line requires verification.

The syntax definition on line 8 uses backslash escapes (\<, \>) for angle brackets. Verify whether these are required by the documentation parser (e.g., markdown_parser.py) or if they should be removed. Check how other similar syntax definitions in the documentation handle angle brackets for consistency.

docs/user/ppl/cmd/timechart.md (1)

38-38: Verify relative path to linked stats documentation.

The Markdown link on line 38 references [stats](stats.md), assuming stats.md exists in the same directory (docs/user/ppl/cmd/). Confirm this relative path is correct based on your actual file structure.

docs/user/ppl/cmd/stats.md (1)

1-499: Excellent documentation structure and examples.

The file provides comprehensive coverage of the stats command with 18 well-organized examples, clear parameter descriptions, and detailed expected outputs. Example 17 is particularly helpful for illustrating null-bucket behavior with explicit data tables. Code blocks are consistently marked with the correct ppl and ppl ignore language identifiers for syntax highlighting.

Please verify this documentation satisfies the checklist in docs/dev/ppl-commands.md (referenced in your learnings) to ensure PPL command documentation standards are met.

docs/user/ppl/cmd/bin.md (1)

41-469: Verify Markdown doctest parser compatibility with the bin.md examples.

The PR's Markdown parser implementation should correctly parse the ppl code blocks and text-based output format throughout this documentation file. Confirm that markdown_parser.py (or equivalent) handles the code block format and expected output validation as specified in the PR objectives.
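
As a rough illustration of the expected-output validation (not the actual implementation; the helper name `outputs_match` is hypothetical), a whitespace-tolerant comparison of a documented output table against the actual query result might look like this:

```python
def outputs_match(expected: str, actual: str) -> bool:
    """Compare an expected output table with the actual result, ignoring
    trailing spaces and blank lines so cosmetic differences don't fail."""
    def norm(s: str) -> list[str]:
        return [line.rstrip() for line in s.strip().splitlines() if line.strip()]
    return norm(expected) == norm(actual)
```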

docs/category.json (1)

10-61: LGTM on structure.

The ppl_cli_calcite entries are comprehensive and properly formatted. The documented duplicate search.md entry from prior review appears to have been resolved.

docs/user/ppl/cmd/append.md (1)

1-63: Documentation is clear and complete.

The append command documentation is well-organized with clear syntax, proper examples, and expected outputs. Code blocks are properly tagged and formatting is consistent.

docs/user/ppl/admin/connectors/prometheus_connector.md (1)

1-325: Authentication examples and connector properties are clear and comprehensive.

The Prometheus connector documentation provides thorough coverage of configuration options, authentication methods, PPL integration, and PromQL support. Examples and expected outputs are well-documented. Bare URLs have been properly converted to Markdown link syntax and JSON syntax is valid.

docs/dev/testing-doctest.md (1)

61-61: Capitalize "Markdown" as proper noun.

Line 61 uses lowercase "markdown" which should be "Markdown" to follow standard capitalization for the proper noun referring to the markup language.

-#### RST Format (SQL docs only. On Deprecation path. Use markdown for PPL)
+#### RST Format (SQL docs only. On Deprecation path. Use Markdown for PPL)
docs/user/ppl/cmd/flatten.md (1)

1-93: Documentation is well-structured and properly formatted.

The flatten command documentation is comprehensive, with clear examples and proper Markdown formatting. The explanation of lexicographical ordering is helpful, and the limitations section effectively guides users.

docs/user/ppl/functions/aggregations.md (3)

1-653: Content structure and formatting look good.

The migration from RST to Markdown appears well-executed with consistent documentation structure across all 21 aggregation functions. Each function includes clear descriptions, practical examples with both query and output blocks, parameter documentation (for functions with parameters), and relevant notes. Code blocks properly use language identifiers (ppl for queries, text for output tables), and the NULL/MISSING handling table at the start provides useful reference information.

Once the relative and external links are verified, this documentation file should be ready for publication.


634-634: Verify the relative link path and target anchor.

The link at line 634 references ../admin/settings.md#plugins-ppl-values-max-limit. Confirm that:

  1. The relative path correctly resolves from docs/user/ppl/functions/aggregations.md to the intended settings documentation file
  2. The anchor #plugins-ppl-values-max-limit exists in the target file

292-292: No action needed — OpenSearch documentation link is current and accessible.

The link to the OpenSearch Cardinality Aggregation documentation at docs.opensearch.org/latest/aggregations/metric/cardinality/ is current and contains the referenced precision control information. The documentation confirms the precision_threshold parameter details and HyperLogLog++ algorithm accuracy.

The `bin` command groups numeric values into buckets of equal intervals, making it useful for creating histograms and analyzing data distribution. It takes a numeric or time-based field and generates a new field with values that represent the lower bound of each bucket.
## Syntax

bin \<field\> [span=\<interval\>] [minspan=\<interval\>] [bins=\<count\>] [aligntime=(earliest \| latest \| \<time-specifier\>)] [start=\<value\>] [end=\<value\>]
Contributor

🛠️ Refactor suggestion | 🟠 Major

Remove escaped angle brackets (RST syntax artifacts in Markdown).

Lines 8 and 27 contain backslash-escaped angle brackets (\<field\>, \<interval\>, \<time-specifier\>) which are reStructuredText escaping conventions. In Markdown, these render as literal backslashes. Either remove the escaping or reformat using backticks or bold/italic for clarity.

Apply this diff to remove RST escaping:

-bin \<field\> [span=\<interval\>] [minspan=\<interval\>] [bins=\<count\>] [aligntime=(earliest \| latest \| \<time-specifier\>)] [start=\<value\>] [end=\<value\>]
+bin <field> [span=<interval>] [minspan=<interval>] [bins=<count>] [aligntime=(earliest | latest | <time-specifier>)] [start=<value>] [end=<value>]

Also update line 27:

-  * \<time-specifier\>: Align bins to a specific epoch time value or time modifier expression
+  * <time-specifier>: Align bins to a specific epoch time value or time modifier expression

Also applies to: 27-27

🤖 Prompt for AI Agents
In docs/user/ppl/cmd/bin.md around lines 8 and 27, remove the reStructuredText
backslash-escaped angle brackets (e.g. "\<field\>", "\<interval\>",
"\<time-specifier\>") which render incorrectly in Markdown; replace them with
either plain angle-bracket placeholders (e.g. <field>) or, preferably, inline
code formatting using backticks (e.g. `<field>`, `<interval>`,
`<time-specifier>`) so the placeholders render correctly and without literal
backslashes. Ensure both occurrences on lines 8 and 27 are updated consistently.

For detailed documentation of each function, see [Aggregation Functions](../functions/aggregations.md).
## Usage

Eventstats
Contributor

⚠️ Potential issue | 🟡 Minor

Remove orphaned text.

Line 62 contains just "Eventstats" standing alone between the "## Usage" header (line 60) and the code block (line 64). This appears to be stray text left from conversion; remove it:

 ## Usage  
 
-Eventstats
-  
 ```ppl ignore
🤖 Prompt for AI Agents
In docs/user/ppl/cmd/eventstats.md around line 62, there is an orphaned
standalone line containing only "Eventstats" between the "## Usage" header and
the following code block; remove that stray line so the header is immediately
followed by the code block (preserve surrounding blank lines/formatting as
needed).
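
For context on what this section documents, a minimal hypothetical eventstats query (index and fields assumed, not taken from the file) appends an aggregate to every event instead of collapsing rows the way `stats` does:

```ppl
// Hypothetical example: attach the overall average age to each document
source=accounts | eventstats avg(age) as avg_age | fields firstname, age, avg_age
```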

Comment on lines +77 to +88
```
source = outer | where a in [ source = inner | fields b ]
source = outer | where (a) in [ source = inner | fields b ]
source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]
source = outer | where a not in [ source = inner | fields b ]
source = outer | where (a) not in [ source = inner | fields b ]
source = outer | where (a,b,c) not in [ source = inner | fields d,e,f ]
source = outer a in [ source = inner | fields b ] // search filtering with subquery
source = outer a not in [ source = inner | fields b ] // search filtering with subquery)
source = outer | where a in [ source = inner1 | where b not in [ source = inner2 | fields c ] | fields b ] // nested
source = table1 | inner join left = l right = r on l.a = r.a AND r.a in [ source = inner | fields d ] | fields l.a, r.a, b, c //as join filter
```

⚠️ Potential issue | 🔴 Critical

Add language identifier to InSubquery code block.

The fenced code block at line 77 is missing a language specifier, which violates the MD040 linting rule. Add ppl as the language identifier for syntax highlighting.

 InSubquery:
   
-```
+```ppl
 source = outer | where a in [ source = inner | fields b ]
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/user/ppl/cmd/subquery.md around lines 77 to 88, the fenced code block
lacks a language specifier causing MD040 lint failures; update the opening fence
from ``` to ```ppl so the block is marked as PPL for syntax highlighting (ensure
you change the single opening fence that starts the block at line ~77).

Swiddis previously approved these changes Dec 9, 2025
@Swiddis Swiddis added the maintenance Improves code quality, but not the product label Dec 9, 2025
@ahkcs

ahkcs commented Dec 9, 2025

UT failure caused by test.com being down.
Fix: afcfbd9

Signed-off-by: Kyle Hounslow <[email protected]>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (3)
docs/user/ppl/admin/connectors/s3glue_connector.md (1)

19-34: List indentation issue persists despite being previously marked as addressed; it still needs to be fixed.

The nested list items still use inconsistent indentation (4 spaces and 8 spaces where 2 and 4 spaces are expected). Markdown list items should be indented by 2 spaces per nesting level, not 4 and 8. This was flagged in the previous review as addressed in commit f9b0c04, but the issue remains in the current code.

Apply this diff to fix the list indentation:

 * `glue.auth.type` [Required]  
-    * This parameters provides the authentication type information required for execution engine to connect to glue.  
-    * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
-        * `glue.auth.role_arn`  
+  * This parameters provides the authentication type information required for execution engine to connect to glue.  
+  * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
+    * `glue.auth.role_arn`  
 * `glue.indexstore.opensearch.*` [Required]  
-    * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
-    * `glue.indexstore.opensearch.uri` [Required]  
-    * `glue.indexstore.opensearch.auth` [Required]  
-        * Accepted values include ["noauth", "basicauth", "awssigv4"]  
-        * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
-        * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region`  and `glue.auth.role_arn`  
-    * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  
+  * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
+  * `glue.indexstore.opensearch.uri` [Required]  
+  * `glue.indexstore.opensearch.auth` [Required]  
+    * Accepted values include ["noauth", "basicauth", "awssigv4"]  
+    * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
+    * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region` and `glue.auth.role_arn`  
+  * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  
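
For orientation, the properties listed above would typically be supplied in a datasource creation request. The following is a sketch only; the REST endpoint, datasource name, role ARN, domain URI, and region are assumptions rather than values taken from the file under review.

```bash
# Sketch: create an S3 Glue datasource (endpoint and values are assumptions)
curl -XPOST "http://localhost:9200/_plugins/_query/_datasources" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "mys3",
    "connector": "s3glue",
    "properties": {
      "glue.auth.type": "iam_role",
      "glue.auth.role_arn": "arn:aws:iam::123456789012:role/example-glue-role",
      "glue.indexstore.opensearch.uri": "https://example-domain.example.com:9200",
      "glue.indexstore.opensearch.auth": "awssigv4",
      "glue.indexstore.opensearch.region": "us-east-1"
    }
  }'
```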
docs/user/ppl/admin/connectors/security_lake_connector.md (2)

21-33: List indentation issue persists despite being previously marked as addressed; it still needs to be fixed.

The nested list items still use inconsistent indentation (4 spaces and 8 spaces where 2 and 4 spaces are expected). Markdown list items should be indented by 2 spaces per nesting level. This was marked as addressed in previous commits but the issue persists.

Apply this diff to fix the list indentation:

 * `glue.auth.type` [Required]  
-    * This parameters provides the authentication type information required for execution engine to connect to glue.  
-    * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
-        * `glue.auth.role_arn`  
+  * This parameters provides the authentication type information required for execution engine to connect to glue.  
+  * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
+    * `glue.auth.role_arn`  
 * `glue.indexstore.opensearch.*` [Required]  
-    * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
-    * `glue.indexstore.opensearch.uri` [Required]  
-    * `glue.indexstore.opensearch.auth` [Required]  
-        * Accepted values include ["noauth", "basicauth", "awssigv4"]  
-        * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
-        * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region`  and `glue.auth.role_arn`  
-    * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  
+  * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
+  * `glue.indexstore.opensearch.uri` [Required]  
+  * `glue.indexstore.opensearch.auth` [Required]  
+    * Accepted values include ["noauth", "basicauth", "awssigv4"]  
+    * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
+    * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region` and `glue.auth.role_arn`  
+  * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]  

63-63: Wrap bare URL in Markdown link syntax.

Line 63 contains a bare URL that was marked as addressed in previous commits but persists. The URL should be wrapped in Markdown link format for proper rendering.

-Documentation for Index Queries: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md
+Documentation for Index Queries: [Index Queries Documentation](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md)
🧹 Nitpick comments (7)
README.md (1)

91-91: Minor: Remove redundant "Language" from "SQL Language Reference Manual".

The phrase "SQL Language Reference Manual" contains a tautology since SQL already stands for "Structured Query Language". Consider simplifying to "SQL Reference Manual" for conciseness.

-Please refer to the [SQL Language Reference Manual](./docs/user/index.rst), [Piped Processing Language (PPL) Reference Manual](./docs/user/ppl/index.md), [OpenSearch SQL/PPL Engine Development Manual](./docs/dev/index.md) and [Technical Documentation](https://opensearch.org/docs/latest/search-plugins/sql/index/) for detailed information on installing and configuring plugin.
+Please refer to the [SQL Reference Manual](./docs/user/index.rst), [Piped Processing Language (PPL) Reference Manual](./docs/user/ppl/index.md), [OpenSearch SQL/PPL Engine Development Manual](./docs/dev/index.md) and [Technical Documentation](https://opensearch.org/docs/latest/search-plugins/sql/index/) for detailed information on installing and configuring plugin.
docs/user/ppl/admin/connectors/s3glue_connector.md (2)

37-37: Use proper Markdown heading for section header.

Line 37 reads "Glue datasource configuration" which appears to be a section header but lacks the ## Markdown heading syntax. For consistency with lines 35, 69, and proper document structure, it should be a level-2 heading.

-Glue datasource configuration
+## Glue datasource configuration

71-71: Remove redundant text line.

Line 71 "Sample Queries" appears to be redundant since line 69 already has the section heading ## Sample s3Glue datasource queries APIS. The standalone text should be removed.

 ## Sample s3Glue datasource queries APIS  
 
-Sample Queries
 * Select Query : `select * from mys3.default.http_logs limit 1"`  
docs/user/ppl/admin/connectors/security_lake_connector.md (4)

37-37: Remove redundant text line.

Line 37 "Glue datasource configuration" appears to be redundant since line 35 already has the section heading ## Sample Glue dataSource configuration. The standalone text should be removed.

 ## Sample Glue dataSource configuration  
 
-Glue datasource configuration
 ```bash

56-56: Refine heading phrasing and fix acronym capitalization.

Line 56 has awkward phrasing ("datasource queries APIS") and should use proper acronym capitalization ("APIs" not "APIS"). Consider simplifying to "Sample Security Lake Queries" or "Example Queries".

-## Sample Security Lake datasource queries APIS  
+## Sample Security Lake Queries

58-58: Remove redundant text line.

Line 58 "Sample Queries" appears to be redundant since line 56 already has the section heading. The standalone text should be removed.

 ## Sample Security Lake datasource queries APIS  
 
-Sample Queries
 * Select Query : `select * from mysl.amazon_security_lake_glue_db_eu_west_1.amazon_security_lake_table_eu_west_1_vpc_flow_2_0 limit 1`  

62-62: Fix grammar/typo in sentence.

Line 62 has a possible grammar issue: "These queries would work only top of async queries" should likely be "These queries would work only on top of async queries" or "These queries would work only with async queries."

-These queries would work only top of async queries. Documentation: [Async Query APIs](../../../interfaces/asyncqueryinterface.rst)
+These queries would work only with async queries. Documentation: [Async Query APIs](../../../interfaces/asyncqueryinterface.rst)
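
As background for that sentence, submitting one of the sample queries through the async query interface might look roughly like the sketch below; the endpoint and request shape are assumptions based on the linked Async Query APIs document rather than values verified here.

```bash
# Sketch: run a Security Lake sample query via the async query API (endpoint assumed)
curl -XPOST "http://localhost:9200/_plugins/_async_query" \
  -H 'Content-Type: application/json' \
  -d '{
    "datasource": "mysl",
    "lang": "sql",
    "query": "select * from mysl.amazon_security_lake_glue_db_eu_west_1.amazon_security_lake_table_eu_west_1_vpc_flow_2_0 limit 1"
  }'
```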
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a2370e and 4246f39.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • docs/dev/intro-v3-engine.md (1 hunks)
  • docs/user/ppl/admin/connectors/s3glue_connector.md (1 hunks)
  • docs/user/ppl/admin/connectors/security_lake_connector.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/dev/intro-v3-engine.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
🪛 LanguageTool
README.md

[style] ~91-~91: This phrase is redundant (‘L’ stands for ‘language’). Use simply “SQL”.
Context: ... ## Documentation Please refer to the [SQL Language Reference Manual](./docs/user/index.rst...

(ACRONYM_TAUTOLOGY)

docs/user/ppl/admin/connectors/s3glue_connector.md

[locale-violation] ~16-~16: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

docs/user/ppl/admin/connectors/security_lake_connector.md

[locale-violation] ~17-~17: The phrase ‘in future’ is British English. Did you mean: “in the future”?
Context: ...etadata store. we will add more support in future. Glue Connector Properties. * `resultIn...

(IN_FUTURE)

🔇 Additional comments (2)
README.md (1)

91-91: Documentation link correctly migrated from RST to Markdown.

The PPL reference has been properly updated from index.rst to index.md, aligning with the PR's migration objective. The link is consistent with the converted documentation structure.

docs/user/ppl/admin/connectors/s3glue_connector.md (1)

76-76: Wrap bare URL in Markdown link syntax.

Line 76 contains a bare URL without Markdown link formatting. The URL should be wrapped in link syntax for proper rendering.

-Documentation for Index Queries: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md
+Documentation for Index Queries: [Index Queries Documentation](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md)

Likely an incorrect or invalid review comment.

@kylehounslow kylehounslow requested a review from Swiddis December 9, 2025 21:19
@Swiddis Swiddis merged commit 5f963a0 into opensearch-project:main Dec 9, 2025
37 checks passed
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
johtani added a commit to johtani/documentation-website that referenced this pull request Dec 17, 2025
PPL Plugin docs migrate from rst to md by 
opensearch-project/sql#4912

Signed-off-by: Jun Ohtani <[email protected]>
kolchfa-aws pushed a commit to opensearch-project/documentation-website that referenced this pull request Dec 17, 2025
PPL Plugin docs migrate from rst to md by 
opensearch-project/sql#4912

Signed-off-by: Jun Ohtani <[email protected]>
opensearch-trigger-bot bot pushed a commit to opensearch-project/documentation-website that referenced this pull request Dec 17, 2025
PPL Plugin docs migrate from rst to md by
opensearch-project/sql#4912

Signed-off-by: Jun Ohtani <[email protected]>
(cherry picked from commit c669d37)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>