Skip to content

Conversation

@penghuo
Copy link
Collaborator

@penghuo penghuo commented Jan 29, 2026

Description

Add validation to reject expand command on scalar types with a clear error message. This addresses issue #5065 where expand fails with a confusing "UNNEST argument must be a collection" error when used on OpenSearch multi-value fields.

Root Cause:
OpenSearch doesn't have an ARRAY type. Multi-value fields like [1, 2, 3] are stored as repeated scalar values (e.g., type long). When Calcite's uncollect operation is triggered during codegen, it expects ArraySqlType but receives BasicSqlType(BIGINT), causing a CalciteException.

Solution:
Validate the field type at planning time and fail fast with a descriptive error message explaining that expand only works on explicitly defined array types, not OpenSearch's implicit multi-value fields.

Impact:

  • Only affects expand command
  • No performance impact
  • Existing expand tests on true array types continue to work
  • Users get clear guidance instead of cryptic Calcite errors

Related Issues

Resolves #5065

Testing

  • Added integration test Issue5065IT that verifies the error message
  • All existing expand tests pass (CalciteExpandCommandIT)
  • Test command: ./gradlew :integ-test:integTest --tests "org.opensearch.sql.calcite.Issue5065IT"

Check List

  • New functionality includes testing
  • Commits are signed per the DCO using --signoff
  • Public documentation issue/PR created

Signed-off-by: Peng Huo <[email protected]>
…/index keywords

When a PPL query contains duplicate 'source' or 'index' keywords
(e.g., 'source source=index_name'), the parser was accepting it as
valid syntax, treating the first keyword as a search expression.
This caused confusing errors later when OpenSearch tried to expand
fields.

This fix adds validation in AstBuilder.visitSearchFrom() to detect
when reserved keywords 'source' or 'index' appear as search
expressions before the fromClause. It now throws a clear
SyntaxCheckException with a helpful error message suggesting the
correct syntax.

Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
…#5065)

- Validate field type in buildExpandRelNode before uncollect operation
- Throw UnsupportedOperationException with clear message for scalar types
- OpenSearch multi-value fields stored as scalars cannot be expanded when codegen triggered
- Add integration test to verify error message

Signed-off-by: Peng Huo <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 29, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Summary by CodeRabbit

Release Notes

Bug Fixes

  • Improved error handling for the expand operation in PPL queries. The system now properly validates array-typed fields before expansion and provides clearer error messages when attempting to expand incompatible field types.

Documentation

  • Added comprehensive guides and templates for PR review workflows, processes, and testing procedures.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Introduces comprehensive documentation for a multi-agent PR review system (PPL doctor and RCA-fix agents), adds a Python-based PR data collector tool with GitHub CLI integration, establishes standardized PR review workflows and checklists, and implements a runtime type validation in CalciteRelNodeVisitor to handle array-type fields correctly in expand operations.

Changes

Cohort / File(s) Summary
Agent Orchestration System
.kiro/agents/ppl-doctor.json, .kiro/agents/ppl-doctor.prompt.md, .kiro/agents/rca-fix-agent.json, .kiro/agents/rca-fix-agent.prompt.md, .kiro/agents/README.md
Configuration and prompt definitions for a two-tier agent system: PPL-doctor orchestrator that handles intake/reproduction/PR follow-up, delegating RCA and fixes to rca-fix-agent. Specifies tool permissions (GitHub, Slack MCP), resource access, model assignments (sonnet-4.5-1m, opus-4.5), and multi-stage workflow with validation gates.
PR Review Skills & Templates
.kiro/skills/opensearch-sql-pr-review/SKILL.md, .kiro/skills/opensearch-sql-pr-review/assets/templates/review-comment.md, .kiro/skills/opensearch-sql-pr-review/references/checklist-opensearch-sql.md, .kiro/skills/opensearch-sql-pr-review/references/review-patterns.md
Structured documentation for SQL PR reviews: skill definition with five-step workflow, domain-specific checklist (correctness, safety, integration, tests, docs), comment patterns, and review template.
PR Review Workflow Assets
.kiro/resources/pr-reviews/templates/session.md, .kiro/resources/pr-reviews/templates/test-plan.md, .kiro/resources/pr-reviews/templates/test-report.md
Markdown templates for PR review sessions, test plans (scenarios, metrics, exit criteria), and test reports with results summary and artifact logging.
PR Collector Tool
.kiro/scripts/pr_collector/pr_collector.py, .kiro/scripts/pr_collector/README.md
Python script that fetches GitHub PRs via gh CLI, aggregates PR metadata (reviews, comments, files), formats into markdown, and saves to .kiro/resources/{repo_name} directories. Includes error handling per-PR and support for date-range, state, and limit filtering.
Steering Documentation
.kiro/steering/opensearch-sql-pr-review.md, .kiro/steering/pr-collector.md
High-level workflow guides: opensearch-sql PR review process with triage, risk scan, integration testing, and deep review stages; pr-collector quick-start, usage examples, and troubleshooting.
Issue #5065 Bug Fix
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java, integ-test/src/test/java/org/opensearch/sql/calcite/Issue5065IT.java, integ-test/src/test/resources/correctness/bugfix/issue5065.txt, integ-test/src/test/resources/correctness/bugfix/issue5065_setup.txt
Adds runtime ARRAY-type validation in buildExpandRelNode to reject non-array field types with descriptive error; includes integration test covering expand on incorrectly mapped array field, setup resource, and query fixture.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant PPLDoctor as PPL Doctor<br/>(Orchestrator)
    participant GitHub as GitHub API
    participant RCAAgent as RCA-Fix Agent
    participant Slack as Slack Notification

    User->>PPLDoctor: Submit PPL issue intake
    PPLDoctor->>GitHub: Fetch issue/PR details
    GitHub-->>PPLDoctor: PR metadata, diff, comments
    PPLDoctor->>PPLDoctor: Intake validation & gate check
    
    alt Issue Reproducible
        PPLDoctor->>PPLDoctor: Run reproduction workflow
        PPLDoctor->>GitHub: Create/update test case
        PPLDoctor->>RCAAgent: Delegate RCA + fix request
        RCAAgent->>RCAAgent: Root cause analysis
        RCAAgent->>RCAAgent: Implement & verify fix
        RCAAgent-->>PPLDoctor: Return fix with artifacts
        PPLDoctor->>GitHub: Create/update PR with fix
        PPLDoctor->>Slack: Notify PR ready for review
    else Issue Not Reproducible
        PPLDoctor->>GitHub: Request clarification comment
        PPLDoctor->>Slack: Notify awaiting feedback
    end
    
    PPLDoctor-->>User: Final PR/review envelope
Loading
sequenceDiagram
    actor User
    participant CLI as PR Collector CLI
    participant GH as GitHub CLI
    participant Formatter as Markdown Formatter
    participant Disk as File System

    User->>CLI: pr_collector.py --repo owner/repo --start-date X --end-date Y
    CLI->>GH: gh pr list --repo --created-at range
    GH-->>CLI: PR list (numbers, metadata)
    
    loop For each PR
        CLI->>GH: gh pr view PR_NUM --json details
        GH-->>CLI: PR fields, URL, author, dates
        CLI->>GH: gh pr review-status PR_NUM
        GH-->>CLI: Reviews & review comments
        CLI->>GH: gh pr comments PR_NUM
        GH-->>CLI: Issue comments (filtered)
    end
    
    CLI->>CLI: Aggregate PR data list
    CLI->>Formatter: format_pr_markdown(pr_data)
    Formatter->>Formatter: Render headers, description, reviews
    Formatter->>Formatter: Filter out bot/CodeRabbit comments
    Formatter-->>CLI: Formatted markdown
    CLI->>Disk: save_pr_data(.kiro/resources/{repo_name})
    Disk-->>CLI: File saved + summary
    CLI-->>User: Output location & PR count
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Support spath with dynamic fields #5058 — Main PR's CalciteRelNodeVisitor validation for ARRAY types directly extends the PR's spath/dynamic-field handling changes to the same visitor.
  • Support nested aggregation when calcite enabled #4979 — Both PRs modify CalciteRelNodeVisitor; main PR adds array-type runtime validation in buildExpandRelNode while the related PR modifies aggregation hinting in the same class.
  • Feature/mvcombine #5025 — Both address multi-value/array field semantics in CalciteRelNodeVisitor; main adds array-type validation while the related PR handles mvcombine operations.

Suggested labels

bugFix, PPL, calcite

Suggested reviewers

  • anirudha
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • ykmr1224
  • LantaoJin
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

🧪 Unit Test Generation v2 is now available!

We have significantly improved our unit test generation capabilities.

To enable: Add this to your .coderabbit.yaml configuration:

reviews:
  finishing_touches:
    unit_tests:
      enabled: true

Try it out by using the @coderabbitai generate unit tests command on your code files or under ✨ Finishing Touches on the walkthrough!

Have feedback? Share your thoughts on our Discord thread!


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@penghuo penghuo closed this Jan 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Calcite PPL doesn't handle array value columns if codegen triggered

1 participant