@dai-chen (Collaborator) commented Jun 11, 2025

Description

This PR implements Option A: Modular Publishing from the Granularity section of the design. It introduces the ability to publish internal modules (e.g., core, ppl, opensearch) as separate Maven artifacts. This enables downstream consumers such as Spark to depend only on the components they require.

Current naming proposal (subject to feedback):

  • Group ID: org.opensearch.query — places all artifacts under a single query/ folder in the Maven repository.
  • Artifact ID: unified-query-<module>, e.g., unified-query-core, unified-query-ppl.
  • Versioning: Currently follows the OpenSearch version (e.g., 3.1.0.0-SNAPSHOT), but may be decoupled in the future — similar to async-query-core, which uses its own versioning (e.g., 1.0.0).
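As a rough illustration of the naming proposal above, the following sketch composes the Maven coordinate and the repository-relative directory for a given module. The function names are hypothetical and are not part of this PR; only the group ID, the `unified-query-<module>` pattern, and the sample version come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this PR) that spell out the proposed naming scheme.

GROUP_ID="org.opensearch.query"

# Print the Maven coordinate for a module,
# e.g. "ppl" -> org.opensearch.query:unified-query-ppl:<version>
unified_query_coordinate() {
  local module="$1" version="$2"
  printf '%s:unified-query-%s:%s\n' "$GROUP_ID" "$module" "$version"
}

# Print the directory, relative to a Maven repository root, that holds
# the given module version. Maven maps dots in the group ID to slashes.
unified_query_repo_path() {
  local module="$1" version="$2" group_path
  group_path="$(printf '%s' "$GROUP_ID" | tr '.' '/')"
  printf '%s/unified-query-%s/%s\n' "$group_path" "$module" "$version"
}

unified_query_coordinate ppl 3.1.0.0-SNAPSHOT
unified_query_repo_path ppl 3.1.0.0-SNAPSHOT
```

If versioning is later decoupled from the OpenSearch version, only the version argument would change under this scheme.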

Local Publishing Test

Example after running the local publish command:

$ ./gradlew publishUnifiedQueryPublicationToMavenLocal

$ pwd
~/.m2/repository/org/opensearch
$ tree .
.
├── plugin  <-- no impact on plugin publishing
│   └── opensearch-sql-plugin
│       └── 3.1.0.0-SNAPSHOT
└── query
    ├── unified-query-common
    │   └── 3.1.0.0-SNAPSHOT
    ├── unified-query-core
    │   └── 3.1.0.0-SNAPSHOT
    ├── unified-query-opensearch
    │   └── 3.1.0.0-SNAPSHOT
    ├── unified-query-ppl
    │   └── 3.1.0.0-SNAPSHOT
    ├── unified-query-protocol
    │   └── 3.1.0.0-SNAPSHOT
    └── unified-query-sql
        └── 3.1.0.0-SNAPSHOT
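The manual `tree` inspection above could also be scripted. This is a minimal sketch, assuming the six modules shown in the listing are the full expected set; the function name and the `MODULES` variable are hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of this PR): report which expected
# unified-query modules are missing from a Maven repository root.

# Module names taken from the tree output in the PR description.
MODULES="common core opensearch ppl protocol sql"

# Usage: missing_unified_query_modules <repo_root> <version>
# Prints one artifact ID per line for every module whose version
# directory does not exist; prints nothing when all are present.
missing_unified_query_modules() {
  local repo_root="$1" version="$2" module
  for module in $MODULES; do
    if [ ! -d "$repo_root/org/opensearch/query/unified-query-$module/$version" ]; then
      echo "unified-query-$module"
    fi
  done
}

# Example invocation against the local Maven repository:
#   missing_unified_query_modules "$HOME/.m2/repository" 3.1.0.0-SNAPSHOT
```

Empty output means every module directory for that version exists; it does not confirm that the JAR and POM files inside are valid.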

Spark Integration Test

Verified dependency resolution and query execution using the in-progress Calcite 2.19 backport PR and locally published Maven artifacts.

# Check out Calcite backport draft PR #3752 and apply this PR's changes
$ git fetch upstream pull/3752/head:pr-3752
$ git checkout pr-3752

# Build with JDK 11 and publish 2.19 artifacts locally
$ java -version
openjdk version "11.0.25" 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-11.0.25.9.1 (build 11.0.25+9-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.25.9.1 (build 11.0.25+9-LTS, mixed mode)
$ ./gradlew clean publishUnifiedQueryPublicationToMavenLocal
.
├── plugin
│   └── opensearch-sql-plugin
└── query
    ├── unified-query-common
    │   └── 2.19.3.0-SNAPSHOT
    ├── unified-query-core
    │   └── 2.19.3.0-SNAPSHOT
    ├── unified-query-opensearch
    │   └── 2.19.3.0-SNAPSHOT
    ├── unified-query-ppl
    │   └── 2.19.3.0-SNAPSHOT
    ├── unified-query-protocol
    │   └── 2.19.3.0-SNAPSHOT
    └── unified-query-sql

# Add dependency to Spark: https://github.com/dai-chen/opensearch-spark/tree/verify-unified-ppl-dependency
    resolvers ++= Seq(
      "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository",
      "OpenSearch Snapshots" at "https://aws.oss.sonatype.org/content/repositories/snapshots/",
      "JitPack" at "https://jitpack.io"
    ),
    libraryDependencies ++= Seq(
      "com.amazonaws" % "aws-java-sdk" % "1.12.397" % "provided"
        exclude ("com.fasterxml.jackson.core", "jackson-databind"),
      ...
      "org.opensearch.query" % "unified-query-ppl" % "2.19.3.0-SNAPSHOT"
        exclude("org.opensearch.query", "unified-query-protocol")
        exclude("org.opensearch.query", "unified-query-opensearch")),

# Run IT with new FlintSparkPPLCalciteParser registered to Spark extension
25/06/11 19:40:10 INFO FlintSparkPPLCalciteParser: 
 PPL => SparkSQL
   PPL query: source = spark_catalog.default.flint_ppl_test | eval f = GET_FORMAT(DATE, 'USA') | fields f
   SQL query: SELECT `GET_FORMAT`('DATE', 'USA') `f`
FROM `spark_catalog`.`default`.`flint_ppl_test`

Related Issues

Resolves part 1 in #3734.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this Jun 11, 2025
@dai-chen dai-chen added the infrastructure label (Changes to infrastructure, testing, CI/CD, pipelines, etc.) Jun 11, 2025
@LantaoJin (Member)

Why not change the Group ID to org.opensearch.unified-query?

@Swiddis (Collaborator) commented Jun 12, 2025

> Why not change the Group ID to org.opensearch.unified-query?

I like being able to find the repo corresponding to a given Sonatype package as a 1-1 mapping. sql hasn't been strictly sql for a while, so I think it's fine to stay consistent.

@dai-chen (Collaborator, Author) commented Jun 13, 2025

@LantaoJin @Swiddis I decided to go with query to align with existing usage in our plugin. Thanks!

@peterzhuamazon Just checking whether there are any concerns or procedures we should follow to publish multiple JARs from the SQL plugin to the snapshot repository (https://aws.oss.sonatype.org/content/repositories/snapshots/). The motivation for this can be found in the related issue above.

@dai-chen dai-chen merged commit 4f3cbc5 into opensearch-project:feature/unified-ppl Jun 16, 2025
20 checks passed
@dai-chen dai-chen deleted the publish-internal-modules-separately branch June 16, 2025 17:26
dai-chen added a commit to dai-chen/sql-1 that referenced this pull request Jun 24, 2025
…project#3763)

* Add common Gradle task for all published modules

Signed-off-by: Chen Dai <[email protected]>

* Add publish workflow file

Signed-off-by: Chen Dai <[email protected]>

* Rename group ID from sql to query

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>