Skip to content

[Improve][Connector-V2][starrocks-source] Support to specify warehouse for starrocks sink#10294

Open
LiJie20190102 wants to merge 1 commit into
apache:devfrom
LiJie20190102:starrocks_warehouse1
Open

[Improve][Connector-V2][starrocks-source] Support to specify warehouse for starrocks sink#10294
LiJie20190102 wants to merge 1 commit into
apache:devfrom
LiJie20190102:starrocks_warehouse1

Conversation

@LiJie20190102
Copy link
Copy Markdown
Contributor

Support to specify warehouse for starrocks sink

Purpose of this pull request

Reference StarRocks/starrocks-connector-for-apache-flink#423

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@DanielLeens
Copy link
Copy Markdown
Contributor

Hi @LiJie20190102, thanks for the focused StarRocks improvement. I reviewed this PR locally on branch seatunnel-review-10294, with PR head 5d45ca6ec22b45300ec2b0077817b1b4e240e850 and merge base 0df747401d3ea38622ca84aa032e5bc16717d52c.

What This PR Solves

  • User pain point: in StarRocks multi-warehouse / shared-data deployments, Stream Load may need to be routed to a specific warehouse.
  • Fix approach: the PR adds a warehouse_name sink option, stores it in SinkConfig, and maps it to the Stream Load HTTP header warehouse.
  • One sentence: this lets SeaTunnel StarRocks Sink explicitly choose the warehouse used by Stream Load.

Simple example: if the StarRocks cluster has default_warehouse and etl_warehouse, users can configure warehouse_name = "etl_warehouse" and the Stream Load request will include warehouse: etl_warehouse.

1. Code Change Review

1.1 Core Logic Analysis

The runtime path is:

User configures StarRocks Sink
  -> warehouse_name = "etl_warehouse"

Config parsing
  -> SinkConfig.of(ReadonlyConfig)
      -> config.get(StarRocksSinkOptions.WAREHOUSE_NAME) [SinkConfig.java:103]
      -> sinkConfig.setWarehouseName(...)

Write path
  -> StarRocksSinkWriter flush
      -> StarRocksSinkManager
          -> StarRocksStreamLoadVisitor.doStreamLoad(flushData) [StarRocksStreamLoadVisitor.java:71-140]
              -> builds /api/{db}/{table}/_stream_load URL [L78-85]
              -> getStreamLoadHttpHeader(label) [L268-299]
                  -> copies starrocks.config stream-load props [L284-287]
                  -> adds label / format / Authorization [L289-296]
                  -> if warehouseName != null, adds headerMap["warehouse"] [L297-298]
              -> httpHelper.doHttpPut(loadUrl, payload, headerMap)

Before:

headerMap.put("format", sinkConfig.getLoadFormat().name().toUpperCase());
headerMap.put(
        "Authorization",
        getBasicAuthHeader(sinkConfig.getUsername(), sinkConfig.getPassword()));
return headerMap;

After:

headerMap.put(
        "Authorization",
        getBasicAuthHeader(sinkConfig.getUsername(), sinkConfig.getPassword()));
Optional.ofNullable(sinkConfig.getWarehouseName())
        .ifPresent(warehouse -> headerMap.put("warehouse", warehouse));
return headerMap;

Key findings:

  • The normal StarRocks Sink Stream Load path does hit this change.
  • The mapping from SeaTunnel config warehouse_name to StarRocks HTTP header warehouse matches StarRocks Stream Load warehouse usage.
  • No runtime logic blocker was found.
  • The current blocker is merge/doc hygiene: the docs have moved on current dev, while this PR still changes the old paths.

1.2 Compatibility Impact

Judgment: fully compatible.

No public Java API, default value, protocol, or serialization format changes. If warehouse_name is not configured, no warehouse header is sent and the historical behavior is preserved.

1.3 Performance / Side Effects

The impact is negligible: one optional string read and one HTTP header entry when configured. Retry, label idempotency, checkpoint behavior, resource release, and existing StarRocks flush logic are unchanged.

1.4 Error Handling And Logs

The PR does not add new error handling. If StarRocks rejects an invalid warehouse, the existing Stream Load response handling will throw StarRocksConnectorException.

Issue 1: The PR conflicts with the current dev docs migration and updates the old docs paths

  • Location: docs/en/connector-v2/sink/StarRocks.md:57
  • Description: local git merge-tree reports removed in local for docs/en/connector-v2/sink/StarRocks.md and docs/zh/connector-v2/sink/StarRocks.md. GitHub reports mergeable=CONFLICTING and mergeStateStatus=DIRTY. Current dev has moved the StarRocks docs to docs/en/connectors/sink/StarRocks.md and docs/zh/connectors/sink/StarRocks.md.
  • Risk: the PR cannot be merged as-is, and the new warehouse_name option would be missing from the current published docs path.
  • Suggested fix: rebase on current dev, move the doc edits to the new docs/en/connectors/sink/StarRocks.md and docs/zh/connectors/sink/StarRocks.md files, and remove the old-path changes.
  • Severity: High

Issue 2: The added docs rows contain trailing whitespace

  • Location: docs/en/connector-v2/sink/StarRocks.md:57
  • Description: local git diff --check upstream/dev...HEAD reports trailing whitespace in both English and Chinese StarRocks sink docs.
  • Risk: formatting checks stay red after rebase unless this is cleaned up.
  • Suggested fix: remove the trailing spaces after moving the docs to the new paths.
  • Severity: Low

2. Code Quality

The Java implementation is small and consistent with the existing option/config/header pattern. The test config adds warehouse_name = "default_warehouse", and the historical Build is green, but a focused unit test for header construction would make the behavior easier to protect in the future.

3. Architecture

This is a precise, maintainable change. It keeps the SeaTunnel-facing option readable while mapping to the StarRocks Stream Load header at the boundary where HTTP requests are built.

4. Issue Summary

# Issue Location Severity
1 Docs conflict with current dev and old docs paths are updated docs/en/connector-v2/sink/StarRocks.md:57 High
2 Added docs rows contain trailing whitespace docs/en/connector-v2/sink/StarRocks.md:57 Low

5. Merge Decision

Conclusion: can merge after fixes

Blocking item:

  • Issue 1: please rebase and move the documentation change to the current docs paths.

Recommended non-blocking item:

  • Issue 2: clean up trailing whitespace.

Overall, the code-level change looks correct. After the docs conflict is resolved and formatting is cleaned up, this should be mergeable. A small header-construction regression test would be a nice extra safeguard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants