StarRocks Roadmap 2025

> Refer to the previous roadmaps: [2024](https://github.com/StarRocks/starrocks/issues/39686), [2023](https://github.com/StarRocks/starrocks/issues/16445), [2022](https://github.com/StarRocks/starrocks/issues/1244)

# Execution Engine
- Query Stability
  - [ ] Query Plan Manager: Enhance the robustness of the query plan generator to minimize plan instability.
  - [ ] Data Skew Handling: Develop dynamic algorithms to detect and adjust for data skew, optimizing query execution.
  - [ ] Cache Resilience: Implement smarter caching mechanisms to reduce query latency jitter during CN (compute node) changes.

- Performance Tuning
  - [ ] Operator Improvements: Introduce poller-free execution and runtime filter pushdown to the storage layer.
  - [ ] History-Based Optimizer: Leverage query feedback to refine optimization strategies.
  - [ ] ARM Performance Tuning: Resolve performance bottlenecks and edge cases for ARM architectures.

- Query Optimizer
  - [ ] Improve NDV (Number of Distinct Values) Accuracy: Enhance the precision of NDV statistical information.
  - [ ] Improve Multi-Column Statistics Accuracy: Optimize the accuracy of statistics for multi-column data.
  - [ ] Optimize Sampling Estimation Algorithms: Refine algorithms for estimating statistics through sampling.
  - [ ] Column Property Propagation Refactoring

- Batch Processing
  - [ ] Adaptive Concurrency: Dynamically adjust the number of concurrent tasks based on system load and resource availability.
  - [ ] Query Queue and Spill Stability: Improve stability and efficiency for large-scale batch processing on 1000+ core clusters.

- Materialized Views
  - [ ] Incremental MV Framework: Reduce full recomputation costs by enabling incremental updates for materialized views.

- Data Types
  - [ ] New Data Types: Support advanced data types such as BigString and Datetime/Timestamp with time zone, and possibly Geo.

- Functions
  - [ ] Trino-Compatible Functions: Expand function compatibility with Trino (see [#40894](https://github.com/StarRocks/starrocks/issues/40894)).
  - [ ] Causal Inference: Introduce functions for causal analysis and inference.
  - [ ] Others

# LakeHouse
- Iceberg as a Fully Featured LakeHouse
  - [ ] Performant and Cost-Effective Query Engine: Enhance statistics collection, indexing, and materialized view support.
  - [ ] Iceberg V3 Spec Compliance: Support for Variant, deletion vectors, geo types, and auth specifications.
  - [ ] Full Operation Support: Enable DDL, DML, procedures, and seamless table migration.
  - [ ] Compaction and Layout Optimization: Introduce compaction services and automatic layout arrangement.

- Paimon as a Fully Featured Streaming LakeHouse
  - [ ] Query: Metadata optimization, manifest cache, indexing, and point lookup optimization.
  - [ ] Full Operation Support: Time travel, tag and branch management, DDL, DML, and more.
  - [ ] Paimon New Features: Variant type, views, materialized views, and incremental MV.

- Other Open Lake Formats
  For other formats, we will prioritize query performance improvements:
  - [ ] Hudi: Enhance RLI (Record-Level Indexing), bloom filters on Parquet, and metadata table support.
  - [ ] Delta Lake: Implement optimizations as needed based on user demand.

# Shared Data
- Make shared-data the default architecture, focusing on stability and on improving real-time and search capabilities.
  - [ ] Resolve stability issues in batch data ingestion.
  - [ ] Reduce cost for both batch and streaming data ingestion.

- Enhanced Functionality
  - [ ] Time Travel and Snapshots: Improve support for time travel and snapshot functionality (see [#53999](https://github.com/StarRocks/starrocks/issues/53999)).
  - [ ] Merge Into: Enable efficient data merging operations.
  - [ ] Hybrid Search: Improve mixed vector, full-text, and scalar search capabilities.

- Real-Time Storage Engine
  - [ ] Data Freshness: Improve data freshness with readable memtables.
  - [ ] Compaction Optimization: Optimize compaction for time-series data.
  - [ ] Better Pipe: Expand the use of Pipe for continuous data ingestion.

- Multi-Statement Transactions
  - [ ] Extend multi-statement transactions to support DELETE and UPDATE, and improve transaction conflict handling.

StarRocks Roadmap 2025 #55526