StarRocks Roadmap 2025 #55526

@Dshadowzh

Refer to the previous roadmaps: 2024, 2023, 2022.

Execution Engine

  • Query Stability

    • Query Plan Manager: Enhance the robustness of the query plan generator to minimize plan instability.
    • Data Skew Handling: Develop dynamic algorithms to detect and adjust for data skew, optimizing query execution.
    • Cache Resilience: Implement smarter caching mechanisms to reduce query jitter during CN (Compute Node) changes.
  • Performance Tuning

    • Operator Improvements: Introduce poller-free execution and runtime filter pushdown to the storage layer.
    • History-Based Optimizer: Leverage query feedback to refine optimization strategies.
    • ARM Performance Tuning: Resolve performance bottlenecks and edge cases for ARM architectures.
  • Query Optimizer

    • Improve NDV (Number of Distinct Values) Accuracy: Enhance the precision of NDV statistics.
    • Improve Multi-Column Statistics Accuracy: Improve the accuracy of statistics collected across multiple columns.
    • Optimize Sampling Estimation Algorithms: Refine algorithms for estimating statistics through sampling.
    • Column Property Propagation Refactoring.
  • Batch Processing

    • Adaptive Concurrency: Dynamically adjust the number of concurrent tasks based on system load and resource availability.
    • Query Queue and Spill Stability: Improve stability and efficiency for large-scale batch processing on 1000+ core clusters.
  • Materialized Views

    • Incremental MV Framework: Reduce full recomputation costs by enabling incremental updates for materialized views.
  • Data Types

    • New Data Types: Support advanced data types such as BigString, Datetime/Timestamp with time zone, and possibly Geo types.
  • Functions

    • Trino-Compatible Functions: Expand function compatibility with Trino (see #40894).
    • Causal Inference: Introduce functions for causal analysis and inference.
    • Others

LakeHouse

  • Iceberg as a Fully Featured LakeHouse

    • Performant and Cost-Effective Query Engine: Enhance statistics collection, indexing, and materialized view support.
    • Iceberg V3 Spec Compliance: Support for Variant, deletion vectors, geo types, and auth specifications.
    • Full Operation Support: Enable DDL, DML, procedures, and seamless table migration.
    • Compaction and Layout Optimization: Introduce compaction services and automatic layout arrangement.
  • Paimon as a Fully Featured Streaming LakeHouse

    • Query: Metadata optimization, manifest caching, indexing, and point-lookup optimization.
    • Full Operation Support: Time travel, tag and branch management, DDL, DML, etc.
    • New Paimon Features: Variant type, views, materialized views, and incremental MVs.
  • Other Open Lake Formats
    For other formats, we will prioritize query performance improvements:

    • Hudi: Enhance RLI (Record-Level Indexing), bloom filters on Parquet, and metadata table support.
    • Delta Lake: Implement optimizations as needed based on user demand.

Shared Data

  • Make shared-data the default architecture, focusing on stability and on improving real-time and search capabilities.

    • Resolve stability issues in batch data ingestion.
    • Reduce costs for both batch and streaming data ingestion.
  • Enhanced Functionality

    • Time Travel and Snapshots: Improve support for time travel and snapshot functionality (see Snapshot for shared-data #53999).
    • Merge Into: Enable efficient data merging operations.
    • Hybrid Search: Improve mixed vector, full-text, and scalar search capabilities.
  • Real-Time Storage Engine

    • Data Freshness: Improve data freshness with readable memtables.
    • Compaction Optimization: Optimize compaction for time-series data.
    • Better Pipe: Expand the use of Pipe for continuous data ingestion.
  • Multi-Statement Transactions

    • Extend multi-statement transactions to support DELETE and UPDATE, and improve transaction conflict handling.
