Execution Engine
-
Query Stability
- Query Plan Manager: Enhance the robustness of the query plan generator to minimize plan instability.
- Data Skew Handling: Develop dynamic algorithms to detect and adjust for data skew, optimizing query execution.
- Cache Resilience: Implement smarter caching mechanisms to reduce query jitter when CNs (compute nodes) are added or removed.
-
Performance Tuning
- Operator Improvements: Introduce poller-free execution and runtime filter pushdown to the storage layer.
- History-Based Optimizer: Leverage query feedback to refine optimization strategies.
- ARM Performance Tuning: Resolve performance bottlenecks and edge cases for ARM architectures.
-
Query Optimizer
- Improve NDV (Number of Distinct Values) Accuracy: Enhance the precision of NDV statistical information.
- Improve Multi-Column Statistics Accuracy: Optimize the accuracy of statistics for multi-column data.
- Optimize Sampling Estimation Algorithms: Refine algorithms for estimating statistics through sampling.
- Column Property Propagation Refactoring.
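As a rough illustration of what sampling-based NDV estimation involves, the sketch below applies the GEE (Guaranteed-Error Estimator) formula to a uniform row sample. This is not necessarily the estimator StarRocks uses or plans to use; the function name `estimate_ndv_gee` and its parameters are hypothetical.

```python
import math
from collections import Counter


def estimate_ndv_gee(sample, total_rows):
    """Estimate the number of distinct values in a column from a uniform sample.

    Hypothetical helper: uses the GEE formula
        D = sqrt(N / n) * f1 + sum(f_j for j >= 2)
    where N is the table row count, n the sample size, and f_j the number
    of values that appear exactly j times in the sample.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    if total_rows < n:
        raise ValueError("total_rows must be at least the sample size")

    value_counts = Counter(sample)                 # value -> occurrences in sample
    freq_of_freq = Counter(value_counts.values())  # j -> number of values seen j times

    singletons = freq_of_freq.get(1, 0)
    repeated = sum(count for j, count in freq_of_freq.items() if j >= 2)

    estimate = math.sqrt(total_rows / n) * singletons + repeated

    # Clamp: at least the distinct values actually observed, at most one per row.
    return min(float(total_rows), max(float(len(value_counts)), estimate))
```

Values seen only once in the sample are scaled up because they are the ones most likely to stand in for many unseen values; values seen repeatedly are counted once each.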
-
Batch Processing
- Adaptive Concurrency: Dynamically adjust the number of concurrent tasks based on system load and resource availability.
- Query Queue and Spill Stability: Improve stability and efficiency for large-scale batch processing on 1000+ core clusters.
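One common way to adjust concurrency to load is additive-increase/multiplicative-decrease (AIMD). The sketch below assumes a single normalized load signal in the range 0 to 1; the class name, thresholds, and defaults are all hypothetical and do not describe StarRocks internals.

```python
class AdaptiveConcurrencyLimiter:
    """Hypothetical AIMD controller for the number of concurrent tasks."""

    def __init__(self, min_limit=1, max_limit=64, target_load=0.8, backoff=0.5):
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.target_load = target_load  # load above this counts as overload
        self.backoff = backoff          # multiplicative decrease factor
        self.limit = min_limit          # start conservatively

    def update(self, observed_load):
        """Feed one load sample (0.0 to 1.0) and return the new concurrency limit."""
        if observed_load > self.target_load:
            # Overloaded: cut the limit sharply to relieve pressure quickly.
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # Headroom available: probe upward one task at a time.
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

The asymmetry is deliberate: slow increases keep the system from overshooting, while fast decreases shorten the time spent in an overloaded state.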
-
Materialized Views
- Incremental MV Framework: Reduce full recomputation costs by enabling incremental updates for materialized views.
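To show why incremental maintenance avoids full recomputation, here is a minimal sketch that keeps a `SUM`/`COUNT` aggregate per group up to date from inserted and deleted base rows only. The class `IncrementalSumView` and its methods are hypothetical and are not the planned StarRocks framework.

```python
from collections import defaultdict


class IncrementalSumView:
    """Hypothetical view equivalent to:
    SELECT key, SUM(value), COUNT(*) FROM base GROUP BY key
    maintained from row-level deltas instead of rescanning the base table.
    """

    def __init__(self):
        self._sum = defaultdict(int)
        self._count = defaultdict(int)

    def apply_delta(self, inserted=(), deleted=()):
        """Apply one batch of base-table changes, given as (key, value) pairs."""
        for key, value in inserted:
            self._sum[key] += value
            self._count[key] += 1
        for key, value in deleted:
            if self._count.get(key, 0) == 0:
                raise KeyError(f"delete for unknown group: {key!r}")
            self._sum[key] -= value
            self._count[key] -= 1
            if self._count[key] == 0:
                # Last row of the group is gone, so the group leaves the view.
                del self._sum[key]
                del self._count[key]

    def rows(self):
        """Return the current view contents as {key: (sum, count)}."""
        return {key: (self._sum[key], self._count[key]) for key in self._count}
```

The cost of each refresh is proportional to the size of the delta rather than the size of the base table. Aggregates such as `MIN` and `MAX` are harder because a delete may require looking at the remaining rows of the group.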
-
Data Types
- New Data Types: Support advanced data types such as BigString and Datetime/Timestamp with time zone, and possibly Geo.
-
Functions
- Trino-Compatible Functions: Expand function compatibility with Trino (see #40894).
- Causal Inference: Introduce functions for causal analysis and inference.
- Others
LakeHouse
-
Iceberg as a Fully Featured LakeHouse
- Performant and Cost-Effective Query Engine: Enhance statistics collection, indexing, and materialized view support.
- Iceberg V3 Spec Compliance: Support for Variant, deletion vectors, geo types, and auth specifications.
- Full Operation Support: Enable DDL, DML, procedures, and seamless table migration.
- Compaction and Layout Optimization: Introduce compaction services and automatic layout arrangement.
-
Paimon as a Fully Featured Streaming LakeHouse
- Query: Metadata optimization, manifest cache, indexing, and point lookup optimization.
- Full Operation Support: Time travel, tag and branch management, DDL, DML...
- Paimon New Features: Variant type, views, materialized views, and incremental MV.
-
Other Open Lake Formats
For other formats, we will prioritize query performance improvements:
- Hudi: Enhance RLI (Record-Level Indexing), bloom filters on Parquet, and metadata table support.
- Delta Lake: Implement optimizations as needed based on user demand.
Shared Data
-
Make shared-data the default architecture. Focus on improving stability and real-time/search capabilities.
- Batch data ingestion stability issues
- Cost reduction for both batch and streaming data ingestion
-
Enhanced Functionality
- Time Travel and Snapshots: Improve support for time travel and snapshot functionality (see "Snapshot for shared-data", #53999).
- Merge Into: Enable efficient data merging operations.
- Hybrid Search: Improve mixed vector, full-text, and scalar search capabilities.
-
Real-Time Storage Engine
- Data Freshness: Improve data freshness with readable memtables.
- Compaction Optimization: Optimize compaction for time-series data.
- Better Pipe: Expand the use of Pipe for continuous data ingestion.
-
Multi-Statement Transactions
- Enhance multi-statement transactions to support DELETE and UPDATE and to handle transaction conflicts better.