
Releases: apache/druid

Druid 32.0.0

13 Feb 08:00

Apache Druid 32.0.0 contains over 220 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 52 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the incompatible changes before you upgrade to Druid 32.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Important features

This section contains important information about new and existing features.

# New Overlord APIs

APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service:

  • Mark all (non-overshadowed) segments of a datasource as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}

  • Mark all segments of a datasource as unused:
    DELETE /druid/indexer/v1/datasources/{dataSourceName}

  • Mark multiple segments as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed

  • Mark multiple (non-overshadowed) segments as unused:
    POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused

  • Mark a single segment as used:
    POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}

  • Mark a single segment as unused:
    DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}

As part of this change, the corresponding Coordinator APIs have been deprecated and will be removed in a future release:

  • POST /druid/coordinator/v1/datasources/{dataSourceName}
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/markUsed
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/markUnused
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
  • DELETE /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
  • DELETE /druid/coordinator/v1/datasources/{dataSourceName}

The Coordinator now calls the Overlord to serve these requests.

#17545
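The per-segment endpoints above can be called with nothing more than the Python standard library. The sketch below only builds `urllib.request.Request` objects and sends nothing. It assumes an Overlord reachable at `http://localhost:8090`, uses hypothetical helper names, and assumes a `{"segmentIds": [...]}` body for `markUsed`/`markUnused`; check the Druid data management API reference for the payloads your version accepts.

```python
import json
import urllib.parse
import urllib.request

# Assumed Overlord address for illustration; replace with your own.
OVERLORD_URL = "http://localhost:8090"


def _datasource_path(datasource: str) -> str:
    # Percent-encode the datasource name so it is safe to embed in a URL path.
    return "/druid/indexer/v1/datasources/" + urllib.parse.quote(datasource, safe="")


def mark_segment(datasource: str, segment_id: str, used: bool,
                 base_url: str = OVERLORD_URL) -> urllib.request.Request:
    """Build a request that marks one segment as used (POST) or unused (DELETE)."""
    # Segment IDs contain characters such as ':' that must be percent-encoded.
    url = (base_url + _datasource_path(datasource)
           + "/segments/" + urllib.parse.quote(segment_id, safe=""))
    return urllib.request.Request(url, method="POST" if used else "DELETE")


def mark_segments(datasource: str, segment_ids: list[str], used: bool,
                  base_url: str = OVERLORD_URL) -> urllib.request.Request:
    """Build a request for the markUsed / markUnused endpoints."""
    action = "markUsed" if used else "markUnused"
    # Assumed payload shape: a JSON object holding a "segmentIds" list.
    body = json.dumps({"segmentIds": list(segment_ids)}).encode("utf-8")
    return urllib.request.Request(
        base_url + _datasource_path(datasource) + "/" + action,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing a built request to `urllib.request.urlopen` would send it; add whatever authentication headers your cluster requires.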

# Realtime query processing for multi-value strings

Realtime query processing no longer considers all strings as multi-value strings during expression processing, fixing a number of bugs and unexpected failures. This should also improve realtime query performance of expressions on string columns.

This change impacts topN queries for realtime segments where rows of data are implicitly null, such as from a property missing from a JSON object.

Before this change, these values were handled as [] instead of null, leading to inconsistency between processing realtime segments and published segments. When processing realtime segments, the value was treated as [], which topN ignores. After publishing, the value became null, which topN does not ignore. As a result, the same query could return different results before and after the data was persisted.

After this change, the topN engine now treats [] as null when processing realtime segments, which is consistent with published segments.

This change doesn't impact actual multi-value string columns, regardless of whether they're realtime.

#17386
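A toy model can make the topN inconsistency easier to see. This Python sketch is not Druid code: it simply encodes the rules stated above, namely that a missing JSON property is implicitly null, that realtime processing previously surfaced it as [], and that topN skips [] but groups null as its own value. The function names are hypothetical.

```python
from collections import Counter

# Toy model only, not Druid code: it mimics the rules described above,
# where topN skips [] but groups null (None) as its own value.


def realtime_value(row: dict, dimension: str, legacy: bool):
    """Return the dimension value as realtime processing would surface it."""
    value = row.get(dimension)  # a missing JSON property is implicitly null
    if value is None and legacy:
        return []  # before the fix: the implicit null surfaced as []
    return value


def top_n(values, n: int):
    """Count values, skipping [] the way the release note says topN does."""
    counts = Counter(v for v in values if v != [])
    return counts.most_common(n)
```

With `legacy=True`, rows that lack the dimension vanish from the result; with `legacy=False`, they form a null group, which is what a published segment returns.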

# Join hints in MSQ task engine queries

Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This lets a query specify, for each join, which join type to use. Join hints recursively affect subqueries.

#17541

# Changes and deprecations

# ANSI-SQL compatibility and query results

Support has been removed for the configs that let you keep older behavior that wasn't ANSI-SQL compliant:

  • druid.generic.useDefaultValueForNull=true
  • druid.expressions.useStrictBooleans=false
  • druid.generic.useThreeValueLogicForNativeFilters=false

They no longer affect your query results. Only SQL-compliant non-legacy behavior is supported now.

If the configs are set to the legacy behavior, Druid services will fail to start.

If you relied on the legacy behavior and want to continue to get the same results, you must update your queries; otherwise, your results will be incorrect after you upgrade.

For more information about how to update your queries, see the migration guide.

#17568 #17609
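Because services now fail to start when any of these configs selects the legacy behavior, it can be worth scanning your configuration before upgrading. This is a minimal Python sketch that assumes your settings are in plain `key=value` properties text (for example, `common.runtime.properties`). The function name is hypothetical, and the parser deliberately ignores other `.properties` syntax such as `key: value` separators and line continuations.

```python
# The three legacy settings named above, mapped to the value that selects
# the removed (non-ANSI) behavior.
LEGACY_SETTINGS = {
    "druid.generic.useDefaultValueForNull": "true",
    "druid.expressions.useStrictBooleans": "false",
    "druid.generic.useThreeValueLogicForNativeFilters": "false",
}


def find_legacy_settings(properties_text: str) -> list[str]:
    """Return the legacy settings that are explicitly enabled in properties text.

    Simplified parser: handles `key=value` lines and `#` / `!` comments only.
    """
    found = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key = key.strip()
        if LEGACY_SETTINGS.get(key) == value.strip().lower():
            found.append(key)
    return found
```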

# Java support

Java support in Druid has been updated:

  • Java 8 support has been removed
  • Java 11 support is deprecated

We recommend that you upgrade to Java 17.

#17466

# Hadoop-based ingestion

Hadoop-based ingestion is now deprecated. We recommend that you migrate to SQL-based ingestion.

# Join hints in MSQ task engine queries

The following example query requests a sort-merge join for the outer join and a broadcast join for the join inside the subquery:

select /*+ sort_merge */ w1.cityName, w2.countryName
from
(
  select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN "wikipedia-set2" w4 ON w3.regionName = w4.regionName
) w1
JOIN "wikipedia-set1" w2 ON w1.cityName = w2.cityName
where w1.cityName = 'New York';

(#17406)

# Functional area and related changes

This section contains detailed release notes separated by areas.

# Web console

# Explore view (experimental)

Several improvements have been made to the Explore view in the web console.

#17627

# Segment timeline view

The segment timeline is now more interactive and no longer forces day granularity.

#17521

# Other web conso...


Druid 31.0.1

25 Dec 05:42

Apache Druid 31.0.1 is a patch release that contains important fixes for topN queries that use a query granularity other than 'ALL' and for the new complex metric column compression feature introduced in Druid 31.0.0. It also fixes issues in the web console and the new projections feature, and it fixes a minor performance regression.

See the complete set of changes for 31.0.1 for additional details.

For information about new features in Druid 31, see the Druid 31 release notes.

# Bug fixes

  • Fixes an issue with topN queries that use a query granularity other than 'ALL', which could cause some query correctness issues #17565
  • Fixes an issue with complex metric compression that caused some data to be read incorrectly, resulting in segment data corruption or system instability due to out-of-memory exceptions. We recommend that you reingest data if you use compression for complex metric columns #17422
  • Fixes an issue with projection segment merging #17460
  • Fixes web console progress indicator #17334
  • Fixes a minor performance regression with query processing #17397

# Credits

@clintropolis
@findingrish
@gianm
@techdocsmith
@vogievetsky

Druid 31.0.0

22 Oct 18:15

Apache Druid 31.0.0 contains over 589 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 64 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 31.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Important features, changes, and deprecations

This section contains important information about new and existing features.

# Compaction features

Druid now supports the following features:

  • Compaction scheduler with greater flexibility and control over when and what to compact.
  • MSQ task engine-based auto-compaction for more performant compaction jobs.

For more information, see Compaction supervisors.

#16291 #16768

Additionally, compaction tasks that take advantage of concurrent append and replace are now generally available, as part of concurrent append and replace becoming GA.

# Window functions are GA

Window functions are now generally available in Druid's native engine and in the MSQ task engine.

  • You no longer need to use the query context enableWindowing to use window functions. #17087

# Concurrent append and replace GA

Concurrent append and replace is now GA. The feature safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (such as with streaming ingestion) to an interval while compaction of that interval is already in progress.

# Delta Lake improvements

The community extension for Delta Lake has been improved to support complex types and snapshot versions.

# Iceberg improvements

The community extension for Iceberg has been improved. For more information, see Iceberg improvements.

# Projections (experimental)

Druid 31.0.0 includes experimental support for a new feature called projections. Projections are grouped pre-aggregates of a segment that Druid automatically uses at query time to optimize execution for any queries that 'fit' the shape of the projection. They reduce both computation and I/O cost by reducing the number of rows that need to be processed. Projections are contained within segments of a datasource and do increase the segment size, but they can share data, such as value dictionaries of dictionary-encoded columns, with the columns of the base segment.

Projections currently only support JSON-based ingestion, but they can be used by queries that use the MSQ task engine or the new Dart engine. Future development will allow projections to be created as part of SQL-based ingestion.

We have a lot of plans to continue to improve this feature in the coming releases, but are excited to get it out there so users can begin experimentation since projections can dramatically improve query performance.

For more information, see Projections.

# Low latency high complexity queries using Dart (experimental)

Distributed Asynchronous Runtime Topology (Dart) is designed to support high complexity queries, such as large joins, high cardinality group by, subqueries and common table expressions, commonly found in ad-hoc, data warehouse workloads. Instead of using data warehouse engines like Spark or Presto to execute high-complexity queries, you can use Dart, alleviating the need for additional infrastructure.

For more information, see Dart.

#17140

# Storage improvements

Druid 31.0.0 includes several improvements to how data is stored by Druid, including compressed columns and flexible segment sorting. For more information, see Storage improvements.

# Upgrade-related changes

See the Upgrade notes for more information about the following upgrade-related changes:

# Deprecations

# Java 8 support

Java 8 support is now deprecated and will be removed in 32.0.0.

# Other deprecations

# Functional areas and related changes

This section contains detailed release notes separated by areas.

# Web console

# Improvements to the stages display

A number of improvements have been made to the query stages visualization.
These changes include:

  • Added a graph visualization to illustrate the flow of query stages #17135
  • Added a column for CPU counters in the query stages detail view when they are present. Also added tooltips to expose potentially hidden data like CPU time #17132

# Dart

Added the ability to detect the presence of the Dart engine and to run Dart queries from the console as well as to see currently running Dart queries.

#17147



druid-30.0.1

17 Sep 17:10

The Apache Druid team is proud to announce the release of Apache Druid 30.0.1.
Druid is a high performance analytics data store for event-driven data.

Apache Druid 30.0.1 contains security fixes for CVE-2024-45384 and CVE-2024-45537.
The release also contains minor doc and task monitor fixes.

Source and binary distributions can be downloaded from:
https://druid.apache.org/downloads.html

Full Changelog: druid-30.0.0...druid-30.0.1

A big thank you to all the contributors in this milestone release!

Druid 30.0.0

17 Jun 03:03

Apache Druid 30.0.0 contains over 407 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 50 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 30.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Upcoming removals

As part of the continued improvements to Druid, we are deprecating certain features and behaviors in favor of newer iterations that offer more robust features and are more aligned with standard ANSI SQL. Many of these new features have been the default for new deployments for several releases.

The following features are deprecated, and we currently plan to remove support in Druid 32.0.0:

  • Non-SQL compliant null handling: By default, Druid now differentiates between an empty string and a record with no data as well as between an empty numerical record and 0. For more information, see NULL values. For a tutorial on the SQL-compliant logic, see the Null handling tutorial.
  • Non-strict Boolean handling: Druid now strictly uses 1 (true) or 0 (false). Previously, true and false could be represented either as true and false or as 1 and 0, respectively. In addition, Druid now returns a null value for Boolean comparisons like True && NULL. For more information, see Boolean logic. For examples of filters that use the SQL-compliant logic, see Query filters.
  • Two-value logic: By default, Druid now uses three-valued logic for both ingestion and querying. This primarily affects filters using logical NOT operations on columns with NULL values. For more information, see Boolean logic. For examples of filters that use the SQL-compliant logic, see Query filters.

# Important features, changes, and deprecations

This section contains important information about new and existing features.

# Concurrent append and replace improvements

Streaming ingestion supervisors now support concurrent append; that is, streaming tasks can run concurrently with a replace task (compaction or re-indexing) if the replace task also uses concurrent locks. Set the context parameter useConcurrentLocks to true to enable concurrent append.

Once you update the supervisor to have "useConcurrentLocks": true, the transition to concurrent append happens seamlessly without causing any ingestion lag or task failures.

#16369
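As a sketch of the supervisor change described above, this Python helper adds `"useConcurrentLocks": true` to a supervisor spec's `context` without modifying the original dict. It assumes the `context` object sits at the top level of the supervisor spec, and the helper name is hypothetical.

```python
import copy


def with_concurrent_append(supervisor_spec: dict) -> dict:
    """Return a copy of a supervisor spec with useConcurrentLocks set to true."""
    updated = copy.deepcopy(supervisor_spec)  # leave the caller's spec untouched
    # Create the context object if it is missing, keeping any existing keys.
    updated.setdefault("context", {})["useConcurrentLocks"] = True
    return updated
```

You would then resubmit the updated spec to the Overlord, for example with `POST /druid/indexer/v1/supervisor`.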

Druid now performs active cleanup of stale pending segments by tracking the set of tasks using such pending segments.
This allows concurrent append and replace to upgrade only a minimal set of pending segments and thus improve performance and eliminate errors.
Additionally, it helps in reducing load on the metadata store.

#16144

# Grouping on complex columns

Druid now supports grouping on complex columns and nested arrays.
This means that both native queries and the MSQ task engine can group on complex columns and nested arrays while returning results.

Additionally, the MSQ task engine can roll up and sort on the supported complex columns, such as JSON columns, during ingestion.

#16068
#16322

# Removed ZooKeeper-based segment loading

ZooKeeper-based segment loading is being removed due to known issues.
It has been deprecated for several releases.
Recent improvements to the Druid Coordinator have significantly enhanced performance with HTTP-based segment loading.

#15705

# Improved groupBy queries

Before Druid pushes realtime segments to deep storage, the segments consist of spill files.
Segment metrics such as query/segment/time now report on each spill file for a realtime segment, rather than for the entire segment.
This change eliminates the need to materialize results on the heap, which improves the performance of groupBy queries.

#15757

# Improved AND filter performance

Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan based on selectivity of other filters.
Known as filter partitioning, it can result in dramatic performance increases, depending on the order of filters in the query.

For example, take a query like SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'. Previously, Druid used indexes when processing filters if they were available.
That's not always ideal; imagine if stringColumn1 = '1000' matches 100 rows. With indexes, we have to find every value of stringColumn2 LIKE '%1%' that is true to compute the indexes for the filter. If stringColumn2 has more than 100 values, it ends up being worse than simply checking for a match in those 100 remaining rows.

With the new logic, Druid now checks the selectivity of indexes as it processes each clause of the AND filter.
If it determines it would take more work to compute the index than to match the remaining rows, Druid skips computing the index.

The order you write filters in a WHERE clause of a query can improve the performance of your query.
More improvements are coming, but you can try out the existing improvements by reordering a query.
Put filters whose indexes are less intensive to compute, such as IS NULL, =, and comparisons (>, >=, <, and <=), near the start of AND filters so that Druid processes your queries more efficiently.
Not ordering your filters in this way won’t degrade performance from previous releases since the fallback behavior is what Druid did previously.

#15838
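The decision described above can be sketched as a toy cost model in Python. It is not Druid's implementation: each clause carries a made-up `index_cost` (roughly, how many dictionary values its index would have to examine), and the model uses an index only while that cost does not exceed the number of rows still in play. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Clause:
    name: str
    index_cost: int  # toy cost: dictionary values the index would have to examine
    matches: Callable[[dict], bool]


def evaluate_and(rows: list[dict], clauses: list[Clause]):
    """Evaluate an AND filter clause by clause, choosing index vs. row matching.

    Both strategies select the same rows; only the recorded plan differs.
    """
    remaining = list(range(len(rows)))  # offsets of rows that still match
    plan = []
    for clause in clauses:
        # Use the index only if it is no more work than checking the remaining rows.
        strategy = "index" if clause.index_cost <= len(remaining) else "match"
        plan.append((clause.name, strategy))
        remaining = [i for i in remaining if clause.matches(rows[i])]
    return remaining, plan
```

Listing a cheap equality clause first lets it narrow the candidate rows, so an expensive `LIKE` clause later in the list switches to row matching; putting the `LIKE` clause first can make the model pay for its index up front.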

# Centralized datasource schema (alpha)

You can now configure Druid to manage datasource schema centrally on the Coordinator.
Previously, Brokers...


druid-29.0.1

03 Apr 05:02

Druid 29.0.1

Apache Druid 29.0.1 is a patch release that fixes some issues in the Druid 29.0.0 release.

Bug fixes

  • Added type verification for INSERT and REPLACE to validate that strings and string arrays aren't mixed #15920
  • Concurrent replace now allows pending Peon segments to be upgraded using the Supervisor #15995
  • Changed the targetDataSource attribute to return a string containing the name of the datasource. This reverts the breaking change introduced in Druid 29.0.0 for INSERT and REPLACE MSQ queries #16004 #16031
  • Decreased the size of the distribution Docker image #15968
  • Fixed an issue with SQL-based ingestion where string inputs, such as from CSV, TSV, or string-value fields in JSON, are ingested as null values when they are typed as LONG or BIGINT #15999
  • Fixed an issue where a web console-generated Kafka supervisor spec has flattenSpec in the wrong location #15946
  • Fixed an issue with filters on expression virtual column indexes incorrectly considering values null in some cases for expressions which translate null values into not null values #15959
  • Fixed an issue where the data loader crashes if the incoming data can't be parsed #15983
  • Improved DOUBLE type detection in the web console #15998
  • Web console-generated queries now only set the context parameter arrayIngestMode to array when you explicitly opt in to use arrays #15927
  • The web console now displays the results of an MSQ query that writes to an external destination through the EXTERN function #15969

Incompatible changes

Changes to targetDataSource in EXPLAIN queries

Druid 29.0.1 includes a breaking change that restores the behavior of targetDataSource to its 28.0.0 and earlier state, which differs only from Druid 29.0.0. In 29.0.0, targetDataSource returns a JSON object that includes the datasource name. In all other versions, targetDataSource returns a string containing the name of the datasource.

If you're upgrading from any version other than 29.0.0, there is no change in behavior.

If you are upgrading from 29.0.0, this is an incompatible change.

#16004

Dependency updates

  • Updated PostgreSQL JDBC Driver version to 42.7.2 #15931

Credits

@abhishekagarwal87
@adarshsanjeev
@AmatyaAvadhanula
@clintropolis
@cryptoe
@dependabot[bot]
@ektravel
@gargvishesh
@gianm
@kgyrtkirk
@LakshSingla
@somu-imply
@techdocsmith
@vogievetsky

Druid 29.0.0

21 Feb 05:51

Apache Druid 29.0.0 contains over 350 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 67 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes before you upgrade to Druid 29.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.

# Important features, changes, and deprecations

This section contains important information about new and existing features.

# MSQ export statements (experimental)

Druid 29.0.0 adds experimental support for export statements to the MSQ task engine. This allows query tasks to write data to an external destination through the EXTERN function.

#15689

# SQL PIVOT and UNPIVOT (experimental)

Druid 29.0.0 adds experimental support for the SQL PIVOT and UNPIVOT operators.

The PIVOT operator carries out an aggregation and transforms rows into columns in the output. The following is the general syntax for the PIVOT operator:

PIVOT (aggregation_function(column_to_aggregate)
  FOR column_with_values_to_pivot
  IN (pivoted_column1 [, pivoted_column2 ...])
)

The UNPIVOT operator transforms existing column values into rows. The following is the general syntax for the UNPIVOT operator:

UNPIVOT (values_column 
  FOR names_column
  IN (unpivoted_column1 [, unpivoted_column2 ... ])
)
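To make the two operators concrete, here is a conceptual Python model of what they compute on a small in-memory table. It is not Druid's implementation, and the function names are hypothetical. For simplicity, the grouping column is passed explicitly, whereas SQL PIVOT implicitly groups by the columns that are neither pivoted nor aggregated; groups with no rows for a pivoted value get `None` (NULL).

```python
from collections import defaultdict


def pivot(rows, group_by, pivot_column, value_column, pivot_values, agg=sum):
    """Rows to columns: one output row per group, one column per pivoted value."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row[pivot_column] in pivot_values:
            buckets[row[group_by]][row[pivot_column]].append(row[value_column])
    result = []
    for key, per_value in buckets.items():
        out = {group_by: key}
        for pv in pivot_values:
            values = per_value.get(pv)
            out[pv] = agg(values) if values else None  # no matching rows -> NULL
        result.append(out)
    return result


def unpivot(rows, keep, names_column, values_column, unpivot_columns):
    """Columns to rows: one output row per (input row, unpivoted column)."""
    result = []
    for row in rows:
        for col in unpivot_columns:
            out = {k: row[k] for k in keep}
            out[names_column] = col  # the former column name becomes a value
            out[values_column] = row[col]
            result.append(out)
    return result
```

For example, pivoting `{region, quarter, amount}` rows on `quarter` with `pivot_values=["Q1", "Q2"]` yields one row per region with `Q1` and `Q2` columns, and `unpivot` turns those columns back into `quarter`/`amount` rows.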

# Range support in window functions (experimental)

Window functions (experimental) now support ranges where each endpoint is either unbounded or the current row. Ranges work in strict mode, which means that Druid fails queries that aren't supported. You can turn off strict mode for ranges by setting the context parameter windowingStrictValidation to false.

The following examples show window expressions with RANGE frame specifications:

(ORDER BY c)
(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

#15703 #15746
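In standard SQL, an ORDER BY with no explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. What makes it a RANGE frame is that the frame ends at the current row's last peer (rows with an equal c), not at the current row itself. A small Python model of a running SUM under that frame, with a hypothetical function name:

```python
def range_running_sum(rows, order_key, value_key):
    """Model SUM(value) OVER (ORDER BY order_key RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

    The frame includes every row whose ORDER BY value is <= the current row's,
    so peers (equal ORDER BY values) always share the same result.
    """
    ordered = sorted(rows, key=lambda r: r[order_key])
    results = []
    for row in ordered:
        frame = [r for r in ordered if r[order_key] <= row[order_key]]
        results.append(sum(r[value_key] for r in frame))
    return results
```

For c values [1, 2, 2, 3], each with a value of 1, this returns [1, 3, 3, 4]: both rows with c = 2 see each other. A ROWS frame would instead return [1, 2, 3, 4].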

# Improved INNER joins

Druid now supports arbitrary join conditions for INNER join. Any sub-conditions that can't be evaluated as part of the join are converted to a post-join filter. Improved join capabilities allow Druid to more effectively support applications like Tableau.

#15302

# Improved concurrent append and replace (experimental)

You no longer have to manually determine the task lock type for concurrent append and replace (experimental) with the taskLockType task context. Instead, Druid can now determine it automatically for you. You can use the context parameter "useConcurrentLocks": true for individual tasks and datasources or enable concurrent append and replace at a cluster level using druid.indexer.task.default.context.

#15684

# First and last aggregators for double, float, and long data types

Druid now supports first and last aggregators for the double, float, and long types in native and MSQ ingestion specs and in MSQ queries. Previously, they were only supported for native queries. For more information, see First and last aggregators.

#14462

Additionally, the following functions can now return numeric values:

  • EARLIEST and EARLIEST_BY
  • LATEST and LATEST_BY

You can use these functions as aggregators at ingestion time.

#15607

# Support for logging audit events

Added support for logging audit events and improved coverage of audited REST API endpoints.
To enable logging audit events, set druid.audit.manager.type to log in both the Coordinator and the Overlord, or in common.runtime.properties. When you set druid.audit.manager.type to sql, audit events are persisted to the metadata store.

In both cases, Druid audits the following events:

  • Coordinator
    • Update load rules
    • Update lookups
    • Update coordinator dynamic config
    • Update auto-compaction config
  • Overlord
    • Submit a task
    • Create/update a supervisor
    • Update worker config
  • Basic security extension
    • Create user
    • Delete user
    • Update user credentials
    • Create role
    • Delete role
    • Assign role to user
    • Set role permissions

#15480 #15653

Also fixed an issue with the basic auth integration test by not persisting logs to the database.

#15561

# Enabled empty ingest queries

The MSQ task engine now allows empty ingest queries by default. Previously, ingest queries that produced no data would fail with the InsertCannotBeEmpty MSQ fault.
For more information, see Empty ingest queries in the upgrade notes.

#15674 #15495

In the web console, you can use a toggle to control whether an ingestion fails if the ingestion query produces no data.

#15627

# MSQ support for Google Cloud Storage

The MSQ task engine now supports Google Cloud Storage (GCS). You can use durable storage with GCS. See Durable storage configurations for more information.

#15398

# Experimental extensions

Druid 29.0.0 adds the following extensions.

# DDSketch

A new DDSketch extension is available as a community contribution. The DDSketch extension (druid-ddsketch) provides support for approximate quantile queries using the DDSketch library.

#15049

# Spectator histogram

A new histogram extension is available as a community contribution. The Spectator-based histogram extension (druid-spectator-histogram) provides approximate histogram aggregators and percentile post-aggregators based on Spectator fixed-bucket histograms.

#15340

# Delta Lake

A new Delta Lake extension is available as a community contribution. The Delta Lake extension...


Druid 28.0.1

21 Dec 11:24

Description

Apache Druid 28.0.1 is a patch release that fixes some issues in the 28.0.0 release. See the complete set of changes for additional details.

# Notable bug fixes

  • #15405 Makes the start-druid script more robust
  • #15402 Fixes the query caching bug for groupBy queries with multiple post-aggregation metrics
  • #15430 Fixes task failures during a rolling upgrade caused by the new task action RetrieveSegmentsToReplaceAction, which isn't available on an Overlord that hasn't been upgraded yet
  • #15500 Fixes a bug with NullFilter, which is commonly used with the newly default SQL-compatible mode

# Credits

Thanks to everyone who contributed to this release!

@cryptoe
@gianm
@kgyrtkirk
@LakshSingla
@vogievetsky

Druid 28.0.0

15 Nov 10:19

Apache Druid 28.0.0 contains over 420 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 57 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 28.0.0.

# Important features, changes, and deprecations

In Druid 28.0.0, we have made substantial improvements to querying to make the system more ANSI SQL compatible. This includes changes in handling NULL and boolean values as well as boolean logic. At the same time, the Apache Calcite library has been upgraded to the latest version. While we have documented known query behavior changes, please read the upgrade notes section carefully. Test your application before rolling out to broad production scenarios while closely monitoring the query status.

# SQL compatibility

Druid continues to make SQL query execution more consistent with how standard SQL behaves. However, there are feature flags available to restore the old behavior if needed.

# Three-valued logic

Druid native filters now observe SQL three-valued logic (true, false, or unknown) instead of Druid's classic two-state logic by default, when the following default settings apply:

  • druid.generic.useThreeValueLogicForNativeFilters = true
  • druid.expressions.useStrictBooleans = true
  • druid.generic.useDefaultValueForNull = false

#15058
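A small Python model of three-valued logic shows why the change mostly matters for NOT filters on columns with NULL values. Here None stands for UNKNOWN, and the helper names are hypothetical. Under the classic two-valued logic, col = 'a' on a NULL row evaluated to false, so NOT (col = 'a') matched that row; under three-valued logic the comparison is UNKNOWN, NOT keeps it UNKNOWN, and the WHERE clause drops the row.

```python
from typing import Optional

# SQL three-valued logic: True, False, or None for UNKNOWN.
Truth = Optional[bool]


def sql_eq(a, b) -> Truth:
    # Any comparison involving NULL is UNKNOWN.
    return None if a is None or b is None else a == b


def sql_not(x: Truth) -> Truth:
    return None if x is None else not x


def sql_and(x: Truth, y: Truth) -> Truth:
    if x is False or y is False:
        return False  # FALSE AND anything, even UNKNOWN, is FALSE
    if x is None or y is None:
        return None
    return True


def where(rows, predicate):
    # A WHERE clause keeps a row only when the predicate is TRUE; UNKNOWN is dropped.
    return [r for r in rows if predicate(r) is True]
```

Note that `sql_and(True, None)` is `None`, which matches the strict-boolean rule that True && NULL returns null.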

# Strict booleans

druid.expressions.useStrictBooleans is now enabled by default.
Druid now handles booleans strictly using 1 (true) or 0 (false).
Previously, true and false could be represented either as true and false or as 1 and 0, respectively.
In addition, Druid now returns a null value for Boolean comparisons like True && NULL.

If you don't explicitly configure this property in runtime.properties, clusters now use LONG types for any ingested boolean values and in the output of boolean functions for transformations and query time operations.
For more information, see SQL compatibility in the upgrade notes.

#14734

# NULL handling

druid.generic.useDefaultValueForNull is now disabled by default.
Druid now differentiates between empty records and null records.
Previously, Druid might treat empty records as empty or null.
For more information, see SQL compatibility in the upgrade notes.

#14792

# SQL planner improvements

Druid uses Apache Calcite for SQL planning and optimization. Starting in Druid 28.0.0, the Calcite version has been upgraded from 1.21 to 1.35. This upgrade brings in many bug fixes in SQL planning from Calcite.

# Dynamic parameters

As part of the Calcite upgrade, the behavior of type inference for dynamic parameters has changed. To avoid type inference issues, explicitly CAST all dynamic parameters to a specific data type in SQL queries. For example, use:

SELECT (1 * CAST (? as DOUBLE))/2 as tmp

Do not use:

SELECT (1 * ?)/2 as tmp
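When you call the Druid SQL API directly, you can pair the explicit CAST with typed parameter values. A minimal Python sketch that builds (but does not send) such a request; it assumes a Broker at `http://localhost:8082`, and the helper name is hypothetical. Each entry in `parameters` is a `{"type": ..., "value": ...}` object, bound to the `?` placeholders in order.

```python
import json
import urllib.request


def sql_request(query: str, parameters: list[tuple[str, object]],
                broker_url: str = "http://localhost:8082") -> urllib.request.Request:
    """Build a Druid SQL API request that binds typed values to ? placeholders."""
    body = {
        "query": query,
        # One {"type", "value"} object per placeholder, in placeholder order.
        "parameters": [{"type": t, "value": v} for t, v in parameters],
    }
    return urllib.request.Request(
        broker_url + "/druid/v2/sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `sql_request("SELECT (1 * CAST(? AS DOUBLE))/2 AS tmp", [("DOUBLE", 3.0)])` produces a request whose body binds 3.0 as a DOUBLE.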

# Async query and query from deep storage

Query from deep storage is no longer an experimental feature. When you query from deep storage, more data is available for queries without having to scale your Historical services to accommodate more data. To benefit from the space saving that query from deep storage offers, configure your load rules to unload data from your Historical services.

# Support for multiple result formats

Query from deep storage now supports multiple result formats.
Previously, the /druid/v2/sql/statements/ endpoint only supported results in the object format. Now, results can be written in any format specified in the resultFormat parameter.
For more information on result parameters supported by the Druid SQL API, see Responses.

#14571

# Broadened access for queries from deep storage

Users with the STATE permission can interact with status APIs for queries from deep storage. Previously, only the user who submitted the query could use those APIs. This enables the web console to monitor the running status of the queries. Users with the STATE permission can access the query results.

#14944

# MSQ queries for realtime tasks

The MSQ task engine can now include realtime segments in query results. To do this, set the includeSegmentSource context parameter to REALTIME.

#15024

# MSQ support for UNION ALL queries

You can now use the MSQ task engine to run UNION ALL queries with UnionDataSource.

#14981

# Ingest from multiple Kafka topics to a single datasource

You can now ingest streaming data from multiple Kafka topics to a datasource using a single supervisor.
You configure the topics for the supervisor spec using a regex pattern as the value for topicPattern in the IO config. If you add new topics to Kafka that match the regex, Druid automatically starts ingesting from those new topics.

If you enable multi-topic ingestion for a datasource, downgrading will cause the Supervisor to fail.
For more information, see Stop supervisors that ingest from multiple Kafka topics before downgrading.

#14424
#14865
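To preview which topics a topicPattern would pick up, you can test the regex against your current topic names. This Python sketch assumes the pattern must match the entire topic name, as Java's Matcher.matches() does; Python's `re.fullmatch` behaves the same way for simple patterns like `metrics-.*`, but the Java and Python regex dialects differ in edge cases. The function name is hypothetical.

```python
import re


def matching_topics(topic_pattern: str, topics: list[str]) -> list[str]:
    """Return the topics that a topicPattern regex would select.

    Assumes whole-name matching, like Java's Matcher.matches().
    """
    compiled = re.compile(topic_pattern)
    return [t for t in topics if compiled.fullmatch(t)]
```

For example, `matching_topics("metrics-.*", ["metrics-eu", "metrics-us", "logs"])` returns `["metrics-eu", "metrics-us"]`.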

# SQL UNNEST and ingestion flattening

The UNNEST function is no longer experimental.

Druid now supports UNNEST in SQL-based batch ingestion and query from deep storage, so you can flatten arrays easily. For more information, see UNNEST and Unnest arrays within a column.

You no longer need to include the context parameter enableUnnest: true to use UNNEST.

#14886

# Recommended syntax for SQL UNNEST

The recommended syntax for SQL UNNEST has changed. We recommend using CROSS JOIN instead of commas for most queries to prevent issues with precedence. For example, use:

SELECT column_alias_name1 FROM datasource CROSS JOIN UNNEST(source_ex...

Druid 27.0.0

11 Aug 09:28

Apache Druid 27.0.0 contains over 316 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 50 contributors.

See the complete set of changes for additional details, including bug fixes.

Review the upgrade notes and incompatible changes before you upgrade to Druid 27.0.0.

# Highlights

# New Explore view in the web console (experimental)

The Explore view is a simple, stateless, SQL-backed data exploration view in the web console. It lets users explore data in Druid with point-and-click interaction and visualizations instead of writing SQL and reading a results table. New Druid users can get value from their data faster, and experienced users can quickly chart the data they care about.


The Explore view is accessible from the More (...) menu in the header.

#14540

# Query from deep storage (experimental)

Druid now supports querying segments that are stored only in deep storage. When you query from deep storage, more data is available for queries without necessarily having to scale your Historical processes to accommodate it. To take advantage of the potential storage savings, configure your load rules so that not all of your segments are loaded onto Historical processes.

Note that at least one segment of a datasource must be loaded onto a Historical process so that the Broker can plan the query. It can be any segment, though.
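The following sketch shows one possible set of load rules for this setup. It keeps the most recent month on Historicals and leaves older segments in deep storage only. It assumes the `loadByPeriod` and `loadForever` rule types and a zero-replica `loadForever` rule (`"tieredReplicants": {}` with `useDefaultTierForNull: false`). The period, tier, and replica count are illustrative. The `loads_something` helper is hypothetical. It checks only that some rule places replicas on a tier, which is necessary but not sufficient for the "at least one loaded segment" requirement above.

```python
import json

# Rules are evaluated in order; the first matching rule applies to a segment.
rules = [
    # Recent data: two replicas on the default Historical tier.
    {"type": "loadByPeriod", "period": "P1M", "includeFuture": True,
     "tieredReplicants": {"_default_tier": 2}},
    # Everything older: zero replicas, so segments stay queryable from deep
    # storage without being loaded onto any Historical.
    {"type": "loadForever", "tieredReplicants": {}, "useDefaultTierForNull": False},
]


def loads_something(rules: list) -> bool:
    """True if at least one load rule places replicas on some tier."""
    return any(
        sum(r.get("tieredReplicants", {}).values()) > 0
        for r in rules
        if r["type"].startswith("load")
    )


print(json.dumps(rules, indent=2))
```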

For more information, see the following:

#14416 #14512 #14527

# Schema auto-discovery and array column types

Type-aware schema auto-discovery is now generally available. Druid can determine the schema for the data you ingest rather than you having to manually define the schema.

As part of the type-aware schema discovery improvements, array column types are now generally available. When you ingest data using type-aware schema auto-discovery with the auto column type, Druid can determine the column types for your schema, including these array column types.

For more information about this feature, see the following:

# Smart segment loading

The Coordinator is now much more stable and user-friendly. In the new smartSegmentLoading mode, it dynamically computes values for several configs to maximize performance.

The Coordinator can now prioritize loading more recent segments and segments that are completely unavailable over loading segments that already have some replicas in the cluster. It can also re-evaluate decisions taken in previous runs and cancel operations that are no longer needed. Moreover, move operations started by segment balancing no longer compete with the loading of unavailable segments. This reduces the reaction time to changes in the cluster and speeds up segment assignment decisions.

Additionally, leadership changes have less impact now, and the Coordinator doesn't get stuck even if re-election happens while a Coordinator run is in progress.

Lastly, the cost balancer strategy performs much better now and is capable of moving more segments in a single Coordinator run. These improvements were made by borrowing ideas from the cachingCost strategy. We recommend using cost instead of cachingCost since cachingCost is now deprecated.
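The sketch below expresses the two recommendations as configuration. It assumes three things: a `smartSegmentLoading` field in the Coordinator dynamic config, the `druid.coordinator.balancer.strategy` runtime property, and `cachingCost` as the only deprecated strategy name. Other dynamic config fields are omitted for brevity. The helper name is hypothetical.

```python
# Coordinator dynamic config fragment enabling smart segment loading.
dynamic_config = {"smartSegmentLoading": True}

DEPRECATED_STRATEGIES = {"cachingCost"}


def balancer_property(strategy: str = "cost") -> str:
    """Hypothetical helper: runtime.properties line for the balancer strategy."""
    if strategy in DEPRECATED_STRATEGIES:
        raise ValueError(f"{strategy} is deprecated; use cost instead")
    return f"druid.coordinator.balancer.strategy={strategy}"


print(balancer_property())
```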

For more information, see the following:

#13197 #14385 #14484

# New query filters

Druid now supports the following filters:

  • Equality: Use in place of the selector filter. It never matches null values.
  • Null: Match null values. Use in place of the selector filter.
  • Range: Filter on ranges of dimension values. Use in place of the bound filter. It never matches null values.

Note that Druid's SQL planner uses these new filters in place of their older counterparts by default whenever druid.generic.useDefaultValueForNull=false or if sqlUseBoundAndSelectors is set to false on the SQL query context.

Unlike the previous selector and bound filters, which only work on strings, these filters can also match equality and ranges on ARRAY columns.
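The following are native JSON specs for the three filters, assuming the documented `equals`, `null`, and `range` filter types with the `matchValueType` and `matchValue` fields. Column names are hypothetical. The `matches` function is a local emulation of the null-handling rules described above, not Druid code. It shows that `equals` and `range` skip null values, while the `null` filter matches only them.

```python
# Native filter specs (column names are hypothetical).
equals_filter = {"type": "equals", "column": "country",
                 "matchValueType": "STRING", "matchValue": "FR"}
null_filter = {"type": "null", "column": "country"}
range_filter = {"type": "range", "column": "visits", "matchValueType": "LONG",
                "lower": 1, "upper": 10, "lowerOpen": False, "upperOpen": True}
array_equals_filter = {"type": "equals", "column": "tags",
                       "matchValueType": "ARRAY<STRING>", "matchValue": ["a", "b"]}


def matches(f: dict, value) -> bool:
    """Local emulation of the filters' null semantics (not Druid's implementation)."""
    if f["type"] == "null":
        return value is None
    if value is None:  # equals and range never match null
        return False
    if f["type"] == "equals":
        return value == f["matchValue"]
    if f["type"] == "range":
        lo, hi = f.get("lower"), f.get("upper")
        if lo is not None and (value < lo or (f.get("lowerOpen", False) and value == lo)):
            return False
        if hi is not None and (value > hi or (f.get("upperOpen", False) and value == hi)):
            return False
        return True
    raise ValueError(f"unknown filter type: {f['type']}")


values = [1, None, 5, 10]
print([v for v in values if matches(range_filter, v)])  # [1, 5]
```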

For more information, see Query filters.

#14542

# Guardrail for subquery results

Users can now add a guardrail that prevents subquery results from exceeding a set number of bytes. To do this, set druid.server.http.maxSubqueryBytes in the Broker's config or maxSubqueryBytes in the query context. This guardrail is recommended over row-based limiting.

This feature is experimental for now. It falls back to row-based limiting if it cannot determine the accurate size of the results consumed by the query.

#13952

# Added a new OSHI system monitor

Added a new OSHI system monitor (OshiSysMonitor) to replace SysMonitor. The new monitor has wider support for different machine architectures, including ARM instances. We recommend switching to the new monitor. SysMonitor is now deprecated and will be removed in a future release.
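A minimal sketch of the switch follows. It rewrites a `druid.monitoring.monitors` list so that SysMonitor is replaced with OshiSysMonitor. The fully qualified class names `org.apache.druid.java.util.metrics.SysMonitor` and `org.apache.druid.java.util.metrics.OshiSysMonitor` are assumptions. Check them against the monitors documentation for your version. The helper name is hypothetical.

```python
import json

OLD = "org.apache.druid.java.util.metrics.SysMonitor"       # assumed class name
NEW = "org.apache.druid.java.util.metrics.OshiSysMonitor"   # assumed class name


def migrate_monitors(monitors: list) -> list:
    """Hypothetical helper: swap SysMonitor for OshiSysMonitor, keeping order."""
    out = []
    for m in monitors:
        m = NEW if m == OLD else m
        if m not in out:  # avoid listing OshiSysMonitor twice
            out.append(m)
    return out


monitors = migrate_monitors(["org.apache.druid.java.util.metrics.JvmMonitor", OLD])
print("druid.monitoring.monitors=" + json.dumps(monitors))
```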

#14359

# Java 17 support

Druid now fully supports Java 17.

#14384

# Hadoop 2 deprecated

Support for Hadoop 2 is now deprecated. It will be removed in a future release.

For more information, see the upgrade notes.

# Additional features and improvements

# SQL-based ingestion

# Improved query planning behavior

Druid now fails query planning if a CLUSTERED BY column contains descending order.
Previously, queries would successfully plan if any CLUSTERED BY columns contained descending order.

The MSQ fault InsertCannotOrderByDescending is deprecated. The CLUSTERED BY expression in an INSERT or REPLACE query cannot use descending order, because Druid's segment generation code only supports ascending order. Instead of the fault, Druid now throws a query ValidationException.
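If you generate INSERT or REPLACE statements programmatically, a client-side pre-check can catch this before submission. The sketch below is hypothetical and not part of Druid. It represents each CLUSTERED BY column as a `(column, direction)` pair and rejects any `DESC` entry, mirroring the rule described above.

```python
def validate_clustered_by(columns: list) -> None:
    """Hypothetical pre-check: CLUSTERED BY columns must be ascending."""
    for name, direction in columns:
        if direction.upper() == "DESC":
            raise ValueError(f"CLUSTERED BY column {name!r} cannot be DESC")


validate_clustered_by([("channel", "ASC"), ("page", "asc")])  # passes silently
```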

#14436 #14370

# Improved segment sizes

The default clusterStatisticsMergeMode is now `S...
