Skip to content

Commit

Permalink
adjust docs
Browse files Browse the repository at this point in the history
  • Loading branch information
clintropolis committed Aug 18, 2023
1 parent f1e2ca2 commit 8b866e0
Show file tree
Hide file tree
Showing 11 changed files with 57 additions and 60 deletions.
2 changes: 1 addition & 1 deletion docs/configuration/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -798,7 +798,7 @@ Prior to version 0.13.0, Druid string columns treated `''` and `null` values as

|Property|Description|Default|
|---|---|---|
|`druid.generic.useDefaultValueForNull`|When set to `true`, `null` values will be stored as `''` for string columns and `0` for numeric columns. Set to `false` to store and query data in SQL compatible mode.|`true`|
|`druid.generic.useDefaultValueForNull`|Set to `false` to store and query data in SQL compatible mode. When set to `true` (legacy mode), `null` values will be stored as `''` for string columns and `0` for numeric columns.|`false`|
|`druid.generic.ignoreNullsForStringCardinality`|When set to `true`, `null` values will be ignored for the built-in cardinality aggregator over string columns. Set to `false` to include `null` values while estimating cardinality of only string columns using the built-in cardinality aggregator. This setting takes effect only when `druid.generic.useDefaultValueForNull` is set to `true` and is ignored in SQL compatibility mode. Additionally, empty strings (equivalent to null) are not counted when this is set to `true`. |`false`|
This mode does have a storage size and query performance cost, see [segment documentation](../design/segments.md#handling-null-values) for more details.

Expand Down
12 changes: 6 additions & 6 deletions docs/design/segments.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,16 +82,16 @@ For each row in the list of column data, there is only a single bitmap that has

## Handling null values

By default, Druid runs in a SQL compatible null handling mode, which allows Druid to create segments _at ingestion time_ in which the following occurs:
By default Druid stores segments in a SQL compatible null handling mode. String columns always store the null value as id 0, the first position in the value dictionary and an associated entry in the bitmap value indexes used to filter null values. Numeric columns also store a null value bitmap index to indicate the null valued rows, which is used to null check aggregations and for filter matching null values.

* String columns can distinguish `''` from `null`,
* Numeric columns can represent `null` valued rows instead of `0`.
Druid also has a legacy mode which uses default values instead of nulls, which was the default prior to Druid 28.0.0. This legacy mode can be enabled by setting `druid.generic.useDefaultValueForNull=true`.

Druid also has a legacy null handling mode which was the default prior to Druid 28.0.0. In this mode string dimension columns use the values `''` and `null` interchangeably. Numeric and metric columns cannot represent `null` but use nulls to mean `0`. You can enable this classic behavior at the system level through `druid.generic.useDefaultValueForNull` and setting to `true`.
In legacy mode, Druid segments created _at ingestion time_ have the following characteristics:

String dimension columns contain no additional column structures in SQL compatible null handling mode. Instead, they reserve an additional dictionary entry for the `null` value. Numeric columns are stored in the segment with an additional bitmap in which the set bits indicate `null`-valued rows.
* String columns can not distinguish `''` from `null`, they are treated interchangebly as the same value
* Numeric columns can not represent `null` valued rows, and instead store a `0`.

In addition to slightly increased segment sizes, SQL compatible null handling can incur a performance cost at query time, due to the need to check the null bitmap. This performance cost only occurs for columns that actually contain null values.
In legacy mode, numeric columns do not have the null value bitmap, and so can have slightly decreased segment sizes, and queries involving numeric columns can have slightly increased performance in some cases since there is no need to check the null value bitmap.

## Segments with different schemas

Expand Down
2 changes: 0 additions & 2 deletions docs/ingestion/schema-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -263,8 +263,6 @@ native boolean types, Druid ingests these values as strings if `druid.expression
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested
columns can be queried with the [JSON functions](../querying/sql-json-functions.md).

We also highly recommend setting `druid.generic.useDefaultValueForNull=false` (the default) when using these columns since it also enables out of the box `ARRAY` type filtering. If not set to `false`, setting `sqlUseBoundsAndSelectors` to `false` on the [SQL query context](../querying/sql-query-context.md) can enable `ARRAY` filtering instead.

Mixed type columns are stored in the _least_ restrictive type that can represent all values in the column. For example:

- Mixed numeric columns are `DOUBLE`
Expand Down
8 changes: 4 additions & 4 deletions docs/querying/math-expr.md
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ See javadoc of java.lang.Math for detailed explanation for each function.
|remainder|remainder(x, y) returns the remainder operation on two arguments as prescribed by the IEEE 754 standard|
|rint|rint(x) returns value that is closest in value to x and is equal to a mathematical integer|
|round|round(x, y) returns the value of the x rounded to the y decimal places. While x can be an integer or floating-point number, y must be an integer. The type of the return value is specified by that of x. y defaults to 0 if omitted. When y is negative, x is rounded on the left side of the y decimal points. If x is `NaN`, x returns 0. If x is infinity, x will be converted to the nearest finite double. |
|safe_divide|safe_divide(x,y) returns the division of x by y if y is not equal to 0. In case y is 0 it returns 0 or `null` if `druid.generic.useDefaultValueForNull=false` |
|safe_divide|safe_divide(x,y) returns the division of x by y if y is not equal to 0. In case y is 0 it returns `null` or 0 if `druid.generic.useDefaultValueForNull=true` (legacy mode) |
|scalb|scalb(d, sf) returns d * 2^sf rounded as if performed by a single correctly rounded floating-point multiply to a member of the double value set|
|signum|signum(x) returns the signum function of the argument x|
|sin|sin(x) returns the trigonometric sine of an angle x|
Expand All @@ -183,8 +183,8 @@ See javadoc of java.lang.Math for detailed explanation for each function.
| array_ordinal(arr,long) | returns the array element at the 1 based index supplied, or null for an out of range index |
| array_contains(arr,expr) | returns 1 if the array contains the element specified by expr, or contains all elements specified by expr if expr is an array, else 0 |
| array_overlap(arr1,arr2) | returns 1 if arr1 and arr2 have any elements in common, else 0 |
| array_offset_of(arr,expr) | returns the 0 based index of the first occurrence of expr in the array, or `-1` or `null` if `druid.generic.useDefaultValueForNull=false`if no matching elements exist in the array. |
| array_ordinal_of(arr,expr) | returns the 1 based index of the first occurrence of expr in the array, or `-1` or `null` if `druid.generic.useDefaultValueForNull=false` if no matching elements exist in the array. |
| array_offset_of(arr,expr) | returns the 0 based index of the first occurrence of expr in the array, or `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (legacy mode) if no matching elements exist in the array. |
| array_ordinal_of(arr,expr) | returns the 1 based index of the first occurrence of expr in the array, or `null` or `-1` if `druid.generic.useDefaultValueForNull=true` (legacy mode) if no matching elements exist in the array. |
| array_prepend(expr,arr) | adds expr to arr at the beginning, the resulting array type determined by the type of the array |
| array_append(arr,expr) | appends expr to arr, the resulting array type determined by the type of the first array |
| array_concat(arr1,arr2) | concatenates 2 arrays, the resulting array type determined by the type of the first array |
Expand Down Expand Up @@ -309,7 +309,7 @@ Supported features:
* other: `parse_long` is supported for numeric and string types

## Logical operator modes
Prior to the 0.23 release of Apache Druid, boolean function expressions have inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`, the default) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false.
Prior to the 0.23 release of Apache Druid, boolean function expressions have inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false.

After 0.23, while the inconsistent legacy behavior is still the default, it can be optionally be changed by setting `druid.expressions.useStrictBooleans=true`, so that these operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. Additionally,

Expand Down
Loading

0 comments on commit 8b866e0

Please sign in to comment.