Merged
5 changes: 5 additions & 0 deletions docs/user/ppl/cmd/bin.rst
@@ -9,6 +9,11 @@ bin
   :depth: 2


.. note::

   Available since version 3.3


Description
============
| The ``bin`` command groups numeric values into buckets of equal intervals, making it useful for creating histograms and analyzing data distribution. It takes a numeric field and generates a new field with values that represent the lower bound of each bucket.
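The bucketing arithmetic described above can be sketched in a few lines of Python. This is only an illustration of "lower bound of an equal-width bucket", not the plugin's implementation; the function name ``bin_lower_bound`` and the ``span`` parameter are hypothetical names chosen for this sketch.

```python
import math


def bin_lower_bound(value, span):
    """Return the lower bound of the equal-width bucket that contains value.

    `span` is the bucket width (a hypothetical parameter name for this sketch).
    """
    return math.floor(value / span) * span


# Each age is replaced by the lower bound of its 10-wide bucket.
ages = [23, 28, 31, 39, 40]
print([bin_lower_bound(a, 10) for a in ages])  # [20, 20, 30, 30, 40]
```

Counting how many rows share each lower bound then yields the histogram the description mentions.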
12 changes: 11 additions & 1 deletion docs/user/ppl/cmd/eventstats.rst
@@ -283,7 +283,17 @@ DISTINCT_COUNT, DC(Since 3.3)
Description
>>>>>>>>>>>

Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values of expr using HyperLogLog++ algorithm. Both ``DISTINCT_COUNT`` and ``DC`` are equivalent and provide the same functionality.
Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values using the HyperLogLog++ algorithm. Both functions are equivalent.

**Algorithm & Accuracy:**
Collaborator:

maybe better link to OpenSearch doc?

Contributor Author:

Added link

Collaborator (@dai-chen, Sep 4, 2025):

Sorry, I meant probably we don't need to copy these 2 sections from OS-core doc?

The HyperLogLog++ algorithm provides memory-efficient cardinality estimation, avoiding the high memory usage and latency of exact hash set calculations. Counts are typically accurate up to the precision threshold and within 6% of true values in most cases.

**Precision Control:**
Use the ``precision_threshold`` setting to balance memory usage and accuracy. Values below this threshold are highly accurate, while higher values may be less precise.

* Default: 3,000
* Maximum: 40,000
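To make the memory and accuracy trade-off concrete, here is a minimal sketch of the basic HyperLogLog estimator in Python. It is an assumption-laden illustration: it implements plain HyperLogLog with a linear-counting correction for small counts, not the HyperLogLog++ variant and not OpenSearch's code. The register-bits parameter ``p`` is hypothetical and is not the same thing as ``precision_threshold``.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with plain HyperLogLog.

    p is the number of index bits (hypothetical parameter for this sketch);
    memory is fixed at 2**p small registers regardless of input size.
    """
    m = 1 << p
    registers = [0] * m
    for v in values:
        # 64-bit hash taken from SHA-1, used only for its uniformity.
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                     # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)        # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1 bit
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)            # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


estimate = hll_estimate(range(10000))
print(round(estimate))  # close to 10000, but not exact
```

Duplicates never change a register, so repeating the input leaves the estimate unchanged. A larger ``p`` uses more registers and reduces the error, which is the same kind of trade-off the precision setting controls.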


Example::

2 changes: 2 additions & 0 deletions docs/user/ppl/index.rst
@@ -54,6 +54,8 @@ The query start with search command and then flowing a set of command delimited

- `appendcol command <cmd/appendcol.rst>`_

- `bin command <cmd/bin.rst>`_

- `dedup command <cmd/dedup.rst>`_

- `describe command <cmd/describe.rst>`_