Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SDK 3 FTS to further detail facets in results #99

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
175 changes: 157 additions & 18 deletions rfc/0052-sdk3-full-text-search.md
Original file line number Diff line number Diff line change
Expand Up @@ -615,18 +615,154 @@ interface SearchRowLocation {

### SearchFacetResult

An individual facet result has both metadata and details, as each facet can define ranges into which results are categorized.
The `facets` part of the result, a sibling to `hits`, contains information relative to the facets the user asked for. If no `facets` section was provided in the query, this section is omitted.

Note that the whole `facets` section **is a single JSON object**. The `name` of each facet, which is defined at query time by the user, is represented in the attribute names.

An individual facet result has both metadata and details, as each facet can define ranges into which results are categorized. The JSON representation of a single facet's results looks like this:

```
SearchFacetResult {
"field": string
"total": long
"missing": long
"other": long
"terms": [ {TermRange}, ... ] <1>
"numeric_ranges": [ {NumericRange}, ...] <1>
"date_ranges": [ {DateRange}, ... ] <1>
}
```
(1) all mutually exclusive (`terms`, `numeric_ranges` and `date_ranges` cannot be combined).
dnault marked this conversation as resolved.
Show resolved Hide resolved

See the following JSON example:

```
"facets": {
"category": { <1>
"field": "style",
...
"terms": [ ... ]
},
"strength": { <2>
"field": "abv",
...
"numeric_ranges": [ ... ]
},
"updateRange": { <3>
"field": "updated",
...
"date_ranges": [ ... ]
}
}
```
We omitted part of the section, but this shows that we have 3 facets: a _term_ facet **(1)** named "category", a _numeric_ facet **(2)** named "strength" and a _date_ facet **(3)** named "updateRange".

The section could be represented by a `Map` (or equivalent, eg. associative array).

```
Map<String, SearchFacetResult> facets();
```

It could also be represented by any other adequate `Facets` class or structure, as long as the following operations are possible:

- iteration of all the facets
- retrieval of a facet by name

In code, individual facet results could be represented by an interface and three concrete implementations, one for each facet type:

```
interface SearchFacetResult {
String Name();
String Field();
uint64 Total();
uint64 Missing();
uint64 Other();
String name();
String field();
long total();
long missing();
long other();
}
```
Note the `name` is made part of the `FacetResult`, so that it can be used in isolation.

The results could also be implemented by a single concrete implementation:

```
class SearchFacetResult {
String name(); // deprecated
String field();
long total();
long missing();
long other();
List<TermFacetResult> terms();
dnault marked this conversation as resolved.
Show resolved Hide resolved
List<NumericRangeFacetResult> numericRanges();
List<DateRangeFacetResult> dateRanges();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the names now also need to be modified

}
```

Note that the `name()` is deprecated in this implementation, `name` is not required in this implemenatation due to it belonging the `SearchFacetResult` classes themselves.

##### Term Search Facet Results
```
class TermSearchFacetResult implements SearchFacetResult {
//... all from FacetResult, plus:
List<SearchTermRange> terms();
}
```

The JSON structure of each **term range** is:

```
TermFacet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SearchTermRange?

"term": string //the "category" or term
"count": long //how many times this term was seen
}
```
This can be directly mapped to a `SearchTermRange` class or the appropriate structure in the target language.

When requesting a term facet, the user provides a `size` and a `field`. Each term in this field will be considered as a unique range (a "bucket"), and the `TermSearchFacetResult` will contain the first _size_ terms, ordered by occurrence count. So in a `SearchTermRange`, the `term` is the "category" or term and the `count` is the number of times this term was seen.

##### Numeric Range Search Facet Results
```
class NumericRangeSearchFacetResult implements SearchFacetResult {
//... all from FacetResult, plus:
List<SearchNumericRange> numericRanges();
}
```

The JSON structure of each **numeric range** is:

```
NumericRange {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SearchNumericRange?

"name": string //the name given to the range in facet query
"min": double //the minimum value considered in the range (inclusive)
"max": double //the maximum value considered in the range (exclusive)
"count": long //how many terms were included in this range?
}
```
This can be directly mapped to a `SearchNumericRange` class or the appropriate structure in the target language.

When requesting a numeric facet, the user defines ranges (also called buckets) and gives them names. Note that lower bound is inclusive and upper bound is exclusive, but ranges can overlap so a document can be categorized into several buckets.

##### Date Range Search Facet Results
```
class DateRangeSearchFacetResult implements SearchFacetResult {
//... all from FacetResult, plus:
List<SearchDateRange> dateRanges();
}
```

The JSON structure of each **date range** is:

```
DateRange {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SearchDateRange?

"name": string //the name given to the range in facet query
"start": string //the minimum date ("YYYY-MM-DD HH:MM:SS") in range (inclusive)
"end": string //the maximum date in range (exclusive)
"count": long //how many terms were included in this range?
}
```
This can be directly mapped to a `SearchDateRange` class or the appropriate structure in the target language.

When requesting a date facet, the user defines ranges (also called buckets) and gives them names. Note that lower bound is inclusive and upper bound is exclusive, but ranges can overlap so a document can be categorized into several buckets.


### SearchMetaData

Represents the meta-data returned along with a search query result.
Expand Down Expand Up @@ -692,21 +828,24 @@ interface SearchMetrics {
* August 27, 2020
* Converted to Markdown.

* October 14, 2020 - Revision #8 (by Charles Dixon)
* Expanded `SearchFacetResult` providing further implementation details.

* December 7, 2021 (by Charles Dixon)
* Added `includeLocations` to `SearchOptions`.
* Added `MatchOperator` and added `operator` to `MatchQuery`.

# Signoff

| Language | Team Member | Signoff Date | Revision |
|------------|---------------------|----------------|----------|
| Node.js | Brett Lawson | April 16, 2020 | #7 |
| Go | Charles Dixon | April 22, 2020 | #7 |
| Connectors | David Nault | April 29, 2020 | #7 |
| PHP | Sergey Avseyev | April 22, 2020 | #7 |
| Python | Ellis Breen | April 29, 2020 | #7 |
| Scala | Graham Pople | April 30, 2020 | #7 |
| .NET | Jeffry Morris | April 22, 2020 | #7 |
| Java | Michael Nitschinger | April 16, 2020 | #7 |
| C | Sergey Avseyev | April 22, 2020 | #7 |
| Ruby | Sergey Avseyev | April 22, 2020 | #7 |
| Language | Team Member | Signoff Date | Revision |
|------------|---------------------|-------------------|----------|
| Node.js | Brett Lawson | April 16, 2020 | #7 |
| Go | Charles Dixon | October 14, 2020 | #8 |
| Connectors | David Nault | April 29, 2020 | #7 |
| PHP | Sergey Avseyev | December 06, 2020 | #8 |
| Python | Ellis Breen | April 29, 2020 | #7 |
| Scala | Graham Pople | April 30, 2020 | #7 |
| .NET | Jeffry Morris | April 22, 2020 | #7 |
| Java | Michael Nitschinger | April 16, 2020 | #7 |
| C | Sergey Avseyev | December 06, 2020 | #8 |
| Ruby | Sergey Avseyev | December 06, 2020 | #8 |