[Streams] Update data quality calculations #4566
base: main
✅ Vale Linting Results: No issues found on modified lines! The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.
nastasha-solomon
left a comment
Just some light suggestions. Take em or leave em!
| ## Failure store | ||
| * **Degraded documents:** Documents from the last backing index of the stream with the `ignored` property, usually because of malformed fields or exceeding the limit of total fields when `ignore_above` is set to `false`. This component shows: | ||
| * Total number of degraded documents. |
Don't think these list items need periods since they're not complete sentences.
| * Total number of degraded documents. | |
| * Total number of degraded documents |
I normally follow the Microsoft guideline where if the introductory sentence is a fragment, and the list items complete the sentence, then use punctuation, but I'm not sure if we have guidance in that particular instance in our style guide.
Our official guidance is:
Use parallel construction for bulleted lists (i.e., all sentence fragments or all complete sentences, rather than a mix of the two).
A period should be added to the end of each bullet point when the bullet points contain complete sentences and/or two or more sentences. When using short fragments, bullets should not include punctuation.
https://brand.elastic.co/302f66895/p/194a3b-writing-style-guide/t/446788
But that's the brand guide and I have my own brand(on) guide that I follow
It gets a little hazy when there's an item that has a complete sentence (refer to...) but I don't want to rewrite all of these bullets as complete sentences. I have learned in the past to use periods on all in that situation, but it's all a style guide thang, so who knows.
🚢 👍
| * **Failed documents:** Documents that were rejected during ingestion because of mapping conflicts or pipeline failures. This component shows: | ||
| * Total number of failed documents that correspond with this stream from within the specified time range in the date picker. Refer to [Failure store](#streams-data-quality-failure) for more information. | ||
| * Percentage of failed documents relative to the total document count from the stream's last backing index. | ||
| * The data quality status (**Good**, **Degraded**, **Poor**). |
| * The data quality status (**Good**, **Degraded**, **Poor**). | |
| * The data quality status (**Good**, **Degraded**, **Poor**) |
| * **Degraded:** Either the **Degraded documents** percentage or the **Failed documents** percentage is greater than 0 and less than or equal to 3. | ||
| * **Poor:** Either the **Degraded documents** percentage or the **Failed documents** percentage is greater than 3. | ||
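The thresholds quoted above can be sketched as a small classifier. This is only an illustration of the rules as written in the diff, not how Streams implements them: the function names are hypothetical, percentages are assumed to be on a 0-100 scale, **Good** is assumed to mean both percentages are 0, and **Poor** is assumed to take precedence when one percentage is above 3 and the other is not.

```python
def percentage(count: int, total: int) -> float:
    """Share of `count` in `total` on a 0-100 scale (0.0 when total is 0)."""
    if total <= 0:
        return 0.0
    return 100.0 * count / total


def data_quality_status(degraded_pct: float, failed_pct: float) -> str:
    """Classify data quality from the two percentages.

    Assumptions (not stated in the diff): "Good" means both values are 0,
    and the worse of the two percentages decides the status.
    """
    worst = max(degraded_pct, failed_pct)
    if worst > 3:
        return "Poor"
    if worst > 0:
        return "Degraded"
    return "Good"
```

For example, 15 degraded documents out of 1,000 with no failed documents gives `percentage(15, 1000)` of 1.5, which falls in the **Degraded** band.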
| ## Failure store [streams-data-quality-failure] |
Is this section needed if there's already a whole page dedicated to explaining a failure store?
| For example, for a stream called `my-stream`, Streams fetches all documents from the `my-stream::failures` index from within the specified time range in the date picker. | ||
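As a rough sketch of what such a lookup might look like, the helper below builds a search path and a time-range query body for a stream's failure store. The helper name is hypothetical, filtering on `@timestamp` with a `range` query is an assumption, and the request Streams actually sends may differ. No network call is made.

```python
from datetime import datetime, timezone


def failure_store_search(stream: str, start: datetime, end: datetime) -> tuple[str, dict]:
    """Build a search path and body for a stream's failure store.

    Hypothetical helper: assumes failed documents are filtered by their
    `@timestamp` field within the date picker's time range.
    """
    path = f"/{stream}::failures/_search"
    body = {
        "query": {
            "range": {
                "@timestamp": {
                    "gte": start.isoformat(),
                    "lte": end.isoformat(),
                }
            }
        }
    }
    return path, body


path, body = failure_store_search(
    "my-stream",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 1, 2, tzinfo=timezone.utc),
)
```

Here `path` is `/my-stream::failures/_search`, matching the `my-stream::failures` target named in the example above.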
| ### Required permissions |
Same comment as above. It looks like this information is already doc'd in https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/4566/manage-data/data-store/data-streams/failure-store
Maybe it's worth making a snippet. I would like to have everything a user needs to use Streams without sending them to another page if possible, so I put a brief description, but maybe it's not necessary.
++ to the snippet idea. You could snippetize the intro too?
This paragraph, that is:
A failure store is a secondary set of indices inside a data stream, dedicated to storing failed documents. A failed document is any document that, without the failure store enabled, would cause an ingest pipeline exception or that has a structure that conflicts with a data stream's mappings. In the absence of the failure store, a failed document would cause the indexing operation to fail, with an error message returned in the operation response.
This PR closes #4345 and updates information on how Streams calculates data quality.