Add bucket index support to querier #3614
Changes from 12 commits
c922c4a
a8bf92d
e1e9039
e93aa5a
7aedad1
5139f2d
601f05c
1367aec
1f8c3f6
b3a4080
c466b56
31b83bc
9f6748d
65d9bff
716710c
9ed6fd7
7e9b43b
a89c483
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| --- | ||
| title: "Bucket Index" | ||
| linkTitle: "Bucket Index" | ||
| weight: 5 | ||
| slug: bucket-index | ||
| --- | ||
| 
     | 
||
| The bucket index is a **per-tenant file containing the list of blocks and block deletion marks** in the storage. The bucket index itself is stored in the backend object storage, is periodically updated by the compactor and used by queriers to discover blocks in the storage. | ||
| 
     | 
||
| The bucket index usage is **optional** and can be enabled via `-blocks-storage.bucket-store.bucket-index.enabled=true` (or its respective YAML config option). | ||
| 
         Explain why it is useful?
         Done. WDYT? | 
||
| 
     | 
||
| ## Structure of the index | ||
| 
     | 
||
| The `bucket-index.json.gz` contains: | ||
| 
     | 
||
| - **`blocks`**<br /> | ||
| List of complete blocks of a tenant, including blocks marked for deletion (partial blocks are excluded from the index). | ||
| - **`block_deletion_marks`**<br /> | ||
| List of block deletion marks. | ||
| - **`updated_at`**<br /> | ||
| Unix timestamp (seconds precision) of when the index was last updated (written to the storage). | ||
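As a rough illustration of the structure above, the following sketch builds a sample index in memory and reads it back using only Python's standard library. Only the three top-level field names come from the description above. The per-entry fields (`block_id`, `deletion_time`), the sample values, and all function names are assumptions made for illustration and may not match the real schema.

```python
import gzip
import json

# Hypothetical sample index. Only the top-level keys are taken from the
# description above; the entry fields are illustrative assumptions.
sample_index = {
    "blocks": [
        {"block_id": "block-a"},
        {"block_id": "block-b"},
    ],
    "block_deletion_marks": [
        {"block_id": "block-b", "deletion_time": 1600000100},
    ],
    "updated_at": 1600000200,
}


def encode_index(index: dict) -> bytes:
    """Serialize an index to gzip-compressed JSON bytes."""
    return gzip.compress(json.dumps(index).encode("utf-8"))


def decode_index(raw: bytes) -> dict:
    """Parse gzip-compressed JSON bytes back into a dict."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def blocks_without_deletion_mark(index: dict) -> list:
    """Return IDs of listed blocks that have no deletion mark (assumed layout)."""
    marked = {mark["block_id"] for mark in index["block_deletion_marks"]}
    return [b["block_id"] for b in index["blocks"] if b["block_id"] not in marked]


raw = encode_index(sample_index)
index = decode_index(raw)
```

Because blocks marked for deletion are still listed under `blocks`, a reader of the index has to cross-reference `block_deletion_marks` to tell the two apart, which is what the last helper does.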
| 
     | 
||
| ## How it gets updated | ||
| 
     | 
||
| The [compactor](./compactor.md) periodically scans the bucket and uploads an updated bucket index to the storage. The frequency at which the bucket index is updated can be configured via `-compactor.cleanup-interval`. | ||
| 
     | 
||
| The bucket index is built and updated by the compactor even if `-blocks-storage.bucket-store.bucket-index.enabled` has **not** been enabled. This is intentional, and the overhead introduced by maintaining the bucket index is not significant. | ||
                
       | 
||
| 
     | 
||
| ## How it's used by the querier | ||
| 
     | 
||
| The [querier](./querier.md), at query time, checks whether the bucket index for the tenant has already been loaded in memory. If not, the querier downloads it from the storage and caches it in memory. Given it's a small file, downloading it lazily doesn't significantly impact first-query performance, but it allows a querier to get up and running without pre-downloading every tenant's bucket index. | ||
| 
     | 
||
| While in-memory, a background process will keep it **updated at periodic intervals**, so that subsequent queries from the same tenant to the same querier instance will use the cached (and periodically updated) bucket index. There are two config options involved: | ||
| 
     | 
||
| - `-blocks-storage.bucket-store.bucket-index.update-on-stale-interval`<br /> | ||
| This option configures how frequently a cached bucket index should be refreshed. | ||
| - `-blocks-storage.bucket-store.bucket-index.update-on-error-interval`<br /> | ||
| If downloading a bucket index fails, the failure is cached for a short time in order to avoid hammering the backend storage. This option configures how frequently the querier retries loading a bucket index that previously failed to load. | ||
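The lazy loading and the two refresh intervals described above could be sketched roughly as follows. This is not the querier's actual code: `IndexLoader`, `fetch`, and `refresh_due` are hypothetical names, the background process is modelled as a method the caller invokes periodically, and keeping the last good copy when a refresh fails is an assumption of this sketch.

```python
import time
from typing import Any, Callable, Dict, NamedTuple, Optional


class _Entry(NamedTuple):
    index: Optional[Any]
    error: Optional[Exception]
    loaded_at: float


class IndexLoader:
    """Hypothetical per-tenant index cache with stale and error intervals."""

    def __init__(self, fetch: Callable[[str], Any],
                 update_on_stale_interval: float,
                 update_on_error_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._stale_interval = update_on_stale_interval
        self._error_interval = update_on_error_interval
        self._clock = clock
        self._cache: Dict[str, _Entry] = {}

    def get(self, tenant: str) -> Any:
        """Return the cached index, downloading it on first use."""
        entry = self._cache.get(tenant)
        if entry is None:
            entry = self._load(tenant)
        if entry.error is not None:
            raise entry.error  # cached failure: no new download attempt here
        return entry.index

    def refresh_due(self) -> None:
        """One pass of the background updater over all cached tenants."""
        now = self._clock()
        for tenant, entry in list(self._cache.items()):
            failed = entry.error is not None
            interval = self._error_interval if failed else self._stale_interval
            if now - entry.loaded_at >= interval:
                self._load(tenant)

    def _load(self, tenant: str) -> _Entry:
        previous = self._cache.get(tenant)
        try:
            entry = _Entry(self._fetch(tenant), None, self._clock())
        except Exception as err:
            if previous is not None and previous.index is not None:
                # Assumption: keep serving the last good copy if a refresh fails.
                return previous
            entry = _Entry(None, err, self._clock())
        self._cache[tenant] = entry
        return entry
```

Injecting the clock keeps the sketch deterministic under test; a real deployment would run the refresh pass from a timer or background thread.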
| 
     | 
||
| If a bucket index is unused for a long time (configurable via `-blocks-storage.bucket-store.bucket-index.idle-timeout`), e.g. because that querier instance is not receiving any query from the tenant, the querier will offload it and stop keeping it updated at regular intervals. This is particularly useful for tenants which are resharded to different queriers when [shuffle sharding](../guides/shuffle-sharding.md) is enabled. | ||
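The idle-timeout behaviour above can be pictured with a small tracker that records when each tenant's index was last used and reports the ones to offload. This is a minimal sketch under assumptions: `IdleTracker`, `touch`, and `evict_idle` are hypothetical names, and the real querier may track usage differently.

```python
import time
from typing import Callable, Dict, List


class IdleTracker:
    """Tracks last-use time per tenant and evicts tenants idle for too long."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._last_used: Dict[str, float] = {}

    def touch(self, tenant: str) -> None:
        """Record that a query from this tenant just used its index."""
        self._last_used[tenant] = self._clock()

    def evict_idle(self) -> List[str]:
        """Forget and return the tenants unused for at least idle_timeout."""
        now = self._clock()
        idle = [tenant for tenant, used in self._last_used.items()
                if now - used >= self._idle_timeout]
        for tenant in idle:
            del self._last_used[tenant]
        return idle
```

A caller would drop the cached index for every tenant returned by `evict_idle()`, so that it is no longer refreshed until the tenant queries this instance again.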
| 
     | 
||
| Finally, the querier, at query time, checks how old a bucket index is (based on its `updated_at`) and fails the query if its age exceeds `-blocks-storage.bucket-store.bucket-index.max-stale-period`. This circuit breaker ensures queriers will not return partial query results due to a stale view over the long-term storage. | ||
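The staleness check described above boils down to comparing the index age against the configured limit. A minimal sketch, assuming both values are expressed in seconds; `StaleIndexError` and `check_index_freshness` are hypothetical names, not part of the querier:

```python
import time
from typing import Optional


class StaleIndexError(Exception):
    """Raised when the bucket index is older than the allowed stale period."""


def check_index_freshness(updated_at: int, max_stale_period: float,
                          now: Optional[float] = None) -> float:
    """Return the index age in seconds, or raise if it exceeds the limit.

    updated_at is the Unix timestamp (seconds) stored in the index;
    max_stale_period is the configured limit, assumed to be in seconds.
    """
    current = time.time() if now is None else now
    age = current - updated_at
    if age > max_stale_period:
        raise StaleIndexError(
            f"bucket index is {age:.0f}s old, limit is {max_stale_period:.0f}s"
        )
    return age
```

Failing the query outright, rather than answering from an outdated block list, is what makes this a circuit breaker: the caller sees an error instead of silently incomplete results.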
| 
         You have also added caching of index into caching-bucket. Is that worth mentioning as well?
         Yes, sure, done. WDYT?
         Isn't it possible to fall back to the behavior used when the bucket index is not enabled? Or is this undesirable for some reason?
         I don't think falling back is a viable solution. The "bucket scan" logic requires scanning the bucket ahead of time, which we don't do when we enable the bucket index. Lazily scanning the bucket would be too slow to do at query time. Moreover, infrequently exercised fallback logic would bring the further risk that the fallback doesn't work as expected. In my opinion, when you run Cortex with bucket index, the bucket index is an essential part of the system and it's required to be kept updated. The compactor already exports a metric with the timestamp of the last time the bucket index of each tenant has been updated, so that we can alert on it before the
         What do you think? | 
||
Is this going to cause issues for users running with compactor now (not sure if there are such people)?
Also, bucket-index.md says the bucket index is optional which seems like it contradicts this statement.
Did you mean "with compactor" or "without compactor"? I'm not sure I understand the question.
The compactor always writes the bucket index. The flag to enable the bucket index controls whether it should be used in queriers (and in the store-gateway too, in the upcoming PR). The point is that, before enabling the bucket index in the queriers and store-gateway, you have to roll out the compactor first, so that the bucket index for all tenants is created before you enable it in querier and store-gateway. To simplify it (or at least I thought it would), I've kept it always enabled in the compactor.