@yeya24 (Contributor) commented Jul 24, 2023

What this PR does:

This PR changes the GetClientsFor method of the BlocksStoreSet interface so that it takes an additional argument: a map of the zones already retried for each block.
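On the caller side, the new argument is per-block, per-zone attempt bookkeeping that gets updated after each retry. The following is a minimal sketch of that bookkeeping. It assumes string block IDs (Cortex identifies blocks by ULID), and the attemptedZones type and its methods are hypothetical names for illustration, not the actual Cortex code:

```go
package main

// attemptedZones is a hypothetical model of the extra argument passed to
// GetClientsFor: block ID -> zone -> number of attempts so far.
// Block IDs are plain strings here; Cortex itself uses ULIDs.
type attemptedZones map[string]map[string]int

// recordAttempt notes that blockID was queried from an instance in zone,
// allocating the inner per-zone map on first use.
func (a attemptedZones) recordAttempt(blockID, zone string) {
	zones, ok := a[blockID]
	if !ok {
		zones = map[string]int{}
		a[blockID] = zones
	}
	zones[zone]++
}

// attempts returns how many times blockID was tried in zone.
// Reading a missing key (or indexing a nil inner map) yields 0 in Go.
func (a attemptedZones) attempts(blockID, zone string) int {
	return a[blockID][zone]
}
```

A retry loop would call recordAttempt for every block it sent to a store-gateway in a given zone, then pass the map back to GetClientsFor on the next attempt.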
The main change is in blocksStoreReplicationSet. If zone awareness is disabled, the logic stays the same. If it is enabled, the algorithm is:

  1. A map tracks the number of attempts for each zone, per block.
  2. For each block, compute the minimum number of attempts across all zones (minAttempts).
  3. Iterate over all instances in the replication set; if an instance is located in a zone whose attempts == minAttempts, pick it as the target instance.
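As a rough illustration, steps 2 and 3 of the zone-aware path can be sketched in Go for a single block. Instance and pickInstance are hypothetical simplifications rather than the actual Cortex types; this sketch takes the minimum over the zones that appear in the replication set and omits the non-zone-aware path:

```go
package main

// Instance is a hypothetical stand-in for a store-gateway instance in a
// block's replication set; only the fields this sketch needs are modelled.
type Instance struct {
	Addr string
	Zone string
}

// pickInstance sketches zone-aware target selection for one block.
// attempts maps zone -> number of times this block was already tried in that
// zone; zones missing from the map count as zero attempts.
func pickInstance(instances []Instance, attempts map[string]int) (Instance, bool) {
	if len(instances) == 0 {
		return Instance{}, false
	}

	// Step 2: minimum number of attempts over the zones present in the set.
	minAttempts := attempts[instances[0].Zone]
	for _, inst := range instances[1:] {
		if n := attempts[inst.Zone]; n < minAttempts {
			minAttempts = n
		}
	}

	// Step 3: pick an instance located in a zone whose attempts == minAttempts.
	// Iteration follows replication-set order, so the first match wins.
	for _, inst := range instances {
		if attempts[inst.Zone] == minAttempts {
			return inst, true
		}
	}
	return Instance{}, false // unreachable: some zone always has minAttempts
}
```

With three zones and no recorded attempts, the first instance is chosen; once zone-a has one recorded attempt for the block, the next pick comes from zone-b, so retries spread across zones instead of hitting the same one repeatedly.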

Which issue(s) this PR fixes:
Fixes #5468

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@yeya24 force-pushed the retry-different-zones branch from 812f143 to 963b10f on July 25, 2023 01:34
@yeya24 force-pushed the retry-different-zones branch from df034a7 to 746375b on July 25, 2023 16:50

@harry671003 (Contributor) left a comment

LGTM! Thanks Ben.

Signed-off-by: Ben Ye <[email protected]>

@yeya24 (Contributor, Author) commented Jul 25, 2023

This CI failure seems concerning: https://github.com/cortexproject/cortex/actions/runs/5659654449/job/15333902668?pr=5476

query_fuzz_test.go:190: case 105 results mismatch.
        range query: sqrt(
          topk(
            --scalar(count_values by (job, series) ("value", -max({__name__="test_series_a"}))),
            count_values by (series, status_code, __name__) (
              "value",
              max_over_time({__name__="test_series_b"}[1h:1m] offset -3m38s)
            )
          )
        )
        res1: {series="3", status_code="200", value="65"} =>
        1 @[1690304582.945]
        {series="3", status_code="200", value="67"} =>
        1 @[1690304612.945]
        1 @[1690304642.945]
        {series="3", status_code="200", value="69"} =>
        1 @[1690304672.945]
        1 @[1690304702.945]
        {series="3", status_code="200", value="75"} =>
        1 @[1690304852.945]
        1 @[1690304882.945]
        {series="4", status_code="400", value="91"} =>
        1 @[1690304732.945]
        1 @[1690304762.945]
        {series="4", status_code="400", value="97"} =>
        1 @[1690304912.945]
        1 @[1690304942.945]
        {series="4", status_code="400", value="99"} =>
        1 @[1690304972.945]
        1 @[1690305002.945]
        1 @[1690305032.945]
        1 @[1690305062.945]
        1 @[1690305092.945]
        1 @[1690305122.945]
        {series="5", status_code="500", value="113"} =>
        1 @[1690304792.945]
        1 @[1690304822.945]
        res2: {series="3", status_code="200", value="69"} =>
        1 @[1690304672.945]
        1 @[1690304702.945]
        {series="3", status_code="200", value="73"} =>
        1 @[1690304792.945]
        1 @[1690304822.945]
        {series="4", status_code="400", value="85"} =>
        1 @[1690304582.945]
        {series="4", status_code="400", value="87"} =>
        1 @[1690304612.945]
        1 @[1690304642.945]
        {series="4", status_code="400", value="91"} =>
        1 @[1690304732.945]
        1 @[1690304762.945]
        {series="4", status_code="400", value="97"} =>
        1 @[1690304912.945]
        1 @[1690304942.945]
        {series="5", status_code="500", value="115"} =>
        1 @[1690304852.945]
        1 @[1690304882.945]
        {series="5", status_code="500", value="119"} =>
        1 @[1690304972.945]
        1 @[1690305002.945]
        1 @[1690305032.945]
        1 @[1690305062.945]
        1 @[1690305092.945]
        1 @[1690305122.945]
    query_fuzz_test.go:195:
        	Error Trace:	/home/runner/work/cortex/cortex/integration/query_fuzz_test.go:195
        	Error:      	finished query fuzzing tests
        	Test:       	TestVerticalShardingFuzz
        	Messages:   	1 test cases failed

@alanprot (Member)

Thanks! LGTM

@yeya24 merged commit e0bcca5 into cortexproject:master on Jul 25, 2023
@yeya24 deleted the retry-different-zones branch on July 26, 2023 16:59

Development

Successfully merging this pull request may close these issues.

Retries in blocksStoreQuerier.queryWithConsistencyCheck() doesn't query all zones

3 participants