[Fetch Migration] Enable migration for identical, empty target cluster index#390
Conversation
Codecov Report
@@ Coverage Diff @@
## main #390 +/- ##
============================================
+ Coverage 63.55% 63.64% +0.09%
- Complexity 715 718 +3
============================================
Files 82 82
Lines 3298 3298
Branches 303 303
============================================
+ Hits 2096 2099 +3
+ Misses 1014 1011 -3
Partials 188 188
Flags with carried forward coverage won't be shown. Click here to find out more. |
d968a25 to
29b6d8f
Compare
| resp = requests.get(actual_endpoint, auth=endpoint.auth, verify=endpoint.verify_ssl) | ||
| def doc_count(indices: set, endpoint: EndpointInfo) -> IndexDocCount: | ||
| actual_endpoint = endpoint.url + ','.join(indices) + __SEARCH_COUNT_PATH | ||
| resp = requests.get(actual_endpoint, auth=endpoint.auth, verify=endpoint.verify_ssl, json=__SEARCH_COUNT_PAYLOAD) |
There was a problem hiding this comment.
This should be wrapped in an exception block printing errors or warnings when there is a connection problem. Additionally, status codes should be checked.
There was a problem hiding this comment.
I've got an open issue tracking such fixes across the implementation
| sink: | ||
| - opensearch: | ||
| bulk_size: 10 | ||
| # DO NOT CHANGE the hosts value - this will be updated by the CDK |
There was a problem hiding this comment.
Same as comment above: I would add a note that they would need to change these only if running locally (in Docker)
There was a problem hiding this comment.
After the most recent updates to the Fetch Migration CDK, the hosts value in the sink will be replaced by an inline target host value through the CDK, so changing this value will not affect that logic nor reflect in the final setup. I've updated the comment here to:
# Note - CDK code will replace this value with the target cluster endpoint
This is in contrast to the hosts value in source where the string value matters because the CDK performs a string substitution to inject the source cluster endpoint.
c9fc513 to
fd9edd2
Compare
This is encapsulated in an IndexDocCount class, which also track the total doc count Signed-off-by: Kartik Ganesh <gkart@amazon.com>
…cluster indices This change includes refactoring the index difference logic to an IndexDiff class, and an additional network call to the target cluster to fetch document counts for identical indices to determine which ones are empty. Signed-off-by: Kartik Ganesh <gkart@amazon.com>
This change adds documentation to the dp_pipeline_template.yaml file and changes some reasonable defaults. It also bumps the Fetch Migration task definition to 1 CPu and 4 GB memory. Signed-off-by: Kartik Ganesh <gkart@amazon.com>
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
fd9edd2 to
54bead7
Compare
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
Description
This change enables Fetch Migration to migrate data for indices that are identical between the source and target cluster, and the document count on the target cluster is zero. Such behavior allow users to execute Fetch Migration in an idempotent manner without having to manually remove empty indices on the target cluster (that were created by a previous run of Fetch Migration that did not complete successfully)
Prior to this change, all indices that were identical between the source and target clusters were ineligible for data migration. Now, we check the doc_count for identical indices on the target cluster and only exclude identical indices with a non-zero document count.
Testing
Also tested end-to-end via the CDK. Log output from a test run:
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.