Index Level Encryption plugin #12902

Open
wants to merge 12 commits into
base: main

@asonje asonje commented Mar 25, 2024

Description

This pull request adds index-level encryption to OpenSearch, based on issue #3469. Each OpenSearch index is encrypted individually with a user-provided encryption key. A new store type, cryptofs, is introduced for the index.store.type setting; it instantiates a CryptoDirectory that encrypts files as they are written and decrypts them as they are read.
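
As a rough illustration of how an index might opt in to the new store type, here is a minimal Python sketch that builds a create-index request body. Only the setting name index.store.type and the value cryptofs come from the description above. The index name is hypothetical, and any key-related settings are left out because the description does not name them.

```python
import json

# Hypothetical index name, used only for illustration.
index_name = "my-encrypted-index"

# Request body for creating the index. Only index.store.type = cryptofs is taken
# from the PR description; key-related settings are omitted because the
# description does not name them.
create_index_body = {
    "settings": {
        "index.store.type": "cryptofs",
    }
}

# The body would be sent with a create-index request: PUT /<index_name>
print(f"PUT /{index_name}")
print(json.dumps(create_index_body, indent=2))
```

The store type is chosen per index, which matches the per-index encryption described above: indices created without this setting would presumably keep their default store type.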

Related Issues

Resolves #3469

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

asonje added 4 commits March 25, 2024 09:23
Signed-off-by: Olasoji Denloye <[email protected]>
Signed-off-by: Olasoji Denloye <[email protected]>
Signed-off-by: Olasoji Denloye <[email protected]>
Author

asonje commented Nov 9, 2024

{"run-benchmark-test": "id_2"}

Contributor

github-actions bot commented Nov 9, 2024

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/1618/ . Final results will be published once the job is completed.

@opensearch-ci-bot
Collaborator

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/1618/

Metric Task Value Unit
Cumulative indexing time of primary shards 167.658 min
Min cumulative indexing time across primary shards 0 min
Median cumulative indexing time across primary shards 6.52187 min
Max cumulative indexing time across primary shards 125.447 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 76.8255 min
Cumulative merge count of primary shards 82
Min cumulative merge time across primary shards 0 min
Median cumulative merge time across primary shards 1.20253 min
Max cumulative merge time across primary shards 69.8232 min
Cumulative merge throttle time of primary shards 28.686 min
Min cumulative merge throttle time across primary shards 0 min
Median cumulative merge throttle time across primary shards 0.19055 min
Max cumulative merge throttle time across primary shards 27.5956 min
Cumulative refresh time of primary shards 2.40717 min
Cumulative refresh count of primary shards 187
Min cumulative refresh time across primary shards 0 min
Median cumulative refresh time across primary shards 0.167683 min
Max cumulative refresh time across primary shards 1.28473 min
Cumulative flush time of primary shards 10.9901 min
Cumulative flush count of primary shards 108
Min cumulative flush time across primary shards 0 min
Median cumulative flush time across primary shards 0.559483 min
Max cumulative flush time across primary shards 7.92678 min
Total Young Gen GC time 13.311 s
Total Young Gen GC count 388
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 18.8594 GB
Translog size 1.02632e-06 GB
Heap used for segments 0 MB
Heap used for doc values 0 MB
Heap used for terms 0 MB
Heap used for norms 0 MB
Heap used for points 0 MB
Heap used for stored fields 0 MB
Segment count 8
Min Throughput index-append 79293.6 docs/s
Mean Throughput index-append 81955.2 docs/s
Median Throughput index-append 82151.8 docs/s
Max Throughput index-append 84177.4 docs/s
50th percentile latency index-append 438.723 ms
90th percentile latency index-append 604.241 ms
99th percentile latency index-append 1139.63 ms
99.9th percentile latency index-append 5744.85 ms
99.99th percentile latency index-append 7254.43 ms
100th percentile latency index-append 8112.11 ms
50th percentile service time index-append 438.702 ms
90th percentile service time index-append 604.214 ms
99th percentile service time index-append 1139.67 ms
99.9th percentile service time index-append 5744.85 ms
99.99th percentile service time index-append 7254.43 ms
100th percentile service time index-append 8112.11 ms
error rate index-append 0 %
Min Throughput wait-until-merges-finish 0.01 ops/s
Mean Throughput wait-until-merges-finish 0.01 ops/s
Median Throughput wait-until-merges-finish 0.01 ops/s
Max Throughput wait-until-merges-finish 0.01 ops/s
100th percentile latency wait-until-merges-finish 149662 ms
100th percentile service time wait-until-merges-finish 149662 ms
error rate wait-until-merges-finish 0 %
Min Throughput wait-until-merges-1-seg-finish 123.5 ops/s
Mean Throughput wait-until-merges-1-seg-finish 123.5 ops/s
Median Throughput wait-until-merges-1-seg-finish 123.5 ops/s
Max Throughput wait-until-merges-1-seg-finish 123.5 ops/s
100th percentile latency wait-until-merges-1-seg-finish 7.79346 ms
100th percentile service time wait-until-merges-1-seg-finish 7.79346 ms
error rate wait-until-merges-1-seg-finish 0 %

@opensearch-ci-bot
Collaborator

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/23/

Metric Task Baseline Contender Diff Unit
Cumulative indexing time of primary shards 170.336 167.658 -2.67867 min
Min cumulative indexing time across primary shard 0 0 0 min
Median cumulative indexing time across primary shard 7.13363 6.52187 -0.61177 min
Max cumulative indexing time across primary shard 127.078 125.447 -1.63033 min
Cumulative indexing throttle time of primary shards 0 0 0 min
Min cumulative indexing throttle time across primary shard 0 0 0 min
Median cumulative indexing throttle time across primary shard 0 0 0 min
Max cumulative indexing throttle time across primary shard 0 0 0 min
Cumulative merge time of primary shards 76.2919 76.8255 0.53358 min
Cumulative merge count of primary shards 85 82 -3
Min cumulative merge time across primary shard 0 0 0 min
Median cumulative merge time across primary shard 1.09522 1.20253 0.10732 min
Max cumulative merge time across primary shard 68.8309 69.8232 0.99227 min
Cumulative merge throttle time of primary shards 34.2203 28.686 -5.5343 min
Min cumulative merge throttle time across primary shard 0 0 0 min
Median cumulative merge throttle time across primary shard 0.212167 0.19055 -0.02162 min
Max cumulative merge throttle time across primary shard 32.7638 27.5956 -5.16812 min
Cumulative refresh time of primary shards 1.77567 2.40717 0.6315 min
Cumulative refresh count of primary shards 200 187 -13
Min cumulative refresh time across primary shard 0 0 0 min
Median cumulative refresh time across primary shard 0.134283 0.167683 0.0334 min
Max cumulative refresh time across primary shard 0.9094 1.28473 0.37533 min
Cumulative flush time of primary shards 9.66832 10.9901 1.32183 min
Cumulative flush count of primary shards 108 108 0
Min cumulative flush time across primary shard 0 0 0 min
Median cumulative flush time across primary shard 0.412817 0.559483 0.14667 min
Max cumulative flush time across primary shard 6.84105 7.92678 1.08573 min
Total Young Gen GC time 10.689 13.311 2.622 s
Total Young Gen GC count 331 388 57
Total Old Gen GC time 0 0 0 s
Total Old Gen GC count 0 0 0
Store size 18.9309 18.8594 -0.07145 GB
Translog size 1.02632e-06 1.02632e-06 0 GB
Heap used for segments 0 0 0 MB
Heap used for doc values 0 0 0 MB
Heap used for terms 0 0 0 MB
Heap used for norms 0 0 0 MB
Heap used for points 0 0 0 MB
Heap used for stored fields 0 0 0 MB
Segment count 8 8 0
Min Throughput index-append 82296 79293.6 -3002.41 docs/s
Mean Throughput index-append 84559.8 81955.2 -2604.62 docs/s
Median Throughput index-append 85043.9 82151.8 -2892.06 docs/s
Max Throughput index-append 86642 84177.4 -2464.55 docs/s
50th percentile latency index-append 422.46 438.723 16.2632 ms
90th percentile latency index-append 577.558 604.241 26.6835 ms
99th percentile latency index-append 1186.47 1139.63 -46.838 ms
99.9th percentile latency index-append 5119.46 5744.85 625.393 ms
99.99th percentile latency index-append 6135.93 7254.43 1118.5 ms
100th percentile latency index-append 6654.76 8112.11 1457.35 ms
50th percentile service time index-append 422.461 438.702 16.2409 ms
90th percentile service time index-append 577.579 604.214 26.635 ms
99th percentile service time index-append 1185.57 1139.67 -45.9 ms
99.9th percentile service time index-append 5119.46 5744.85 625.393 ms
99.99th percentile service time index-append 6135.93 7254.43 1118.5 ms
100th percentile service time index-append 6654.76 8112.11 1457.35 ms
error rate index-append 0 0 0 %
Min Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
Mean Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
Median Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
Max Throughput wait-until-merges-finish 0.00288805 0.00668173 0.00379 ops/s
100th percentile latency wait-until-merges-finish 346254 149662 -196593 ms
100th percentile service time wait-until-merges-finish 346254 149662 -196593 ms
error rate wait-until-merges-finish 0 0 0 %
Min Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
Mean Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
Median Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
Max Throughput wait-until-merges-1-seg-finish 112.547 123.496 10.9489 ops/s
100th percentile latency wait-until-merges-1-seg-finish 8.39773 7.79346 -0.60427 ms
100th percentile service time wait-until-merges-1-seg-finish 8.39773 7.79346 -0.60427 ms
error rate wait-until-merges-1-seg-finish 0 0 0 %
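
The Diff column in the comparison above is an absolute difference (Contender minus Baseline). To judge how large a change is, it can help to express it relative to the baseline. A minimal Python sketch, using the mean index-append throughput values from the table; the helper name `relative_change` is hypothetical and not part of the benchmark tooling:

```python
def relative_change(baseline: float, contender: float) -> float:
    """Return the change from baseline to contender as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (contender - baseline) / baseline * 100.0


# Mean Throughput, index-append (docs/s), copied from the comparison table.
baseline_mean = 84559.8
contender_mean = 81955.2

pct = relative_change(baseline_mean, contender_mean)
print(f"mean index-append throughput changed by {pct:.2f}% relative to baseline")
```

For these two values the contender's mean throughput comes out roughly 3% below the baseline. Rows with a zero baseline, such as the throttle times, have no meaningful relative change, which is why the helper rejects them.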

Contributor

kumargu commented Nov 10, 2024

{"run-benchmark-test": "id_1"}

Contributor

kumargu commented Nov 11, 2024

{"run-benchmark-test": "id_1"}

Contributor

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/1633/ . Final results will be published once the job is completed.

@opensearch-ci-bot
Collaborator

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/1633/

Metric Task Value Unit
Cumulative indexing time of primary shards 212.702 min
Min cumulative indexing time across primary shards 212.702 min
Median cumulative indexing time across primary shards 212.702 min
Max cumulative indexing time across primary shards 212.702 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 99.9794 min
Cumulative merge count of primary shards 68
Min cumulative merge time across primary shards 99.9794 min
Median cumulative merge time across primary shards 99.9794 min
Max cumulative merge time across primary shards 99.9794 min
Cumulative merge throttle time of primary shards 19.0518 min
Min cumulative merge throttle time across primary shards 19.0518 min
Median cumulative merge throttle time across primary shards 19.0518 min
Max cumulative merge throttle time across primary shards 19.0518 min
Cumulative refresh time of primary shards 13.6069 min
Cumulative refresh count of primary shards 134
Min cumulative refresh time across primary shards 13.6069 min
Median cumulative refresh time across primary shards 13.6069 min
Max cumulative refresh time across primary shards 13.6069 min
Cumulative flush time of primary shards 4.33373 min
Cumulative flush count of primary shards 34
Min cumulative flush time across primary shards 4.33373 min
Median cumulative flush time across primary shards 4.33373 min
Max cumulative flush time across primary shards 4.33373 min
Total Young Gen GC time 15.343 s
Total Young Gen GC count 353
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 29.1484 GB
Translog size 5.12227e-08 GB
Heap used for segments 0 MB
Heap used for doc values 0 MB
Heap used for terms 0 MB
Heap used for norms 0 MB
Heap used for points 0 MB
Heap used for stored fields 0 MB
Segment count 39
Min Throughput index 45333.5 docs/s
Mean Throughput index 47687 docs/s
Median Throughput index 47061.2 docs/s
Max Throughput index 52957.1 docs/s
50th percentile latency index 1562.9 ms
90th percentile latency index 2132.58 ms
99th percentile latency index 6522.47 ms
99.9th percentile latency index 12381.8 ms
99.99th percentile latency index 14977.1 ms
100th percentile latency index 17174 ms
50th percentile service time index 1563.13 ms
90th percentile service time index 2132.71 ms
99th percentile service time index 6527.01 ms
99.9th percentile service time index 12381.8 ms
99.99th percentile service time index 14977.1 ms
100th percentile service time index 17174 ms
error rate index 0.01 %
Min Throughput wait-until-merges-finish 0 ops/s
Mean Throughput wait-until-merges-finish 0 ops/s
Median Throughput wait-until-merges-finish 0 ops/s
Max Throughput wait-until-merges-finish 0 ops/s
100th percentile latency wait-until-merges-finish 238751 ms
100th percentile service time wait-until-merges-finish 238751 ms
error rate wait-until-merges-finish 0 %

@opensearch-ci-bot
Collaborator

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/24/

Metric Task Baseline Contender Diff Unit
Cumulative indexing time of primary shards 207.016 212.702 5.68563 min
Min cumulative indexing time across primary shard 207.016 212.702 5.68563 min
Median cumulative indexing time across primary shard 207.016 212.702 5.68563 min
Max cumulative indexing time across primary shard 207.016 212.702 5.68563 min
Cumulative indexing throttle time of primary shards 0 0 0 min
Min cumulative indexing throttle time across primary shard 0 0 0 min
Median cumulative indexing throttle time across primary shard 0 0 0 min
Max cumulative indexing throttle time across primary shard 0 0 0 min
Cumulative merge time of primary shards 102.178 99.9794 -2.19833 min
Cumulative merge count of primary shards 62 68 6
Min cumulative merge time across primary shard 102.178 99.9794 -2.19833 min
Median cumulative merge time across primary shard 102.178 99.9794 -2.19833 min
Max cumulative merge time across primary shard 102.178 99.9794 -2.19833 min
Cumulative merge throttle time of primary shards 21.7156 19.0518 -2.66373 min
Min cumulative merge throttle time across primary shard 21.7156 19.0518 -2.66373 min
Median cumulative merge throttle time across primary shard 21.7156 19.0518 -2.66373 min
Max cumulative merge throttle time across primary shard 21.7156 19.0518 -2.66373 min
Cumulative refresh time of primary shards 12.7603 13.6069 0.84663 min
Cumulative refresh count of primary shards 125 134 9
Min cumulative refresh time across primary shard 12.7603 13.6069 0.84663 min
Median cumulative refresh time across primary shard 12.7603 13.6069 0.84663 min
Max cumulative refresh time across primary shard 12.7603 13.6069 0.84663 min
Cumulative flush time of primary shards 4.21003 4.33373 0.1237 min
Cumulative flush count of primary shards 31 34 3
Min cumulative flush time across primary shard 4.21003 4.33373 0.1237 min
Median cumulative flush time across primary shard 4.21003 4.33373 0.1237 min
Max cumulative flush time across primary shard 4.21003 4.33373 0.1237 min
Total Young Gen GC time 14.03 15.343 1.313 s
Total Young Gen GC count 328 353 25
Total Old Gen GC time 0 0 0 s
Total Old Gen GC count 0 0 0
Store size 29.3125 29.1484 -0.16412 GB
Translog size 5.12227e-08 5.12227e-08 0 GB
Heap used for segments 0 0 0 MB
Heap used for doc values 0 0 0 MB
Heap used for terms 0 0 0 MB
Heap used for norms 0 0 0 MB
Heap used for points 0 0 0 MB
Heap used for stored fields 0 0 0 MB
Segment count 42 39 -3
Min Throughput index 48898.2 45333.5 -3564.64 docs/s
Mean Throughput index 51667 47687 -3980.02 docs/s
Median Throughput index 51457.8 47061.2 -4396.6 docs/s
Max Throughput index 55159.3 52957.1 -2202.21 docs/s
50th percentile latency index 1434.93 1562.9 127.968 ms
90th percentile latency index 2009.98 2132.58 122.606 ms
99th percentile latency index 6369.37 6522.47 153.095 ms
99.9th percentile latency index 12323.6 12381.8 58.1934 ms
99.99th percentile latency index 15396.3 14977.1 -419.13 ms
100th percentile latency index 16550.6 17174 623.381 ms
50th percentile service time index 1434.9 1563.13 128.23 ms
90th percentile service time index 2011.09 2132.71 121.626 ms
99th percentile service time index 6388.21 6527.01 138.796 ms
99.9th percentile service time index 12323.6 12381.8 58.1934 ms
99.99th percentile service time index 15396.3 14977.1 -419.13 ms
100th percentile service time index 16550.6 17174 623.381 ms
error rate index 0.0065703 0.00654922 -2e-05 %
Min Throughput wait-until-merges-finish 0.0065218 0.00418846 -0.00233 ops/s
Mean Throughput wait-until-merges-finish 0.0065218 0.00418846 -0.00233 ops/s
Median Throughput wait-until-merges-finish 0.0065218 0.00418846 -0.00233 ops/s
Max Throughput wait-until-merges-finish 0.0065218 0.00418846 -0.00233 ops/s
100th percentile latency wait-until-merges-finish 153332 238751 85419.2 ms
100th percentile service time wait-until-merges-finish 153332 238751 85419.2 ms
error rate wait-until-merges-finish 0 0 0 %

Contributor

kumargu commented Nov 13, 2024

{"run-benchmark-test": "id_3"}

Contributor

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/1654/ . Final results will be published once the job is completed.

@opensearch-ci-bot
Collaborator

The benchmark job https://build.ci.opensearch.org/job/benchmark-pull-request/1654/ failed.
Please see logs to debug.

Contributor

rishabh6788 commented Nov 13, 2024

@kumargu The OS process crashes with the error below while trying to register the snapshot repository:

java.lang.NoSuchFieldError: Class org.opensearch.common.blobstore.BlobStore$Metric does not have member field 'org.opensearch.common.blobstore.BlobStore$Metric GENERIC_STATS'
        at org.opensearch.repositories.s3.S3BlobStore.extendedStats(S3BlobStore.java:243) ~[?:?]
        at org.opensearch.repositories.blobstore.BlobStoreRepository.stats(BlobStoreRepository.java:857) ~[opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.repositories.blobstore.MeteredBlobStoreRepository.statsSnapshot(MeteredBlobStoreRepository.java:72) ~[opensearch-3.0.0.jar:3.0.0]
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]

Try rebasing your PR and running again.

Signed-off-by: Olasoji Denloye <[email protected]>
Contributor

github-actions bot commented Dec 5, 2024

❌ Gradle check result for 3bd44e5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Olasoji Denloye <[email protected]>
Contributor

github-actions bot commented Dec 5, 2024

❌ Gradle check result for ded0f58: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Author

asonje commented Dec 5, 2024

{"run-benchmark-test": "id_3"}

Contributor

github-actions bot commented Dec 5, 2024

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/1810/ . Final results will be published once the job is completed.

@opensearch-ci-bot
Collaborator

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/1810/

Metric Task Value Unit
Cumulative indexing time of primary shards 0 min
Min cumulative indexing time across primary shards 0 min
Median cumulative indexing time across primary shards 0 min
Max cumulative indexing time across primary shards 0 min
Cumulative indexing throttle time of primary shards 0 min
Min cumulative indexing throttle time across primary shards 0 min
Median cumulative indexing throttle time across primary shards 0 min
Max cumulative indexing throttle time across primary shards 0 min
Cumulative merge time of primary shards 0 min
Cumulative merge count of primary shards 0
Min cumulative merge time across primary shards 0 min
Median cumulative merge time across primary shards 0 min
Max cumulative merge time across primary shards 0 min
Cumulative merge throttle time of primary shards 0 min
Min cumulative merge throttle time across primary shards 0 min
Median cumulative merge throttle time across primary shards 0 min
Max cumulative merge throttle time across primary shards 0 min
Cumulative refresh time of primary shards 0 min
Cumulative refresh count of primary shards 2
Min cumulative refresh time across primary shards 0 min
Median cumulative refresh time across primary shards 0 min
Max cumulative refresh time across primary shards 0 min
Cumulative flush time of primary shards 0 min
Cumulative flush count of primary shards 1
Min cumulative flush time across primary shards 0 min
Median cumulative flush time across primary shards 0 min
Max cumulative flush time across primary shards 0 min
Total Young Gen GC time 0.333 s
Total Young Gen GC count 13
Total Old Gen GC time 0 s
Total Old Gen GC count 0
Store size 23.6241 GB
Translog size 5.12227e-08 GB
Heap used for segments 0 MB
Heap used for doc values 0 MB
Heap used for terms 0 MB
Heap used for norms 0 MB
Heap used for points 0 MB
Heap used for stored fields 0 MB
Segment count 32
Min Throughput wait-for-snapshot-recovery 4.18848e+07 byte/s
Mean Throughput wait-for-snapshot-recovery 4.18848e+07 byte/s
Median Throughput wait-for-snapshot-recovery 4.18848e+07 byte/s
Max Throughput wait-for-snapshot-recovery 4.18848e+07 byte/s
100th percentile latency wait-for-snapshot-recovery 600574 ms
100th percentile service time wait-for-snapshot-recovery 600574 ms
error rate wait-for-snapshot-recovery 0 %
Min Throughput default 3.01 ops/s
Mean Throughput default 3.02 ops/s
Median Throughput default 3.02 ops/s
Max Throughput default 3.04 ops/s
50th percentile latency default 7.2419 ms
90th percentile latency default 8.27506 ms
99th percentile latency default 8.72156 ms
100th percentile latency default 8.79778 ms
50th percentile service time default 6.12717 ms
90th percentile service time default 6.93989 ms
99th percentile service time default 7.81181 ms
100th percentile service time default 7.84552 ms
error rate default 0 %
Min Throughput range 0.7 ops/s
Mean Throughput range 0.7 ops/s
Median Throughput range 0.7 ops/s
Max Throughput range 0.71 ops/s
50th percentile latency range 159.906 ms
90th percentile latency range 163.337 ms
99th percentile latency range 174.24 ms
100th percentile latency range 182.367 ms
50th percentile service time range 156.87 ms
90th percentile service time range 160.573 ms
99th percentile service time range 170.992 ms
100th percentile service time range 178.964 ms
error rate range 0 %
Min Throughput distance_amount_agg 0.07 ops/s
Mean Throughput distance_amount_agg 0.07 ops/s
Median Throughput distance_amount_agg 0.07 ops/s
Max Throughput distance_amount_agg 0.07 ops/s
50th percentile latency distance_amount_agg 1.32056e+06 ms
90th percentile latency distance_amount_agg 1.84544e+06 ms
99th percentile latency distance_amount_agg 1.96348e+06 ms
100th percentile latency distance_amount_agg 1.97006e+06 ms
50th percentile service time distance_amount_agg 13592.9 ms
90th percentile service time distance_amount_agg 13707.9 ms
99th percentile service time distance_amount_agg 13806.9 ms
100th percentile service time distance_amount_agg 13817.7 ms
error rate distance_amount_agg 0 %
Min Throughput autohisto_agg 1.51 ops/s
Mean Throughput autohisto_agg 1.51 ops/s
Median Throughput autohisto_agg 1.51 ops/s
Max Throughput autohisto_agg 1.52 ops/s
50th percentile latency autohisto_agg 13.9108 ms
90th percentile latency autohisto_agg 14.7679 ms
99th percentile latency autohisto_agg 21.0951 ms
100th percentile latency autohisto_agg 23.0766 ms
50th percentile service time autohisto_agg 12.3491 ms
90th percentile service time autohisto_agg 13.0162 ms
99th percentile service time autohisto_agg 19.8559 ms
100th percentile service time autohisto_agg 21.7752 ms
error rate autohisto_agg 0 %
Min Throughput date_histogram_agg 1.51 ops/s
Mean Throughput date_histogram_agg 1.52 ops/s
Median Throughput date_histogram_agg 1.51 ops/s
Max Throughput date_histogram_agg 1.53 ops/s
50th percentile latency date_histogram_agg 13.3376 ms
90th percentile latency date_histogram_agg 13.855 ms
99th percentile latency date_histogram_agg 28.4331 ms
100th percentile latency date_histogram_agg 37.9943 ms
50th percentile service time date_histogram_agg 11.9196 ms
90th percentile service time date_histogram_agg 12.2359 ms
99th percentile service time date_histogram_agg 27.0351 ms
100th percentile service time date_histogram_agg 36.3977 ms
error rate date_histogram_agg 0 %
Min Throughput desc_sort_tip_amount 0.5 ops/s
Mean Throughput desc_sort_tip_amount 0.5 ops/s
Median Throughput desc_sort_tip_amount 0.5 ops/s
Max Throughput desc_sort_tip_amount 0.51 ops/s
50th percentile latency desc_sort_tip_amount 35.4273 ms
90th percentile latency desc_sort_tip_amount 36.0668 ms
99th percentile latency desc_sort_tip_amount 41.6973 ms
100th percentile latency desc_sort_tip_amount 42.6475 ms
50th percentile service time desc_sort_tip_amount 32.7031 ms
90th percentile service time desc_sort_tip_amount 33.1555 ms
99th percentile service time desc_sort_tip_amount 38.9371 ms
100th percentile service time desc_sort_tip_amount 39.7858 ms
error rate desc_sort_tip_amount 0 %
Min Throughput asc_sort_tip_amount 0.5 ops/s
Mean Throughput asc_sort_tip_amount 0.51 ops/s
Median Throughput asc_sort_tip_amount 0.5 ops/s
Max Throughput asc_sort_tip_amount 0.51 ops/s
50th percentile latency asc_sort_tip_amount 8.78618 ms
90th percentile latency asc_sort_tip_amount 9.31272 ms
99th percentile latency asc_sort_tip_amount 10.2617 ms
100th percentile latency asc_sort_tip_amount 10.2739 ms
50th percentile service time asc_sort_tip_amount 5.91895 ms
90th percentile service time asc_sort_tip_amount 6.25749 ms
99th percentile service time asc_sort_tip_amount 7.16934 ms
100th percentile service time asc_sort_tip_amount 7.17124 ms
error rate asc_sort_tip_amount 0 %

@opensearch-ci-bot
Collaborator

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/25/

Metric Task Baseline Contender Diff Unit
Cumulative indexing time of primary shards 0 0 0 min
Min cumulative indexing time across primary shard 0 0 0 min
Median cumulative indexing time across primary shard 0 0 0 min
Max cumulative indexing time across primary shard 0 0 0 min
Cumulative indexing throttle time of primary shards 0 0 0 min
Min cumulative indexing throttle time across primary shard 0 0 0 min
Median cumulative indexing throttle time across primary shard 0 0 0 min
Max cumulative indexing throttle time across primary shard 0 0 0 min
Cumulative merge time of primary shards 0 0 0 min
Cumulative merge count of primary shards 0 0 0
Min cumulative merge time across primary shard 0 0 0 min
Median cumulative merge time across primary shard 0 0 0 min
Max cumulative merge time across primary shard 0 0 0 min
Cumulative merge throttle time of primary shards 0 0 0 min
Min cumulative merge throttle time across primary shard 0 0 0 min
Median cumulative merge throttle time across primary shard 0 0 0 min
Max cumulative merge throttle time across primary shard 0 0 0 min
Cumulative refresh time of primary shards 0 0 0 min
Cumulative refresh count of primary shards 2 2 0
Min cumulative refresh time across primary shard 0 0 0 min
Median cumulative refresh time across primary shard 0 0 0 min
Max cumulative refresh time across primary shard 0 0 0 min
Cumulative flush time of primary shards 0 0 0 min
Cumulative flush count of primary shards 1 1 0
Min cumulative flush time across primary shard 0 0 0 min
Median cumulative flush time across primary shard 0 0 0 min
Max cumulative flush time across primary shard 0 0 0 min
Total Young Gen GC time 0.325 0.333 0.008 s
Total Young Gen GC count 13 13 0
Total Old Gen GC time 0 0 0 s
Total Old Gen GC count 0 0 0
Store size 23.6241 23.6241 0 GB
Translog size 5.12227e-08 5.12227e-08 0 GB
Heap used for segments 0 0 0 MB
Heap used for doc values 0 0 0 MB
Heap used for terms 0 0 0 MB
Heap used for norms 0 0 0 MB
Heap used for points 0 0 0 MB
Heap used for stored fields 0 0 0 MB
Segment count 32 32 0
Min Throughput wait-for-snapshot-recovery 4.18863e+07 4.18848e+07 -1520 byte/s
Mean Throughput wait-for-snapshot-recovery 4.18863e+07 4.18848e+07 -1520 byte/s
Median Throughput wait-for-snapshot-recovery 4.18863e+07 4.18848e+07 -1520 byte/s
Max Throughput wait-for-snapshot-recovery 4.18863e+07 4.18848e+07 -1520 byte/s
100th percentile latency wait-for-snapshot-recovery 600592 600574 -18.3125 ms
100th percentile service time wait-for-snapshot-recovery 600592 600574 -18.3125 ms
error rate wait-for-snapshot-recovery 0 0 0 %
Min Throughput default 3.01355 3.01317 -0.00038 ops/s
Mean Throughput default 3.02202 3.02142 -0.0006 ops/s
Median Throughput default 3.02008 3.0195 -0.00058 ops/s
Max Throughput default 3.03885 3.03768 -0.00117 ops/s
50th percentile latency default 6.30176 7.2419 0.94013 ms
90th percentile latency default 7.23365 8.27506 1.04142 ms
99th percentile latency default 8.1162 8.72156 0.60537 ms
100th percentile latency default 8.53889 8.79778 0.25889 ms
50th percentile service time default 5.29412 6.12717 0.83305 ms
90th percentile service time default 6.00782 6.93989 0.93207 ms
99th percentile service time default 6.91444 7.81181 0.89737 ms
100th percentile service time default 7.18129 7.84552 0.66423 ms
error rate default 0 0 0 %
Min Throughput range 0.702121 0.702159 4e-05 ops/s
Mean Throughput range 0.703487 0.703551 6e-05 ops/s
Median Throughput range 0.703176 0.703235 6e-05 ops/s
Max Throughput range 0.706283 0.706399 0.00012 ops/s
50th percentile latency range 159.567 159.906 0.33825 ms
90th percentile latency range 163.17 163.337 0.16733 ms
99th percentile latency range 168.481 174.24 5.75862 ms
100th percentile latency range 169.503 182.367 12.8635 ms
50th percentile service time range 157.297 156.87 -0.42642 ms
90th percentile service time range 160.425 160.573 0.14827 ms
99th percentile service time range 165.635 170.992 5.3576 ms
100th percentile service time range 166.422 178.964 12.5422 ms
error rate range 0 0 0 %
Min Throughput distance_amount_agg 0.0745231 0.0733222 -0.0012 ops/s
Mean Throughput distance_amount_agg 0.0750837 0.0733966 -0.00169 ops/s
Median Throughput distance_amount_agg 0.0749722 0.0734084 -0.00156 ops/s
Max Throughput distance_amount_agg 0.0757835 0.0734192 -0.00236 ops/s
50th percentile latency distance_amount_agg 1.29251e+06 1.32056e+06 28054.8 ms
90th percentile latency distance_amount_agg 1.81601e+06 1.84544e+06 29427.4 ms
99th percentile latency distance_amount_agg 1.93337e+06 1.96348e+06 30107.9 ms
100th percentile latency distance_amount_agg 1.9399e+06 1.97006e+06 30158.4 ms
50th percentile service time distance_amount_agg 13599.7 13592.9 -6.83984 ms
90th percentile service time distance_amount_agg 13728.3 13707.9 -20.3667 ms
99th percentile service time distance_amount_agg 13891.7 13806.9 -84.8272 ms
100th percentile service time distance_amount_agg 13921.2 13817.7 -103.477 ms
error rate distance_amount_agg 0 0 0 %
Min Throughput autohisto_agg 1.50711 1.50695 -0.00016 ops/s
Mean Throughput autohisto_agg 1.51174 1.51147 -0.00027 ops/s
Median Throughput autohisto_agg 1.5107 1.51044 -0.00026 ops/s
Max Throughput autohisto_agg 1.52112 1.52062 -0.00049 ops/s
50th percentile latency autohisto_agg 13.8174 13.9108 0.09342 ms
90th percentile latency autohisto_agg 14.4257 14.7679 0.34225 ms
99th percentile latency autohisto_agg 20.6309 21.0951 0.4642 ms
100th percentile latency autohisto_agg 20.8432 23.0766 2.2334 ms
50th percentile service time autohisto_agg 12.2654 12.3491 0.08371 ms
90th percentile service time autohisto_agg 12.9628 13.0162 0.05343 ms
99th percentile service time autohisto_agg 19.0487 19.8559 0.8072 ms
100th percentile service time autohisto_agg 19.0793 21.7752 2.6959 ms
error rate autohisto_agg 0 0 0 %
Min Throughput date_histogram_agg 1.5097 1.50961 -8e-05 ops/s
Mean Throughput date_histogram_agg 1.51602 1.51589 -0.00013 ops/s
Median Throughput date_histogram_agg 1.51459 1.51446 -0.00013 ops/s
Max Throughput date_histogram_agg 1.52884 1.52864 -0.0002 ops/s
50th percentile latency date_histogram_agg 13.7754 13.3376 -0.43773 ms
90th percentile latency date_histogram_agg 14.3006 13.855 -0.44553 ms
99th percentile latency date_histogram_agg 15.9162 28.4331 12.5168 ms
100th percentile latency date_histogram_agg 16.4133 37.9943 21.5809 ms
50th percentile service time date_histogram_agg 12.3155 11.9196 -0.39588 ms
90th percentile service time date_histogram_agg 12.6337 12.2359 -0.39774 ms
99th percentile service time date_histogram_agg 14.3141 27.0351 12.721 ms
100th percentile service time date_histogram_agg 15.1024 36.3977 21.2953 ms
error rate date_histogram_agg 0 0 0 %
Min Throughput desc_sort_tip_amount 0.502272 0.502143 -0.00013 ops/s
Mean Throughput desc_sort_tip_amount 0.503736 0.503524 -0.00021 ops/s
Median Throughput desc_sort_tip_amount 0.503399 0.503206 -0.00019 ops/s
Max Throughput desc_sort_tip_amount 0.506743 0.50636 -0.00038 ops/s
50th percentile latency desc_sort_tip_amount 35.634 35.4273 -0.20672 ms
90th percentile latency desc_sort_tip_amount 36.4391 36.0668 -0.37232 ms
99th percentile latency desc_sort_tip_amount 42.9616 41.6973 -1.26432 ms
100th percentile latency desc_sort_tip_amount 45.5025 42.6475 -2.85496 ms
50th percentile service time desc_sort_tip_amount 32.9943 32.7031 -0.29126 ms
90th percentile service time desc_sort_tip_amount 33.5623 33.1555 -0.40684 ms
99th percentile service time desc_sort_tip_amount 40.3316 38.9371 -1.39457 ms
100th percentile service time desc_sort_tip_amount 42.2373 39.7858 -2.45145 ms
error rate desc_sort_tip_amount 0 0 0 %
Min Throughput asc_sort_tip_amount 0.503111 0.503087 -2e-05 ops/s
Mean Throughput asc_sort_tip_amount 0.505123 0.50508 -4e-05 ops/s
Median Throughput asc_sort_tip_amount 0.504659 0.50462 -4e-05 ops/s
Max Throughput asc_sort_tip_amount 0.509265 0.509189 -8e-05 ops/s
50th percentile latency asc_sort_tip_amount 8.64147 8.78618 0.14471 ms
90th percentile latency asc_sort_tip_amount 9.20375 9.31272 0.10897 ms
99th percentile latency asc_sort_tip_amount 21.0795 10.2617 -10.8178 ms
100th percentile latency asc_sort_tip_amount 32.0886 10.2739 -21.8147 ms
50th percentile service time asc_sort_tip_amount 5.80252 5.91895 0.11642 ms
90th percentile service time asc_sort_tip_amount 6.20851 6.25749 0.04897 ms
99th percentile service time asc_sort_tip_amount 18.1958 7.16934 -11.0265 ms
100th percentile service time asc_sort_tip_amount 29.2886 7.17124 -22.1174 ms
error rate asc_sort_tip_amount 0 0 0 %

@kumargu
Contributor

kumargu commented Dec 16, 2024

Listing down my meeting notes with @asonje

Tenets

  1. Performance: Search queries should see no more than a single-digit millisecond impact (let's keep this as the goal and work backwards from there). Similarly, indexing throughput should drop by no more than 5%.
  2. There should be minimal impact to mem, disk, cpu usage.
  3. Security: data at rest must be encrypted in every known storage location.
  4. All existing features must continue to work with this feature.
  5. Key (master key) rotation may not be a requirement right now, but design decisions must keep it in view so that rotation can later be supported, ideally without reindexing data (take inspiration from the Solr implementation).
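
As a rough illustration of how tenet 1 could be checked against benchmark rows like the ones above, here is a minimal sketch. It assumes the two numeric columns in each row are baseline and contender values, and it reads "single digit millisecond" as a delta below 10 ms; the class name, method names, and thresholds are hypothetical and not part of the PR.

```java
/** Illustrative check of benchmark deltas against the performance tenet (hypothetical helper). */
public class TenetCheck {
    // Interpretation of tenet 1; the exact cut-offs are an assumption.
    static final double MAX_SEARCH_LATENCY_DELTA_MS = 10.0;  // "single digit millisecond"
    static final double MAX_INDEXING_THROUGHPUT_DROP = 0.05; // 5%

    /** True if the contender's latency regression stays below 10 ms. */
    static boolean searchLatencyOk(double baselineMs, double contenderMs) {
        return (contenderMs - baselineMs) < MAX_SEARCH_LATENCY_DELTA_MS;
    }

    /** True if throughput dropped by no more than 5% relative to the baseline. */
    static boolean indexingThroughputOk(double baselineOps, double contenderOps) {
        double drop = (baselineOps - contenderOps) / baselineOps;
        return drop <= MAX_INDEXING_THROUGHPUT_DROP;
    }

    public static void main(String[] args) {
        // Values copied from the benchmark rows above (first value, second value).
        System.out.println(searchLatencyOk(6.30176, 7.2419));  // default, 50th percentile latency
        System.out.println(searchLatencyOk(15.9162, 28.4331)); // date_histogram_agg, 99th percentile latency
    }
}
```

Under these assumptions the first row passes (a delta of about 0.94 ms) while the second does not (about 12.5 ms), which is the kind of tail-latency outlier the tenet would flag.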

High level items pending in this PR

  1. Bring in optimizations to reduce search latencies (which are currently above the expected baselines).
  2. Design ideas for how customer-initiated master key rotation can avoid reindexing.
  3. Check whether the crypto-fs can be deleted if the customer has lost their master key (this is an important security requirement). However, there should be safeguards in place to verify that the caller is allowed to delete data for the lost key.
  4. Snapshots in a remote store (S3, Azure Blob, etc.) for an encrypted index must also be encrypted with the same CMK used to encrypt the index at rest in OpenSearch. This is a must-have requirement. Without it, we lose encryption at rest for the remote store, since any user could read all the data from a snapshot they are not supposed to access.
  5. Ability to recover from snapshots (should work by default)
  6. When using remote-backed storage, encrypted indices must also be encrypted at rest in the remote storage.
  7. Support for FIPS (asonje@ confirmed it's already supported).
  8. Backoff when KMS is down
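
One common way to make master key rotation possible without reindexing (item 2) is envelope encryption: index files are encrypted with a data key, and only that data key is wrapped by the master key. The sketch below is illustrative only and is not taken from this PR; the class and method names are hypothetical, and a local AES key stands in for a KMS-held master key. It uses the JDK's standard `AESWrap` cipher.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.GeneralSecurityException;
import java.util.Arrays;

/** Envelope-encryption sketch: rotating the master key re-wraps only the data key. */
public class KeyRotationSketch {
    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    /** Encrypts the data key under the master key using AES Key Wrap. */
    static byte[] wrap(SecretKey masterKey, SecretKey dataKey) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.WRAP_MODE, masterKey);
        return c.wrap(dataKey);
    }

    /** Recovers the data key from its wrapped form. */
    static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    /** Rotation touches only the small wrapped key; files encrypted with the data key stay as they are. */
    static byte[] rotate(SecretKey oldMaster, SecretKey newMaster, byte[] wrapped)
            throws GeneralSecurityException {
        return wrap(newMaster, unwrap(oldMaster, wrapped));
    }

    /** Returns true if the data key recovered after rotation equals the original data key. */
    static boolean rotationPreservesDataKey() {
        try {
            SecretKey oldMaster = newAesKey();
            SecretKey newMaster = newAesKey();
            SecretKey dataKey = newAesKey();
            byte[] wrapped = wrap(oldMaster, dataKey);
            byte[] rewrapped = rotate(oldMaster, newMaster, wrapped);
            return Arrays.equals(dataKey.getEncoded(), unwrap(newMaster, rewrapped).getEncoded());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rotationPreservesDataKey());
    }
}
```

In a real deployment the wrap and unwrap steps would presumably be calls to the KMS rather than local cipher operations, which is also where the backoff behaviour in item 8 would apply.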

Todo

(@kumargu)
Write a doc describing how fscrypt could be useful here, along with its limitations. At a high level, I think fscrypt can meet the perf requirement and give us the ability to delete the crypto-fs when the key is lost.

@kumargu
Contributor

kumargu commented Jan 6, 2025

@asonje checking if you have bandwidth to resume working on this. thanks.

@asonje
Author

asonje commented Jan 7, 2025

@kumargu Yes, I do. I am currently working on some optimizations and will update you once they are ready.

Labels
discuss Issues intended to help drive brainstorming and decision making feature New feature or request RFC Issues requesting major changes security Anything security related