Fix internal cluster and single node security tests #121466

Merged
slobodanadamovic merged 28 commits into elastic:main from slobodanadamovic:sa-fix-internal-cluster-tests
Feb 16, 2025

@slobodanadamovic
Contributor

@slobodanadamovic slobodanadamovic commented Jan 31, 2025

This PR fixes SecuritySingleNodeTestCase and ProfileIntegTests tests.

  • The security single node test failures are fixed by ensuring that every test starts with the .security index created and available, so that each test runs against a consistent state. With the changes introduced in #120323, only the first test would execute while the .security index was being created asynchronously. Subsequent tests would execute without security index creation, because the whole cluster is wiped after each test. This caused flakiness only in the first test, because there was no mechanism in place to ensure that the .security index is active before test execution.
  • The profile integration test failures are fixed by introducing an anonymous role that doesn't have application privileges. Application privileges are resolved from the .security index and assigned to all users, including the es_test_root user, which is used during cluster wiping. Due to the asynchronous nature of cluster setup and .security index creation, this now causes flakiness. The main problem is that wiping is done asynchronously and uses es_test_root, which had the anonymous rac_role assigned; that role depends on the .security index being available for search in order to resolve application privileges. The application privilege resolution is done in buildRoleFromDescriptors, which currently does not wait for security index availability (this can be improved, but it still wouldn't fix the internal cluster tests). This wasn't a problem before only because we simply return empty results when the .security index does not exist. There is some complexity in making internal clusters wait for the availability of security shards before the test, so I think this solution is acceptable, given that these tests do not require an anonymous role with application privileges.
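
To make the race described in the two points above easier to follow, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions and not the code changed in this PR: the class `SecurityIndexRaceSketch`, the `IndexState` enum, and the methods `resolveApplicationPrivileges`, `createIndexAsync` and `awaitCondition` are all hypothetical names, and no Elasticsearch API is used. It models a lookup that returns empty results while the index is missing, fails while the index exists but is not yet searchable, and succeeds once a setup step has waited for availability.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Toy model only: none of these types or methods are Elasticsearch APIs.
class SecurityIndexRaceSketch {

    // Hypothetical lifecycle of the .security index as seen by a test.
    enum IndexState { MISSING, CREATING, AVAILABLE }

    static final AtomicReference<IndexState> state = new AtomicReference<>(IndexState.MISSING);

    // A missing index yields empty results, while an index that exists
    // but is not yet searchable makes the lookup fail.
    static List<String> resolveApplicationPrivileges() {
        switch (state.get()) {
            case MISSING:
                return List.of();
            case CREATING:
                throw new IllegalStateException("security index exists but is not available for search");
            default:
                return List.of("app-privilege");
        }
    }

    // Simulates asynchronous index creation started during cluster setup.
    static CompletableFuture<Void> createIndexAsync(long delayMillis) {
        state.set(IndexState.CREATING);
        return CompletableFuture.runAsync(
            () -> state.set(IndexState.AVAILABLE),
            CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS)
        );
    }

    // Polls until the condition holds or the timeout elapses.
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        // Before creation: the lookup quietly returns nothing.
        System.out.println(resolveApplicationPrivileges());

        createIndexAsync(100);

        // Per-test setup step: block until the index is usable.
        boolean ready = awaitCondition(() -> state.get() == IndexState.AVAILABLE, 5_000);
        System.out.println(ready + " " + resolveApplicationPrivileges());
    }
}
```

In this model, calling `resolveApplicationPrivileges()` right after `createIndexAsync(...)` without the `awaitCondition(...)` step would throw, which corresponds to the flaky window described above. Waiting before each test closes that window for the single node tests, and a role that never triggers the lookup sidesteps it for the profile tests.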

Resolves #121022
Resolves #121096
Resolves #121101
Resolves #120988
Resolves #121108
Resolves #120983
Resolves #120987
Resolves #121179
Resolves #121183
Resolves #121346
Resolves #121151
Resolves #120985
Resolves #121039
Resolves #121483
Resolves #121116
Resolves #121258
Resolves #121486

@slobodanadamovic slobodanadamovic changed the title Fix internal cluster tests Fix internal cluster and single node security tests Feb 4, 2025
@slobodanadamovic slobodanadamovic added the >test, :Security/Security, Team:Security, auto-backport, v9.0.0, v8.18.1 and v9.0.1 labels Feb 4, 2025
@slobodanadamovic slobodanadamovic requested a review from a team February 4, 2025 09:29
@slobodanadamovic slobodanadamovic merged commit 369c641 into elastic:main Feb 16, 2025
22 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
9.0 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 121466

slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)

# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)

# Conflicts:
#	muted-tests.yml
@slobodanadamovic
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x
9.0
8.18

slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)

# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
elasticsearchmachine pushed a commit that referenced this pull request Feb 17, 2025
(cherry picked from commit 369c641)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Feb 17, 2025
…122732)

* Fix internal cluster and single node security tests (#121466)

(cherry picked from commit 369c641)

# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java

* fix compilation error
elasticsearchmachine pushed a commit that referenced this pull request Feb 17, 2025
…122734)

* Fix internal cluster and single node security tests (#121466)

(cherry picked from commit 369c641)

# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java

* fix compilation error
ankit--sethi added a commit to ankit--sethi/elasticsearch that referenced this pull request Jul 8, 2025
ankit--sethi added a commit to ankit--sethi/elasticsearch that referenced this pull request Jul 11, 2025
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 14, 2025
…king

* upstream/main: (33 commits)
  Allow both WithEntitlementsOnTestCode and EntitledTestPackages together (elastic#130826)
  Move streams status actions to cluster:monitor group (elastic#131015)
  Update JDK base image for OIDC fixture (elastic#131176)
  Mute org.elasticsearch.xpack.esql.ccq.MultiClustersIT testLookupJoinAliases elastic#131166
  Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDiskSpaceTests testEnqueuedMergeTasksAreUnblockedWhenEstimatedMergeSizeChanges elastic#131165
  Mute org.elasticsearch.xpack.esql.ccq.MultiClustersIT testNotLikeListKeyword elastic#131155
  Mute org.elasticsearch.xpack.esql.qa.multi_node.GenerativeIT test elastic#131154
  Check file entitlements on the Lucene FilterFileSystem in tests (elastic#130825)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {lookup-join.MvJoinKeyOnFromAfterStats ASYNC} elastic#131148
  Move FrequencyCappedAction to common package (elastic#131060)
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672
  Remove nesting from multi allocation decision (elastic#130844)
  Disable async search rest tests in release builds (elastic#131132)
  Fix testStopQueryLocal (elastic#131130)
  Fixes based on resharding disruption tests (elastic#130870)
  Remove inactive logger (elastic#131121)
  Add wait for remote start for the test (elastic#131124)
  Add existing shards allocator settings to failure store allowed list. (elastic#131056)
  Don't allow field caps to use semantic queries as index filters (elastic#131111)
  issue should be already fixed by elastic#121466 (elastic#130860)
  ...
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 17, 2025