Merged
Changes from all commits (18 commits)
d4d6912
HDDS-6084. Handle upgrades to version supporting S3 multi-tenancy
smengcl Jan 24, 2022
d6effa7
Weave aspect; Add layout feature limit to tenant read-only requests, …
smengcl Jan 24, 2022
7fcc689
Add annotation to the rest of the tenant write requests.
smengcl Jan 25, 2022
10c3bac
ElementType.CONSTRUCTOR doesn't work; cleanup.
smengcl Jan 25, 2022
32dcfe2
checkLayoutFeature should work with all methods with OM as the first …
smengcl Jan 25, 2022
18ab2aa
Add test; checkstyle.
smengcl Jan 25, 2022
977f1f9
Address test failures.
smengcl Jan 26, 2022
59859ea
Add TODO, auto-generate weaver XML in the future.
smengcl Jan 28, 2022
074dada
Integration test: trigger finalization from the client rather than di…
smengcl Feb 2, 2022
c8466ad
Integration check: preFinalizationCheck / cleanup.
smengcl Feb 2, 2022
b4010ac
Cleanup addressed comment.
smengcl Feb 2, 2022
57af2cf
More restrictive checking of JoinPoint in checkLayoutFeature.
smengcl Feb 2, 2022
62269f1
Acceptance test: add aws s3api testing before and after upgrade final…
smengcl Feb 4, 2022
3ce0326
QoL: Fix comment syntax in docker-config leading to annoying `WARNING…
smengcl Feb 4, 2022
e3baa68
QoL: Fix docker deprecate warning: `--no-ansi` -> `--ansi never`
smengcl Feb 4, 2022
993f1c2
Integration test: Move pre-finalization checks and finalization to cl…
smengcl Feb 4, 2022
6c37edb
Move all create operations to generate; Remove HDDS-6261 repro.
smengcl Feb 8, 2022
47c104c
Create custom key instead of relying on /opt/hadoop/ files.
smengcl Feb 8, 2022
12 changes: 6 additions & 6 deletions hadoop-ozone/dist/src/main/compose/testlib.sh
@@ -147,8 +147,8 @@ start_docker_env(){
create_results_dir
export OZONE_SAFEMODE_MIN_DATANODES="${datanode_count}"

docker-compose --no-ansi down
if ! { docker-compose --no-ansi up -d --scale datanode="${datanode_count}" \
docker-compose --ansi never down
if ! { docker-compose --ansi never up -d --scale datanode="${datanode_count}" \
&& wait_for_safemode_exit \
&& wait_for_om_leader ; }; then
[[ -n "$OUTPUT_NAME" ]] || OUTPUT_NAME="$COMPOSE_ENV_NAME"
@@ -235,7 +235,7 @@ execute_command_in_container(){
## @param List of container names, eg datanode_1 datanode_2
stop_containers() {
set -e
docker-compose --no-ansi stop $@
docker-compose --ansi never stop $@
set +e
}

@@ -244,7 +244,7 @@ stop_containers() {
## @param List of container names, eg datanode_1 datanode_2
start_containers() {
set -e
docker-compose --no-ansi start $@
docker-compose --ansi never start $@
set +e
}

@@ -280,9 +280,9 @@ wait_for_port(){

## @description Stops a docker-compose based test environment (with saving the logs)
stop_docker_env(){
docker-compose --no-ansi logs > "$RESULT_DIR/docker-$OUTPUT_NAME.log"
docker-compose --ansi never logs > "$RESULT_DIR/docker-$OUTPUT_NAME.log"
if [ "${KEEP_RUNNING:-false}" = false ]; then
docker-compose --no-ansi down
docker-compose --ansi never down
fi
}
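The `--no-ansi` to `--ansi never` switch above follows the deprecation warning that newer docker-compose releases emit for `--no-ansi`. As a hedged sketch (not part of this PR), a small wrapper could keep the scripts compatible with older Compose versions that only understand the deprecated flag:

# Hypothetical compatibility shim, not part of this PR: use `--ansi never` where
# supported (Compose 1.28+) and fall back to the deprecated `--no-ansi` otherwise.
docker_compose_plain() {
  if docker-compose --help 2>&1 | grep -q -- '--ansi'; then
    docker-compose --ansi never "$@"
  else
    docker-compose --no-ansi "$@"
  fi
}

# Example usage:
#   docker_compose_plain logs > "$RESULT_DIR/docker-$OUTPUT_NAME.log"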

@@ -24,8 +24,8 @@ OZONE-SITE.XML_ozone.om.address.omservice.om1=om1
OZONE-SITE.XML_ozone.om.address.omservice.om2=om2
OZONE-SITE.XML_ozone.om.address.omservice.om3=om3
OZONE-SITE.XML_ozone.om.ratis.enable=true
// setting ozone.scm.ratis.enable to false for now, as scm ha upgrade is
// not supported yet. This is supposed to work without SCM HA configuration
# setting ozone.scm.ratis.enable to false for now, as scm ha upgrade is
# not supported yet. This is supposed to work without SCM HA configuration
OZONE-SITE.XML_ozone.scm.ratis.enable=false
OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s
OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1
@@ -64,8 +64,7 @@ with_old_version_downgraded() {

with_new_version_finalized() {
_check_hdds_mlvs 2
# OM currently only has one layout version.
_check_om_mlvs 0
Comment on lines -67 to -68
@adoroszlai (Contributor), May 26, 2022:
@smengcl @errose28 maybe I'm missing something, but wouldn't this fail if we actually ran the upgrade path test?

https://github.com/adoroszlai/hadoop-ozone/runs/6613025379#step:5:738

I think the actual OM MLV for 1.2.0 and 1.2.1 is still 0, regardless of this change.

@smengcl (Contributor, Author), May 26, 2022:
You are right.

Though in the latest feature branch it is still 0 (probably fixed while I was resolving conflicts during the master merge three days ago, or thereabouts):

https://github.com/smengcl/hadoop-ozone/blob/HDDS-4944/hadoop-ozone/dist/src/main/compose/upgrade/upgrades/non-rolling-upgrade/1.1.0-1.2.0/callback.sh#L47-L49

Not sure why I bumped it when doing this PR.

@smengcl (Contributor, Author):
Anyway, the OM version bump in with_new_version_finalized is intended, since after finalization it should be bumped to the latest version. But in pre_finalized it should still be zero.

@adoroszlai (Contributor):
But this callback applies when the new version is 1.2.0.

@smengcl (Contributor, Author):
I was going to say that when I merged master into the feature branch just three days ago, I had to bump _check_om_mlvs further to 3 to pass the acceptance (misc) check:

https://github.com/apache/ozone/blame/HDDS-4944/hadoop-ozone/dist/src/main/compose/upgrade/upgrades/non-rolling-upgrade/1.2.1-1.3.0/callback.sh#L74

But then I realized this is the one from 1.1.0 to 1.2.0.

Yup, so in this case it should still be zero (in 1.2.0).

@smengcl (Contributor, Author), May 26, 2022:
This is because the upgrade test.sh script was actually testing the latest compiled version rather than the actual 1.2.0 at the time, and it didn't have 1.2.1-1.3.0 yet. And yes, it should be reverted back to 0 now (if I haven't done that already).

Thanks for catching this, @adoroszlai!

@smengcl (Contributor, Author), May 26, 2022:
Addressed this in commit: 00a5007 (PR #3450)

_check_om_mlvs 1

validate old1
validate new1
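To summarize the review discussion above as a sketch (an assumption about how the callback should look after the follow-up fix, not the merged code): for the 1.1.0 to 1.2.0 path, finalization bumps the HDDS MLV to 2 (the value asserted in this diff) while the OM MLV stays at 0, because 1.2.0 and 1.2.1 only ship OM layout version 0. The helper names come from the upgrade test library used above; the pre-finalized values are assumptions based on 1.1.0 shipping layout version 0 for both components.

# Sketch only, not the merged callback.sh.
with_new_version_pre_finalized() {
  _check_hdds_mlvs 0   # HDDS features are not finalized yet (assumed 1.1.0 value)
  _check_om_mlvs 0     # OM layout version in 1.1.0 is 0
}

with_new_version_finalized() {
  _check_hdds_mlvs 2   # HDDS MLV is bumped by finalization (asserted in this diff)
  _check_om_mlvs 0     # OM MLV remains 0 for 1.2.0/1.2.1, so the bump to 1 was reverted
}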
@@ -19,7 +19,7 @@ Resource ../commonlib.robot
Test Timeout 5 minutes
Test Setup Run Keyword if '${SECURITY_ENABLED}' == 'true' Kinit test user testuser testuser.keytab

** Test Cases ***
*** Test Cases ***
Finalize SCM
${result} = Execute ozone admin scm finalizeupgrade
#Wait Until Keyword Succeeds 3min 10sec Should contain ${result} OM Preparation successful!
28 changes: 27 additions & 1 deletion hadoop-ozone/dist/src/main/smoketest/upgrade/generate.robot
@@ -18,6 +18,7 @@ Documentation Generate data
Library OperatingSystem
Library BuiltIn
Resource ../commonlib.robot
Resource ../s3/commonawslib.robot
Test Timeout 5 minutes

*** Variables ***
@@ -29,5 +30,30 @@ Create a volume, bucket and key
Should not contain ${output} Failed
${output} = Execute ozone sh bucket create /${PREFIX}-volume/${PREFIX}-bucket
Should not contain ${output} Failed
${output} = Execute ozone sh key put /${PREFIX}-volume/${PREFIX}-bucket/${PREFIX}-key /opt/hadoop/NOTICE.txt
Execute and checkrc echo "${PREFIX}: key created using Ozone Shell" > /tmp/sourcekey 0
${output} = Execute ozone sh key put /${PREFIX}-volume/${PREFIX}-bucket/${PREFIX}-key /tmp/sourcekey
Should not contain ${output} Failed
Execute and checkrc rm /tmp/sourcekey 0

Create a bucket and key in volume s3v
${output} = Execute ozone sh bucket create /s3v/${PREFIX}-bucket
Should not contain ${output} Failed
Execute and checkrc echo "${PREFIX}: another key created using Ozone Shell" > /tmp/sourcekey 0
${output} = Execute ozone sh key put /s3v/${PREFIX}-bucket/key1-shell /tmp/sourcekey
Should not contain ${output} Failed
Execute and checkrc rm /tmp/sourcekey 0

Setup credentials for S3
# TODO: Run "Setup secure v4 headers" instead when security is enabled
Run Keyword Setup dummy credentials for S3

Try to create a bucket using S3 API
# Note: S3 API does not return error if the bucket already exists
${output} = Create bucket with name ${PREFIX}-bucket
Should Be Equal ${output} ${None}

Create key using S3 API
Execute and checkrc echo "${PREFIX}: key created using S3 API" > /tmp/sourcekey 0
${output} = Execute AWSS3APICli and checkrc put-object --bucket ${PREFIX}-bucket --key key2-s3api --body /tmp/sourcekey 0
Should not contain ${output} error
Execute and checkrc rm /tmp/sourcekey 0
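For readers unfamiliar with the `commonawslib.robot` keywords, the S3 API steps above boil down to roughly the following aws CLI calls. This is a sketch only; the endpoint URL, region, dummy credentials, and the `old1` prefix are assumptions about the compose test environment, not values taken from this diff.

# Sketch of the aws CLI calls behind the Robot keywords above (assumed endpoint,
# region, credentials, and prefix; adjust to the actual compose environment).
export AWS_ACCESS_KEY_ID=dummyaccess
export AWS_SECRET_ACCESS_KEY=dummysecret
export AWS_DEFAULT_REGION=us-west-1
ENDPOINT=http://s3g:9878

echo "old1: key created using S3 API" > /tmp/sourcekey
aws s3api --endpoint-url "$ENDPOINT" create-bucket --bucket old1-bucket
aws s3api --endpoint-url "$ENDPOINT" put-object \
    --bucket old1-bucket --key key2-s3api --body /tmp/sourcekey
rm /tmp/sourcekey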
2 changes: 1 addition & 1 deletion hadoop-ozone/dist/src/main/smoketest/upgrade/prepare.robot
@@ -19,7 +19,7 @@ Resource ../commonlib.robot
Test Timeout 5 minutes
Test Setup Run Keyword if '${SECURITY_ENABLED}' == 'true' Kinit test user testuser testuser.keytab

** Test Cases ***
*** Test Cases ***
Prepare Ozone Manager
${result} = Execute ozone admin om prepare -id %{OM_SERVICE_ID}
Wait Until Keyword Succeeds 3min 10sec Should contain ${result} OM Preparation successful!
22 changes: 22 additions & 0 deletions hadoop-ozone/dist/src/main/smoketest/upgrade/validate.robot
@@ -18,6 +18,7 @@ Documentation Smoketest ozone cluster startup
Library OperatingSystem
Library BuiltIn
Resource ../commonlib.robot
Resource ../s3/commonawslib.robot
Test Timeout 5 minutes

*** Variables ***
@@ -28,3 +29,24 @@ Read data from previously created key
${random} = Generate Random String 5 [NUMBERS]
${output} = Execute ozone sh key get /${PREFIX}-volume/${PREFIX}-bucket/${PREFIX}-key /tmp/key-${random}
Should not contain ${output} Failed
${output} = Execute and checkrc cat /tmp/key-${random} 0
Should contain ${output} ${PREFIX}: key created using Ozone Shell
Execute and checkrc rm /tmp/key-${random} 0

Setup credentials for S3
# TODO: Run "Setup secure v4 headers" instead when security is enabled
Run Keyword Setup dummy credentials for S3

Read key created with Ozone Shell using S3 API
${output} = Execute AWSS3APICli and checkrc get-object --bucket ${PREFIX}-bucket --key key1-shell /tmp/get-result 0
Should contain ${output} "ContentLength"
${output} = Execute and checkrc cat /tmp/get-result 0
Should contain ${output} ${PREFIX}: another key created using Ozone Shell
Execute and checkrc rm /tmp/get-result 0

Read key created with S3 API using S3 API
${output} = Execute AWSS3APICli and checkrc get-object --bucket ${PREFIX}-bucket --key key2-s3api /tmp/get-result 0
Should contain ${output} "ContentLength"
${output} = Execute and checkrc cat /tmp/get-result 0
Should contain ${output} ${PREFIX}: key created using S3 API
Execute and checkrc rm /tmp/get-result 0
@@ -28,7 +28,6 @@
import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_DATANODE_PIPELINE_LIMIT;
import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY;
import static org.apache.hadoop.hdds.scm.pipeline.Pipeline.PipelineState.OPEN;
import static org.apache.hadoop.hdds.upgrade.HDDSLayoutFeature.INITIAL_VERSION;
import static org.apache.hadoop.ozone.upgrade.InjectedUpgradeFinalizationExecutor.UpgradeTestInjectionPoints.AFTER_COMPLETE_FINALIZATION;
import static org.apache.hadoop.ozone.upgrade.InjectedUpgradeFinalizationExecutor.UpgradeTestInjectionPoints.AFTER_POST_FINALIZE_UPGRADE;
import static org.apache.hadoop.ozone.upgrade.InjectedUpgradeFinalizationExecutor.UpgradeTestInjectionPoints.AFTER_PRE_FINALIZE_UPGRADE;
@@ -78,6 +77,7 @@
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;
import org.apache.hadoop.ozone.container.common.interfaces.Container;
import org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine;
import org.apache.hadoop.ozone.om.upgrade.OMLayoutFeature;
import org.apache.hadoop.ozone.upgrade.BasicUpgradeFinalizer;
import org.apache.hadoop.ozone.upgrade.InjectedUpgradeFinalizationExecutor;
import org.apache.hadoop.ozone.upgrade.InjectedUpgradeFinalizationExecutor.UpgradeTestInjectionPoints;
@@ -155,8 +155,9 @@ public static void initClass() {
.setTotalPipelineNumLimit(NUM_DATA_NODES + 1)
.setHbInterval(500)
.setHbProcessorInterval(500)
.setScmLayoutVersion(INITIAL_VERSION.layoutVersion())
.setDnLayoutVersion(INITIAL_VERSION.layoutVersion());
.setOmLayoutVersion(OMLayoutFeature.INITIAL_VERSION.layoutVersion())
.setScmLayoutVersion(HDDSLayoutFeature.INITIAL_VERSION.layoutVersion())
.setDnLayoutVersion(HDDSLayoutFeature.INITIAL_VERSION.layoutVersion());

// Setting the provider to a max of 100 clusters. Some of the tests here
// use multiple clusters, so its hard to know exactly how many will be
@@ -219,7 +220,7 @@ private void createKey() throws IOException {
* Helper function to test Pre-Upgrade conditions on the SCM
*/
private void testPreUpgradeConditionsSCM() {
Assert.assertEquals(INITIAL_VERSION.layoutVersion(),
Assert.assertEquals(HDDSLayoutFeature.INITIAL_VERSION.layoutVersion(),
scmVersionManager.getMetadataLayoutVersion());
for (ContainerInfo ci : scmContainerManager.getContainers()) {
Assert.assertEquals(HddsProtos.LifeCycleState.OPEN, ci.getState());
@@ -26,13 +26,25 @@
import org.apache.hadoop.ozone.client.rpc.RpcClient;
import org.apache.hadoop.ozone.om.OMMultiTenantManagerImpl;
import org.apache.hadoop.ozone.om.exceptions.OMException;
import org.apache.hadoop.ozone.om.helpers.S3SecretValue;
import org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol;
import org.apache.hadoop.ozone.om.protocol.S3Auth;
import org.apache.hadoop.ozone.om.upgrade.OMLayoutFeature;
import org.apache.hadoop.ozone.upgrade.UpgradeFinalizer;
import org.apache.ozone.test.GenericTestUtils;
import org.apache.ozone.test.LambdaTestUtils;
import org.apache.ozone.test.LambdaTestUtils.VoidCallable;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

import java.io.IOException;
import java.util.UUID;
import java.util.concurrent.TimeoutException;

import static org.apache.hadoop.ozone.admin.scm.FinalizeUpgradeCommandUtil.isDone;
import static org.apache.hadoop.ozone.admin.scm.FinalizeUpgradeCommandUtil.isStarting;

/**
* Tests that S3 requests for a tenant are directed to that tenant's volume,
@@ -43,22 +55,107 @@ public class TestMultiTenantVolume {
private static MiniOzoneCluster cluster;
private static String s3VolumeName;

private static final String TENANT_NAME = "tenant";
private static final String USER_PRINCIPAL = "username";
private static final String BUCKET_NAME = "bucket";
private static final String ACCESS_ID = UUID.randomUUID().toString();

@BeforeClass
public static void initClusterProvider() throws Exception {
OzoneConfiguration conf = new OzoneConfiguration();
conf.setBoolean(
OMMultiTenantManagerImpl.OZONE_OM_TENANT_DEV_SKIP_RANGER, true);
MiniOzoneCluster.Builder builder = MiniOzoneCluster.newBuilder(conf)
.withoutDatanodes();
.withoutDatanodes()
.setOmLayoutVersion(OMLayoutFeature.INITIAL_VERSION.layoutVersion());
cluster = builder.build();
s3VolumeName = HddsClientUtils.getDefaultS3VolumeName(conf);

preFinalizationChecks(getStoreForAccessID(ACCESS_ID));
finalizeOMUpgrade();
}

@AfterClass
public static void shutdownClusterProvider() {
cluster.shutdown();
}

private static void expectFailurePreFinalization(VoidCallable eval)
throws Exception {
LambdaTestUtils.intercept(OMException.class,
"cannot be invoked before finalization", eval);
}

/**
* Perform sanity checks before triggering upgrade finalization.
*/
private static void preFinalizationChecks(ObjectStore store)
throws Exception {

// None of the tenant APIs is usable before the upgrade finalization step
expectFailurePreFinalization(
store::listTenant);
expectFailurePreFinalization(() ->
store.listUsersInTenant(TENANT_NAME, ""));
expectFailurePreFinalization(() ->
store.tenantGetUserInfo(USER_PRINCIPAL));
expectFailurePreFinalization(() ->
store.createTenant(TENANT_NAME));
expectFailurePreFinalization(() ->
store.tenantAssignUserAccessId(USER_PRINCIPAL, TENANT_NAME, ACCESS_ID));
expectFailurePreFinalization(() ->
store.tenantAssignAdmin(USER_PRINCIPAL, TENANT_NAME, true));
expectFailurePreFinalization(() ->
store.tenantRevokeAdmin(ACCESS_ID, TENANT_NAME));
expectFailurePreFinalization(() ->
store.tenantRevokeUserAccessId(ACCESS_ID));
expectFailurePreFinalization(() ->
store.deleteTenant(TENANT_NAME));

// S3 get/set/revoke secret APIs still work before finalization
final String accessId = "testUser1accessId1";
S3SecretValue s3SecretValue = store.getS3Secret(accessId);
Assert.assertEquals(accessId, s3SecretValue.getAwsAccessKey());
final String setSecret = "testsecret";
s3SecretValue = store.setS3Secret(accessId, setSecret);
Assert.assertEquals(accessId, s3SecretValue.getAwsAccessKey());
Assert.assertEquals(setSecret, s3SecretValue.getAwsSecret());
store.revokeS3Secret(accessId);
}

/**
* Trigger OM upgrade finalization from the client and block until completion
* (status FINALIZATION_DONE).
*/
private static void finalizeOMUpgrade()
throws IOException, InterruptedException, TimeoutException {

// Trigger OM upgrade finalization. Ref: FinalizeUpgradeSubCommand#call
final OzoneManagerProtocol client = cluster.getRpcClient().getObjectStore()
.getClientProxy().getOzoneManagerClient();
final String upgradeClientID = "Test-Upgrade-Client-" + UUID.randomUUID();
UpgradeFinalizer.StatusAndMessages finalizationResponse =
client.finalizeUpgrade(upgradeClientID);

// The status should transition as soon as the client call above returns
Assert.assertTrue(isStarting(finalizationResponse.status()));

// Wait for the finalization to be marked as done.
// 10s timeout should be plenty.
GenericTestUtils.waitFor(() -> {
try {
final UpgradeFinalizer.StatusAndMessages progress =
client.queryUpgradeFinalizationProgress(
upgradeClientID, false, false);
return isDone(progress.status());
} catch (IOException e) {
Assert.fail("Unexpected exception while waiting for "
+ "the OM upgrade to finalize: " + e.getMessage());
}
return false;
}, 500, 10000);
}

@Test
public void testDefaultS3Volume() throws Exception {
final String bucketName = "bucket";
@@ -79,31 +176,31 @@ public void testDefaultS3Volume() throws Exception {

@Test
public void testS3TenantVolume() throws Exception {
final String tenant = "tenant";
final String principal = "username";
final String bucketName = "bucket";
final String accessID = UUID.randomUUID().toString();

ObjectStore store = getStoreForAccessID(accessID);
store.createTenant(tenant);
store.tenantAssignUserAccessId(principal, tenant, accessID);
ObjectStore store = getStoreForAccessID(ACCESS_ID);

store.createTenant(TENANT_NAME);
store.tenantAssignUserAccessId(USER_PRINCIPAL, TENANT_NAME, ACCESS_ID);

// S3 volume pointed to by the store should be for the tenant.
Assert.assertEquals(tenant, store.getS3Volume().getName());
Assert.assertEquals(TENANT_NAME, store.getS3Volume().getName());

// Create bucket in the tenant volume.
store.createS3Bucket(bucketName);
OzoneBucket bucket = store.getS3Bucket(bucketName);
Assert.assertEquals(tenant, bucket.getVolumeName());
store.createS3Bucket(BUCKET_NAME);
OzoneBucket bucket = store.getS3Bucket(BUCKET_NAME);
Assert.assertEquals(TENANT_NAME, bucket.getVolumeName());

// A different user should not see bucket, since they will be directed to
// the s3 volume.
ObjectStore store2 = getStoreForAccessID(UUID.randomUUID().toString());
assertS3BucketNotFound(store2, bucketName);
assertS3BucketNotFound(store2, BUCKET_NAME);

// Delete bucket.
store.deleteS3Bucket(bucketName);
assertS3BucketNotFound(store, bucketName);
store.deleteS3Bucket(BUCKET_NAME);
assertS3BucketNotFound(store, BUCKET_NAME);

store.tenantRevokeUserAccessId(ACCESS_ID);
store.deleteTenant(TENANT_NAME);
}

/**
Expand All @@ -112,7 +209,7 @@ public void testS3TenantVolume() throws Exception {
* by the ObjectStore.
*/
private void assertS3BucketNotFound(ObjectStore store, String bucketName)
throws Exception {
throws IOException {
try {
store.getS3Bucket(bucketName);
} catch(OMException ex) {
@@ -131,7 +228,8 @@ private void assertS3BucketNotFound(ObjectStore store, String bucketName)
}
}

private ObjectStore getStoreForAccessID(String accessID) throws Exception {
private static ObjectStore getStoreForAccessID(String accessID)
throws IOException {
// Cluster provider will modify our provided configuration. We must use
// this version to build the client.
OzoneConfiguration conf = cluster.getOzoneManager().getConfiguration();