
Conversation

@lennyphan (Member) commented Mar 27, 2021:


NOTE: This pull request is superseded by #2344


In this PR:

  1. For domain home source type MII (Model in Image), the changes that were previously applied as situational configuration overrides are now generated using a WDT model filter. The model filter implementation follows the same logic that introspectDomain.py's SitConfigGenerator uses to generate situational configuration. Domain home source types Domain-on-PV and Domain-in-Image continue to use situational configuration overrides.

  2. Added the ability to run Python-based unit tests for the model filter implementation.

Passed all MII integration tests except for a single failure, oracle.weblogic.kubernetes.ItMiiDynamicUpdate.testMiiRemoveTarget, which matches the Integration Tests Nightly Results as of 03/26.

addEnvVarIfTrue(mockWls(), vars, "MOCK_WLS");

if (getDomainHomeSourceType() == DomainSourceType.FromModel) {
  addEnvVar(vars, ServerEnvVars.FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR, "false");
}
Member:
This is going to change our pod hash (weblogic.sha256), which means that any MII pod that is running when the operator is upgraded will be rolled. Can this instead be set in the pod start script (startServer.sh)?

Member Author:

Just to clarify the scenario you're describing, for my understanding: this is when upgrading the operator from a previous version, where an MII domain used sit config overrides, to this version, which uses the model filter? The one place that uses this environment variable is startNodeManager.sh (as a server startup argument), and I couldn't determine how we know there that the domain source type is 'FromModel' (MII). startServer.sh does:

if [ -f /weblogic-operator/introspector/domainzip.secure ]; then
  ...
fi

I think this should work. Let me make the change and verify.
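
A minimal sketch of that approach in startServer.sh (assuming the domainzip.secure marker reliably indicates an MII domain; the export details are illustrative, not the actual script):

# MII (FromModel) domains ship the model in domainzip.secure, so the marker
# file doubles as a source-type check. No sit-cfg is generated for MII, so
# make sure the server won't fail boot looking for sit-cfg files.
if [ -f /weblogic-operator/introspector/domainzip.secure ]; then
  export FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR="false"
fi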

Member:
Right: a customer is using operator 3.1.4 and has an MII domain. The admin server and some number of managed servers are running. The customer upgrades to 3.2.0, which contains your work. I would expect that pods do not roll simply because the operator was upgraded.

Thinking more deeply about this... the change to add the extra environment variable would have caused existing pods to roll, and we have a workaround in that we can set it in startServer.sh instead. However, we likely have an earlier issue: your change to not generate config overrides means that the generated domain home will be different. That change would also cause existing pods to roll, correct? If so, we should talk together with @jshum2479 and see if we can come up with a way to roll this out so that existing pods are not rolled or restarted, but newly created pods come up with the new pattern.

Member:

We can add logic in the start script to reset the environment variable when DOMAIN_SOURCE_TYPE is FromModel.
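
A sketch of that alternative, assuming the operator makes DOMAIN_SOURCE_TYPE visible to the start script (illustrative only):

# Reset the flag for Model in Image domains; other source types keep the
# existing behavior.
if [ "${DOMAIN_SOURCE_TYPE:-}" = "FromModel" ]; then
  FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR="false"
fi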


It doesn't look like it'd hurt to leave it alone. It'd become a NO-OP.

Member:

FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR exists so that the core config override logic fails server startup when an override file is bad. The start server script already has logic to set it to true when the environment variable is not set; the variable is there in case the user chooses to override it.

I don't see a reason to set it for MII: we are no longer using overrides, and it would cause a roll because the newly introduced environment variable changes the pod spec hash.

@tbarnes-us commented Mar 29, 2021:

The overall point is that there's no need to change or add the old env var setting in this pull. We don't care what it's set to in the new path...

@lennyphan (Member, Author) commented Mar 29, 2021:

In startNodeManager.sh, we set the FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR system property to 'true' by default. Managed servers then fail to boot with the following error:

<Mar 29, 2021 8:22:23 PM GMT> <The SitConfigRequired configuration element is set to true, but no situational config files were found.>
<Mar 29, 2021 8:22:23 PM GMT> <Server failed. Reason: 141327>
<Mar 29, 2021 8:22:23 PM GMT>

The validation in WLS is in SituationalConfigManagerImpl:

if (isSitConfigRequired() && unexpiredFiles.isEmpty()) {
  if (debugLogger.isDebugEnabled()) {
    debugLogger.debug("[SitConfig] isSitConfigRequired is true and no Sit Config Files exist. Failing boot.");
  }
  String l = ManagementLogger.logSituationalConfigRequired();
  throw new ServiceFailureException(l);
}

So it looks like we should either set it to 'false' or not define it for MII.


The point is that we're not supporting custom or internal sit-cfg in MII regardless, so there won't be any sit-cfg that can fail, so the flag should have no effect. If removing it is safe (doesn't cause a roll), then feel free to remove it. Or feel free to leave it untouched.


if istio_enabled == 'true':
  name = nap.getName()
  # Skip NAP names that already carry an Istio protocol prefix.
  if name.startswith('http-') or name.startswith('tcp-') or name.startswith('tls-') or name.startswith('https-'):
Member:

Could this be fragile? A customer could have a NAP (network access point) name that starts with one of these prefixes. Do we have some earlier validation that ensures the only names with these prefixes are NAPs that were edited by our Istio functionality?

Member Author:

I could be wrong, but I couldn't find any validation that NAPs with these prefixes were edited by our Istio functionality. @jshum2479, is there validation for these NAPs that I missed?

@jshum2479 (Member) commented Apr 30, 2021:

This function writes out the topology in the introspector output (not the config override); it is trying to format the name the Istio way. If the customer already has a name in that form and you skip writing it to the topology yaml, then it won't be set up in the pod correctly. Perhaps change the logic to not prepend the protocol again, instead of returning.

@rjeberhard (Member):

Wow, @lennyphan, this is a ton of work. I think I'm going to need a "show-and-tell" since I don't know enough to review the filter and changes to the introspector script. I want to make sure that you've accounted for all of the situational configuration and Istio changes. I suspect that you have, but can't tell.

On the sit-cfg environment variable: unfortunately, we've just been through learning and relearning what kinds of changes cause operator upgrade bugs (where upgrading the operator causes problems for already-running servers, including unexpected restarts). Anything that changes the generated pod model, including adding environment variables to the container, will do this. As noted, you could instead change the startServer.sh script that we inject.

I've delayed releasing 3.2.0 until we resolve one other upgrade bug, which will likely be done late Monday or Tuesday, so we could potentially get this change in :). I will be really happy to tell people about this improvement!

@@ -0,0 +1,243 @@
import os
import unittest
import yaml
Member:

Which interpreter are we using here for the yaml package?

Member Author:

I've been running with Python 2.7.1, and I believe that's the version that runs with WLST.

Member:

I think anyone who tries to build it will need to update their Python installation to add the yaml package. I got an import error: No module named yaml.

Member Author:

Thanks. Any suggestions for an alternative? I know there is a YAML parser in WDT.

Member:

In addition to topology.yaml, we now also have topology.json. @ankedia added this to make it easier to consume the topology from scripts. Could you use this and jq?
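
For example, a script could pull values out of topology.json with jq instead of needing a Python yaml package (the .domain.name path below is an assumed layout, not a confirmed schema):

# Read the domain name from the introspector's JSON topology output.
DOMAIN_NAME=$(jq -r '.domain.name' /weblogic-operator/introspector/topology.json)
echo "Domain: ${DOMAIN_NAME}"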

Member:

I think this is the first time we've introduced Python unit testing into the tests. WDT has its own YAML parser, and I don't think we want to introduce a dependency between the two projects. If we agree on a build-time Python requirement, then we can just make it a requirement. @rjeberhard?

Member:

Can the build or test suite verify that you have the right dependency installed? Can we verify that it's possible to build on both Oracle Linux and macOS?
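
A fail-fast check like the following could run before the Python unit tests (assuming a python interpreter on the PATH, per the 2.7.1 note above):

# Abort early with a clear message instead of an ImportError mid-test-run.
if ! python -c "import yaml" 2>/dev/null; then
  echo "ERROR: the Python 'yaml' package is required to run these unit tests" >&2
  exit 1
fi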

Member:

macOS comes with Python 2.7 by default, and it's possible, though a pain, to replace it with the Homebrew install of Python 3. What does this work require?


import inspect
import os
import sys
@tbarnes-us commented Mar 29, 2021:

General comment 1: I suspect the life-cycle and/or architecture documentation describes the propagation of internal sit-cfg. If so, then these docs should be updated.

General comment 2: Does this change lift any of the restrictions detailed in 'runtime-updates.md' for MII? If so, the doc should be updated. Note that if the update includes adding support for shrinking a cluster, we (a) may want to have integration tests before officially documenting support, and (b) may want to make sure the documentation clearly states that JTA and JMS are not particularly amenable to such shrinks.

@tbarnes-us:

General comment: We should update the 'runtime update' user doc to reflect that this lifts some of the restrictions, for example the sit-cfg conflict with logHome (and, it turns out, the undocumented one with dataHome), and perhaps others. Johnny, you, and I should probably read through the 'update' doc together to see what should be changed. We will need integration tests for each lifted restriction before we document support.


cp ${WDT_FILTER_JSON} ${WDT_ROOT}/lib/model_filters.json
cp ${WDT_CREATE_FILTER} ${WDT_ROOT}/lib
cp ${WDT_MII_FILTER} ${WDT_ROOT}/lib


consider appending || trace SEVERE "cp failed" && exitOrLoop to each of the cp commands (fail-fast)
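
Applied to the first cp above, the suggestion might look like the following. Grouping with { ...; } matters because && and || have equal precedence in shell, so a bare cp ... || trace ... && exitOrLoop would run exitOrLoop even when cp succeeds:

# Fail fast if the filter files cannot be staged (trace and exitOrLoop are
# the operator script helpers named in the suggestion above).
cp ${WDT_FILTER_JSON} ${WDT_ROOT}/lib/model_filters.json \
  || { trace SEVERE "cp of ${WDT_FILTER_JSON} failed"; exitOrLoop; }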

@tbarnes-us left a comment:

Way cool!

@rjeberhard (Member):

@lennyphan, you need to adjust or recreate the PR to target the main branch.

@rjeberhard (Member):

I merged your branch with main and pushed to OWLS-86115-main-merge. You can merge that branch into yours and then update the target of this PR.

@rjeberhard changed the base branch from develop to main on April 27, 2021 at 20:02.
@rjeberhard (Member):

This PR was replaced.

@rjeberhard closed this on Apr 30, 2021.
@jshum2479 deleted the OWLS-86115 branch on December 2, 2021 at 16:36.