-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-34426] Fix to handle SECURITY-243 #34
Conversation
This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation. |
In ActiveDirectoryUserDetail.updateUserInfo() you already know the users ID so should not call |
@jtnord If I use your suggestion then I need to have jenkins as bigger version jenkins-1.651.2 in the pom.xml as your fix is very recent. Do you suggest to do this? |
@fbelzunc I would keep the original fix then and include a |
well the problem with keeping the original is IIUC that you will constantly be hitting The plugin implementation without the latest fix is wrong, if there is any call for a TODO IMO then it should be in the pom.xml to remove the HPI hack for the lowest compatible core to be removed once the company that is behind the maintainance of this plugin no longer supports < 1.651.2. |
when I was originally discussing implications with @jglick I recall the pom/hpi hack was what he suggested - so ping'ing him in case I remember incorrectly. |
<index>true</index> | ||
<manifestEntries> | ||
<Hudson-Version>1.609.18.1</Hudson-Version> | ||
<Jenkins-Version>1.609.18.1</Jenkins-Version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm…rather dangerous. It would be much too easy to inadvertently introduce a dependency on some unrelated API added between 1.609 and 1.651.
(In jenkinsci/pipeline-plugin#197 I used this trick only for a baseline swap within an LTS line, where the risk of such an added API was quite low.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather suggest reflection for such cases:
try { // TODO 1.651.2+ remove reflection
return (hudson.model.User) hudson.model.User.class.getMethod("getById", String.class, boolean.class).invoke(getUsername(), true);
} catch (NoSuchMethodException x) {
// fine, older baseline
} catch (Exception x) { // unexpected
LOGGER.log(Level.WARNING, null, x);
}
return hudson.model.User.get(getUsername());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+∞ on using reflection with a //TODO
to replace with direct call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem with reflection is we must use the new call. so it is not fine if it does not exist. (return hudson.model.User.get(getUsername());
will cause a stack trace detailed in JENKINS-34426)
You would still need to set a baseline of jenkins to use. so what would you use?
1.651.2 - can't plugin won;t be available for Cloudbees customers.
1.609.18.2 - can't plugin won't be buildable as we do not publish cloudbees propriatary wars...
1.609.3 - well will still blow up when the API is not available via reflection...
So you still will end up installing this plugin on something without the core update which will break the plugin. so given you need to barf we may as well just barf and keep code the same...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you install it on something without the core update the reflection will not find the method and fall through to the old code path, so no worse than today. On the core versions with the method, reflection will find and invoke the method and all will be well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course one problem with a reflection-based approach is that hpi:run
and any automated in-plugin tests will be using an old core that does not reflect the behavior of current users. Acceptance tests would be the only things catching “facepalm” regressions.
What is the root cause here? |
@jglick right |
@jglick Exactly, the cache is not reentrant. |
// fine, older baseline | ||
} catch (Exception e) { // unexpected | ||
LOGGER.log(Level.WARNING, String.format("There was a problem obtaining the Jenkins user %s by Id", getUsername()), e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe InvocationTargetException
should be caught to unwind any legitimate exception getById
may throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andresrc Are you sure it is necessary? getById
should not throw any exception. So if .invoke
fails you will just go tot the catch of Exception
to record it. Is it not fine?
I would keep the |
@@ -330,7 +330,7 @@ public UserDetails call() throws AuthenticationException { | |||
closeQuietly(context); | |||
} | |||
} | |||
}); | |||
}).updateUserInfo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will now cause the user info to be updated always. Previously we only updated the info if the value was not in the cache
I think this reintroduces a 🐛 whereby the user details are continually being saved with no actual changes. You probably want to have a better yet you could revert the type changes in the cache and just have a |
@@ -312,11 +317,12 @@ public UserDetails call() throws AuthenticationException { | |||
Set<GrantedAuthority> groups = resolveGroups(domainDN, dnFormatted, context); | |||
groups.add(SecurityRealm.AUTHENTICATED_AUTHORITY); | |||
|
|||
return new ActiveDirectoryUserDetail(username, password, true, true, true, true, groups.toArray(new GrantedAuthority[groups.size()]), | |||
cacheMiss[0] = new ActiveDirectoryUserDetail(username, password, true, true, true, true, groups.toArray(new GrantedAuthority[groups.size()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not look thread safe to me. cacheMiss is a class level field...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I asked for a method scoped field
🐛 for the threadsafety issue.
I would disagree :-) |
/** | ||
* To determinate if we need or not to update the User Information | ||
*/ | ||
final ActiveDirectoryUserDetail[] cacheMiss = new ActiveDirectoryUserDetail[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be a local variable in retrieveUser
to avoid threading issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 I said add it at line 322 (now 319) not at the class level
Partially incorrect[1] - the code checks and only makes changes to the See - ActiveDirectoryUserDetail.updateUserDetails() Now that code is not correctly checking - and if you get a cache miss due to the cache timeout etc then you are still also making saves for no valid reason as there is no check that the email is different from what is stored. So the bug (unneeded saves) is still present - perhaps this should just be addressed. [1] partially incorrect - as the bug is still there so not re-introduced - just hit more often :) |
Normally I wouldn't agree with myself as well :), but in this specific one updates are only performed in some very specific cases, not always as @stephenc said and provides a safety net against core change behaviors like the one triggering this fix. In any case, detecting the cache miss (correctly) is much better. |
the whole cachemiss thing looks very threading issue prone to me if it is done outside the loop. What if the same user is attempting to login from 2 different computers simultaneously (say automation tooling)? Due to threading nature the User object for one could be returned before it is correctly updated (even created!) if the thread that populates the cache is paused before it updates the user. |
@jtnord nit but true. The alternative would be detecting the "reentrancy" at the |
It would be if you could gaurantee that a cache hit for the same user would not overtake the cache miss and population before creating the user object in |
@jtnord you are right |
@andresrc we have not had any re-entry so far - so that bug just does not happen. Big bugs like re-entry are easy to find and fix. Bugs due to threading issues are a PITA to uncover. |
@jtnord agree |
FFS cacheMiss should be a method scoped final variable |
@@ -97,13 +97,18 @@ | |||
/** | |||
* The {@link UserDetails} cache. | |||
*/ | |||
private final Cache<String, UserDetails> userCache; | |||
private final Cache<String, ActiveDirectoryUserDetail> userCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be a UserDetail, I suggest reverting this
reflection is not done correctly - the method is static so need to start by null |
@@ -177,6 +178,19 @@ public static long getSerialVersionUID() { | |||
* Gets the corresponding {@link hudson.model.User} object. | |||
*/ | |||
public hudson.model.User getJenkinsUser() { | |||
try { // TODO 1.651.2+ remove reflection | |||
return (hudson.model.User) hudson.model.User.class.getMethod("getById", String.class, boolean.class).invoke(getUsername(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
invoke(null, getUsername(), true)
🐛 in reflection |
@stephenc @jtnord @andresrc I prefer not to use the Now, I am just using reflection and correctly using Can I have your final review, please? I think now should be fine. |
🐛 put the cacheMiss stuff back, when it is a method scoped variable there is no race condition as the closure does not escape the life of the method call. James was objecting to the use of an instance field |
@stephenc I don't like a lot to have that - but I am doing it right now :-) |
@fbelzunc you had it right and then you decided to remove for no clear reason. There are other ways to achieve the same, eg if you created the closure as a variable then the closure could have a field to hold the return value. But ultimately we need the update to be defanged from the code path it was on and the final array trick is the smallest change to affect this |
@stephenc It is now update with the cacheMiss - I want to release this before it kicks the ass of a lot of people :-) |
🐝 |
No stephen - you had it wrong. THere is also a reace condition here as the and only when there is a cache miss will the jenkins user be created and On 18 May 2016 at 20:05, Stephen Connolly [email protected] wrote:
|
🐛 as there is a race condition on |
@jtnord Happy to do it - I can ping you tomorrow morning. |
If there re two threads they will both have the same object and one of them will have had a cache miss and hence do the update. Since a user "could" be updating their user record in parallel with a slow page render for that same user from a different browser, this is an exactly analogous case. Yes it looks like a race, but Jenkins doesn't care about or provide a framework to prevent concurrent modification, so it should not be an issue |
I don't really care about concurrent modification per-say (last wins so long as there is one) - e.g. the case where something gets an unpopulated |
So the case you are concerned about requires:
And even then it requires a pre-emptive interrupt of the first thread (as it's not going to yield on releasing the lock) Two of those are, IMHO, exceedingly difficult. The likely case for the second thread's page render is a glitch, or worst a stack trace that gets cleared on a page reload or worst case a logout/login again The other side is a definite recursion issue that has been encountered. If we want to "do the completely right thing" then we need to come at this from a different line of attack. You have not convinced me that the recursion (which will likely repeat for that user as they were unable to login and hence will re-trigger on subsequent attempts) is the lesser evil |
does anyone run on a single core server these days :-)
not really - you just need to have more threads than CPU cores in a runnable state and that is not so hard. When the JVM switches to a different thread is non deterministic, so long as there is a single thread that is waiting to run any other thread could be paused...
It may not be rendering a web page - it could be performing a a REST operation - a HTTP/503 response should not be retried by a tool.
It is - but only because we actively changed core to call into the authentication realm on Up to now there has not been any other recursion issues (it would have caused a spinning thread or a stack trace - and there are no reports of this as far as I am aware)
Which has been fixed by using the new API
it's not a lesser evil - but with the fix we should have squashed the recursive call - so there should not be a possibility of recursion. Yes someone in the future could change something again - but we can then change the code again, but right now I see the chance of recursion as zero and the chance of the threading issue as very very very tiny. Very very very tiny is bigger than zero - hence my concern that what we are doing here is wrong. So the lesser evil to me is to need to make the code change again in the future to fix a bug that does not exist today vs accepting a bug that exists today but is extremely hard to reproduce - let alone understand from any stack trace you may get.. |
So the contract for There are other security realms that return a So what difference does it make if we return the The only difference is that other Jenkins code may race to instantiate the |
🐝 |
@reviewbybees done |
704e317
to
ed81365
Compare
[JENKINS-62317]: Upgrade dependencies pre 2.3 release
Stacktrace
and it is produced because of SECURITY-243 - https://github.com/jenkinsci/jenkins/blob/jenkins-2.3/core/src/main/java/hudson/model/User.java#L1048 - Which is producing a recursive load.
@reviewbybees @andresrc @jtnord