-
Notifications
You must be signed in to change notification settings - Fork 217
Description
Enter an issue title
Fix Cacheddownloader's directly handling glitches
Summary
The file_cache class in Cacheddownloader has a method that manages the oldEntries (such which are still in use, but there are newer versions for the object).
The code looks like this
func (c *FileCache) updateOldEntries(logger lager.Logger, cacheKey string, entry *FileCacheEntry) {
if entry != nil {
if !entry.inUse() && entry.ExpandedDirectoryPath != "" { //AAAAAAAAAAAA
// put it in the oldEntries Cache since somebody may still be using the directory
c.OldEntries[cacheKey+entry.ExpandedDirectoryPath] = entry
} else {
// We need to remove it from oldEntries
delete(c.OldEntries, cacheKey+entry.ExpandedDirectoryPath)
}
}
}
The problem is that if entry.inUse() == true
and entry.ExpandedDirectoryPath != ""
then else
clause is executed, and it is deleted from OldEntries.
Later, in CloseDirectory
entry := c.Entries[cacheKey]
if entry != nil && entry.ExpandedDirectoryPath == dirPath {
...
entry.decrementDirectoryInUseCount()
return nil
}
entry = c.OldEntries[cacheKey+dirPath]
if entry == nil {
return EntryNotFound
}
if someone wants to remove the older version of the entry - the first if
will not match, and i the second - the entry will not be found as it was already deleted
Steps to Reproduce
It was not caught by the existing tests, because they do not test the only way how AddDirectory
is used (as of now) in the code. And that is:
- Call GetDirectory, which returns empty cache or the existing entry. And in both cases increments the
directoryInUseCount
- Call AddDirectory, which does not manage
directoryInUseCount
. What it does is that if it was an existing entry, it decrements the usage of the old one. And then calls the flawed method to decide what to do
So in essence this is the scenario where a multiple containers use some dependency, this dependency is updated (in our case one buildpack) and then some of the containers is updated, thus - triggering the addition of a new version of the Cache Entry.
Afterwards all of the other containers are not able to stop any more
Diego repo
https://github.com/cloudfoundry/cacheddownloader
Environment Details
Additional Text Output, Screenshots, contextual information (optional)
The impact of this in our usecase is following:
- During the update of the CF-Deployment, a new version of the buildpack-lifecycle comes, which is used by most apps
- Afterwards, a lot of internal applications are redeployed with their new versions
- All of them fail to stop, thus decreasing the free capacity left for other applications
- In some cases this exhausts all the available free memory and applications can no longer be re/started
- In all cases, restarting the
rep
s that hold such unstoppable containers leads to waiting up to therep_evacuation_timeout
property (set to 15 min in our env) thus - increasing 2-3x the time for updating the diego-cells