-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrency issue in org.eclipse.equinox.weaving.caching might cause "ClassFormatError: truncated class file" #233
Comments
I am wondering why the lookup map idea isn't enough in case we remove the item from that lookup map AFTER writing the cache item to disk, but let's continue that discussion probably on the PR: #234 |
tjwatson
pushed a commit
that referenced
this issue
Mar 24, 2023
#233 Fix concurrency issue in Equinox Caching leading to 0-length byte array to be read Signed-off-by: Stefan Winkler <[email protected]>
tjwatson
pushed a commit
that referenced
this issue
Mar 24, 2023
#233 Bumped version of org.eclipse.equinox.weaving.caching to 1.2.300 Signed-off-by: Stefan Winkler <[email protected]>
fixed by #234 |
tjwatson
pushed a commit
that referenced
this issue
Apr 25, 2023
#233 Fix bug in original PR Signed-off-by: Stefan Winkler <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Summary
If Equinox Weaving with AspectJ in conjunction with caching is used in a multithreaded setting, it can happen that a class file that is currently written to disk is read with 0 byte in a concurrent thread. This can lead to a ClassFormatError: truncated class file non-deterministically.
Setting / Environment
We are building an Eclipse RCP / IDE product that is tested on a virtualized CI environment on Windows VMs with RCPTT.
The IDE product itself is composed of multiple custom and 3rd party bundles that also perform a lot of background initialization; also the IDE product includes a small amount of Aspects and thus, includes the Equinox AspectJ load-time weaving and caching bundles.
The RCPTT test runtime uses Equinox AspectJ load-time weaving as well to hook into all kinds of SWT and JFace classes.
Our RCPTT test cases include a lot of restarts of the IDE product.
So basically now the product under test includes concurrent class loads, and Equinox AspectJ load-time weaving and caching with a larger number of Aspects and is restarted frequently.
The Symptom
Almost on every build a different test case fails with the root cause being a failed class load with the error java.lang.ClassFormatError: truncated class file.
Analysis
Since the bug is only reproducible on our CI, not on a local machine (most likely the timing on our CI VMs is more on the exotic side), live debugging is difficult to impossible. So I added logging statements here and there to nail down the root cause.
It turns our that BundleCachingService.storeClass() is called with a valid byte[] array. A few lines of log output later, BundleCachingService.findStoredClass() for the same class is called and results in a CacheEntry containing a 0-byte array.
That 0-byte array is propagated into the ClassLoader and results in the aforementioned ClassFormatError.
What happens internally:
Whenever a candidate class for Aspect Weaving is encountered, the cache checks whether a cache file for the woven class exists in the file system. If so, it is loaded.
If not, the class is woven and then given to the cache for storing. The cache does not store it directly (synchronously), but puts it in a writer queue from which a writer thread takes classes one by one to write to disk.
It can now happen that a class is already woven and exists in the cache writer queue, while another thread wants to load the same class and checks the file system, resulting in a cache miss. In this case, the class is even woven twice (still no problem, only a small performance penalty).
But in a rarer case, the writer thread could currently be in the process of writing the file to disk (so, e.g., an empty file could already exist, just the content is not yet flushed), while another thread checks the cache and file system. This can lead to a 0-byte read, and thus, a 0-byte array.
Solution
I had a short discussion with @martinlippert and from that, I have tried to first implement a lookup map for classes that are currently in the writer queue. This solves the issue of weaving the same class multiple times, because even if the class is not yet written to disk, we have a cache hit and can already reuse that class in the writer queue.
This implementation made things a bit better, but still resulted in the same symptom; it only became rarer.
The problem is most likely that there is no synchronization between cache reads and writes, and it can be the case that the reader has a cache miss, and immediately at that point (before the reader checks the file system), the writer is given the same class to store and it directly starts to store it, which still can result in an incomplete file being present when the reader now checks the file system.
So, I implemented a synchronization mechanism in addition to the queue lookup. This synchronization uses one RWLock per class name, so that we still have optimal throughput as long as no conflicting access occurs.
With this implementation, the RCPTT tests are now rock-solid.
The text was updated successfully, but these errors were encountered: