Performance: bndtools takes too long for SHA-256 and is not interruptable #5322
Comments
The issue is not the bnd code; it is the code that is called in the VM (it is confusing that you do not show the next 2-3 levels). Checking for interrupts will only make it slower in the normal case.
When I have the opportunity, I'll provide a deeper trace. That said, I had opened it and it stopped at an I/O copy operation that was taking a long time. The machine I am working on unfortunately does not have very fast I/O. I looked at the ResourceBuilder code and it seemed to me that the SHA-256 is calculated unconditionally and is not subject to any type of caching. That may work well in smaller workspaces or with fast I/O. I believe, though, that it may not scale well to large workspaces with a couple of hundred projects in them, particularly when I/O is somewhat slow.
Similarly, it takes a very long time on my machine to open a |
You can't really cache a cache key. The SHA is a unique identifier for the file. Not the other way around. All that could be done here is to defer the SHA calculation (perhaps forever) until it is needed. But if it is needed, it still needs to be computed. I have an idea on how to handle deferral here which I will experiment with.
The SHA-256 is generated from a file, right? As long as the file does not change, it would not be necessary to recalculate the checksum. Maybe a file timestamp could be used as an indication whether the file has changed and if a re-calculation of the sum would be necessary?
And how do you know the file changed? You know by calculating the hash to see if it is the same or not. You cannot trust the file system metadata (length, modified time) as that can be easily manipulated.
Sure. What I have done in a potentially similar situation is to keep an in-memory cache of hashes, each associated with a timestamp. When a hash becomes older than a certain time, I re-calculate it. When the file on disk has a newer timestamp than the one in memory, I also re-calculate. It is not perfect, but reduces the need to calculate the hash at every opportunity.
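The timestamp-based cache described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not bnd code: the class name `TimestampedHashCache`, its `maxAge` parameter, and the `computations()` counter are all hypothetical. As noted in the reply above it, the approach trusts file system metadata, so it trades correctness for speed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of bnd: caches SHA-256 values per file and
// recomputes when the entry is too old or the file's modified time changed.
final class TimestampedHashCache {
    private static final class Entry {
        final byte[] sha256;
        final long fileModifiedMillis; // modified time seen when hashing
        final long computedAtMillis;   // wall-clock time of the computation

        Entry(byte[] sha256, long fileModifiedMillis, long computedAtMillis) {
            this.sha256 = sha256;
            this.fileModifiedMillis = fileModifiedMillis;
            this.computedAtMillis = computedAtMillis;
        }
    }

    private final Map<Path, Entry> cache = new ConcurrentHashMap<>();
    private final long maxAgeMillis;
    private final AtomicInteger computations = new AtomicInteger();

    TimestampedHashCache(Duration maxAge) {
        this.maxAgeMillis = maxAge.toMillis();
    }

    // Number of real hash computations so far (for demonstration only).
    int computations() {
        return computations.get();
    }

    byte[] sha256(Path file) throws IOException {
        long modified = Files.getLastModifiedTime(file).toMillis();
        long now = System.currentTimeMillis();
        Entry e = cache.get(file);
        // Reuse the cached value only if the entry is young enough and the
        // file's modified time is unchanged since the hash was computed.
        if (e != null && e.fileModifiedMillis == modified
                && now - e.computedAtMillis < maxAgeMillis) {
            return e.sha256.clone();
        }
        byte[] digest = digest(file);
        cache.put(file, new Entry(digest, modified, now));
        return digest.clone();
    }

    private byte[] digest(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // SHA-256 is a required JDK algorithm
        }
        computations.incrementAndGet();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }
}
```

A caller would create one cache, for example `new TimestampedHashCache(Duration.ofMinutes(5))`, and call `sha256(path)` wherever the hash is needed. Repeated calls for an unchanged file then cost one metadata lookup instead of a full read.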
The osgi.content capability must hold the SHA-256 value of the resource. But this value is rarely used. So, when building a resource from a file, we defer the SHA-256 calculation until actually needed. This can help performance of FileSetRepository used by bndtools m2e. However, we do need the hashCode of the SHA-256 value early, so use a substitute hashCode value based upon the File object. This is not perfect since two files can hold identical content and the hashCode of their SHA-256 values should be identical. But in practice, this should work when creating a resource from a file since the early need for the hashCode of the SHA-256 value is in the computing of the hashCode of the capability so it can be inserted in a set. But creating a resource from a file only creates a single osgi.content capability. Fixes bndtools#5322 Signed-off-by: BJ Hargrave <[email protected]>
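The deferral idea in this commit message can be illustrated with a small sketch. This is not the actual bnd change: the class `DeferredSha256` and its methods are hypothetical names, chosen only to show a value that is hashed lazily while `hashCode()` uses the `File` object as a substitute.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration, not bnd's implementation.
final class DeferredSha256 {
    private final File file;
    private volatile byte[] sha256; // null until the value is first needed

    DeferredSha256(File file) {
        this.file = file;
    }

    boolean isComputed() {
        return sha256 != null;
    }

    // Computes the digest on first use and remembers it afterwards.
    byte[] value() {
        byte[] result = sha256;
        if (result == null) {
            synchronized (this) {
                result = sha256;
                if (result == null) {
                    result = compute(file);
                    sha256 = result;
                }
            }
        }
        return result.clone();
    }

    // Substitute hashCode based on the File, so inserting into a set does
    // not read the file. Caveat from the commit message: two files with
    // identical content compare equal but get different hash codes here.
    @Override
    public int hashCode() {
        return file.hashCode();
    }

    // Equality needs the real content hash, so it forces the computation.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DeferredSha256)) {
            return false;
        }
        return Arrays.equals(value(), ((DeferredSha256) o).value());
    }

    private static byte[] compute(File file) {
        try (InputStream in = new FileInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            return md.digest();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is a required JDK algorithm
        }
    }
}
```

Adding a single `DeferredSha256` to a `HashSet` only calls `hashCode()`, so the file is not read at that point. The first call to `value()` or to `equals` against another instance pays the full cost.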
The situation seems to be improved. I'll keep monitoring it. Thanks!
So, opening the said |
That said, it appears that the background task that builds the implicit repository can still take a very long time, and it seems it can also make the Eclipse UI freeze for a significant time. A first glance with VisualVM does not point to a particular hotspot. The time seems to be spent on artifact resolving in general, distributed over a number of different methods.
At some point, the SHA-256 value is needed. There is no way to avoid it being needed. The PR only deferred the calculation into some future, which you now seem to be running into. ResourceImpl.equals needs the value to properly determine if two Resources are equal. This equality test needs the values of the attributes of all the resource's capabilities, which includes the SHA-256 value.
It seems that bndtools slows things down significantly when used in a large Eclipse workspace or in projects with lots of dependencies. According to the VisualVM sampler, most of the time is spent on SHA-256 calculation.
Even when trying to shut down Eclipse, the shutdown hangs because bndtools first wants to complete these SHA-256 calculations.
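For reference, an interruptible SHA-256 computation of the kind the issue title asks for might look like the sketch below. It is an assumption-laden illustration, not bnd code: `InterruptibleSha256` is a hypothetical name, and the check runs once per buffer read, which adds a small cost on every iteration (the trade-off raised in the comments).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of bnd.
final class InterruptibleSha256 {
    private InterruptibleSha256() {
    }

    // Hashes the file, aborting with InterruptedIOException if the calling
    // thread is interrupted between buffer reads.
    static byte[] digest(File file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is a required JDK algorithm
        }
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new FileInputStream(file)) {
            while (true) {
                // Thread.interrupted() clears the flag; the exception then
                // carries the interruption to the caller.
                if (Thread.interrupted()) {
                    throw new InterruptedIOException("SHA-256 of " + file + " interrupted");
                }
                int n = in.read(buf);
                if (n == -1) {
                    break;
                }
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }
}
```

A background job could call `InterruptibleSha256.digest(file)` and treat `InterruptedIOException` as a cancellation, so a shutdown request would not have to wait for a large file to be hashed completely. A single blocking `read` on slow I/O can still delay the response to the interrupt.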