Unexpected memory usage #810
Comments
The first function retains the intermediate list right until it extracts the final element, so I'm not surprised it's using linear memory. I'd love to find a way to recognize this particular pattern in the interpreter and do something more clever, but it's not obvious how to do it. I've considered making

As to version 2, I think this might be due to Cryptol's lazy semantics. Tail-recursion doesn't really help very much in a lazy language unless you can force the thunks during the computations somehow. In fact, the more direct recursion of number 3 is usually better in a lazy language. With an accumulator, you usually just end up building up a big tower of thunks that all get forced at the end. In Haskell this is a relatively common performance problem known as a "space leak".

I don't know what error you're seeing, but I suspect it's GHC's stack overflow error, which tends to happen when you force a deep stack of thunks like this. We've avoided adding the kinds of primitives Haskell programmers use to fix these problems, as they are quite fiddly and feel out of place in a specification DSL... but it's an option.

Reading between the lines, I gather your goal is to make this run in constant space, right? I agree this is an important property for scaling up computations. We should definitely find some nice way to make this work.
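The thunk-tower effect described above can be sketched in Haskell, the language the comment uses as its point of comparison. This is only an analogy and not the Cryptol code from the issue: `step` is a hypothetical stand-in for the hash function, and `iterList`, `iterAcc`, `iterRec`, `iterStrict`, and `iterFold` are illustrative names loosely mirroring the three iteration styles under discussion plus a strict-accumulator variant. The space behavior noted in the comments assumes compilation without optimization, since GHC's strictness analysis can remove the leak on its own.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical stand-in for the hash step (an arbitrary cheap mixing function).
step :: Word64 -> Word64
step x = x * 6364136223846793005 + 1442695040888963407

-- Style 1: build the list of iterates and take element n.
-- Walking the list does not evaluate its elements, so element n is a
-- chain of n suspended applications of step until it is demanded.
iterList :: Int -> Word64 -> Word64
iterList n x0 = iterate step x0 !! n

-- Style 2: tail recursion with a lazy accumulator.
-- Nothing forces acc inside the loop, so each pass wraps one more
-- suspended step around it; the whole tower is forced at the end.
iterAcc :: Int -> Word64 -> Word64
iterAcc n x0 = go n x0
  where
    go 0 acc = acc
    go k acc = go (k - 1) (step acc)

-- Style 3: direct recursion. Uses stack proportional to n, but builds
-- no separate tower of accumulator thunks.
iterRec :: Int -> Word64 -> Word64
iterRec 0 x0 = x0
iterRec n x0 = step (iterRec (n - 1) x0)

-- Strict accumulator: the bang pattern evaluates acc on every pass,
-- so only one evaluated value is live at a time.
iterStrict :: Int -> Word64 -> Word64
iterStrict n x0 = go n x0
  where
    go 0 !acc = acc
    go k !acc = go (k - 1) (step acc)

-- The same idea expressed with the library's strict left fold.
iterFold :: Int -> Word64 -> Word64
iterFold n x0 = foldl' (\acc _ -> step acc) x0 [1 .. n]

main :: IO ()
main = do
  let n = 100000
      expected = iterStrict n 1
  -- All variants compute the same value; they differ only in space use.
  print (all (== expected) [iterList n 1, iterAcc n 1, iterRec n 1, iterFold n 1])
```

Forcing the accumulator on each pass is the kind of primitive the comment calls "fiddly": the result is unchanged, but the evaluation order, and hence the memory profile, is not.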
Also, I’m pretty sure a list of a million 256-bit hashes doesn’t require anywhere near 174 gigs of memory to represent.
There's a very important bug in the definition of
Here's a corrected version of
(Note that I also had to replace
Nice catch about tailcall. That explains the failures.
Enclosed is an update to the tests that show the memory issue. Besides fixing the tailcall example, I added a test that makes use of foldl from the Cryptol prelude. Notice that tailcall now has the best performance.
I've done a little experimentation; this appears to be a classic space leak. I've implemented a pair of primitive fold operations I'm going to try some longer runs and see what happens. Edit: I ran the SHA256 example for 2^20 - 1 iterations using the
This PR implements the new fold primitives: #868
Maybe we don't actually need any new primitives to fix this problem. Here's a stricter variant of
I use the expression
After further testing it looks like there is a linear component to the
Yeah, I think some primitive like
It is great that you are finding a way to improve the performance of two out of the four example implementations submitted. This certainly is forward progress. The examples don't compute anything of interest. They were distilled from problems that occurred when we looked at an interesting application. This line of work arose when we tried to instantiate the Sphincs+ module that resides in the cryptol-specs repository with parameters from the round 2 spec. We immediately ran into memory issues that we believe are represented by the examples associated with this issue. So what we are really interested in is solutions that can be reflected back into Cryptol sources as complex as the one for Sphincs+, and that would allow them to be instantiated into something that can be executed on existing hardware.
In all the cases here, increasing the strictness using
I've been experimenting with these methods, and with the experimental foldl', and am still seeing some perplexing results. I slightly modified the iteration code to allow the accumulator to have any type, not just a numeric one (basically changing
Now I have two computations to try
These are essentially the same computation, with a different shape of the accumulator. Now here's the fun part. With 10^5 iterations (mem is the "maxresident" size reported by /bin/time; merely loading the example Cryptol files gives a maxresident of nearly 89000. mem (%) uses the baseline as 100%).
and with 4*10^5 iterations
I should probably subtract off the fixed 89000 before computing percents, but it does not change the picture very much.
I am thinking I should flatten my accumulator types for better memory performance, but it seems a strange thing to have to do. One final peculiarity. If in the REPL I compute "test2_3`{10^^6}", after the answer is printed, if I then execute
There are a couple of things going on here, I think. First, the
Second, bitvector types get very special handling inside the interpreter, to the point where we cheat sometimes and treat bitvectors more strictly than we should, according to the language semantics (e.g., #640). I've been trying to find ways to fix the strictness bugs without sacrificing too much of the performance benefits, without too much luck so far. At any rate, I think that's why even the baseline version with the flattened accumulator performs better. As to the REPL question, I'm seeing the same behavior if I just ask to evaluate
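One general mechanism that can make the shape of an accumulator matter in a lazy language is that forcing a value usually stops at its outermost constructor. The Haskell sketch below illustrates that effect only; it is an analogy, not a description of how the Cryptol interpreter treats these particular types. The names `countSumLazy`, `countSumForced`, `countSumFlat`, and `Acc` are all hypothetical, and the space behavior described in the comments assumes compilation without optimization.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Nested accumulator: (count, sum). foldl' evaluates each new pair only
-- as far as its outer constructor, so the two fields can remain as
-- growing chains of suspended additions.
countSumLazy :: [Int] -> (Int, Int)
countSumLazy = foldl' (\(c, s) x -> (c + 1, s + x)) (0, 0)

-- Same pair, but both fields are evaluated before the next pair is built.
countSumForced :: [Int] -> (Int, Int)
countSumForced = foldl' next (0, 0)
  where
    next (c, s) x =
      let !c' = c + 1
          !s' = s + x
      in (c', s')

-- "Flattened" alternative: a record with strict fields, so evaluating
-- the constructor also evaluates its contents.
data Acc = Acc !Int !Int

countSumFlat :: [Int] -> (Int, Int)
countSumFlat xs = case foldl' next (Acc 0 0) xs of
    Acc c s -> (c, s)
  where
    next (Acc c s) x = Acc (c + 1) (s + x)

main :: IO ()
main = do
  let xs = [1 .. 100000]
  -- All three compute the same pair; only their memory profiles differ.
  print (countSumLazy xs, countSumForced xs, countSumFlat xs)
```

In this analogy, a strict fold alone is not enough when the accumulator has lazy fields, which is one way a flatter accumulator can end up using less memory.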
Here's another strange example of unexpected memory usage: I have two computations that take a lot of memory. In a single run of Cryptol I can successfully execute
or even
(here
exhausts memory.
We noticed performance improvements in version 2.10.0.
It seems like we've made significant enough progress on this to close this bug. @msaaltink, if you're still having problems with the example you mention above, could you open a new ticket with more details?
@robdockins, I've tried that example again and performance is greatly improved. It does not make much difference whether or not I use The only remaining oddity is that while
Glad to hear things are improved! Maybe that's due to the value of
No, not after 10 trivial calculations.
We are experiencing memory usage patterns that are not what we would expect. I wrote three simple Cryptol functions that iterate a hash function provided by the cryptol_spec repository. Here is the source for the three functions.
I expected the first and the second functions to execute using constant space, and the third function to exhibit linear growth. What we observe is that the first and third functions use about the same amount of space, and the second function often simply fails.
Enclosed is a zip file, iter.zip, that contains the sources that exhibit the problem. It also contains performance outputs. The Excel spreadsheet contains the results of running the tests at different sizes on a high-performance cluster. To run the examples, edit the pqcryptol script to reflect the location of your cryptol_spec repository.