Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak related to type substitutions #888

Closed
brianhuffman opened this issue Sep 11, 2020 · 6 comments · Fixed by #895
Closed

Memory leak related to type substitutions #888

brianhuffman opened this issue Sep 11, 2020 · 6 comments · Fixed by #895
Assignees

Comments

@brianhuffman
Copy link
Contributor

While doing some memory profiling of the cryptol interpreter recently with +RTS -hb, I discovered that loading cryptol modules produces a large amount of void data on the heap that remains in memory throughout evaluation, basically until the REPL exits.

Profiling with +RTS -hbvoid -hy of +RTS -hbvoid -hc shows that the void data is mostly made of IntMap and (,) values which are created by the @@ operation from Cryptol.TypeCheck.Subst. Retainer analysis shows that there are calls to apSubst that are hanging onto these Subst values that contain the IntMaps of pairs.

As I understand it, here's what is happening: During type inference, the "current substitution" for unification variables grows rather large (tens of megabytes for a moderate-sized module). When type inference is done, the "current substitution" is then applied to some of the result values with apSubst. However, the result values are not sufficiently forced, leaving some applications of apSubst unevaluated. This causes the program to retain the entire (large) substitution value.

@brianhuffman brianhuffman self-assigned this Sep 11, 2020
brianhuffman pushed a commit that referenced this issue Sep 11, 2020
Previously, unevaluated `apSubst` thunks were retaining copies of
a very large `Subst` value.

Fixes #888.
@brianhuffman
Copy link
Contributor Author

Adding strictness can fix the persistence of the void data, but we are still constructing a very large substitution value. It might be worthwhile to revisit #544, and see if there's an algorithmic change we can do to prevent the substitution values from getting so big in the first place.

@robdockins
Copy link
Contributor

It certainly seems like type substitution is the/a major bottleneck in the typechecker. Do we have specific algorithmic changes in mind that might help?

@brianhuffman
Copy link
Contributor Author

The main problem is that to keep the substitution idempotent, every time you extend it you need to traverse the whole substitution to update all the old entries. This keeps getting slower and slower as the map grows. I've been thinking about strategies to "evict" some entries from the substitution map to keep it small, in particular when you leave the scope of some bound type variables. I haven't worked out the details yet. The trick is to implement a strategy like that and improve the asymptotic behavior without slowing down the common cases too much.

@robdockins
Copy link
Contributor

It would be quite the refactoring, but would it work to instead represent unification variables using STRefs or IORefs? Then you wouldn't need to maintain a substitution data structure at all and the question of idempotence doesn't arise. It's my understanding that substitutions have essentially global scope: a unification variable should never be unified differently in different scopes, etc, so a representation using mutable variables would make sense.

Alternately, we could drop the idempotent invariant and instead do path compression on lookups or something. That wouldn't help shrink the substitution, but it would keep us from having to traverse it constantly.

@brianhuffman
Copy link
Contributor Author

That kind of refactoring might be worthwhile. Dropping the idempotency requirement would probably make it necessary to change all the unification code to monadic style; even without path compression, we would at least need a reader monad to do variable lookups. And if we're going to go that far, we might as well use STRefs instead of keys to an IntMap table.

@yav
Copy link
Member

yav commented Sep 15, 2020

This might be relevant: http://okmij.org/ftp/ML/generalization.html

brianhuffman pushed a commit that referenced this issue Sep 23, 2020
We also needed to rewrite `anyJust2` to make it more strict,
to avoid reintroducing the space leak of issue #888.
brianhuffman pushed a commit that referenced this issue Sep 23, 2020
We also needed to rewrite `anyJust2` to make it more strict,
to avoid reintroducing the space leak of issue #888.
@brianhuffman brianhuffman mentioned this issue Sep 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants