
Conversation


@eddiebergman eddiebergman commented Dec 15, 2022

A core issue addressed was the following setup:

  1. get_neighbors with a small std. deviation.
  2. A small set of neighbors to choose from.

This caused rejection sampling to never find enough valid neighbors within the Gaussian it was sampling from.


The solution is to simply find the next closest neighbor that has not been sampled yet, iteratively checking around that number until one is found. This also respects the boundaries.

```python
neighbors = [3, 4, 5, 6]
sample = 5
# try 4, nope
# try 6, nope
# try 3, nope
# try 7, yup
neighbors.append(7)
```

I'm not sure if this is "correct" but I don't see another viable solution.
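The search described above can be sketched as a small generator. The PR does add a `center_range` helper (it shows up in the test commits below), but this is an assumed reimplementation for illustration, not the actual code:

```python
def center_range(center: int, lower: int, upper: int):
    """Yield integers spiraling outward from `center`, alternating
    below/above and respecting the [lower, upper] boundaries."""
    below, above = center - 1, center + 1
    while below >= lower or above <= upper:
        if below >= lower:
            yield below
            below -= 1
        if above <= upper:
            yield above
            above += 1
```

For the example above, searching around 5 within [3, 7] visits 4, 6, 3, 7, matching the order shown.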


Another issue came up, as seen in the comment below: the NormalIntegerHyperparameter (and, as later diagnosed, also the BetaIntegerHyperparameter) has _compute_normalization and get_max_density methods that require computing the pdf for every possible int value. This caused memory to blow up when the range of possible values was incredibly large, for example every possible int32 value.

To combat this, I implemented an arange_chunked which functions similarly to arange but yields sub-chunks. This works because the sum and max operations can be computed over partial chunks and do not require the full arange to be in memory at once.
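A minimal sketch of what such a chunked arange might look like (illustrative, not the exact implementation from the PR):

```python
import numpy as np

def arange_chunked(start: int, stop: int, chunk_size: int = 1_000_000):
    """Yield np.arange(start, stop) piecewise so the full range is
    never materialized in memory at once."""
    while start < stop:
        end = min(start + chunk_size, stop)
        yield np.arange(start, end)
        start = end
```

Streaming reductions then work chunk by chunk, e.g. `max(chunk.max() for chunk in arange_chunked(lo, hi))`.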

Calculating the pdf and max density over this entire range is still incredibly slow, and an analytical solution is likely possible since we are dealing with consecutive integers.

This is documented in #283
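To illustrate the kind of analytical shortcut hinted at above: because the values are consecutive integers, the sum of a Gaussian pdf over them is closely approximated by integrating the pdf over the range widened by half a step on each side (a midpoint-rule identity). This is a sketch of the idea only, not code from the PR:

```python
import math

def gaussian_pdf_sum(mu: float, sigma: float, lower: int, upper: int) -> float:
    """Brute force: sum the Gaussian pdf over every integer in [lower, upper]."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((k - mu) / sigma) ** 2)
               for k in range(lower, upper + 1))

def gaussian_pdf_sum_analytic(mu: float, sigma: float, lower: int, upper: int) -> float:
    """Analytic approximation: integrate the pdf over
    [lower - 0.5, upper + 0.5] via the Gaussian CDF (erf)."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(upper + 0.5) - cdf(lower - 0.5)
```

The analytic version is O(1) regardless of the range size, whereas the brute-force sum is exactly what blows up for int32-sized ranges.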


It's also quite difficult to work with this codebase, given that the .pyx files don't allow editors to be very smart (for example, jump-to-definition). Using plain text search is also frustrating because all classes share method names and live in one file. Just voicing again that converting this back to pure Python and splitting up the files would make working with ConfigSpace easier. Any performance issues that originally motivated the switch are likely solvable within Python itself, as numpy can do the heavy lifting in C.


eddiebergman commented Dec 15, 2022

```python
import ConfigSpace.hyperparameters as CSH
import numpy as np

rnd = np.random.RandomState(19937)

# This gets to the for loop before hanging
# a = CSH.UniformIntegerHyperparameter('a', lower=1, upper=2147483647, log=True)

# This hangs before the prints
a = CSH.NormalIntegerHyperparameter('a', mu=10, sigma=500, lower=1, upper=2147483647, log=True)

print(a, flush=True)
print(rnd, flush=True)

for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=8, std=0.05)
```

See #283 for the cause

@eddiebergman
Contributor Author

Possibly unrelated error. It doesn't cause memory to explode, but it gets stuck in an endless loop that can't be killed with a KeyboardInterrupt (Ctrl+c).

```python
import ConfigSpace.hyperparameters as CSH
import numpy as np

rnd = np.random.RandomState(19937)
a = CSH.NormalIntegerHyperparameter('a', mu=10, sigma=500, lower=1, upper=1000, log=True)
for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=8)
```

```
Exception ignored in: 'ConfigSpace.hyperparameters.NormalFloatHyperparameter._transform_scalar'
Traceback (most recent call last):
  File "/home/skantify/code/ConfigSpace/test_memory_leak.py", line 6, in <module>
    a.get_neighbors(0.031249126501512327, rnd, number=8)
OverflowError: math range error
OverflowError: Python int too large to convert to C long
```

@eddiebergman
Contributor Author

Back to the original memory overflow:

```python
number = 5  # slow but fine
for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=number, std=0.05)

number = 6  # Suddenly blows up memory and the process becomes unkillable
for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=number, std=0.05)
```

When querying a large range for a UniformIntegerHyperparameter with a
small std.deviation and log scale, this could cause an infinite loop as
the reachable neighbors would be quickly exhausted, yet rejection
sampling will continue sampling until some arbitrary termination
criterion. Why this was causing a memory leak, I'm not entirely sure.

The solution now is that if we have seen a sampled value before, we
simply take the one "next to it".
Replaced usages of arange with a chunked version to prevent memory
blowup. However this is still incredibly slow and needs a more refined
solution as a huge amount of values are required to be computed for what
can possibly be analytically derived.

codecov bot commented Jan 4, 2023

Codecov Report

Base: 67.64% // Head: 67.97% // Increases project coverage by +0.32% 🎉

Coverage data is based on head (1702342) compared to base (7f1ac3b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #282      +/-   ##
==========================================
+ Coverage   67.64%   67.97%   +0.32%     
==========================================
  Files          24       25       +1     
  Lines        1768     1786      +18     
==========================================
+ Hits         1196     1214      +18     
  Misses        572      572              
Impacted Files Coverage Δ
ConfigSpace/__init__.py 100.00% <100.00%> (ø)
ConfigSpace/functional.py 100.00% <100.00%> (ø)


@eddiebergman eddiebergman requested a review from mfeurer January 4, 2023 15:23
@eddiebergman eddiebergman mentioned this pull request Jan 4, 2023
```python
def get_num_neighbors(self, value=None) -> int:
    return self.upper - self.lower

def get_neighbors(
```
Contributor

I like this new implementation. It is very lean and we should try to use it for the other hyperparameters, too (without the rounding of course).

Contributor Author

I thought I would just deal with memory issues for now, it's quite possible we could unite the neighbor generating algorithm into one lean function and not have many similar implementations.

Contributor

@mfeurer mfeurer Jan 4, 2023

Yes, I agree. This should be a separate PR. I will leave this open to remember to open an issue on this when this PR is done.

@eddiebergman
Contributor Author

eddiebergman commented Jan 4, 2023

I fixed the compiler directives so that they are actually active (cython_directives -> compiler_directives), and it seems I need to convert the center_range generator to Cython to handle big ints on Windows. I'm not really sure why it's failing, but my best guess, based on the directives, is that wraparound is now explicitly set to False, where by default it is set to True.

Contributor

@mfeurer mfeurer left a comment

This now looks great and I'd be happy to merge it. Unfortunately, it's a bit slower generating neighbors for the auto-sklearn search space. Do you think you could re-add the cython annotations to make this fast again?

@mfeurer
Contributor

mfeurer commented Jan 5, 2023

RE Windows/wraparound: you could change the flag for that one file?

Comment on lines 41 to 43
```python
# OPTIM: To prevent large memory allocations, we place an upperbound on the maximum
# amount of neighbors to be sampled
MAX_NEIGHBORHOOD = 10_000
```
Contributor Author

There was an issue here where the number of requested neighbors for a UniformIntegerHyperparameter was set to the full possible range. This would, firstly, blow up memory by trying to generate every possible neighbor and, secondly, waste work, as only a fraction of these neighbors were ever used.

This constant can be seen in use later on in this PR.
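As a tiny illustration of how the cap is meant to be used (the helper name here is hypothetical; only `MAX_NEIGHBORHOOD` comes from the snippet above):

```python
MAX_NEIGHBORHOOD = 10_000  # the upper bound introduced above

def capped_neighbor_count(requested: int) -> int:
    """Clamp a requested neighbor count so that huge integer ranges
    (e.g. the full int32 range) never allocate every possible neighbor."""
    return min(requested, MAX_NEIGHBORHOOD)
```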

Comment on lines 188 to 196
```diff
 neighbors = finite_neighbors_stack.get(hp_name, [])
 if len(neighbors) == 0:
+    if isinstance(hp, UniformIntegerHyperparameter):
+        _n_neighbors = min(n_neighbors_per_hp[hp_name], MAX_NEIGHBORHOOD)
     neighbors = hp.get_neighbors(
-        value, random,
-        number=n_neighbors_per_hp[hp_name], std=stdev,
+        value,
+        random,
+        number=_n_neighbors,
+        std=stdev,
```
Contributor Author

There's a line further below this block, neighbors.pop(). After fixing an issue with UniformIntegerHyperparameter::get_num_neighbors, this caused the benchmark to fail. There was some parameter with only 3 possible neighbors which must have been sampled a few times, causing it to run out of neighbors during this procedure.

Now it will just generate a new set of neighbors if the previous set has been exhausted by all of the neighbors.pop() calls.
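The regenerate-on-exhaustion behaviour described here can be sketched with a hypothetical helper (`pop_neighbor` and its arguments are illustrative names, not the PR's actual code):

```python
def pop_neighbor(stack, generate):
    """Pop a neighbor from `stack`, refilling it with a freshly
    generated batch whenever it has run dry."""
    if not stack:
        stack.extend(generate())
    return stack.pop()
```

The caller keeps popping; only when the stack is empty is the (potentially expensive) neighbor generation invoked again.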

Comment on lines +1558 to +1563
```python
# If there is a value in the range, then that value is not a neighbor of itself
# so we need to remove one
if value is not None and self.lower <= value <= self.upper:
    return self.upper - self.lower - 1
else:
    return self.upper - self.lower
```
Contributor Author

I wasn't sure how to handle this, but I thought it should act similarly to Categorical, in that the value would count as a neighbor. However, Categorical just handles this implicitly, regardless of the value passed in. Here, I've said that any value in the range cannot be a neighbor of itself.

@eddiebergman
Contributor Author

eddiebergman commented Jan 7, 2023

See above for some more small bits and pieces.

As for the timings, they seem to be back to their original values for me:


fix_memory_leak:
Average time sampling 100 configurations 0.024013697199999972
Average time retrieving a nearest neighbor 0.003898305980000023
Average time checking one configuration 0.0001229973144186039

master:
Average time sampling 100 configurations 0.023249030113220215
Average time retrieving a nearest neighbor 0.003536670207977295
Average time checking one configuration 0.00012289423312202225

On average, fix_memory_leak seems about 0.0003 slower but their distributions overlap for me locally.

> RE Windows/wraparound: you could change the flag for that one file?

Let's talk about this on Monday, but the short answer is that I'm not sure how to handle this properly, because it appears an int on mac/linux is an int64 (long long) while on Windows it's an int32. There were no checks for anything of this kind before, and cython wasn't setting the compiler directives properly with wraparound (so it was silently wrapping without throwing an error).

tl;dr: I think this is a pre-existing error for large integers on Windows that was silently eaten.
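The platform difference is easy to demonstrate: the C `long` that Cython's int handling maps to is 32-bit on Windows but 64-bit on Linux/macOS, so values past 2**31 - 1 wrap there unless an explicit 64-bit type is used. A quick sketch (assuming numpy is the vehicle, as in the rest of the codebase):

```python
import numpy as np

# int32 cannot represent 2**31; an explicit 64-bit dtype is safe on
# every platform, regardless of what the native C `long` is.
safe = np.array([2**31], dtype=np.int64)

# np.iinfo documents the bound that gets exceeded on Windows:
int32_max = np.iinfo(np.int32).max
```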

Update:
With the truncnorm version

truncnorm:
Average time sampling 100 configurations 0.023650693893432616
Average time retrieving a nearest neighbor 0.004139573574066162
Average time checking one configuration 0.00012306449949279311

@mfeurer mfeurer merged commit c63ed28 into main Jan 11, 2023
@mfeurer mfeurer deleted the fix_memory_leak branch January 11, 2023 12:11
github-actions bot pushed a commit that referenced this pull request Jan 11, 2023
nchristensen pushed a commit to nchristensen/ConfigSpace that referenced this pull request Jan 31, 2023
* test: Add reproducing test

* fix: Make sampling neighbors from uniform Int stable

* fix: Memory leak with UniformIntegerHyperparameter

When querying a large range for a UniformIntegerHyperparameter with a
small std.deviation and log scale, this could cause an infinite loop as
the reachable neighbors would be quickly exhausted, yet rejection
sampling will continue sampling until some arbitrary termination
criterion. Why this was causing a memory leak, I'm not entirely sure.

The solution now is that if we have seen a sampled value before, we
simply take the one "next to it".

* fix: Memory issues with Normal and Beta dists

Replaced usages of arange with a chunked version to prevent memory
blowup. However this is still incredibly slow and needs a more refined
solution as a huge amount of values are required to be computed for what
can possibly be analytically derived.

* chore: Update flake8

* fix: flake8 version compatible with Python 3.7

* fix: Name generators properly

* fix: Test numbers

* doc: typo fixes

* perf: Generate all possible neighbors at once

* test: Add test for center_range and arange_chunked

* perf: Call transform on np vector from rvs

* perf: Use numpy `.astype(int)` instead of `int`

* doc: Document how to get flamegraphs for optimizing

* fix: Allow for negatives in arange_chunked again

* fix: Change build back to raw Extensions

* build: Properly set compiler_directives

* ci: Update makefile with helpful commands

* ci: Fix docs to install build

* perf: cython optimizations

* perf: Fix possible memory leak with UniformIntegerHyperparam

* fix: Duplicates as `list` instead of set

* fix: Convert to `long long` vector

* perf: Revert clip to truncnorm

This truncnorm has some slight overhead due to however
scipy generates its truncnorm distribution, however this
overhead is considered worth it for the sake of readability
and understanding

* test: Test values not match implementation

* Intermediate commit

* Intermediate commit 2

* Update neighborhood generation for UniformIntegerHyperparameter

* Update tests

* Make the benchmark sampling script more robust

* Revert small change in util function

* Improve readability

Co-authored-by: Matthias Feurer <[email protected]>


Development

Successfully merging this pull request may close these issues.

Memory Leak

3 participants