
Conversation


@eddiebergman eddiebergman commented Dec 15, 2022

A core issue addressed was the following setup:

  1. get_neighbors with a small std. deviation.
  2. A small set of neighbors to choose from.

This caused rejection sampling to never find enough valid neighbors within the Gaussian it was sampling from.


The solution is to simply find the next closest neighbor that has not been sampled yet, iteratively checking around that number until one is found. This also respects the boundaries.

```python
neighbors = [3, 4, 5, 6]
sample = 5
# try 4, nope
# try 6, nope
# try 3, nope
# try 7, yup
neighbors.append(7)
```

I'm not sure if this is "correct" but I don't see another viable solution.
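The search described above can be sketched as a small generator. The PR does add a `center_range` helper (it shows up in the test commits below), but this is an assumed reimplementation for illustration, not the actual code:

```python
def center_range(center: int, lower: int, upper: int):
    """Yield integers spiraling outward from `center`, alternating
    below/above and respecting the [lower, upper] boundaries."""
    below, above = center - 1, center + 1
    while below >= lower or above <= upper:
        if below >= lower:
            yield below
            below -= 1
        if above <= upper:
            yield above
            above += 1
```

For the example above, searching around 5 within [3, 7] visits 4, 6, 3, 7, matching the order shown.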


Another issue came up, as seen in the comment below: the NormalIntegerHyperparameter (and, as later diagnosed, also the BetaIntegerHyperparameter) has _compute_normalization and get_max_density methods that require computing the pdf for every possible int value. This caused memory to blow up when the range of possible values was incredibly large, for example every possible int32 value.

To combat this, I implemented an arange_chunked which functions similarly to arange but yields sub-chunks. This works because the sum and max operations can be computed over partial chunks and do not require the full arange to be in memory at once.
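A minimal sketch of what such a chunked arange might look like (illustrative, not the exact implementation from the PR):

```python
import numpy as np

def arange_chunked(start: int, stop: int, chunk_size: int = 1_000_000):
    """Yield np.arange(start, stop) piecewise so the full range is
    never materialized in memory at once."""
    while start < stop:
        end = min(start + chunk_size, stop)
        yield np.arange(start, end)
        start = end
```

Streaming reductions then work chunk by chunk, e.g. `max(chunk.max() for chunk in arange_chunked(lo, hi))`.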

Calculating the pdf and max density over this entire range is still incredibly slow, and an analytical solution is likely possible since we are dealing with consecutive integers.

This is documented in #283
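To illustrate the kind of analytical shortcut hinted at above: because the values are consecutive integers, the sum of a Gaussian pdf over them is closely approximated by integrating the pdf over the range widened by half a step on each side (a midpoint-rule identity). This is a sketch of the idea only, not code from the PR:

```python
import math

def gaussian_pdf_sum(mu: float, sigma: float, lower: int, upper: int) -> float:
    """Brute force: sum the Gaussian pdf over every integer in [lower, upper]."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((k - mu) / sigma) ** 2)
               for k in range(lower, upper + 1))

def gaussian_pdf_sum_analytic(mu: float, sigma: float, lower: int, upper: int) -> float:
    """Analytic approximation: integrate the pdf over
    [lower - 0.5, upper + 0.5] via the Gaussian CDF (erf)."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(upper + 0.5) - cdf(lower - 0.5)
```

The analytic version is O(1) regardless of the range size, whereas the brute-force sum is exactly what blows up for int32-sized ranges.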


It's also quite difficult to work with this codebase, given that the .pyx files don't allow editors to be very smart (for example, jump-to-definition). Using plain text search is also frustrating because all classes share method names and live in one file. Just voicing again that converting this back to pure Python and splitting up the files would make working with ConfigSpace easier. Any performance issues that originally motivated the switch are likely solvable within Python itself, as numpy can do the heavy lifting in C.


eddiebergman commented Dec 15, 2022

```python
import ConfigSpace.hyperparameters as CSH
import numpy as np

rnd = np.random.RandomState(19937)

# This gets to the for loop before hanging
# a = CSH.UniformIntegerHyperparameter('a', lower=1, upper=2147483647, log=True)

# This hangs before the prints
a = CSH.NormalIntegerHyperparameter('a', mu=10, sigma=500, lower=1, upper=2147483647, log=True)

print(a, flush=True)
print(rnd, flush=True)

for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=8, std=0.05)
```

See #283 for the cause

@eddiebergman
Contributor Author

Possibly unrelated error. It doesn't cause memory to explode, but it gets stuck in an endless loop that can't be killed with a KeyboardInterrupt (Ctrl+c).

```python
import ConfigSpace.hyperparameters as CSH
import numpy as np

rnd = np.random.RandomState(19937)
a = CSH.NormalIntegerHyperparameter('a', mu=10, sigma=500, lower=1, upper=1000, log=True)
for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=8)
```

```
Exception ignored in: 'ConfigSpace.hyperparameters.NormalFloatHyperparameter._transform_scalar'
Traceback (most recent call last):
  File "/home/skantify/code/ConfigSpace/test_memory_leak.py", line 6, in <module>
    a.get_neighbors(0.031249126501512327, rnd, number=8)
OverflowError: math range error
OverflowError: Python int too large to convert to C long
```

@eddiebergman
Contributor Author

Back to the original memory overflow:

```python
number = 5  # slow but fine
for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=number, std=0.05)

number = 6  # Suddenly blows up memory and the process becomes unkillable
for i in range(1, 10000):
    a.get_neighbors(0.031249126501512327, rnd, number=number, std=0.05)
```

When querying a large range for a UniformIntegerHyperparameter with a
small std.deviation and log scale, this could cause an infinite loop as
the reachable neighbors would be quickly exhausted, yet rejection
sampling will continue sampling until some arbitrary termination
criterion. Why this was causing a memory leak, I'm not entirely sure.

The solution now is that if we have seen a sampled value before, we
simply take the one "next to it".
Replaced usages of arange with a chunked version to prevent memory
blowup. However this is still incredibly slow and needs a more refined
solution as a huge amount of values are required to be computed for what
can possibly be analytically derived.

codecov bot commented Jan 4, 2023

Codecov Report

Base: 67.64% // Head: 67.97% // Increases project coverage by +0.32% 🎉

Coverage data is based on head (1702342) compared to base (7f1ac3b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #282      +/-   ##
==========================================
+ Coverage   67.64%   67.97%   +0.32%     
==========================================
  Files          24       25       +1     
  Lines        1768     1786      +18     
==========================================
+ Hits         1196     1214      +18     
  Misses        572      572              
Impacted Files Coverage Δ
ConfigSpace/__init__.py 100.00% <100.00%> (ø)
ConfigSpace/functional.py 100.00% <100.00%> (ø)


@eddiebergman eddiebergman requested a review from mfeurer January 4, 2023 15:23
@eddiebergman eddiebergman mentioned this pull request Jan 4, 2023
```python
def get_num_neighbors(self, value=None) -> int:
    return self.upper - self.lower

def get_neighbors(
```
Contributor

I like this new implementation. It is very lean and we should try to use it for the other hyperparameters, too (without the rounding of course).

Contributor Author

I thought I would just deal with memory issues for now, it's quite possible we could unite the neighbor generating algorithm into one lean function and not have many similar implementations.

Contributor

@mfeurer mfeurer Jan 4, 2023

Yes, I agree. This should be a separate PR. I will leave this open to remember to open an issue on this when this PR is done.

@eddiebergman
Contributor Author

eddiebergman commented Jan 4, 2023

I fixed the compiler directives so that they are actually active (cython_directives -> compiler_directives), and it seems I need to convert the center_range generator to Cython to handle big ints on Windows. I'm not really sure why it's failing, but my best guess, based on the directives, is that wraparound is now explicitly set to False, where by default it is set to True.

Contributor

@mfeurer mfeurer left a comment

This now looks great and I'd be happy to merge it. Unfortunately, it's a bit slower generating neighbors for the auto-sklearn search space. Do you think you could re-add the cython annotations to make this fast again?

@mfeurer
Contributor

mfeurer commented Jan 5, 2023

RE Windows/wraparound: you could change the flag for that one file?

Comment on lines 41 to 43
```python
# OPTIM: To prevent large memory allocations, we place an upperbound on the maximum
# amount of neighbors to be sampled
MAX_NEIGHBORHOOD = 10_000
```
Contributor Author

There was an issue here where the number of requested neighbors for a UniformIntegerHyperparameter was set to the full possible range. This would, firstly, blow up memory by trying to generate every possible neighbor and, secondly, waste work, as only a fraction of these neighbors were ever used.

This constant can be seen in use later on in this PR.
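As a tiny illustration of how the cap is meant to be used (the helper name here is hypothetical; only `MAX_NEIGHBORHOOD` comes from the snippet above):

```python
MAX_NEIGHBORHOOD = 10_000  # the upper bound introduced above

def capped_neighbor_count(requested: int) -> int:
    """Clamp a requested neighbor count so that huge integer ranges
    (e.g. the full int32 range) never allocate every possible neighbor."""
    return min(requested, MAX_NEIGHBORHOOD)
```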

Comment on lines 188 to 196
```diff
 neighbors = finite_neighbors_stack.get(hp_name, [])
 if len(neighbors) == 0:
+    if isinstance(hp, UniformIntegerHyperparameter):
+        _n_neighbors = min(n_neighbors_per_hp[hp_name], MAX_NEIGHBORHOOD)
     neighbors = hp.get_neighbors(
-        value, random,
-        number=n_neighbors_per_hp[hp_name], std=stdev,
+        value,
+        random,
+        number=_n_neighbors,
+        std=stdev,
```
Contributor Author

There's a line further below this block, neighbors.pop(). After fixing an issue with UniformIntegerHyperparameter::get_num_neighbors, this caused the benchmark to fail. There was some parameter with only 3 possible neighbors which must have been sampled a few times, causing it to run out of neighbors during this procedure.

Now it will just generate a new set of neighbors if the previous set has been exhausted by all of the neighbors.pop() calls.
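The regenerate-on-exhaustion behaviour described here can be sketched with a hypothetical helper (`pop_neighbor` and its arguments are illustrative names, not the PR's actual code):

```python
def pop_neighbor(stack, generate):
    """Pop a neighbor from `stack`, refilling it with a freshly
    generated batch whenever it has run dry."""
    if not stack:
        stack.extend(generate())
    return stack.pop()
```

The caller keeps popping; only when the stack is empty is the (potentially expensive) neighbor generation invoked again.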

Comment on lines +1558 to +1563
```python
# If there is a value in the range, then that value is not a neighbor of itself
# so we need to remove one
if value is not None and self.lower <= value <= self.upper:
    return self.upper - self.lower - 1
else:
    return self.upper - self.lower
```
Contributor Author

I wasn't sure how to handle this, but I thought it should act similarly to Categorical, in that the value would count as a neighbor. However, Categorical just handles this implicitly, regardless of the value passed in. Here, I've said that any value in the range cannot be a neighbor of itself.

@eddiebergman
Contributor Author

eddiebergman commented Jan 7, 2023

See above for some more small bits and pieces.

As for the timings, they seem to be back to their original values for me:


fix_memory_leak:
Average time sampling 100 configurations 0.024013697199999972
Average time retrieving a nearest neighbor 0.003898305980000023
Average time checking one configuration 0.0001229973144186039

master:
Average time sampling 100 configurations 0.023249030113220215
Average time retrieving a nearest neighbor 0.003536670207977295
Average time checking one configuration 0.00012289423312202225

On average, fix_memory_leak seems about 0.0003 slower but their distributions overlap for me locally.

> RE Windows/wraparound: you could change the flag for that one file?

Let's talk about this on Monday, but the short answer is that I'm not sure how to handle this properly, because it appears an int on mac/linux is an int64 (long long) while on Windows it's an int32. There were no checks for anything of this kind before, and cython wasn't setting the compiler directives properly with wraparound (so it was silently wrapping without throwing an error).

tl;dr: I think this is a pre-existing error for large integers on Windows that was silently eaten.
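The platform difference is easy to demonstrate: the C `long` that Cython's int handling maps to is 32-bit on Windows but 64-bit on Linux/macOS, so values past 2**31 - 1 wrap there unless an explicit 64-bit type is used. A quick sketch (assuming numpy is the vehicle, as in the rest of the codebase):

```python
import numpy as np

# int32 cannot represent 2**31; an explicit 64-bit dtype is safe on
# every platform, regardless of what the native C `long` is.
safe = np.array([2**31], dtype=np.int64)

# np.iinfo documents the bound that gets exceeded on Windows:
int32_max = np.iinfo(np.int32).max
```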

Update:
With the truncnorm version

truncnorm:
Average time sampling 100 configurations 0.023650693893432616
Average time retrieving a nearest neighbor 0.004139573574066162
Average time checking one configuration 0.00012306449949279311

@mfeurer mfeurer merged commit c63ed28 into main Jan 11, 2023
@mfeurer mfeurer deleted the fix_memory_leak branch January 11, 2023 12:11
github-actions bot pushed a commit that referenced this pull request Jan 11, 2023
nchristensen pushed a commit to nchristensen/ConfigSpace that referenced this pull request Jan 31, 2023
* test: Add reproducing test

* fix: Make sampling neighbors from uniform Int stable

* fix: Memory leak with UniformIntegerHyperparameter

When querying a large range for a UniformIntegerHyperparameter with a
small std.deviation and log scale, this could cause an infinite loop as
the reachable neighbors would be quickly exhausted, yet rejection
sampling will continue sampling until some arbitrary termination
criterion. Why this was causing a memory leak, I'm not entirely sure.

The solution now is that if we have seen a sampled value before, we
simply take the one "next to it".

* fix: Memory issues with Normal and Beta dists

Replaced usages of arange with a chunked version to prevent memory
blowup. However this is still incredibly slow and needs a more refined
solution as a huge amount of values are required to be computed for what
can possibly be analytically derived.

* chore: Update flake8

* fix: flake8 version compatible with Python 3.7

* fix: Name generators properly

* fix: Test numbers

* doc: typo fixes

* perf: Generate all possible neighbors at once

* test: Add test for center_range and arange_chunked

* perf: Call transform on np vector from rvs

* perf: Use numpy `.astype(int)` instead of `int`

* doc: Document how to get flamegraphs for optimizing

* fix: Allow for negatives in arange_chunked again

* fix: Change build back to raw Extensions

* build: Properly set compiler_directives

* ci: Update makefile with helpful commands

* ci: Fix docs to install build

* perf: cython optimizations

* perf: Fix possible memory leak with UniformIntegerHyperparam

* fix: Duplicates as `list` instead of set

* fix: Convert to `long long` vector

* perf: Revert clip to truncnorm

This truncnorm has some slight overhead due to however
scipy generates its truncnorm distribution, however this
overhead is considered worth it for the sake of readability
and understanding

* test: Test values not match implementation

* Intermediate commit

* Intermediate commit 2

* Update neighborhood generation for UniformIntegerHyperparameter

* Update tests

* Make the benchmark sampling script more robust

* Revert small change in util function

* Improve readability

Co-authored-by: Matthias Feurer <[email protected]>


Development

Successfully merging this pull request may close these issues.

Memory Leak

3 participants