Skip to content

Fix hash @indices can grow larger than Int32::MAX bytes#15347

Merged
straight-shoota merged 6 commits intocrystal-lang:masterfrom
ysbaddaden:fix/hash-indices-size-can-grow-larger-than-int32
Jan 18, 2025
Merged

Fix hash @indices can grow larger than Int32::MAX bytes#15347
straight-shoota merged 6 commits intocrystal-lang:masterfrom
ysbaddaden:fix/hash-indices-size-can-grow-larger-than-int32

Conversation

@ysbaddaden
Copy link
Collaborator

@ysbaddaden ysbaddaden commented Jan 15, 2025

fixes #15341

See #15341 (comment) for details.

fixes crystal-lang#15341

Co-authored-by: Johannes Müller <straightshoota@gmail.com>
@ysbaddaden ysbaddaden added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection labels Jan 15, 2025
@ysbaddaden ysbaddaden self-assigned this Jan 15, 2025
@straight-shoota
Copy link
Member

straight-shoota commented Jan 15, 2025

We should add a spec for this. Something like Hash(String, String).new(initial_capacity: Int32::MAX) would do.

@ysbaddaden
Copy link
Collaborator Author

Oh, pw2ceil(Int32::MAX) becomes Int32::MAX + 1 and thus raises a math overflow (oops). We shall fix that too.

@ysbaddaden
Copy link
Collaborator Author

ysbaddaden commented Jan 16, 2025

The spec is painfully slow, wasting over 9s to allocate & initialize the memory 😕

  Hash creates with max capacity
    9.21 seconds spec/std/hash_spec.cr:1396

The overall std spec runtime is 45s on my computer. Adding 9s has a +20% impact 😰

It does allocate too much memory, which breaks on CI, is definitely too large on 32-bit targets, and is far too slow overall, even on fast hardware.
@straight-shoota
Copy link
Member

The spec is painfully slow, wasting over 9s to allocate & initialize the memory

Ouch that's a lot 🙈 Of course, we're trying to allocate 10GB total (8 for the indices, 2 for the entries).
Certainly we shouldn't run this by default. Maybe put it with the manual specs? (although that effectively means it'll almost never be tested 🤷).

What if we try to allocate just (Int32::MAX // 4) + 1 (allocates 2.5GB). This should validate fixing the original issue. It still takes 6 seconds, though (on my machine). So we still shouldn't run it on default. But at least it works doesn't require quite as much memory.

@ysbaddaden
Copy link
Collaborator Author

Where are the manual specs 🔍

@straight-shoota
Copy link
Member

Where are the manual specs 🔍

That's exactly what I meant with "it'll almost never be tested" 😆

👉 https://github.com/crystal-lang/crystal/tree/master/spec/manual

@ysbaddaden
Copy link
Collaborator Author

It actually allocates even more memory than that.

With Int32::MAX that's ~18GB:

  • @entries: 2147483647 bytes (~2GB)
  • @indices: 17179869176 bytes (~16GB)

With Int32::MAX // 4 + 1 that's 4.5GB:

  • @entries: 536870912 bytes (512MB)
  • @indices: 4294967296 bytes (4GB)

In fact I wonder if the indices are worth taking so much memory.

@straight-shoota
Copy link
Member

Oh, right. It's capacity * 2 * 4 for the indices malloc.

Yeah that's quite a lot for huge collections. Probably quite a lot for smaller ones, too?

@straight-shoota straight-shoota added this to the 1.16.0 milestone Jan 16, 2025
@ysbaddaden
Copy link
Collaborator Author

ysbaddaden commented Jan 16, 2025

Argh, I got the megabytes wrong for @entries I only reported the number of entries, the actual bytes are size * sizeof(Entry(K, V)), so 12 times more bytes per entry for Hash(Int32, Int32) 😨

@straight-shoota
Copy link
Member

One thing is quite certain then: It's pretty hard to understand the code and reason about the allocations 🙈

@crysbot
Copy link
Collaborator

crysbot commented Jan 16, 2025

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/goldbachs-conjecture-and-crystal-code/7579/27

@straight-shoota straight-shoota merged commit c4682db into crystal-lang:master Jan 18, 2025
69 of 70 checks passed
@ysbaddaden ysbaddaden deleted the fix/hash-indices-size-can-grow-larger-than-int32 branch January 18, 2025 15:17
@straight-shoota straight-shoota changed the title Fix: Hash @indices can grow larger than Int32::MAX bytes Fix hash @indices can grow larger than Int32::MAX bytes Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inexplicable arithmetic overflow ??

3 participants