Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std/hashes: hash(ref|ptr|pointer) + other improvements #17731

Merged
merged 5 commits into from
Apr 16, 2021

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Apr 15, 2021

Yeah, I agree. It's a strange omission that ref doesn't have a default hash implementation

  • changes a behavior that was introduced all the way back in 405b860
function hashPtr(p: Pointer): THash;
begin
  result := ({@cast}THash(p)) shr 3; // skip the alignment
end;

and doesn't make sense to me (1 of the tests in this PR would fail with this rule)

  • make hash(pointer) use the same scrambling as for hash(int) (ie, honors nimIntHash1) so that code expecting scrambled hashes works with pointer (and ref|ptr) too, without performance drop, ie treats all those types that can be cast to int in an identical way; note that I still consider hashing collision: improve performance using linear/pseudorandom probing mix #13440 the superior, more performant alternative (see benchmarks) but that's a separate issue and can be addressed in future work when i revisit it)
  • improve hash(closure); it was using hash(rawProc(x)) !& hash(rawEnv(x)) which violates hash API's by not finalizing it with !$ (ie, can give bad distributions); instead I'm now using hash((rawProc(x), rawEnv(x))) which "does the right thing"
  • workaround for the issue mentioned in add std/syntaxes to hint syntax highlighters about string literals #17722 (comment) that makes the rendering of std/hashes terrible because of asm """p=Data;"""
  • test ttables.nim in js + cpp (in addition to c)
  • refactor disableVm

future work

  • support VM for hash(ref|ptr|pointer)

@timotheecour timotheecour added the TODO: followup needed remove tag once fixed or tracked elsewhere label Apr 16, 2021
@timotheecour timotheecour marked this pull request as ready for review April 16, 2021 07:45
@Araq Araq merged commit d19e431 into nim-lang:devel Apr 16, 2021
@timotheecour timotheecour deleted the pr_hashes_ref branch April 16, 2021 17:00
@arnetheduck
Copy link
Contributor

how does this not break all code that defines a hash(ref T) on their own?

@timotheecour
Copy link
Member Author

timotheecour commented Apr 19, 2021

important_packages was green, and you can overload hash(ref T) in your code as shown in runnableExamples in this PR.

Note that any change (including pure bugfixes) is a potential breaking change for someone under some circumstance, but this shouldn't be a reason to freeze compiler/stdlib improvements.

In future work (pending #12076 + another PR), you'll be able to selectively disable an overload

@arnetheduck
Copy link
Contributor

important_packages was green, and you can overload hash(ref T) in your code as shown in runnableExamples in this PR.

this is hardly a signal, ie the nim language has many nooks and crannies in which bugs hide and symbol resolution is an unusually common one because the lookup rules - we've been here before, all too often - in the best case, this breaks something at compile time, in the worst case, it will break access to hash tables depending on which imports have been made and the mutual compatibility of the std lib declared hash function and the user-declared one.

potential breaking change

this is why instead of taking a flippant approach to breaking user code, it's worth asking oneself about the mechanics of how a code change in such a central part as hash tables might break due to a change or why it will not break, and if it potentially will break things, notify users how they should work around the issue - it's also fine if you don't know or understand the language well enough to even determine these mechanics, but in this case it's even more important to ask the question before pulling the trigger. We have a large codebase that depends on these features and finding out about breaking changes after the fact has been a real problem for numerous releases and bugfix releases. If this were a fringe library, it wouldn't matter, but it isn't, really.

selectively disable

this does not help previously working code - it also doesn't help footguns such as slightly incompatible definitions of hash being used in different modules do to a lack of imports, or worse, the order of imports, both of which have presented problems in the past.

@Araq
Copy link
Member

Araq commented Apr 20, 2021

this is hardly a signal, ie the nim language has many nooks and crannies in which bugs hide and symbol resolution is an unusually common one because the lookup rules

Yeah, it's the worst part of Nim IMHO. Time to bring up an RFC and attack the problem. We need concepts and we need to attach procs to types.

@Araq
Copy link
Member

Araq commented Apr 21, 2021

@timotheecour since the RFC that would sort out these problems has not even been written yet (sorry), this extension must be opt-in first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO: followup needed remove tag once fixed or tracked elsewhere
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants