Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Term's "get_type()" implementation #581

Merged
merged 5 commits into from
Feb 13, 2024

Conversation

philss
Copy link
Member

@philss philss commented Dec 29, 2023

It was not returning the correct type for integers, probably due to a "garbage" arriving because of the wrong type.
This affects the latest v0.30.

@philss
Copy link
Member Author

philss commented Dec 29, 2023

Looks like this is not working well on Windows. I'm going to investigate tomorrow.

@evnu evnu requested a review from filmor December 29, 2023 07:19
rustler/src/dynamic.rs Show resolved Hide resolved
rustler_sys/build.rs Outdated Show resolved Hide resolved
@filmor
Copy link
Member

filmor commented Dec 29, 2023

Hah, even found out who introduced the bug: #212 🕵️‍♂️

I can't debug the Windows issue for the next week or so since I won't have a Windows machine available.

@philss
Copy link
Member Author

philss commented Jan 1, 2024

I can't debug the Windows issue for the next week or so since I won't have a Windows machine available.

Thanks! I'm trying to setup a Windows VM so I can debug this as well :)

@philss
Copy link
Member Author

philss commented Jan 5, 2024

Just to give an update: I've tested and was able to reproduce the build problem on Windows.
Unfortunately I couldn't yet figure out what is going on - it just return "unknown" for any term there :/
I should go back to this next week.

@filmor
Copy link
Member

filmor commented Jan 5, 2024

Yeah, I'll be able to look into it this weekend as well. My initial guess would be a mismatched base type for the enum, Windows longs are always 32 bit.

@filmor filmor mentioned this pull request Jan 9, 2024
4 tasks
@filmor
Copy link
Member

filmor commented Jan 20, 2024

@philss Did you get any further? I couldn't reproduce it so far, but I'll try again tomorrow.

@philss
Copy link
Member Author

philss commented Jan 20, 2024

@filmor unfortunately I couldn't :/

I tried to figure out where the int type is wrong, but there is no explicit declaration of it.
For a comparison, I did write a small NIF in C and it works good on Windows. So the problem seems to be on our side.
I will give another try this week as well.

@philss philss force-pushed the ps-fix-get_term_type-for-integers branch from d61e4bc to 2c7fa51 Compare January 29, 2024 19:48
@philss
Copy link
Member Author

philss commented Feb 2, 2024

@filmor I tried a bunch of things to fix this problem with the Windows build, but I couldn't find a solution.
I suspect the problem is in the bindings we create in the rustler_sys create for the NIF API.

I also tried to setup rust-bindgen on my windows VM, but without success. The idea was to generate a file that would follow more closely what is expected from the erl_nif.h file.

I think, given that we cannot make it work on Windows easily, we could drop this feature for that OS, documenting this behaviour. WDYT?

@josevalim
Copy link
Contributor

Given it never worked, perhaps we should just nuke it altogether. :)

@filmor
Copy link
Member

filmor commented Feb 7, 2024

Well, it works fine on Unix and is IMO the preferable approach (single function call vs. potentially ~10). I'll try to get the Windows interface in line (I think this is mostly about distinguishing floats from integers).

/edit: Also, this is clearly a potentially nasty bug (e.g. a type confusion) that I would really like to clear up.

@josevalim
Copy link
Contributor

It is a single function call but internally it is 10 bitmap comparisons. If you want to know if something is a float or integer, asking specifically for float and integer would be more efficient.

Also note that nor Erlang nor Elixir have a similar function and we have been just fine for decades. Given it is broken in main, odds are that we really don’t need it.

that said, if you can make it work, then that’s great. I am just saying that, if it doesn’t work, it is not the end of the world. :)

@filmor filmor force-pushed the ps-fix-get_term_type-for-integers branch 3 times, most recently from 39ba3af to 65bb94c Compare February 10, 2024 23:02
philss and others added 3 commits February 11, 2024 00:04
It was not returning the correct type for integers, probably due
to a "garbage" arriving because of the wrong type.
@filmor filmor force-pushed the ps-fix-get_term_type-for-integers branch from 65bb94c to 81e2f30 Compare February 10, 2024 23:05
@filmor
Copy link
Member

filmor commented Feb 10, 2024

@philss @evnu This finally works and passes all tests. It's really not pretty to essentially "copy" this gnarly code for is_float. We could still do as Jose suggests and drop the whole new code path, but since it works fine on non-Windows (which is likely the primary userbase anyhow) and we now have a workaround.

Notes:

  • I will still have to test this manually on 32bit Windows as that is not part of our CI at all and the is_float codepath is totally different (maybe enif_term_type works there as well?)
  • There is apparently a bug in OTP24, in that is_fun doesn't work for remote funs (fun module:function/0 or in Elixir &Function.identity/1). I don't intend to debug this further, it works for "function objects".

@filmor filmor requested a review from evnu February 11, 2024 00:35
@josevalim
Copy link
Contributor

For what is worth, I don't think we should export a function that does not work on Windows as it may become an additional pitfall for both developers and Windows users. When a package using Rustler misbehaves on Windows, it may lead to both users and authors spent time debugging only to land here.

If you really want to go down that route, can it be an opt-in feature? So it is not available by default, reducing the chance of accidental use. Or, make sure that it is not available on Windows (so it is not even compiled) or that it raises a clear error message?

@filmor
Copy link
Member

filmor commented Feb 11, 2024

Which export are you now talking about? enif_term_type is in rustler-sys, which is like all-sys crates "there be dragons".

What exactly would you make optional here?

My intention is to eventually get rid of the custom is_float again and use enif_term_type everywhere. The way it's now implemented, everything works the same way everywhere, apart from checking remote funs on OTP24, but that is a bug in a version of OTP that we are going to stop supporting in a few months (when 27 comes out) anyhow.

@josevalim
Copy link
Contributor

My comment was to make enif_term_type optional. :) However, I was assuming that it still does not work on Windows (while the C NIF does). Given you said:

The way it's now implemented, everything works the same way everywhere, apart from checking remote funs on OTP24

If it all works on Windows too, then just ignore me. :) But it may be that using enif_get_float and attempting to get a float may be more efficient than the several masks from enif_term_type.

@filmor
Copy link
Member

filmor commented Feb 11, 2024

  • On all non-Windows systems with OTP>=22 we use enif_term_type for the get_type function
  • On Windows, we use the old way with an additional hacky-copied version of the non-exposed is_float (which I will probably replace with an enif_get_float)
  • The behavior is now identical, the implementation is not (yet)
  • The bug with remote funs on OTP24 is preexisting ant not something I intend to follow up on apart from writing it down
  • I think the get_type will be useful for serializations, if benchmarks show that that's slower than chaining the enif_is_*, I'd consider that a bug in OTP since that is accoding to the docs why enif_term_type was introduced

Copy link
Member Author

@philss philss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! For me this is good enough for now 🚢

@filmor thank you for the efforts here!

rustler/src/dynamic.rs Outdated Show resolved Hide resolved
OTP24 apparently does not correctly recognize remote funs
(`fun mod:func/1`) in `is_fun`.
@filmor filmor force-pushed the ps-fix-get_term_type-for-integers branch from 6624826 to 63ec23c Compare February 13, 2024 07:57
@filmor filmor merged commit 88c6e15 into master Feb 13, 2024
7 checks passed
@filmor filmor deleted the ps-fix-get_term_type-for-integers branch February 13, 2024 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants