Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavioral inconsistency of strptime between emscripten and glibc/musl #21024

Open
gudzpoz opened this issue Jan 6, 2024 · 0 comments · May be fixed by #22158
Open

Behavioral inconsistency of strptime between emscripten and glibc/musl #21024

gudzpoz opened this issue Jan 6, 2024 · 0 comments · May be fixed by #22158

Comments

@gudzpoz
Copy link

gudzpoz commented Jan 6, 2024

In short, strptime("2012/2/30", "%Y/%m/%d", &date) yields 2012/3/1 in emscripten while gcc a.c and musl-gcc a.c both give 2012/2/30. (Possibly UB though.)

Version of emscripten/emsdk:

$ emcc -v       
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.51-git (c0c2ca1314672a25699846b4663701bcb6f69cca)
clang version 18.0.0git (/startdir/llvm-project f2464ca317bfeeedddb7cbdea3c2c8ec487890bb)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /opt/emscripten-llvm/bin

Failing command line in full:

// a.c
#include <stdio.h>
#include <time.h>

int main() {
  struct tm date;
  strptime("2012/2/30", "%Y/%m/%d", &date);
  printf("%d/%d/%d\n", date.tm_year + 1900, date.tm_mon + 1, date.tm_mday);
  return 0;
}
$ emcc a.c
$ node a.out.js
2012/3/1

In comparison, gcc a.c and musl-gcc a.c both give 2012/2/30.

Spec:
The POSIX spec does not seem to specify what to do when it encounters an invalid number that does not fit into the range, so this is possibly not a but but an undefined behavior instead? (But even if it is a UB, it really helps if emscripten follows the behavior of glibc/musl.)

@gudzpoz gudzpoz changed the title Behavioral Inconsistency of strptime between emscripten and glibc/musl Behavioral inconsistency of strptime between emscripten and glibc/musl Jan 6, 2024
sbc100 added a commit to sbc100/emscripten that referenced this issue Jul 3, 2024
This seems to save on code size compared the JS version as well as
making our Wasm files more standalone.  I measured the combined size of
`core2.test_strptime_days` (with closure compiler enabled) and got the
following results:

- before: 16777 bytes
- after: 17235 bytes

As part of this I updated musl's implementation to support several
glibc features that out current tests depend on:

- Add support for '%z' for parsing timezone offsets.
- Add support for '%U' and '%W' (week of the year)
- Derive and set tm_yday/tm_mday/tm_yday fields where possible.

Regarding '%U' and '%W' it seems this is missing from both musl and
bionic:

- https://github.com/aosp-mirror/platform_bionic/blob/4cfb2bde90fcf0a1f835e2b670b00e9f226c1c7c/libc/tzcode/strptime.c#L364-L375
- https://github.com/bminor/musl/blob/f314e133929b6379eccc632bef32eaebb66a7335/src/time/strptime.c#L126-L132

However, I implemented here (based on code from FreeBSD) in order to
avoid any regressions (since we do have test coverage for these flags).

Fixes: emscripten-core#21024
@sbc100 sbc100 linked a pull request Jul 3, 2024 that will close this issue
sbc100 added a commit to sbc100/emscripten that referenced this issue Jul 3, 2024
This seems to save on code size compared the JS version as well as
making our Wasm files more standalone.  I measured the combined size of
`core2.test_strptime_days` (with closure compiler enabled) and got the
following results:

- before: 16777 bytes
- after: 17235 bytes

As part of this I updated musl's implementation to support several
glibc features that out current tests depend on:

- Add support for '%z' for parsing timezone offsets.
- Add support for '%U' and '%W' (week of the year)
- Derive and set tm_yday/tm_mday/tm_yday fields where possible.

Regarding '%U' and '%W' it seems this is missing from both musl and
bionic:

- https://github.com/aosp-mirror/platform_bionic/blob/4cfb2bde90fcf0a1f835e2b670b00e9f226c1c7c/libc/tzcode/strptime.c#L364-L375
- https://github.com/bminor/musl/blob/f314e133929b6379eccc632bef32eaebb66a7335/src/time/strptime.c#L126-L132

However, I implemented here (based on code from FreeBSD) in order to
avoid any regressions (since we do have test coverage for these flags).

Fixes: emscripten-core#21024
sbc100 added a commit to sbc100/emscripten that referenced this issue Jul 3, 2024
This is a slight regression in code size compared the JS version, but
makes the wasm file more standalone and portable.   I measured the
combined size of `core2.test_strptime_days` (with closure compiler
enabled) and got the following results, which is a 2% regression:

- before: 16777 bytes
- after: 17235 bytes

As part of this I updated musl's implementation to support several
features that out current tests depend on:

- Add support for '%z' for parsing timezone offsets.
- Add support for '%U' and '%W' (week of the year)
- Derive and set tm_yday/tm_mday/tm_yday fields where possible.

Regarding '%U' and '%W' it seems this is missing from both musl and
bionic:

- https://github.com/aosp-mirror/platform_bionic/blob/4cfb2bde90fcf0a1f835e2b670b00e9f226c1c7c/libc/tzcode/strptime.c#L364-L375
- https://github.com/bminor/musl/blob/f314e133929b6379eccc632bef32eaebb66a7335/src/time/strptime.c#L126-L132

However, I implemented here (based on code from FreeBSD) in order to
avoid any regressions (since we do have test coverage for these flags).

Fixes: emscripten-core#21024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant