cygwin DLL address conflict with version 5.41.6 #22695
Comments
The DLL load addresses are generated by the linker based on the DLL names, and with 5.41.6 we're getting a conflict between cygperl5_41_6.dll and Langinfo.dll.

As a workaround, statically link Langinfo into cygperl for CI and mention the problem in perldelta for anyone else who might build perl on cygwin.

Fixes but should not close Perl#22695
Cygwin's fork emulation doesn't handle overlapping addresses between different DLLs: it tries to lay out the address space of the child process to match the parent process, but if there's an address conflict between DLLs, Windows may load those DLLs at different addresses.

To avoid having to manually assign addresses to each DLL, since around 5.10 we've used --enable-auto-image-base to assign load addresses for cygperl*.dll and the dynamic extension DLLs. This has mostly worked well, but as perl has gotten larger and cygperl*.dll has grown, we've had two cases where the address space for cygperl*.dll overlaps that of some extension DLL; see Perl#22695 and Perl#22104.

This problem occurs because:

- cygperl*.dll is large, and with -DDEBUGGING or some other option that increases binary size, even larger, occupying more than one of the "slots" that the automatic image base code in ld can assign the DLL to.
- unlike the extension DLLs, the name of cygperl*.dll changes with every release, so we roll the dice each release on whether there will be a conflict between cygperl*.dll and some other DLL.

Previously I've added an entry to perldelta and updated the CI workflow to work around the conflict; this change should prevent that particular conflict.

The addresses I've chosen here are "just" (for large values of "just") below the base address range used by automatic address space selection.

For 64-bit this was done by inspection, examining the output of "rebase -i" on the extension DLLs and looking at the source of ld, in particular:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/emultempl/pep.em;h=00c4ea9e15a765c29b15b621f53d6bfcb499e5ed;hb=HEAD#l144

Note cygwin builds set move_default_addr_high=1 if you read that code.
For 32-bit I just looked at the source:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/emultempl/pe.em;h=52f59b8b#l173

since I don't have a 32-bit cygwin install any more: cygwin no longer ships it, and it commonly had the fork address conflicts discussed above.

I would have liked to make the load address configurable via -Dcygperl_base or similar, but I didn't see a way to get the base address to pass from Configure through to Makefile.SH.

Fixes Perl#22695
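The "slots" reasoning in the commit message above can be illustrated with a toy model. This is only a sketch: the slot size, slot count, range base, and the CRC32 hash are all assumptions made up for illustration and are not the values or the hash that ld uses (the real logic is in the binutils source linked above). It shows why a DLL that spans more than one slot is more likely to collide, and why renaming the DLL each release moves it to a different slot.

```python
import zlib

# Everything below is hypothetical: a stand-in for ld's name-based scheme,
# not a reimplementation of it.
SLOT_SIZE = 0x400000        # assumed size of one slot
SLOT_COUNT = 1024           # assumed number of slots
RANGE_BASE = 0x400000000    # assumed start of the automatic range


def auto_base(name):
    """Map a DLL name to a slot-aligned base address (stand-in hash)."""
    slot = zlib.crc32(name.lower().encode("ascii")) % SLOT_COUNT
    return RANGE_BASE + slot * SLOT_SIZE


def slots_used(name, size):
    """Return the set of slot indices an image of `size` bytes occupies."""
    first = (auto_base(name) - RANGE_BASE) // SLOT_SIZE
    count = -(-size // SLOT_SIZE)  # ceiling division
    return set(range(first, first + count))


def conflicts(images):
    """images: dict of DLL name -> image size. Return pairs sharing a slot."""
    names = sorted(images)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if slots_used(a, images[a]) & slots_used(b, images[b]):
                found.append((a, b))
    return found
```

In this model an image one byte larger than `SLOT_SIZE` already occupies two slots, so every size increase widens the set of names it can collide with, and a new name (such as a new version number in cygperl*.dll) re-rolls the starting slot.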
As of Dec 26 2024, we're seeing new test failures on the |
That appears to be a separate problem, related to |
I suspect this is the problem which has a cygwin patch available; we just need to wait for a 3.5.6 release, which will probably be delayed due to the holidays.
FWIW, I tried a Cygwin blead perl build today. I cannot reproduce this bug ticket's 2 test fails. I'll requote them

So the bug is specific to Debug it without pushing to blead brand a commit with C intrinsic function MS

Supposedly GH Actions VMs aren't WWW air-gapped, see these posts from google.

https://discourse.julialang.org/t/non-deterministic-segfault-only-triggered-on-github-actions-tips-for-debugging/53367/3

This cygwin failure reminds me a lot of this bug, [rt.perl.org#88840]. I fixed it in d903973.

There was also this ticket, which at first was ID'ed and fixed as real not-ithread-compatible core interp mem corruption, but the ticket got wrongly tangled up with a fork.t race condition: [rt.perl.org#109718], searchable as RT109718$ #11927. This absurd race condition was at first git false-bisected to the real and fixed but irrelevant mem corruption bug introduced in 676a678.
Here are some terms I will use, Kernel == disk file

# 1 factor

I had 4 physical CPUs in a 4U HP Proliant DL585 G1, which had 4 physical Opteron 846 @ 2.0GHz. 2 cores per physical CPU, total of 8 cores for WinOS. I will bet the NT 5.2 kernel turns off the NUMA flag at boot time for all Multi-Core but One Chip X64 motherboards. Nobody has SMP Desktops/Towers >= 2010, not even gamers with unlimited budgets. But I am using Perl in 2013 on an ancient big iron 4U enterprise server WITH Win64 5.2.

# 2 factor

My HP Proliant DL585 had the slowest enterprise SCSI RAID controller ever built in the 21st century. Seek time was a horrible 65-80 milliseconds in RAID 0. I never figured out why. The HP Proliant BIOS did not allow booting from a PCI-X card. Choices were CdRom or HP SmartArray or floppy, no PCI-X cards, so I couldn't boot off my added SATA card. Looking back, I think it was because I got it used, and the HP BIOS yearly license keys were expired, so I was punished in a "shareware mode". The HP ILO hardware module was always nagging for $.

# 3

Kernel must be doing the thread pinning/process pinning optimization. I speculate; this is all of course reverse engineering, since you can't git-blame Windows, cough cough. My single stepping and reading oldnewthings/reactos repo/Russinovich/Alex Ionescu/conference talks by MS engineers make the guesses pretty accurate.

So the wonderful NT 5.2 kernel has a new feature: the C function call Sleep(0) (and select()/wait()) is physical-processor local only. Actually, every syscall Ring 3->0 is phy-CPU local. MS's new algo is: if the cur_exec KTHREAD syscalls to Ring 0 and wants to de-schedule/block, the Ring 3 C stack and Ring 3 CPU registers won't be detached from the logical CPU. No context switching! Instead each CPU core only iterates through its lock-free local linked list of KTHREADs looking for the READY_TO_RUN flag. If no READY_TO_RUN flags are found, Sleep(0) insta (50 ns?) returns.
Even if on another phy-CPU all COREs are maxed out 100%, and there is blocking happening and "performance killing" on a deeper and deeper

There is no global pool and no mutex/CRITICAL_SECTION anymore. No contention. But also the kernel WON'T move a process or KTHREAD away from the "pinned" favorite phy-CPU. Now we have pseudo lock inversion. At least until wall time hits the next multiple of 15 milliseconds. MS Public API contract for wall time update granularity and timeslice unit has been 15 ms from 1993 NT 3.1 to my NT 6.1 (7). maybe NT 10.0????

Suddenly the NT3.1-5.1 behavior of CreateProcess() CreateThread() always context switching and descheduling caller Ring3 KTHREAD (parent perl ithread), and an assumed mandatory (that's UB!!!) next 15 ms timeslot on brand new child THREAD/PROCESS obj thru priority boosting algorithm. CHANGED!

The race bugs can't be debugged in a C debugger without setting breakpoints in disassembler view, and using thread freeze/resume on BOTH THREADS at different times, and SINGLE STEPPING one thread at a time, until the "stars" align and the deadlock/crash/panic happens. Plus POSIX/Perl are fundamentally unstable/using un-inited variable, if it involves 64bit int 8 byte doubles/NVs are worse, 53 bit ints, but perl/win32/*.c has no NV usage.

The race condition bug came back recently and was fixed again!!!!

MS Public API contract for wall time update granularity and timeslice unit.

win32.c: rework the waitpid(-1, WNOHANG) fix

And in 2012 the race had many parts

fix over/underflow issues in win32_msgwait

Avoid race codition when setting process exit code on Windows.

I think these Perl, Windows, and "time IPC problems" race problems can't be repro-ed by any developer in a VM on any work/home machine, and can't be reproduced on a developer's day job's rackmount servers either. VM or no VM. I'll hypothesize the reason is that AWS/Azure/GCP are always pegged at 100% CPU usage "host" hypervisor wise.
ALL 640 cores per rackmount, every rackmount, is 100% CPU usage 24/7. No dev or company will run other people's workloads on at-home/on-campus hardware. Dynamic spot pricing, not-public compute time auctions, etc.: datacenters became airline tickets. Every second a server's CPU is idle is lost money, like empty airline seats after takeoff. Crypto mining, weather data, and malware analytics firms will always pay for 7 seconds TTFB no-guarantees.

System details for my blead CygPerl, that CAN NOT reproduce the https://github.com/Perl/perl5/actions/runs/12481530925/job/34915098507?pr=2287 and passes blead @ commit 8f5aa22 - Chad Granum - 12/23/2024 4:58:25 PM - cpan/Test-Simple - Update to version 1.302206

-V perl

I used Configure -D. `cygwin1.dll` calls itself 3004.9.0.0 / 3.4.10. OS Windows 7 Service Pack 1 X64 6.1.7601
The fork error bug is well understood; it had a workaround for CI and workaround documentation in perldelta in #22696, and a more permanent fix in #22853. You'll only see this error specifically with a perl with version 5.41.6, due to the way automatic base addresses are calculated.

The kill/signal bug is completely unrelated and should have been reported as a new ticket; the symptoms didn't match the original report. That kill/signal bug requires cygwin 3.5.5, and you don't say what you tested against. I reproduced it on my desktop:
cygwin have a patch for what looks like the signal issue, which I linked above; it's just waiting on people to get back from holidays and catch up.
Module:
Description
Similar to #22104 we're seeing address conflicts loading LangInfo.dll in CI:
And rebase shows the conflict:
And can be similarly reproduced:
I plan to prepare a fix and perldelta note PR.
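For anyone checking their own build, the kind of conflict described above amounts to two images whose address ranges intersect. A minimal sketch, assuming you have already extracted a base address and size for each DLL (for example by reading the rebase listing by hand); the function name and the sample numbers are invented for illustration and are not taken from the CI output:

```python
def find_overlaps(images):
    """images: iterable of (name, base, size) tuples with size > 0.

    Returns every pair of images whose [base, base + size) ranges intersect.
    """
    ordered = sorted(images, key=lambda im: im[1])
    overlaps = []
    for i, (name, base, size) in enumerate(ordered):
        end = base + size
        for other_name, other_base, _ in ordered[i + 1:]:
            if other_base >= end:
                break  # sorted by base, so nothing later can overlap
            overlaps.append((name, other_name))
    return overlaps


# Hypothetical values chosen only to demonstrate an overlap.
SAMPLE = [
    ("cygperl5_41_6.dll", 0x400000000, 0x500000),
    ("Langinfo.dll", 0x400400000, 0x20000),
    ("Other.dll", 0x410000000, 0x20000),
]
```

With the made-up `SAMPLE` data, `find_overlaps(SAMPLE)` reports the cygperl/Langinfo pair because the first image extends past the second image's base address.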
Steps to Reproduce
Build perl with -DDEBUGGING (there may or may not be a conflict without it).
Run ./perl -Ilib -MI18N::LangInfo -efork
Expected behavior
No error on fork.
Perl configuration