Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite loop in windows memory manager #12248

Merged
merged 1 commit into from
Jul 21, 2015
Merged

Conversation

yuyichao
Copy link
Contributor

I'll do more testing on various commit later (haven't actually even tested this compiles yet). Currently debugging with the most reliable failure from #11569 (comment).

When it fails, this function is called with ActualSize == 0 and somehow JMM->startFunctionBody returns exactly the same size as requested. This breaks the retry logic which is doubling the size every time (because doubling a 0 is getting no where...)

@tkelman Would be nice if you can do a few tests locally.

@Keno @vtjnash I didn't run blame but would be nice if you could check if this makes sense.

Close #7942

@yuyichao yuyichao added system:windows Affects only Windows compiler:codegen Generation of LLVM IR and native code labels Jul 21, 2015
@yuyichao yuyichao added this to the 0.4.0 milestone Jul 21, 2015
@yuyichao
Copy link
Contributor Author

The commit that introduce this is 524e305 and 75b43c0 changes the size to the current one. Both commits are pre-0.3.0. @vtjnash

@yuyichao
Copy link
Contributor Author

OT question, why is pstree / pgrep not showing the correct PID on windows?....

@yuyichao
Copy link
Contributor Author

OK, it compiles and cherry picking this commit on top of 56342be fixes the bootstrap issue...

@pao
Copy link
Member

pao commented Jul 21, 2015

crosses fingers

@yuyichao yuyichao changed the title Try to fix infinite loop in windows memory manager Fix infinite loop in windows memory manager Jul 21, 2015
@vtjnash
Copy link
Member

vtjnash commented Jul 21, 2015

lgtm. thanks for solving this!

@yuyichao
Copy link
Contributor Author

Assuming this is the reason we have JULIA_CPU_CORES=1 on AppVeyor Win64 and the test for the first commit is happily running, I just pushed a second commit to enable multi-process on Win64 and see what happens.

@JeffBezanson
Copy link
Member

Many thanks, this badly needed fixing.

@yuyichao
Copy link
Contributor Author

Anything wrong with Travis Mac ? There doesn't seem to be a active build running right now..

@yuyichao
Copy link
Contributor Author

@tkelman It can't possibly be that my PR that fixes the Windows freeze is the first commit that hit a upstream compiler flag issue, or is it........... T-T

AppVeyor command compiling codegen.cpp for the first commit

AppVeyor command compiling codegen.cpp for the second commit

@vtjnash
Copy link
Member

vtjnash commented Jul 21, 2015

that happens when something goes wrong with running llvm-config

@yuyichao
Copy link
Contributor Author

that happens when something goes wrong with running llvm-config

And IIRC last time it was because sth changed on OBS...

@yuyichao
Copy link
Contributor Author

I don't understand why removing JULIA_CPU_CORES=1 make compilation fail on AppVeyor win 64 (caching issue??). The first commit passes the test (although there's seems to be something wrong with Travis OSX) so I removed the second commit from this PR and will use a separate PR to see what's wrong with it...

Merge now. Let's see if we still see any AppVeyor Win64 freeze in the next few days. Hopefully we can be free of crying Tony for some time...

yuyichao added a commit that referenced this pull request Jul 21, 2015
Fix infinite loop in windows memory manager
@yuyichao yuyichao merged commit 604609e into master Jul 21, 2015
@yuyichao yuyichao deleted the yyc/appveyor-freeze branch July 21, 2015 19:33
@tkelman
Copy link
Contributor

tkelman commented Jul 21, 2015

GCC 5.2.0 came out recently so there's a nonzero chance the ABI broke again. I'll look into it, but man I can't believe this was a 2-line change. If this is it, you're my freaking hero. Should also probably backport this?

@vtjnash
Copy link
Member

vtjnash commented Jul 21, 2015

yep, looks like that's one of the new features (https://gcc.gnu.org/gcc-5/changes.html#libstdcxx)

@yuyichao
Copy link
Contributor Author

@tkelman I opened a new PR for the parallel test.

I'll look into it, but man I can't believe this was a 2-line change.

At least it is coherent with everything I've seen so far.

If this is it, you're my freaking hero.

Not sure about this but @carnaval predict that you will be dancing on the table. =)

yuyichao added a commit that referenced this pull request Jul 21, 2015
@tkelman
Copy link
Contributor

tkelman commented Jul 22, 2015

I dunno if @timholy ever got his rbp < c->dwarf.cfa tattoo but I guess now I'll have to get
if (ActualSize == 0) ActualSize += 64;

@yuyichao
Copy link
Contributor Author

Feel free to replace 64 with your favorate number. ;-p

@timholy
Copy link
Member

timholy commented Jul 22, 2015

I dunno if @timholy ever got his rbp < c->dwarf.cfa tattoo

I did, but it was a temporary so it got garbage collected 😄

@yuyichao
Copy link
Contributor Author

So there hasn't been any Win64 freeze on AppVeyor for the past two days. Comparing to 8 freezes per day on average for the 4 days before that, I think it's safe to say that this patch has fixed the issue.

That said, there is a new LLVM assertion on AppVeyor Win32. It happens twice for the past two days

https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.7131/job/5445bhruvj90382g
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.7137/job/xbajrvhjr0k33csk

I have never seen these before although I wasn't checking every build and the Win64 freezing background may also make it hard to see before.

@tkelman Have you seen these before?
Or maybe there's sth inherently bad about @mbauman and Win32 since both of these are your PR. ;-p

@mbauman
Copy link
Member

mbauman commented Jul 23, 2015

I think it's fairly well-established that I'm doing everything in my power to break Julia on Windows.

@tkelman
Copy link
Contributor

tkelman commented Jul 23, 2015

Yep I'm pretty sure you resolved the long-standing freeze #7942 (which @mbauman and a few others had a knack for getting to trigger earlier in bootstrap as opposed to at startup), but ever exciting, that win32 assertion failure looks new and not present on master yet. I can try to build the branch for #12273 locally if it would help, though looks like there are more changes pending there.

@yuyichao
Copy link
Contributor Author

FWIW, (and you probably already know) I restarted #12270 twice and the assertion only happens once (which is not too surprising since we have multi process testing). Would be nice to have a backtrace...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code system:windows Affects only Windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants