Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-0.18] Readd GLPK #1498

Merged
merged 1 commit into from
Oct 1, 2018
Merged

[release-0.18] Readd GLPK #1498

merged 1 commit into from
Oct 1, 2018

Conversation

blegat
Copy link
Member

@blegat blegat commented Sep 26, 2018

No description provided.

@blegat blegat added this to the 0.18.3 milestone Sep 26, 2018
@blegat
Copy link
Member Author

blegat commented Sep 26, 2018

The segfaults are back. One segfault on Julia v0.6 in test/print.jl and one segfault on Julia v1.0 in test/callback.jl...

@ExpandingMan
Copy link
Contributor

Can't exactly say I'm surprised. However, at this point I'm finding it increasingly odd that we are constantly getting screwed on travis but can't reproduce this locally. I have still yet to ever encounter a single segfault locally. Can you reproduce in this PR?

@tkoolen
Copy link
Contributor

tkoolen commented Sep 26, 2018

@mlubin
Copy link
Member

mlubin commented Sep 26, 2018

Does this happen with the pre-binarybuilder version of GLPK?

@ExpandingMan
Copy link
Contributor

As I recall it was occurring on pre-binarybuilder versions of Clp, but I suppose you can check whether things are different with GLPK.

@tkoolen
Copy link
Contributor

tkoolen commented Sep 26, 2018

So https://travis-ci.org/JuliaOpt/JuMP.jl/jobs/433523257#L309 points to a try/finally. There was a change in try/catch/finally in 0.7: https://github.com/JuliaLang/julia/blob/master/HISTORY.md:

try blocks without catch or finally are no longer allowed. An explicit empty catch block should be written instead (#27554).

There's no catch block, just a finally. Add an empty catch block?

@mlubin
Copy link
Member

mlubin commented Sep 26, 2018

Hmm. The try/finally usage on that line is suggested in the Julia 1.0 manual: https://docs.julialang.org/en/v1/manual/control-flow/#finally-Clauses-1.

@ExpandingMan
Copy link
Contributor

I think the statement @tkoolen is referring to refers to blocks which have only try such as

try
    # stuff here
end

@tkoolen
Copy link
Contributor

tkoolen commented Sep 26, 2018

Illegal instructions aren't supposed to happen in any case. This is a Julia bug no matter what. I think it's likely that the less-common try/finally case got messed up in a subtle way while making changes to try/catch/finally.

@tkoolen
Copy link
Contributor

tkoolen commented Sep 26, 2018

@JeffBezanson, is there a chance that this case isn't being handled correctly after JuliaLang/julia#27559? I don't know Scheme at all, so any help would be very much appreciated; definitely not meant to point any fingers. For context, the illegal instructions @mlubin linked on Discourse are blocking a JuMP release that supports Julia 1.0.

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

@tkoolen good spidey senses. I was able to reproduce the crash locally, and it's fixed by commenting out the try/finally/end labels (leaving the code in the middle). It proceeds to fail for expected reasons after that.

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

JuMP's tests passed on 1.0 with GLPK (only) after the workaround in #1498 (comment) plus #1499 and JuliaOpt/GLPKMathProgInterface.jl#54. Haven't tried with other solvers yet.

@ExpandingMan
Copy link
Contributor

Awesome! This is with full tests, nothing commented out?

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

This is with full tests, nothing commented out?

Correct, I didn't comment anything out (besides the try/finally/end in GLPK).

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

I reported the issue at JuliaLang/julia#29382.

@blegat
Copy link
Member Author

blegat commented Sep 27, 2018

Without any solvers, tests pass on Julia v0.6. When adding GLPK, tests fails at this line:
https://travis-ci.org/JuliaOpt/JuMP.jl/jobs/433523249#L825
That is, line 366 of print.jl. However, print.jl does not use solvers !
When adding SCS, it segfaults at exactly the same line:
https://travis-ci.org/JuliaOpt/JuMP.jl/jobs/433972228#L819
So weird...

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

print.jl is loaded after solvers.jl, so it could still be affected somehow. What if we remove try_import (https://github.com/JuliaOpt/JuMP.jl/blob/release-0.18/test/solvers.jl#L15) and explicitly import only GLPKMathProgInterface?

@blegat
Copy link
Member Author

blegat commented Sep 27, 2018

@mlubin @tkoolen Thanks for resolving the segfault on Julia v1.0, it was a nice surprise when waking up.
I have tried resolving the segfaults on Julia v0.6 by disabling crashing lines for Julia v0.6. I started doing this for this branch and the bl/scs branch adding SCS. I made some progress on the bl/scs branch, made print.Jl, operator.jl and model.jl work and arrived at a segfault on test/models.jl. I then disabled some tests and pushed and then it now fails on test/operator.jl without showing which lines triggers the segfault. How could disabling tests on a file that was not yet included trigger a segfault ?

@blegat
Copy link
Member Author

blegat commented Sep 27, 2018

What if we remove try_import (https://github.com/JuliaOpt/JuMP.jl/blob/release-0.18/test/solvers.jl#L15) and explicitly import only GLPKMathProgInterface?

Good idea, let's try that :)

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

I then disabled some tests and pushed and then it now fails on test/operator.jl without showing which lines triggers the segfault. How could disabling tests on a file that was not yet included trigger a segfault ?

I don't think chasing down the line where the segfault happens will be too useful given the existing evidence. It's likely that something went wrong earlier and is revealed at an arbitrary later point.

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

@blegat, have you tested with a non-binarybuilder solver?

@blegat
Copy link
Member Author

blegat commented Sep 27, 2018

c34fab4 didn't seem to have any effect, it still crashes in operators.jl even though the solvers aren't loaded yet at that line !

have you tested with a non-binarybuilder solver?

Let's try https://github.com/JuliaOpt/JuMP.jl/tree/bl/csdp

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

@blegat
Copy link
Member Author

blegat commented Sep 27, 2018

It does not seem related to BinaryBuilder. CSDP does not use BB and loading it make tests fail:
https://travis-ci.org/JuliaOpt/JuMP.jl/jobs/434121473

@blegat
Copy link
Member Author

blegat commented Sep 27, 2018

Debug mode is not enabled:
https://docs.travis-ci.com/user/running-build-in-debug-mode/

"For public repositories, we have to enable it on a repository basis.
To enable debug for your public repositories, please email us at [email protected] and let us know which repositories you want activated."

@mlubin
Copy link
Member

mlubin commented Sep 27, 2018

We have had significant changes in src/operators.jl since 0.18.2, so the issue could be somewhere there.

@tkoolen
Copy link
Contributor

tkoolen commented Sep 27, 2018

I believe the Travis people usually respond pretty quickly to debug mode requests if you shoot them an email.

@mlubin
Copy link
Member

mlubin commented Sep 28, 2018

I'm able to reproduce the segfault locally with pretty high probability. It was too noisy for git bisect, but a manual bisect points to #1432 as the culprit. Here's the relevant valgrind output:

==23867== Invalid read of size 1
==23867==    at 0x667C1A0: llvm::Value::getName() const (in /home/mlubin/julia/julia-9d11f62bcb/lib/julia/libLLVM-3.9.so)
==23867==    by 0x511C0FD: jl_compile_linfo (codegen.cpp:1292)
==23867==    by 0x510D976: emit_invoke (codegen.cpp:3400)
==23867==    by 0x510D976: emit_expr(_jl_value_t*, jl_codectx_t*) (codegen.cpp:4135)
==23867==    by 0x5118A22: emit_stmtpos (codegen.cpp:4064)
==23867==    by 0x5118A22: emit_function(_jl_method_instance_t*, _jl_code_info_t*, unsigned long, _jl_llvm_functions_t*, jl_cgparams_t const*) (codegen.cpp:6248)
==23867==    by 0x511BDB7: jl_compile_linfo (codegen.cpp:1256)
==23867==    by 0x510D976: emit_invoke (codegen.cpp:3400)
==23867==    by 0x510D976: emit_expr(_jl_value_t*, jl_codectx_t*) (codegen.cpp:4135)
==23867==    by 0x5118C5F: emit_function(_jl_method_instance_t*, _jl_code_info_t*, unsigned long, _jl_llvm_functions_t*, jl_cgparams_t const*) (codegen.cpp:6101)
==23867==    by 0x511BDB7: jl_compile_linfo (codegen.cpp:1256)
==23867==    by 0x508157C: jl_compile_for_dispatch (gf.c:1651)
==23867==    by 0x5082B2F: jl_compile_method_internal (julia_internal.h:307)
==23867==    by 0x5082B2F: jl_call_method_internal (julia_internal.h:354)
==23867==    by 0x5082B2F: jl_apply_generic (gf.c:1903)
==23867==    by 0x1247F412: ???
==23867==    by 0x50AE7D4: jl_call_fptr_internal (julia_internal.h:339)
==23867==    by 0x50AE7D4: jl_call_method_internal (julia_internal.h:358)
==23867==    by 0x50AE7D4: jl_toplevel_eval_flex (toplevel.c:589)
==23867==  Address 0x1f is not stack'd, malloc'd or (recently) free'd
==23867== 

signal (11): Segmentation fault
while loading /home/mlubin/.julia/v0.6/JuMP/test/print.jl, in expression starting on line 32
unknown function (ip: 0x667c19f)
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3400 [inlined]
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:4135
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:4064 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6248
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1256
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3400 [inlined]
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:4135
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6101
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1256
jl_compile_for_dispatch at /buildworker/worker/package_linux64/build/src/gf.c:1651
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:307 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:354 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1903
macro expansion at /home/mlubin/.julia/v0.6/JuMP/src/macros.jl:454 [inlined]
macro expansion at /home/mlubin/.julia/v0.6/JuMP/test/print.jl:367 [inlined]
macro expansion at ./test.jl:860 [inlined]
macro expansion at /home/mlubin/.julia/v0.6/JuMP/test/print.jl:341 [inlined]
macro expansion at ./test.jl:860 [inlined]
anonymous at ./<missing> (unknown line)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:589
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:873
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:616
include_from_node1 at ./loading.jl:576
unknown function (ip: 0x123f95e2)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1903
include at ./sysimg.jl:14
unknown function (ip: 0x9ab44eb)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1903
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:75
eval at /buildworker/worker/package_linux64/build/src/interpreter.c:242
jl_interpret_toplevel_expr at /buildworker/worker/package_linux64/build/src/interpreter.c:34
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:577
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:873
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:616
include_from_node1 at ./loading.jl:576
unknown function (ip: 0x9c29ecb)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1903
include at ./sysimg.jl:14
unknown function (ip: 0x9ab44eb)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1903
process_options at ./client.jl:305
_start at ./client.jl:371
unknown function (ip: 0x9c38af8)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1903
jl_apply at /buildworker/worker/package_linux64/build/ui/../src/julia.h:1424 [inlined]
true_main at /buildworker/worker/package_linux64/build/ui/repl.c:127
main at /buildworker/worker/package_linux64/build/ui/repl.c:264
__libc_start_main at /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
unknown function (ip: 0x4016bc)
Allocations: 17978787 (Pool: 17970497; Big: 8290); GC: 37
==23867== 
==23867== Process terminating with default action of signal 11 (SIGSEGV)
==23867==  Access not within mapped region at address 0x1F
==23867==    at 0x667C1A0: llvm::Value::getName() const (in /home/mlubin/julia/julia-9d11f62bcb/lib/julia/libLLVM-3.9.so)
==23867==    by 0x511C0FD: jl_compile_linfo (codegen.cpp:1292)
==23867==    by 0x510D976: emit_invoke (codegen.cpp:3400)
==23867==    by 0x510D976: emit_expr(_jl_value_t*, jl_codectx_t*) (codegen.cpp:4135)
==23867==    by 0x5118A22: emit_stmtpos (codegen.cpp:4064)
==23867==    by 0x5118A22: emit_function(_jl_method_instance_t*, _jl_code_info_t*, unsigned long, _jl_llvm_functions_t*, jl_cgparams_t const*) (codegen.cpp:6248)
==23867==    by 0x511BDB7: jl_compile_linfo (codegen.cpp:1256)
==23867==    by 0x510D976: emit_invoke (codegen.cpp:3400)
==23867==    by 0x510D976: emit_expr(_jl_value_t*, jl_codectx_t*) (codegen.cpp:4135)
==23867==    by 0x5118C5F: emit_function(_jl_method_instance_t*, _jl_code_info_t*, unsigned long, _jl_llvm_functions_t*, jl_cgparams_t const*) (codegen.cpp:6101)
==23867==    by 0x511BDB7: jl_compile_linfo (codegen.cpp:1256)
==23867==    by 0x508157C: jl_compile_for_dispatch (gf.c:1651)
==23867==    by 0x5082B2F: jl_compile_method_internal (julia_internal.h:307)
==23867==    by 0x5082B2F: jl_call_method_internal (julia_internal.h:354)
==23867==    by 0x5082B2F: jl_apply_generic (gf.c:1903)
==23867==    by 0x1247F412: ???
==23867==    by 0x50AE7D4: jl_call_fptr_internal (julia_internal.h:339)
==23867==    by 0x50AE7D4: jl_call_method_internal (julia_internal.h:358)
==23867==    by 0x50AE7D4: jl_toplevel_eval_flex (toplevel.c:589)

@blegat
Copy link
Member Author

blegat commented Sep 28, 2018

What is your technique to reproduce the bug since the travis status of this PR was green ?

@tkoolen
Copy link
Contributor

tkoolen commented Sep 28, 2018

So if this is happening on line 367 of test/print.jl (on commit ee27487), it seems like GLPK may not be directly involved in this one. Can this be reduced to an example that doesn't involve GLPK at all?

@blegat
Copy link
Member Author

blegat commented Sep 28, 2018

@tkoolen why do you say it is caused by ee27487 ? Have you bisected through the commits of #1432 ?

Can this be reduced to an example that doesn't involve GLPK at all?

The segfault does not occur when there is no solver but it is thrown whatever is the solver loaded. Whether it is GLPK, CSDP or SCS. Even when moving the line include("solvers.jl") after include("print.jl"), the segfault also occurs.

@mlubin
Copy link
Member

mlubin commented Sep 28, 2018

What is your technique to reproduce the bug since the travis status of this PR was green ?

Running release-0.18 on Julia 0.6.4 with at least one solver installed. I had Cbc and SCS present.

@tkoolen
Copy link
Contributor

tkoolen commented Sep 28, 2018

@tkoolen why do you say it is caused by ee27487 ? Have you bisected through the commits of #1432 ?

No, ee27487 is just the last commit in that PR, so I was under the impression that the stack trace corresponds to that commit. Since that particular test case doesn't involve GLPK,

https://github.com/JuliaOpt/JuMP.jl/blob/ee274875697b1487618a2acede72d7d7f0a33d5e/test/print.jl#L349

I thought maybe there could be a chance that the crash is independent of GLPK, but perhaps that the presence of GLPK increases the chances of it happening. On the other hand, it's possible that memory corruption has already happened earlier. In any case, I think a good strategy would be to reduce the test code as much as possible (starting from ee27487) and see when the crash stops happening semi-reliably. Might also want to build Julia in debug mode with assertions enabled.

@blegat
Copy link
Member Author

blegat commented Sep 28, 2018

Running release-0.18 on Julia 0.6.4 with at least one solver installed. I had Cbc and SCS present.

Doesn't work on Arch Linux :/ I have pushed a branch https://github.com/JuliaOpt/JuMP.jl/tree/bl/PR1432 which is the same as the branch of #1432 to do the bisect on travis.
My fear is that we will arrive at a commit that made the segfault more likely instead of a commit that causes it but let's see.
Let's try to remove commits one by one in this branch until the segfault disappear.

@mlubin
Copy link
Member

mlubin commented Sep 28, 2018

@blegat Make sure to try the official linux binaries. It may depend on the version of LLVM given that the crash is inside LLVM.

@blegat
Copy link
Member Author

blegat commented Sep 29, 2018

The Julia v0.6 segfault is resolved by 25a342d 🎉
Waiting for JuliaLang/METADATA.jl#18432 now

@blegat blegat force-pushed the bl/glpk branch 2 times, most recently from 20452dc to 1f61942 Compare September 30, 2018 21:09
@blegat blegat merged commit bea3776 into release-0.18 Oct 1, 2018
@blegat blegat deleted the bl/glpk branch October 16, 2018 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants