Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build results vary from parallelism #9

Open
bmwiedemann opened this issue Aug 2, 2023 · 20 comments
Open

build results vary from parallelism #9

bmwiedemann opened this issue Aug 2, 2023 · 20 comments

Comments

@bmwiedemann
Copy link

While working on reproducible builds for openSUSE, I found that
our ocaml-camlp-streams-5.0.1 package varied randomly unless we build with -j1 or with taskset 1 dune or in a 1-core-VM.
This indicates that a race-condition happens.

To reproduce:

cd ~/rpmbuild/BUILD/ocaml-camlp-streams-5.0.1/ && for i in $(seq 100); do rm -rf _build ; "dune" "build" "--verbose" "--for-release-of-packages=camlp-streams" "-j4" "@install" >/dev/null 2>&1 ; md5sum _build/default/.camlp_streams.objs/byte/genlex.cmti ; done |sort|uniq -c
     87 a5a708508e5844516c0252cbe4061dad  _build/default/.camlp_streams.objs/byte/genlex.cmti
     13 fcd764146f8d5cb31dccafdcf4dc77c1  _build/default/.camlp_streams.objs/byte/genlex.cmti

On another machine, I got a 99:1 distribution, so it might be tricky to reproduce.

stream.cmti is another output file that varies.

Please make build output deterministic.

@bmwiedemann
Copy link
Author

diff looks thus:

--- old /usr/lib64/ocaml/camlp-streams/genlex.cmti (hex)
+++ new /usr/lib64/ocaml/camlp-streams/genlex.cmti (hex)
@@ -1,33 +1,33 @@
 00000380  90 30 8f 8f 63 45 58 79  8e e4 08 df 3c 50 a5 53  |.0..cEXy....<P.S|
 00000390  9b 15 40 84 95 a6 be 00  00 00 04 00 00 00 02 00  |..@.............|
 000003a0  00 00 05 00 00 00 05 a0  90 40 40 43 61 6d 6c 31  |.........@@Caml1|
-000003b0  39 39 39 54 30 33 31 84  95 a6 be 00 00 0e f5 00  |999T031.........|
-000003c0  00 02 f0 00 00 0b c3 00  00 0b 52 08 00 00 3c 00  |..........R...<.|
+000003b0  39 39 39 54 30 33 31 84  95 a6 be 00 00 0e ec 00  |999T031.........|
+000003c0  00 02 ed 00 00 0b bb 00  00 0b 4b 08 00 00 3c 00  |..........K...<.|
 000003d0  26 47 65 6e 6c 65 78 92  b0 a0 b0 9b c0 d0 94 d0  |&Genlex.........|
 000003e0  91 b0 a0 b0 9c c0 d0 c4  d0 a0 a1 90 92 26 53 74  |.............&St|
 000003f0  64 6c 69 62 26 47 65 6e  6c 65 78 a0 90 04 03 b0  |dlib&Genlex.....|
 00000400  c0 2a 67 65 6e 6c 65 78  2e 6d 6c 69 41 40 66 c0  |.*genlex.mliA@f.|
 00000410  04 02 41 40 6c 40 b0 04  04 04 02 40 93 04 0d 08  |..A@l@.....@....|
 00000420  00 00 30 00 a0 40 40 04  01 04 01 04 01 04 01 04  |..0..@@.........|
-00000430  01 04 01 04 01 40 ab a7  b1 b2 b2 b2 b2 b2 b2 b2  |.....@..........|
-00000440  b2 b2 b2 b2 b2 b1 b1 b1  b1 b1 b1 b1 b1 b1 b1 b1  |................|
-00000450  b1 b1 b1 b1 b1 40 a3 25  61 72 72 61 79 48 08 00  |.....@.%arrayH..|
-00000460  00 38 00 a0 c0 90 40 02  05 f5 e1 00 40 4d 40 41  |.8....@.....@M@A|
-00000470  40 41 40 a0 00 7f 40 a0  40 40 40 40 b0 c0 26 5f  |@A@...@.@@@@..&_|
-00000480  6e 6f 6e 65 5f 40 40 00  ff 04 02 41 40 40 40 92  |none_@@....A@@@.|
-00000490  04 0b a3 24 62 6f 6f 6c  45 08 00 00 38 00 40 40  |...$boolE...8.@@|
-000004a0  a1 a0 e0 a3 25 66 61 6c  73 65 5e 90 40 40 04 0d  |....%false^.@@..|
-000004b0  40 92 04 03 a0 e0 a3 24  74 72 75 65 5f 90 40 40  |@......$true_.@@|
-000004c0  04 13 40 92 04 03 40 40  41 40 40 40 40 40 04 14  |..@...@@A@@@@@..|
-000004d0  40 41 40 92 04 10 a3 24  63 68 61 72 42 08 00 00  |@A@....$charB...|
-000004e0  38 00 40 40 40 41 40 40  40 40 40 04 18 40 41 40  |8.@@@A@@@@@..@A@|
-000004f0  92 04 03 a3 23 65 78 6e  47 08 00 00 38 00 40 40  |....#exnG...8.@@|
-00000500  41 41 40 40 40 40 40 04  1c 40 40 40 92 04 03 a3  |AA@@@@@..@@@....|
-00000510  35 65 78 74 65 6e 73 69  6f 6e 5f 63 6f 6e 73 74  |5extension_const|
-00000520  72 75 63 74 6f 72 50 08  00 00 38 00 40 40 40 41  |ructorP...8.@@@A|
-00000530  40 40 40 40 40 04 20 40  40 40 92 04 03 a3 25 66  |@@@@@. @@@....%f|
-00000540  6c 6f 61 74 44 08 00 00  38 00 40 40 40 41 40 40  |loatD...8.@@@A@@|
-00000550  40 40 40 04 24 40 40 40  92 04 03 a3 2a 66 6c 6f  |@@@.$@@@....*flo|
-00000560  61 74 61 72 72 61 79 51  08 00 00 38 00 40 40 40  |atarrayQ...8.@@@|
-00000570  41 40 40 40 40 40 04 28  40 40 40 92 04 03 a3 23  |A@@@@@.(@@@....#|
+00000430  01 04 01 04 01 40 a7 b1  b2 b2 b2 b2 b2 b2 b2 b2  |.....@..........|
+00000440  b2 b2 b2 b2 b1 b1 b1 b1  b1 b1 b1 b1 b1 b1 b1 b1  |................|
+00000450  b1 b1 b1 b1 40 a3 25 61  72 72 61 79 48 08 00 00  |....@.%arrayH...|
+00000460  38 00 a0 c0 90 40 02 05  f5 e1 00 40 4d 40 41 40  |8....@.....@M@A@|
+00000470  41 40 a0 00 7f 40 a0 40  40 40 40 b0 c0 26 5f 6e  |A@...@.@@@@..&_n|
+00000480  6f 6e 65 5f 40 40 00 ff  04 02 41 40 40 40 92 04  |one_@@....A@@@..|
+00000490  0b a3 24 62 6f 6f 6c 45  08 00 00 38 00 40 40 a1  |..$boolE...8.@@.|
+000004a0  a0 e0 a3 25 66 61 6c 73  65 5e 90 40 40 04 0d 40  |...%false^.@@..@|
+000004b0  92 04 03 a0 e0 a3 24 74  72 75 65 5f 90 40 40 04  |......$true_.@@.|
+000004c0  13 40 92 04 03 40 40 41  40 40 40 40 40 04 14 40  |.@...@@A@@@@@..@|
+000004d0  41 40 92 04 10 a3 24 63  68 61 72 42 08 00 00 38  |A@....$charB...8|
+000004e0  00 40 40 40 41 40 40 40  40 40 04 18 40 41 40 92  |.@@@A@@@@@..@A@.|
+000004f0  04 03 a3 23 65 78 6e 47  08 00 00 38 00 40 40 41  |...#exnG...8.@@A|
+00000500  41 40 40 40 40 40 04 1c  40 40 40 92 04 03 a3 35  |A@@@@@..@@@....5|
+00000510  65 78 74 65 6e 73 69 6f  6e 5f 63 6f 6e 73 74 72  |extension_constr|
+00000520  75 63 74 6f 72 50 08 00  00 38 00 40 40 40 41 40  |uctorP...8.@@@A@|
+00000530  40 40 40 40 04 20 40 40  40 92 04 03 a3 25 66 6c  |@@@@. @@@....%fl|
+00000540  6f 61 74 44 08 00 00 38  00 40 40 40 41 40 40 40  |oatD...8.@@@A@@@|
+00000550  40 40 04 24 40 40 40 92  04 03 a3 2a 66 6c 6f 61  |@@.$@@@....*floa|
+00000560  74 61 72 72 61 79 51 08  00 00 38 00 40 40 40 41  |tarrayQ...8.@@@A|
+00000570  40 40 40 40 40 04 28 40  40 40 92 04 03 a3 23 69  |@@@@@.(@@@....#i|

@stedolan
Copy link

stedolan commented Aug 7, 2023

Which versions of OCaml and Dune are you using, and can you provide the two different cmti files?

@bmwiedemann
Copy link
Author

We have
ocaml-4.14.1
ocaml-dune-3.9.1

I now attached the cmti files: genlex.cmti.zip

@bmwiedemann
Copy link
Author

Ping. Do you have any updates?

@xavierleroy
Copy link
Contributor

I have the impression this is a Dune issue, and submitted it: ocaml/dune#9152 .

This is such a trivial project (2 small OCaml files...) that we could force a sequential build. But then we could just as well get rid of Dune and have a simple shell script to build everything.

@nojb
Copy link
Contributor

nojb commented Nov 13, 2023

For what is worth, I tried to reproduce, but was not successful (even increasing the number of iterations to 400).

@bmwiedemann
Copy link
Author

Strange. For some reason I cannot reproduce it anymore with the original reproducer, but with

for i in $(seq 100); do rm -rf _build ; "dune" "build" "--verbose" "--for-release-of-packages=camlp-streams" "-j4" "@install" >/dev/null 2>&1 ; md5sum _build/default/.camlp_streams.objs/byte/stream.cmti ; done |sort|uniq -c

@nojb
Copy link
Contributor

nojb commented Nov 15, 2023

I was able to reproduce with the Makefile below (to be put in an empty directory). I used the following command to test, with a loaded machine (running a separate large parallel build at the same time). I could then get failures consistently:

$ for i in $(seq 500); do make clean ; make -j4 >/dev/null ; md5sum stream.cmti ; done |sort|uniq -c
    492 d560ae703a76c49614c1308d160d7303  stream.cmti
      8 e0fb1e4741b43ff8473a7b8ec067d58a  stream.cmti

Makefile:

CAMLC = ocamlc.opt
COMPFLAGS = -w -40 -w -3 -g -bin-annot -no-alias-deps

camlp_streams.cma: stream.cmo genlex.cmo
        $(CAMLC) -a -o $@ stream.cmo genlex.cmo

stream.ml:
        echo 'include Stream' > $@

stream.mli:
        echo 'include module type of struct include Stream end' > $@

genlex.ml:
        echo 'include Genlex' > $@

genlex.mli:
        echo 'include module type of struct include Genlex end' > $@

genlex.cmi: genlex.mli
        $(CAMLC) $(COMPFLAGS) -c genlex.mli

stream.cmi: stream.mli
        $(CAMLC) $(COMPFLAGS) -c stream.mli

genlex.cmo: genlex.cmi genlex.ml
        $(CAMLC) $(COMPFLAGS) -c genlex.ml

stream.cmo: stream.cmi stream.ml
        $(CAMLC) $(COMPFLAGS) -c stream.ml

.PHONY: clean
clean:
        @rm -rf *.cm*

@bmwiedemann
Copy link
Author

https://rb.zq1.de/other/ocaml/ocaml-camlp-streams-provenance.dot.svg has the auto-generated view on how stream.cmti is created twice

@ejgallego
Copy link

@bmwiedemann I'm afraid I cannot understand that graph, how come ocamlc stream.mli is called twice?

@nojb
Copy link
Contributor

nojb commented Nov 15, 2023

Wild guess: the interface of Genlex (in the standard library) refers to Stream. Could it be that that reference means two different things depending on whether stream.cmi has already been built (in the current directory, so taking precedence over the standard library version) or not? Note that there is no ocamldep-visible dependency between Genlex and Stream, so these would be built in parallel...

@bmwiedemann
Copy link
Author

how come ocamlc stream.mli is called twice?

To me it looks as if we called dune build twice from the %build section in the .spec file, but I cannot find where that would be coming from.

Another run produced https://rb.zq1.de/other/ocaml/ocaml-camlp-streams-provenance1b.dot.svg without the double-write.

I did notice that stream.cmti is not an explicit output but a side-output of the creation of steam.cmi . Is that file really used at runtime?

@ejgallego
Copy link

@bmwiedemann indeed if you called dune build twice, since a while, dune won't allow this to happen in parallel. That graph seems strange to me.

Second graph looks normal (one call for build one for install)

IMHO @nojb theory seems like a prime candidate for the problem.

@xavierleroy
Copy link
Contributor

I did notice that stream.cmti is not an explicit output but a side-output of the creation of steam.cmi . Is that file really used at runtime?

It's a compact, compiled form of the textual interface stream.cmi, used by editors and editor services like Merlin. It doesn't participate in compilation and linking.

@xavierleroy
Copy link
Contributor

Wild guess: the interface of Genlex (in the standard library) refers to Stream. Could it be that that reference means two different things depending on whether stream.cmi has already been built (in the current directory, so taking precedence over the standard library version) or not?

Quite possibly! This should eventually lead to checksum errors ("...make inconsistent assumptions on...") but these appear only at link-time in OCaml 4.

Note that there is no ocamldep-visible dependency between Genlex and Stream, so these would be built in parallel...

Sounds dangerous! What happens if you add that dependency to your Makefile?

@nojb
Copy link
Contributor

nojb commented Nov 15, 2023

Note that there is no ocamldep-visible dependency between Genlex and Stream, so these would be built in parallel...

Sounds dangerous! What happens if you add that dependency to your Makefile?

The irreproducibility seems to go away.

Here is a "synthethic" example of what (I think) is going on, where genlex.cmti is different depending on whether stream.cmi has been built in the current directory or not:

$ echo 'include module type of struct include Stream end' > stream.mli
$ echo 'include module type of struct include Genlex end' > genlex.mli
$ ocamlc -w -3 -bin-annot -c genlex.mli
$ md5sum genlex.cmti
41a34aaea74766ef28350968a959f894  genlex.cmti
$ ocamlc -w -3 -bin-annot -c stream.mli
$ ocamlc -w -3 -bin-annot -c genlex.mli
$ md5sum genlex.cmti
85cfabb6f480fbb01458713539d4bb4e  genlex.cmti

@bmwiedemann
Copy link
Author

That would also explain why there are only ever 2 different results (1 bit of randomness).

@stedolan
Copy link

I dug into @bmwiedemann's example genlex.cmti files a bit. They differ in the environment attached to a Tmod_constraint node in the typedtree: one has environment E and the other has Env_persistent (E, Ident.create_persistent "Stream").

Env_persistent arises only from the following logic in env.ml:

    let material =
      (* This addition only observably changes the environment if it shadows a
         non-persistent module already in the environment.
         (See PR#9345) *)
      match
        IdTbl.find_name wrap_module ~mark:false (Ident.name id) env.modules
      with
      | exception Not_found | _, Mod_persistent -> false
      | _ -> true
    in
    let summary =
      if material then Env_persistent (env.summary, id)
      else env.summary
    in

It looks like the value of material is changing depending on whether stream.cmi has been built. I am surprised by this: I think the presence or absence of stream.cmi should affect whether find_name reports Not_found or finds a Mod_persistent module, but both of these yield material = false. To get material = true and hence an Env_persistent node, we'd need to have a Stream module in the environment which is not Mod_persistent, but I don't know where this might come from.

Perhaps this is something to do with the shadowing of Stream?

@nojb
Copy link
Contributor

nojb commented Nov 15, 2023

but I don't know where this might come from.

Could it be the one coming from the standard library, Stdlib.Stream?

@stedolan
Copy link

Hmm, maybe material=true without Mod_persistent in this case because the Stream module in the environment is not the Stream module from stream.mli, but the Stream module alias from stdlib.mli (which points to the same thing, but maybe does not have the Mod_persistent flag set).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants