Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New test failure in the Zeitwerk test suite #2431

Open
fxn opened this issue Aug 27, 2021 · 16 comments
Open

New test failure in the Zeitwerk test suite #2431

fxn opened this issue Aug 27, 2021 · 16 comments

Comments

@fxn
Copy link

fxn commented Aug 27, 2021

Hi!

The suite of Zeitwerk has this test to check autoloading blocks context switching (autoloading is thread-safe):

test "constant definition is synchronized" do
  files = [["m.rb", <<-EOS]]
    module M
      sleep 0.5

      def self.works?
        true
      end
    end
  EOS
  with_setup(files) do
    t = Thread.new { M }
    assert M.works?
    t.join
  end
end

It has passed regularly for a long time, but it started to fail in a flaky way both in truffleruby and in truffleruby-head. See for example this build, or this build.

Does it ring a bell?

fxn added a commit to fxn/zeitwerk that referenced this issue Aug 27, 2021
Like to have CI green. Lately, TruffleRuby has been flaky
in the concurrency test. Don't know why, opened an issue

    oracle/truffleruby#2431

When we figure it out, I'll restore it.
@fxn
Copy link
Author

fxn commented Aug 27, 2021

This is a shell script that essentially does the same as the flaky test:

#!/bin/sh

cat <<EOS > m.rb
module M
  sleep 0.5

  def self.works?
    true
  end
end
EOS

ruby -I. <<EOS
autoload :M, "m"
t = Thread.new { M }
p M.works?
EOS

rm m.rb

If it passes, you see "true" printed. Otherwise, M does not respond to works? (NoMethodError) because there was a context switch before the require triggered by autoload returned.

@fxn
Copy link
Author

fxn commented Aug 27, 2021

With CRuby, that passes even with sleep 60.

@eregon
Copy link
Member

eregon commented Aug 28, 2021

Thanks a lot for the report.
This is an area where the semantics of CRuby seems unclear, e.g. does it make other threads wait until the end of module M or the end of the file until the constant M is "published"? What happens if the autoloaded file creates threads, do they see M?

In short this is not yet implemented in TruffleRuby (IIRC), because of unclear semantics.
Currently there is a lock on who wins to autoload a constant, but currently no lock for "wait some other thread has finished autoloading", since the constant is published as soon as assigned.
Preventing thread switching itself seems difficult on JVM, and would feel like a hack at best + potential for deadlocks.
I guess we should look at what CRuby does here but the autoload logic in CRuby seems particularly unreadable.

Re Zeitwerk CI, could you exclude this test for TruffleRuby and keep running other tests? If it passed before I would think it was just lucky timings.

@eregon
Copy link
Member

eregon commented Aug 28, 2021

BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?, otherwise it's likely the M.works? runs first and then the other thread will see the constant is autoloading and wait, and anyway that thread does not check what is defined on M. For the test to fail it needs to be the Thread.new running first and defining the constant but not yet the method, before the main thread keeps running and call the method.

@eregon eregon self-assigned this Aug 28, 2021
@fxn
Copy link
Author

fxn commented Aug 28, 2021

Problem is, this is a fundamental property of Zeitwerk, which is a fundamental property of CRuby: You won't see a class half-defined due to context switching while autoloading. Meaning, what the test does (of course, a class may be reopened elsewhere later, not in a generic sense).

BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?

You're right.

@fxn
Copy link
Author

fxn commented Aug 28, 2021

Test revised here: fxn/zeitwerk@016404c. Thanks for the comment about that :).

@fxn
Copy link
Author

fxn commented Sep 24, 2021

Hey! I have documented this in the README. I believe it is accurate, but let me know if it should word it in a different way :).

@eregon
Copy link
Member

eregon commented Sep 24, 2021

That looks accurate, I'll try to fix this issue soon.

jamieblynn added a commit to jamieblynn/zeitwerk that referenced this issue Jan 29, 2022
Like to have CI green. Lately, TruffleRuby has been flaky
in the concurrency test. Don't know why, opened an issue

    oracle/truffleruby#2431

When we figure it out, I'll restore it.
@eregon
Copy link
Member

eregon commented May 10, 2022

Some more details about how CRuby handles this: ruby/ruby#5788
And: (from @ioquatix)

No, it's literally cached, there is an autoload cache
Instead of setting the constant on the module, it's stashed into an autoload save area
once the autoload is done, all the constants are added to the module and all waiting threads are unblocked

@ioquatix
Copy link
Contributor

ioquatix commented May 10, 2022

I'm slowly figuring out how it works, I guess it's not a bad implementation, but I think variable.c could do with a rewrite. lol. ruby/ruby#5898 fyi

@eregon
Copy link
Member

eregon commented May 10, 2022

FWIW, I found https://bugs.ruby-lang.org/issues/15663#change-77214 where I asked some autoload questions and which makes it somewhat clear it's so complex nobody understands it fully.
I think it shouldn't be too bad to implement in TruffleRuby because we already have a lock per autoload constant.
But potentially deadlocks is an issue to consider, i.e., what if one thread is loading A and that needs B, and another thread is loading B and that needs A? Deadlock or the "isolation" of autoload is broken in that case?

@fxn
Copy link
Author

fxn commented May 10, 2022

I am not familiar with the implementation, but isn't require already synchronized to avoid seeing partially evaluated files elsewhere? Why aren't these autoloads inheriting or relying on this property?

@eregon
Copy link
Member

eregon commented May 10, 2022

@fxn require only synchronizes per file, i.e., to guarantee a file is only loaded once (that's easy enough).
That doesn't prevent that e.g. class C defined in c.rb is seen partially defined (e.g. some methods defined, some not, only part of c.rb was evaluated) by file use.rb being loaded in another thread and e.g. doing C.new.foo -> potential NoMethodError.

autoload makes it possible to know some of those dependencies, but nested autoloads cannot be known in advance.
I think I'll need to write a test script to figure what CRuby does there and if/how it deals with those potential deadlocks.

The general strategy for autoload seems to not publish a constant's value while the file is being loaded (so other threads cannot see it's set and continue executing, they have to block and wait), and instead the autoload logic stashes the value somewhere so only that thread can see it, and then publish the constant at once when the file has been loaded.
Not sure about other constants defined in the file, I'd think there is no special behavior there.

@fxn
Copy link
Author

fxn commented May 10, 2022

@eregon But if user.rb uses C, it should require 'c' before at some point in the code path, no?

@fxn
Copy link
Author

fxn commented May 10, 2022

Something that could be relevant in @ioquatix foobar.rb example (just a remark, I don't know from the top of my head) is the constant lookup that happens in module Foo; end may trigger the autoload for Foo, depending on load order. Same for the constant lookup that happens in module Bar; end. So, one of them is trying to require a file that is being required by the other one simultaneously.

@fxn
Copy link
Author

fxn commented May 10, 2022

So, let me repeat the question.

The example sets autoloads for two constants, pointing to the same file. Since that is synchronized, why aren't the autoloads?

I'd expect the second one to trigger a require that blocks, waits, and when it returns, the constant is defined and all continues.

This argument is incorrect, as shown by the tests, but in what sense?

connorchris831 added a commit to connorchris831/zeitwerk that referenced this issue Jan 5, 2023
Like to have CI green. Lately, TruffleRuby has been flaky
in the concurrency test. Don't know why, opened an issue

    oracle/truffleruby#2431

When we figure it out, I'll restore it.
tigran25 added a commit to tigran25/zeitwerk that referenced this issue Mar 25, 2023
Like to have CI green. Lately, TruffleRuby has been flaky
in the concurrency test. Don't know why, opened an issue

    oracle/truffleruby#2431

When we figure it out, I'll restore it.
graalvmbot pushed a commit that referenced this issue May 31, 2023
* We stash the value of the constant being autoloaded in AutoloadConstant
  and only publish it once the autoload ends.
* Fixes #2431
  and #3040
graalvmbot pushed a commit that referenced this issue Jun 9, 2023
* We stash the value of the constant being autoloaded in AutoloadConstant
  and only publish it once the autoload ends.
* Fixes #2431
  and #3040
graalvmbot pushed a commit that referenced this issue Jun 12, 2023
* We stash the value of the constant being autoloaded in AutoloadConstant
  and only publish it once the autoload ends.
* Fixes #2431
  and #3040
@eregon eregon removed this from the 24.0.0 Release (March 19, 2024) milestone Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants