Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexOutOfBoundsException from ManagedStrategy.getConcurrentActorCausesBlocking #275

Closed
durban opened this issue Feb 16, 2024 · 11 comments
Closed
Assignees

Comments

@durban
Copy link

durban commented Feb 16, 2024

I think an IndexOutOfBoundsException might be somehow thrown by Lincheck. I see in the code, that stacktraces are filtered; maybe that hides the exact source. EDIT: It is indeed thrown by Linchek, see below.

Steps to reproduce (unfortunately I don't know how to minimize):

git clone https://github.com/durban/choam.git
cd choam
git checkout 703b8199
sbt "stressLinchk/testOnly *.TtrieModelTest"

The IndexOutOfBoundsException looks like this:

The number next to an exception name helps you find its stack trace provided after the interleaving section
---
Exception stack traces:
#1: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 3
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
        at java.base/java.util.Objects.checkIndex(Objects.java:361)
        at java.base/java.util.ArrayList.get(ArrayList.java:427)

Now, this doesn't make a lot of sense, I suspect due to the stacktrace filtering. To the best of my knowledge, my code doesn't use ArrayList anywhere.

For reference, more output:

dev.tauri.choam.data.TtrieModelTest:
==> i dev.tauri.choam.data.TtrieModelTest.Model checking Ttrie.apply ignored 0.0s
==> X dev.tauri.choam.data.TtrieModelTest.Model checking Ttrie.skipListBased  30.698s java.lang.IllegalStateException: Non-determinism found. Probably caused by non-deterministic code (WeakHashMap, Object.hashCode, etc).
== Reporting the first execution without execution trace ==
= Invalid execution results =
| --------------------------------------------------------------------------------------------- |
|                     Thread 1                     |                  Thread 2                  |
| --------------------------------------------------------------------------------------------- |
| insertIfAbsent(Ve, hF1iwXTfVYhRYXG): None        |                                            |
| removeKey(ez): false                             |                                            |
| --------------------------------------------------------------------------------------------- |
| removeKey(zE): false                             | removeKey(2W8): false [3,0]                |
| lookup(q3_): None [1,0]                          | lookup(Pvx): None [3,1]                    |
| lookup(hDjR): IndexOutOfBoundsException #1 [2,0] | insert(S3Qwv, t4m5fQnVDGfvpd8): None [3,2] |
| --------------------------------------------------------------------------------------------- |
| insert(7bSl3, OOfbSFKGKWr0c x): None             |                                            |
| --------------------------------------------------------------------------------------------- |

---
All operations above the horizontal line | ----- | happen before those below the line
---
Values in "[..]" brackets indicate the number of completed operations
in each of the parallel threads seen at the beginning of the current operation
---
The number next to an exception name helps you find its stack trace provided after the interleaving section
---
Exception stack traces:
#1: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 3
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
        at java.base/java.util.Objects.checkIndex(Objects.java:361)
        at java.base/java.util.ArrayList.get(ArrayList.java:427)


== Reporting the second execution ==
= Invalid execution results =
| -------------------------------------------------------------------------------------- |
|                 Thread 1                  |                  Thread 2                  |
| -------------------------------------------------------------------------------------- |
| insertIfAbsent(Ve, hF1iwXTfVYhRYXG): None |                                            |
| removeKey(ez): false                      |                                            |
| -------------------------------------------------------------------------------------- |
| removeKey(zE): false                      | removeKey(2W8): false [3,0]                |
| lookup(q3_): None [1,0]                   | lookup(Pvx): None [3,1]                    |
| lookup(hDjR): None [2,0]                  | insert(S3Qwv, t4m5fQnVDGfvpd8): None [3,2] |
| -------------------------------------------------------------------------------------- |
| insert(7bSl3, OOfbSFKGKWr0c x): None      |                                            |
| -------------------------------------------------------------------------------------- |

---
All operations above the horizontal line | ----- | happen before those below the line
---
Values in "[..]" brackets indicate the number of completed operations
in each of the parallel threads seen at the beginning of the current operation
---

The following interleaving leads to the error:
| -------------------------------------------------------------------------------- |
|                 Thread 1                  |               Thread 2               |
| -------------------------------------------------------------------------------- |
| insertIfAbsent(Ve, hF1iwXTfVYhRYXG): None |                                      |
| removeKey(ez): false                      |                                      |
| -------------------------------------------------------------------------------- |
| removeKey(zE): false                      |                                      |
| lookup(q3_): None                         |                                      |
| lookup(hDjR): None                        |                                      |
|                                           | removeKey(2W8): false                |
|                                           | lookup(Pvx): None                    |
|                                           | insert(S3Qwv, t4m5fQnVDGfvpd8): None |
| -------------------------------------------------------------------------------- |
| insert(7bSl3, OOfbSFKGKWr0c x): None      |                                      |
| -------------------------------------------------------------------------------- |

Detailed trace:
...
@durban
Copy link
Author

durban commented Feb 18, 2024

I've locally modified Lincheck to not filter stacktraces. This way I've got the full stacktrace of the IndexOutOfBoundsException:

The number next to an exception name helps you find its stack trace provided after the interleaving section
---
Exception stack traces:
#1: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 3
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
        at java.base/java.util.Objects.checkIndex(Objects.java:361)
        at java.base/java.util.ArrayList.get(ArrayList.java:427)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.getConcurrentActorCausesBlocking(ManagedStrategy.kt:263)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.failIfObstructionFreedomIsRequired(ManagedStrategy.kt:250)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPointRegular(ManagedStrategy.kt:327)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPoint(ManagedStrategy.kt:312)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.beforeAtomicMethodCall$lincheck(ManagedStrategy.kt:537)
        at org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.atomic.AtomicReference.getAcquire(AtomicReference.java:334)
        at dev.tauri.choam.internal.skiplist.SkipListMap.walkRight(SkipListMap.scala:104)
        at dev.tauri.choam.internal.skiplist.SkipListMap.findPredecessor(SkipListMap.scala:808)
        at dev.tauri.choam.internal.skiplist.SkipListMap.doRemove(SkipListMap.scala:602)
        at dev.tauri.choam.internal.skiplist.SkipListMap.remove(SkipListMap.scala:176)
        at dev.tauri.choam.data.Ttrie.unsafeDelRef(Ttrie.scala:150)
        at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2(Ttrie.scala:160)
        at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2$adapted(Ttrie.scala:160)
        at dev.tauri.choam.core.Rxn$InterpreterState.loop(Rxn.scala:1400)
        at dev.tauri.choam.core.Rxn$InterpreterState.interpretWithContext(Rxn.scala:1552)
        at dev.tauri.choam.core.Rxn$InterpreterState.interpret(Rxn.scala:1545)
        at dev.tauri.choam.core.Rxn.unsafePerform(Rxn.scala:260)
        at dev.tauri.choam.data.TtrieModelTest$AbstractTestState.lookup(TtrieModelTest.scala:71)
        at org.jetbrains.kotlinx.lincheck.runner.TestThreadExecution21.run(Unknown Source)
        at org.jetbrains.kotlinx.lincheck.runner.FixedActiveThreadsExecutor.testThreadRunnable$lambda$8(FixedActiveThreadsExecutor.kt:151)
        at java.base/java.lang.Thread.run(Thread.java:840)

So it is indeed coming from Lincheck itself.

@durban durban changed the title IndexOutOfBoundsException from unclear source IndexOutOfBoundsException from ManagedStrategy.getConcurrentActorCausesBlocking Feb 18, 2024
@eupp eupp self-assigned this Feb 19, 2024
@ndkoval
Copy link
Collaborator

ndkoval commented Feb 19, 2024

Hi @durban! Could you please check whether the issue still occurs with Lincheck 2.26?

@durban
Copy link
Author

durban commented Feb 19, 2024

I've tried 2.26, and it fails the same way. The (unfiltered) stacktrace is this:

The number next to an exception name helps you find its stack trace provided after the interleaving section
---
Exception stack traces:
#1: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 3
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
        at java.base/java.util.Objects.checkIndex(Objects.java:361)
        at java.base/java.util.ArrayList.get(ArrayList.java:427)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.getConcurrentActorCausesBlocking(ManagedStrategy.kt:267)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.failIfObstructionFreedomIsRequired(ManagedStrategy.kt:254)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPointRegular(ManagedStrategy.kt:330)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPoint(ManagedStrategy.kt:315)
        at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.beforeAtomicMethodCall$lincheck(ManagedStrategy.kt:538)
        at org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.atomic.AtomicReference.getAcquire(AtomicReference.java:334)
        at dev.tauri.choam.internal.skiplist.SkipListMap.walkRight(SkipListMap.scala:104)
        at dev.tauri.choam.internal.skiplist.SkipListMap.findPredecessor(SkipListMap.scala:808)
        at dev.tauri.choam.internal.skiplist.SkipListMap.doRemove(SkipListMap.scala:602)
        at dev.tauri.choam.internal.skiplist.SkipListMap.remove(SkipListMap.scala:176)
        at dev.tauri.choam.data.Ttrie.unsafeDelRef(Ttrie.scala:150)
        at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2(Ttrie.scala:160)
        at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2$adapted(Ttrie.scala:160)
        at dev.tauri.choam.core.Rxn$InterpreterState.loop(Rxn.scala:1400)
        at dev.tauri.choam.core.Rxn$InterpreterState.interpretWithContext(Rxn.scala:1552)
        at dev.tauri.choam.core.Rxn$InterpreterState.interpret(Rxn.scala:1545)
        at dev.tauri.choam.core.Rxn.unsafePerform(Rxn.scala:260)
        at dev.tauri.choam.data.TtrieModelTest$AbstractTestState.lookup(TtrieModelTest.scala:71)
        at org.jetbrains.kotlinx.lincheck.runner.TestThreadExecution21.run(Unknown Source)
        at org.jetbrains.kotlinx.lincheck.runner.FixedActiveThreadsExecutor.testThreadRunnable$lambda$8(FixedActiveThreadsExecutor.kt:151)
        at java.base/java.lang.Thread.run(Thread.java:840)

So essentially the same as before, although some line numbers changed.

@eupp
Copy link
Collaborator

eupp commented Feb 20, 2024

Hi @durban !

PR #277 should fix the IndexOutOfBoundsException.
But even with this fix, the test still fails (I attached the log: output.txt).

As you might see, the log contains the following line:

java.lang.IllegalStateException: Non-determinism found. Probably caused by non-deterministic code (WeakHashMap, Object.hashCode, etc).

I suspect it might be caused by the usage of WeakReference-s (there are plenty of them in the trace).

Nevertheless, it can be seen, that the first issue detected by Lincheck is an active spin lock:

= The algorithm should be non-blocking, but an active lock is detected =

This could be a sign of an actual live-lock, or a false positive bug report.

No matter what, you can effectively disable live-lock detection by setting larger value for parameter hangingDetectionThreshold in ModelCheckingOptions. Setting value 200 is enough to make the test pass:

def defaultModelCheckingOptions(): ModelCheckingOptions = {
    ....
    increaseTimeout(new ModelCheckingOptions())
          .addGuarantee(assumedAtomic)
          .checkObstructionFreedom(true)
          // this is the "fast" configuration from the Lincheck paper:
          .iterations(30) // scenarios
          .threads(2)
          .actorsPerThread(3) // operations per thread
          .invocationsPerIteration(1000) // run each scenario this much time (FIXME)
          // end of "fast" configuration
          .actorsBefore(2) // so that we don't work with empty data structures
          .actorsAfter(1) // to have a chance of detecting inconsistent state left
          .hangingDetectionThreshold(200)
}

If you think this live-lock is false positive, please report back so we can investigate further.
The hack with disabling lock detection should make the test green for now.

@eupp
Copy link
Collaborator

eupp commented Feb 20, 2024

For weak references support, we added a new tracking issue #279

@durban
Copy link
Author

durban commented Feb 20, 2024

Thanks for looking into this! I've looked at the output.txt you've attached, and some things are not clear to me:

The Non-determinism found: I'm definitely using weakrefs, but they shouldn't be externally visible. I thought that model checking inspects the results of the @Operations. Does it also check the internals of a data structure? Also, if you look at the original output above, it contains the operation results, e.g., removeKey(ez): false. In the output.txt, there are no results, e.g., lookup(zE). Is this normal?

There is also this part in the log:

java.lang.IllegalArgumentException: Internal error
	at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy$ReplayModeLoopDetectorHelper.onNextExecution(ManagedStrategy.kt:1197)
	at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy$LoopDetector.onNextExecutionPoint(ManagedStrategy.kt:1051)
	at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy$LoopDetector.onThreadFinish(ManagedStrategy.kt:1010)
	at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.onFinish(ManagedStrategy.kt:379)
	at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategyRunner.onFinish(ManagedStrategy.kt:1251)
	at org.jetbrains.kotlinx.lincheck.runner.TestThreadExecution26.run(Unknown Source)
	at org.jetbrains.kotlinx.lincheck.runner.FixedActiveThreadsExecutor.testThreadRunnable$lambda$8(FixedActiveThreadsExecutor.kt:151)
	at java.base/java.lang.Thread.run(Thread.java:829)

This looks like it shouldn't happen, regardless of the behavior of the tested code. Could this be the reason for the Non-determinism found also?

Finally about the active lock is detected: there should be no spinlocks, but I'll look into it. For the hangingDetectionThreshold, what does the value 200 mean?

@eupp
Copy link
Collaborator

eupp commented Feb 20, 2024

The Non-determinism found: I'm definitely using weakrefs, but they shouldn't be externally visible. I thought that model checking inspects the results of the @operations. Does it also check the internals of a data structure?

Model checking mode generally relies on the fact that it can deterministically reproduce any execution of the program. It stores some data structure internally, that allows to do that, and the bytecode transformations ensure that we track all the events in the program that may cause non-determinism (like calls to Random, thread switches, etc).
But for now we do not support WeakReference-s. Their methods are non-deterministic too (because their results depend on GC). We will fix this problem later under #279.

Also, if you look at the original output above, it contains the operation results, e.g., removeKey(ez): false. In the output.txt, there are no results, e.g., lookup(zE). Is this normal?

What happens with your test, is that the model checker discovers what it thinks is an active lock. In general, when the MC discovers some bug, it attempts to run the same interleaving leading to a bug yet another time -- but this time collecting the program trace on the road. Because of non-determinism it cannot replay the same execution, and thus the second time it obtains different results.

For the hangingDetectionThreshold, what does the value 200 mean?

At least for now, the active lock detector suspects that program entered active lock whenever same operation hits specific code location N times (it usually happens when the code spins in a loop). The hangingDetectionThreshold sets this N parameter.

@durban
Copy link
Author

durban commented Feb 21, 2024

Thanks, that makes a lot of things much more clear to me. I'll try to look into what could be this spinlock (or spinlock-like thing).

@eupp
Copy link
Collaborator

eupp commented Feb 21, 2024

Hi @durban !

For your test, Lincheck actually reported false positive active lock due to a bug. PR #281 fixes this bug.
Thanks to @ndkoval for spotting the root cause of the bug.

@durban
Copy link
Author

durban commented Feb 23, 2024

Thanks, that's good to hear! I'll recheck when the fix is released.

@eupp
Copy link
Collaborator

eupp commented Mar 18, 2024

The new release lincheck-2.27 should fix this issue.
@durban please re-open it if the problem persists.

@eupp eupp closed this as completed Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants