
Potential bug found during reactive library benchmark stress tests #3926

Closed · transitive-bullshit opened this issue Sep 15, 2024 · 8 comments

@transitive-bullshit commented Sep 15, 2024

I created a comparison of the most popular TS reactivity libs, which of course includes MobX.

During my deep-dive, I wanted to benchmark all of them and created a fork of @milomg's excellent js-reactivity-benchmark, which includes dozens of benchmarks and stress tests.

The problem is that MobX fails to complete some of the dynamic benchmark tests in the suite.

Intended outcome: MobX should be able to run these benchmark tests just like all of the other reactive libs.

Actual outcome: MobX runs most of the benchmark tests fine, but consistently hangs on several of the larger dynamic tests.

[Screenshot: benchmark output showing MobX hanging on the larger dynamic tests]

I've been unable to track down the source of the issue; it may be a rare bug in MobX, or possibly a bug in the MobX adapter used by the benchmark.

How to reproduce the issue:

  1. Clone https://github.com/transitive-bullshit/js-reactivity-benchmark and check out the feature/minimal-issues-repro-mobx branch
  2. Run pnpm install
  3. Run pnpm test to make sure everything's working locally
  4. Run pnpm bench to reproduce the issue

The mobx benchmark tests work fine for the first two test cases but then hang on the third (slightly larger) test case. Note that these test cases create a rectangular grid of shallow signals as a stress test, where each layer of the grid contains computed properties that depend on some number of the previous row's signals (roughly the shape sketched below).
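
For reference, this is roughly the shape of the graph, written against MobX primitives directly for illustration (the benchmark itself goes through its framework adapters, and the helper name here is made up):

// Rough sketch of the grid shape (hypothetical helper, not the benchmark's
// actual code): one row of source signals, then layerCount rows of computeds,
// each reading a couple of nodes from the row above.
import { computed, observable } from "mobx";

type Node = { read: () => number };

function buildGrid(width: number, layerCount: number): Node[][] {
  const sources: Node[] = Array.from({ length: width }, (_, i) => {
    const box = observable.box(i, { deep: false });
    return { read: () => box.get() };
  });

  const layers: Node[][] = [sources];
  for (let l = 0; l < layerCount; l++) {
    const prev = layers[layers.length - 1];
    const row: Node[] = prev.map((_, i) => {
      // Each computed depends on two nodes from the previous row.
      const c = computed(() => prev[i].read() + prev[(i + 1) % width].read());
      return { read: () => c.get() };
    });
    layers.push(row);
  }
  return layers;
}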

I've pared down this branch as much as possible to make it easier to debug.

Note that in the full benchmark branch, all of the other reactivity libs are able to pass these benchmark tests.


@urugator (Collaborator)

I don't think it hangs; it's just slow. I limited the large web app test to 2000 iterations, and every 100 iterations took ~27s:

iteration: 0 delta:0
iteration: 100 delta:27863
iteration: 200 delta:26824
iteration: 300 delta:26888
iteration: 400 delta:27230
iteration: 500 delta:27152
iteration: 600 delta:26822
iteration: 700 delta:26707
iteration: 800 delta:26558
iteration: 900 delta:26565
iteration: 1000 delta:26740
iteration: 1100 delta:26748
iteration: 1200 delta:26717
iteration: 1300 delta:28612
iteration: 1400 delta:27280
iteration: 1500 delta:28665
iteration: 1600 delta:27326
iteration: 1700 delta:27663
iteration: 1800 delta:27955
iteration: 1900 delta:27098

So 7000 iterations * 5 repetitions would take ~2.62h to complete.
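(That estimate follows from the numbers above: (7000 / 100) × ~27 s ≈ 1,890 s ≈ 31.5 min per repetition, times 5 repetitions ≈ 2.6 h.)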

@urugator (Collaborator)

The reason it's so slow is that the computeds are not read in a reactive context or in a batch:
https://github.com/transitive-bullshit/js-reactivity-benchmark/blob/5a5719ebbc9368fd00c639d51fc50d6858005efd/src/util/dependencyGraph.ts#L69
In that case computeds are not cached and behave almost like plain getters. "Almost", because the computation itself creates a temporary batch: if other computeds are accessed multiple times during a single leaf.read(), they are cached until the outermost computation completes. But once the outermost computation is done, all computeds are reset to their initial state: nothing is cached and there are no links between them (no dependency graph).

You either have to make sure that the accessed leaf computeds stay observed (some effect depends on them) between individual reads (sketched at the end of this comment), or the reads must be part of the same batch (transaction/action). The latter has a similar effect, because computeds also stay "hot" for the duration of a batch, just in case they are accessed multiple times during the batch, which is exactly the situation here.

E.g., if you change it like this, you will immediately see a drastic improvement:

export function runGraph(
  graph: Graph,
  iterations: number,
  readFraction: number,
  framework: ReactiveFramework
): number {
  const rand = pseudoRandom();
  const { sources, layers } = graph;
  const leaves = layers[layers.length - 1];
  const skipCount = Math.round(leaves.length * (1 - readFraction));  
  const readLeaves = removeElems(leaves, skipCount, rand);
  let sum = 0;
  framework.withBatch(() => {
    let start = Date.now();
    for (let i = 0; i < iterations; i++) {
      if (i % 100 === 0) {
        console.log('iteration: ' + i, 'delta: ' + (Date.now() - start));
        start = Date.now()
      }
      const sourceDex = i % sources.length;
      sources[sourceDex].write(i + sourceDex);

      for (const leaf of readLeaves) {
        leaf.read();
      }
    }

    sum = readLeaves.reduce((total, leaf) => leaf.read() + total, 0);  
  });
  return sum;
}

Also you can change the adapter like so:

import { computed, observable, autorun, runInAction } from "mobx";
import { ReactiveFramework } from "../util/reactiveFramework";

export const mobxFramework: ReactiveFramework = {
  name: "MobX",
  signal(initial) {
    const s = observable.box(initial, { deep: false });
    return {
      read: () => s.get(),
      write: (x) => s.set(x), // <-- no action here
    };
  },  
  computed: (fn) => {
    const read = computed(fn);
    return {
      read: () => read.get(),
    };
  },
  effect: (fn) => autorun(fn),
  withBatch: (fn) =>  runInAction(fn), // <-- more realistic
  withBuild: (fn) => fn(),
};

It still fails an assertion, which I didn't dig into.
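
For completeness, the first option (keeping the leaf computeds observed by an effect between reads) would look roughly like the following. This is just a sketch, not the benchmark's actual code:

// Sketch: keep the leaf computeds observed by an autorun so they stay cached
// (and keep their dependency links) between individual reads, then dispose it.
import { autorun } from "mobx";

function withLeavesObserved<T>(
  readLeaves: Array<{ read: () => number }>,
  body: () => T
): T {
  // The autorun reads every leaf once, so each leaf computed becomes observed
  // and stays cached while this effect is alive.
  const dispose = autorun(() => {
    for (const leaf of readLeaves) leaf.read();
  });
  try {
    // Subsequent leaf.read() calls outside any action now hit the cache.
    return body();
  } finally {
    // Once nothing observes them, the computeds suspend and drop their caches.
    dispose();
  }
}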

@urugator (Collaborator)

https://github.com/transitive-bullshit/js-reactivity-benchmark/blob/ee8ab952ba11c0aae3df0224964cccafa06d9a5c/src/dynamicBench.ts#L28-L32

You should consider creating a new graph for these runs; otherwise they may affect each other.

@transitive-bullshit (Author)

Thanks so much for the prompt help here, @urugator 🙏

I can confirm that applying your suggested changes results in the test suite running with times much more in line with what I'm seeing from other frameworks. https://github.com/transitive-bullshit/js-reactivity-benchmark/tree/feature/minimal-issues-repro-mobx 🎉 🎉 🎉

[Screenshot: benchmark output with MobX completing the dynamic tests in times comparable to other frameworks]

I wonder, though, if this makes sense as the expected behavior?

The changes to the mobx adapter make perfect sense to me.

The changes to the runGraph benchmark method, however, don't seem to stay true to the benchmark's original intention of mimicking a reasonable real-world usage pattern (albeit one you're unlikely to find when using mobx with react or another frontend lib): e.g., instead of one batch/action containing thousands of writes and reads, thousands of smaller batches/actions that perform some writes and then read the results outside of those batches.

The reason it's so slow is that the computeds are not read in a reactive context or in a batch. In that case computeds are not cached and behave almost like plain getters. "Almost", because the computation itself creates a temporary batch: if other computeds are accessed multiple times during a single leaf.read(), they are cached until the outermost computation completes. But once the outermost computation is done, all computeds are reset to their initial state: nothing is cached and there are no links between them (no dependency graph).

Would you be able to explain the reasoning behind this design decision in MobX? It seems like it'd be pretty core and difficult to change; I just want to better understand the reasoning as I continue my deep dive into the tradeoffs made by different reactive libs across the TS landscape.

This may be related to pmndrs/valtio#949 and/or vuejs/core#11928. Note that at least a dozen other similar reactive libs tested in the benchmark don't seem to have this perf footgun (see the non-minimal base benchmark branch for examples).

@transitive-bullshit (Author)

transitive-bullshit/js-reactivity-benchmark@ee8ab95/src/dynamicBench.ts#L28-L32

You should consider creating a new graph for these runs; otherwise they may affect each other.

Agreed. In general, I think there's a lot we can and should do to improve the isolation of the benchmark runs. Currently, invoking the GC in between runs isn't a great solution, but I've only been working on this fork of https://github.com/milomg/js-reactivity-benchmark for a few days so far.

@transitive-bullshit (Author) commented Sep 16, 2024

I've updated the benchmark for now with a middle ground that makes mobx run reasonably (though still not nearly as efficiently as the other libs). I don't believe the second or third withBatch() (aka runInAction()) calls should be necessary for this type of usage pattern to work efficiently. Comparing apples to apples, this computed footgun seems to be unique to mobx, though it may be a rare enough issue that it's not worth addressing?

export function runGraph(
  graph: Graph,
  iterations: number,
  readFraction: number,
  framework: ReactiveFramework
): number {
  const rand = pseudoRandom();
  const { sources, layers } = graph;
  const leaves = layers[layers.length - 1];
  const skipCount = Math.round(leaves.length * (1 - readFraction));
  const readLeaves = removeElems(leaves, skipCount, rand);
  const start = Date.now();
  let sum = 0;

  for (let i = 0; i < iterations; i++) {
    // Useful for debugging edge cases for some frameworks that experience
    // dramatic slow downs for certain test configurations. These are generally
    // due to `computed` effects not being cached efficiently, and as the number
    // of layers increases, the uncached `computed` effects are re-evaluated in
    // an `O(n^2)` manner where `n` is the number of layers.
    if (i % 100 === 0) {
      console.log("iteration:", i, "delta:", Date.now() - start);
    }

    framework.withBatch(() => {
      const sourceDex = i % sources.length;
      sources[sourceDex].write(i + sourceDex);
    });

    framework.withBatch(() => {
      for (const leaf of readLeaves) {
        leaf.read();
      }
    });
  }

  framework.withBatch(() => {
    sum = readLeaves.reduce((total, leaf) => leaf.read() + total, 0);
  });

  return sum;
}

Alternatively, if we kept the runInAction solely around the .write as it was originally, and replaced the current no-op version of the mobx adapter's withBuild function with something that creates a root-level scope ensuring that all sub-computeds are properly tracked, then that should resolve things, right? E.g., in S.js it's S.root(); in @vue/reactivity, it's effectScope(); etc.

Does MobX have a similar way to create a tracked scope that ensures all sub-computations and effects are both cached and disposed of properly when the scope expires? Maybe we just want to wrap the entire body of the benchmark in runInAction(), but that doesn't seem to have the same semantics as the other libs' root-scoping APIs. Thoughts?
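
For concreteness, I imagine the closest equivalent would be something hand-rolled like the following (a hypothetical sketch, not an existing MobX API): a scope that collects the disposers of the effects created inside it and tears them all down together.

// Hypothetical root-scope helper (not a MobX API): collect the disposer of
// every autorun created inside the scope and dispose them all at once.
import { autorun } from "mobx";

type Disposer = () => void;

function createScope() {
  const disposers: Disposer[] = [];
  return {
    // Create an effect inside the scope and track its disposer.
    effect(fn: () => void): void {
      disposers.push(autorun(fn));
    },
    // Tear down every effect; once nothing observes the computeds anymore,
    // they suspend and release their cached values.
    dispose(): void {
      disposers.forEach((d) => d());
      disposers.length = 0;
    },
  };
}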

@urugator (Collaborator) commented Sep 16, 2024

the benchmark's original intention of mimicking a reasonable real-world usage pattern

Every real-world pattern containing observables must also contain observers. An observer in this case means a side effect that should run whenever state changes: angular.effect, rxjs.subscribe, mobx.autorun/reaction, etc. The test doesn't contain any.
If the graph leaves are supposed to represent components, they should be modeled as effects. If you want to simulate mounting/unmounting these components, you have to create and dispose of the effects.
Effects decide which parts of your state (including computeds) are being used by the application and establish the dependency graph. Firstly, nodes (signals/computeds) that are not part of the dependency graph must be collectable by GC. Therefore they can't hold onto any cached value or maintain connections with other nodes.
Secondly, nodes that are not part of the dependency graph are effectively not consumed in any meaningful way, so running any computations or keeping anything in memory is a waste of resources. There are some nuances to that, which is why the keepAlive: true computed option exists.
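
For reference, that option looks roughly like this (a minimal sketch, unrelated to the benchmark code):

// Minimal sketch: a keepAlive computed stays cached even when nothing
// observes it, at the cost of never being suspended (and potentially leaking).
import { computed, observable, runInAction } from "mobx";

const source = observable.box(1, { deep: false });
const doubled = computed(() => source.get() * 2, { keepAlive: true });

doubled.get(); // computes once
doubled.get(); // served from cache, even outside any reaction or action
runInAction(() => source.set(2)); // marks the computed as stale
doubled.get(); // recomputes lazily on the next read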

I used framework.withBatch(() => {}) just because I know it has a similar effect on computeds as if they were observed, and it was easier to do. But if you intend to simulate real applications, you need effects. It's absolutely not unique to mobx.

E.g., instead of one batch/action containing thousands of writes and reads, thousands of smaller batches/actions that perform some writes and then read the results outside of those batches.

A batch is a series of individual mutations that looks like a single mutation to a consumer; any "subsequent read" is not a consumer. Batching has no effect on subsequent reads because they're, well, "subsequent": they can never see the intermediate mutations that happened before. So if you talk about batching, you again need observers.

framework.withBatch(() => {
  const sourceDex = i % sources.length;
  sources[sourceDex].write(i + sourceDex);
});

Here the batch is completely useless because:

  1. There is only a single mutation, so there is nothing to batch together.
  2. As stated above, there are no observers.

There could, however, be implementations where the batch has an effect here.
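
To make that concrete, here is a small sketch of the case where batching does matter, i.e. when an observer exists (plain MobX, unrelated to the benchmark code):

// Sketch: batching is only visible to observers. With an autorun observing
// both boxes, the action makes two writes look like a single change (one
// re-run); a plain read after the writes would see the final values anyway.
import { autorun, observable, runInAction } from "mobx";

const a = observable.box(1);
const b = observable.box(2);

const dispose = autorun(() => {
  console.log("sum:", a.get() + b.get()); // runs once immediately: sum: 3
});

runInAction(() => {
  a.set(10);
  b.set(20);
}); // the autorun re-runs once after the action, logging: sum: 30

dispose();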

@mweststrate (Member)

Closing as answered.
