Skip to content

Conversation

@joel-solymosi
Copy link

This is a highly tricky concurrency issue. Under some mix of preact signals, and preact suspense, oldVNode._children can evaluate to null, leading to a full preact page crash. I have limited time in digging deeper for a repro, but attached PR fixes the issue.

@github-actions
Copy link

📊 Tachometer Benchmark Results

Summary

A summary of the benchmark results will show here once they finish.

Results

The full results of your benchmarks will show here once they finish.

tachometer-reporter-action v2 for CI


newVNode._dom = oldVNode._dom;
newVNode._children = oldVNode._children;
newVNode._children = oldVNode._children || [];
Copy link
Member

@JoviDeCroock JoviDeCroock Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a test for this, I actually think an issue like this could also be related to our changing of the raf timing in signals/hooks. The main reason I want a test is to see whether this is the actual root-cause rather than a band-aid, there is little concurrency going on in Preact as the diff happens synchronously.

It's entirely possible that we somehow succeed at initiating a state-update while a component is suspended which, when the component is suspended upon mount, could lead to this. I've admittedly seen this issue occur once as well d55f62d and it started somewhere in the 10.26 line so my assumptions are the change in RAF timing or this one.

Another thought of something that got introduced in the 10.26.x release is that top-level Fragments are now cloned in the diff, which if the Suspense boundary captures this node could actually lead to this issue. as we'd rely on this being a diff and erroneously assume that this boundary has _children defined. This is in all scenarios prevent by this line but maybe there's a way to bypass it. It recursively removes the _vnodeId (_original) property of all of the suspended vnodes so it should never bail for the _original case which kind of disproves the #4724 assertion but maybe as you say there's a way to hit this case in a concurrent scenario.

I don't mean to be difficult about this, I would just love to get to the root of this problem and if you have a good way of reproducing then that is a great opportunity for that. I want to avoid us defensively setting _children and then realising that the DOM is clobbered because of it, or it duplicates vnodes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jovi, thanks for the reply.
I can't provide a minimum repro, and the codebase is large and commercial, however I dug further into the issue:

  • we're doing server-side rendering using preact-iso 's async prerender()
  • the component at issue had a lazy() loaded subcomponent
  • on server side, the whole page gets rendered
  • on client side, when the lazy() resolution takes some time, preact seems to attempt to render the currently not-loaded component, and hits an error: at that point, oldVNode & newVNode 's ._children is null, therefore newVNode._children.some throws
    ** Specific experiment: we injected a conditional console.log into the above code to dump the nodes when oldVNode's children is null; oldVNode was pointing at this Fallback component.

What's tricky here, is that testing locally (with new incognito opened every time) gets maybe 1:5 reproduction on our codebase; testing on a remote server gets about 1 in 3 -this is stochastic depending on how the load shakes out.

We solved this by just importing the full component without lazy() loading it; this currently works 100%, without the framework patch. You can disregard this PR. Hope this helps tracking the root issue down.

Copy link
Member

@JoviDeCroock JoviDeCroock Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try with several versions of 10.26.x to see whether you can track the specific patch down where this breaks? What happens when the lazy resolution takes some time, are updates happening or?

Copy link
Author

@joel-solymosi joel-solymosi Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| Could you try with several versions of 10.26.x to see whether you can track the specific patch down where this breaks?
Went down that road a few hours ago. Can't say for several versions, but it solidly reproduces in [email protected] .

| What happens when the lazy resolution takes some time, are updates happening or?
^^ it's the difference between crash happening 1 in 5 vs 1 in 3 -if there's a timelag to load the lazy asset, the rate of this error goes much higher. I suspect, but haven't confirmed yet, that lazy happens async; and during this async load, preact attempts a redraw -and it is during this redraw that it encounters a vnode which is currently loading that may cause the crash.

Copy link
Member

@JoviDeCroock JoviDeCroock Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but there is an action that happens in that timespan that leads to an inconsistent state. The duration of a deterministic action should not lead to indeterminism. The state has been altered in this timespan and I'm trying to get to the state-update on which intermediate node that can lead to this.

The confusing thing with Suspense is that when it happens on a mounted tree that updates could still happen. Hmm, it happening in 10.25.4 would make me revalidate the candidates for this to:

We can confirm this by trying out 10.24.2 and if that doesn't crash we have our contender.

I would really love to get to the bottom of this and I haven't found good reproducing grounds.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went down the version rabbithole:
10.24.2 nope
10.24.3 nope
10.25.0 nope
10.25.1 yes
10.25.2 yes

Methodology:

pnpm i --save [email protected] -w
ctrl+shift+n chrome -> long url that reproes, enter
do above 10x time, observe stuff crashing/not crashing
"yes"es happen within the first 3 in all cases; nopes are "tested 10 times, worked for all 10"

Copy link
Member

@JoviDeCroock JoviDeCroock Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, that's a bit of a pickle, there's not really something special in 10.25.1. Are we hydrating in this particular example or are we rendering? Thank you for all the information already! The only thing that could realistically affect this would be #4563 but that would only happen if a legit error gets thrown

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the initial hydration works, it's during rendering when it crashes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I'm asking though, so yes I asked whether you are hydrating but the boundary could still be a resumed hydration. That aside, the linked PR fixes an issue (in the version you mentioned) where we assign _children when a boundary throws an error. If we are not in a resumed hydration nor a hydration scenario I really have issues seeing how this would be attributable to 10.25.1 😅

In essence the linked PR sais that if we are hydrating (or resumed hydrating) that an error assigns _children to be potentialy null, if then the subsequent re-render hits a bailout scenario we could risk the issue at hand.

A resumed hydration is when we are hydrating and during hydration we hit an async boundary i.e. iso lazy-route, ...

Copy link
Member

@JoviDeCroock JoviDeCroock Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to reproduce it in - you are a hero mate!

	it.only('should crash when oldVNode._children is null during shouldComponentUpdate optimization', () => {
		const originalHtml = '<div><div>Hello</div></div>';
		scratch.innerHTML = originalHtml;

		class ErrorBoundary extends React.Component {
			constructor(props) {
				super(props);
				this.state = { hasError: false };
			}

			static getDerivedStateFromError() {
				return { hasError: true };
			}

			render() {
				return this.props.children;
			}
		}

		const [Lazy, resolve] = createLazy();
		function App() {
			return (
				<div>
					<Suspense>
						<ErrorBoundary>
							<Lazy />
						</ErrorBoundary>
					</Suspense>
				</div>
			);
		}

		hydrate(<App />, scratch);
		rerender(); // Flush rerender queue to mimic what preact will really do
		expect(scratch.innerHTML).to.equal(originalHtml);
		// expect(getLog()).to.deep.equal([]);
		clearLog();

		let i = 0;
		class ThrowOrRender extends React.Component {
			shouldComponentUpdate() {
				return i===0;
			}
			render() {
				if (i === 0) {
					i++;
					throw new Error('Test error');
				}
				return <div>Hello</div>;
			}
		}

		return resolve(ThrowOrRender).then(() => {
			rerender();
			expect(scratch.innerHTML).to.equal(originalHtml);
			clearLog();
		});
	});

@JoviDeCroock
Copy link
Member

JoviDeCroock commented Jul 28, 2025

I'll close this PR for #4856 - again thank you so much, this would not have been found without your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants