Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a test for this, I actually think an issue like this could also be related to our changing of the raf timing in signals/hooks. The main reason I want a test is to see whether this is the actual root-cause rather than a band-aid, there is little concurrency going on in Preact as the diff happens synchronously.
It's entirely possible that we somehow succeed at initiating a state-update while a component is suspended which, when the component is suspended upon mount, could lead to this. I've admittedly seen this issue occur once as well d55f62d and it started somewhere in the 10.26 line so my assumptions are the change in RAF timing or this one.
Another thought of something that got introduced in the 10.26.x release is that top-level Fragments are now cloned in the diff, which if the Suspense boundary captures this node could actually lead to this issue. as we'd rely on this being a diff and erroneously assume that this boundary has _children defined. This is in all scenarios prevent by this line but maybe there's a way to bypass it. It recursively removes the _vnodeId (_original) property of all of the suspended vnodes so it should never bail for the _original case which kind of disproves the #4724 assertion but maybe as you say there's a way to hit this case in a concurrent scenario.
I don't mean to be difficult about this, I would just love to get to the root of this problem and if you have a good way of reproducing then that is a great opportunity for that. I want to avoid us defensively setting _children and then realising that the DOM is clobbered because of it, or it duplicates vnodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Jovi, thanks for the reply.
I can't provide a minimum repro, and the codebase is large and commercial, however I dug further into the issue:
** Specific experiment: we injected a conditional console.log into the above code to dump the nodes when oldVNode's children is null; oldVNode was pointing at this Fallback component.
What's tricky here, is that testing locally (with new incognito opened every time) gets maybe 1:5 reproduction on our codebase; testing on a remote server gets about 1 in 3 -this is stochastic depending on how the load shakes out.
We solved this by just importing the full component without lazy() loading it; this currently works 100%, without the framework patch. You can disregard this PR. Hope this helps tracking the root issue down.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you try with several versions of 10.26.x to see whether you can track the specific patch down where this breaks? What happens when the lazy resolution takes some time, are updates happening or?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Could you try with several versions of 10.26.x to see whether you can track the specific patch down where this breaks?
Went down that road a few hours ago. Can't say for several versions, but it solidly reproduces in preact@10.25.4 .
| What happens when the lazy resolution takes some time, are updates happening or?
^^ it's the difference between crash happening 1 in 5 vs 1 in 3 -if there's a timelag to load the lazy asset, the rate of this error goes much higher. I suspect, but haven't confirmed yet, that lazy happens async; and during this async load, preact attempts a redraw -and it is during this redraw that it encounters a vnode which is currently loading that may cause the crash.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but there is an action that happens in that timespan that leads to an inconsistent state. The duration of a deterministic action should not lead to indeterminism. The state has been altered in this timespan and I'm trying to get to the state-update on which intermediate node that can lead to this.
The confusing thing with Suspense is that when it happens on a mounted tree that updates could still happen. Hmm, it happening in 10.25.4 would make me revalidate the candidates for this to:
We can confirm this by trying out 10.24.2 and if that doesn't crash we have our contender.
I would really love to get to the bottom of this and I haven't found good reproducing grounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went down the version rabbithole:
10.24.2 nope
10.24.3 nope
10.25.0 nope
10.25.1 yes
10.25.2 yes
Methodology:
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, that's a bit of a pickle, there's not really something special in 10.25.1. Are we hydrating in this particular example or are we rendering? Thank you for all the information already! The only thing that could realistically affect this would be #4563 but that would only happen if a legit error gets thrown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the initial hydration works, it's during rendering when it crashes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I'm asking though, so yes I asked whether you are hydrating but the boundary could still be a resumed hydration. That aside, the linked PR fixes an issue (in the version you mentioned) where we assign _children when a boundary throws an error. If we are not in a resumed hydration nor a hydration scenario I really have issues seeing how this would be attributable to 10.25.1 😅
In essence the linked PR sais that if we are hydrating (or resumed hydrating) that an error assigns _children to be potentialy null, if then the subsequent re-render hits a bailout scenario we could risk the issue at hand.
A resumed hydration is when we are hydrating and during hydration we hit an async boundary i.e. iso lazy-route, ...
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I managed to reproduce it in - you are a hero mate!