Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review of structured clone algorithms #2555

Open
syg opened this issue Oct 25, 2021 · 44 comments
Open

Review of structured clone algorithms #2555

syg opened this issue Oct 25, 2021 · 44 comments

Comments

@syg
Copy link
Contributor

syg commented Oct 25, 2021

We reached consensus on the general idea of taking over maintainership of the structured clone algorithm in the October 2021 TC39, pending a requested review by @erights of the concrete algorithm proposed to be refactored into ecma262.

The two algorithms proposed to be copied are StructuredSerializeInternal and StructuredDeserialize.

For StructuredSerializeInternal:

  • Steps 19 and 20 will be refactored out, as they deal with non-ES values.
  • All occurrences of "throw a "DataCloneError" DOMException" will be changed to "throw a host-defined exception".

For StructuredDeserialize:

  • Step 20 and all its substeps will be refactored out since it deals with non-ES values.
  • All occurrences of "throw a "DataCloneError" DOMException" will be changed to "throw a host-defined exception".
@ljharb
Copy link
Member

ljharb commented Oct 25, 2021

Can we specify what non-browsers must throw, and use annex b-like wording so that browsers must use a domexception per html?

@syg
Copy link
Contributor Author

syg commented Oct 25, 2021

Can we specify what non-browsers must throw, and use annex b-like wording so that browsers must use a domexception per html?

What does Node throw today? And are you planning to include Node as a browser?

@Jack-Works
Copy link
Member

All occurrences of "throw a "DataCloneError" DOMException" will be changed to "throw a host-defined exception".

What about abort the AC with an AbortCompletion?

@Jack-Works
Copy link
Member

Can we specify what non-browsers must throw, and use annex b-like wording so that browsers must use a domexception per html?

What do you mean? Do you mean non-browser must throw with ES Error and browsers must use DOMException?

@syg
Copy link
Contributor Author

syg commented Oct 25, 2021

What about abort the AC with an AbortCompletion?

What does AbortController have to do with this?

@Jack-Works
Copy link
Member

What about abort the AC with an AbortCompletion?

What does AbortController have to do with this?

Oh it's typo, I mean abort the Abstract Operations.

@syg
Copy link
Contributor Author

syg commented Oct 25, 2021

Oh it's typo, I mean abort the Abstract Operations.

Ah. I was thinking it'd just return some spec-internal value like ~not-handled~, telling the host it wasn't an ES value.

@ljharb
Copy link
Member

ljharb commented Oct 25, 2021

What is node's use of structured cloning?

@Jack-Works yes, i mean exactly that.

@domenic
Copy link
Member

domenic commented Oct 26, 2021

Steps 19 and 20 will be refactored out, as they deal with non-ES values.

There are some substeps in the deep sections that also need a bit of care, e.g. StructuredSerializeInternal step 26.3.

All occurrences of "throw a "DataCloneError" DOMException" will be changed to "throw a host-defined exception".

My suggestion is that instead you use return values from the AO. So you can return either a JS value or ~failure~ which the host can translate as appropriate.


Another thing that might need slight tweaks is that object creation is not as precise as it should be in the current algorithms. E.g. HTML says

set value to a new Boolean object in targetRealm whose [[BooleanData]] internal slot value is serialized.[[BooleanData]].

which should probably be expanded to something like

  1. Set value to ! OrdinaryCreateFromConstructor(targetRealm.[[Intrinsics]].[[%Boolean%]], "%Boolean.prototype%", « [[BooleanData]] »).
  2. Set value.[[BooleanData]] to serialized.[[BooleanData]].

@andreubotella
Copy link
Member

andreubotella commented Oct 27, 2021

There could be platform objects deep into the object graph, so it seems like there would need to be host hooks for serializing and deserializing host objects, that get invoked in StructuredSerializeInternal and StructuredDeserialize. Perhaps if the serialization hook returns a record (as opposed to undefined or some spec-level empty marker), StructuredSerializeInternal would return a wrapped record with [[Type]]: "HostObject", to ensure that the deserialization hook can't affect built-ins.

@erights
Copy link

erights commented Oct 28, 2021

From https://html.spec.whatwg.org/#structuredserializeinternal step 13.2.1:

If the current settings object's cross-origin isolated capability is false

The settings object in question seems to be the [[HostDefined]] field of RealmRecords defined at https://tc39.es/ecma262/#realm-record . The ecma262 its internal structure is unspecified, and "cross-origin isolated capability" is not an EcmaScript concept. However, 13.2.1 goes on:

and a SharedArrayBuffer cannot leave an agent cluster.

which is a restriction meaningful, indeed necessary, in EcmaScript. This restriction needs rephrasing in purely EcmaScript terms and in a host independent manner.

@bathos
Copy link
Contributor

bathos commented Oct 28, 2021

Structured clone has some pretty unique behaviors which (afaik) aren't found elsewhere currently. Some of these amount to (very hard to leverage) unique information sources. The best known one is that it's the sole place where one can observe [[ErrorData]], but a more surprising one I think is that it bypasses IsArray and therefore uniquely pokes a hole in the boundary that (otherwise would) prevent ES code from observing whether an object is a true array exotic object.

I'd be curious to know if the quirky one-offs could be safely closed off (or whether folks would consider that desirable). I don't have much of an opinion on this; I'm just deeply fascinated by places where the unobservable becomes observable under a blood red moon and my god, structured clone delivers.

@erights
Copy link

erights commented Oct 28, 2021

pokes a hole in the boundary that (otherwise would) prevent ES code from observing

or whether folks would consider that desirable

I'm still absorbing. But IIUC, I find this both desirable and necessary. I left off the text after "observing" above because the concern is not specific to arrays. ANY case where this makes something observable that was previously unobservable is likely a bug that must be fixed. I look forward to going through cases. Thanks!

@erights
Copy link

erights commented Oct 28, 2021

https://html.spec.whatwg.org/#structuredserializeinternal step 17 refers to "platform object" which is not term used in ecma262. What does in mean in ecma262 terms? How would this be rephrased?

@erights
Copy link

erights commented Oct 28, 2021

The problem @bathos identifies at #2555 (comment) seems to be pervasive. For example, this algorithm dispatches on the presence of [[ErrorData]] which EcmaScript code currently cannot reliably detect. Indeed https://tc39.es/ecma262/#sec-properties-of-error-instances says:

Error instances are ordinary objects that inherit properties from the Error prototype object and have an [[ErrorData]] internal slot whose value is undefined. The only specified uses of [[ErrorData]] is to identify Error, AggregateError, and NativeError instances as Error objects within Object.prototype.toString.

Note that a proposal in progress (Error stacks by @ljharb and I) would likely makes [[ErrorData]] reliably observable anyway, so this may ultimately not be a problem. But it is still a problem now.

@bathos
Copy link
Contributor

bathos commented Oct 28, 2021

@erights one of the things you'll likely find interesting is that it's "asymmetrical" because of how, given an arbitrary "untrusted" object, MOP ops will be invoked in steps prior to those which reveal special information. This is one factor that limits its real world utility: though novel knowledge can be obtained, it does not have the property of being reliably obtainable. For example we cannot use structured clone to know an object isn't an error, but we can use it to know that some objects definitely are.

@erights
Copy link

erights commented Oct 28, 2021

For example we cannot use structured clone to know an object isn't an error, but we can use it to know that some objects definitely are.

I don't understand that. Could you explain with errors specifically, since I just looked at that? Thanks.

@erights
Copy link

erights commented Oct 28, 2021

If name is not one of "Error", "EvalError", "RangeError", "ReferenceError", "SyntaxError", "TypeError", or "URIError", then set name to "Error".

Just the obvious problem that this list needs to be updated to include "AggregateError". Which is indeed a great example of why this spec needs to move into tc39.

@bathos
Copy link
Contributor

bathos commented Oct 28, 2021

Could you explain with errors specifically, since I just looked at that? Thanks.

yep: Let name be ? Get(value, "name"). this property can be modified, so if we do not know anything about the object we are inspecting via structured clone, this can have side effects, and if it thows, the information on its status as a true [[ErrorData]] bearer is denied to us (as the algorithm could throw at other junctures as well).

@andreubotella
Copy link
Member

andreubotella commented Oct 28, 2021

https://html.spec.whatwg.org/#structuredserializeinternal step 17 refers to "platform object" which is not term used in ecma262. What does in mean in ecma262 terms? How would this be rephrased?

In web specs, a platform object is an object that implements a Web IDL interface – roughly an instance of a web API "class". I suspect ecma262 might want to use something like "host-defined object" instead (though that already has a meaning that might not exactly match).

Since the (de)serialization behavior for such objects must be host-specific, and there might be host-defined objects deep into the object graph (so the caller to the structured clone algorithms can't do all the work), there would need to be host hooks to serialize and deserialize host-defined objects as I argued in my comment above. The default implementation of the serialize hook would be to return some empty marker (or undefined?) to indicate that the passed object is not host-defined; the deserialize hook should by default return failure.

Note that the global objects in the web are non-serializable platform objects, so serializing them fails. I'd argue this should probably still happen even with hosts that don't override the serialization hooks.

@erights
Copy link

erights commented Oct 28, 2021

Step 23:

[...] then throw [...]
For instance, a proxy object.

is clearly incompatible with ecma262. How should this be repaired to restore practical membrane transparency?

@ljharb
Copy link
Member

ljharb commented Oct 28, 2021

It is a bug/oversight that [[ErrorData]] is not currently observable, so anything that adds this observability is a good thing.

@erights
Copy link

erights commented Oct 28, 2021

It is a bug/oversight that [[ErrorData]] is not currently observable, so anything that adds this observability is a good thing.

Aside from "anything", I agree. It is a good thing only when coordinated with the rest of the language. This observability should enter via our error stacks proposal. In any case, it should not happen by accident. There seem to be lots of such accidents scattered in the structured clone serialization algorithm.

@ljharb
Copy link
Member

ljharb commented Oct 28, 2021

I don’t agree that it should only enter that way - the stacks proposal is merely one way. I agree it should be added on purpose, but i think it’s perfectly fine if structured clone also adds it along with stacks.

@erights
Copy link

erights commented Oct 28, 2021

I don’t agree that it should only enter that way - the stacks proposal is merely one way. I agree it should be added on purpose, but i think it’s perfectly fine if structured clone also adds it along with stacks.

Any such observability change would at least need to go through the stage process, just as the error stack proposal itself needs to.

@erights
Copy link

erights commented Oct 28, 2021

Step 26.4:

  1. Otherwise, for each key in ! EnumerableOwnPropertyNames(value, key):

    1. If ! HasOwnProperty(value, key) is true, then:

How could a key obtained in step 4 fail the test in 4.1? Should 4.1 be an assert rather than a test?

@erights
Copy link

erights commented Oct 28, 2021

I am surprised that it serializes only enumerable own properties. I have no pro or con opinion on this yet; merely surprise. After all, elsewhere it reaches past public interfaces to access internal slots like [[MapData]]. But here it avoids non-enumerable own properties. Why? What's the rationale and history?

@erights
Copy link

erights commented Oct 28, 2021

At https://html.spec.whatwg.org/#structureddeserialize step 12.1:

If targetRealm's corresponding agent cluster is not serialized.[[AgentCluster]], then

is stated using only EcmaScript concepts. Can we fix #2555 (comment) to do likewise?

@bathos
Copy link
Contributor

bathos commented Oct 28, 2021

But here it avoids non-enumerable own properties. Why? What's the rationale and history?

I don’t know the rationale or history, but my sense is — given all the guards that exist just to prevent serializing any platform or intrinsic objects that aren’t specifically permitted and the choice to forbid functions and (problematically) PEOs, that this was “meant” for POJOs. I suspect that if there were actually a meaningful/reliable way to forbid instances of userland classes too, it might have done so.

If you view it that way, it’s consistent with e.g. { ...spreading } and Object.assign(), which also stick with own-enumerables and are generally only friendly to POJOs.

@erights
Copy link

erights commented Oct 28, 2021

13:

This step might throw an exception if there is not enough memory available to create such an ArrayBuffer object.

Is this the only circumstance that can throw at this step? I suspect so. Can we state this explicitly with an assert?

@erights
Copy link

erights commented Oct 28, 2021

Unless I'm missing something, the good news is that deserialization is vastly less problematic than serialization. If true, this is unsurprising. But I felt it is worth stating explicitly.

Are there any problems or concerns with deserialization that I may have missed?

@syg
Copy link
Contributor Author

syg commented Oct 28, 2021

is clearly incompatible with ecma262. How should this be repaired to restore practical membrane transparency?

I'd like to say it's not feasible to change the behavior of structured clone. The most realistic way to repair this is to add host-specific carve outs here, but we're not going to be changing how structured clone already works on the web wrt Proxies (nor in Node, I expect).

Would that be satisfactory?

@syg
Copy link
Contributor Author

syg commented Oct 28, 2021

In general though thank you all for the detailed review comments.

Before I try to synthesize the comments and come up with a concrete PR to review, I'd very much appreciate resolving the question of "are there concerns with the normative behavior" with @erights. To set expectations, normative changes to how structured clone works on the web or node is aren't on the table. I am fine, however, with having host-specific carve outs, if there are normative behavior you think are blockers for taking this into ecma262.

@syg
Copy link
Contributor Author

syg commented Oct 28, 2021

Any such observability change would at least need to go through the stage process, just as the error stack proposal itself needs to.

To clarify, are you suggesting that due to the new observability points added here, taking over structured clone ought to go through the stage process with a call out to these new observability points?

@annevk
Copy link
Member

annevk commented Oct 28, 2021

How could a key obtained in step 4 fail the test in 4.1? Should 4.1 be an assert rather than a test?

The later steps can mutate value to exclude key, but the list returned by EnumerableOwnPropertyNames would still contain it. So you could assert the first key in the list, but not the later ones.

@annevk
Copy link
Member

annevk commented Oct 28, 2021

13:

This step might throw an exception if there is not enough memory available to create such an ArrayBuffer object.

Is this the only circumstance that can throw at this step? I suspect so. Can we state this explicitly with an assert?

Yes it is. This should become explicit when you refactor to use the necessary create abstract operations for these objects.

@annevk
Copy link
Member

annevk commented Oct 28, 2021

At https://html.spec.whatwg.org/#structureddeserialize step 12.1:

If targetRealm's corresponding agent cluster is not serialized.[[AgentCluster]], then

is stated using only EcmaScript concepts. Can we fix #2555 (comment) to do likewise?

No, see #1435 (comment). You could make this more agnostic by having a bit on agent clusters that determines whether or not SAB are serializable in them. HTML could set that bit appropriately. (We could also derive exposure of SharedArrayBuffer on the global from that, but I still somewhat like the idea of eventually just exposing SharedArrayBuffer there all the time, with just serialization being restricted.)

@Jamesernator
Copy link

Jamesernator commented Oct 28, 2021

To be clear this doesn't actually add the structuredClone global right, i.e. its just moving the algorithms into ECMA262 for now? So exposing things like [[ErrorData]] will only be exposed where hosts actually choose to implement APIs using the algorithms?

@bakkot
Copy link
Contributor

bakkot commented Oct 28, 2021

@Jamesernator Correct.

@devsnek
Copy link
Member

devsnek commented Oct 29, 2021

following up on node, it uses v8's default behavior which is an ordinary Error instance with the message "xyz could not be cloned". v8 allows hooking this error which is how chromium throws a dom exception. this matches nicely with throwing a host defined exception. we could also have something like ThrowDataCloneError which html normatively implements.

What is node's use of structured cloning?

normally just for threads, same sorta thing as html, but we also expose a serialization api (https://nodejs.org/api/v8.html#serialization-api)

@dead-claudia
Copy link

Seeing https://es.discourse.group/t/symbol-clone-to-ease-out-structuredclone-implicit-conversion/2035/7 reminded me of this effort. What's the status of this transition?

@WebReflection
Copy link

WebReflection commented Jul 31, 2024

if worth anything, and big imho here:

  • the deserialize topic should be dropped: so far we had only the toJSON (arguably ugly) primitive to prevent data from disappearing from JSON serialization, and that's been enough to date ... because ...
  • the structuredClone algorithm doesn't necessarily run on the same thread, it runs on postMessage operations too, and maybe IndexedDB operations as well
  • reviving any serialized data has always been an implementation detail ... toJSON worked well for receivers of that JSON data that could hook into the JSON revive extra argument utility ... but we don't have a serialize and then unserialize counter part to worry about when it comes to structuredClone, it's a one way operation that is implicitly used in other APIs such as postMessage
  • when it comes to workers communicating to another thread, there's not such thing as revive because instances on that worker might come from WASM targeting PLs, Proxies that can't be revived in the wild, or just classes that don't exist on the other thread / world... any attempt to tackle this would be probably futile as there's no solution to pass classes cross realms, or proxies, or any other reference that needed a mechanism to survive postMessage in the first place
  • having a way to survive this algorithm would already unlock a world of potentials. and because the implementation details are left to engines/vendors, enabling at least a way forward would unlock tons of use cases currently stuck, or needing, this possibility to happen

I wrote a polyfill to help paving the path forward and because the current API is one way, I think we should consider that one way forward: you clone and deliver, what happens next, as long as your data was delivered and didn't throw instead because behind a Proxy, or data that can't be sent, is up to you (developer).

@WebReflection
Copy link

the alternative that comes up to mind is a Clonable primitive, where you Cloncable.srialixe and you Clonable.unserialize things so that Symbol.clonable plays a role and the API offers similar JSON extra arguments features where anyone can hook into that feature, and Symbol.clonable plays a role that same way toJSON did to date.

This is a whole new topic to eventually discuss though, and less relevant around current postMessage operations.

@WebReflection
Copy link

last for me, and sorry for the noise, the structuredClone API already has an option field ... so how about we add a serialize and unserialzie option in there, so that Symbol.serialize would invoke that serializer callback when present, and unserialize would be triggered only if the returning object, after serialize, returned a new instance of Serializable primitive?

That way people just serializing values wouldn't care, people interested into unserializing would, and provide their own returns when any data passes through it.

This would unlock even more worlds, and I'd be more thank happy to provide a better polyfill for that in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests