Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Userland transferable/cloneable objects #37080

Open
jasnell opened this issue Jan 26, 2021 · 20 comments
Open

Userland transferable/cloneable objects #37080

jasnell opened this issue Jan 26, 2021 · 20 comments
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. discuss Issues opened for discussions and feedbacks. feature request Issues that request new features to be added to Node.js. node-api Issues and PRs related to the Node-API. worker Issues and PRs related to Worker support.

Comments

@jasnell
Copy link
Member

jasnell commented Jan 26, 2021

@addaleax @nodejs/n-api @nodejs/workers ...

Currently there is no way for userland code to define their own cloneable/transferable objects that work with the MessageChannel/MessagePort API.

For instance,

class Foo {
  a = 1;
}

const foo = new Foo();

const mc = new MessageChannel();
mc.port1.onmessage = ({ data }) => console.log(data);   // outputs {}
mc.port2.postMessage(foo);

The structured clone algorithm takes the Foo object and clones it as an ordinary JavaScript object that has lost information.

With Node.js built in objects we are able to get around this because at the native layer, node::BaseObject is a "host object" in v8 terms, allowing us to delegate serialization and deserialization to our own provider. We use this, for instance, to transfer KeyObject, X509Certificate, and FileHandle objects in a way that just works.

The node::BaseObject base class has a set of APIs that allow an object to declare itself as being cloneable or transferable, along with functions that provide the serialization and deserialization delegates. For JavaScript objects, we provide the JSTransferable base class that derives from node::BaseObject. The JavaScript KeyObject, X509Certificate and FileHandle objects derive from JSTransferable. On deserialization of a cloned or transfered object, JSTransferable will parse information out of the serialized image and use that to require() and create the implementation class on the receiving side. It all just works.

The challenge, however, is that user code cannot use either node::BaseObject or JSTransferable.

On the JavaScript side, JSTransferable is internal only, and is not exposed to userland code. Even if it was exposed to userland code, it wouldn't work because the internal require() function it uses during deserialization only sees the Node.js built in modules and will not resolve userland code. Also, it uses require() and would not work with ESM modules.

On the C++ side, the node-api napi_define_class() does not define the JavaScript/C++ object wrapper in the same way as what we use internally with BaseObject. Specifically, node-api wrapper objects are not seen as host objects by v8 and therefore will use the default structured clone algorithm to clone only the enumerable JavaScript object properties on the wrapper object. Because these objects would need to provide their own serialization/deserialization logic it's not clear if napi_define_class() can be modified in a backwards compatible way such that the objects can be host objects.

What I'd like to be able to achieve is efficiently and easily creating both JavaScript classes and native wrapper objects that can be cloned/transferred intact over MessagePort but it's not clear at all exactly how we should do it.

There is an argument that can be made for Just Using ArrayBuffers(tm). Specifically, have user code serialize it's objects into an ArrayBuffer, then just transfer/copy that ArrayBuffer over the MessagePort and manually deserialize on the other side. That's obviously far less efficient. For comparison, consider how we handle other native objects like KeyObject and X509Certificate. For those, we are able to clone the JavaScript wrapper while avoiding the cost of copying and reallocating the underlying native data structures. The custom serialization/deserialization/copy is far less efficient.

I'm opening this issue to see if we can discuss and identify an approach to implementing this that would work, as the fix is not clear and touches on several subsystems (node-api, messaging, workers, etc).

@jasnell jasnell added c++ Issues and PRs that require attention from people who are familiar with C++. discuss Issues opened for discussions and feedbacks. node-api Issues and PRs related to the Node-API. worker Issues and PRs related to Worker support. labels Jan 26, 2021
@targos
Copy link
Member

targos commented Jan 26, 2021

Should we try to think about what would be the best API from the user's perspective first (without checking if it's implementable at all)?

@mhdawson
Copy link
Member

@targos that seems like a good starting point and then go from there towards something we can implement.

@jasnell
Copy link
Member Author

jasnell commented Jan 26, 2021

To start looking at the API, here are examples of how this is implemented currently inside core...

First, from the JavaScript side:

// This is the public class... note the constructor
class Whatever extends JSTransferable {
  constructor(a, b) {
    this.a = a;
    this.b = b;
  }

  [kClone]() {
    // The data property here contains the data that is included
    // in the serialized buffer. The deserializeInfo is used on
    // the receiving end to identify the internal class to create
    // when deserializing. Note that here we identify InternalWhatever.
    // We do that because when the object is deserialized, we will
    // use a default constructor then set the properties separately
    // using the kDeserialize function, rather than calling the
    // constructor above.
    return {
      data: {
        a: this.a,
        b: this.b,
      },
      deserializeInfo: 'internal/whatever:InternalWhatever'
    };
  }

  [kDeserialize]({ a, b }) {
    this.a = a;
    this.b = b;
  }
}

class InternalWhatever extends JSTransferable {}

InternalWhatever.prototype.constructor = Whatever;
ObjectSetPrototypeOf(InternalWhatever.prototype, Whatever.prototype);

module.exports = {
  Whatever,
  InternalWhatever,
};

Then, on the C++ side

class WhateverObject : public BaseObject {
 public:
  // ...

  class WhateverTransferData : public worker::TransferData {
   public:
    explicit WhateverTransferData(const std::shared_ptr<WhateverData>& data)
        : data_(data) {}

    BaseObjectPtr<BaseObject> Deserialize(
        Environment* env,
        v8::Local<v8::Context> context,
        std::unique_ptr<worker::TransferData> self) override;

    // ...

   private:
    std::shared_ptr<WhateverData> data_;
  };

  BaseObject::TransferMode GetTransferMode() const override;
  std::unique_ptr<worker::TransferData> CloneForMessaging() const override;

 private:
  std::shared_ptr<WhateverData> handle_data_;
};

For the JavaScript-side, from a userland perspective, what I'd like to have is something very similar, except perhaps without actually requiring the extends JSTransferable inheritance.

const {
  clone_symbol: kClone,
  deserialize_symbol: kDeserialize
} = require('worker_threads');

class Foo {
  [kClone]() {
    return {
      data: { /*...*/ },
      deserializeInfo: 'foo:InternalFoo'
    };
  }

  [kDeserialize]({ /* ... */ }) {
    /* ... */
  }
}

class InternalFoo {}
InternalFoo.prototype.construtor = Foo;
Object.setPrototypeOf(InternalFoo.prototype, Foo.prototype);

module.exports = {
  Foo,
  InternalFoo,
};

When an instance of Foo is passed in to postMessage(), it would be fantastic if the fact that it exposes [kClone] and [kDeserialize] functions would have it be recognized as a host object (using a BaseObject delegate).

For the node-api side, I'd like something similar, using a napi_define_transferable_class() method to define a wrap object class that is cloneable or transferable, and having that Just Work(tm) with the wrapper class exposing functions similar to GetTransferMode/CloneForMessaging...

NAPI_CALL_RETURN_VOID(env, napi_define_transferable_class(
    env, "MyObject", -1, New, nullptr,
    sizeof(properties) / sizeof(napi_property_descriptor),
    properties, &cons));

class MyObject {
 public:
  static void Init(napi_env env, napi_value exports);
  static void Destructor(napi_env env, void* nativeObject, void* finalize_hint);

  static napi_value GetTransferMode(napi_env env, void* nativeObject);
  static napi_value CloneForMessaging(napi_env, void* nativeObject);
  // etc...

 private:
  explicit MyObject(double value_ = 0);
  ~MyObject();

  static napi_value New(napi_env env, napi_callback_info info);
  static napi_value GetValue(napi_env env, napi_callback_info info);
  static napi_value SetValue(napi_env env, napi_callback_info info);
  static napi_value PlusOne(napi_env env, napi_callback_info info);
  static napi_value Multiply(napi_env env, napi_callback_info info);
  static napi_ref constructor;
  double value_;
  napi_env env_;
  napi_ref wrapper_;
};

There would obviously be some details to work through in that but having this should give a start.

@jasnell
Copy link
Member Author

jasnell commented Jan 26, 2021

On the javascript side, another option is a Symbol based delegate approach...

const { transfer_handler_symbol: kTransferHandler } = require('worker_threads');

class FooTransferHandler {
  clone() { }
  deserialize() { }
}

class Foo {
  static get [kTransferHandler]() { return new FooTransferHandler(); }
}

@addaleax
Copy link
Member

So … a Symbol-based approach is something that I would consider ideal as well, and I would have picked it for Node.js core if it was technically feasible. However, without support from V8, this is effectively a non-starter – as it stands, the only way to give an object special treatment for transfer is to make it inherit from a C++-backed object.

It’s also not obvious how we would deal with this in terms of how to deserialize – one of the reasons why we only support require() with internal methods is that those are the only ones that we can assume to be present in all execution contexts without relying on external factors (because userland require() depends on the module from which it was called, or at the very least a specific on-disk layout).

On the native side, this is easier from a technical point of view, and providing a C++-style API for this seems like something that would be doable to some degree. For a N-API-based approach, I feel like things become a bit less clear, in particular because N-API uses napi_env as its main gateway for communicating with the Node.js source code – but napi_env is bound to a specific v8::Context/v8::Isolate/add-on instance. How one for the target environment would be created seems to be a tricky question (for example: if we automatically were to create a napi_env, how would the napi_(get|set)_instance_data for that look like? What would its lifetime be?).

There’s also the added difficulty of dealing with addon lifetimes in general – i.e. we’ll need to ensure that the addon isn’t unloaded before the transferred object has reached its target environment.

So, lots of challenges here, and a potentially wide API surface – we could do something here if we want to, but maybe it makes sense to first have a conversation with the V8 team about what we could do in terms of adding Symbol-based transferable support?

@jasnell
Copy link
Member Author

jasnell commented Jan 26, 2021

... but maybe it makes sense to first have a conversation with the V8 team about what we could do in terms of adding Symbol-based transferable support?

+1 to this for sure. /cc @nodejs/v8

@jimmywarting
Copy link

Don't you quite often extend your class on something else?

class Foo extends EventTarget { }

Maybe symbol would be a better alternative?

@jasnell
Copy link
Member Author

jasnell commented Mar 10, 2021

Yes, the inheritance is an issue. A symbol approach would be ideal but won't work until v8 supports it.

@marjakh
Copy link
Contributor

marjakh commented Mar 11, 2021

Hi from v8,

If I understand this correctly, you'd like to have a special Symbol you can install on an object, and when that object gets transferred, some user-defined JS will be called, and that will then return something transferable; on the other side that transferable is given to your other function for deserialization. But you're not looking into extending transferability e.g., in a way that would allow you to transfer functions and have them magically work. Is that roughly correct?

Extending the V8 API via a Symbol like that sounds possible ( cc @camillobruni for API).

@jimmywarting
Copy link

jimmywarting commented Mar 11, 2021

I would like for this to work in browser also with postMessages.

There are related work done here:
https://github.com/littledan/serializable-objects
https://github.com/Microsoft/napajs/blob/master/docs/api/transport.md
it's in Deno's Q1 2021 plans denoland/deno#3557 & denoland/deno#9458

i thought napajs sounds cool

For user classes that implement the Transportable interface, Napa uses Constructor ID (cid) to lookup constructors for creating a right object from a string payload. cid is marshalled as a part of the payload. During unmarshalling, the transport layer will extract the cid, create an object instance using the constructor associated with it, and then call unmarshall on the object.

Means that you can clone/serialize mostly anything with a json rest API, putting things in indexedDB/localStorage for instances.

@marjakh
Copy link
Contributor

marjakh commented Mar 11, 2021

Taking a step back, I'll ask a fundamental(ly stupid) question: Since Node implements the MessageChannel.postMessage function, why can't that function inspect the incoming object and see if it has the special Symbol installed? Why would it need to be plugged into V8?

@jasnell
Copy link
Member Author

jasnell commented Mar 11, 2021

@marjakh :

you'd like to have a special Symbol you can install on an object, and when that object gets transferred, some user-defined JS will be called, and that will then return something transferable; on the other side that transferable is given to your other function for deserialization. But you're not looking into extending transferability e.g., in a way that would allow you to transfer functions and have them magically work. Is that roughly correct?

Yes. Take for instance an object like URL, which is generally not cloneable because it relies on the URL class. V8's current structured clone implementation currently only looks to see if it's been handed a host object, and if it does, allows our serialization delegate to take over. If it's not a host object, then V8 just applies the default clone algorithm that handles it generically. This new symbol would give an additional option for indicating that a serialization/deserialization delegate should be used as well as provide that delegate. There are definitely some details to work through on how we would identify the deserializer to use.

Since Node implements the MessageChannel.postMessage function, why can't that function inspect the incoming object and see if it has the special Symbol installed? Why would it need to be plugged into V8?

Let's say we make the URL object cloneable using the symbol, then attach it as a property to an ordinary object:

const a = {
  b: new URL('https://example.org/foo')
};

mc.port1.postMessage(a);

The Node.js postMessage() implementation passes a off to v8's structured clone algorithm for handling and does not walk the structure at all. V8 takes over and will only engage the serialization delegate if/when it encounters a host object.

Node.js could walk the objects but only with either a significant performance cost or at the cost of implementing a whole new structured clone implementation.

@targos
Copy link
Member

targos commented Mar 11, 2021

It would also be nice if whatever we end up with also works with the Serialization API

@addaleax
Copy link
Member

@targos The good part is that I think that that would automatically be the case. :)

@marjakh
Copy link
Contributor

marjakh commented Mar 12, 2021

I see; in that case, extending V8's structured clone algorithm for Node's purposes sounds okay-ish.

However, if you'd like a unified solution that also works in the browsers, then the way to go would be to advance the standardization efforts of e.g., https://github.com/littledan/serializable-objects .

@benjamingr
Copy link
Member

I see; in that case, extending V8's structured clone algorithm for Node's purposes sounds okay-ish.

The way I see it, such work would help existing standardisation efforts and provide feedback to the standardisation process.

So by providing Node.js such an API to use as a consumer in our APIs we will also be eventually providing valuable feedback to browsers.

@marjakh
Copy link
Contributor

marjakh commented Mar 15, 2021

cc @syg

@marjakh
Copy link
Contributor

marjakh commented Mar 15, 2021

We should then probably pick the most promising direction (standardization-wise) for an implementation candidate. E.g., if it is the "Serializable objects", V8 could implement the interface behind a flag that Node then passes. Should be ok to impl before it actually is standardized, or wdyt, @syg ? (Ditto for the Symbol approach - but if that's not the way it's going to be standardized, we shouldn't probably implement that.)

@addaleax
Copy link
Member

Fwiw, the serializable-objects repository you mentioned above is basically exactly what we’d be looking for, with a lot of the same design considerations (although things like cross-origin handling are less relevant for Node.js).

@mhdawson
Copy link
Member

We discussed in the node-api team meeting today and it still seems applicable/would be good to investigate.

@mhdawson mhdawson added the feature request Issues that request new features to be added to Node.js. label Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. discuss Issues opened for discussions and feedbacks. feature request Issues that request new features to be added to Node.js. node-api Issues and PRs related to the Node-API. worker Issues and PRs related to Worker support.
Projects
Status: Awaiting Triage
Development

No branches or pull requests

7 participants