-
Notifications
You must be signed in to change notification settings - Fork 39
CIDs across the message channels #109
Comments
Keep in mind that
There is no longer a We also no longer cache the So, the new {
version: 0|1,
code: int,
multihash: Uint8Array,
buffer: Uint8Array
}
We have a similar problem in
It’s tempting to use some kind of view on the binary representation of the CID here as well but unfortunately the way we tend to support CIDv0 is through detection that depends on the string encoding. In the new |
What's does the
Different way to frame this issue would be, can we work out a more general solution that can be recognized across the code base instead of needing a codec specific solutions like this.
I think backwards compatibility is obviously important, however I think that we can make progress on better CID representation separate from that, with expectation that codecs may need to do some detection to recognize former encodings like base58btc strings.
I'm not sure this helps with described problem. Were you suggesting to serialize CIDs via |
It’s the full binary representation of the CID as a Uint8Array.
A few years back there was a belief that we could have a “canonical JSON representation” of IPLD and that should be used everywhere as an intermediary format. This is still visible in a lot of the HTTP API’s in IPFS. That turned out not to work so well which is why we defined the “IPLD Data Model” to have top level types for Link (CID) and Binary.
I’m saying that if you want to use if (CID.isCID(value)) value = { “/“: value.toString() } You actually cannot do this with |
So here is rough proposal what we could do to address this problem (I'm making a lot of assumptions here, so please call out when those are incorrect).
What I would like to propose is:
I think this could be achieved by:
With all the above problem described goes away, because things coming on the other end of the message channel would still represent CIDs and be treated as such. Question however is:
|
Are you suggesting that if I do following: const dagCBOR = require('ipld-dag-cbor')
dagCBOR.util.serialize({ link: {'/': stringCID} }) It would be recognized as link ? As far as I can tell it will not. More relevant question (assuming above is Yes or could be made Yes) can we change CID implementation such that once you do following: worker.port.postMessage({ link: new CID(...) }) what comes out on the other end would be: { link: {'/': stringCID} } Or at least something that |
That’s sort of what I assumed we would end up doing. In whatever the communication bridge between the worker is you would need to serialize and deserialize CID’s with a little extra work.
I’m pretty sure this will break a lot of stuff. I’d need some time to work through all the consequences of There are a lot of external consumers of this interface that use the properties you’d be deprecating. So much so that, even though we’re doing a breaking change to Frankly, this is a big step backwards from IPLD’s point of view. This isn’t the first time we’ve had to figure out how to represent a |
This implies a requirement that I don’t believe we have. I would assume that these worker processes aren’t just responding to every random message anyone decides to post. The API will actually be something along the lines of const resp = await ipfsWorker.send({ link: new CID(...) }) And that will turn into something along the lines of worker.port.postMessage({ ipfsMessage: { msgID: id, data: { link: { “/“: stringCID }}}) The other side can then de-serialize that form into a CID instance before it hands it off to other code. I don’t see a compelling reason why all libraries, and the entire ecosystem of modules, needs to move towards accepting the serialized Yes, this means we suffer some extra base encoding/decoding but unless we want to drop support for CIDv0 I think we’re going to end up eating that anyway. |
These are good points. I am also recognizing that overloading What about the following instead:
Which I could be achieved as follows:
Figuring out what the |
I am not suggest to not make |
It is not however some message payloads do not have predefined structures (as they come from user space) and there for could contain
Motivation here is
Getting both would be ideal, but getting even 1 would be good. |
So here is a more concrete example based on input I got here and discussion with @mikeal on a different sync channel. I understand that having a canonical top level type is important for IPLD team and using shapes to recognize types feels like a step backwards. However I think there are ways to achieve a representation that can't be accidentally be mistaken for something else e.g.: class CID extends Blob {
constructor(...args) {
const {version, code, multihash, buffer } = parseCID(...args)
super([ buffer ], { type: "application/cid" })
this.version = version
this.code = code
this.multihash = multihash
this.buffer = buffer
}
// all the existing methods go here
static match(input) {
if (input instanceof CID) {
return input
} else if (input instanceof Blob && input.type === "application/cid") {
return new CID(input)
}
}
}
port.postMessage({ link: new CID(...) })
// comes out on the other end as `blob` with type `'application/cid'`
// Only thing codecs would have to change is instead of doing `CID.isCID(input)` switch to
const cid = CID.match(input)
if (cid) {
// ...
} |
I would like to also note that this is not just about passing things around worker threads. It's bit more general web platform thing, e.g storing things in IndexedDB has exact same constraints as in it goes through structured cloning. I believe that picking a representation that is compatible with structured cloning constraints is going to be win long term. Just as with node |
One thing that came out in conversation with @mikeal
Acyclic nature of DAG could be exploited to represent links that otherwise could not be part of the DAG - Represent it with a cycle. e.g. That way there is no chance that |
@Gozala can you explain this last comment a bit more? I don't understand how this is relevant to the discussion here. fwiw I really like the idea of subclassing Uint8Array but see the current The new js-multiformats work has messed up |
So as I understand main concern with not relying on Which lead me to idea that we could take advantage of asyclic nature of DAG to represent links (CIDs) in a (cyclic) shape that otherwise would be invalid.
The way I see it
I feel like I don't have enough context here. |
Few note here:
|
In the context ipfs/js-ipfs#3022 I'm running into problem with
CID
s that is not too dissimilar from ones that come fromBuffer
s. I would like to bring those up here for a discussion.Problem
When you post things across browsing contexts data browsers perform structured cloning which implies that only handful of things will come out with the same prototype chain on the other end. Including table from MDN below:
Supported types
lastIndex
is not preserved.Blob
File
FileList
ImageBitmap
ImageData
This implies that node
Buffer
s come out asUint8Arrays
(as they are sub-classes) andCID
s come out as plain objects, that is following most of the time:Which is problematic because e.g. dag-cbor encoder will not recognize such structures as
CID
and would encode it as JSON. In other words only way for things to work across message-channel we need either to:DeepCopy+Encode -> Structure Clone -> Traverse+Decode
Structure Clone -> Traverse+Decode
Neither of two is ideal. What would be much better if
CID
had a canonical representation that is not tied to specific class. Kind of howArrayBuffer
represents bytes andUint8Array
is just a view.I am recognizing that this might be almost impossible change to make at this point, however it is still worse having a discussion I think. Furthermore there might be certain compromises that may provide workable compromise. E.g. if all the IPFS internals did recognized canonical representation which is not tied to CID class that would remove need for decode / encode at least on messages going from client to server. I can also imagine that new
Block
API might be made to recognize that canonical representation which would effectively remove need for arbitrary traversals and limit CID encode / decode to very specific API endpoints./cc @mikeal @vmx @achingbrain @hugomrdias @lidel
The text was updated successfully, but these errors were encountered: