Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document version fetching #218

Closed
alecgibson opened this issue Jun 26, 2018 · 11 comments
Closed

Document version fetching #218

alecgibson opened this issue Jun 26, 2018 · 11 comments

Comments

@alecgibson
Copy link
Collaborator

Problem statement

As a ShareDb client, I want to fetch a read-only snapshot of my document at a given version number or timestamp.

Motivation

By the time we're going to all the effort of storing a set of deltas between document versions, it's only natural that a client would wish to leverage this power to view a document at any point in its history.

The problem statement mentions fetching a document at a given timestamp, because it is far more natural to request a document at a particular time, than at a given (arbitrary) version number.

API

The proposed API is to add two methods to the Doc class:

  • Doc.prototype.fetchVersion(version, callback) takes version, which is a number, and recreates the snapshot up to that version number. The result is stored in doc.data
  • Doc.prototype.fetchAtTime(time, callback) takes time, which is a Date, and recreates the snapshot using ops whose timestamps are up-to-and-including that Date. The result is stored in doc.data

Implementation details

Data flow

The request for the document version will be submitted like the existing fetch function - by submitting an event to the server, and attaching a callback.

The message will be picked up by Agent._handleMessage, which will then leverage Backend.getOps to fetch the requested ops.

We may need to make a small change to Backend.getOps to let us request metadata from the backend using the options object. As discussed in this Pull Request, this will be done in such a way that keeps the option object out of the public API (probably by creating an internal Backend._getOps method that can take an options object, and calling it with null from Backend.getOps).

Discussion

Read-only snapshots

Using the Doc class is potentially a leaky concern, given that it will also have Doc.prototype.submitOp, which doesn't really make sense when fetching an historical document.

A possible alternative could be to expose these functions instead on the Connection class? That way it should be very clear that the consumer is receiving a snapshot, and not a full-blown Doc?

Out of scope

The following possibilities are deemed out-of-scope for the initial solution.

Optimising for reversible types

Making a type reversible is optional. As such, any solution must at least be able to construct a document from its initial version, and build up. However, given the nature of documents, it is highly likely that users will wish to return to more recent versions, where it will probably be faster to start from the current version and work backwards.

This is deemed out-of-scope.

Caching ops

Fetching ops can be expensive. Ideally we would cache the ops for a given document, and - so long as the requested version/timestamp is lower than the latest op - then we could simply read ops back from the cache.

This is deemed out-of-scope.

Other starting snapshot optimisations

It could be possible to fetch the latest create op, and start building from there, instead of the very beginning. It might also (theoretically) be possible to store intermittent snapshots of the document for faster reconstruction at a trade-off with space.

These sorts of optimisations are also deemed out-of-scope.

Doing the work

Given that we need this functionality, I'm happy to undertake the majority of the work on this, but I haven't developed in this codebase before, so may need some assistance (especially because I haven't really worked with all the features of ShareDb, such as projections).

@gkubisa
Copy link
Contributor

gkubisa commented Jun 26, 2018

Great proposal, @alecgibson !

API

An alternative would be to support passing options to fetch, eg: fetch({ version, timestamp }, callback). It seems a bit cleaner to me than adding new methods to Doc which do essentially the same thing. Additionally, we could easily add support for more options in the future.

Implementation details

Data flow

It's not completely clear to me what data you're planning to send to the client, when a specific version is requested. It would be nice if the server sent the requested snapshot, rather then the operations leading up to that snapshot. It could drastically limit the amount of data that needs to be transferred and would enable adding the optimizations that are currently out of scope.

Discussion

Read-only snapshots

Using the Doc class is potentially a leaky concern, given that it will also have Doc.prototype.submitOp, which doesn't really make sense when fetching an historical document.

Even if a "historical" snapshot is loaded into a Doc, it would still be possible to successfully submit new operations - they would simply be transformed against the newer operations stored on the server. I wouldn't necessarily recommend submitting operations for historical versions, however, it's certainly possible.

A possible alternative could be to expose these functions instead on the Connection class? That way it should be very clear that the consumer is receiving a snapshot, and not a full-blown Doc?

It would avoid the problem of shared documents mentioned below. It would also prevent Doc actions which are not recommended on historical snapshots, even though those actions are possible, as I explained above.

Overall I like this idea. What kind of API do you have in mind? Whatever it is, I think it should be consistent with Connection.prototype.get/Doc and Connection.prototype.query/Query.

Sharing of Documents

Another thing to consider is sharing of the Doc instances. It can already cause problems in some cases - see the "Bigger Problem" section at #204 (comment). Similar problems would arise out of loading historical versions into shared documents.

Doing the work

I'm certainly available to help with the implementation. :-)

@alecgibson
Copy link
Collaborator Author

Okay, so I definitely think we should move this logic away from the Doc class. The Doc class is concerned with "live" document operations, such as subscriptions and op submission, neither of which make any sense to historical snapshots.

New API suggestion

How about we put it on Connection.prototype.getSnapshot(collection, id, { version, time }, callback)?

This will return a new Snapshot object, which will look similar to a Doc, but have a subset of its properties (in future, perhaps Doc could extend/have a Snapshot?). Importantly, it will also have no methods on it (it's just a POJO), which helps to reinforce the fact that you're looking at a historical snapshot that cannot be updated by the user, or by the server.

class Snapshot {
  data: any;
  id: string;
  collection: string;
  type: any;
  version: number;
  timestamp: Date; // NB: Not actually currently on Doc, but would be useful
}

Naming

I'm aware that Snapshot could be a possibly overloaded term. I make the argument that in this case, however, it's an accurate description of the object we're returning, and - indeed - a Doc could be considered to have an instance of a Snapshot at any given time, which might also help to firm up some logical boundaries in the code?

That being said, I'm all ears for an alternative name for this object to reduce any confusion or assuage anyone's fears. Possible alternatives:

  • ReadOnlyDoc (sounds too close to Doc to me)
  • DocVersion (again, sounds too related to Doc)
  • SnapshotVersion (my main argument is that all snapshots are technically a single version anyway)
  • Version (too generic - doesn't point to the fact that it has data, and doesn't just represent eg a number)
  • HistoricalDoc/HistoricalSnapshot (does a good job of conveying that it's old and probably read-only, although I still think it might be overcomplicated vs. Snapshot?)
  • DocData/SnapshotData (clearly conveys that we're just dealing with data)
  • other/some combination of above?

@gkubisa
Copy link
Contributor

gkubisa commented Jun 27, 2018

Looks great to me!

Just to clarify the API, is it like this: Connection.prototype.getSnapshot(collection: string, id: string, { version: number, timestamp: Date }, function(Error, Snapshot): undefined): undefined?

What do you think of timestamp being a number rather than a Date? I think I'd prefer a number because it'd be easier to deal with when communicating with the server. Are there any particular reasons to use Date?

@alecgibson
Copy link
Collaborator Author

Yes, that's exactly what I'm picturing for API.

I was thinking that a Date is a more client-friendly API, because one often thinks in dates, and then you can do things like:

connection.getSnapshot("collection", "123", { timestamp: new Date("2018-06-27") }, () => {})

Of course internally we could convert that Date object into a simple number timestamp if you want?

@gkubisa
Copy link
Contributor

gkubisa commented Jun 27, 2018

I'd prefer to use a number internally (client-server communication) to avoid the overhead of generating and parsing date strings.

In the public API maybe we could support a number and Date, and maybe also momentjs? Anyway, it's a detail. I think we agree on the most important aspects of the solution, so it should be fine to start the implementation. 👍

@alecgibson
Copy link
Collaborator Author

Yeah sounds good. We can discuss time interface properly on the pull request. I'll probably start with just the number and Date for now, because they're standard. While I recognise that momentjs is popular, it's not a standard, and it feels a little weird to me to support it, but as I say we can discuss later. I'm hoping to get cracking on a solution at the beginning of next week, so watch this space for a Pull Request!

@dcharbonnier
Copy link
Contributor

Number and Date would be a good solution, personally I would not add momentjs that is a great library but like many others is over used and I would not encourage people to use it or even worse introduce this dependency on sharedb even as a dev dependency.

@alecgibson
Copy link
Collaborator Author

@gkubisa I've started looking into this, and thinking about it a bit more. I'm not sure that defining the version through an object is the correct thing to do. For example, what happens when a user calls the function with the following:

{
  timestamp: new Date("2018-06-29"),
  version: 4
}

We could return an error, but if the point is extensibility, then error checking surely gets more and more annoying the more ways we have of defining the version:

if ( (version && timestamp) || (version && newDefinition) || (timestamp && newDefinition) ) return callback(error);

If we really don't want to have multiple methods, how about we just have our argument accept either a version number or a Date?

Connection.prototype.getSnapshot = function(collection, id, version, callback) {
  if (typeof version === 'number') {
    // it's a version number
  } else if (version instanceof Date) {
    // it's a date
  } else {
    return callback({ message: 'Version must be a version number or a Date object' });
  }
};

While we're here - how do error codes work? If an error doesn't fit into an existing code, do I just claim a new one and record it in the README?

@alecgibson
Copy link
Collaborator Author

Also, regarding deleted documents, what's the expected behaviour? If a document is currently deleted (ie its final op is a deletion op), then do we tell the client that the document is deleted, regardless of version they try to fetch?

@alecgibson
Copy link
Collaborator Author

Okay, I've raised a PR here: #220
I suggest further discussion happens there.

@alecgibson
Copy link
Collaborator Author

This feature was added in this PR: #220

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants