-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Milestone Snapshots #236
Conversation
DiscussionAPII've added these new methods to the driver API. They're non-breaking changes. Everything will continue to work even if a driver doesn't implement the new methods. Milestone snapshots are also off by default. In theory, we don't need to expose BackfillingIn its current state, this change will add milestone snapshot functionality, but only for new snapshots. That is, let's assume I already have a database with a document with 100 versions. If I now turn on milestone snapshots with an interval of 10, I will only generate milestones at v110, v120, v130, etc., and still won't have them for v10, v20, v30 ... v100. This isn't ideal, but it's also no worse than the current performance. It may also be excusable in the (common?) cases where users will want to access the most recent snapshots anyway. Note that this case also applies if we ever want to change the interval (ie if I have milestones every 10 versions, but I'd like them every 5 instead, then I'd be missing v5, v15, etc. for old documents). These are the approaches I can think of for this:
Default intervalAt the moment, I've set the default milestone interval to 1,000. This is mostly based on a finger-in-the-wind gut feel (10 is definitely too frequent, 10,000 definitely too infrequent. We probably want something between 100-1,000, and 1,000 is a bit more conservative with regards to disk space). Obviously what a consumer will want will vary depending on server performance stats, typical load, how they want to optimise disk vs speed, etc. Do people agree with 1,000 as a "sensible" default? Or would people even prefer no default at all, and force consumers to pick a value? (I personally think everything should have sensible defaults out of the box, but I know opinion can be divided on this). |
592c392
to
7a4b4c8
Compare
This non-breaking change introduces the concept of "Milestone Snapshots". A milestone snapshot is a snapshot of a document for a given version, which is persisted in the database. The purpose of this is to speed up the `Backend.fetchSnapshot` method, which currently has to fetch all the ops required to build a snapshot from v0. Instead, `fetchSnapshot` can now fetch the most recent, relevant milestone snapshot and build on top of that with fewer ops. In order to do this, the database adapter API has been updated to include two new methods: - `saveMilestoneSnapshot(collection, snapshot, callback): void;` stores the provided snapshot against the collection - `getMilestoneSnapshot(collection, id, version, callback): void` fetches the most recent snapshot whose version is equal to or less than the provided `version` (or the most recent version if version is `null`, in keeping with the `to` argument in `getOps`). The adapter also has the responsibility of saving the appropriate milestone snapshots when a new op is committed.
7a4b4c8
to
1adbe74
Compare
We recently [added Milestone Snapshots][1] to the database adapter API as an optional set of methods for speeding up snapshot fetches for a given version. This change implements those optional methods. [1]: share/sharedb#236
We recently [added Milestone Snapshots][1] to the database adapter API as an optional set of methods for speeding up snapshot fetches for a given version. This change implements those optional methods. [1]: share/sharedb#236
Looks great to me @alecgibson 👍 Re saving milestone snapshots on commit, I think they should be saved asynchronously after commit, and any errors when saving the milestone snapshots should not cause the commit to fail. This way the commit latency and the current implementation would not be affected - after all, milestone snapshots are just an optimization. Re backfilling, my preference would be to "Automagically backfill on Re default interval, I think we should not save any milestone snapshots by default. This way we would not accidentally use excessive storage for users who do use |
Great work! |
@ericyhwang / @nateps as discussed, I have:
@gkubisa agreed with non-blocking. That should have been addressed in these recent changes. However, regarding backfilling, we decided to leave this as an exercise to the reader, because:
|
43cc125
to
55ed07a
Compare
The concept of milestone snapshots can be completely decoupled from the existing database adapter. This change reflects that concept in code by extracting the milestone snapshot methods from the core database adapter and into their own `MilestoneDB` class. This means that a milestone database could be implemented using a completely separate database from the primary ShareDB database. As well as this change, some tweaks to the interface have been made: - Saving a snapshot does not require a callback. It emits an error if no callback is provided. This allows us to make saving a milestone on commit non-blocking, which avoid slowing down commits. - No default interval is provided for storing milestones. A default could be implemented by any adapters extending the base class, which can choose different defaults depending on the implementation.
This change adds a flag to `SubmitRequest`: `saveMilestoneSnapshot`. This flag defaults to `null`. If this value is left as `null`, then milestone snapshots will be written according to the interval set in the options passed into the provided `MilestoneDB` instance. The value can be overridden in middleware, like in the test case in this commit. If the value of `request.saveMilestoneSnapshot` is changed to a `boolean`, then all interval logic is ignored, and the value of that flag is used directly to determine if a milestone should be saved. This allows more complex milestone logic, such as: - saving a milestone once per day - different intervals depending on collection - non-uniform milestone intervals
55ed07a
to
10902b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good! Thanks as always for keeping up test coverage.
Comments are mostly minor things. Biggest request is to make callbacks consistently async, and I also posed a question for discussion, around using MilestoneDB or MemoryMilestoneDB as the default.
Nate and I have some time booked tomorrow morning our time, and if he has no additional comments, then I can handle the merge and publish after this round of review!
lib/db/memory.js
Outdated
@@ -1,5 +1,6 @@ | |||
var DB = require('./index'); | |||
var Snapshot = require('../snapshot'); | |||
var ShareDBError = require('../error'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the refactor, this require is now unused and can be removed.
lib/backend.js
Outdated
@@ -24,6 +25,7 @@ function Backend(options) { | |||
this.pubsub = options.pubsub || new MemoryPubSub(); | |||
// This contains any extra databases that can be queried | |||
this.extraDbs = options.extraDbs || {}; | |||
this.milestoneDb = options.milestoneDb || new MemoryMilestoneDB(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit wary of using MemoryMilestoneDB as the default here. I do see that it doesn't store any snapshots unless it's constructed with {interval: <integer>}
, so it won't accumulate snapshots in-memory by default.
Posing a question for discussion - Thoughts on using the base MilestoneDB as the default instead, for virtually no overhead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was in two minds about this. I used MemoryMilestoneDB
for consistency with MemoryDB
and MemoryPubSub
, but actually MemoryMilestoneDB
just acts like the dumb MilestoneDB
when initialised like this, so it's probably better to just use the base implementation. If you wanted to set an interval, you'd need to re-initialise it and explicitly pass it in anyway.
I'll switch for the base implementation (which was written to be completely harmless anyway), and then it's super clear that nothing's happening by default.
lib/milestone-db/memory.js
Outdated
function MemoryMilestoneDB(options) { | ||
MilestoneDB.call(this, options); | ||
|
||
// Map form collection name -> doc id -> array of milestone snapshots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo, form
-> from
lib/milestone-db/memory.js
Outdated
MemoryMilestoneDB.prototype.getMilestoneSnapshot = function (collection, id, version, callback) { | ||
var milestoneSnapshots = this._getMilestoneSnapshotsSync(collection, id); | ||
|
||
let milestoneSnapshot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let
-> var
, as ShareDB code doesn't use ES6 syntax yet - I don't think there's a specifically declared policy on browser compatibility.
Server-only code could certainly start using ES6 features, as the oldest supported version of Node is 6 LTS. It'd be nice to use Map for higher performance than object properties. However, it isn't always obvious what's server-only vs client-only vs shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha whoops, yeah that was just a force of habit from my day-to-day work. I thought this would get picked up by the linter? (It certainly had a go at me for using the spread operator)
lib/milestone-db/index.js
Outdated
emitter.mixin(MilestoneDB); | ||
|
||
MilestoneDB.prototype.close = function(callback) { | ||
if (callback) callback(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base and memory DB implementations aren't super-consistent about ensuring their callbacks are async, but for these two new milestone classes, I think it'd be good to consistently process.nextTick
before calling callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me. I was unsure why I'd seen so many nextTick
s around (I work in TypeScript and avoid callbacks wherever possible!). I had a read of this blog post. Is that pretty much the gist of what we're trying to achieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a read of this blog post. Is that pretty much the gist of what we're trying to achieve?
Yup, that's exactly it!
This commit makes some changes requested in the [pull request][1]. Namely: - provide the base class of `MilestoneDB` to `Backend` by default to be super-clear that no milestones will be saved by default - making all callbacks call asynchronously using [`process.nextTick`][2] - cosmetic changes [1]: share#236 [2]: https://jsblog.insiderattack.net/event-loop-best-practices-nodejs-event-loop-part-5-e29b2b50bfe2
@ericyhwang I've made the changes you requested. Please have another look and let me know if there's anything else. Also, I've started work on a Mongo implementation of this. Should it live in the |
We recently [added Milestone Snapshots][1] to the database adapter API as an optional set of methods for speeding up snapshot fetches for a given version. This change implements those optional methods. [1]: share/sharedb#236
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The base bones of `MongoMilestoneDB` are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES7 for easier maintenance, especially when dealing with asynchronous code, as this gives us access to the `async`/`await` notation (although we can't realise its full potential due to needing to deal with consumer callbacks to be consistent with the rest of ShareDB). The move to ES7 drops support for Node v6. This library notably uses v2 of [`mongodb`][3] instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. The version of [`istanbul`][4] is also bumped to the alpha version to [support ES7 syntax][5]. We also move away from `jshint` and use `eslint` with some more aggressive linting. [1]: share/sharedb#236 [2]: https://github.com/share/sharedb-mongo [3]: https://mongodb.github.io/node-mongodb-native/ [4]: https://github.com/gotwarlost/istanbul [5]: gotwarlost/istanbul#733
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The base bones of `MongoMilestoneDB` are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES7 for easier maintenance, especially when dealing with asynchronous code, as this gives us access to the `async`/`await` notation (although we can't realise its full potential due to needing to deal with consumer callbacks to be consistent with the rest of ShareDB). The move to ES7 drops support for Node v6. This library notably uses v2 of [`mongodb`][3] instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. The version of [`istanbul`][4] is also bumped to the alpha version to [support ES7 syntax][5]. We also move away from `jshint` and use `eslint` with some more aggressive linting. [1]: share/sharedb#236 [2]: https://github.com/share/sharedb-mongo [3]: https://mongodb.github.io/node-mongodb-native/ [4]: https://github.com/gotwarlost/istanbul [5]: gotwarlost/istanbul#733
NB: I've thrown together an initial implementation of this for Mongo here: https://github.com/alecgibson/sharedb-milestone-mongo It's (controversially?) written in ES7, but I guess we can discuss that when we get to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nate said he didn't need to take another look, so I'll get this merged and published now.
We probably do want it in the share org eventually. I'm not sure if there's a way open-source projects typically handle this? @nateps - you have any thoughts?
I don't know enough about transfers to have a strong opinion one way or another, and either way you'll get commit authorship credit. With the second option, we'll probably want to add you as a writer to the new repo - as the original author, you'd know the code the best.
Yeah, worth a discussion. There's an argument for keeping syntax consistency within an org's projects so contributors don't have to jump back and forth between coding styles, and if we want to keep supporting, say, IE11, we can't use async functions in any client or shared code. |
Published as [email protected] 🎉 |
I think transferring to the Share org sounds great. I added you as a contributor, so you should be able to make repos. We should keep doing code reviews, and I think we're in a good cadence on that now. As far as ES7 goes, I'd like to avoid it for anything that has browser code and stick to good old ES5. I'd prefer to avoid the need for adding additional compilation complexity. These libraries are low level enough that they should be pretty easy to pull into any other framework or project, so I'd like to keep to very tried and true JavaScript. If it runs on the server, it should work in all versions of Node.js that are still in LTS, so 6+ at this point. We should definitely be consistent on code style within a repo, but I'm OK with having new repos be in more modern code style if it is the direction that we'd like to move in with all repos. If we were going to introduce a compiler requirement, we should discuss which direction we'd like to go: Regular ES7, TypeScript, or something else. But let's start that discussion on a more appropriate thread instead of this PR. |
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The bare bones of `MongoMilestoneDB` connection logic are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES6. This is because this code is server-side-only, so we don't need to support legacy browsers and all LTS Node versions support ES6. The move should ease maintenance and understanding of asynchronous code through the use of `Promise`s, which the [`mongodb`][3] library supports out of the box. This library notably uses v2 of `mongodb` instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. We also introduce usage of [`eslint`][4] instead of [`jshint`][5], which is used in other projects. `eslint` supports a much, much wider range of style rules and helps to maintain a uniform coding style. [1]: share/sharedb#236 [2]: share/sharedb-mongo [3]: mongodb.github.io/node-mongodb-native [4]: https://eslint.org/ [5]: http://jshint.com/
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The bare bones of `MongoMilestoneDB` connection logic are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES6. This is because this code is server-side-only, so we don't need to support legacy browsers and all LTS Node versions support ES6. The move should ease maintenance and understanding of asynchronous code through the use of `Promise`s, which the [`mongodb`][3] library supports out of the box. This library notably uses v2 of `mongodb` instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. We also introduce usage of [`eslint`][4] instead of [`jshint`][5], which is used in other projects. `eslint` supports a much, much wider range of style rules and helps to maintain a uniform coding style. [1]: share/sharedb#236 [2]: share/sharedb-mongo [3]: mongodb.github.io/node-mongodb-native [4]: https://eslint.org/ [5]: http://jshint.com/
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The bare bones of `MongoMilestoneDB` connection logic are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES6. This is because this code is server-side-only, so we don't need to support legacy browsers and all LTS Node versions support ES6. The move should ease maintenance and understanding of asynchronous code through the use of `Promise`s, which the [`mongodb`][3] library supports out of the box. This library notably uses v2 of `mongodb` instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. We also introduce usage of [`eslint`][4] instead of [`jshint`][5], which is used in other projects. `eslint` supports a much, much wider range of style rules and helps to maintain a uniform coding style. [1]: share/sharedb#236 [2]: share/sharedb-mongo [3]: mongodb.github.io/node-mongodb-native [4]: https://eslint.org/ [5]: http://jshint.com/
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The bare bones of `MongoMilestoneDB` connection logic are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES6. This is because this code is server-side-only, so we don't need to support legacy browsers and all LTS Node versions support ES6. The move should ease maintenance and understanding of asynchronous code through the use of `Promise`s, which the [`mongodb`][3] library supports out of the box. This library notably uses v2 of `mongodb` instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. We also introduce usage of [`eslint`][4] instead of [`jshint`][5], which is used in other projects. `eslint` supports a much, much wider range of style rules and helps to maintain a uniform coding style. [1]: share/sharedb#236 [2]: share/sharedb-mongo [3]: mongodb.github.io/node-mongodb-native [4]: https://eslint.org/ [5]: http://jshint.com/
@ericyhwang / @nateps I've opened a pull request for the Mongo Implementation here: share/sharedb-milestone-mongo#1 It's been downgraded to vanilla ES6 to support Node LTS (down to v6). We can continue discussion there. |
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The bare bones of `MongoMilestoneDB` connection logic are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES6. This is because this code is server-side-only, so we don't need to support legacy browsers and all LTS Node versions support ES6. The move should ease maintenance and understanding of asynchronous code through the use of `Promise`s, which the [`mongodb`][3] library supports out of the box. This library notably uses v2 of `mongodb` instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. We also introduce usage of [`eslint`][4] instead of [`jshint`][5], which is used in other projects. `eslint` supports a much, much wider range of style rules and helps to maintain a uniform coding style. [1]: share/sharedb#236 [2]: share/sharedb-mongo [3]: mongodb.github.io/node-mongodb-native [4]: https://eslint.org/ [5]: http://jshint.com/
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The bare bones of `MongoMilestoneDB` connection logic are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES6. This is because this code is server-side-only, so we don't need to support legacy browsers and all LTS Node versions support ES6. The move should ease maintenance and understanding of asynchronous code through the use of `Promise`s, which the [`mongodb`][3] library supports out of the box. This library notably uses v2 of `mongodb` instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. We also introduce usage of [`eslint`][4] instead of [`jshint`][5], which is used in other projects. `eslint` supports a much, much wider range of style rules and helps to maintain a uniform coding style. [1]: share/sharedb#236 [2]: share/sharedb-mongo [3]: mongodb.github.io/node-mongodb-native [4]: https://eslint.org/ [5]: http://jshint.com/
This is an initial implementation of ShareDB's [`MilestoneDB`][1]. The bare bones of `MongoMilestoneDB` connection logic are based on the [`sharedb-mongo`][2] database adapter. This adapter is a slight departure from other ShareDB code, as it is written in ES6. This is because this code is server-side-only, so we don't need to support legacy browsers and all LTS Node versions support ES6. The move should ease maintenance and understanding of asynchronous code through the use of `Promise`s, which the [`mongodb`][3] library supports out of the box. This library notably uses v2 of `mongodb` instead of the newer v3. This is to stay consistent with `sharedb-mongo`, so in theory the same config (or style of config) can be used in both `sharedb-mongo` and in `sharedb-milestone-mongo`. We also introduce usage of [`eslint`][4] instead of [`jshint`][5], which is used in other projects. `eslint` supports a much, much wider range of style rules and helps to maintain a uniform coding style. [1]: share/sharedb#236 [2]: share/sharedb-mongo [3]: mongodb.github.io/node-mongodb-native [4]: https://eslint.org/ [5]: http://jshint.com/
This non-breaking change introduces the concept of "Milestone
Snapshots". A milestone snapshot is a snapshot of a document for a given
version, which is persisted in the database. The purpose of this is to
speed up the
Backend.fetchSnapshot
method, which currently has tofetch all the ops required to build a snapshot from v0. Instead,
fetchSnapshot
can now fetch the most recent, relevant milestonesnapshot and build on top of that with fewer ops.
In order to do this, the database adapter API has been updated to
include two new methods:
saveMilestoneSnapshot(collection, snapshot, callback): void;
storesthe provided snapshot against the collection
getMilestoneSnapshot(collection, id, version, callback): void
fetches the most recent snapshot whose version is equal to or less
than the provided
version
(or the most recent version if version isnull
, in keeping with theto
argument ingetOps
).The adapter also has the responsibility of saving the appropriate
milestone snapshots when a new op is committed.