-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor "Worker", "Dispatcher", "Actor" into a general purpose "PooledWorker" utility #3034
Comments
I think "stateless" is going to be a tough challenge here, because: (a) efficient communication of large amounts of data to/from worker threads is limited to ArrayBuffers (is this true?); (b) Parsing ArrayBuffers (i.e. VT pbf data) is moderately expensive, so that it's desirable to cache the parsed representation for reuse, e.g. when reloading a tile. @lucaswoj do you have a vision for how we might manage this? |
My vision (originally @ansis' vision) is to use |
Oh, cool. That makes sense... and definitely seems more tractable now than it did at midnight! |
Did a quick scan and couldn't find any existing WebWorker libraries that might give us the functionality we need. |
I'm not necessarily intent on this being stateless, so long as state is somehow associated with the caller's scope rather than managed globally within the worker. The thinner the abstraction the better. // test.js
var PooledWorker = require('../util/pooled_worker');
var worker = new PooledWorker(require('./test_worker'));
worker.send('marco');
worker.on('polo', function() {
worker.send('marco');
});
// test_worker.js
module.exports = function() {
this.on('marco', function() {
this.send('polo');
});
} This is almost exactly the |
@lucaswoj Can you please make sure that the following issue is also handled as part of Worker refactoring? It will have a huge impact on productivity.
|
I don't think this is possible to fix, since the worker source is not persistent in the browser — with every reload, there are new sources generated through Blob URLs. A simple workaround is using |
@mourner Are you suggesting that it is acceptable/normal to modify the source code every time one needs to debug a web worker in GL JS codebase? Is this how everyone normally debugs when they have to analyze any issues pertaining to this? |
@mb12 yes. Also see browserify/webworkify#9 (comment) for a workaround that doesn't require modifying the source. |
I have begun implementing the API proposed in #3034 (comment). It is doable, though it requires some changes to |
@lucaswoj I can take on webworkify, what changes do you need specifically? |
The changes we'd need to
Do you think these make sense to implement in |
@lucaswoj it's not clear to me that it will be possible to make the require/importScripts thing work transparently for 3rd party modules, because the script source => blob URL transformation needs to happen at I think a custom/forked implementation of webworkify might be a good way to go. Right now, webworkify expects a module exporting Instead, we could have a gl-js-specific variation of it that expects a module with a more specific, useful signature -- e.g. something like ^ a 3rd party worker-side module would then look like: var worker = requre('mapbox-gl-js-worker')
module.exports = worker(function (uid, style) { ... } ) |
Agreed. 3rd party modules will need to be wrapped in something like
That's the general idea! Our goal is to build a general purpose utility. The architecture will be cleanest if the concept of styles are introduced at a higher layer of abstraction. Rather than exposing methods targetable from the main thread, workers will send / receive messages. This will result in the simplest implementation because it closely mirrors how I sketched my proposed interface above #3034 (comment) (recently updated)
Beautiful. 👍 |
@lucaswoj a GL-JS-specific webworkify variation sounds good. It's a pretty simple module, and some of the code that GL JS doesn't need could be removed.
I didn't really understand this part. Can you give an example? |
👍
Suppose we have 3 modules: Both // worker_foo.js
require('./util');
...
// worker_bar.js
require('./util');
... If we instantiate both PooledWorker.workerCount = 1;
var workerFoo = new PooledWorker(require('./worker_foo'));
var workerBar = new PooledWorker(require('./worker_bar')); |
If we're going the route of a GL-JS-specific webworkify variation, I'll keep working on this project. I've got a fair bit of code and momentum on this project already 😄 |
@lucaswoj I think deduping dependencies will be quite a tricky problem, but I agree it would be pretty awesome. Might be worth looking into https://github.com/substack/factor-bundle |
@lucaswoj what do you mean by "transfer the code"? Do you want to create just one global worker blob with the same contents no matter how many different "pooled workers" are used? |
Not exactly. I want to transfer code lazily as PooledWorker.workerCount = 1;
// transfers the code for "worker_foo" and "util" modules to the Worker
var workerFoo = new PooledWorker(require('./worker_foo'));
// transfers the code for "worker_bar" module to the Worker
var workerBar = new PooledWorker(require('./worker_bar')); Does that answer your question @mourner? |
@lucaswoj Thanks! I think I start to grasp the concept... So the code is going to be transferred by creating a blob URL that is later loaded with I think I have a good idea of how to implement bundling/deduping logic after getting into webworkify/browserify internals for that recent PR — if you need a hand on this, I could work on it tomorrow. How much do you have already implemented locally? Want to push anything? |
Yes, although it would be a breaking change that webpack users may not like. @tmcw Do you think it would be fine? |
Published an initial implementation of the worker pool module here: https://github.com/mapbox/workerpoolify. A few notes:
Waiting for your feedback @lucaswoj @jfirebaugh. |
👍 on all the points above @mourner. I appreciate the simplicity of the API and state sharing solutions. I'll do a quick code review of the |
Notes from a read-through of
|
@lucaswoj awesome feedback, thank you! I agree 100% with all terminology points. The module has some rough edges too — I wanted to get it working first to get early feedback.
Ah, typo — I meant
I think so. Were there some problems with
For the sake of simplicity. In practice, I think this won't matter because in 99.9% of cases all the dependencies will get there anyway — if the amount of pooled workers is not less than the amount of native workers, all deps will be across all the pool even if you track independently.
This is not ideal, but the only alternative I see is requiring all workers to inherit from a base prototype (that has I think an acceptable solution would be to just throw an error if such methods already exist before overwriting them, and make a note in the docs. |
Yeah - the issue was with Edge (browserify/webworkify#26), and led to backing off of automatically calling revokeObjectURL, in favor of simply exposing the generated objectURL so that client code could decide when it was appropriate to revoke it (e.g., perhaps after getting a first message back from the worker?) I love seeing this becoming a reality as a general-purpose module! |
@lucaswoj addressed all your terminology points, added throwing an error if you try to define a custom
|
@mourner This looks great! The API feedback I have is all around polishing the rough edges:
Do you have ideas for how to handle broadcasts / shared state? That sounds like an interesting additional challenge. |
This is an assumption tied to the GL JS usage patterns. GL JS's usage patterns may change with the introduction of this new architecture. |
I think there may be some misunderstanding here. The case I'm worried about is when a user creates two workers with the same var Worker = self.require(data.moduleId);
Worker.send = send;
Worker.workerId = data.workerId;
workerCache[data.workerId] = new Worker(); |
@lucaswoj oh, I see your point. You're right about overwriting var Worker = self.require(data.moduleId);
var worker = workerCache[data.workerId] = new Worker();
worker.send = send;
worker.workerId = data.workerId; I wanted module.exports = function Worker() {
// do stuff
this.send(...);
}; To make it work, I'd probably have to either make var WorkerClass = self.require(data.moduleId);
function Worker() {
WorkerClass.apply(this, arguments);
}
Worker.prototype = Object.create(WorkerClass.prototype);
Worker.prototype.send = send;
Worker.prototype.workerId = data.workerId;
workerCache[data.workerId] = new Worker(); Another option would be to ban calling @jfirebaugh thanks, good feedback! I'll make it a factory.
This is the one I'm not sure about — would you duplicate the code from evented.js in the library, or pick some other It's too bad that native
I'm thinking of creating something like a |
Do you think we could get away with passing
|
@lucaswoj this would make the API clunkier; would you need to always do module.exports = function Worker(send) {
this.send = send;
};
Worker.prototype.doStuff = () {
this.send('foo');
}; |
Could you get away with a minimal Worker.prototype.addEventListener = function (type, listener) {
if (type !== 'message') {
return nativeWorker.addEventListener(type, listener);
}
listener.__id__ = listener.__id__ || unique++;
listeners[this.workerId][listener.__id__] = listener;
}
Worker.prototype.removeEventListener = function (type, listener) {
if (type !== 'message') {
return nativeWorker.removeEventListener(type, listener);
}
delete listeners[this.workerId][listener.__id__];
} However, on the worker side, the typical code would be Anyway, not 100% sure what the best solution is, mainly just brainstorming. |
I agree that requiring A third option here is to use composition rather than inheritance: // workerpoolify
var createWorker = self.require(data.moduleId);
var worker = workerCache[data.workerId] = createWorker({send, workerId, on});
// worker implementation
module.exports = function(worker) {
worker.on('marco', function() {
worker.send('polo');
});
};
This might be acceptable. |
@lucaswoj composition is an option too. I'll try the subclass option first to see whether it works well — it doesn't appear to have any drawbacks apart from a potentially small performance overhead, but we can switch this at any time if we discover problems. @jfirebaugh Since targeting a full drop-in Worker replacement is non-trivial, I would opt for a barebones implementation with |
Did a bunch of work on
The only major thing left to address is the shared state challenge. I'm going to try approaching this as described in #3034 (comment), @jfirebaugh do you have anything to add/suggest on this? After it's solved, we can try refactoring GL JS with |
Looks great @mourner! Thank you. |
@jfirebaugh I just realized that we can implement shared state without any special classes. All it takes is creating N pooled worker instances (where N is the number of native workers) for setting shared state. They will be instantiated in each native worker, and we'll be able to broadcast changes by sending data from each instance in a simple loop. |
Outline of a potential future architecture using a worker pool:
|
If we can refactor
Worker
,Dispatcher
, andActor
into a general purpose background processing utility (or find such an existing utility), GL JS's architecture will simplify nicely, especially around custom sources. We will remove a lot of unnecessary coupling and pave the way for new background processing features like #1042.The text was updated successfully, but these errors were encountered: