Implement a model serving framework #1873

As @tqchen suggested in soumith/convnet-benchmarks#101 (comment), MXNet should implement a model serving framework to compete with https://github.com/tensorflow/serving.

Comments
@futurely @piiswrong a straightforward way to deploy mxnet models into production environments would indeed be highly welcome.
@futurely @revilokeb @piiswrong Hey guys,
No one is doing it yet. An easy solution is to use AWS Lambda, but it doesn't support GPUs and doesn't do batching. You are welcome to work on it. Please propose a design and we can discuss it.
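For context, a Lambda-based deployment would look roughly like the sketch below: a handler that loads a checkpoint once per container and answers prediction requests on CPU (since, as noted, Lambda offers no GPUs). The checkpoint prefix, input shape, and request format are all assumptions for illustration, not a fixed API.

```python
# Hypothetical AWS Lambda handler serving an MXNet image model on CPU.
# Assumes a checkpoint (model-symbol.json, model-0000.params) is bundled
# with the function; names and shapes here are illustrative.
import json
import numpy as np
import mxnet as mx

MODEL_PREFIX = 'model'          # assumed checkpoint prefix
INPUT_SHAPE = (1, 3, 224, 224)  # assumed input shape

# Load once per container so warm invocations skip model setup.
sym, arg_params, aux_params = mx.model.load_checkpoint(MODEL_PREFIX, 0)
mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
mod.bind(data_shapes=[('data', INPUT_SHAPE)], for_training=False)
mod.set_params(arg_params, aux_params)

def handler(event, context):
    # Expect a flattened float array in the JSON request body.
    data = np.asarray(json.loads(event['body']), dtype=np.float32)
    batch = mx.io.DataBatch([mx.nd.array(data.reshape(INPUT_SHAPE))])
    mod.forward(batch)
    probs = mod.get_outputs()[0].asnumpy()[0]
    return {'statusCode': 200,
            'body': json.dumps({'top1': int(np.argmax(probs))})}
```

Note that without batching, each invocation runs a single forward pass; concurrent requests would each pay the full per-sample cost, which is one of the limitations mentioned above.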
@jordan-green you may be interested in opening an issue for mxnet prediction support with https://github.com/beniz/deepdetect, as it already supports Caffe, XGBoost, and TensorFlow. It may not happen immediately, though I believe it wouldn't be too difficult. If you can help a bit with it, even better: it will happen faster.
Excited to see this!! |
Hi all, my current gut feeling is that this piece of functionality may be best provided as a standalone project, under a compatible and permissive license (most likely Apache), so as to benefit other frameworks as well. It would seem that outside of TF Serving, there's not a lot out there. DeepDetect looks interesting @beniz, however it appears to be under the GPL license: can you please confirm?

Lambda / OpenWhisk

Lambda would almost certainly be a great option if it had GPU support, and Amazon will almost certainly provide this in the near future, whether via a different class of Lambda or via their new Elastic GPU offering (which may be slightly less suited here than the former). This is of course not an open source solution, and as such may not be ideal. This had me thinking about other options for implementing a simple, serverless method for hosting inference models, and I think OpenWhisk may suit here.

GPU Compatibility

I can't find validation that OpenWhisk works on GPUs; however, its generic action invocation appears to run an arbitrary binary via Alpine Linux, which I've used with CUDA in the past with some success. I'll spin up an OpenWhisk VM on my GPU box and report back as to whether or not GPUs are accessible; it's not immediately obvious to me why they shouldn't be.

Simplification

From there, I think making use of the amalgamation script(s) within MXNet to provide a simple 'runnable' object may be a good approach to giving users a simple deployment process. This will obviously need performance testing.

MXNet Integration

I think this could prove to be a powerful tool for many ML frameworks, with MXNet serving as the foundation in places. Perhaps this would best be its own project/repository, mirrored within and closely integrating with MXNet? Thoughts on this are much appreciated.

Please let me know your thoughts, and once I've validated some of the moving pieces, particularly GPU support on OpenWhisk, I'll knock together a design proposal for further discussion.
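To make the OpenWhisk idea above concrete, here is a minimal sketch of what a Python action wrapping MXNet inference might look like. Everything here is an assumption for illustration: the checkpoint prefix, the input shape, and the availability of mxnet inside the action's container image. OpenWhisk Python actions expose a `main(params)` entry point that receives the invocation parameters as a dict and returns a dict.

```python
# Sketch of an Apache OpenWhisk Python action wrapping MXNet inference.
# Assumes the action runs in a custom container image bundling mxnet and
# the checkpoint files; GPU availability inside OpenWhisk is unverified
# (per the discussion above), so this falls back to CPU.
import numpy as np
import mxnet as mx

INPUT_SHAPE = (1, 3, 224, 224)  # assumed input shape
_mod = None  # cached across warm invocations of the same action container

def _load(prefix='model'):
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, 0)
    mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
    mod.bind(data_shapes=[('data', INPUT_SHAPE)], for_training=False)
    mod.set_params(arg_params, aux_params)
    return mod

def main(params):
    # OpenWhisk invokes main() with the action's JSON parameters as a dict.
    global _mod
    if _mod is None:
        _mod = _load()
    data = np.asarray(params['input'], dtype=np.float32).reshape(INPUT_SHAPE)
    _mod.forward(mx.io.DataBatch([mx.nd.array(data)]))
    probs = _mod.get_outputs()[0].asnumpy()[0]
    return {'top1': int(np.argmax(probs))}
```

Caching the module in a global relies on OpenWhisk reusing warm containers between invocations; a cold start would still pay the full model-load cost, which is worth measuring in the performance testing mentioned above.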
DD is under the LGPL; please see https://github.com/beniz/deepdetect/blob/master/COPYING.
@yuruofeifei and I are working on MXNet model serving. It's still at an early stage. In the current phase, it creates an HTTP endpoint and lets developers fully customize their preprocessing and post-processing functions for inference. More powerful features will be added in future stages.
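As a rough illustration of that design (not the actual code of the project mentioned above), a minimal HTTP endpoint with swappable preprocess/postprocess callables might look like the Flask sketch below; the model files, shapes, and function names are all hypothetical.

```python
# Illustrative sketch of an HTTP inference endpoint where preprocessing and
# post-processing are plain Python callables the developer can replace.
# Names and shapes are assumptions, not the serving project's real API.
import json
import numpy as np
import mxnet as mx
from flask import Flask, request, jsonify

app = Flask(__name__)
INPUT_SHAPE = (1, 3, 224, 224)  # assumed input shape

def default_preprocess(raw_body):
    # Turn the raw request body (a JSON float array) into an NDArray batch.
    arr = np.asarray(json.loads(raw_body), dtype=np.float32).reshape(INPUT_SHAPE)
    return mx.io.DataBatch([mx.nd.array(arr)])

def default_postprocess(outputs):
    # Map raw softmax scores to a JSON-friendly result.
    probs = outputs[0].asnumpy()[0]
    return {'top1': int(np.argmax(probs)), 'prob': float(probs.max())}

# Load the (assumed) checkpoint once at startup.
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
mod.bind(data_shapes=[('data', INPUT_SHAPE)], for_training=False)
mod.set_params(arg_params, aux_params)

@app.route('/predict', methods=['POST'])
def predict():
    batch = default_preprocess(request.get_data(as_text=True))
    mod.forward(batch)
    return jsonify(default_postprocess(mod.get_outputs()))

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```

Swapping `default_preprocess` / `default_postprocess` for user-supplied functions is the customization point the comment above describes.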