Iteration Plan (September - October 2017) #2410

Closed
3 of 19 tasks
cha-zhang opened this issue Sep 25, 2017 · 49 comments

Comments

@cha-zhang
Member

cha-zhang commented Sep 25, 2017

This plan captures our work from mid-September to the end of October. We will ship around November 22nd. Major work items of this iteration include ONNX support in CNTK, MKL integration, and many others.

Endgame

  • November 8: Code freeze for the end game
  • November 22: Release date

Planned items

We plan to ship these items at the end of this iteration.

Legend of annotations:

Icon Description
  • Item not started
  • Item finished
    🏃 Work in progress
    Blocked
    💪 Stretch

    Documentation

    • Finalize learner design and fix related documentation

    System

    • Support import/export ONNX format models
    • A network optimization API that helps model compression via SVD, quantization, etc.
    • 16bit support for training on Volta GPU (limited functionality)
    • C# high-level API design (no implementation)
    • Reader improvement for large data sets (sequential reader)

    Examples

    • Faster R-CNN object detection
      • Clean up the code to use arbitrary input image size
      • C++ implementation of some Python layers
      • Usability improvement
    • New example for natural language processing (NLP)
    • New tutorial on WGAN and LS-GAN
    • Semantic segmentation (stretch goal)

    Operations

    • Specify frequency in the number of epochs and minibatches for progress report, validation, checkpoints
    • Improve statistics for distributed evaluation

    Performance

    • Intel MKL update to improve inference speed on CPU by around 2x on AlexNet

    Others

    @kyoro1
    Contributor

    kyoro1 commented Sep 26, 2017

    @cha-zhang Can we assume that parallel learning for Faster R-CNN will be implemented in this sprint?
    I put my comments on Fast R-CNN in the issue. I'm not tied to that issue specifically, though; I'd like to know whether the "Faster R-CNN" work can include a faster implementation in this sprint :)

    @arijit17

    Continue work on Deep Learning Explained course on edX.

    Does it mean an advanced course is coming up?

    @grzsz

    grzsz commented Sep 26, 2017

    Will the new release be available for .netcore2.0?

    @cha-zhang
    Member Author

    cha-zhang commented Sep 26, 2017

    @arijit17 No, we are not working on an advanced course at this moment. It's there just to indicate some routine maintenance needed for the course.

    @cha-zhang
    Member Author

    @kyoro1 Yes, faster implementation is on the roadmap, but we first want to achieve full parity.

    @cha-zhang
    Member Author

    @grzsz We are making some fixes for the C# low-level API as well during this iteration (didn't mention above). .netcore2.0 compatibility is not a very high priority at this moment. How important is this?

    @helloguo

    We are making some fixes for the C# low-level API as well during this iteration (didn't mention above).

    @cha-zhang Is this C# support a language binding, or will the APIs be implemented in C#?

    @cha-zhang
    Member Author

    @helloguo The C# API is a SWIG-generated binding.

    @helloguo

    @cha-zhang Thank you for your clarification.

    The example Evaluation code shows the target framework is .NET Framework, which is Windows only. So can I assume these C# APIs are Windows only at this moment? If yes, are you planning to support Linux as well (e.g. using .NET Core since it supports Windows, Linux and macOS)?

    @liqunfu
    Contributor

    liqunfu commented Sep 26, 2017

    @helloguo People have raised this .NET Core issue in #2346 and #2352. We are investigating. We are not sure whether we can get it into this release; if we can, we will update this iteration plan.

    @Dozer3D

    Dozer3D commented Sep 26, 2017

    Regarding the usability improvements to Faster R-CNN: would these include a GPU-enabled version of the proposal layer UDF? Otherwise I find the Faster R-CNN example already quite usable as it is. Since the 'STORE_EVAL_MODEL_WITH_NATIVE_UDF' option was added, it has everything you need to include it in, for example, a native C++ Windows-based product (i.e. without the need for Python dependencies). The only problem is that evaluation is very slow because we are stuck using the CPU.

    @main76

    main76 commented Sep 27, 2017

    A network optimization API that helps model compression via SVD, quantization, etc.

    Awesome! Is there a way to get early access?

    @cha-zhang
    Member Author

    @main76 We have some prototype code, but it is not written as a CNTK API. So the answer to your question is no; you will have to wait until the end of the iteration. Thanks!
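For readers unfamiliar with the quantization half of that plan item, the basic idea can be sketched in a few lines of plain Python. This is not the prototype mentioned above and not a CNTK API; the `quantize` and `dequantize` helpers are hypothetical and show only uniform (linear) quantization of a weight list to 8-bit codes.

```python
def quantize(weights, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] using one scale and offset."""
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1
    # Guard against a constant weight list, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Rounding to the nearest code bounds the error by about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one byte plus a shared scale and offset, at the cost of a reconstruction error of roughly half a step per weight.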

    @grzsz

    grzsz commented Sep 27, 2017

    @grzsz We are making some fixes for the C# low-level API as well during this iteration (didn't mention above). .netcore2.0 compatibility is not a very high priority at this moment. How important is this?

    @cha-zhang
    As with everything, it depends :) I can use C++/Python, but I suppose many people want or have to stick to .netcore2 and will choose a competitor or a home-made solution, even though CNTK was their first choice because of its assumed platform support.

    @JimSEOW

    JimSEOW commented Sep 27, 2017

    @cha-zhang
    Can you please elaborate on "Continue work on Deep Learning Explained course on edX."

    Is there a plan or milestone?
    The edX CNTK course is an important way to promote and explain CNTK's "comprehensive extensive coverage of Deep Learning Topics".

    It could be useful to use this thread to gather feedback on what goes into the edX course.

    We could use this thread, or a dedicated one, to discuss:

    • what has gone in so far,
    • what users think about it,
    • what new topics are yet to be included.

    @rhy-ama

    rhy-ama commented Sep 27, 2017

    #2422

    What is the medium-term plan for NN debugging facilities?

    Can we output a few more metrics using the existing TensorBoard facilities in the next release, under "improve statistics for distributed evaluation"? A good start would be weight histograms.

    @cha-zhang
    Member Author

    @JimSEOW Sure let's create a dedicated thread for edX course.

    As I mentioned earlier, for this iteration, we are just doing maintenance. Maybe I'll remove it from the list.

    @clintjcampbell

    Does ONNX mean that the model format will stabilize in the near future, so that models I have already trained will continue to work with future versions of CNTK? At least once ONNX is implemented?

    @cha-zhang
    Member Author

    @clintjcampbell Yes when ONNX is implemented it will be stable. ONNX itself is still evolving, but in a few weeks it should stabilize and be backward compatible.

    @cha-zhang
    Member Author

    cha-zhang commented Sep 29, 2017

    @rhy-ama weight histogram is not part of "improve statistics for distributed evaluation". This item specifically refers to improving printed information about training statistics when in distributed eval.

    NN debugging facility is not in the current plan. The team is busy delivering a major milestone that sets a few things to relatively lower priority. If someone could contribute this, it would be great!

    @e-thereal

    On the note of Improve statistics: It was possible in BrainScript to specify multiple metrics that were all evaluated and reported during training, but it seems that you can only monitor the loss and one metric using the Python API. It would be great to add the old BrainScript feature of multiple metrics back to the Python API.

    @skynode

    skynode commented Oct 16, 2017

    We are making some fixes for the C# low-level API as well during this iteration (didn't mention above). .netcore2.0 compatibility is not a very high priority at this moment. How important is this?

    This is super important to us. We would like to be able to reuse and maintain C# across the dev spectrum especially for business continuity. Plus there are performance improvements on .NET Core 2.0 which we would like to take advantage of without further optimization of our codebase. Please consider making it high priority.

    Thank you for your time and efforts!

    @cha-zhang
    Member Author

    @skynode Please refer to #2352.

    @mhjabreel

    Hi @cha-zhang,

    I am willing to implement a high-level API for C#. In fact, I have already started and have implemented the following layers:

    • Linear
    • Convolution: Conv1D, Conv2D and Conv3D
    • Pooling: Max(Pool1D, Pool2D and Pool3D) and Avg(Pool1D, Pool2D and Pool3D)

    You can find it in this link:
    https://github.com/mhjabreel/DeepSharp

    Regards,

    Mohammed

    @cha-zhang
    Member Author

    Hi, we have to postpone the release date for this iteration to Nov. 14. We added one week to wrap up a few features under implementation, and another week to fix some bugs reported in GitHub issues. Sorry for the delay!

    @IvanFarkas

    I highly recommend the Deep Learning Explained course on edX.
    Waiting patiently for the advanced course.

    .NET Core 2.0 support is very important.
    I hope CUDA 9 support and VS 2017 build is part of this iteration.

    @mstockfo

    mstockfo commented Nov 8, 2017

    Does the C++ implementation of some Python layers for Faster R-CNN object detection include GPU-enabled evaluation from C#?

    @ddurschlag

    These features sound awesome. Are we still looking at getting them sometime this week? Is there a list of open issues for the release that someone who knows C# well could contribute to?

    @cha-zhang
    Member Author

    The new ship date for v2.3 is Nov. 14, as updated in the message above.

    The C# high-level API design task is now blocked due to internal deadlines. We encourage the community to build high level API on top of the current low level one and share. You may use a similar design as CNTK's high level API, or feel free to mimic other high level APIs such as Keras/Gluon.

    Starting next iteration, we will be making some changes to the release procedure. We are working hard to enable nightly releases (ETA before end of this year). Official release will then be done as-needed. Please comment if you have comments/suggestions. Thanks!

    @bencherian
    Contributor

    Is 2.3 release still planned for today?

    @ebarsoumMS
    Contributor

    No, it got delayed by one week. We are releasing it on Nov 22 because of some changes that we need to take.

    @whatever1983

    Well, that is a bummer. Might as well delay it all the way until you are ready to release CUDA 9, cuDNN 7, and stable fp16 training. It is pretty amazing that mxnet 0.12 beat both CNTK and TensorFlow on CUDA 9 fp16 support, but lacks Keras 2.0 support.

    @ebarsoumMS
    Contributor

    Cuda9 and cuDNN7 will follow next.

    @Dozer3D

    Dozer3D commented Nov 14, 2017

    @ebarsoumMS , thank you for keeping us informed. The iteration plan included three improvements to the Faster RCNN example:

    1. Clean up the code to use arbitrary input image size
    2. C++ implementation of some Python layers
    3. Usability improvement

    Have these made it into the upcoming release?

    @ebarsoumMS
    Contributor

    Adding @spandantiwari to comment. Arbitrary input image size is in, and we fixed most ops to work with arbitrary sizes.

    @spandantiwari
    Contributor

    @Dozer3D - we have worked quite a bit in this iteration to support free static axes (arbitrary input image size) in convolutional pipelines. Convolution, pooling and other nodes that may be used in a typical pipeline now support free static axes, and we have improved the performance of convolution with free static axes. But FasterRCNN training using free static axes is not completely ready yet; we are still testing it to match the numbers stated in the paper. The C++ implementation of ProposalLayer.py is also in the works. These will most probably not make it into the 2.3 release. Having said that, this model and making it work fast (especially inference) is still on our priorities.
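To make concrete why arbitrary input sizes affect every node in the pipeline: the spatial size of each feature map follows from the input size through the standard convolution and pooling size formula, so no downstream shape can be fixed in advance. The sketch below is plain Python, not CNTK code; the helper names and the four-layer pipeline are hypothetical.

```python
def conv_output_size(input_size, kernel, stride=1, pad=0):
    """Output size along one spatial axis for a convolution or pooling window."""
    return (input_size + 2 * pad - kernel) // stride + 1


def feature_map_shape(height, width, layers):
    """Propagate an input (height, width) through a list of (kernel, stride, pad) layers."""
    for kernel, stride, pad in layers:
        height = conv_output_size(height, kernel, stride, pad)
        width = conv_output_size(width, kernel, stride, pad)
    return height, width


# Hypothetical pipeline: 3x3 conv (pad 1), 2x2 pool (stride 2), repeated twice.
layers = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]

small = feature_map_shape(224, 224, layers)  # (56, 56)
large = feature_map_shape(600, 850, layers)  # (150, 212)
```

With fixed static axes only one of these shapes could be baked into the graph; with free static axes the shape is resolved per input image.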

    @ddurschlag

    @ebarsoumMS My understanding is that CUDA 9 is required to eliminate the .NET Framework dependencies and provide a .NET Standard version of CNTK. Is that correct? If so, is that likely to happen for 2.3 next week, or at some future point? If at a future point, is there any estimate of when?

    Being able to use CNTK effectively in a container would be super useful, and my impression was this wasn't TOO far away...

    @Dozer3D

    Dozer3D commented Nov 15, 2017

    @spandantiwari thank you for that informative reply. We have created two datasets and trained Faster RCNN networks with CNTK 2.2 to solve three problems for a client, but currently only one of these is usable without the GPU, and then only just. Faster GPU and faster CPU inference would be much appreciated. (I assume decreasing the input image size would also speed up the CPU processing.)

    So nothing for us in 2.3, but a good chance of something before, say, the end of January?

    Having said that, this model and making it work fast (especially inference) is still on our priorities.

    Thank you. As a traditional Windows programmer/solutions provider who knows very little about machine learning, I find Faster RCNN to be a very practical tool for solving many real problems for our customers.

    @mathias-brandewinder

    @cha-zhang looking forward to the next release :)

    Given that you "encourage the community to build high level API on top of the current low level one and share", I figure I would mention that I started working with some F# community members on exploring what a high-level, script-friendly F# DSL on top of CNTK could look like.

    Got some of the C# samples converted to F# scripts already, very close to the original C# version here:

    https://github.com/mathias-brandewinder/CNTK.FSharp/tree/master/examples

    ... and currently trying out something loosely Keras-inspired. Plenty of rough edges, and not sure yet if the direction is right, but here is what the MNIST CNN sample looks like as of today, with the interesting part highlighted:

    https://github.com/mathias-brandewinder/CNTK.FSharp/blob/a0e9794697afacce65c95c66f5d899a9dd71cbf7/examples/MNIST-CNN.fsx#L89-L123

    @kodonnell

    @spandantiwari - we're also exploring FasterRCNN. If the improvements aren't going to be released in the next week or so, could you please create a document somewhere with a recommended approach? I'm new to CNTK, but with some direction I may be able to help (especially if there are some examples e.g. 'convert the python layers [files <...>] to C++ in the same way as was done for PR <...>' ... or 'see C++ layer <...> for an example').

    @cha-zhang
    Member Author

    cha-zhang commented Nov 22, 2017

    For those of you who are exploring FasterRCNN, we have a branch chazhang/faster_rcnn that updates the Faster RCNN with free static axis. The code is tangled with Fast RCNN, and Fast RCNN hasn't been verified, so we won't release it in this iteration. On the other hand, FasterRCNN is now functional with arbitrary input image size, tested on Pascal data set. We don't see much accuracy improvement with this, though.

    Most code was actually contributed by @spandantiwari. Thanks!

    @kodonnell

    Thanks @cha-zhang . Could you please provide feedback on the best way to implement some of the C++ layers as per here?

    As an aside, pip installs in code might want reconsidering before merging.

    @cha-zhang
    Member Author

    @kodonnell Are you asking about using C++ to implement the proposal layer instead of Python?

    @kodonnell

    @cha-zhang I'm referring to the original iteration plan:

    C++ implementation of some Python layers

    I don't even know what those layers are, hence why I'm asking for a starter = ) From other issues I've read, it sounds like implementing this will make evaluation of Faster RCNN a lot faster.

    @cha-zhang
    Member Author

    Yes, that's the proposal layer. The current custom proposal layer is in Python and can be written in C++ instead.

    You can refer to the binary convolution example for how to write a C++ custom layer:
    https://github.com/Microsoft/CNTK/tree/master/Examples/Extensibility/BinaryConvolution

    @Dozer3D

    Dozer3D commented Nov 22, 2017

    I am confused now :-(

    The current custom proposal layer is in Python and can be written in C++ instead.

    It is my understanding that evaluation using C++ only (no Python) already works and was implemented in 2.2 using a UDF by @pkranen (see 2234).

    i.e. set __C.STORE_EVAL_MODEL_WITH_NATIVE_UDF = True

    This does seem to work, except that it runs on the CPU only (very slow), not on the GPU. If you set the device to a GPU it throws an exception, because the GPU version of that layer hasn't been written.

    i.e. in the file "cntk\Examples\Extensibility\ProposalLayer\ProposalLayerLib\ProposalLayerLib.h" we have the following code.

        if (computeDevice.Type() != DeviceKind::CPU)
               throw std::runtime_error("ProposalLayer: only CPU evaluation is supported at the moment.");
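
Until a GPU version exists, a caller can only detect this failure and rerun on the CPU. A rough sketch of that pattern in plain Python follows; `CpuOnlyLayer` and `evaluate_with_fallback` are hypothetical stand-ins, not CNTK classes, and the box filter is a placeholder for the real proposal logic.

```python
class CpuOnlyLayer:
    """Hypothetical stand-in for a layer that only has a CPU implementation."""

    def evaluate(self, boxes, device):
        # Mirrors the device check in the C++ snippet above.
        if device != "cpu":
            raise RuntimeError(
                "ProposalLayer: only CPU evaluation is supported at the moment."
            )
        # Placeholder work: keep boxes (x1, y1, x2, y2) with positive area.
        return [b for b in boxes if b[2] > b[0] and b[3] > b[1]]


def evaluate_with_fallback(layer, boxes, preferred_device):
    """Try the preferred device first, then fall back to the CPU if it is rejected."""
    try:
        return layer.evaluate(boxes, preferred_device), preferred_device
    except RuntimeError:
        return layer.evaluate(boxes, "cpu"), "cpu"


boxes = [(0, 0, 10, 10), (5, 5, 5, 9)]
result, used_device = evaluate_with_fallback(CpuOnlyLayer(), boxes, "gpu")
```

This keeps the application running but does nothing for speed, since the evaluation still happens on the CPU.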
    

    @cha-zhang
    Member Author

    @Dozer3D I think I was referring to training. If eval only, then yes, we have a C++ version already.

    We are not satisfied with the training speed of Faster RCNN. More work is needed.

    @kodonnell

    @cha-zhang - might it pay to start a new issue (or update the docs somewhere) to have a single place referring to all the improvements intended for Faster RCNN (with some useful detail to encourage PRs), so it's a little clearer? There are quite a few threads (including the 'pollution' of this one) which I, for one, find hard to follow.

    @sigfrid696

    I am confused now :-(

    The current custom proposal layer is in Python and can be written in C++ instead.

    It is my understanding that evaluation using C++ only (no Python) already works and was implemented in 2.2 using a UDF by @pkranen (see 2234).

    i.e. set __C.STORE_EVAL_MODEL_WITH_NATIVE_UDF = True

    This does seem to work, except that it runs on the CPU only (very slow), not on the GPU. If you set the device to a GPU it throws an exception, because the GPU version of that layer hasn't been written.

    i.e. in the file "cntk\Examples\Extensibility\ProposalLayer\ProposalLayerLib\ProposalLayerLib.h" we have the following code.

        if (computeDevice.Type() != DeviceKind::CPU)
               throw std::runtime_error("ProposalLayer: only CPU evaluation is supported at the moment.");
    

    Is there any GPU support in the C++ implementation of the Proposal Layer Lib?
    I'm running CNTK 2.7 and there still doesn't seem to be any GPU support.
    When is this kind of support planned for release?
