[PERFORMANCE] [master] Layer normalization code from Marian for CPU #19602
Conversation
Experiment with OMP_NUM_THREADS=4, times in seconds, on a c5.12xlarge:

| batch x channel | New code | MKL |
|----------------:|---------:|----------:|
| 1 x 32 | 0.0000288 | 0.0000278 |
| 128 x 32 | 0.0000308 | 0.0000311 |
| 2560 x 32 | 0.0000712 | 0.0000672 |
| 4096 x 32 | 0.0000946 | 0.0000910 |
| 8192 x 32 | 0.0001597 | 0.0001523 |
| 16384 x 32 | 0.0002905 | 0.0002619 |
| 1 x 64 | 0.0000264 | 0.0000256 |
| 128 x 64 | 0.0000339 | 0.0000330 |
| 2560 x 64 | 0.0000829 | 0.0000972 |
| 4096 x 64 | 0.0001137 | 0.0001356 |
| 8192 x 64 | 0.0002027 | 0.0002435 |
| 16384 x 64 | 0.0003715 | 0.0004639 |
| 1 x 128 | 0.0000262 | 0.0000263 |
| 128 x 128 | 0.0000325 | 0.0000389 |
| 2560 x 128 | 0.0001074 | 0.0001580 |
| 4096 x 128 | 0.0001505 | 0.0002336 |
| 8192 x 128 | 0.0002861 | 0.0004481 |
| 16384 x 128 | 0.0005648 | 0.0008613 |
| 1 x 256 | 0.0000273 | 0.0000276 |
| 128 x 256 | 0.0000390 | 0.0000431 |
| 2560 x 256 | 0.0001533 | 0.0002811 |
| 4096 x 256 | 0.0002258 | 0.0004300 |
| 8192 x 256 | 0.0004300 | 0.0008464 |
| 16384 x 256 | 0.0010436 | 0.0017613 |
| 1 x 512 | 0.0000256 | 0.0000302 |
| 128 x 512 | 0.0000408 | 0.0000551 |
| 2560 x 512 | 0.0002444 | 0.0005225 |
| 4096 x 512 | 0.0003828 | 0.0008147 |
| 8192 x 512 | 0.0008832 | 0.0017192 |
| 16384 x 512 | 0.0058463 | 0.0074497 |
| 1 x 768 | 0.0000252 | 0.0000308 |
| 128 x 768 | 0.0000450 | 0.0000676 |
| 2560 x 768 | 0.0003440 | 0.0007719 |
| 4096 x 768 | 0.0005890 | 0.0013346 |
| 8192 x 768 | 0.0014946 | 0.0026145 |
| 16384 x 768 | 0.0089495 | 0.0113557 |
| 1 x 1024 | 0.0000285 | 0.0000308 |
| 128 x 1024 | 0.0000487 | 0.0000786 |
| 2560 x 1024 | 0.0004614 | 0.0010190 |
| 4096 x 1024 | 0.0008083 | 0.0017376 |
| 8192 x 1024 | 0.0059020 | 0.0075588 |
| 16384 x 1024 | 0.0116553 | 0.0146855 |

Benchmark program:

```python
import mxnet as mx
import time

def time_procedure(shape, count):
    data = mx.nd.random_uniform(shape=shape, low=-1.0, high=1.0)
    factors = mx.nd.random_uniform(shape=(shape[-1],))
    mx.nd.waitall()
    begin = time.time()
    for i in range(0, count):
        out = mx.nd.LayerNorm(data, factors, factors)
    mx.nd.waitall()
    return (time.time() - begin) / count

count = 200
for channel in [32, 64, 128, 256, 512, 768, 1024]:
    for batch in [1, 128, 2560, 4096, 8192, 16384]:
        s = (batch, channel)
        timing = time_procedure(s, count)
        print("{:5d}x{:5d} | {:.7f}".format(s[0], s[1], timing))
```
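As a quick sanity check that the op matches a plain NumPy reference (a sketch; it assumes the operator's default `eps=1e-5` and normalization over the last axis):

```python
import mxnet as mx
import numpy as np

x = mx.nd.random_uniform(shape=(128, 512), low=-1.0, high=1.0)
gamma = mx.nd.random_uniform(shape=(512,))
beta = mx.nd.random_uniform(shape=(512,))
out = mx.nd.LayerNorm(x, gamma, beta).asnumpy()

# NumPy reference: normalize over the last axis, then scale and shift.
xn = x.asnumpy()
mean = xn.mean(axis=-1, keepdims=True)
var = xn.var(axis=-1, keepdims=True)
ref = (xn - mean) / np.sqrt(var + 1e-5) * gamma.asnumpy() + beta.asnumpy()
print(np.abs(out - ref).max())  # expect ~1e-6 for float32
```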
Hey @kpuatamazon, thanks for submitting the PR!
CI supported jobs: [centos-cpu, clang, windows-gpu, sanity, centos-gpu, miscellaneous, unix-cpu, windows-cpu, website, unix-gpu, edge]
@mxnet-bot run ci [all] Sigh, everything is broken on some Python HTTP thing.
Jenkins CI successfully triggered: [edge, sanity, windows-gpu, unix-gpu, clang, centos-gpu, unix-cpu, miscellaneous, website, centos-cpu, windows-cpu]

@mxnet-bot run ci [unix-cpu, website, windows-cpu, windows-gpu] Playing more CI docker daemon lottery.

Jenkins CI successfully triggered: [website, windows-gpu, unix-cpu, windows-cpu]

@mxnet-bot run ci [unix-cpu] Memory gambling is annoying.

Jenkins CI successfully triggered: [unix-cpu]

@mxnet-bot run ci [unix-cpu] Still just running out of RAM compiling numpy kernels. https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-19602/19/pipeline/

Jenkins CI successfully triggered: [unix-cpu]
```cpp
          std::conditional<std::is_same<mshadow::half::half_t, Data>::value,
                           float,
                           Data>::type>
void LayerNormCPUKernel(size_t width,
```
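For context, the computation this kernel performs over each contiguous row can be sketched in NumPy as follows; the `std::conditional` default above selects `float` accumulation when `Data` is half precision, which the sketch mirrors (an illustration, not the PR's actual C++ code):

```python
import numpy as np

def layernorm_last_axis(data, gamma, beta, eps=1e-5):
    # Accumulate in float32 when the input is float16, mirroring the
    # std::conditional template default in the C++ kernel above.
    acc_type = np.float32 if data.dtype == np.float16 else data.dtype
    acc = data.astype(acc_type)
    mean = acc.mean(axis=-1, keepdims=True)
    var = acc.var(axis=-1, keepdims=True)
    out = (acc - mean) / np.sqrt(var + eps) * gamma + beta
    return out.astype(data.dtype)
```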
I would recommend changing the name to LayerNormContiguousCPUKernel or LayerNormLastAxisCPUKernel.
One naming issue. Looks good to me.
Minor issue (can actually be addressed later).
What are the next steps for this PR? Is this ready to be merged?

@fhieber I've just merged. Feel free to try it out.
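For anyone trying it out, a minimal usage sketch (the default `axis=-1` normalizes over the contiguous last axis, which is the CPU path this PR speeds up):

```python
import mxnet as mx

x = mx.nd.random_uniform(shape=(128, 512), low=-1.0, high=1.0)
gamma = mx.nd.ones((512,))
beta = mx.nd.zeros((512,))
# LayerNorm defaults to axis=-1, the contiguous case optimized here.
y = mx.nd.LayerNorm(x, gamma, beta)
print(y.shape)
```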
Description
This is the master version of #19601. There isn't much difference in the LayerNorm implementation between v1.x and master.
Checklist
Essentials
Changes
Comments
See #19601 for benchmarks.