This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[PERFORMANCE] [master] Layer normalization code from Marian for CPU #19602

Merged
merged 13 commits into from
Jan 4, 2021

Conversation

kpuatamazon
Contributor

Description

This is the master version of #19601. There isn't much difference in the LayerNormalization implementation between v1.x and master.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage; already covered by unittest for layer normalization.
  • Code is well-documented

Changes

  • Copy Marian optimized CPU LayerNorm implementation and adapt to MXNet.
  • Refactor dispatch of optimized versions using bool return value.
  • Remove MKL call
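The "dispatch using bool return value" item can be pictured with a small pure-Python sketch: an optimized path reports whether it handled the request, and the caller falls back to a generic path when it did not. All function names here are hypothetical and the code is not taken from MXNet or Marian; it only assumes the usual layer normalization formula, `(x - mean) / sqrt(var + eps) * gamma + beta`.

```python
import math

def layer_norm_row(row, gamma, beta, eps=1e-5):
    """Normalize one row, then scale by gamma and shift by beta."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]

def try_last_axis_fast_path(data, gamma, beta, axis, out):
    """Hypothetical fast path: handles only the last axis of a 2-D input.

    Returns True if it filled `out`, False if the caller must fall back.
    """
    if axis not in (-1, 1):
        return False
    out.extend(layer_norm_row(row, gamma, beta) for row in data)
    return True

def generic_layer_norm(data, gamma, beta, out):
    """Hypothetical fallback: normalizes along axis 0 by transposing."""
    columns = [list(col) for col in zip(*data)]
    normalized = [layer_norm_row(col, gamma, beta) for col in columns]
    out.extend(list(row) for row in zip(*normalized))

def layer_norm(data, gamma, beta, axis=-1):
    out = []
    # The bool return value decides whether the fallback is needed.
    if not try_last_axis_fast_path(data, gamma, beta, axis, out):
        generic_layer_norm(data, gamma, beta, out)
    return out

print(layer_norm([[1.0, 2.0, 3.0]], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The appeal of this shape is that each optimized variant keeps its own applicability check, and the caller stays a short chain of attempts.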

Comments

See #19601 for benchmarks.

Kenneth Heafield added 2 commits November 30, 2020 18:24
Experiment with OMP_NUM_THREADS=4, times in s, c5.12xlarge

|batch x channel| New code | MKL      |
|    1x   32 | 0.0000288| 0.0000278|
|  128x   32 | 0.0000308| 0.0000311|
| 2560x   32 | 0.0000712| 0.0000672|
| 4096x   32 | 0.0000946| 0.0000910|
| 8192x   32 | 0.0001597| 0.0001523|
|16384x   32 | 0.0002905| 0.0002619|
|    1x   64 | 0.0000264| 0.0000256|
|  128x   64 | 0.0000339| 0.0000330|
| 2560x   64 | 0.0000829| 0.0000972|
| 4096x   64 | 0.0001137| 0.0001356|
| 8192x   64 | 0.0002027| 0.0002435|
|16384x   64 | 0.0003715| 0.0004639|
|    1x  128 | 0.0000262| 0.0000263|
|  128x  128 | 0.0000325| 0.0000389|
| 2560x  128 | 0.0001074| 0.0001580|
| 4096x  128 | 0.0001505| 0.0002336|
| 8192x  128 | 0.0002861| 0.0004481|
|16384x  128 | 0.0005648| 0.0008613|
|    1x  256 | 0.0000273| 0.0000276|
|  128x  256 | 0.0000390| 0.0000431|
| 2560x  256 | 0.0001533| 0.0002811|
| 4096x  256 | 0.0002258| 0.0004300|
| 8192x  256 | 0.0004300| 0.0008464|
|16384x  256 | 0.0010436| 0.0017613|
|    1x  512 | 0.0000256| 0.0000302|
|  128x  512 | 0.0000408| 0.0000551|
| 2560x  512 | 0.0002444| 0.0005225|
| 4096x  512 | 0.0003828| 0.0008147|
| 8192x  512 | 0.0008832| 0.0017192|
|16384x  512 | 0.0058463| 0.0074497|
|    1x  768 | 0.0000252| 0.0000308|
|  128x  768 | 0.0000450| 0.0000676|
| 2560x  768 | 0.0003440| 0.0007719|
| 4096x  768 | 0.0005890| 0.0013346|
| 8192x  768 | 0.0014946| 0.0026145|
|16384x  768 | 0.0089495| 0.0113557|
|    1x 1024 | 0.0000285| 0.0000308|
|  128x 1024 | 0.0000487| 0.0000786|
| 2560x 1024 | 0.0004614| 0.0010190|
| 4096x 1024 | 0.0008083| 0.0017376|
| 8192x 1024 | 0.0059020| 0.0075588|
|16384x 1024 | 0.0116553| 0.0146855|
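
To read the table as speedups rather than raw seconds, the ratio MKL time / new-code time can be computed per row. The sketch below copies a handful of rows from the table above; the `speedup` helper is only illustrative.

```python
# (batch, channel, new_seconds, mkl_seconds), copied from the table above.
rows = [
    (1, 32, 0.0000288, 0.0000278),
    (16384, 32, 0.0002905, 0.0002619),
    (4096, 256, 0.0002258, 0.0004300),
    (8192, 512, 0.0008832, 0.0017192),
    (16384, 1024, 0.0116553, 0.0146855),
]

def speedup(new_seconds, mkl_seconds):
    """A ratio above 1.0 means the new code is faster than MKL."""
    return mkl_seconds / new_seconds

for batch, channel, new_s, mkl_s in rows:
    print("{:5d}x{:5d} | {:.2f}x".format(batch, channel, speedup(new_s, mkl_s)))
```

For these rows the ratio is just under 1.0 at 32 channels and close to 1.9 at 4096x256 and 8192x512.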

Benchmark program
```python
import mxnet as mx
import time

def time_procedure(shape, count):
  # Random input in [-1, 1); one vector of per-channel factors is reused as both gamma and beta.
  data = mx.nd.random_uniform(shape=shape, low=-1.0, high=1.0)
  factors = mx.nd.random_uniform(shape=(shape[-1],))
  # MXNet executes asynchronously, so wait for setup to finish before starting the clock.
  mx.nd.waitall()
  begin = time.time()
  for _ in range(count):
    out = mx.nd.LayerNorm(data, factors, factors)
    # Block until the operator has actually finished.
    mx.nd.waitall()
  # Average wall-clock seconds per call.
  return (time.time() - begin) / count

count = 200

for channel in [32, 64, 128, 256, 512, 768, 1024]:
  for batch in [1, 128, 2560, 4096, 8192, 16384]:
    s = (batch, channel)
    timing = time_procedure(s, count)
    print("{:5d}x{:5d} | {:.7f}".format(s[0], s[1], timing))
```
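
A possible variation on the timing loop, sketched without MXNet so it runs anywhere: warm up first, time each call with `time.perf_counter`, and report the median so a single slow iteration does not skew the result. `time_callable` and its parameters are hypothetical and are not what produced the numbers in this PR; to time an MXNet operator this way, the callable would need to include `mx.nd.waitall()` so the asynchronous work is counted.

```python
import statistics
import time

def time_callable(fn, count=200, warmup=10):
    """Return the median wall-clock seconds of `count` calls to `fn`."""
    # Warm-up calls are not measured.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(count):
        begin = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - begin)
    return statistics.median(samples)

def work():
    # Stand-in workload; replace with the operation under test.
    return sum(i * i for i in range(1000))

print("{:.7f}".format(time_callable(work)))
```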
@mxnet-bot

Hey @kpuatamazon , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-cpu, clang, windows-gpu, sanity, centos-gpu, miscellaneous, unix-cpu, windows-cpu, website, unix-gpu, edge]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 lanking520 added the pr-awaiting-testing PR is reviewed and waiting CI build and test label Nov 30, 2020
@lanking520 lanking520 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 30, 2020
@kpuatamazon
Contributor Author

@mxnet-bot run ci [all]

Sigh, everything is broken on some Python HTTP thing.

[2020-11-30T20:24:22.242Z] Step 11/14 : RUN /work/docker_filepermissions.sh
[2020-11-30T20:25:18.379Z] [8924] Failed to execute script docker-compose
[2020-11-30T20:25:18.379Z] Traceback (most recent call last):
[2020-11-30T20:25:18.379Z]   File "http/client.py", line 554, in _get_chunk_left
[2020-11-30T20:25:18.379Z]   File "http/client.py", line 521, in _read_next_chunk_size
[2020-11-30T20:25:18.379Z] ValueError: invalid literal for int() with base 16: b''
[2020-11-30T20:25:18.379Z] 
[2020-11-30T20:25:18.379Z] During handling of the above exception, another exception occurred:
[2020-11-30T20:25:18.379Z] 
[2020-11-30T20:25:18.379Z] Traceback (most recent call last):
[2020-11-30T20:25:18.379Z]   File "http/client.py", line 586, in _readinto_chunked
[2020-11-30T20:25:18.379Z]   File "http/client.py", line 556, in _get_chunk_left
[2020-11-30T20:25:18.379Z] http.client.IncompleteRead: IncompleteRead(0 bytes read)
[2020-11-30T20:25:18.379Z] 
[2020-11-30T20:25:18.380Z] During handling of the above exception, another exception occurred:
[2020-11-30T20:25:18.380Z] 
[2020-11-30T20:25:18.380Z] Traceback (most recent call last):
[2020-11-30T20:25:18.380Z]   File "site-packages/urllib3/response.py", line 425, in _error_catcher
[2020-11-30T20:25:18.380Z]   File "site-packages/urllib3/response.py", line 507, in read
[2020-11-30T20:25:18.380Z]   File "http/client.py", line 457, in read
[2020-11-30T20:25:18.380Z]   File "http/client.py", line 491, in readinto
[2020-11-30T20:25:18.380Z]   File "http/client.py", line 602, in _readinto_chunked
[2020-11-30T20:25:18.380Z] http.client.IncompleteRead: IncompleteRead(0 bytes read)
[2020-11-30T20:25:18.380Z] 
[2020-11-30T20:25:18.380Z] During handling of the above exception, another exception occurred:
[2020-11-30T20:25:18.380Z] 
[2020-11-30T20:25:18.380Z] Traceback (most recent call last):
[2020-11-30T20:25:18.380Z]   File "bin/docker-compose", line 6, in <module>
[2020-11-30T20:25:18.380Z]   File "compose/cli/main.py", line 72, in main
[2020-11-30T20:25:18.380Z]   File "compose/cli/main.py", line 128, in perform_command
[2020-11-30T20:25:18.380Z]   File "compose/cli/main.py", line 303, in build
[2020-11-30T20:25:18.380Z]   File "compose/project.py", line 403, in build
[2020-11-30T20:25:18.380Z]   File "compose/project.py", line 385, in build_service
[2020-11-30T20:25:18.380Z]   File "compose/service.py", line 1110, in build
[2020-11-30T20:25:18.380Z]   File "compose/progress_stream.py", line 25, in stream_output
[2020-11-30T20:25:18.380Z]   File "compose/utils.py", line 61, in split_buffer
[2020-11-30T20:25:18.380Z]   File "compose/utils.py", line 37, in stream_as_text
[2020-11-30T20:25:18.380Z]   File "site-packages/docker/api/client.py", line 345, in _stream_helper
[2020-11-30T20:25:18.380Z]   File "site-packages/urllib3/response.py", line 529, in read
[2020-11-30T20:25:18.380Z]   File "contextlib.py", line 130, in __exit__
[2020-11-30T20:25:18.380Z]   File "site-packages/urllib3/response.py", line 443, in _error_catcher
[2020-11-30T20:25:18.380Z] urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2020-11-30T20:25:18.380Z] 2020-11-30 20:25:12,579 - root - WARNING - Exception: Command '['docker-compose', '-f', 'docker/docker-compose.yml', 'build', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', 'ubuntu_cpu']' returned non-zero exit status 255., Retrying in 1 seconds...
[2020-11-30T20:25:18.380Z] 2020-11-30 20:25:13,581 - root - INFO - Running command: 'docker-compose -f docker/docker-compose.yml build --build-arg USER_ID=1001 --build-arg GROUP_ID=1001 ubuntu_cpu'
[2020-11-30T20:25:18.380Z] Couldn't connect to Docker daemon at http+docker://localhost - is it running?
[2020-11-30T20:25:18.380Z] 
[2020-11-30T20:25:18.380Z] If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.
[2020-11-30T20:25:18.380Z] 2020-11-30 20:25:14,126 - root - WARNING - Exception: Command '['docker-compose', '-f', 'docker/docker-compose.yml', 'build', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', 'ubuntu_cpu']' returned non-zero exit status 1., Retrying in 2 seconds...
[2020-11-30T20:25:18.380Z] 2020-11-30 20:25:16,129 - root - INFO - Running command: 'docker-compose -f docker/docker-compose.yml build --build-arg USER_ID=1001 --build-arg GROUP_ID=1001 ubuntu_cpu'
[2020-11-30T20:25:18.380Z] Couldn't connect to Docker daemon at http+docker://localhost - is it running?
[2020-11-30T20:25:18.380Z] 
[2020-11-30T20:25:18.380Z] If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.
[2020-11-30T20:25:18.634Z] Traceback (most recent call last):
[2020-11-30T20:25:18.634Z]   File "ci/build.py", line 373, in <module>
[2020-11-30T20:25:18.634Z]     sys.exit(main())
[2020-11-30T20:25:18.634Z]   File "ci/build.py", line 309, in main
[2020-11-30T20:25:18.634Z]     no_cache=args.no_cache, cache_intermediate=args.cache_intermediate)
[2020-11-30T20:25:18.634Z]   File "ci/build.py", line 81, in build_docker
[2020-11-30T20:25:18.634Z]     run_cmd(env=env)
[2020-11-30T20:25:18.634Z]   File "/home/jenkins_slave/workspace/build-cpu-clang100/ci/util.py", line 84, in f_retry
[2020-11-30T20:25:18.634Z]     return f(*args, **kwargs)
[2020-11-30T20:25:18.634Z]   File "ci/build.py", line 79, in run_cmd
[2020-11-30T20:25:18.634Z]     check_call(cmd, env=env)
[2020-11-30T20:25:18.634Z]   File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
[2020-11-30T20:25:18.634Z]     raise CalledProcessError(retcode, cmd)
[2020-11-30T20:25:18.634Z] subprocess.CalledProcessError: Command '['docker-compose', '-f', 'docker/docker-compose.yml', 'build', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', 'ubuntu_cpu']' returned non-zero exit status 1.
script returned exit code 1

@mxnet-bot

Jenkins CI successfully triggered : [edge, sanity, windows-gpu, unix-gpu, clang, centos-gpu, unix-cpu, miscellaneous, website, centos-cpu, windows-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Nov 30, 2020
@kpuatamazon
Contributor Author

@mxnet-bot run ci [unix-cpu, website, windows-cpu, windows-gpu]

Playing more CI docker daemon lottery.

@mxnet-bot

Jenkins CI successfully triggered : [website, windows-gpu, unix-cpu, windows-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 4, 2020
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 4, 2020
@lanking520 lanking520 added the pr-work-in-progress PR is still work in progress label Dec 21, 2020
@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 21, 2020
@kpuatamazon
Contributor Author

@mxnet-bot run ci [unix-cpu]

Memory gambling is annoying.

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 21, 2020
@kpuatamazon
Contributor Author

@mxnet-bot run ci [unix-cpu]

Still just running out of RAM compiling numpy kernels. https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-19602/19/pipeline/

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@lanking520 lanking520 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 28, 2020
std::conditional<std::is_same<mshadow::half::half_t, Data>::value,
float,
Data>::type>
void LayerNormCPUKernel(size_t width,
Member

I would recommend changing the name to LayerNormContiguousCPUKernel or LayerNormLastAxisCPUKernel.
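
The template fragment quoted with this review comment appears to select `float` as a second type when `Data` is `mshadow::half::half_t`. The fragment is truncated, so what that type is used for is an assumption here. If it is an accumulation type, the motivation would be the one sketched below: sums kept in half precision stall once the running total is large compared with each addend. The sketch uses Python's `struct` format `"e"` to round to IEEE 754 half precision, with Python's double-precision floats standing in for the wider accumulator. The helper names are hypothetical.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def sum_in_half(values):
    """Accumulate with the running total rounded to half after every add."""
    total = 0.0
    for v in values:
        total = to_half(total + to_half(v))
    return total

def sum_in_wide(values):
    """Accumulate half-precision inputs in a wider type (a Python float)."""
    total = 0.0
    for v in values:
        total += to_half(v)
    return total

values = [1.0] * 4096
print(sum_in_half(values), sum_in_wide(values))
```

The half-precision total stops growing once adding 1.0 no longer reaches the next representable value, while the wide total reaches the exact count. A mean or variance computed from the stalled sum would be correspondingly off.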

@sxjscience
Member

One naming issue. Looks good to me.

Member

@sxjscience sxjscience left a comment

Minor issue (can be addressed later actually).

@fhieber
Contributor

fhieber commented Jan 4, 2021

What are the next steps for this PR? Is this ready to be merged?

@sxjscience sxjscience merged commit ae8c974 into apache:master Jan 4, 2021
@sxjscience
Member

@fhieber I've just merged. Feel free to try it out.

Labels
pr-awaiting-review PR is waiting for code review

6 participants