
MPS Accelerator #13123

Merged
merged 101 commits into from
Jun 24, 2022

Conversation

justusschock
Member

@justusschock justusschock commented May 21, 2022

What does this PR do?

Adds a prototype of M1 GPU support via PyTorch's MPS backend (part of #13102 , renaming existing accelerators will be done separately)

To give it a try, install PL from this branch and change the accelerator to 'mps'
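As a torch-free sketch of the selection logic (the helper name `choose_accelerator` is mine, not part of this PR; in a real script the boolean would come from `torch.backends.mps.is_available()`):

```python
def choose_accelerator(mps_available: bool) -> str:
    """Pick 'mps' when the backend is usable, otherwise fall back to 'cpu'.

    In a real script, pass torch.backends.mps.is_available() here; the plain
    boolean parameter keeps this sketch free of a torch dependency.
    """
    return "mps" if mps_available else "cpu"


# Hypothetical usage with the Trainer from this branch:
# trainer = Trainer(accelerator=choose_accelerator(True), devices=1)
```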

Remaining ToDos:

  • Extend Tests
  • Docs
  • Getting tests to pass on all platforms
  • Move _MPS_AVAILABLE to accelerator file
  • Integrate MPS into GPU debug prints

Does your PR introduce any breaking changes? If yes, please list them.

Nope

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @akihironitta @rohitgr7 @justusschock

@justusschock justusschock added feature Is an improvement or enhancement accelerator labels May 21, 2022
@justusschock justusschock self-assigned this May 21, 2022
@justusschock justusschock added this to the 1.7 milestone May 23, 2022
@Borda Borda self-requested a review May 23, 2022 13:47
@johnnynunez

I'm testing this, and it works fine! It's cool to use. The problem now is the PyTorch MPS backend: most features haven't been implemented yet.
pytorch/pytorch#77764

@justusschock
Member Author

@johnnynunez Thanks for testing it. You can still use the CPU fallback with PYTORCH_ENABLE_MPS_FALLBACK=1 python your_script.py . That will move the computation for unsupported ops to the CPU, but thanks to the shared-memory concept, moving the actual tensors should be very cheap.
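The same fallback can be enabled from inside a script instead of the shell. A minimal sketch, assuming (as with most PyTorch startup flags) the variable must be set before torch is imported:

```python
import os

# Enable the CPU fallback for ops the MPS backend does not implement yet.
# Set this before importing torch; the flag is read when torch initializes.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch  # imported afterwards so the fallback takes effect
```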

@johnnynunez

johnnynunez commented Jun 13, 2022

@johnnynunez Thanks for testing it. You can still use the CPU fallback with PYTORCH_ENABLE_MPS_FALLBACK=1 python your_script.py . That will move the computation for unsupported ops to the CPU, but thanks to the shared-memory concept, moving the actual tensors should be very cheap.

Thank you! I will test it. I have an M1 Max with 32 GB.

It's still in beta:
/Users/johnny/Projects/DS_TFM/src/utils/models/metrics.py:85: UserWarning: The operator 'aten::index.Tensor' is not currently supported on the MPS backend and will fall back to
run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
output, target = output[:, :, ..., dimensions_idces], target[:, :, ..., dimensions_idces]
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<64x1x1x1x2xf32>' and 'tensor<64x1x1x1x2xi1>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
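The error above says the f32 value tensor and the i1 (boolean) mask tensor reach the MPS multiply with mismatched element types. A torch-free sketch of the usual workaround, casting the mask to the value dtype before the elementwise op (the helper name is mine, purely illustrative):

```python
def multiply_with_mask(values, mask):
    """Elementwise multiply after casting a boolean mask to float.

    Stand-in for tensor code: on MPS, `tensor * bool_mask` can hit the
    broadcast-compatibility error above, while
    `tensor * bool_mask.to(tensor.dtype)` avoids it.
    """
    return [v * float(m) for v, m in zip(values, mask)]
```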

@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Jun 14, 2022
@Borda
Member

Borda commented Jun 22, 2022

@justusschock mind checking the conflicts? (in git they are much smaller; don't be scared by the GH message) I think we are ready to merge it... :)

@justusschock
Member Author

let me see if I can get to it tomorrow :)

@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Jun 23, 2022
@justusschock justusschock enabled auto-merge (squash) June 24, 2022 10:15
@lexierule lexierule disabled auto-merge June 24, 2022 12:15
@lexierule lexierule merged commit f54abc5 into master Jun 24, 2022
@lexierule lexierule deleted the mps_accelerator branch June 24, 2022 12:15
@rasbt
Collaborator

rasbt commented Jun 24, 2022

Just saw this finally got merged. Woohoo! Great work @justusschock & everyone! 🎉

@saikatkumardey

@johnnynunez were you able to fix the issue? I see a similar error when trying to use AveragePrecision() on a Mac M1 Pro.

By debugging it further I see that this line generates the following error.

loc("mps_subtract"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<128xi64>' and 'tensor<128xf32>' are not broadcast compatible

@awaelchli
Contributor

awaelchli commented Aug 11, 2022

@saikatkumardey Thanks. Since you are mentioning AveragePrecision, I suggest opening an issue directly on https://github.com/Lightning-AI/metrics. Please note that not all PyTorch operations are implemented for MPS.
