This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add Large Dim Checks for linalg Operators #18816

Merged: 12 commits, Jul 31, 2020

Contributor

@Zha0q1 Zha0q1 commented Jul 29, 2020

Add Large Dim Checks for linalg Operators. Although external BLAS libraries support large tensors (more than 2^32 elements), large dimensions (>= 2^31) trigger an OpenBLAS int overflow error under the current configuration. This PR adds checks so that those cases exit cleanly with an error.

Done:

  1. gemm and gemm2
  2. trmm and trsm
  3. syrk
  4. gelqf
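
The kind of check described above can be sketched in a few lines. This is only an illustration in Python, not the code added to la_op.h: the helper name `check_large_dim`, the error type, and the choice to inspect the last two (row and column) dimensions are assumptions made for the example.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def check_large_dim(shape):
    """Hypothetical helper: reject shapes whose row or column dim exceeds int32."""
    rows, cols = shape[-2], shape[-1]
    for name, dim in (("row", rows), ("column", cols)):
        if dim > INT32_MAX:
            raise ValueError(
                f"{name} dimension {dim} exceeds INT32_MAX ({INT32_MAX})"
            )


check_large_dim((1, 1024, 1024))  # small dims: passes silently
try:
    check_large_dim((1, INT32_MAX + 1, 1))  # large row dim: rejected
except ValueError as err:
    print(err)
```

Failing early with a clear message is preferable to letting the 32-bit index wrap around inside the BLAS call.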

TODO:
All done. The remaining operators take square matrices as inputs, which cannot have large dimensions (>= 2^31) because of memory constraints. For example, a 2^31 * 2^31 float matrix would take up 2^24 TB of memory!
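
The memory estimate can be checked with a little arithmetic. The sketch below assumes 4-byte (float32) elements and counts a TB as 2^40 bytes; `dense_bytes` is a hypothetical helper named for this example.

```python
def dense_bytes(rows, cols, itemsize=4):
    """Bytes needed for a dense rows x cols matrix (itemsize=4 assumes float32)."""
    return rows * cols * itemsize


TIB = 2**40  # bytes per TB, taking 1 TB = 2^40 bytes (assumption)
n = 2**31    # smallest dimension that overflows a signed 32-bit index

size = dense_bytes(n, n)
assert size == 2**64         # 2^31 * 2^31 * 4 bytes
assert size // TIB == 2**24  # about 16.8 million TB
```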

ubuntu@ip-172-31-6-47:~/mx/incubator-mxnet/build$  nosetests --logging-level=DEBUG --verbose -s ../tests/nightly/test_large_array.py:test_linalg_large_dim
test_large_array.test_linalg_large_dim ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mxnet-bot

Hey @Zha0q1 , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-gpu, centos-cpu, windows-cpu, unix-cpu, sanity, centos-gpu, website, edge, clang, windows-gpu, miscellaneous]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

src/operator/tensor/la_op.h (outdated, resolved)
@Zha0q1 Zha0q1 changed the title from "[WIP] Add Large Dim Checks for linalg Operators" to "Add Large Dim Checks for linalg Operators" on Jul 30, 2020
@@ -1350,6 +1351,50 @@ def run_trsm(inp):
    check_batch_trsm()


def test_linalg_large_dim():
    def check_gemm():
        A = nd.ones(shape=(1, INT32_MAX + 1, 1))
Contributor

This should go in test_large_vector, since the input has one large dimension while the remaining dimensions are small. @access2rohit please confirm.

Contributor

@ChaiBapchya ChaiBapchya Jul 30, 2020

The file names could be better.
Basically the idea was to have 2 separate files:

  1. test_large_array.py [more like test_large_size.py]
    tests inputs whose individual dimensions are each less than 2**32 but whose total size is > 2**32

  2. test_large_vector.py [more like test_large_shape.py]
    tests inputs in which at least one individual dimension is > 2**32
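
The split between the two files can be illustrated with a small classifier. This is a sketch only: `classify` is a hypothetical name, and it uses INT32_MAX as the threshold (following the INT32_MAX + 1 shape in the test under review) rather than the 2**32 figure quoted above.

```python
from math import prod

INT32_MAX = 2**31 - 1  # assumed threshold for this illustration


def classify(shape):
    """Hypothetical helper: say which test file a shape would belong to."""
    if any(dim > INT32_MAX for dim in shape):
        return "large shape"  # at least one dimension overflows on its own
    if prod(shape) > INT32_MAX:
        return "large size"   # only the total element count overflows
    return "small"


print(classify((1, INT32_MAX + 1, 1)))  # one oversized dimension
print(classify((2**16, 2**16)))         # small dims, oversized total
```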

Contributor Author

@Zha0q1 Zha0q1 Jul 30, 2020

Maybe we should add more explicit comments on what those tests do? I can do so in my next commit. I still think the two cases fall into the same category, which is testing with tensors of large dimensions.

Contributor Author

In the comments I would say something like: 1. those tests are for overflowing the total size; 2. those other tests are for overflowing index calculations, i.e. row dim, col dim, etc.

Contributor

@ChaiBapchya ChaiBapchya Jul 30, 2020

From a consistency standpoint, I'd:

  1. put these tests in test_large_vector.py file.
  2. Rename that file to [whatever sounds right I just gave a suggestion above]
  3. add a comment in that file.

All dimensions in test_large_array.py are less than INT32_MAX
Large dimension [>2**32] was introduced in test_large_vector.py for the same reason.

So to keep the testing approach consistent I'd do that.

Even if both files play with "Large tensors" one does it for large "size" other specifically for large "shape".

Contributor Author

@Zha0q1 Zha0q1 Jul 30, 2020

Well, one thing to note is that they are not vectors per se. The inputs are all 3D, whereas in test_large_vector they are all 1D. The dim checks happen on both the row and col dims, which is why I used both (1, 1, x) and (1, x, 1).

I will add more comments in the next commit.

Contributor

Yes I know they aren't "vectors" and hence recommended "renaming the file"

Contributor

@access2rohit access2rohit Jul 30, 2020

@Zha0q1 the vector file generally houses tests for operators with a single dimension that exceeds 2^32 range. Please address what is suggested by @ChaiBapchya and move it to vector tests. Feel free to rename the file to test_large_dimensions.py

Contributor Author

Yeah, that would make sense. I have moved the tests to test_large_vector.py, keeping the original file name to avoid a naming discrepancy with master. Comments were also added to the tests.

Contributor

@ChaiBapchya ChaiBapchya left a comment

Functionality-wise looks good to me. Had other thoughts about "where" this test should be placed. Feel free to disagree & merge.
Looks good other than that. Thanks!

Contributor Author

Zha0q1 commented Jul 30, 2020

> Functionality-wise looks good to me. Had other thoughts about "where" this test should be placed. Feel free to disagree & merge.
> Looks good other than that. Thanks!

Thanks!

@sandeep-krishnamurthy sandeep-krishnamurthy merged commit f4e62df into apache:v1.x Jul 31, 2020

6 participants