Rebuild for pytorch21 + bump to 0.7.1 #18
Changes from 9 commits
fe5057e
3d4b9b9
c143adb
4be341a
8df3911
7a1aa99
aa22267
dca136c
ddb453e
089dc2b
fe77568
```diff
@@ -1,15 +1,15 @@
-{% set version = "0.6.1" %}
+{% set version = "0.7.1" %}
 
 package:
   name: torchdata
   version: {{ version }}
 
 source:
   url: https://github.com/pytorch/data/archive/refs/tags/v{{ version }}.tar.gz
-  sha256: c596db251c5e6550db3f00e4308ee7112585cca4d6a1c82a433478fd86693257
+  sha256: 1b6589336776ccba19fd3bf435588416105d372f6b85d58a9f2b008286f483bf
 
 build:
-  number: 2
+  number: 0
   # no pytorch on windows in conda-forge, see
   # https://github.com/conda-forge/pytorch-cpu-feedstock/issues/32
   skip: true # [win]
@@ -48,6 +48,7 @@ test:
     - pytest
     - adlfs
     - awscli
+    - cryptography >=3.3.2,<40.0.2
     - datasets
     - expecttest
     - fsspec
@@ -64,11 +65,15 @@ test:
     {% set tests_to_skip = tests_to_skip + " or test_fsspec_memory_list" %}
     {% set tests_to_skip = tests_to_skip + " or test_elastic_training_dl1_backend_gloo" %}
     {% set tests_to_skip = tests_to_skip + " or test_elastic_training_dl2_backend_gloo" %}
+    # fails because fsspec is not available (AWS S3 stuff)
+    {% set tests_to_skip = tests_to_skip + " or test_fsspec_io_iterdatapipe" %}
+    {% set tests_to_skip = tests_to_skip + " or test_s3_io_iterdatapipe" %}
     # tend to fail due to Google Drive rate-limiting
     {% set tests_to_skip = tests_to_skip + " or test_gdrive_iterdatapipe" %}
     {% set tests_to_skip = tests_to_skip + " or test_online_iterdatapipe" %}
-    # unclear this fails only on py<=39
-    {% set tests_to_skip = tests_to_skip + " or test_fsspec_io_iterdatapipe" %} # [py<=39]
+    # 20231124 - disable tests that might timeout after 6 hours
+    # https://github.com/pytorch/data/blob/v0.7.1/test/dataloader2/test_mprs.py#L233
+    {% set tests_to_skip = tests_to_skip + " or test_early_exit_ctx_" %}
```
Comment on lines +74 to +76 (Member):

Disabling these multiprocessing-related tests from https://github.com/pytorch/data/blob/v0.7.1/test/dataloader2/test_mprs.py#L233 because they can time out after 6 hours (see the earlier failure at commit 8df3911, e.g. https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=823046&view=logs&j=4f922444-fdfe-5dcf-b824-02f86439ef14&t=937c195f-508d-5135-dc9f-d4b5730df0f7&l=1080).
```diff
     # test_audio_examples uses an uninstalled local folder ("examples");
     # avoid test_text_examples due to cycle since torchtext depends on torchdata
     - pytest -v --ignore=test_audio_examples.py --ignore=test_text_examples.py -k "not ({{ tests_to_skip }})"
```
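The recipe accumulates the skip list as one string and passes it to pytest's `-k` option as `not (...)`. The Python sketch below is a simplified model, assuming pytest's default keyword matching (a bare name matches when it is a case-insensitive substring of a test's name). It only uses the entries visible in the `@@ -64,11 +65,15 @@` hunk, since the recipe defines further entries before line 64 that are not shown. It also illustrates why a prefix such as `test_early_exit_ctx_` deselects a whole family of tests; the test names in the usage part are hypothetical.

```python
"""Simplified model of how tests_to_skip becomes a pytest -k filter."""

# Only the entries visible in the diff hunk; the real recipe has more.
VISIBLE_SKIPS = [
    "test_fsspec_memory_list",
    "test_elastic_training_dl1_backend_gloo",
    "test_elastic_training_dl2_backend_gloo",
    "test_fsspec_io_iterdatapipe",
    "test_s3_io_iterdatapipe",
    "test_gdrive_iterdatapipe",
    "test_online_iterdatapipe",
    "test_early_exit_ctx_",
]


def build_k_expression(skips):
    """Join names with ' or ' and negate, mirroring -k "not ({{ tests_to_skip }})"."""
    return "not (" + " or ".join(skips) + ")"


def is_deselected(test_name, skips):
    """Assumed matching rule: a skip entry deselects a test when it is a
    case-insensitive substring of the test name. Real pytest also matches
    parent (class/module) names and markers, which this ignores."""
    lowered = test_name.lower()
    return any(entry.lower() in lowered for entry in skips)


if __name__ == "__main__":
    print(build_k_expression(VISIBLE_SKIPS))
    # Hypothetical names, used only to show prefix matching.
    for name in ["test_early_exit_ctx_example", "test_unrelated_example"]:
        print(name, "deselected" if is_deselected(name, VISIBLE_SKIPS) else "kept")
```

Because real `-k` matching also looks at parent names and markers, this model can under-report what gets deselected, but it is enough to see that one trailing-underscore prefix covers every test whose name starts with it.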
Well, why not add `fsspec` as a test dependency then?

Ah, never mind, `fsspec` is already there. Could you explain what you mean by "fails because fsspec is not available" then?

I've got no idea why it fails; I was assuming it was because `fsspec` was not a dependency.
Tried re-enabling those tests in 089dc2b. This is the traceback from https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=830806&view=logs&j=4f922444-fdfe-5dcf-b824-02f86439ef14&t=937c195f-508d-5135-dc9f-d4b5730df0f7&l=1292:
```
=================================== FAILURES ===================================
_______________ TestDataPipeRemoteIO.test_fsspec_io_iterdatapipe _______________

self = <test_remote_io.TestDataPipeRemoteIO testMethod=test_fsspec_io_iterdatapipe>

    @skipIfNoFSSpecS3
    def test_fsspec_io_iterdatapipe(self):
        input_list = [
            ["s3://ai2-public-datasets"],  # bucket without '/'
            ["s3://ai2-public-datasets/charades/"],  # bucket with '/'
            [
                "s3://ai2-public-datasets/charades/Charades_v1.zip",
                "s3://ai2-public-datasets/charades/Charades_v1_flow.tar",
                "s3://ai2-public-datasets/charades/Charades_v1_rgb.tar",
                "s3://ai2-public-datasets/charades/Charades_v1_480.zip",
            ],  # multiple files
        ]
        for urls in input_list:
            fsspec_lister_dp = FSSpecFileLister(IterableWrapper(urls), anon=True)
            self.assertEqual(
>               sum(1 for _ in fsspec_lister_dp), self.__get_s3_cnt(urls, recursive=False), f"{urls} failed"
            )

test_remote_io.py:278:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_remote_io.py:253: in __get_s3_cnt
    res = subprocess.run(aws_cmd, shell=True, check=True, capture_output=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = True, timeout = None, check = True
popenargs = ('aws --output json s3api list-objects --bucket ai2-public-datasets --no-sign-request --delimiter /',)
kwargs = {'shell': True, 'stderr': -1, 'stdout': -1}
process = <Popen: returncode: 255 args: 'aws --output json s3api list-objects --bucke...>
stdout = b''
stderr = b'\n<botocore.awsrequest.AWSRequest object at 0x7fbea3642dd0>\n'
retcode = 255

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin. If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads. communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command 'aws --output json s3api list-objects --bucket ai2-public-datasets --no-sign-request --delimiter /' returned non-zero exit status 255.

../../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/lib/python3.11/subprocess.py:571: CalledProcessError
_________________ TestDataPipeRemoteIO.test_s3_io_iterdatapipe _________________

self = <test_remote_io.TestDataPipeRemoteIO testMethod=test_s3_io_iterdatapipe>

    @skipIfNoAWS
    @unittest.skipIf(IS_M1, "PyTorch M1 CI Machine doesn't allow accessing")
    def test_s3_io_iterdatapipe(self):
        # S3FileLister: different inputs
        input_list = [
            ["s3://ai2-public-datasets"],  # bucket without '/'
            ["s3://ai2-public-datasets/"],  # bucket with '/'
            ["s3://ai2-public-datasets/charades"],  # folder without '/'
            ["s3://ai2-public-datasets/charades/"],  # folder without '/'
            ["s3://ai2-public-datasets/charad"],  # prefix
            [
                "s3://ai2-public-datasets/charades/Charades_v1",
                "s3://ai2-public-datasets/charades/Charades_vu17",
            ],  # prefixes
            ["s3://ai2-public-datasets/charades/Charades_v1.zip"],  # single file
            [
                "s3://ai2-public-datasets/charades/Charades_v1.zip",
                "s3://ai2-public-datasets/charades/Charades_v1_flow.tar",
                "s3://ai2-public-datasets/charades/Charades_v1_rgb.tar",
                "s3://ai2-public-datasets/charades/Charades_v1_480.zip",
            ],  # multiple files
            [
                "s3://ai2-public-datasets/charades/Charades_v1.zip",
                "s3://ai2-public-datasets/charades/Charades_v1_flow.tar",
                "s3://ai2-public-datasets/charades/Charades_v1_rgb.tar",
                "s3://ai2-public-datasets/charades/Charades_v1_480.zip",
                "s3://ai2-public-datasets/charades/Charades_vu17",
            ],  # files + prefixes
        ]
        for input in input_list:
            s3_lister_dp = S3FileLister(IterableWrapper(input), region="us-west-2")
>           self.assertEqual(sum(1 for _ in s3_lister_dp), self.__get_s3_cnt(input), f"{input} failed")

test_remote_io.py:341:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_remote_io.py:253: in __get_s3_cnt
    res = subprocess.run(aws_cmd, shell=True, check=True, capture_output=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = True, timeout = None, check = True
popenargs = ('aws --output json s3api list-objects --bucket ai2-public-datasets --no-sign-request',)
kwargs = {'shell': True, 'stderr': -1, 'stdout': -1}
process = <Popen: returncode: 255 args: 'aws --output json s3api list-objects --bucke...>
stdout = b''
stderr = b'\n<botocore.awsrequest.AWSRequest object at 0x7f56bdd8e790>\n'
retcode = 255

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin. If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads. communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command 'aws --output json s3api list-objects --bucket ai2-public-datasets --no-sign-request' returned non-zero exit status 255.

../../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/lib/python3.11/subprocess.py:571: CalledProcessError
```

Seems to be something related to opening the S3 objects on https://registry.opendata.aws/allenai-arc/? What's strange is that these tests fail on Linux but pass on OSX-64.
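In both tracebacks the failure is raised inside the test's `__get_s3_cnt` helper, which shells out to the AWS CLI with `check=True`, so exit status 255 becomes a `CalledProcessError`. Since Python evaluates the `assertEqual` arguments left to right, the torchdata lister had already been iterated when the CLI call failed, which points at the `aws` invocation rather than at `fsspec`. To inspect that call on its own, the command from the log can be re-run with `check=False`. A minimal sketch, assuming the `aws` CLI is on `PATH` and a POSIX shell; `build_list_objects_cmd` and `run_and_report` are hypothetical helpers that only mirror the two command strings in the traceback, not torchdata's actual `__get_s3_cnt` implementation.

```python
"""Re-run the AWS CLI call from the failing tests and report, not raise."""
import subprocess


def build_list_objects_cmd(bucket, delimiter=None):
    """Hypothetical helper: rebuild the command strings seen in the log
    ('popenargs' lines); not torchdata's __get_s3_cnt."""
    cmd = f"aws --output json s3api list-objects --bucket {bucket} --no-sign-request"
    if delimiter is not None:
        cmd += f" --delimiter {delimiter}"
    return cmd


def run_and_report(cmd):
    """Run cmd through the shell as the test does, but with check=False so a
    non-zero exit status (e.g. 255) is returned instead of raised."""
    res = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return res.returncode, res.stdout, res.stderr
```

For example, `run_and_report(build_list_objects_cmd("ai2-public-datasets", delimiter="/"))` returns the exit status together with the decoded stdout and stderr; in the CI log above, stderr held only the repr of a `botocore.awsrequest.AWSRequest` object, so running the command locally with the same CLI version may give a more useful message.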