Skip to content
This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Feature] update sharded loader #468

Merged
merged 5 commits into from
Dec 21, 2018
Merged

[Feature] update sharded loader #468

merged 5 commits into from
Dec 21, 2018

Conversation

zhreshold
Copy link
Member

Description

Update SharededDataLoader according to apache/mxnet#13447

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@zhreshold zhreshold requested a review from szha as a code owner December 17, 2018 07:53
@szha szha requested review from szhengac and leezu December 17, 2018 21:18
batch = batchify_fn([dataset[i] for i in samples])
return batch

class _MultiWorkerIter(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to copy the full implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.4.0 not yet released, better save the effort to move dependency forward to a nightly build

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will approve it once it passes the ci test.


"""
def __init__(self, dataset, batch_size=None, shuffle=False, sampler=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to previous comment, why do we need to copy all.

@mli
Copy link
Member

mli commented Dec 19, 2018

Job PR-468/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-468/6/index.html


@pytest.mark.remote_required
def test_sharded_data_loader_record_file():
if not hasattr(mx.recordio.MXRecordIO, '_check_pid'):
# skip if mxnet<=1.4.0 detected, some hotfix is not included so recordfile will break
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about skipping the test depending on mxnet.__version__? In case there is any change in later mxnet versions that changes the _check_pid attribute it may help keep this test useful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some early 1.4.0 nightly build version don't have the according PR included, so I don't know how to skip those given version only. IMO it's a non issue after the official 1.4.0 is released

@szha szha merged commit f523396 into dmlc:master Dec 21, 2018
paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
* update sharded loader

* fix

* fix threadpool

* use thread_pool, test no merge

* fix sharded batch
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants