[Feature] update sharded loader #468

zhreshold · 2018-12-17T07:53:25Z

Description

Update ShardedDataLoader according to apache/mxnet#13447

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

szhengac · 2018-12-18T08:10:37Z

src/gluonnlp/data/dataloader.py

+        batch = batchify_fn([dataset[i] for i in samples])
+    return batch
+
+class _MultiWorkerIter(object):


Why do we have to copy the full implementation?

zhreshold · 2018-12-18

MXNet 1.4.0 is not yet released, so copying saves us the effort of moving the dependency forward to a nightly build.

szhengac · 2018-12-19

OK, I will approve it once it passes the CI test.

szhengac · 2018-12-18T08:12:49Z

src/gluonnlp/data/dataloader.py


+    """
    def __init__(self, dataset, batch_size=None, shuffle=False, sampler=None,


Similar to the previous comment: why do we need to copy everything?

mli · 2018-12-19T21:13:46Z

Job PR-468/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-468/6/index.html

leezu · 2018-12-20T11:52:01Z

tests/unittest/train/test_dataloader.py


 @pytest.mark.remote_required
 def test_sharded_data_loader_record_file():
+    if not hasattr(mx.recordio.MXRecordIO, '_check_pid'):
+        # skip if mxnet<=1.4.0 detected, some hotfix is not included so recordfile will break
+        return


How about skipping the test based on mxnet.__version__ instead? If a later mxnet version changes the _check_pid attribute, a version check would keep this test useful.

zhreshold · 2018-12-20

Some early 1.4.0 nightly builds don't include the corresponding PR, so I don't know how to skip only those builds by version. IMO it's a non-issue once the official 1.4.0 is released.

* update sharded loader * fix * fix threadpool * use thread_pool, test no merge * fix sharded batch

update sharded loader

89e3fad

zhreshold requested a review from szha as a code owner December 17, 2018 07:53

zhreshold added 2 commits December 17, 2018 11:24

fix

91a982f

fix threadpool

cf1a6ce

szha requested review from szhengac and leezu December 17, 2018 21:18

szhengac reviewed Dec 18, 2018

zhreshold added 2 commits December 18, 2018 11:49

use thread_pool, test no merge

f428015

fix sharded batch

8aa8b46

szhengac approved these changes Dec 20, 2018

leezu reviewed Dec 20, 2018

leezu approved these changes Dec 21, 2018

szha merged commit f523396 into dmlc:master Dec 21, 2018

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019

[Feature] update sharded loader (dmlc#468)

bf63b99
