This repository was archived by the owner on Jan 15, 2024. It is now read-only.

Backport #862 #863

Merged
leezu merged 7 commits into dmlc:v0.7.x from leezu:7xfixvocabunktoken on Aug 5, 2019
Conversation

leezu (Contributor) commented Aug 3, 2019

See #862

leezu requested a review from szha as a code owner August 3, 2019 20:40

codecov bot commented Aug 3, 2019

Codecov Report

❗ No coverage uploaded for pull request head (7xfixvocabunktoken@513e920).
The diff coverage is n/a.

codecov bot commented Aug 3, 2019

Codecov Report

Merging #863 into v0.7.x will decrease coverage by 0.67%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           v0.7.x     #863      +/-   ##
==========================================
- Coverage   90.43%   89.76%   -0.68%     
==========================================
  Files          66       66              
  Lines        6348     6350       +2     
==========================================
- Hits         5741     5700      -41     
- Misses        607      650      +43

| Impacted Files | Coverage Δ |
|---|---|
| src/gluonnlp/vocab/vocab.py | 97.32% <100%> (+0.02%) ⬆️ |
| src/gluonnlp/model/parameter.py | 92% <0%> (-8%) ⬇️ |
| src/gluonnlp/data/registry.py | 78.12% <0%> (-6.25%) ⬇️ |
| src/gluonnlp/vocab/subwords.py | 81.13% <0%> (-5.67%) ⬇️ |
| src/gluonnlp/data/corpora/wikitext.py | 94.82% <0%> (-5.18%) ⬇️ |
| src/gluonnlp/optimizer/bert_adam.py | 86.27% <0%> (-3.93%) ⬇️ |
| src/gluonnlp/data/batchify/batchify.py | 92.85% <0%> (-3.58%) ⬇️ |
| src/gluonnlp/data/utils.py | 70.74% <0%> (-3.41%) ⬇️ |
| src/gluonnlp/data/transforms.py | 77.13% <0%> (-3.38%) ⬇️ |
| src/gluonnlp/data/dataset.py | 97.61% <0%> (-1.59%) ⬇️ |

... and 3 more

leezu added 3 commits August 3, 2019 21:16
Confirmed that vocab[vocab.unknown_token] still == 0 for all models created
prior to the flexible vocab PR, i.e.:

- book_corpus_wiki_en_uncased
- wiki_multilingual_uncased
- openwebtext_book_corpus_wiki_en_uncased
- wiki_multilingual_cased
- wiki_cn_cased
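
A minimal sketch of the kind of check described in the commit message above, kept self-contained so it runs without gluonnlp or mxnet. The helper names (`build_token_to_idx`, `unknown_token_index`), the `<unk>` token string, and the token lists are all hypothetical stand-ins; only the dataset names come from the commit message. It is not the code that was actually used to verify the pretrained vocabularies.

```python
from typing import Dict, List

# Assumption: the unknown token is spelled "<unk>" in this toy example.
UNKNOWN_TOKEN = "<unk>"


def build_token_to_idx(idx_to_token: List[str]) -> Dict[str, int]:
    """Map each token to its position in the index-ordered token list."""
    return {token: idx for idx, token in enumerate(idx_to_token)}


def unknown_token_index(idx_to_token: List[str],
                        unknown_token: str = UNKNOWN_TOKEN) -> int:
    """Return the index assigned to the unknown token."""
    return build_token_to_idx(idx_to_token)[unknown_token]


# Dataset names are taken from the commit message; the token lists are
# invented for illustration and do not reflect the real vocabularies.
toy_vocabs = {
    "book_corpus_wiki_en_uncased": ["<unk>", "<pad>", "the", "of"],
    "wiki_multilingual_uncased": ["<unk>", "<pad>", "de", "la"],
    "openwebtext_book_corpus_wiki_en_uncased": ["<unk>", "<pad>", "and"],
    "wiki_multilingual_cased": ["<unk>", "<pad>", "De", "La"],
    "wiki_cn_cased": ["<unk>", "<pad>", "的"],
}

# Collect the unknown-token index per vocabulary and check the invariant.
results = {name: unknown_token_index(tokens)
           for name, tokens in toy_vocabs.items()}
assert all(index == 0 for index in results.values())
```

With the real library, the same invariant would be checked on each loaded vocabulary object via `vocab[vocab.unknown_token]`, as written in the commit message.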
leezu force-pushed the 7xfixvocabunktoken branch from 513e920 to ddcda74 on August 3, 2019 21:17
leezu added 2 commits August 4, 2019 09:31
Fixed in master branch by dmlc#838
Fix does not apply here as it would require dropping py2.
leezu force-pushed the 7xfixvocabunktoken branch from 980f689 to 883a34a on August 4, 2019 13:13

szha (Member) commented Aug 4, 2019

This needs more fixing

leezu added 2 commits August 5, 2019 15:13
Test the v0.7.x branch only on mxnet v1.4, because the doctest output cannot
match both mxnet versions
leezu force-pushed the 7xfixvocabunktoken branch from 9861def to 9fb7e32 on August 5, 2019 15:13

mli (Member) commented Aug 5, 2019

Found link check problems in job PR-863/7:
(line 19) broken https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/ - 404 Client Error: Not Found for url: https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/
(line 5) broken https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/ - 404 Client Error: Not Found for url: https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/
(line 5) broken https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/ - 404 Client Error: Not Found for url: https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/
(line 21) broken https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/ - 404 Client Error: Not Found for url: https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/
(line 442) broken https://www.aclweb.org/anthology/P02-1040.pdf)[1 - 404 Client Error: Not Found for url: https://www.aclweb.org/anthology/P02-1040.pdf)%5B1
(line 92) broken https://nlp.stanford.edu/pubs/glove.pdf)[2 - 404 Client Error: NOT FOUND for url: https://nlp.stanford.edu/pubs/glove.pdf)%5B2
(line 208) broken https://www.bioinf.jku.at/publications/older/2604.pdf)[3 - 404 Client Error: Not Found for url: https://www.bioinf.jku.at/publications/older/2604.pdf)%5B3

mli (Member) commented Aug 5, 2019

Job PR-863/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-863/7/index.html

leezu merged commit 70c5438 into dmlc:v0.7.x Aug 5, 2019
leezu deleted the 7xfixvocabunktoken branch August 5, 2019 17:11
3 participants