Conversation
* [REVIEW REQUIRED] Revert PR #9484 & add additional dependency licenses to LICENSE file (#9701)
  * Revert "[Review Required] Fixing Licenses: Cleaning up the Top Level LICENSE file (#9484)"; this reverts commit 8930d96.
  * Some more LICENSE fixes
  * Add some more packages to the LICENSE file
  * Add dependencies of dependencies
  * Update v1.1.0 change log in NEWS.md
  * Sync README.md from the v1.1.0 branch
  * Revert to the correct Jenkins URL in README
* Parallelization for roipooling
  * Remove some useless computation
  * Remove useless muls
  * Add author and retrigger
  * Retrigger again
* Bug fix and performance optimization for rtc
  * Fix the "super().__init__()" bug in Python 2.
  * Initialize the kernel at operator init.
  * Update custom_softmax_rtc.py to fix unnecessary formatting
tests/python/unittest/test_text.py
Outdated

    def test_token_embedding_from_file():
        embed_root = 'embedding'
Please use a tempfile instead
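One way to address this comment, as a minimal sketch: use the standard-library `tempfile` module so the test writes its embedding file under a throwaway directory rather than a fixed `'embedding'` path. The file name and contents here are hypothetical, not from the actual test.

```python
import os
import tempfile

def test_token_embedding_from_file():
    # Create the embedding file under a temporary directory instead of a
    # fixed 'embedding' path, so the test cleans up after itself and cannot
    # collide with other test runs.
    with tempfile.TemporaryDirectory() as embed_root:
        file_path = os.path.join(embed_root, 'my_embed.vec')  # hypothetical name
        with open(file_path, 'w') as f:
            f.write('a 0.1 0.2\nb 0.3 0.4\n')
        # ... build the TokenEmbedding from file_path and assert on it ...
        assert os.path.exists(file_path)
    return embed_root  # the directory no longer exists at this point
```

The `with` block guarantees removal even if an assertion inside it fails.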
tests/python/unittest/test_text.py
Outdated

    def test_vocab_set_embedding_with_one_custom_embedding():
        embed_root = 'embedding'
Please use a tempfile instead
tests/python/unittest/test_text.py
Outdated

    def test_vocabulary_with_two_custom_embeddings():
        embed_root = '.'
Please use a tempfile instead
python/mxnet/text/embedding.py
Outdated
    # coding: utf-8
    # pylint: disable=consider-iterating-dictionary
    # pylint: disable=super-init-not-called
    # pylint: disable=arguments-differ
Are the last two pylint ignores really invalid?
This should be in gluon.
python/mxnet/text/embedding.py
Outdated
    :func:`~mxnet.contrib.text.embedding.create`.
    :func:`~mxnet.text.embedding.create`.

    Examples
    --------
    >>> @mxnet.contrib.text.embedding.register
>>> @mxnet.text.embedding.register
Thanks!
Please add a JIRA
python/mxnet/gluon/text/embedding.py
Outdated
    @@ -29,10 +29,10 @@
    import warnings
    import zipfile

    from . import _constants as C
    from mxnet import ndarray as nd
    from mxnet import nd
We usually use relative imports.
docs/api/python/gluon/text.md
Outdated
    ```python
    >>> text_data = " hello world \n hello nice world \n hi world \n"
    >>> counter = text.count_tokens_from_str(text_data)
It doesn't seem necessary to create vocab just to access embedding vector.
resolved
    ```

    The obtained `counter` has key-value pairs whose keys are words and values are word frequencies.
Should explain why a counter is needed first.
resolved
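For intuition, the counting step in the doc example can be approximated with the standard library alone; this sketch whitespace-tokenizes like the example and is an assumption about the behavior, not the mxnet implementation itself:

```python
from collections import Counter

def count_tokens_from_str(source_str):
    # Split on whitespace and count word frequencies, mirroring the
    # text_data example in the doc above.
    return Counter(source_str.split())

text_data = " hello world \n hello nice world \n hi world \n"
counter = count_tokens_from_str(text_data)
# 'world' appears 3 times, 'hello' 2 times, 'nice' and 'hi' once each
```

The resulting counter is exactly the key-value mapping of words to frequencies that the vocabulary is then built from.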
python/mxnet/gluon/text/embedding.py
Outdated
    file is:

    '(token_1)(ed)(v_11)(ed)(v_12)(ed)...(ed)(v_1k)\\\\n
    (token_2)(ed)(v_21)(ed)(v_22)(ed)...(ed)(v_2k)\\\\n...'
Use an example for the file format inside a code block so that it's easier to understand the file format. Currently it looks confusing. http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-10074/10/api/python/gluon/text.html#mxnet.gluon.text.embedding.TokenEmbedding.from_file
resolved
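To illustrate what the reviewer is asking for: assuming the element delimiter `(ed)` is a single space and the vectors have three dimensions, a concrete (hypothetical) embedding file would look like this in a code block:

```
hello 0.1 0.2 0.3
world 0.4 0.5 0.6
```

Each line holds one token followed by its vector elements, and lines are separated by newlines.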
python/mxnet/gluon/text/vocab.py
Outdated
    Examples
    --------
    >>> fasttext = text.embedding.create('fasttext', file_name='wiki.simple.vec')
    >>> text_data = " hello world \n hello nice world \n hi world \n"
resolved
python/mxnet/gluon/text/vocab.py
Outdated
    def __len__(self):
        return len(self._idx_to_token)

    def set_embedding(self, embeddings):
Use `(self, *embeddings)` instead; `embeddings` should not be a list. After the change, it should be possible to do `vocab.set_embedding(fasttext_emb, glove_embed)`.
Remember to update the doc/example accordingly.
Resolved, with updated test cases.
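The suggested signature change can be sketched as follows; the `Vocab` class here is a deliberately stripped-down stand-in, not the real gluon implementation:

```python
class Vocab:
    def __init__(self, tokens):
        self._idx_to_token = list(tokens)
        self._embedding = None

    def __len__(self):
        return len(self._idx_to_token)

    def set_embedding(self, *embeddings):
        # Variadic positional arguments let callers pass several embeddings
        # directly, e.g. vocab.set_embedding(fasttext_emb, glove_emb),
        # instead of wrapping them in a list.
        self._embedding = embeddings

vocab = Vocab(['hello', 'world'])
# Strings stand in for embedding objects in this sketch.
vocab.set_embedding('fasttext_emb', 'glove_emb')
```

A single embedding still works unchanged (`vocab.set_embedding(fasttext_emb)`), which is why the variadic form is strictly more convenient than a list parameter.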
docs/api/python/gluon/text.md
Outdated
    frequent words 'world' and 'hello' are also indexed.

    ### Assign token embedding to vocabulary
Assign doesn't seem like the right verb. Maybe attach?
Resolved
    @property
    def reserved_tokens(self):
        return self._reserved_tokens
Should the `reserved_tokens` property always include `unknown_token`, given that they are all indexed first?
Resolved, with a more detailed API specification.
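One possible reading of the indexing order under discussion, sketched in plain Python; the class, token strings, and indexing choice are illustrative assumptions, not the final mxnet API:

```python
class Vocab:
    def __init__(self, unknown_token='<unk>', reserved_tokens=None):
        self._unknown_token = unknown_token
        self._reserved_tokens = list(reserved_tokens or [])
        # The unknown token is indexed first, then the reserved tokens,
        # which is what prompts the question of whether reserved_tokens
        # should report unknown_token as well.
        self._idx_to_token = [unknown_token] + self._reserved_tokens

    @property
    def reserved_tokens(self):
        # Here reserved_tokens excludes unknown_token; the alternative
        # design would prepend self._unknown_token to this list.
        return self._reserved_tokens

v = Vocab(reserved_tokens=['<pad>', '<bos>'])
```

Documenting which of the two behaviors the property has is exactly the "more detailed API specification" the resolution refers to.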
* [MXNET-67] Sync master with v1.1.0 branch (apache#10031)
  * [REVIEW REQUIRED] Revert PR apache#9484 & add additional dependency licenses to LICENSE file (apache#9701)
    * Revert "[Review Required] Fixing Licenses: Cleaning up the Top Level LICENSE file (apache#9484)"; this reverts commit 8930d96.
    * Some more LICENSE fixes
    * Add some more packages to the LICENSE file
    * Add dependencies of dependencies
    * Update v1.1.0 change log in NEWS.md
    * Sync README.md from the v1.1.0 branch
    * Revert to the correct Jenkins URL in README
  * Parallelization for ROIpooling OP (apache#9958)
    * Parallelization for roipooling
    * Remove some useless computation
    * Remove useless muls
    * Add author and retrigger
    * Retrigger again
  * Comments for copy and copyto are corrected (apache#10040)
  * Bug fix and performance optimization for rtc (apache#10018)
    * Fix the "super().__init__()" bug in Python 2.
    * Initialize the kernel at operator init.
    * Update custom_softmax_rtc.py to fix unnecessary formatting
* Set embedding
* Code and test revised
* API implementation done
* License and news
* README and cpp
* pylint disable
* Add API doc
* Less pylint disable
* Remove contrib
* Move to gluon, revise API doc
* Fix import order
* Re-test
* Relative imports
* Re-run test
* Revise implementation, test case, and API doc
* Re-test
Description
Add vocabulary and embedding.
Checklist

Essentials
- Passed code style checking (make lint)

Changes

Comments