
If the doc input is a text of several hundred thousand lines, even a machine with 100 GB of memory can't run it #5

Open
zhegeliang2 opened this issue May 16, 2018 · 2 comments


@zhegeliang2

Our corpus has several hundred thousand lines, about 1 GB in file size. Feeding this text in as the doc input immediately OOMs. Is there a good way to handle this case?

@tuzhe0210

You can send it to me to try; I've implemented a similar version before.

@skadai

skadai commented Nov 11, 2019

https://github.com/smoothnlp/SmoothNLP/blob/master/smoothnlp/algorithm/phrase/ngram_utils.py — this library uses a trie to compute the freedom degree (boundary entropy); its memory usage is much better than this library's.
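The idea skadai points to — storing n-grams in a character trie and reading the "freedom degree" (entropy over a candidate's neighboring characters) off each node's children — can be sketched roughly like this. This is a minimal illustration under my own assumptions, not the SmoothNLP implementation; all class and function names below are made up. The memory win comes from the trie sharing prefixes between n-grams, so cost grows with the number of distinct prefixes rather than with the number of n-gram strings stored:

```python
# Sketch of trie-based n-gram counting with right-neighbor entropy
# (the "freedom degree" used in new-word discovery). Hypothetical names;
# not the SmoothNLP code.
import math


class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.count = 0      # occurrences of the prefix ending at this node


class NgramTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, gram):
        node = self.root
        for ch in gram:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1

    def right_entropy(self, gram):
        """Entropy over the characters that follow `gram`.
        Higher entropy means the right boundary is 'freer',
        i.e. `gram` is more likely a standalone word."""
        node = self.root
        for ch in gram:
            node = node.children.get(ch)
            if node is None:
                return 0.0
        total = sum(c.count for c in node.children.values())
        if total == 0:
            return 0.0
        ent = 0.0
        for c in node.children.values():
            p = c.count / total
            ent -= p * math.log(p)
        return ent


def feed(trie, text, max_n=4):
    # Insert every suffix slice of length max_n + 1 so that each
    # candidate n-gram's right-neighbor counts sit one level below it.
    for i in range(len(text)):
        trie.insert(text[i:i + max_n + 1])
```

A second, mirrored trie over the reversed text would give the left-neighbor entropy the same way. For very large corpora the `feed` step can read the file line by line instead of loading it whole, which addresses the original OOM only partially (the trie itself still has to fit in memory).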
