Chinese entity HiExpan issues #7

Open
weather319 opened this issue Sep 4, 2019 · 1 comment
weather319 commented Sep 4, 2019

Hi jiaming,
Thanks for your idea and code. When I ran the code on a Chinese corpus, I found some issues:

  • First, dependency parsing and part-of-speech tagging seem to be unnecessary in corpus processing.

  • Second, the getCombinedWeightByFeatureMap function takes too much time when featuresOfSeed is large (the number of skip-gram patterns reaches the hundreds of thousands). So for each entity I kept only the 600 features in "eidSkipgram2TFIDFStrength.txt" that have the highest score and a standard length. This reduced the run time from 30 hours to 30 minutes, but it may reduce the accuracy of the combinedWeight score.

  • Third, the type feature is useless for Chinese, so I had to use the LDA model's scores instead. I am now evaluating the effectiveness of this approach.

  • Finally, I couldn't find the code for the Taxonomy Global Optimization section. Where can I find it?
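The per-entity pruning described in the second point can be sketched roughly as follows. This is a minimal illustration, not HiExpan's actual code: it assumes the skip-gram features have already been loaded into an in-memory dict of entity id to {feature: strength}, it does not model the "standard length" filter, and it uses a weighted Jaccard similarity as a stand-in for whatever getCombinedWeightByFeatureMap really computes. `prune_features` and `weighted_jaccard` are hypothetical helper names.

```python
import heapq
from typing import Dict

# feature (e.g. a skip-gram pattern) -> strength score
FeatureMap = Dict[str, float]


def prune_features(eid2features: Dict[int, FeatureMap], k: int = 600) -> Dict[int, FeatureMap]:
    """Keep only the k highest-scoring features for each entity id (hypothetical helper)."""
    pruned = {}
    for eid, feats in eid2features.items():
        # nlargest avoids sorting the full (possibly huge) feature list
        top = heapq.nlargest(k, feats.items(), key=lambda kv: kv[1])
        pruned[eid] = dict(top)
    return pruned


def weighted_jaccard(a: FeatureMap, b: FeatureMap) -> float:
    """Sum of per-feature minimums over sum of maximums (assumed similarity measure)."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(f, 0.0), b.get(f, 0.0)) for f in keys)
    den = sum(max(a.get(f, 0.0), b.get(f, 0.0)) for f in keys)
    return num / den if den > 0 else 0.0


# Toy data (made up for illustration)
eid2features = {
    1: {"__ city": 3.0, "in __": 2.0, "__ is": 0.5},
    2: {"__ city": 2.0, "in __": 1.0, "__ has": 0.2},
}
pruned = prune_features(eid2features, k=2)
print(pruned[1])                                   # {'__ city': 3.0, 'in __': 2.0}
print(round(weighted_jaccard(pruned[1], pruned[2]), 3))              # 0.6
print(round(weighted_jaccard(eid2features[1], eid2features[2]), 3))  # 0.526
```

In the toy example the pruned similarity (0.6) differs from the full one (about 0.526) because the dropped low-strength features no longer contribute to the denominator, which is one way the accuracy concern about combinedWeight can show up.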


fredia commented Feb 17, 2020

Hi weather319, I also want to use HiExpan on a Chinese corpus, but Probase, which HiExpan uses to extract type features, does not support Chinese. Would you share which Chinese Probase-like resource you used, or how you extracted type features in Chinese? Thanks!
