Better positions for extracting skip-gram features? #6

Open
JieyuZ2 opened this issue Jul 17, 2019 · 1 comment

JieyuZ2 commented Jul 17, 2019

Hi Jiaming,

In the skip-gram feature extraction code (https://github.com/mickeystroller/HiExpan/blob/master/src/featureExtraction/extractSkipGramFeature.py), the candidate skip-gram positions are set to [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)] (line 30). However, I found that when the center word is the first word of a sentence, there is no word before it, so the position effectively becomes (0, 1) instead of (-1, 1). Maybe we should add positions like (0, 1) and (0, 2). Otherwise, some entities will have the "a _ problem" feature but not the "_ problem" feature, which may hurt if "_ problem" becomes an important feature later. Thanks!
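
To make the boundary case concrete, here is a minimal sketch of the behavior I mean. It is not the code from extractSkipGramFeature.py: the function name `skipgrams`, the `_` placeholder, and the explicit clipping at the sentence start are my own assumptions for illustration.

```python
# Hypothetical sketch, NOT the repository's implementation.
# Positions are (left, right): up to |left| tokens before the entity
# and up to `right` tokens after it.
POSITIONS = [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)]


def skipgrams(tokens, start, end, positions=POSITIONS):
    """Return skip-gram strings around the entity span tokens[start:end + 1]."""
    feats = []
    for left, right in positions:
        lo = max(0, start + left)  # clip the left window at the sentence start
        before = tokens[lo:start]
        after = tokens[end + 1:end + 1 + right]  # slicing clips at the sentence end
        feats.append(" ".join(before + ["_"] + after))
    return feats


# Entity in the middle of a sentence: the (-1, 1) window has a left word.
mid = skipgrams(["a", "classification", "problem"], 1, 1)

# Entity at the start of a sentence: the same window has no left word,
# so it behaves like (0, 1).
first = skipgrams(["classification", "problem", "is", "hard"], 0, 0)

# Adding (0, 1) and (0, 2) gives mid-sentence entities the same feature.
extended = skipgrams(
    ["a", "classification", "problem"], 1, 1, POSITIONS + [(0, 1), (0, 2)]
)
```

In this sketch, `mid[0]` is "a _ problem" and `first[0]` is "_ problem". The feature "_ problem" never appears in `mid`, but it does appear in `extended`, so with the extra positions both occurrences share a feature.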

Best,
Jieyu

@mickeysjm
Owner

Thanks for this comment. I initially chose these six skip-gram positions to roughly align with the existing literature. You can definitely change to other positions, and I think your proposal is very reasonable. You could do a comparative analysis, and I look forward to seeing some empirical results. Thanks.
