In the presentation of this project from 2013, Chris talked about weighting the "significance" of 4-grams. He referenced "Millennium Falcon" as a significant 2-gram.
What weighting method was used in the 2013 study? Was it something like a stop word list, or something more complex like TF-IDF?
You may have intended to go over this in one of the unfinished sections, but it seems like something that would have been applied during the creation of the language models.
Great question! I will be able to answer in more detail soon (I'm on a road
trip at the moment). We didn't use weighting on the model, and we didn't
use stop words. The hand-wavy answer is that we take the inverse of the
frequency as the weight. So if "Millennium Falcon" shows up just once in
all of the pre-1830s literature, then the "significance" score for that
two-word combination would be 1/1 = 1.0 but if it occurs twice (say, once
in Book A, and once again in Book B) then the score would be 1/2 = 0.5.
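For anyone who wants to see the arithmetic spelled out, here is a minimal sketch of that inverse-frequency idea. It is not the code from the 2013 study. The function names, the whitespace tokenization, and the two toy "books" are all made up for illustration, and it assumes the frequency is the total number of occurrences across the whole corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def inverse_frequency_scores(corpus, n=2):
    """Map each n-gram to 1 / (its total number of occurrences in the corpus).

    Assumption: occurrences are summed over all books; the thread does not
    say whether the real study counted occurrences or distinct books.
    """
    counts = Counter()
    for tokens in corpus:
        counts.update(ngrams(tokens, n))
    return {gram: 1.0 / count for gram, count in counts.items()}


# Toy stand-ins for two tokenized books (hypothetical data).
book_a = "the millennium falcon flew past the moon".split()
book_b = "they saw the millennium falcon".split()

once = inverse_frequency_scores([book_a])
twice = inverse_frequency_scores([book_a, book_b])

print(once[("millennium", "falcon")])   # 1.0 (seen once)
print(twice[("millennium", "falcon")])  # 0.5 (seen once in each book)
```

Under this scheme, rarer n-grams get scores closer to 1.0 and common ones get scores closer to 0, which is why no separate stop word list is needed in the sketch.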
On Tue, Sep 17, 2019 at 10:36 AM misplacedFaith ***@***.***> wrote:
In the presentation of this project from 2013, Chris talked about
weighting the "significance" of 4-grams. He referenced "Millennium Falcon"
as a significant 2-gram.
What is the method for weighting used in the 2013 study? Was it something
like a stop word list or something more complex like TF-IDF?
You may have intended to go over this in one of the unfinished sections,
but it seems like something that would have been applied during the
creation of the language models.