GitHub - pz325/Coursera_NLP_MC: Coursera Natural Language Processing by Michael Collins Columbia University

Assignments for Coursera Natural Language Processing by Michael Collins, Columbia University
----

H1: Hidden Markov Models
----
For instructions, refer to h1/h1.pdf

hmm.py
    Hmm_ex, extending Hmm, calculates and stores:
        * e(x|y)
        * q(y_i|y_i-1, y_i-2)
        * count(x)
        * rare_word
        * all tags
        * all words
    SimpleTagger does simple tagging as instructed by Part 1
    ViterbiTagger does Viterbi tagging as instructed by Part 2    
p1.py
    Part 1
p2.py
    Part 2
p3.py
    Part 3
    Result falls short of the target: the F1-Score is 35.009 and the goal F1-Score is 39.519.
util.py
    Helper methods, including:
        * handling rare words (applying different rules)
        * test data iterator
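
As a rough illustration of the Part 1 ideas above (rare-word handling, e(x|y), and simple tagging), here is a minimal sketch. It is not the code in hmm.py or util.py: the function names, the `_RARE_` symbol, the threshold of 5, and the input format (lists of (word, tag) pairs) are all assumptions made for this example.

```python
from collections import Counter

RARE = "_RARE_"  # assumed placeholder symbol for infrequent words


def replace_rare(tagged_sentences, threshold=5):
    """Map every word seen fewer than `threshold` times to RARE."""
    word_counts = Counter(w for sent in tagged_sentences for w, _ in sent)
    return [
        [(w if word_counts[w] >= threshold else RARE, t) for w, t in sent]
        for sent in tagged_sentences
    ]


def emission_probs(tagged_sentences):
    """Maximum-likelihood estimate e(x|y) = count(y, x) / count(y).

    Returns a nested dict: e[word][tag] -> probability.
    """
    pair_counts = Counter()
    tag_counts = Counter()
    for sent in tagged_sentences:
        for word, tag in sent:
            pair_counts[(word, tag)] += 1
            tag_counts[tag] += 1
    e = {}
    for (word, tag), c in pair_counts.items():
        e.setdefault(word, {})[tag] = c / tag_counts[tag]
    return e


def simple_tag(words, e):
    """Tag each word independently with argmax over y of e(x|y).

    Unseen words fall back to the RARE entry, so the training data
    must contain at least one rare word (otherwise KeyError).
    """
    tags = []
    for w in words:
        candidates = e[w] if w in e else e[RARE]
        tags.append(max(candidates, key=candidates.get))
    return tags
```

A call such as `simple_tag(sentence, emission_probs(replace_rare(train)))` tags a sentence without using any context. The Viterbi tagger of Part 2 additionally uses q(y_i|y_i-1, y_i-2).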

----

H2: Probabilistic Context-Free Grammar (PCFG)
----
For instructions, refer to h2/h2.pdf

pcfg.py
    PCFG, extending Count, calculates and stores:
        * q(X->Y1Y2)
        * q(X->w)
    CKYTagger implements the CKY algorithm
p1.py
    Part 1
p2.py
    Part 2
    Expected development total F1-Scores are 0.79 for part 2 and 0.83 for part 3. 
p3.py
    Part 3
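
To show how q(X->Y1Y2) and q(X->w) feed into CKY, here is a minimal sketch of the dynamic program. It is not the CKYTagger in pcfg.py: the function name, the dictionary layout of the rule probabilities, the start symbol "S", and the nested-list tree format are assumptions made for this example. It also assumes every input word already has a q(X->w) entry (for example, after rare words have been mapped to a placeholder).

```python
def cky(words, q_binary, q_unary, start="S"):
    """Find the most probable parse under a PCFG in Chomsky normal form.

    q_binary: {(X, Y1, Y2): probability of X -> Y1 Y2}
    q_unary:  {(X, w): probability of X -> w}
    Returns (probability, tree), or (0.0, None) if no parse exists.
    """
    n = len(words)
    pi = {}  # (i, j, X) -> best probability of X spanning words[i..j]
    bp = {}  # (i, j, X) -> the word (i == j) or (split, Y1, Y2)

    # Base case: spans of length one use the q(X->w) rules.
    for i, w in enumerate(words):
        for (X, word), p in q_unary.items():
            if word == w:
                pi[(i, i, X)] = p
                bp[(i, i, X)] = w

    # Recursive case: build longer spans from two shorter ones.
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for (X, Y1, Y2), p in q_binary.items():
                for s in range(i, j):
                    left = pi.get((i, s, Y1), 0.0)
                    right = pi.get((s + 1, j, Y2), 0.0)
                    score = p * left * right
                    if score > pi.get((i, j, X), 0.0):
                        pi[(i, j, X)] = score
                        bp[(i, j, X)] = (s, Y1, Y2)

    def build(i, j, X):
        # Follow backpointers to recover the tree as nested lists.
        if i == j:
            return [X, bp[(i, j, X)]]
        s, Y1, Y2 = bp[(i, j, X)]
        return [X, build(i, s, Y1), build(s + 1, j, Y2)]

    if (0, n - 1, start) not in pi:
        return 0.0, None
    return pi[(0, n - 1, start)], build(0, n - 1, start)
```

Looping over every binary rule for every span keeps the sketch short. An implementation tuned for speed would more likely index rules by their left-hand side or by their children.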

----
H3: IBM Model 1 & 2
----
For instructions, refer to h3/h3.pdf

ibmmodel.py
    Count
        * t(f|e)
    IBMModel1 implements the EM and alignment algorithms


p1.py
    Part 1
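
For reference, the EM estimation of t(f|e) and the alignment step for IBM Model 1 can be sketched as below. This is not the code in ibmmodel.py: the function names, the "NULL" symbol, the default of 5 iterations, and the input format (a list of (english_words, foreign_words) pairs) are assumptions made for this example.

```python
from collections import defaultdict

NULL = "NULL"  # assumed symbol for the empty English word


def train_ibm1(pairs, iterations=5):
    """Estimate t(f|e) with EM; returns a dict keyed by (f, e)."""
    # Initialise t(f|e) uniformly over the foreign words that
    # co-occur with e in at least one sentence pair.
    cooc = defaultdict(set)
    for e_sent, f_sent in pairs:
        for e in [NULL] + e_sent:
            cooc[e].update(f_sent)
    t = {(f, e): 1.0 / len(fs) for e, fs in cooc.items() for f in fs}

    for _ in range(iterations):
        count_fe = defaultdict(float)
        count_e = defaultdict(float)
        # E-step: collect expected alignment counts.
        for e_sent, f_sent in pairs:
            e_words = [NULL] + e_sent
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    delta = t[(f, e)] / z
                    count_fe[(f, e)] += delta
                    count_e[e] += delta
        # M-step: renormalise the expected counts.
        t = {(f, e): c / count_e[e] for (f, e), c in count_fe.items()}
    return t


def align(e_sent, f_sent, t):
    """For each foreign word, return the index of the English word
    with the highest t(f|e); index 0 stands for NULL."""
    e_words = [NULL] + e_sent
    return [
        max(range(len(e_words)), key=lambda j: t.get((f, e_words[j]), 0.0))
        for f in f_sent
    ]
```

IBM Model 2 adds alignment (distortion) parameters on top of t(f|e); those are left out of this sketch.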

The expected development F-Scores are 0.420 and 0.449 for the first two parts; a basic intersection alignment should give 0.485 for the last part.

----