Some command line utils and libraries to make NLTK (Natural Language Toolkit) easy to use for certain tasks

jecompton/nltk_utils

nltk_utils

This is a grab bag of scripts intended to help with using NLTK for certain tasks, with a focus on interactive command-line utilities.

The word_freq.py script, for example, takes a number of command-line options and expects a text file as input. It then produces a frequency report of the most common words. You can filter out stopwords and punctuation, and tune the investigation as you like.
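As a rough illustration of the kind of report described above, here is a minimal sketch using only the Python standard library. It is not the word_freq.py implementation: the `word_frequencies` function, its parameters, and the tiny `STOPWORDS` set are all hypothetical stand-ins (NLTK provides `nltk.FreqDist` and a fuller stopword corpus for the same job).

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def word_frequencies(text, stopwords=STOPWORDS, top_n=10):
    """Return the top_n most common words, skipping stopwords and punctuation."""
    # Lowercase and split on whitespace.
    tokens = text.lower().split()
    # Strip ASCII punctuation from the edges of each token.
    cleaned = (tok.strip(string.punctuation) for tok in tokens)
    # Drop tokens that became empty and tokens in the stopword list.
    words = [w for w in cleaned if w and w not in stopwords]
    return Counter(words).most_common(top_n)


sample = "The cat sat on the mat. The cat ate, and the dog sat."
report = word_frequencies(sample, top_n=3)
for word, count in report:
    print(f"{word}\t{count}")
```

Passing `stopwords=set()` turns the stopword filter off, which makes it easy to compare a filtered and an unfiltered report for the same text.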

This is useful for exploring a text and finding directions you might take with an ML algorithm such as a classifier. It can also help you find places in the text that might contain dirty data, such as unusual punctuation characters that aren't being filtered out.
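One way to hunt for that kind of dirty data is to count punctuation and symbol characters that a plain ASCII filter would miss. The sketch below assumes the filter is based on Python's `string.punctuation`; the `unusual_punctuation` helper is a hypothetical name and is not part of this repository.

```python
import string
import unicodedata
from collections import Counter


def unusual_punctuation(text):
    """Count punctuation/symbol characters not covered by string.punctuation."""
    counts = Counter()
    for ch in text:
        # Unicode general categories starting with "P" are punctuation
        # and those starting with "S" are symbols.
        is_punct_or_symbol = unicodedata.category(ch)[0] in ("P", "S")
        if is_punct_or_symbol and ch not in string.punctuation:
            counts[ch] += 1
    return counts


# Curly quotes and a one-character ellipsis, written as escapes for clarity.
sample = "\u201cQuoted\u201d text with an ellipsis\u2026 and a plain comma, too."
suspects = unusual_punctuation(sample)
for ch, n in suspects.most_common():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}: {n}")
```

Characters that show up here are candidates for an extra cleaning step before the frequency report is run again.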

You can get a lot of interesting results just by looking at bigrams and trigrams for texts such as political speeches (but don't jump to conclusions: NLP of this sort is a very inexact science).
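To make the bigram and trigram idea concrete, here is a small standard-library sketch. The `ngrams` helper below is written for illustration and is an assumption about approach, not code from this repository; NLTK offers its own `nltk.ngrams` and `nltk.bigrams` functions for the same purpose.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the consecutive n-token tuples from a list of tokens."""
    # Zip the token list against copies of itself shifted by 1..n-1 positions.
    return list(zip(*(tokens[i:] for i in range(n))))


# Made-up sample text with a repeated phrase.
tokens = "we will not tire we will not falter we will not fail".split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

print(bigram_counts.most_common(3))
print(trigram_counts.most_common(2))
```

Repeated phrases surface as the highest-count tuples, which is usually the first thing worth looking at in a speech.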
