The Brown Corpus contains approximately a million words of English text, tagged with symbols indicating their parts of speech. You can download the corpus here.
Your task is to write a program that determines which nouns in this corpus appear particularly often in the plural form relative to the singular form. Write the results to a file, including whatever information and using whatever format seems best. Choose a reasonable criterion to determine which words to output.
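One possible reading of this task is sketched below; it is an illustration under stated assumptions, not a reference solution. It assumes the corpus has already been parsed into (word, tag) pairs, that plain plural and singular common nouns carry the tags NNS and NN, and that compound tags (title, headline, or possessive variants) can be ignored. The function names, the add-one smoothing, and the thresholds `min_total` and `min_ratio` are hypothetical choices made for the example.

```python
import csv
from collections import Counter


def naive_singular(word):
    # Placeholder assumption: strip a single trailing "s" and nothing else.
    return word[:-1] if len(word) > 1 and word.endswith("s") else word


def plural_heavy_nouns(tagged_words, min_total=5, min_ratio=3.0):
    """Return (noun, plural_count, singular_count, ratio) rows, highest ratio first."""
    singular = Counter()
    plural = Counter()
    for word, tag in tagged_words:
        word = word.lower()
        tag = tag.upper()  # assumption: tag case may differ between corpus copies
        if tag == "NNS":
            plural[naive_singular(word)] += 1
        elif tag == "NN":
            singular[word] += 1

    rows = []
    for noun, p in plural.items():
        s = singular[noun]  # Counter returns 0 for unseen keys
        if p + s < min_total:
            continue  # too rare for the ratio to mean much
        ratio = (p + 1) / (s + 1)  # add-one smoothing avoids division by zero
        if ratio >= min_ratio:
            rows.append((noun, p, s, ratio))
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows


def write_report(rows, out):
    """Write the rows as tab-separated text to an open file-like object."""
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["noun", "plural", "singular", "ratio"])
    for noun, p, s, ratio in rows:
        writer.writerow([noun, p, s, f"{ratio:.2f}"])
```

The minimum-count filter is there because a noun seen once in the plural and never in the singular would otherwise rank above nouns with far more evidence. Whether smoothing plus a fixed threshold is a reasonable criterion is a design decision to justify in the documentation.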
Document your design and your decisions (in comments and/or a readme file), including how to run your program. Explain any problems or errors in your results, and how they could be fixed.
Feel free to make simplifying assumptions. For example, we expect your code to recognize the most common patterns for pluralizing words, but we do not necessarily expect it to know that "corpora" is the plural of "corpus". You may use any external libraries or data you wish.
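As an illustration of what "the most common patterns" might cover, here is a minimal rule-based sketch that maps a plural noun to a guessed singular form. The rule list and the small irregular table are assumptions chosen for the example, not a complete description of English; forms such as "corpora" pass through unchanged, and some regular-looking words (for instance "buses") are handled incorrectly.

```python
import re

# Hypothetical table of a few irregular plurals; extend as needed.
IRREGULAR = {
    "men": "man",
    "women": "woman",
    "children": "child",
    "feet": "foot",
    "teeth": "tooth",
    "mice": "mouse",
}

# Ordered rules: the first pattern that matches the whole word wins.
RULES = [
    (re.compile(r"(.+[^aeiou])ies$"), r"\1y"),         # cities -> city
    (re.compile(r"(.+(?:ch|sh|ss|x|z))es$"), r"\1"),   # boxes -> box
    (re.compile(r"(.+[^s])s$"), r"\1"),                # dogs -> dog
]


def singularize(word):
    """Guess the singular of a plural noun; return the word unchanged if no rule applies."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for pattern, replacement in RULES:
        if pattern.fullmatch(word):
            return pattern.sub(replacement, word)
    return word
```

Because the corpus already tags plural nouns, a function like this only needs to pair each plural with its singular form; it does not have to decide whether a word is plural in the first place. Cases the rules get wrong are good candidates for the error discussion in the documentation.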
When your Python code is ready, package it up and send it to [email protected], along with any necessary instructions on how to run it. Please send it only to us, and don't make your code publicly available.
We review code samples anonymously, so please do not put your name in the code itself.