The poetry generator is designed to create a sonnet made up of three quatrains in ABAB form followed by one couplet. It does not follow all the conventions of a traditional sonnet: the sonnet is not written in iambic pentameter, but each sentence (one line of the poem) contains ten syllables.
The implementation starts by building a bigram Markov chain from the Brown corpus in a function called generate_markov_chain(). This function can build Markov chains for any n-gram size, but we chose bigrams for generating the sonnet. The next major function is generate_sentence().
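To make the idea concrete, here is a minimal sketch of how an n-gram Markov chain can be built from a list of tokens. It is not the project's generate_markov_chain(); the name build_markov_chain, the dictionary-of-counters layout, and the toy token list (standing in for the Brown corpus) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_markov_chain(words, n=2):
    """Map each (n-1)-word state to a Counter of the words that follow it."""
    chain = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        state = tuple(words[i:i + n - 1])   # the n-1 words we condition on
        next_word = words[i + n - 1]        # the word observed after them
        chain[state][next_word] += 1
    return chain


# Toy token list (assumption); the real input would be the corpus tokens.
tokens = "the cat sat on the mat and the cat ran".split()
chain = build_markov_chain(tokens, n=2)
print(chain[("the",)])
```

With n=2 each state is a single word, so the chain records how often each word follows another. Raising n only lengthens the state tuple.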
The generate_sentence() function starts by randomly choosing a word from the Markov chain. After the word is chosen, calculate_weights() is called to get the weights of all the words that can follow it. Using those weights, we pick the next word at random from the current word's followers with numpy’s random.choice(). We then check two conditions: that adding the word does not push the sentence past ten syllables, and that the word’s part of speech differs from that of the current word. If the running syllable total would exceed ten, or the word’s part of speech is the same as the previous word's, we pick another word. If the word passes both checks, we add it and repeat the same steps from that word until the sentence has exactly ten syllables.
The syllable total comes from syllables(), which is called on each word chosen from the Markov chain. This function gets the pronunciation of the given word and takes the number of syllables from it. When we reach the last word of the sentence, we also check whether the sentence must end in a word that rhymes with the end of a previously generated sentence, as required by a quatrain’s ABAB form or a couplet’s AA form. If the last word must rhyme, get_rhyme_word() is called.
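The selection step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the standard library's random.choices stands in for numpy's random.choice, the name pick_next_word and the max_tries cutoff are hypothetical, and the small lookup tables stand in for syllables() and a part-of-speech tagger.

```python
import random


def calculate_weights(followers):
    """Turn follower counts into parallel lists of words and probabilities."""
    total = sum(followers.values())
    words = list(followers)
    weights = [followers[w] / total for w in words]
    return words, weights


def pick_next_word(followers, current_pos, syllables_so_far,
                   count_syllables, tag_pos,
                   target=10, max_tries=50, rng=random):
    """Sample followers until one fits the syllable budget and changes
    part of speech. Returns None if nothing is accepted in max_tries."""
    words, weights = calculate_weights(followers)
    for _ in range(max_tries):
        candidate = rng.choices(words, weights=weights, k=1)[0]
        fits = syllables_so_far + count_syllables(candidate) <= target
        differs = tag_pos(candidate) != current_pos
        if fits and differs:
            return candidate
    return None


# Hypothetical lookup tables (assumptions) replacing the real helpers.
SYLLABLES = {"garden": 2, "quietly": 3, "blue": 1}
POS = {"garden": "NOUN", "quietly": "ADV", "blue": "ADJ"}
followers = {"garden": 3, "quietly": 1, "blue": 1}

# Eight syllables used so far and the current word is an adjective:
# "quietly" is too long and "blue" repeats the part of speech.
word = pick_next_word(followers, "ADJ", 8, SYLLABLES.get, POS.get)
print(word)
```

Returning None when no follower is accepted leaves the caller free to decide how to recover, for example by restarting from a different word.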
The get_rhyme_word() function first determines which word must be rhymed with. It then calls rhymes_all_words() to get all the words that rhyme with that word, chooses one of them at random, and returns it to generate_sentence().
The rhyme word returned by get_rhyme_word() then replaces the last word that was chosen from the Markov chain. The rhyme word has the same number of syllables as the word it replaces, so the finished sentence still has a total of ten syllables.
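As a rough sketch of the rhyme step, the snippet below picks a random rhyme that has as many syllables as the word it replaces and swaps it into the sentence. The rhyme table, the syllable counts, and the names pick_rhyme and replace_last_word are hypothetical; in practice the rhymes would come from a pronunciation dictionary rather than a hard-coded table.

```python
import random

# Hypothetical rhyme table and syllable counts (assumptions for the demo).
RHYMES = {"night": ["light", "delight", "sight", "tonight"]}
SYLLABLES = {"light": 1, "delight": 2, "sight": 1, "tonight": 2, "moon": 1}


def pick_rhyme(target, placeholder, rhymes=RHYMES, syllables=SYLLABLES,
               rng=random):
    """Pick a random rhyme for `target` with as many syllables as
    `placeholder`. Returns None if no rhyme of that length is known."""
    wanted = syllables[placeholder]
    candidates = [w for w in rhymes.get(target, [])
                  if syllables.get(w) == wanted]
    return rng.choice(candidates) if candidates else None


def replace_last_word(sentence_words, rhyme):
    """Return a copy of the sentence with its last word swapped for `rhyme`."""
    return sentence_words[:-1] + [rhyme]


sentence = ["under", "the", "pale", "moon"]
rhyme = pick_rhyme("night", "moon")   # "light" or "sight"
if rhyme is not None:
    sentence = replace_last_word(sentence, rhyme)
print(" ".join(sentence))
```

Filtering the candidates by syllable count before sampling is what keeps the sentence's total unchanged after the swap.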
The sonnet is created by calling generate_abab() three times, each call producing one quatrain, and then calling generate_couplet() once. Combined, these make up our sonnet.
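The assembly can be sketched as below. The function signatures are assumptions: each function here takes a make_line callable (hypothetical) that produces one sentence and optionally receives the earlier sentence it must rhyme with. A stub line generator is used so the structure can be run without the Markov chain.

```python
import itertools


def generate_abab(make_line):
    """Build a quatrain: line 3 rhymes with line 1, line 4 with line 2."""
    a1 = make_line(rhyme_with=None)
    b1 = make_line(rhyme_with=None)
    a2 = make_line(rhyme_with=a1)
    b2 = make_line(rhyme_with=b1)
    return [a1, b1, a2, b2]


def generate_couplet(make_line):
    """Build two lines where the second rhymes with the first."""
    first = make_line(rhyme_with=None)
    second = make_line(rhyme_with=first)
    return [first, second]


def generate_sonnet(make_line):
    """Three ABAB quatrains followed by one couplet: 14 lines in total."""
    lines = []
    for _ in range(3):
        lines.extend(generate_abab(make_line))
    lines.extend(generate_couplet(make_line))
    return lines


def make_stub():
    """Stub line generator (assumption) that only labels each line."""
    ids = itertools.count(1)

    def make_line(rhyme_with=None):
        label = f"line {next(ids)}"
        if rhyme_with:
            label += f" (rhymes with {rhyme_with})"
        return label

    return make_line


sonnet = generate_sonnet(make_stub())
print("\n".join(sonnet))
```

Passing the earlier sentence into the line generator keeps the rhyme bookkeeping in one place per stanza type.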
Several problems arose during implementation. The first was in generate_sentence(): what happens if none of the words that follow the chosen word in the Markov chain satisfy the conditions we impose on them? We handled this case by picking a new word from all the words in the root Markov chain. The second problem was in the syllables() function. To determine the number of syllables of a given word, the pronouncing package is used to get the pronunciation of the word, but the package cannot do this for every word. In that case we fall back on a last-resort algorithm: we count the consonants in the word, counting a run of consecutive consonants only once, and use that count as the number of syllables. This algorithm is not as accurate as we would like, but as a last resort it is sufficient. The third problem was that in some cases no word could be found that both rhymes with the last word of a previous sentence and brings the sentence to a total of ten syllables. When this happens, the sentence is marked with ‘@’ at the end to show that a rhyme could not be found. This could have been addressed by building the Markov chain from more words, but we chose not to for the sake of runtime.
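The last-resort syllable estimate described above can be sketched as follows. This is an illustration, not the project's syllables(): the names fallback_syllables and syllables_with_fallback are hypothetical, a plain dictionary stands in for the pronunciation lookup, treating "y" as a consonant is an assumption, and so is the floor of one syllable for words with no consonants.

```python
import re

# Runs of consecutive consonants; "y" is treated as a consonant (assumption).
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]+")


def fallback_syllables(word):
    """Estimate syllables as the number of consonant runs in the word."""
    runs = CONSONANT_RUN.findall(word.lower())
    return max(1, len(runs))   # floor of one syllable is an assumption


def syllables_with_fallback(word, known_counts):
    """Use a known count when available, otherwise the rough estimate."""
    count = known_counts.get(word.lower())
    return count if count is not None else fallback_syllables(word)


known = {"garden": 2}   # stand-in for a pronunciation-based lookup
for w in ["banana", "garden", "street"]:
    print(w, syllables_with_fallback(w, known), fallback_syllables(w))
```

The estimate is right for "banana" (three runs, three syllables) but overcounts "garden" and "street", which illustrates why it is only used when the pronunciation lookup fails.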