cl-phonetic

A phonetic pattern matching library for Common Lisp.

This library is intended to replace the Sylvia library for Python.

The Short Version

(let
    ((dict (from-cmudict "./cmudict")))

  (pronounce-word dict "tomato")
  ;; (#<PRONUNCIATION T AH M AA T OW> #<PRONUNCIATION T AH M EY T OW>)

  (regex-search dict "P .* EY T OW #+")
  ;; Words starting with the P sound, followed by anything, and ending with the
  ;; EY T OW sequence, plus at least one more consonant.
  ;; (("potatoes" #<PRONUNCIATION P AH T EY T OW Z>)
  ;;  ("plato's" #<PRONUNCIATION P L EY T OW Z>))

  (regex-search dict "#<v,,f> %% NG")
  ;; Two syllable words beginning with a voiced fricative and ending with NG
  ;; (("zooming" #<PRONUNCIATION Z UW M IH NG>)
  ;;  ("zapping" #<PRONUNCIATION Z AE P IH NG>)
  ;;  ("voicing" #<PRONUNCIATION V OY S IH NG>)
  ;;  ("vesting" #<PRONUNCIATION V EH S T IH NG>)
  ;;  ...

  (generate-regex 'rhyme (first (pronounce-word dict "roses")))
  ;; ".* OW Z IH Z "

  (generate-regex 'rhyme (first (pronounce-word dict "roses")) :loose t)
  ;; ".* OW #* Z #* IH #* Z #* "

  (generate-regex 'rhyme (first (pronounce-word dict "roses")) :near t)
  ;; ".* OW Z [  AE EH IH IY  ] Z "

  (the-words (find-metapattern dict 'rhyme "roses"))
  ;; Words that perfectly rhyme with "roses"
  ;; ("supposes" "roses" "rose's" "proposes" "primroses" "presupposes" "poses"
  ;;  "overexposes" "opposes" "noses" "juxtaposes" "imposes" "hoses" "forecloses"
  ;;  "exposes" "dozes" "disposes" "discloses" "decomposes" "composes" "closes"
  ;;  "bulldozes")

  (the-words (find-metapattern dict 'rhyme "roses" :loose t))
  ;; Words that loosely or perfectly rhyme with "roses"
  ;; ("supposes" "roses" "rose's" "rolls's" "proposes" "primroses" "presupposes"
  ;;  "poses" "overexposes" "opposes" "noses" "juxtaposes" "joneses" "jones's"
  ;;  "imposes" "hoses" "holmes's" "forecloses" "exposes" "dozes" "doles's"
  ;;  "disposes" "discloses" "decomposes" "composes" "closings" "closes" "bulldozes")

  (the-words (find-metapattern dict 'rhyme "roses" :loose t :near t))
  ;; ("supposes" "rosie's" "roses" "rosecrans" "roseanne's" "rose's" "rolls's"
  ;;  "proposes" "primroses" "presupposes" "poses" "overexposes" "opposes" "noses"
  ;;  "juxtaposes" "joneses" "jones's" "imposes" "hoses" "holmes's" "grozny's"
  ;;  "forecloses" "exposes" "dozes" "doles's" "disposes" "discloses" "decomposes"
  ;;  "composes" "closings" "closes" "bulldozes" "bozell's")

  (the-words (find-metapattern dict 'consonance "babble"))
  ;; Words with consonance against "babble"
  ;; ("burble" "bubel" "bubbly" "bubble" "bobble" "biebel" "bibler" "bible" "bauble"
  ;;  "babula" "babler" "babel" "babbler" "babble")

  (the-words (find-metapattern dict 'consonance "babble" :loose t))
  ;; Words with a loose consonance against "babble"
  ;; ("obtainable" "observable" "objectionable" "butterball" "burnable" "burble"
  ;;  "bumbly" "bumble" "bumbalough" "buildable" "bubel" "bubbly" "bubble"
  ;;  "brumbelow" "brightbill" "brechbill" "breakable" "bramble" "brambila"
  ;;  "brakebill" "brackbill" "botshabelo" "bookmobile" "bonnibelle" "bonnibel"
  ;;  "bobble" "bobadilla" "bluebottle" "bluebell" "blankenbeckler" "blackball"
  ;;  "biodegradable" "billable" "biebel" "biddable" "biblical" "bibler" "bible"
  ;;  "berkebile" "believable" "bearably" "bearable" "beachball" "bauble"
  ;;  "basketball" "baseball" "barboursville" "barbella" "barbell" "barbanel"
  ;;  "barbagallo" "bankable" "babula" "babler" "babel" "babbler" "babble"
  ;;  "abominable")

  (the-words (find-metapattern dict 'assonance "ivanhoe"))
  ;; Words with assonance against "ivanhoe"
  ;; ("zydeco" "xylophone" "virazole" "styrofoam" "microscopes" "microscope"
  ;;  "microphone" "kayapo" "ivanhoe" "ivaco" "isotopes" "isotope" "isentrope"
  ;;  "idaho's" "idaho" "gyroscopes" "gyroscope" "dynamo" "dialtone" "diagnosed"
  ;;  "diagnose" "cyclostomes" "cyclostome")

  (the-words (find-metapattern dict 'alliteration "phone"))
  ;; Words with alliteration agaisnt "phone"
  ;; ( ... "foam" "phoning" "fairway" "philistine" ...)

  (pronounce-utterance dict "Eat it!")
  ;; #<PRONUNCIATION IY T IH T>

  (the-words (find-metapattern dict 'rhyme (pronounce-utterance dict "Eat it!") :loose t))
  ;; ("restricts" "restrict" "elitists" "elitist" "defeatist" "betwixt")
  )

Features

This library is in-progress, and each feature is at varying degrees of readiness.

Status Symbol	Meaning
💡	Planning stage.
⛏	Initial groundwork started. Usually, this just means that it’s already done in Sylvia’s codebase.
🚧	Currently under active implementation.
✅	Done

✅ Phonetic Pattern Matching via Regular Expressions

cl-phonetic can search a phonetic dictionary for words whose pronunciations match a phonetic regular expression. These regex are similar to Perl in syntax, but, have nothing to do with ASCII or Unicode character sets. Instead, the alphabet of the language tested by these regex consists only of phonemes.

Phoneme Literals

In a phonetic regex, phoneme literals are defined according to the ARPABET, as it was used by cmudict. A full list of ARPABET phoneme encodings from that link is reproduced here.

Phoneme	Example	Translation
`AA`	odd	`AA D`
`AE`	at	`AE T`
`AH`	hut	`HH AH T`
`AO`	ought	`AO T`
`AW`	cow	`K AW`
`AY`	hide	`HH AY D`
`B`	be	`B IY`
`CH`	cheese	`CH IY Z`
`D`	dee	`D IY`
`DH`	thee	`DH IY`
`EH`	Ed	`EH D`
`ER`	hurt	`HH ER T`
`EY`	ate	`EY T`
`F`	fee	`F IY`
`G`	green	`G R IY N`
`HH`	he	`HH IY`
`IH`	it	`IH T`
`IY`	eat	`IY T`
`JH`	gee	`JH IY`
`K`	key	`K IY`
`L`	lee	`L IY`
`M`	me	`M IY`
`N`	knee	`N IY`
`NG`	ping	`P IH NG`
`OW`	oat	`OW T`
`OY`	toy	`T OY`
`P`	pee	`P IY`
`R`	read	`R IY D`
`S`	sea	`S IY`
`SH`	she	`SH IY`
`T`	tea	`T IY`
`TH`	theta	`TH EY T AH`
`UH`	hood	`HH UH D`
`UW`	two	`T UW`
`V`	vee	`V IY`
`W`	we	`W IY`
`Y`	yield	`Y IY L D`
`Z`	zee	`Z IY`
`ZH`	seizure	`S IY ZH ER`

When they occur in a phonetic regex, these phoneme literals should be space delimited. For example, K AE T is a phonetic regex which matches the English word “cat”.

Since these regex are Perl-like, K AE .* is also a valid phonetic regex, and matches words like “cat”, “Canberra”, “cathode”, etc.

Phoneme Class Expressions

cl-phonetic further extends Perl syntax by introducing a new facility for defining classes and sequences of phonemes. To start;

# matches any single consonant phoneme
@ matches any single vowel phoneme
% matches any single syllable

Both the # and @ class symbols may optionally accept arguments which further constrain matches. These arguments consist of comma delimited characters within angle brackets. For example, #<v,,f> which matches only voiced, fricative consonants.

You need only supply as many arguments as desired, and can leave fields empty as needed. For example, the following class definitions are all valid, and all compile to the same phoneme sets; @, @<>, @<,>, and @<,,>.

Consonant Class Options

For consonant classes (the #<,,> pattern), up to three arguments can be specified;

First, a single character which can restrict matches based on voicing.
Second, sequence of characters which restricts matches based on place of articulation.
Third, a sequence of characters which restricts matches based on method of articulation.

When multiple characters are supplied for a single parameter, the resulting matches are a union over those characters. That is, there’s an implicit OR over your arguments.

Consonant voicing arguments:

Character	Restricts Matches To
v	Voiced
u	Unvoiced

Consonant place-of-articulation arguments

Character	Restricts Matches To
a	Alveolar
b	Bilabial
d	Dental
g	Glottal
l	Labio-dental
p	Post-alveolar
t	Palatal
v	Velar

Consonant method-of-articulation arguments

Character	Restricts Matches To
a	Affricate
f	Fricative
l	Lateral
n	Nasal
p	Plosive
x	Approximant

Examples:

Phoneme Class Definition	What It Matches
`#`	All consonants
`#<,,>`	All consonants
`#<v>`	All voiced consonants
`#<v,,>`	All voiced consonants
`#<,,p>`	All plosive consonants
`#<v,,p>`	All consonants which are both voiced and plosive
`#<,bd,>`	All consonants which are either bilabial or dental
`#<,,fa>`	All consonants which are either fricative or affricate
`#<u,bd,fa>`	All consonants which are unvoiced, and also either bilabial or dental, and also either fricative or affricate

Vowel Class Options

For vowel classes (the @<,,> pattern), three parameters may also be specified;

First, height
Second, backness
Third, roundedness

The first two of these categories are fairly fluid, and so are encoded as numbers. As with consonants, when multiple characters are supplied for a single parameter, the resulting matches are a union over those characters. That is, there’s an implicit OR over your arguments.

Vowel height arguments:

Character	Restricts Matches To
1	Open
2	Near Open
3	Open Mid
4	Mid
5	Close Mid
6	Near Close
7	Close

Vowel backness arguments:

Character	Restricts Matches To
1	Front
2	Central
3	Back

Vowel roundedness arguments:

Character	Restricts Matches To
r	Rounded
u	Unrounded

Examples:

Phoneme Class Definition	What it Matches
`@`	All vowels
`@<,,>`	All vowels
`@<,,r>`	All rounded vowels
`@<12,,u>`	All vowels which are unrounded and either open or near open height.
`@<,23>`	All vowels with either a central or back backness

Diphthongs and the r-colored phoneme, for now, are excluded whenever any restrictions are applied. They will only match a plain @, or, their associated phoneme literals.

✅ Phonetic Metapatterns via Regular Expression Generators

cl-phonetic can function as a rhyming dictionary by way of phonetic metapatterns. Other literary devices, like assonance, consonance, and alliteration, can also be queried.

A phonetic metapattern is a function which transforms a pronunciation (the phoneme sequence associated with a word) into a regular expression. This resulting regular expression implements the given metapattern over the given word.

rhyme

The 'rhyme metapattern applied to a word word produces a regular expression which matches words that rhyme with word. A rhyming word is defined here as any phoneme sequence whose phonemes match exactly after the first vowel phoneme. With the :loose option, additional consonant phonemes may be interspersed.

consonance

The 'consonance metapattern produces a regular expression which matches all words containing the same sequence of consonant phonemes as the target word. Vowel phonemes are ignored. With the :loose option, additional consonants may be interspersed.

assonance

The 'assonance metapattern produces a regular expression which matches all words containing the same sequence of vowel phonemes as the target word. Consonant phonemes are ignored. With the :loose option, additional vowels may occur before or after the matched sequence.

alliteration

The 'alliteration metapattern produces a regular expression which matches all words which begin with the same phoneme as the target word.

⛏ Pronunciation Inferencing

Arbitrary character sequence to phoneme sequence mapping. Sylvia has a quirky ruleset for this, which works fairly well. But it might be more fun to fit a transducer instead.

⛏ Popularity Filtering & Sorting

Allow searches to be applied in order of word popularity, and limit by either popularity threshold or total match count. Helps to prevent obscure words cluttering results.

💡 Corpus Statistics

Calculating phoneme N-grams, at the bare minimum. Basically a quick-path for processing large corpus.

User Manual

Reading a Phonetic Dictionary

Currently, only cmudict-like text files are supported.

(defparameter *dict* (from-cmudict #P"cmudict"))

*DICT*

Pronounce a word.

pronounce-word produces a list of pronunciation objects.

Sometimes, there’s just one pronunciation in it:

(pronounce-word *dict* "creepy")

(#<PRONUNCIATION (K R IY P IY)>)
T

Sometimes, there’s more:

(pronounce-word *dict* "tomato")

(#<PRONUNCIATION (T AH M AA T OW)> #<PRONUNCIATION (T AH M EY T OW)>)
T

Search for words matching a phonetic regular expression.

regex-search returns an alist of words (strings) and pronunciation lists.

(regex-search *dict* "K AE T")

(("katt" #<PRONUNCIATION (K AE T)>) ("kat" #<PRONUNCIATION (K AE T)>)
 ("catt" #<PRONUNCIATION (K AE T)>) ("cat" #<PRONUNCIATION (K AE T)>))

the-words takes an alist of that form and returns list a list of words.

(the-words (regex-search *dict* "K AE T"))

("katt" "kat" "catt" "cat")

The regex are generally Perl-like. Searching is done as “matches”, meaning that the word’s pronunciation must match the entire regex. Add .* to both ends if you want a scanning behavior.

(the-words (regex-search *dict* ".* K AE T .*"))

("yekaterinburg" "wildcatting" "wildcatters" "wildcatter" "wildcats" "wildcat"
 "wicat" "tomcat" "thundercats" "thundercat" "scattershot" "scattering"
 "scattergory" "scattergories" "scattergood" "scattered" "scatter" "scatology"
 "scatological" "scat" "pussycats" "pussycat" "polecats" "polecat" "piscataway"
 "muscat" "metlakatla" "mchatton" "mcatee" "kotsonis's" "kotsonis'" "kotsonis"
 "kitcat" "kikatte" "katzman" "katzin" "katzer" "katzenstein" "katzenberger"
 "katzenberg's" "katzenberg" "katzenbach" "katzen" "katz" "kattner" "katt"
 "katsushi" "katsaros" "katsanos" "kats" "katmandu" "katashiba" "kat"
 "copycatting" "copycats" "copycat" "concatenation" "concatenating"
 "concatenates" "concatenated" "concatenate" "catwoman" "catwalk" "catty"
 "catton" "catto" "cattlemen's" "cattlemen" "cattle" "catterton" "catterson"
 "catterall" "cattanach" "catt" "catskills" "catskill" "cats" "catron"
 "catrett" "catrambone" "caton" "catoe" "catnip" "catnap" "catlin" "catlike"
 "catlett" "catledge" "catkins" "catfish" "caterwaul" "caterpiller's"
 "caterpiller" "caterpillars" "caterpillar's" "caterpillar" "category"
 "categorizing" "categorizes" "categorized" "categorize" "categorization"
 "categories" "categorically" "categorical" "catechism" "catcalls" "catcall"
 "catbird" "catatonic" "catastrophic" "cataracts" "cataract" "catapults"
 "catapulting" "catapulted" "catapult" "catamount" "catalyzed" "catalyze"
 "catalytic" "catalysts" "catalyst's" "catalyst" "catalonian" "catalonia"
 "cataloguing" "catalogues" "catalogued" "catalogue" "catalogs" "cataloging"
 "catalogers" "cataloger" "cataloged" "catalog" "catalina" "catalans" "catalan"
 "catala" "catain" "catacombs" "catacomb" "cataclysmic" "cataclysm"
 "cat-o-nine-tails" "cat-6" "cat-4" "cat-3" "cat-2" "cat-1" "cat's" "cat"
 "bobcats" "bobcat" "bacot")

Again, anything that works with Perl should work here. .? translates to “optionally, a single phoneme of any kind”.

(the-words (regex-search *dict* ".? AE T"))

("vat" "that" "tat" "shatt" "schadt" "sat" "ratte" "rat" "patt" "pat" "nat"
 "matte" "matt" "mat" "lat" "katt" "kat" "jagt" "hatt" "hat" "gnat" "gatt"
 "gat" "fat" "dat" "chat" "catt" "cat" "bhatt" "batte" "batt" "bat" "at")

And so on.

Consonants are encoded with # symbols.

(the-words (regex-search *dict* "# AE T"))

("vat" "that" "tat" "shatt" "schadt" "sat" "ratte" "rat" "patt" "pat" "nat"
 "matte" "matt" "mat" "lat" "katt" "kat" "jagt" "hatt" "hat" "gnat" "gatt"
 "gat" "fat" "dat" "chat" "catt" "cat" "bhatt" "batte" "batt" "bat")

They can be further restricted by voicing, place of articulation, and manner of articulation.

For example, here are the words ending with “AE T” that begin with a voiced, fricative consonant:

(the-words (regex-search *dict* "#<v,,f> AE T"))

("vat" "that")

And the words ending with “AE T” that begin with a bilabial, plosive consonant:

(the-words (regex-search *dict* "#<,b,p> AE T"))

("patt" "pat" "bhatt" "batte" "batt" "bat")

And the words ending with “AE T” that begin with a bilabial or labio-dental consonant:

(the-words (regex-search *dict* "#<,bl,> AE T"))

("vat" "patt" "pat" "matte" "matt" "mat" "fat" "bhatt" "batte" "batt" "bat")

All single syllable words beginning with a “B” phoneme, a single vowel, and a “D”.

(the-words (regex-search *dict* "B @ D"))

("byrd" "burd" "budde" "budd" "bud" "boyde" "boyd" "bowed" "booed" "bode"
 "bird" "bide" "bid" "beede" "bede" "bed" "bead" "bayed" "bawd" "baud" "bade"
 "bad" "baade")

The previous expression, restricted to vowels with a height between open and mid, inclusive.

(the-words (regex-search *dict* "B @<1234,,> D"))

("budde" "budd" "bud" "bed" "bawd" "baud" "bad" "baade")

Generating a phonetic regular expression

generate-regex creates a phonetic regular expression from a predefined metapattern and a word.

(generate-regex 'rhyme (first (pronounce-word *dict* "Candor")))

.* AE N D ER

Searching for this regex yields words that perfectly rhyme with “Candor”.

(the-words (regex-search *dict*
                         (generate-regex 'rhyme
                                         (first (pronounce-word *dict* "Candor")))))

("zander" "wicklander" "vandevander" "vander" "telander" "swartzlander"
 "subcommander" "standre" "stander" "stadtlander" "slander" "skenandore"
 "sjolander" "scalamandre" "santander" "sandor" "sander" "salamander"
 "rosander" "rander" "philander" "pander" "oleander" "nederlander" "meander"
 "mcalexander" "mander" "mainlander" "lysander" "leander" "landor" "lander"
 "highlander" "hander" "grander" "glander" "gerrymander" "gander" "evander"
 "coriander" "commander" "candor" "calamander" "bystander" "brander" "blander"
 "bander" "aulander" "ander" "alexander" "aleksandr" "aleksander")

But, if all you’re going to do is search for the generated regex, just use find-metapattern…

Searching for rhymes, and other metapatterns

find-metapattern wraps the process of generating a regular expression & searching it:

(the-words (find-metapattern *dict* 'rhyme "Turkey" :loose t))

("yerkey" "yerkes" "yaworski" "xerxes" "workweeks" "workweek" "worksheets"
 "worksheet" "tyburski" "twersky" "turski" "turnkey" "turkeys" "turkey's"
 "turkey" "swirsky" "swiderski" "sturkie" "stachurski" "sircy" "shirkey"
 "quirky" "purkey" "podgurski" "pirkey" "persky" "perky" "perkey" "pearcy"
 "murky" "mirsky" "merkley" "merkey" "kuberski" "koperski" "kirksey" "kirkley"
 "kirkey" "kirkby" "kasperski" "jerky" "hirschfield" "gursky" "gurski" "girsky"
 "gerski" "gerke" "figurski" "dworsky" "durkee" "burkley" "burkey" "burkeen"
 "birky" "birkey" "bertke" "berkley" "berklee" "berkey" "berkeley's" "berkeley"
 "anarchy" "aldercy" "albuquerque")

test-metapattern just tests whether a metapattern holds over two words.

Here, it does;

(test-metapattern *dict* 'alliteration "Xenon" "Czar")

(("Czar" #<PRONUNCIATION (Z AA R)>))

And here, it does not;

(test-metapattern *dict* 'rhyme "Wallet" "Stanford")

NIL

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.org		README.org
cl-phonetic.asd		cl-phonetic.asd
defpackage.lisp		defpackage.lisp
muse.lisp		muse.lisp
phonetic.lisp		phonetic.lisp
test.lisp		test.lisp
util.lisp		util.lisp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cl-phonetic

The Short Version

Features

✅ Phonetic Pattern Matching via Regular Expressions

✅ Phonetic Metapatterns via Regular Expression Generators

⛏ Pronunciation Inferencing

⛏ Popularity Filtering & Sorting

💡 Corpus Statistics

User Manual

Reading a Phonetic Dictionary

Pronounce a word.

Search for words matching a phonetic regular expression.

Generating a phonetic regular expression

Searching for rhymes, and other metapatterns

About

Releases

Packages

Languages

License

bgutter/cl-phonetic

Folders and files

Latest commit

History

Repository files navigation

cl-phonetic

The Short Version

Features

✅ Phonetic Pattern Matching via Regular Expressions

✅ Phonetic Metapatterns via Regular Expression Generators

⛏ Pronunciation Inferencing

⛏ Popularity Filtering & Sorting

💡 Corpus Statistics

User Manual

Reading a Phonetic Dictionary

Pronounce a word.

Search for words matching a phonetic regular expression.

Generating a phonetic regular expression

Searching for rhymes, and other metapatterns

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages