TMH is a library that gives access to open source tools for working with speech and language.
-- Transcription
-- Audio embeddings (Wav2vec2)
-- Phoneme extraction
-- Text embeddings (BERT)
-- Speaker diarization
-- Text generation
-- Text summarization
The package focuses on solving common problems in speech and language through an easy API. We take care of making sure that the functions give you good results, so you don't have to worry about the details (but you can check everything, since it's all open source!).
https://tmh-docs.readthedocs.io/en/latest/docs.html#getting-started
To get started, first install tmh and then pyannote from its develop branch, since we rely on newer packages.
pip install tmh
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
from tmh.transcribe import transcribe_from_audio_path
file_path = "./sv.wav"
transcription = "Nu prövar vi att spela in ljud på svenska sex laxar i en laxask de finns en stor banan"
print("creating transcription")
asr_transcription = transcribe_from_audio_path(file_path)
print("output")
print(asr_transcription)
print("the transcription is", transcription)
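The example above prints the ASR output next to a reference transcription. A minimal sketch of how the two could be compared using word error rate (WER), assuming the ASR result has already been reduced to a plain string. The `word_error_rate` helper and the sample hypothesis are hypothetical and not part of tmh:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "Nu prövar vi att spela in ljud på svenska"
# Hypothetical ASR output with one wrong word, used only for illustration.
hypothesis = "nu prövar vi att spela in ljud på svensk"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A WER of 0 means the two word sequences match exactly after lowercasing; punctuation is not normalized in this sketch.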
To use image generation you need a computer with an NVIDIA GPU. You also need to install torch with CUDA support.
pip uninstall torch
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
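After reinstalling torch, it can be worth confirming that the CUDA build is actually picked up. A small sketch using `torch.cuda.is_available()`; the `cuda_ready` wrapper is a hypothetical helper, not part of tmh:

```python
def cuda_ready() -> bool:
    """Return True only if torch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if cuda_ready():
    print("CUDA is available, image generation should be able to use the GPU")
else:
    print("No CUDA device found, check the torch installation first")
```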
Run the following code to generate an image. The image will be stored in the current directory. The first input is the prompt to generate from, the second is the filename to save to.
from tmh.images.stable_diffusion import generate_image
generate_image("Cyberpunk jesus at a rock concert", "jesus.png")
from tmh.transcribe import transcribe_from_audio_path
file_path = "/home/bmoell/tmh/tmh/test.wav"
output = transcribe_from_audio_path(file_path, "English", output_word_offsets=True)
print("the output is", output)
the output is {'transcription': "hello hello hello i'm talking the end bye", 'speech_rate': 0.1142857142857143, 'averages': {'h': 0.02499999999999991, 'e': 0.019999999999999945, 'l': 0.020000000000000018, 'o': 0.020000000000000018, ' ': 0.040000000000000036, 'i': 0.030000000000000027, "'": 0.020000000000000018, 'm': 0.020000000000000018, 't': 0.020000000000000018, 'a': 0.020000000000000018, 'k': 0.020000000000000018, 'n': 0.02000000000000024, 'g': 0.020000000000000018, 'd': 0.020000000000000462, 'b': 0.020000000000000462, 'y': 0.020000000000000462}, 'standard_deviations': {'h': 0.008660254037844203, 'e': 3.0517331120737817e-16, 'l': 0.0, 'o': 0.0, ' ': 0.017320508075688693, 'i': 0.010000000000000009, "'": 0.0, 'm': 0.0, 't': 0.0, 'a': 0.0, 'k': 0.0, 'n': 2.220446049250313e-16, 'g': 0.0, 'd': 0.0, 'b': 0.0, 'y': 0.0}, 'variances': {'h': 7.499999999999681e-05, 'e': 9.313074987327527e-32, 'l': 0.0, 'o': 0.0, ' ': 0.00029999999999999715, 'i': 0.00010000000000000018, "'": 0.0, 'm': 0.0, 't': 0.0, 'a': 0.0, 'k': 0.0, 'n': 4.930380657631324e-32, 'g': 0.0, 'd': 0.0, 'b': 0.0, 'y': 0.0}}
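The result is a plain dictionary, so individual fields can be read directly. A minimal sketch working on a subset of the output shown above, copied in as a literal so it runs without tmh. The source does not state what unit the per-character `averages` are in, so the sketch only ranks them:

```python
# Subset of the output shown above, copied as a literal so the sketch runs on its own.
output = {
    "transcription": "hello hello hello i'm talking the end bye",
    "speech_rate": 0.1142857142857143,
    "averages": {
        "h": 0.02499999999999991,
        "e": 0.019999999999999945,
        " ": 0.040000000000000036,
        "i": 0.030000000000000027,
    },
}

# Sort characters by their average value, largest first.
ranked = sorted(output["averages"].items(), key=lambda item: item[1], reverse=True)

print("transcription:", output["transcription"])
print("speech rate:", round(output["speech_rate"], 3))
for char, average in ranked[:3]:
    print(repr(char), round(average, 3))
```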
from tmh.transcribe_with_vad import transcribe_from_audio_path_split_on_speech
file_path = "./sv.wav"
print("creating transcription")
asr_transcription_with_vad = transcribe_from_audio_path_split_on_speech(file_path)
print("transcription")
print(asr_transcription_with_vad)
In order to transcribe with an additional language model you need to install kenlm:
pip install https://github.com/kpu/kenlm/archive/master.zip
from tmh.transcribe_with_lm import transcribe_from_audio_path_with_lm
file_path = "./sv.wav"
transcription = transcribe_from_audio_path_with_lm(file_path, 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram')
print("the transcription is", transcription)
from tmh.overlap import overlap_detection
file_path = "./sv.wav"
overlap = overlap_detection(file_path)
print(overlap)
Speaker diarization separates an audio file into two different speakers.
from tmh.separate_speakers import create_speaker_files_from_audio_path
file_path = "/home/bmoell/tmh/tmh/test.wav"
create_speaker_files_from_audio_path(file_path)
from tmh.transcribe import classify_language
file_path = "./sv.wav"
transcription = "Nu prövar vi att spela in ljud på svenska sex laxar i en laxask de finns en stor banan"
print("classifying language")
language = classify_language(file_path)
print("the language is", language)
from tmh.transcribe import classify_emotion
file_path = "./sv.wav"
print("classifying emotion")
emotion = classify_emotion(file_path)
print("the emotion is", emotion)
The speaker embeddings are computed with the following SpeechBrain model: https://huggingface.co/speechbrain/spkrec-xvect-voxceleb
from tmh.transcribe import extract_speaker_embedding
file_path = "./sv.wav"
print("extracting speaker embedding")
embeddings = extract_speaker_embedding(file_path)
print("the speaker embedding is", embeddings)
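Speaker embeddings are typically compared with cosine similarity. A minimal sketch, assuming each embedding has been flattened to a one-dimensional sequence of floats (the value returned by `extract_speaker_embedding` may need converting first). The `cosine_similarity` helper and the toy vectors are illustrative, not part of tmh:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
speaker_a = [0.2, 0.7, 0.1]
speaker_a_again = [0.25, 0.65, 0.12]
speaker_b = [0.9, -0.1, 0.3]

print("same speaker:", round(cosine_similarity(speaker_a, speaker_a_again), 3))
print("different speaker:", round(cosine_similarity(speaker_a, speaker_b), 3))
```

Values close to 1 suggest the two recordings come from similar voices; any decision threshold would have to be tuned on real data.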
Audio embeddings can be extracted using wav2vec2 (https://arxiv.org/abs/2006.11477) or HuBERT (https://arxiv.org/abs/2106.07447), which is the default.
Note that embeddings from models that were not fine-tuned for ASR (using CTC) usually perform better on classification tasks.
https://arxiv.org/pdf/2104.03502v1.pdf
From the paper, "when the model is finetuned for an ASR task, information that is not relevant for that task but might be relevant for speech emotion recognition is lost from the embeddings. For example, information about the pitch might not be important for speech recognition, while it is essential for speech emotion recognition."
from tmh.audio_embeddings import get_audio_embeddings
audio_embeddings = get_audio_embeddings('/Users/bmoell/Code/test_tanscribe/sv.wav')
print(audio_embeddings)
from tmh.vad import extract_silences
file_path = "./sv.wav"
print("extracting silences")
silences = extract_silences(file_path)
print("the silences are", silences)
To extract phonemes, please download this model and put it in your current folder: https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/DeepPhonemizer/en_us_cmudict_ipa_forward.pt
The code assumes that the model is stored at ./en_us_cmudict_ipa_forward.pt (you can change the model_checkpoint parameter to load it from another location).
from tmh.phonemes import get_phonemes
phonemes = get_phonemes("I'm eating a cake", model_checkpoint='./en_us_cmudict_ipa_forward.pt', language="English")
print(phonemes)
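Since phoneme extraction depends on a locally stored checkpoint, a quick check before calling `get_phonemes` can give a clearer message than a failed model load. A minimal sketch using only the standard library; `checkpoint_status` is a hypothetical helper, and the URL and default path are the ones given above:

```python
from pathlib import Path

# URL and default path are taken from the text above.
MODEL_URL = (
    "https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/"
    "DeepPhonemizer/en_us_cmudict_ipa_forward.pt"
)
DEFAULT_CHECKPOINT = Path("./en_us_cmudict_ipa_forward.pt")


def checkpoint_status(path: Path = DEFAULT_CHECKPOINT) -> str:
    """Describe whether the checkpoint file is present (hypothetical helper)."""
    if path.is_file():
        size_mb = path.stat().st_size / 1_000_000
        return f"found {path} ({size_mb:.1f} MB)"
    return f"missing {path}, download it from {MODEL_URL}"


print(checkpoint_status())
```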
To get Swedish phonemes you need a Swedish model stored at the model checkpoint path.
from tmh.phonemes import get_phonemes
phonemes = get_phonemes('Välkommen till tal, musik och hörsel', model_checkpoint='swedish_model.pt', language="Swedish")
print(phonemes)
Make sure you install these packages before running Tacotron 2:
pip install numpy scipy librosa unidecode inflect
apt-get update
apt-get install -y libsndfile1
You can use the text generation API to generate text with any pretrained model from Hugging Face.
from tmh.text.text_generation import generate_text
output = generate_text(model='birgermoell/swedish-gpt', prompt="AI har möjligheten att", max_length=250, temperature=0.9)
print(output)
from tmh.text.text_generation import generate_text
output = generate_text(model='EleutherAI/gpt-neo-2.7B', prompt="EleutherAI has", max_length=250, temperature=0.9)
print(output)
from tmh.text.text_generation import translate_and_generate
output = translate_and_generate("AI har möjligheten att skapa ett nytt samhälle där människor", max_length=250, temperature=0.9)
print(output)
from tmh.text.get_embeddings import get_bert_embedding_from_text
embedding = get_bert_embedding_from_text("Hej, jag gillar glass", model="KB/bert-base-swedish-cased")
print(embedding)
from tmh.text.ner import named_entity_recognition
ner = named_entity_recognition('KTH är ett universitet i Stockholm')
print(ner)
from tmh.text.question_answering import get_answer
answer = get_answer({'question': 'What is the meaning of life', 'context': 'The meaning of life is to be happy'})
print(answer)
from tmh.text.translate import translate_text
sv_text = "Albert Einstein var son till Hermann och Pauline Einstein, vilka var icke-religiösa judar och tillhörde medelklassen. Fadern var försäljare och drev senare en elektroteknisk fabrik. Familjen bosatte sig 1880 i München där Einstein gick i en katolsk skola. Mängder av myter cirkulerar om Albert Einsteins person. En av dessa är att han som barn skulle ha haft svårigheter med matematik, vilket dock motsägs av hans utmärkta betyg i ämnet.[15] Han nämnde ofta senare att han inte trivdes i skolan på grund av dess pedagogik. Att Albert Einstein skulle vara släkt med musikvetaren Alfred Einstein är ett, ofta framfört, obevisat påstående. Alfred Einsteins dotter Eva har framhållit att något sådant släktskap inte existerar."
translation = translate_text(sv_text)
print(translation)
from tmh.text.zero_shot import get_zero_shot_classification
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classified_label = get_zero_shot_classification(sequence_to_classify, candidate_labels)
print(classified_label)
from tmh.text.summarization import get_summary
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
"""
summary = get_summary(ARTICLE)
print(summary)
from tmh.text.summarization import translate_and_summarize
sv_text = "Albert Einstein var son till Hermann och Pauline Einstein, vilka var icke-religiösa judar och tillhörde medelklassen. Fadern var försäljare och drev senare en elektroteknisk fabrik. Familjen bosatte sig 1880 i München där Einstein gick i en katolsk skola. Mängder av myter cirkulerar om Albert Einsteins person. En av dessa är att han som barn skulle ha haft svårigheter med matematik, vilket dock motsägs av hans utmärkta betyg i ämnet.[15] Han nämnde ofta senare att han inte trivdes i skolan på grund av dess pedagogik. Att Albert Einstein skulle vara släkt med musikvetaren Alfred Einstein är ett, ofta framfört, obevisat påstående. Alfred Einsteins dotter Eva har framhållit att något sådant släktskap inte existerar."
swedish_summary = translate_and_summarize(sv_text)
print(swedish_summary)
from tmh.text.sentiment_analysis import get_sentiment
sentiment = get_sentiment("Robots are the best")
print(sentiment)
from tmh.text.sentiment_analysis import get_emotion
emotion = get_emotion("i feel as if i havent blogged in ages are at least truly blogged i am doing an update cute")
print(emotion)
Generate code and save to file. To use
from tmh.code import generate_from_prompt, write_to_file
response = generate_from_prompt('''
A pytorch neural network model for MNIST
'''
)
write_to_file(response, "generated.py")
To publish a new release, change the version number, then build and upload the package:
python3 -m build
twine upload --skip-existing dist/*