Skip to content

Commit

Permalink
Adding notes on custom apis and translation (#828)
Browse files Browse the repository at this point in the history
  • Loading branch information
raivisdejus authored Jul 7, 2024
1 parent 621be2c commit 635b332
Show file tree
Hide file tree
Showing 3 changed files with 26 additions and 12 deletions.
4 changes: 3 additions & 1 deletion docs/docs/faq.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,9 @@ sidebar_position: 5

2. **What can I try if the transcription runs too slowly?**

Try using a lower Whisper model size or using a Whisper.cpp model.
Speech recognition requires large amount of computation, so one option is to try using a lower Whisper model size or using a Whisper.cpp model to run speech recognition of your computer. If you have access to a computer with GPU that has at least 6GB of VRAM you can try using the Faster Whisper model.

Buzz also supports using OpenAI API to do speech recognition on a remote server. To use this feature you need to set OpenAI API key in Preferences. See [Preferences](https://chidiwilliams.github.io/buzz/docs/preferences) section for more details.

3. **How to record system audio?**

Expand Down
26 changes: 19 additions & 7 deletions docs/docs/preferences.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,12 @@ Open the Preferences window from the Menu bar, or click `Ctrl/Cmd + ,`.

## General Preferences

### OpenAI API preferences

**API Key** - key to authenticate your requests to OpenAI API. To get API key from OpenAI see [this article](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).

**Base Url** - By default all requests are sent to API provided by OpenAI company. Their api url is `https://api.openai.com/v1/`. Compatible APIs are also provided by other companies. List of available API urls you can find on [discussion page](https://github.com/chidiwilliams/buzz/discussions/827)

### Default export file name

Sets the default export file name for file transcriptions. For
Expand All @@ -15,11 +21,17 @@ as `Input Filename (transcribed on 19-Sep-2023 20-39-25).txt` by default.

Available variables:

| Key | Description | Example |
|-------------------|-------------------------------------------|------------------------------------------------------------------|
| Key | Description | Example |
|-------------------|-------------------------------------------|----------------------------------------------------------------|
| `input_file_name` | File name of the imported file | `audio` (e.g. if the imported file path was `/path/to/audio.wav` |
| `task` | Transcription task | `transcribe`, `translate` |
| `language` | Language code | `en`, `fr`, `yo`, etc. |
| `model_type` | Model type | `Whisper`, `Whisper.cpp`, `Faster Whisper`, etc. |
| `model_size` | Model size | `tiny`, `base`, `small`, `medium`, `large`, etc. |
| `date_time` | Export time (format: `%d-%b-%Y %H-%M-%S`) | `19-Sep-2023 20-39-25` |
| `task` | Transcription task | `transcribe`, `translate` |
| `language` | Language code | `en`, `fr`, `yo`, etc. |
| `model_type` | Model type | `Whisper`, `Whisper.cpp`, `Faster Whisper`, etc. |
| `model_size` | Model size | `tiny`, `base`, `small`, `medium`, `large`, etc. |
| `date_time` | Export time (format: `%d-%b-%Y %H-%M-%S`) | `19-Sep-2023 20-39-25` |

### Live transcript exports

Live transcription export can be used to integrate Buzz with other applications like OBS Studio. When enabled, live text transcripts will be exported to a text file as they get generated and translated.

If AI translation is enabled for live recordings, the translated text will also be exported to the text file. Filename for the translated text will end with `.translated.txt`.
8 changes: 4 additions & 4 deletions docs/docs/usage/translations.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,7 @@
title: Translations
---

Latest development versions support AI translations.

To get latest development version of the Buzz log into GitHub and get it from Artifacts section of some latest [action run](https://github.com/chidiwilliams/buzz/actions). Linux users can get the latest version from latest snap edge channel `sudo snap install buzz --channel latest/edge`
Default `Translation` task uses Whisper model ability to translate to English. Since version `1.0.0` Buzz supports additional AI translations to any other language.

To use translation feature you will need to configure OpenAI API key and translation settings. Set OpenAI API ket in Preferences. Buzz also supports custom locally running translation AIs that support OpenAI API. For more information on locally running AIs see [ollama](https://ollama.com/blog/openai-compatibility) or [LM Studio](https://lmstudio.ai/).

Expand All @@ -14,4 +12,6 @@ For AI to know how to translate enter translation instructions in the "Instructi

> You are a professional translator, skilled in translating English to Spanish. You will only translate each sentence sent to you into Spanish and not add any notes or comments.
If you enable "Enable live recording transcription export" in Preferences, Live text transcripts will be exported to a text file as they get generated and translated. This file can be used to further integrate Live transcripts with other applications like OBS Studio.
If you enable "Enable live recording transcription export" in Preferences, Live text transcripts will be exported to a text file as they get generated and translated. This file can be used to further integrate Live transcripts with other applications like OBS Studio.

Approximate cost of translation for 1 hour long audio with ChatGPT `gpt-4o` model is around 0.50$

0 comments on commit 635b332

Please sign in to comment.