Add fields to verbose_json response and show examples on the home page #1802

Merged: 4 commits merged into ggerganov:master on Jan 30, 2024

Conversation

JacobLinCool
Contributor

@JacobLinCool JacobLinCool commented Jan 23, 2024

The verbose_json format seems to aim to match the same format as OpenAI's, but it is currently missing several fields.

  • I have added task, language, duration, temperature, and avg_logprob.

Some observations:

  • The current segment.words field has no counterpart in OpenAI's response (OpenAI does not provide it), but I think that's okay; it can serve as an extension that showcases whisper.cpp's word-level capabilities.
  • OpenAI also seems to include non-speech tokens in the tokens array (e.g. 50364 and 50564 at the start and end of the array in the example below).
  • For compression_ratio and no_speech_prob, if I have not missed anything, currently, they cannot be obtained from whisper.cpp?
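As a rough illustration of how two of the added fields could be derived, here is a minimal sketch. It is not the code from this PR: the function names are hypothetical, and it assumes duration is the sample count divided by the sample rate and avg_logprob is the mean natural log of per-token probabilities. The 16 kHz rate is the one Whisper models operate on.

```python
import math
from typing import Sequence

WHISPER_SAMPLE_RATE = 16_000  # Hz; Whisper models operate on 16 kHz mono audio


def duration_seconds(n_samples: int, sample_rate: int = WHISPER_SAMPLE_RATE) -> float:
    """Audio length in seconds (hypothetical helper)."""
    return n_samples / sample_rate


def avg_logprob(token_probs: Sequence[float]) -> float:
    """Mean natural-log probability over a segment's tokens (hypothetical helper)."""
    if not token_probs:
        raise ValueError("segment has no tokens")
    return sum(math.log(p) for p in token_probs) / len(token_probs)
```

With these definitions, a 71040-sample clip gives a duration of 4.44 s, and a segment whose tokens all have probability 1.0 gives an avg_logprob of 0.0.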

OpenAI's verbose_json

{
  "task": "transcribe",
  "language": "english",
  "duration": 4.440000057220459,
  "text": "This is my voice sample for research purpose only.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 4.0,
      "text": " This is my voice sample for research purpose only.",
      "tokens": [
        50364,
        639,
        307,
        452,
        3177,
        6889,
        337,
        2132,
        4334,
        787,
        13,
        50564
      ],
      "temperature": 0.0,
      "avg_logprob": -0.4242081940174103,
      "compression_ratio": 0.8928571343421936,
      "no_speech_prob": 0.0008698556339368224
    }
  ]
}

whisper.cpp's verbose_json

{
  "task": "transcribe",
  "language": "english",
  "duration": 4.440000057220459,
  "text": " This is my voice sample for research purpose only.\n",
  "segments": [
    {
      "id": 0,
      "text": " This is my voice sample for research purpose only.",
      "start": 0.0,
      "end": 4.0,
      "tokens": [
        770,
        318,
        616,
        3809,
        6291,
        329,
        2267,
        4007,
        691,
        13
      ],
      "words": [
        {
          "word": " This",
          "start": 0.16,
          "end": 0.42,
          "probability": 0.8665725588798523
        },
        {
          "word": " is",
          "start": 0.42,
          "end": 0.63,
          "probability": 0.9934033751487732
        },
        ...,
        {
          "word": ".",
          "start": 4.0,
          "end": 4.0,
          "probability": 0.8532010316848755
        }
      ],
      "temperature": 0.0,
      "avg_logprob": -0.1288750171661377
    }
  ]
}
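The difference between the two responses can be checked mechanically. The sketch below compares the keys of a whisper.cpp segment against the segment keys seen in the OpenAI sample above; the helper name is hypothetical and the embedded sample is abbreviated from the response shown here.

```python
import json
from typing import Set, Tuple

# Segment keys present in the OpenAI verbose_json sample above.
OPENAI_SEGMENT_KEYS = {
    "id", "seek", "start", "end", "text", "tokens",
    "temperature", "avg_logprob", "compression_ratio", "no_speech_prob",
}


def diff_segment_keys(segment: dict) -> Tuple[Set[str], Set[str]]:
    """Return (keys missing relative to OpenAI, extra keys not in OpenAI's sample)."""
    keys = set(segment)
    return OPENAI_SEGMENT_KEYS - keys, keys - OPENAI_SEGMENT_KEYS


# Abbreviated copy of the whisper.cpp segment shown above.
sample = json.loads("""
{
  "segments": [
    {
      "id": 0,
      "text": " This is my voice sample for research purpose only.",
      "start": 0.0,
      "end": 4.0,
      "tokens": [770, 318, 616],
      "words": [{"word": " This", "start": 0.16, "end": 0.42, "probability": 0.8665725588798523}],
      "temperature": 0.0,
      "avg_logprob": -0.1288750171661377
    }
  ]
}
""")

missing, extra = diff_segment_keys(sample["segments"][0])
print(sorted(missing))  # ['compression_ratio', 'no_speech_prob', 'seek']
print(sorted(extra))    # ['words']
```

Clients written against OpenAI's response should therefore treat seek, compression_ratio, and no_speech_prob as optional when talking to whisper.cpp.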

I also replaced the "hello" message on the homepage with request examples. I think the first thing users do after starting the server with ./server is to open the URL printed in the terminal and look at the homepage, so showing examples there may make it easier to try the server. (A web interface for direct interaction would be better, but it would take some time to design.)
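For readers who want to try the endpoint from a script, here is a minimal sketch of such a request using only the Python standard library. The URL, port, and form field names (file, temperature, response_format) are assumptions based on the server example's documented defaults and may differ in your build; build_multipart and transcribe are hypothetical helpers.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, url="http://127.0.0.1:8080/inference"):
    """POST a WAV file and return the parsed verbose_json response.

    The URL and field names are assumptions, not confirmed by this PR text.
    """
    with open(wav_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"temperature": "0.0", "response_format": "verbose_json"},
        "file", "audio.wav", audio,
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling transcribe("sample.wav") against a running server should return a dictionary with the task, language, duration, text, and segments keys shown above.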

@bobqianic
Collaborator

> For compression_ratio and no_speech_prob, if I have not missed anything, currently, they cannot be obtained from whisper.cpp?

Indeed. whisper.cpp uses an entropy measure rather than the gzip compression ratio, so compression_ratio is not available. I'm currently working on an implementation of no_speech_prob.
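To make the distinction concrete, here is a small sketch of both measures. The compression ratio follows the idea behind OpenAI's field (raw byte length divided by zlib-compressed length); the entropy function is a generic Shannon entropy over token frequencies and is only an assumption about the kind of measure meant here, not whisper.cpp's actual code (its window size and details are not shown in this thread). Both function names are hypothetical.

```python
import math
import zlib
from collections import Counter
from typing import Sequence


def compression_ratio(text: str) -> float:
    """Raw size over zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def token_entropy(tokens: Sequence[int]) -> float:
    """Shannon entropy (natural log) of token frequencies; repetition scores low."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Both detect degenerate, repetitive output, but in opposite directions: a looping transcript yields a high compression ratio and a low token entropy.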

@JacobLinCool
Contributor Author

> Currently, I'm working on implementations related to the no_speech_prob.

Wow! That sounds great! I believe it can definitely improve the use of whisper.cpp.

I think we can merge this pull request first and then either open a new issue for that or add some TODO comments in the code. What do you think?

@JacobLinCool
Contributor Author

I've added a form to the home page.
[Screenshot: 2024-01-30, 4:38:51 AM]

@ggerganov ggerganov merged commit baa30ba into ggerganov:master Jan 30, 2024
39 checks passed
@JacobLinCool JacobLinCool deleted the server-example-improvement branch January 30, 2024 14:24
@JacobLinCool
Contributor Author

The problem may not be related to the changes in this PR, which only touched the frontend page and the output format. The file-processing mechanism is not involved.

I think the problem comes from the changes in #1781, which allow passing audio file content in place of a file name. (I think that's why you got the error: failed to open 'RIFF$�' as WAV file.)
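The failure mode can be illustrated with a small sketch: when one string argument may hold either a path or raw WAV bytes, the code has to tell them apart before trying to open a file. The check below is a hypothetical illustration of such a test using the standard RIFF/WAVE header layout; it is not the detection logic from #1781 or its later fix.

```python
def looks_like_wav_buffer(data: bytes) -> bool:
    """True if data starts with a RIFF header whose form type is WAVE.

    Layout: bytes 0-3 "RIFF", bytes 4-7 little-endian chunk size,
    bytes 8-11 "WAVE". Hypothetical helper for illustration only.
    """
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"


# A chunk size whose low byte is 0x24 renders as "$", similar to the
# 'RIFF$...' prefix in the error message above.
header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
print(looks_like_wav_buffer(header))              # True
print(looks_like_wav_buffer(b"samples/jfk.wav"))  # False
```

If a buffer like this is mistakenly treated as a file name, the open call fails with the binary header printed as the "path".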

@UniversalTechno

Sorry, it was a compilation mistake on my side. Thanks!

bygreencn added a commit to bygreencn/whisper.cpp that referenced this pull request Feb 3, 2024
* ggerganov/master: (60 commits)
  sync : ggml (#0)
  ggml : fix IQ3_XXS on Metal (llama/5219)
  sync : ggml (llama/0)
  Faster AVX2 dot product for IQ2_XS (llama/5187)
  SOTA 3-bit quants (llama/5196)
  ggml alloc: Fix for null dereference on alloc failure (llama/5200)
  Nomic Vulkan backend (llama/4456)
  ggml : add max buffer sizes to opencl and metal backends (llama/5181)
  metal : free metal objects (llama/5161)
  gguf : fix comparison (ggml/715)
  `ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686)
  gguf : add input validation, prevent integer overflows (ggml/709)
  ci : fix yolo URLs + fix metal capture (ggml/712)
  metal : add debug capture backend function (ggml/694)
  common : fix wav buffer detection (ggerganov#1819)
  server : add fields to `verbose_json` response (ggerganov#1802)
  make : update MSYS_NT (ggerganov#1813)
  talk-llama : sync llama.cpp
  sync : ggml
  ggml : add Vulkan backend (llama/2059)
  ...
jiahansu pushed a commit to WiseSync/whisper.cpp that referenced this pull request Apr 17, 2024
* server: include additional fields in the verbose_json response as OpenAI does

* server: show request examples on home page

* server: todo note for compression_ratio and no_speech_prob

* server: add simple demo form to the homepage
viktor-silakov pushed a commit to viktor-silakov/whisper_node_mic.cpp that referenced this pull request May 11, 2024
* server: include additional fields in the verbose_json response as OpenAI does

* server: show request examples on home page

* server: todo note for compression_ratio and no_speech_prob

* server: add simple demo form to the homepage
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
* server: include additional fields in the verbose_json response as OpenAI does

* server: show request examples on home page

* server: todo note for compression_ratio and no_speech_prob

* server: add simple demo form to the homepage
4 participants