Add fields to `verbose_json` response and show examples on the home page #1802

JacobLinCool · 2024-01-23T20:11:55Z

The verbose_json format seems to aim to match the same format as OpenAI's, but it is currently missing several fields.

I tried to include task, language, duration, temperature, and avg_logprob.

Some observations:

- The current segment.words do not match OpenAI's (OpenAI does not provide this one), but I think that's okay; it can be an extension to showcase the capabilities of whisper.cpp at the word level.
- It seems that OpenAI also includes non-speech tokens.
- For compression_ratio and no_speech_prob, if I have not missed anything, currently, they cannot be obtained from whisper.cpp?

OpenAI's verbose_json

{
  "task": "transcribe",
  "language": "english",
  "duration": 4.440000057220459,
  "text": "This is my voice sample for research purpose only.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 4.0,
      "text": " This is my voice sample for research purpose only.",
      "tokens": [
        50364,
        639,
        307,
        452,
        3177,
        6889,
        337,
        2132,
        4334,
        787,
        13,
        50564
      ],
      "temperature": 0.0,
      "avg_logprob": -0.4242081940174103,
      "compression_ratio": 0.8928571343421936,
      "no_speech_prob": 0.0008698556339368224
    }
  ]
}

whisper.cpp's verbose_json

{
  "task": "transcribe",
  "language": "english",
  "duration": 4.440000057220459,
  "text": " This is my voice sample for research purpose only.\n",
  "segments": [
    {
      "id": 0,
      "text": " This is my voice sample for research purpose only.",
      "start": 0.0,
      "end": 4.0,
      "tokens": [
        770,
        318,
        616,
        3809,
        6291,
        329,
        2267,
        4007,
        691,
        13
      ],
      "words": [
        {
          "word": " This",
          "start": 0.16,
          "end": 0.42,
          "probability": 0.8665725588798523
        },
        {
          "word": " is",
          "start": 0.42,
          "end": 0.63,
          "probability": 0.9934033751487732
        },
        ...,
        {
          "word": ".",
          "start": 4.0,
          "end": 4.0,
          "probability": 0.8532010316848755
        }
      ],
      "temperature": 0.0,
      "avg_logprob": -0.1288750171661377
    }
  ]
}
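
To make the gap between the two responses concrete, here is a minimal sketch that compares a response against the field names seen in the OpenAI sample above. The helper name `diff_fields` and the abridged `sample` dict are illustrative only; they are not part of whisper.cpp or the OpenAI API.

```python
# Field names taken from the OpenAI verbose_json sample shown above.
OPENAI_TOP_LEVEL = {"task", "language", "duration", "text", "segments"}
OPENAI_SEGMENT = {
    "id", "seek", "start", "end", "text", "tokens",
    "temperature", "avg_logprob", "compression_ratio", "no_speech_prob",
}


def diff_fields(response: dict) -> dict:
    """Report which OpenAI fields are missing and which extra fields are present."""
    segment_missing, segment_extra = set(), set()
    for segment in response.get("segments", []):
        keys = set(segment)
        segment_missing |= OPENAI_SEGMENT - keys
        segment_extra |= keys - OPENAI_SEGMENT
    return {
        "top_missing": sorted(OPENAI_TOP_LEVEL - set(response)),
        "segment_missing": sorted(segment_missing),
        "segment_extra": sorted(segment_extra),
    }


# Abridged copy of the whisper.cpp response above (hypothetical test input).
sample = {
    "task": "transcribe",
    "language": "english",
    "duration": 4.44,
    "text": " This is my voice sample for research purpose only.\n",
    "segments": [
        {
            "id": 0,
            "text": " This is my voice sample for research purpose only.",
            "start": 0.0,
            "end": 4.0,
            "tokens": [770, 318, 616],
            "words": [{"word": " This", "start": 0.16, "end": 0.42}],
            "temperature": 0.0,
            "avg_logprob": -0.1288750171661377,
        }
    ],
}

print(diff_fields(sample))
```

With the sample above, the segment-level differences are `seek`, `compression_ratio`, and `no_speech_prob` (missing) and `words` (extra), which matches the observations in this description.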

I also replaced the hello on the homepage with request examples. I think the first action after running the server with ./server is to open the URL in the terminal and check out the homepage. It may be easier for users to try the server this way. (A web interface for direct interaction would be better, but it requires some time to design...)
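
For readers who prefer a script over the homepage examples, the following is a rough sketch of building such a request with only the Python standard library. The URL `http://127.0.0.1:8080/inference` and the form field names `file` and `response_format` are assumptions based on my recollection of the server example; check them against the examples the homepage actually shows. The helper `build_inference_request` is hypothetical.

```python
import urllib.request
import uuid


def build_inference_request(
    audio: bytes,
    filename: str = "sample.wav",
    url: str = "http://127.0.0.1:8080/inference",  # assumed host, port, and path
    response_format: str = "verbose_json",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying an audio file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain text form field selecting the response format (assumed field name).
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        f"{response_format}\r\n".encode(),
        # File form field holding the raw audio bytes (assumed field name).
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: audio/wav\r\n\r\n".encode() + audio + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Usage (requires a running server, so it is left commented out):
# with open("sample.wav", "rb") as f:
#     request = build_inference_request(f.read())
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the sketch easy to inspect without a server running.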

bobqianic · 2024-01-24T01:42:05Z

> For compression_ratio and no_speech_prob, if I have not missed anything, currently, they cannot be obtained from whisper.cpp?

Indeed, we opt for entropy rather than gzip compression. Currently, I'm working on implementations related to the no_speech_prob.
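
To illustrate the difference between the two measures, here is a small sketch, not whisper.cpp code. The first function follows my understanding of how the OpenAI reference implementation derives `compression_ratio` (UTF-8 byte length divided by zlib-compressed length). The second is a generic Shannon entropy over token counts, used here as a stand-in for an entropy-based repetition check; the exact window and formula in whisper.cpp may differ. Both function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def zlib_compression_ratio(text: str) -> float:
    """Raw byte length over compressed length; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def token_entropy(tokens: list[int]) -> float:
    """Shannon entropy (in nats) of the token distribution; repetitive output scores low."""
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


repetitive = "ha " * 200
varied = "This is my voice sample for research purpose only."

print(zlib_compression_ratio(repetitive), zlib_compression_ratio(varied))
print(token_entropy([13] * 8), token_entropy([770, 318, 616, 3809]))
```

Both measures flag degenerate, looping output, but from opposite directions: a high compression ratio and a low entropy each indicate repetition, so the two values are not interchangeable in a response field.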

JacobLinCool · 2024-01-24T22:32:47Z

> Currently, I'm working on implementations related to the no_speech_prob.

Wow! That sounds great! I believe it can definitely improve the use of whisper.cpp.

I think we can merge this pull request first and then either add a new issue for that or include some TODO comments in the code. What do you think?

JacobLinCool · 2024-01-29T20:39:25Z

I've added a form to the home page.

JacobLinCool · 2024-01-30T14:47:12Z

The problem may not be related to the changes in this PR, which only touched the frontend page and the output format. The file-processing mechanism is not involved.

I think the problem comes from the changes in #1781, which allow passing audio file content as the file name. (I think that's why you got `error: failed to open 'RIFF$�' as WAV file`.)

UniversalTechno · 2024-01-30T17:54:07Z

Sorry, it was a compilation mistake on my side. Thanks.

JacobLinCool added 2 commits January 24, 2024 03:38

server: include additional fields in the verbose_json response as OpenAI does

7689841

server: show request examples on home page

ba13056

JacobLinCool added 2 commits January 30, 2024 04:14

server: todo note for compression_ratio and no_speech_prob

7161762

server: add simple demo form to the homepage

272506f

ggerganov approved these changes Jan 30, 2024

ggerganov merged commit baa30ba into ggerganov:master Jan 30, 2024
39 checks passed

JacobLinCool deleted the server-example-improvement branch January 30, 2024 14:24
