From 424de80112a5b39dc87d2bd96637beb4164e658d Mon Sep 17 00:00:00 2001
From: Danijel Korzinek
Date: Sun, 24 Mar 2019 17:07:07 +0100
Subject: [PATCH] Fixed a bug with erroneous description of port-num. Explained TCP and netcat issues in more detail.

---
 src/doc/online_decoding.dox | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/src/doc/online_decoding.dox b/src/doc/online_decoding.dox
index dc04d9bef4e..9bcc2575be1 100644
--- a/src/doc/online_decoding.dox
+++ b/src/doc/online_decoding.dox
@@ -444,22 +444,25 @@ The program to run the TCP sever is online2-tcp-nnet3-decode-faster located in t
 ~/src/online2bin folder. The usage is as follows:
 \verbatim
-online2-tcp-nnet3-decode-faster
+online2-tcp-nnet3-decode-faster
 \endverbatim
 For example:
 \verbatim
-online2-tcp-nnet3-decode-faster model/final.mdl graph/HCLG.fst graph/words.txt 5050
+online2-tcp-nnet3-decode-faster model/final.mdl graph/HCLG.fst graph/words.txt
 \endverbatim
 The word symbol table is mandatory (unlike other nnet3 online decoding programs) because
 the server outputs word strings. Endpointing is mandatory to make the operation of the
 program reasonable. Other, non-standard options include:
+ - port-num - the port the server listens on (by default 5050)
  - samp-freq - sampling frequency of audio (usually 8000 for telephony and 16000 for other uses)
  - chunk-length - length of signal being processed by decoder at each step
  - output-period - how often we check for changes in the decoding (ie. output refresh rate, default 1s)
  - num-threads-startup - number of threads used when initializing iVector extractor
+ - read-timeout - if the program doesn't receive data during this timeout, the server terminates the connection.
+   Use -1 to disable this feature.
 The TCP protocol simply takes RAW signal on input (16-bit signed
 integer encoding at chosen sampling frequency) and outputs simple text using the following
@@ -479,9 +482,25 @@ command should look like this:
 \verbatim
 online2-tcp-nnet3-decode-faster --samp-freq=8000 --frames-per-chunk=20 --extra-left-context-initial=0
     --frame-subsampling-factor=3 --config=model/conf/online.conf --min-active=200 --max-active=7000
-    --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 model/final.mdl graph/HCLG.fst graph/words.txt 5050
+    --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --port-num=5050 model/final.mdl graph/HCLG.fst graph/words.txt
 \endverbatim
+Note that in order to make the communication as simple as possible, the server has to accept
+any data on input and cannot figure out when the stream is over. It will therefore not
+be able to terminate the connection and it is the client's responsibility to disconnect
+when it is ready to do so. As a fallback for certain situations, the read-timeout option
+was added, which will automatically disconnect if a chosen number of seconds has passed.
+Keep in mind that this is not an ideal solution and it's a better idea to design your
+client to properly close the connection when necessary.
+
+For testing purposes, we will use the netcat program. We will also use sox to re-encode the
+files properly from any source. Netcat has an issue that, similarly to what was stated above
+about the server, it cannot always interpret the data and usually it won't automatically
+disconnect the TCP connection. To get around this, we will use the '-N' switch, which kills
+the connection once streaming of the file is complete, but this can have a small side effect of
+not reading the whole output from the Kaldi server if the disconnect comes too fast. Just
+keep this in mind if you intend to use any of these programs in a production environment.
+
 To send a WAV file into the server, it first needs to be decoded into raw audio, then it can be sent to the
 socket:
 \verbatim