AudioPreProcessor
Average
Energy
FeatureMerger
FeatureNormalizer
FileFeeder
FileWriter
Java
MatrixApply
Merger
NoiseAdd
Python
Router
SoundcardPlayer
SoundcardRecorder
Submodule
Subsample
Component that takes audio and performs the following operations on it: zero-mean, pre-emphasis, and resampling.
This component supports both BinaryDecoderMessage and AudioDecoderMessage input in its "streamed_audio" slot. The output slots are enumerated in the form "streamed_audio_0", "streamed_audio_1" etc., up to the number specified in "max_out_channels".
Parameter | Type | Description |
---|---|---|
max_out_channels | int | maximum number of output channels that need to be defined |
output_scale | float | Scale the output audio by this factor |
preemphasis_factor | float | Pre-emphasis factor (float, no-op value=1.0) |
target_sampling_rate | float | If required, resample audio to this rate |
zero_mean | bool | Zero-mean incoming waveform (true, false) |
Input slot | Message Type |
---|---|
streamed_audio | AudioDecoderMessage,BinaryDecoderMessage |
Output slot |
---|
audio_info |
streamed_audio_[0-9] |
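An illustrative NumPy sketch of the AudioPreProcessor operations described above (not the component's actual code; note that the component treats preemphasis_factor=1.0 as a no-op, so its convention may differ from the textbook filter shown here, and resampling to "target_sampling_rate" is omitted):

```python
import numpy as np

def preprocess(audio, preemphasis_factor=0.97, output_scale=1.0):
    # zero_mean: remove the DC offset
    audio = audio - np.mean(audio)
    # pre-emphasis in its textbook form, y[n] = x[n] - a * x[n-1]
    audio = np.concatenate(([audio[0]], audio[1:] - preemphasis_factor * audio[:-1]))
    # output_scale; resampling to target_sampling_rate would go here
    return output_scale * audio
```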
Averages the incoming feature stream. The averaged features are issued upon end-of-utterance.
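As a sketch of what the averaging amounts to, assuming the utterance's features arrive as a (num_frames, feat_dim) matrix (illustrative only, not the component's code):

```python
import numpy as np
from scipy.special import logsumexp

def average(frames, apply_log=False):
    # frames: all feature vectors of the utterance, as a (num_frames, feat_dim) matrix
    if apply_log:
        # sum in the log domain, then divide by the frame count
        return logsumexp(frames, axis=0) - np.log(frames.shape[0])
    return frames.mean(axis=0)
```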
Parameter | Type | Description |
---|---|---|
apply_log | bool | Whether to do the summation in the log domain (true, false) |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
Output slot |
---|
features |
Calculates energy features on incoming windowed audio (a features input message as produced by the Window component)
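A sketch of one common definition of frame energy, under the assumption that each row of the incoming message is one window of samples; the component's exact formula (e.g. whether a log is applied) may differ:

```python
import numpy as np

def frame_energy(windowed_audio, use_log=True):
    # windowed_audio: (num_frames, window_length) matrix from the Window component
    energy = np.sum(windowed_audio ** 2, axis=1)  # per-frame sum of squares
    return np.log(np.maximum(energy, np.finfo(float).tiny)) if use_log else energy
```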
Input slot | Message Type |
---|---|
windowed_audio | FeaturesDecoderMessage |
Output slot |
---|
features |
Combines incoming feature and audio streams into one stream. Specify each incoming stream in the 'inputs' list with feature_stream_0, feature_stream_1 etc. The features will be merged in that order.
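Conceptually, the merge is a per-frame concatenation along the feature dimension; a minimal sketch, assuming each stream is a (feat_dim_i, num_frames) matrix (illustrative only):

```python
import numpy as np

def merge_features(streams):
    # streams: matrices in feature_stream_0, feature_stream_1, ... order,
    # all covering the same frames, each of shape (feat_dim_i, num_frames)
    return np.vstack(streams)
```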
Parameter | Type | Description |
---|---|---|
num_streams | int | Number of incoming streams to merge |
Input slot | Message Type |
---|---|
feature_stream_[0-9] | FeaturesDecoderMessage,AudioDecoderMessage |
Output slot |
---|
features |
Normalizes a feature stream with various algorithms (covariance, L2, etc.)
This normalization component supports four different types of "normalization":
"covariance": This builds up a proper covariance matrix (full, or only diagonal) and normalizes the features according to it. The normalization can selectively be mean normalization ("norm_means") and/or variance, i.e. covariance, normalization ("norm_vars"). "decay_num_frames" introduces a "memory decay" through a simple exponential decay. See the sketch after this list
"L2": Normalize each feature column vector by its L2 norm
"n_frame_average": Maybe not really "normalization": this combines n frames (from "update_stats_every_n_frames") into one averaged output vector, i.e. it produces one output vector for every n input vectors. "zero_pad_input_to_n_frames" specifies whether, when there is too little data at the very end (i.e. fewer than n frames), the input is zero-padded to n frames or averaged over just the frames that are available
"log_limiter": The exact behavior is not well documented; it appears to normalize between max and min with some scaling factor. Look at the source code for more information
Parameter | Type | Description |
---|---|---|
covariance_type | string | Full or only diagonal covariance normalization (full, diagonal) |
decay_num_frames | int64_t | Frame window size for 1/e dropoff of memory (set <= 0 for infinite memory) |
feature_size | int | incoming features size |
norm_means | bool | Normalize means |
norm_vars | bool | Normalize variances |
normalization_type | string | Normalization type (covariance, log_limiter, L2, n_frame_average) |
processing_mode | string | Whether to process in batch or low-latency mode (Batch, LowLatency) |
update_stats_every_n_frames | int64_t | In LowLatency mode, with what frequency to update the statistics. Lower value = lower latency, but also more CPU. |
zero_pad_input_to_n_frames | bool | Whether to zero-pad the input matrix to have the size of 'update_stats_every_n_frames' |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
Output slot |
---|
features |
Component that feeds file content into the network. Can read analist, text, JSON and Numpy npz features
This is the component for providing your Godec graph with data to process. The feeding formats in detail:
"analist": A text file describing, line by line, segments in an audio file (supported formats: WAV and NIST_1A, i.e. SPHERE). The general line format is "<wave file base name without extension> -c <channel number, 1-based> -t <audio format> -f <sample start>-<sample end> -o <utterance ID> -spkr <speaker ID>" (see the example line below). The "wave_dir" parameter specifies the directory in which to find the wave files, "wave_extension" the file extension. "feed_realtime_factor" specifies how much faster than realtime the audio should be fed (higher value = faster)
"text": A simple line-by-line file with text in it. The component will feed one line at a time, as a BinaryDecoderMessage with timestamps according to how many words were in the line
"json": A single file containing a JSON array of items. Each item will be pushed as a JsonDecoderMessage
"numpy_npz": A Python Numpy npz file. The parameter "keys_list_file" specifies a file which contains a line-by-line list of npz+key combo, e.g. "my_feats.npz:a", which would extract key "a" from my_feats.npz. "feature_chunk_size" sets the size of the feature chunks to be pushed
The "control_type" parameter describes whether the FileFeeder will just work off one single configuration on startup, i.e. batch processing ("single_on_startup"), or whether it should receive these configurations via an external slot ("external") that pushes the exact same component JSON configuration from the outside via the Java API. The latter is essentially for "dynamic batch processing" where the FileFeeder gets pointed to new data dynamically.
Parameter | Type | Description |
---|---|---|
audio_chunk_size | int | Audio size in samples for each chunk |
control_type | string | Where this FileFeeder gets its source data from: single-shot feeding on startup ('single_on_startup'), or as JSON input from an input stream ('external') |
feature_chunk_size | int | Size in frames of each pushed chunk |
feed_realtime_factor | float | Controls how fast the audio is pushed. A value of 1.0 simulates soundcard reading of audio (i.e. pushing a 1-second chunk takes 1 second), a higher value pushes faster. Use 100000 for batch pushing |
input_file | string | Input file |
keys_list_file | string | File containing a list (line by line) of keys that are contained in the npz file and are then fed in that order |
source_type | string | File type of source file (analist, text, numpy_npz, json) |
time_upsample_factor | int | Factor by which the internal time stamps are increased. This is to prevent multiple subunits having the same time stamp. |
wave_dir | string | Audio waves directory |
wave_extension | string | Wave file extension |
Input slot | Message Type |
---|---|
control | JsonDecoderMessage |
Output slot |
---|
conversation_state |
output_stream |
Component that writes a stream to file
The writing counterpart to the FileFeeder component, for saving output. The available "input_type" values:
"audio": For writing AudioDecoderMessage messages. "output_file_prefix" specifies the path prefix that each utterance gets written to. The incoming audio samples are float values expected to be normalized to the -1.0/1.0 range
"raw_text": Expects BinaryDecoderMessage input, which gets converted into text and written into "output_file"
"json": Expects JsonDecoderMessage as input; concatenates the JSONs into "output_file"
"features": Expects FeaturesDecoderMessage as input; the output file is a Numpy NPZ file, with the utterance IDs as keys (see the reading sketch below)
Like the FileFeeder, the "control_type" specifies whether this is a one-shot run that goes straight off the JSON parameters ("single_on_startup"), or whether it receives these JSON parameters through an external channel ("external"). When in the external mode, it also requires the ConversationState stream from the FileFeeder that produced the content
Parameter | Type | Description |
---|---|---|
control_type | string | Where this FileWriter gets its output configuration from: single-shot on startup ('single_on_startup'), or as JSON input from an input stream ('external') |
input_type | string | Input stream type (audio, raw_text, features, json) |
json_output_format | string | Output format for json (raw_json, ctm, fst_search, mt) |
npz_file | string | Output Numpy npz file name |
output_file | string | Output file path (for the raw_text and json input types) |
output_file_prefix | string | Output audio file path prefix |
sample_depth | int | wave file sample depth (8,16,32) |
Input slot | Message Type |
---|---|
control | JsonDecoderMessage |
file_feeder_conversation_state | ConversationStateDecoderMessage |
input_stream | AnyDecoderMessage |
streamed_audio | AudioDecoderMessage |
Loads an arbitrary JAR, allocates a class object and calls ProcessMessage() on it.
This component can load any Java code inside a JAR file. It expects a class to be defined ("class_name") and will instantiate the class constructor with a String that contains the "class_constructor_param" parameter (set this to whatever you need to initialize your instance in a specific manner). The parameters "expected_inputs" and "expected_outputs" configure the input and output streams of this component, and in turn also what gets passed into the Java ProcessMessage() function.
The signature of ProcessMessage is

```java
HashMap<String, DecoderMessage> ProcessMessage(HashMap<String, DecoderMessage>)
```

The input HashMap's keys correspond to the input streams; the output is expected to contain the necessary output streams of the component. See java/com/bbn/godec/regression/TextTransform.java and test/jni_test.json for an example.
Parameter | Type | Description |
---|---|---|
class_constructor_param | string | string parameter passed into constructor |
class_name | string | Fully-qualified class name, e.g. java/util/ArrayList |
expected_inputs | string | comma-separated list of expected input slots |
expected_outputs | string | comma-separated list of expected output slots |
Input slot | Message Type |
---|---|
<slots from 'expected_inputs'> | AnyDecoderMessage |
Output slot |
---|
<slots from 'expected_outputs'> |
Applies a matrix (from a stream) to an incoming feature stream
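A minimal sketch of the operation, assuming the features are column vectors in a (feat_dim, num_frames) matrix (not the component's actual code):

```python
import numpy as np

def matrix_apply(matrix, feats, augment_features=False):
    # feats: column vectors in a (feat_dim, num_frames) matrix
    if augment_features:
        # append a row of ones so the matrix can encode a full affine transform
        feats = np.vstack([feats, np.ones((1, feats.shape[1]))])
    return matrix @ feats
```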
Parameter | Type | Description |
---|---|---|
augment_features | bool | Whether to add a row of 1.0 elements to the bottom of the incoming feature vector (to enable all affine transforms) |
matrix_npy | string | matrix file name. Matrix should be plain Numpy 'npy' format |
matrix_source | string | Where the matrix comes from: From a file ('file') or pushed in through an input stream ('stream') |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
matrix | MatrixDecoderMessage |
Output slot |
---|
transformed_features |
Merges streams that were created by Router. Streams are specified as input_stream_0, input_stream_1 etc, up to "num_streams". Time map is the one created by the Router
Refer to the Router's extended description for a more detailed explanation. This component expects all conversation states from the upstream Router, as well as all the processed streams
Parameter | Type | Description |
---|---|---|
num_streams | int | Number of streams that are merged |
Input slot | Message Type |
---|---|
conversation_state_[0-9] | ConversationStateDecoderMessage |
input_streams_[0-9] | AnyDecoderMessage |
Output slot |
---|
output_stream |
Adds noise to audio stream
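A sketch of one plausible interpretation of the scaling (white noise scaled relative to the signal's RMS level; the component's actual noise type and scaling reference are assumptions here):

```python
import numpy as np

def add_noise(audio, noise_scaling_factor):
    # scale white noise relative to the signal's RMS, so a factor of 1.0
    # makes the noise as loud as the signal (0 dB)
    rms = np.sqrt(np.mean(audio ** 2))
    noise = np.random.randn(audio.shape[0]) * rms * noise_scaling_factor
    return audio + noise
```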
Parameter | Type | Description |
---|---|---|
noise_scaling_factor | float | Scale factor of the noise relative to the incoming audio (1.0 = 0 dB, lower is quieter) |
Input slot | Message Type |
---|---|
streamed_audio | AudioDecoderMessage |
Output slot |
---|
streamed_audio |
Calls a Python script
Just like the Java component, this allows for calling an arbitrary Python script, specified in "script_file_name" (omit the .py ending and any preceding path; those go into "python_path"), and doing some processing inside it. The Python script needs to define a class with the "class_name" name, and the component will instantiate an instance with the "class_constructor_param" string as the only parameter. "python_executable" points to the Python executable to use, "python_path" to the "PYTHONPATH" values to set so that Python finds all dependent libraries.
The input is a dict with the specified input streams as keys, the values being whatever the respective message is (currently only the FeaturesDecoderMessage type is implemented, which arrives as a Numpy matrix); the output (i.e. the return value) is expected in the same format.
Look at test/python_test.json for an example.
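A minimal, hypothetical script sketch, assuming an input stream "features", an output stream "transformed_features", and "class_name" set to "MyTransform":

```python
class MyTransform:
    def __init__(self, class_constructor_param):
        # receives the 'class_constructor_param' string from the configuration
        self.offset = float(class_constructor_param)

    def ProcessMessage(self, messages):
        # messages: dict mapping input stream name -> Numpy feature matrix
        feats = messages["features"]
        # return a dict keyed by the expected output streams
        return {"transformed_features": feats + self.offset}
```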
Parameter | Type | Description |
---|---|---|
class_constructor_param | string | string parameter passed into class constructor |
class_name | string | The name of the class that contains the ProcessMessage function |
expected_inputs | string | comma-separated list of expected input slots |
expected_outputs | string | comma-separated list of expected output slots |
python_executable | string | Python executable to use |
python_path | string | PYTHONPATH to set (cwd and script folder get added automatically) |
script_file_name | string | Python script file name |
Input slot | Message Type |
---|---|
<slots from 'expected_inputs'> | AnyDecoderMessage |
Output slot |
---|
<slots from 'expected_outputs'> |
Splits an incoming stream into separate streams, based on the binary decision of another input stream
The Router component can be used to split streams across several branches (e.g. if you have a language detector upstream and want to direct the streams to language-specific components). The mechanism is simple: the Router creates N output conversation state message streams, each of which has "ignore_data" set to true or false, depending on the routing at that point. The branches' components can check this flag and do some optimization (while still emitting empty messages that account for the stream time). So, all branch components see all the data. The Merger component can be used downstream to merge the output together (it chooses the right results from each stream according to those conversation state messages).
There are two modes by which the Router decides where to route:
"sad_nbest": "routing_stream" is expected to be an NbestDecoderMessage, where the 0-th nbest entry is expected to contain a sequence of 0 (nonspeech) or 1 (speech), which the router will use to route the stream
"utterance_round_robin": Simple round robin on an utterance-by-utterance basis
Parameter | Type | Description |
---|---|---|
num_outputs | int | Number of outputs to distribute to |
router_type | string | Type of routing. Valid values: 'sad_nbest', 'utterance_round_robin' |
Input slot | Message Type |
---|---|
routing_stream | AnyDecoderMessage |
Output slot |
---|
conversation_state_[0-9] |
Opens the sound card and plays incoming audio data
Parameter | Type | Description |
---|---|---|
num_channels | int | Number of channels (1=mono, 2=stereo) |
sample_depth | int | Sample depth (8,16,24 etc bit) |
sampling_rate | float | Sampling rate |
soundcard_identifier | string | Soundcard identifier (on Linux, use 'aplay -L' for a list) |
Input slot | Message Type |
---|---|
streamed_audio | AudioDecoderMessage |
Opens the sound card, records audio and pushes it as an audio stream
The "start_on_boot" parameter, when set to "false", will require the "control" stream to be specified. That stream expects a ConversationStateDecoderMessage. If the stream is within an utterance (according to the ConversationState) the soundcard will output recorded audio, if it sees the end-of-utterance it stops, and so on.
Parameter | Type | Description |
---|---|---|
chunk_size | int | Chunk size in samples (set to -1 for a minimum-latency configuration) |
num_channels | int | Number of channels (1=mono, 2=stereo) |
sample_depth | int | Sample depth (8,16,24 etc bit) |
sampling_rate | float | Sampling rate |
soundcard_identifier | string | Soundcard identifier (on Linux, use 'arecord -L' for a list) |
start_on_boot | bool | Whether audio should be pushed immediately |
time_upsample_factor | int | Factor by which the internal time stamps are increased. This is to prevent multiple subunits having the same time stamp. |
Input slot | Message Type |
---|---|
control | ConversationStateDecoderMessage |
Output slot |
---|
conversation_state |
streamed_audio |
The Godec equivalent of a 'function call'
This component allows for a "Godec within a Godec". You specify the JSON file for that graph with the "file" parameter, and the entire sub-graph gets treated as a single component. Great for building reusable libraries.
For a detailed discussion, see the "Submodules" section of the main documentation.
Parameter | Type | Description |
---|---|---|
file | string | The json for this subnetwork |
Input slot | Message Type |
---|---|
name of slot inside sub-network that will be injected | AnyDecoderMessage |
Output slot |
---|
Slot: name of stream inside sub-network to be pulled out |
Godec equivalent of subsample-feats executable, for subsampling/repeating features
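An illustrative sketch of the two modes, assuming a (num_frames, feat_dim) feature matrix (not the component's actual code):

```python
import numpy as np

def subsample(feats, num_skip_frames):
    # feats: (num_frames, feat_dim) matrix
    if num_skip_frames > 0:
        return feats[::num_skip_frames]  # only emit every n-th frame
    n = -num_skip_frames
    # emit every n-th frame but repeat it n times, preserving the frame count
    return np.repeat(feats[::n], n, axis=0)[:feats.shape[0]]
```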
Parameter | Type | Description |
---|---|---|
num_skip_frames | int | Only emit every <num_skip_frames>-th frame. Set to a negative value to also repeat the emitted frame <num_skip_frames> times (resulting in the same total number of frames) |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
Output slot |
---|
features |