AudioPreProcessor
Average
Energy
FeatureMerger
FeatureNormalizer
FileFeeder
FileWriter
Java
MatrixApply
Merger
NoiseAdd
Python
Router
SoundcardPlayer
SoundcardRecorder
Submodule
Subsample
Component that takes audio and performs the following operations on it: zero-mean, pre-emphasis, and resampling.
This component supports both BinaryDecoderMessage and AudioDecoderMessage input in its "streamed_audio" slot. The output slots are enumerated in the form "streamed_audio_0", "streamed_audio_1" etc., up to the number specified in "max_out_channels".
Parameter | Type | Description |
---|---|---|
max_out_channels | int | maximum number of output channels that need to be defined |
output_scale | float | Scale the output audio by this factor |
preemphasis_factor | float | Pre-emphasis factor (float, no-op value=1.0) |
target_sampling_rate | float | If required, resample audio to this rate |
zero_mean | bool | Zero-mean incoming waveform (true, false) |
Input slot | Message Type |
---|---|
streamed_audio | AudioDecoderMessage,BinaryDecoderMessage |
Output slot |
---|
audio_info |
streamed_audio_[0-9] |
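An illustrative NumPy sketch of the AudioPreProcessor operations described above (not the component's actual code; note that the component treats preemphasis_factor=1.0 as a no-op, so its convention may differ from the textbook filter shown here, and resampling to "target_sampling_rate" is omitted):

```python
import numpy as np

def preprocess(audio, preemphasis_factor=0.97, output_scale=1.0):
    # zero_mean: remove the DC offset
    audio = audio - np.mean(audio)
    # pre-emphasis in its textbook form, y[n] = x[n] - a * x[n-1]
    audio = np.concatenate(([audio[0]], audio[1:] - preemphasis_factor * audio[:-1]))
    # output_scale; resampling to target_sampling_rate would go here
    return output_scale * audio
```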
Averages the incoming feature stream. The averaged features are issued upon end-of-utterance.
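As a sketch of what the averaging amounts to, assuming the utterance's features arrive as a (num_frames, feat_dim) matrix (illustrative only, not the component's code):

```python
import numpy as np
from scipy.special import logsumexp

def average(frames, apply_log=False):
    # frames: all feature vectors of the utterance, as a (num_frames, feat_dim) matrix
    if apply_log:
        # sum in the log domain, then divide by the frame count
        return logsumexp(frames, axis=0) - np.log(frames.shape[0])
    return frames.mean(axis=0)
```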
Parameter | Type | Description |
---|---|---|
apply_log | bool | Whether to do the summation in the log domain (true, false) |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
Output slot |
---|
features |
Calculates energy features on incoming windowed audio (a features input message as produced by the Window component)
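A sketch of one common definition of frame energy, under the assumption that each row of the incoming message is one window of samples; the component's exact formula (e.g. whether a log is applied) may differ:

```python
import numpy as np

def frame_energy(windowed_audio, use_log=True):
    # windowed_audio: (num_frames, window_length) matrix from the Window component
    energy = np.sum(windowed_audio ** 2, axis=1)  # per-frame sum of squares
    return np.log(np.maximum(energy, np.finfo(float).tiny)) if use_log else energy
```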
Input slot | Message Type |
---|---|
windowed_audio | FeaturesDecoderMessage |
Output slot |
---|
features |
Combines incoming feature and audio streams into one stream. Specify each incoming stream in the 'inputs' list with feature_stream_0, feature_stream_1 etc. The features will be merged in that order.
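Conceptually, the merge is a per-frame concatenation along the feature dimension; a minimal sketch, assuming each stream is a (feat_dim_i, num_frames) matrix (illustrative only):

```python
import numpy as np

def merge_features(streams):
    # streams: matrices in feature_stream_0, feature_stream_1, ... order,
    # all covering the same frames, each of shape (feat_dim_i, num_frames)
    return np.vstack(streams)
```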
Parameter | Type | Description |
---|---|---|
num_streams | int | Number of incoming streams to merge |
Input slot | Message Type |
---|---|
feature_stream_[0-9] | FeaturesDecoderMessage,AudioDecoderMessage |
Output slot |
---|
features |
Normalizes a feature stream with various algorithms (covariance, L2, etc.)
This normalization component supports four different types of "normalization":
"covariance": This builds up a proper covariance matrix (full, or only diagonal) and normalizes the features according to it. The normalization can selectively be mean normalization ("norm_means") and/or variance, i.e. covariance, normalization ("norm_vars"). "decay_num_frames" introduces a "memory decay" through a simple exponential decay. See the sketch after this list
"L2": Normalize each feature column vector by its L2 norm
"n_frame_average": Maybe not really "normalization": this combines n frames (from "update_stats_every_n_frames") into one averaged output vector, i.e. it produces one output vector for every n input vectors. "zero_pad_input_to_n_frames" specifies whether, when there is too little data at the very end (i.e. fewer than n frames), the input is zero-padded to n frames or averaged over just the frames that are available
"log_limiter": The exact behavior is not well documented; it appears to normalize between max and min with some scaling factor. Look at the source code for more information
Parameter | Type | Description |
---|---|---|
covariance_type | string | Full or only diagonal covariance normalization (full, diagonal) |
decay_num_frames | int64_t | Frame window size for 1/e dropoff of memory (set <= 0 for infinite memory) |
feature_size | int | incoming features size |
norm_means | bool | Normalize means |
norm_vars | bool | Normalize variances |
normalization_type | string | Normalization type (covariance, log_limiter, L2, n_frame_average) |
processing_mode | string | Whether to process in batch or low-latency mode (Batch, LowLatency) |
update_stats_every_n_frames | int64_t | In LowLatency mode, with what frequency to update the statistics. Lower value = lower latency, but also more CPU. |
zero_pad_input_to_n_frames | bool | Whether to zero-pad the input matrix to have the size of 'update_stats_every_n_frames' |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
Output slot |
---|
features |
Component that feeds file content into the network. Can read analist, text, JSON and Numpy npz features
This is the component for providing your Godec graph with data to process. The feeding formats in detail:
"analist": A text file describing, line by line, segments in an audio file (supported formats: WAV and NIST_1A, i.e. SPHERE). The general line format is "<wave file base name without extension> -c <channel number, 1-based> -t <audio format> -f <sample start>-<sample end> -o <utterance ID> -spkr <speaker ID>" (see the example line below). The "wave_dir" parameter specifies the directory in which to find the wave files, "wave_extension" the file extension. "feed_realtime_factor" specifies how much faster than realtime the audio should be fed (higher value = faster)
"text": A simple line-by-line file with text in it. The component will feed one line at a time, as a BinaryDecoderMessage with timestamps according to how many words were in the line
"json": A single file containing a JSON array of items. Each item will be pushed as a JsonDecoderMessage
"numpy_npz": A Python Numpy npz file. The parameter "keys_list_file" specifies a file which contains a line-by-line list of npz+key combo, e.g. "my_feats.npz:a", which would extract key "a" from my_feats.npz. "feature_chunk_size" sets the size of the feature chunks to be pushed
The "control_type" parameter describes whether the FileFeeder will just work off one single configuration on startup, i.e. batch processing ("single_on_startup"), or whether it should receive these configurations via an external slot ("external") that pushes the exact same component JSON configuration from the outside via the Java API. The latter is essentially for "dynamic batch processing" where the FileFeeder gets pointed to new data dynamically.
Parameter | Type | Description |
---|---|---|
audio_chunk_size | int | Audio size in samples for each chunk |
control_type | string | Where this FileFeeder gets its source data from: single-shot feeding on startup ('single_on_startup'), or as JSON input from an input stream ('external') |
feature_chunk_size | int | Size in frames of each pushed chunk |
feed_realtime_factor | float | Controls how fast the audio is pushed. A value of 1.0 simulates soundcard reading of audio (i.e. pushing a 1-second chunk takes 1 second), a higher value pushes faster. Use 100000 for batch pushing |
input_file | string | Input file |
keys_list_file | string | File containing a list (line by line) of keys that are contained in the npz file and are then fed in that order |
source_type | string | File type of source file (analist, text, numpy_npz, json) |
time_upsample_factor | int | Factor by which the internal time stamps are increased. This is to prevent multiple subunits having the same time stamp. |
wave_dir | string | Audio waves directory |
wave_extension | string | Wave file extension |
Input slot | Message Type |
---|---|
control | JsonDecoderMessage |
Output slot |
---|
conversation_state |
output_stream |
Component that writes a stream to file
The writing counterpart to the FileFeeder component, for saving output. The available "input_type" values:
"audio": For writing AudioDecoderMessage messages. "output_file_prefix" specifies the path prefix that each utterance gets written to. The incoming audio samples are float values expected to be normalized to the -1.0/1.0 range
"raw_text": Expects BinaryDecoderMessage input, which gets converted into text and written into "output_file"
"json": Expects JsonDecoderMessage as input; concatenates the JSONs into "output_file"
"features": Expects FeaturesDecoderMessage as input; the output file is a Numpy NPZ file, with the utterance IDs as keys (see the reading sketch below)
Like the FileFeeder, the "control_type" specifies whether this is a one-shot run that goes straight off the JSON parameters ("single_on_startup"), or whether it receives these JSON parameters through an external channel ("external"). When in the external mode, it also requires the ConversationState stream from the FileFeeder that produced the content
Parameter | Type | Description |
---|---|---|
control_type | string | Where this FileWriter gets its output configuration from: single-shot on startup ('single_on_startup'), or as JSON input from an input stream ('external') |
input_type | string | Input stream type (audio, raw_text, features, json) |
json_output_format | string | Output format for json (raw_json, ctm, fst_search, mt) |
npz_file | string | Output Numpy npz file name |
output_file | string | Output file path (for the raw_text and json input types) |
output_file_prefix | string | Output audio file path prefix |
sample_depth | int | wave file sample depth (8,16,32) |
Input slot | Message Type |
---|---|
control | JsonDecoderMessage |
file_feeder_conversation_state | ConversationStateDecoderMessage |
input_stream | AnyDecoderMessage |
streamed_audio | AudioDecoderMessage |
Loads an arbitrary JAR, allocates a class object and calls ProcessMessage() on it.
This component can load any Java code inside a JAR file. It expects a class to be defined ("class_name") and will instantiate the class constructor with a String that contains the "class_constructor_param" parameter (set this to whatever you need to initialize your instance in a specific manner). The parameters "expected_inputs" and "expected_outputs" configure the input and output streams of this component, and in turn also what gets passed into the Java ProcessMessage() function.
The signature of ProcessMessage is

```java
HashMap<String, DecoderMessage> ProcessMessage(HashMap<String, DecoderMessage>)
```

The input HashMap's keys correspond to the input streams; the output is expected to contain the necessary output streams of the component. See java/com/bbn/godec/regression/TextTransform.java and test/jni_test.json for an example.
Parameter | Type | Description |
---|---|---|
class_constructor_param | string | string parameter passed into constructor |
class_name | string | Fully-qualified class name, e.g. java/util/ArrayList |
expected_inputs | string | comma-separated list of expected input slots |
expected_outputs | string | comma-separated list of expected output slots |
Input slot | Message Type |
---|---|
<slots from 'expected_inputs'> | AnyDecoderMessage |
Output slot |
---|
<slots from 'expected_outputs'> |
Applies a matrix (from a stream) to an incoming feature stream
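A minimal sketch of the operation, assuming the features are column vectors in a (feat_dim, num_frames) matrix (not the component's actual code):

```python
import numpy as np

def matrix_apply(matrix, feats, augment_features=False):
    # feats: column vectors in a (feat_dim, num_frames) matrix
    if augment_features:
        # append a row of ones so the matrix can encode a full affine transform
        feats = np.vstack([feats, np.ones((1, feats.shape[1]))])
    return matrix @ feats
```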
Parameter | Type | Description |
---|---|---|
augment_features | bool | Whether to add a row of 1.0 elements to the bottom of the incoming feature vector (to enable all affine transforms) |
matrix_npy | string | matrix file name. Matrix should be plain Numpy 'npy' format |
matrix_source | string | Where the matrix comes from: From a file ('file') or pushed in through an input stream ('stream') |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
matrix | MatrixDecoderMessage |
Output slot |
---|
transformed_features |
Merges streams that were created by Router. Streams are specified as input_stream_0, input_stream_1 etc, up to "num_streams". Time map is the one created by the Router
Refer to the Router's extended description for a more detailed explanation. This component expects all conversation states from the upstream Router, as well as all the processed streams
Parameter | Type | Description |
---|---|---|
num_streams | int | Number of streams that are merged |
Input slot | Message Type |
---|---|
conversation_state_[0-9] | ConversationStateDecoderMessage |
input_streams_[0-9] | AnyDecoderMessage |
Output slot |
---|
output_stream |
Adds noise to audio stream
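A sketch of one plausible interpretation of the scaling (white noise scaled relative to the signal's RMS level; the component's actual noise type and scaling reference are assumptions here):

```python
import numpy as np

def add_noise(audio, noise_scaling_factor):
    # scale white noise relative to the signal's RMS, so a factor of 1.0
    # makes the noise as loud as the signal (0 dB)
    rms = np.sqrt(np.mean(audio ** 2))
    noise = np.random.randn(audio.shape[0]) * rms * noise_scaling_factor
    return audio + noise
```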
Parameter | Type | Description |
---|---|---|
noise_scaling_factor | float | Scale factor of the noise relative to the incoming audio (1.0 = 0 dB, lower is quieter) |
Input slot | Message Type |
---|---|
streamed_audio | AudioDecoderMessage |
Output slot |
---|
streamed_audio |
Calls a Python script
Just like the Java component, this allows for calling an arbitrary Python script, specified in "script_file_name" (omit the .py ending and any preceding path; those go into "python_path"), and doing some processing inside it. The Python script needs to define a class with the "class_name" name, and the component will instantiate an instance with the "class_constructor_param" string as the only parameter. "python_executable" points to the Python executable to use, "python_path" to the "PYTHONPATH" values to set so that Python finds all dependent libraries.
The input is a dict with the specified input streams as keys, the values being whatever the respective message is (currently only the FeaturesDecoderMessage type is implemented, which arrives as a Numpy matrix); the output (i.e. the return value) is expected in the same format.
Look at test/python_test.json for an example.
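A minimal, hypothetical script sketch, assuming an input stream "features", an output stream "transformed_features", and "class_name" set to "MyTransform":

```python
class MyTransform:
    def __init__(self, class_constructor_param):
        # receives the 'class_constructor_param' string from the configuration
        self.offset = float(class_constructor_param)

    def ProcessMessage(self, messages):
        # messages: dict mapping input stream name -> Numpy feature matrix
        feats = messages["features"]
        # return a dict keyed by the expected output streams
        return {"transformed_features": feats + self.offset}
```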
Parameter | Type | Description |
---|---|---|
class_constructor_param | string | string parameter passed into class constructor |
class_name | string | The name of the class that contains the ProcessMessage function |
expected_inputs | string | comma-separated list of expected input slots |
expected_outputs | string | comma-separated list of expected output slots |
python_executable | string | Python executable to use |
python_path | string | PYTHONPATH to set (cwd and script folder get added automatically) |
script_file_name | string | Python script file name |
Input slot | Message Type |
---|---|
<slots from 'expected_inputs'> | AnyDecoderMessage |
Output slot |
---|
<slots from 'expected_outputs'> |
Splits an incoming stream into separate streams, based on the binary decision of another input stream
The Router component can be used to split streams across several branches (e.g. if you have a language detector upstream and want to direct the streams to language-specific components). The mechanism is simple: the Router creates N output conversation state message streams, each of which has "ignore_data" set to true or false, depending on the routing at that point. The branches' components can check this flag and do some optimization (while still emitting empty messages that account for the stream time). So, all branch components see all the data. The Merger component can be used downstream to merge the output together (it chooses the right results from each stream according to those conversation state messages).
There are two modes by which the Router decides where to route:
"sad_nbest": "routing_stream" is expected to be an NbestDecoderMessage, where the 0-th nbest entry is expected to contain a sequence of 0 (nonspeech) or 1 (speech), which the router will use to route the stream
"utterance_round_robin": Simple round robin on an utterance-by-utterance basis
Parameter | Type | Description |
---|---|---|
num_outputs | int | Number of outputs to distribute to |
router_type | string | Type of routing. Valid values: 'sad_nbest', 'utterance_round_robin' |
Input slot | Message Type |
---|---|
routing_stream | AnyDecoderMessage |
Output slot |
---|
conversation_state_[0-9] |
Opens the sound card and plays incoming audio data
Parameter | Type | Description |
---|---|---|
num_channels | int | Number of channels (1=mono, 2=stereo) |
sample_depth | int | Sample depth (8,16,24 etc bit) |
sampling_rate | float | Sampling rate |
soundcard_identifier | string | Soundcard identifier (on Linux, use 'aplay -L' for a list) |
Input slot | Message Type |
---|---|
streamed_audio | AudioDecoderMessage |
Opens the sound card, records audio and pushes it as an audio stream
The "start_on_boot" parameter, when set to "false", will require the "control" stream to be specified. That stream expects a ConversationStateDecoderMessage. If the stream is within an utterance (according to the ConversationState) the soundcard will output recorded audio, if it sees the end-of-utterance it stops, and so on.
Parameter | Type | Description |
---|---|---|
chunk_size | int | Chunk size in samples (set to -1 for a minimum-latency configuration) |
num_channels | int | Number of channels (1=mono, 2=stereo) |
sample_depth | int | Sample depth (8,16,24 etc bit) |
sampling_rate | float | Sampling rate |
soundcard_identifier | string | Soundcard identifier (on Linux, use 'arecord -L' for a list) |
start_on_boot | bool | Whether audio should be pushed immediately |
time_upsample_factor | int | Factor by which the internal time stamps are increased. This is to prevent multiple subunits having the same time stamp. |
Input slot | Message Type |
---|---|
control | ConversationStateDecoderMessage |
Output slot |
---|
conversation_state |
streamed_audio |
The Godec equivalent of a 'function call'
This component allows for a "Godec within a Godec". You specify the JSON file for that graph with the "file" parameter, and the entire sub-graph gets treated as a single component. Great for building reusable libraries.
For a detailed discussion, see the "Submodules" section of the main documentation.
Parameter | Type | Description |
---|---|---|
file | string | The json for this subnetwork |
Input slot | Message Type |
---|---|
name of slot inside sub-network that will be injected | AnyDecoderMessage |
Output slot |
---|
Slot: name of stream inside sub-network to be pulled out |
Godec equivalent of subsample-feats executable, for subsampling/repeating features
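An illustrative sketch of the two modes, assuming a (num_frames, feat_dim) feature matrix (not the component's actual code):

```python
import numpy as np

def subsample(feats, num_skip_frames):
    # feats: (num_frames, feat_dim) matrix
    if num_skip_frames > 0:
        return feats[::num_skip_frames]  # only emit every n-th frame
    n = -num_skip_frames
    # emit every n-th frame but repeat it n times, preserving the frame count
    return np.repeat(feats[::n], n, axis=0)[:feats.shape[0]]
```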
Parameter | Type | Description |
---|---|---|
num_skip_frames | int | Only emit every <num_skip_frames>-th frame. Set to a negative value to also repeat the emitted frame <num_skip_frames> times (resulting in the same total number of frames) |
Input slot | Message Type |
---|---|
features | FeaturesDecoderMessage |
Output slot |
---|
features |