The originating times of messages from the SystemSpeechRecognizer and MicrosoftSpeechRecognizer components are not guaranteed to be precise with respect to the input audio stream, so do not rely on them for exact alignment.
Workaround:
If such precision is required, match the bytes in the StreamingSpeechRecognitionResult.Audio
property of the output message against the raw input audio to locate the corresponding utterance within the input audio stream.
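
The byte alignment described above can be sketched as follows. This is an illustration only, written in Python rather than against the actual component API. It assumes the utterance bytes are an exact copy of a contiguous span of the raw input, and that the audio is 16 kHz, 16-bit mono PCM. The function name locate_utterance and its parameters are hypothetical.

```python
from typing import Optional, Tuple


def locate_utterance(
    raw_audio: bytes,
    utterance: bytes,
    sample_rate: int = 16000,   # assumed input format: 16 kHz
    bytes_per_frame: int = 2,   # assumed input format: 16-bit mono PCM
) -> Optional[Tuple[float, float]]:
    """Return (start_seconds, end_seconds) of utterance within raw_audio.

    Offsets are measured from the first byte of raw_audio. Returns None
    if the utterance bytes do not occur on a frame boundary.
    """
    if not utterance:
        return None

    start = raw_audio.find(utterance)
    # Skip matches that do not begin on a sample-frame boundary, since
    # those cannot correspond to a real span of audio frames.
    while start != -1 and start % bytes_per_frame != 0:
        start = raw_audio.find(utterance, start + 1)
    if start == -1:
        return None

    bytes_per_second = sample_rate * bytes_per_frame
    end = start + len(utterance)
    return start / bytes_per_second, end / bytes_per_second
```

Adding the returned offsets to the originating time of the first byte of the raw input would give the utterance's position in the input stream's timeline, provided the raw audio was captured without gaps. If the recognizer resamples or otherwise alters the audio, an exact byte match will not be found and a different alignment method is needed.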