26-Point MFCC & 512-Point FFT Generator & Visualizer in Java, C++, and NEON intrinsics

ShoYamanishi/AndroidMFCC

Real-time 26-Point MFCC & 512-Point Radix-2 FFT Generator & Visualizer on Android in Java, C++ and NEON Intrinsics

Performance Comparison of 3 Implementations

The following table and figure show the average time in seconds observed for MFCC generation per 400-sample frame.

  • Java : All written in Java code

  • C++ : All written in Native C++ with JNI interface.

  • C++ & NEON/SSE : Written in C++ with 4-lane ARMv7 NEON/SSE SIMD intrinsics for Hamming, FFT, and DCT.

Conditions

  • Galaxy S9 -O0 : Galaxy S9 with 8-Core Snapdragon 845 with C++ compiler optimization level 0

  • Galaxy S9 -O3 : Galaxy S9 with 8-Core Snapdragon 845 with C++ compiler optimization level 3

  • Emulator (X86) -O0 : Android Emulator on a Host PC with 4-Core emulation with C++ compiler optimization level 0

  • Emulator (X86) -O3 : Android Emulator on a Host PC with 4-Core emulation with C++ compiler optimization level 3

The numbers are all in seconds.

| Condition | Java | C++ | C++ & NEON/SSE |
|---|---|---|---|
| Galaxy S9 -O0 | 0.0016 | 0.0015 | 0.0011 |
| Galaxy S9 -O3 | 0.0016 | 0.00050 | 0.00034 |
| Emulator (X86) -O0 | 0.00020 | 0.00012 | 0.00012 |
| Emulator (X86) -O3 | 0.00020 | 0.000049 | 0.000047 |

Remarks

The Java implementation works surprisingly well. On the test target (Galaxy S9), it takes about 1.6[ms], or roughly 2[ms], to process one frame. Since a new frame starts every 10[ms], and assuming one of the cores is available all the time, the real-time factor is greater than 5.
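The real-time factor quoted above can be checked with a one-line calculation. The sketch below assumes the factor is defined as the audio time advanced per frame (the 10[ms] frame shift) divided by the processing time per frame; `realTimeFactor` is a hypothetical helper, not part of the repository.

```cpp
// Real-time factor under the assumption stated above: audio time advanced per
// frame divided by processing time per frame. Values above 1 mean the
// processing keeps up with the incoming audio.
constexpr double realTimeFactor(double frameShiftSec, double processingSec) {
    return frameShiftSec / processingSec;
}
```

With the measured Java time on the Galaxy S9, `realTimeFactor(0.010, 0.0016)` evaluates to 6.25, and with the rounded 2[ms] figure it is 5.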

The author is not capable of further tuning with assembler beyond the intrinsics, but further performance improvement by CPU-specific assembler-level optimization may be possible.

Install

  1. Download the contents and open with Android Studio.

  2. Copy NEON_2_SSE.h from https://github.com/intel/ARM_NEON_2_x86_SSE into app/src/main/cpp/.

  3. Build.

It was tested with the following environment.

  • Android Studio 3.5.1

  • Min SDK Version 21

  • Virtual device API29, Android 10.0, x86

If it does not work, check the app's microphone permission on the device. Also, try changing RECORDING_RATE in AudioReceiver.

Description

This project was originally motivated by the wish to measure the real-time performance of audio signal processing on Android devices. It is a study implementation that serves as a benchmark for a native C++ implementation with 4-lane ARM NEON SIMD intrinsics.

  • Audio input 16 kHz monaural linear PCM taken from AudioRecord

  • Frame size 400 samples (25[ms]), Frame shift 160 samples (10[ms])

  • Pre-emphasis (tap 0.96)

  • Hamming window per frame

  • 512-Point Radix-2 Cooley-Tukey recursive FFT

  • Mel Filterbank, 26 banks, top 8 kHz, bottom 300 Hz, with flooring at 1.0

  • DCT into 26-point MFCC [quefrency] with DC.
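The first two processing steps in the list above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the common pre-emphasis form y[n] = x[n] - 0.96 x[n-1], the standard Hamming coefficients 0.54 and 0.46, and zero-padding of the 400-sample frame to the 512-sample FFT length. The name `preEmphasizeAndWindow` and the constants are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFrameSize = 400;   // samples per frame (25 ms at 16 kHz)
constexpr std::size_t kFftSize   = 512;   // FFT length; the tail is zero-padded (assumption)
constexpr float       kPreEmph   = 0.96f; // pre-emphasis tap from the list above

// Applies pre-emphasis and a Hamming window to one 400-sample PCM frame and
// returns a 512-sample buffer whose last 112 samples are zero.
std::array<float, kFftSize> preEmphasizeAndWindow(
        const std::array<std::int16_t, kFrameSize>& pcm) {
    const double kPi = std::acos(-1.0);
    std::array<float, kFftSize> out{};  // value-initialized: all zeros
    for (std::size_t n = 0; n < kFrameSize; ++n) {
        // Assumption: the first sample has no predecessor, so it passes through.
        const float prev = (n == 0) ? 0.0f : static_cast<float>(pcm[n - 1]);
        const float emphasized = static_cast<float>(pcm[n]) - kPreEmph * prev;
        // Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)).
        const float w = static_cast<float>(
            0.54 - 0.46 * std::cos(2.0 * kPi * static_cast<double>(n) /
                                   static_cast<double>(kFrameSize - 1)));
        out[n] = emphasized * w;
    }
    return out;
}
```

For a constant input, the pre-emphasis leaves only 4% of the amplitude after the first sample, which is the expected high-pass behavior.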

Spectrum Visualization for Fun

The upper part is the 26-point MFCC. The lower part is the 256-point spectrum taken from the 512-point FFT. Please click the thumbnails to enlarge.

Code

Signal Processing in Java

  • HammingWindowJava: Pre-emphasis & Hamming for a 400-sample frame
  • FFT512Java: 512-point Radix-2 Cooley-Tukey recursive FFT with pre-calculated Twiddle table
  • MelFilterBanksJava: Generates MelFilterBanks log energy coefficients with Bins and precalculated table.
  • DCTJava: 26-point DCT with a pre-calculated table.
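To illustrate what a recursive radix-2 Cooley-Tukey FFT with a pre-calculated twiddle table looks like, here is a minimal sketch. It is written in C++ rather than Java, it is not taken from the repository, and it favors clarity over speed (it allocates at every recursion level). The names `makeTwiddleTable`, `fftRecursive`, and `fft` are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<float>;

// Precomputes W_N^k = exp(-2*pi*i*k/N) for k = 0 .. N/2 - 1.
std::vector<Complex> makeTwiddleTable(std::size_t n) {
    const double kPi = std::acos(-1.0);
    std::vector<Complex> table(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle =
            -2.0 * kPi * static_cast<double>(k) / static_cast<double>(n);
        table[k] = Complex(static_cast<float>(std::cos(angle)),
                           static_cast<float>(std::sin(angle)));
    }
    return table;
}

// Recursive decimation-in-time FFT. `stride` selects every stride-th entry of
// the full-size twiddle table, so one table serves every recursion level.
void fftRecursive(std::vector<Complex>& x, const std::vector<Complex>& twiddle,
                  std::size_t stride) {
    const std::size_t n = x.size();
    if (n < 2) return;
    std::vector<Complex> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {   // even-odd split
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }
    fftRecursive(even, twiddle, stride * 2);
    fftRecursive(odd,  twiddle, stride * 2);
    for (std::size_t k = 0; k < n / 2; ++k) {   // butterflies
        const Complex t = twiddle[k * stride] * odd[k];
        x[k]         = even[k] + t;
        x[k + n / 2] = even[k] - t;
    }
}

// Convenience wrapper; the input length must be a power of two (e.g. 512).
std::vector<Complex> fft(std::vector<Complex> x) {
    const std::vector<Complex> twiddle = makeTwiddleTable(x.size());
    fftRecursive(x, twiddle, 1);
    return x;
}
```

For a real input of length 512, only the first 256 bins carry independent magnitude information, which matches the 256-point spectrum shown in the visualization.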

Signal Processing in C++

All the parts related to NEON intrinsics are enclosed by #ifdef HAVE_NEON ... #endif.

  • mfcc_impl01.cpp: This file contains the following classes and some JNI glue code.

    • class HamminwWindow : Pre-emphasis & Hamming for a 400-sample frame. It utilizes NEON for the float mult loop.

    • class FFT512 : 512-point Radix-2 Cooley-Tukey recursive FFT with pre-calculated Twiddle table. It utilizes NEON for the even-odd splitting and the butterfly calculations.

    • class MelFilterBanks : Generates MelFilterBanks log energy coefficients with Bins and precalculated table. It does not utilize NEON.

    • class DCT : 26-point DCT with a pre-calculated table. It utilizes NEON in the inner-loop of mult-add.
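The HAVE_NEON guard and the multiply-add inner loop of a table-driven DCT can be illustrated with a small sketch. It is an assumption about the general shape of such code, not a copy of mfcc_impl01.cpp: `dotProduct` and `dct26` are hypothetical names, and the DCT table is taken as a row-major 26x26 float array supplied by the caller. The sketch includes only arm_neon.h under the guard; an x86 build would use NEON_2_SSE.h in its place.

```cpp
#include <cstddef>
#ifdef HAVE_NEON
#include <arm_neon.h>
#endif

// Dot product of two float arrays: the multiply-add inner loop of a
// table-driven DCT (one table row times the log filterbank energies).
float dotProduct(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    std::size_t i = 0;
#ifdef HAVE_NEON
    float32x4_t acc = vdupq_n_f32(0.0f);          // four partial sums
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        acc = vmlaq_f32(acc, va, vb);             // acc += va * vb, per lane
    }
    // Horizontal add of the four lanes.
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    sum = vget_lane_f32(half, 0);
#endif
    for (; i < n; ++i) {                          // scalar path and leftover tail
        sum += a[i] * b[i];
    }
    return sum;
}

// 26-point transform with a precomputed table: out[k] = sum_n table[k][n] * in[n].
void dct26(const float* table, const float* in, float* out) {
    constexpr std::size_t kPoints = 26;
    for (std::size_t k = 0; k < kPoints; ++k) {
        out[k] = dotProduct(table + k * kPoints, in, kPoints);
    }
}
```

With 26 inputs, the 4-lane loop covers 24 elements and the scalar loop handles the remaining 2, so the same function is correct with or without HAVE_NEON defined.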

Visualization

Others

  • AudioReceiver: receives audio with android.media.AudioRecord in chunks in realtime.

  • AudioChunkAggregator: arranges the audio data into 400-sample (25[ms]) frames with 10[ms] frame shift.
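The aggregation step can be sketched as a small buffer that accepts chunks of arbitrary length and emits overlapping frames. This is a minimal C++ illustration of the idea, not the repository's Java class: `FrameAggregator` and `push` are hypothetical names, and the sketch assumes the frame shift is no larger than the frame size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects PCM chunks of arbitrary length and cuts them into overlapping frames.
// Assumption: frameShift <= frameSize (e.g. 160 and 400 samples at 16 kHz).
class FrameAggregator {
public:
    FrameAggregator(std::size_t frameSize, std::size_t frameShift)
        : frameSize_(frameSize), frameShift_(frameShift) {}

    // Appends a chunk and returns every frame that became complete.
    std::vector<std::vector<std::int16_t>> push(
            const std::vector<std::int16_t>& chunk) {
        buffer_.insert(buffer_.end(), chunk.begin(), chunk.end());
        std::vector<std::vector<std::int16_t>> frames;
        std::size_t start = 0;
        while (start + frameSize_ <= buffer_.size()) {
            const auto first = buffer_.begin() + static_cast<std::ptrdiff_t>(start);
            frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(frameSize_));
            start += frameShift_;
        }
        // Drop the samples that no future frame needs; keep the overlap.
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(start));
        return frames;
    }

private:
    std::size_t frameSize_;
    std::size_t frameShift_;
    std::vector<std::int16_t> buffer_;
};
```

A caller would construct `FrameAggregator agg(400, 160);` once and call `agg.push(chunk)` for each chunk delivered by the recorder, passing each returned frame on to the windowing stage.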

Dependencies

  • cpu_features : linked to the binary to obtain processor info for convenience. Apache License.

  • NEON_2_SSE : used to compile NEON intrinsics for the X86 Android Emulator. It converts ARM NEON intrinsics to their equivalents in Intel SSE. Intel's own license, but basically distributable as long as the original copyright notice is retained.

References

  • J. S. Bridle and M. D. Brown (1974), "An Experimental Automatic Word-Recognition System", JSRU Report No. 1003, Joint Speech Research Unit, Ruislip, England.

  • J. G. Proakis and D. G. Manolakis, "Digital Signal Processing", 4th edition, Chapter 8: Efficient Computation of the DFT: Fast Fourier Transform Algorithms.

  • Mel Frequency Cepstral Coefficient (MFCC) tutorial : Nice tutorial.

  • libmfcc : C-implementation.

  • MFCC.cpp : another nice C-implementation

Contact

For technical and commercial inquiries, please contact: Shoichiro Yamanishi
