Skip to content
/ Olaf Public
forked from JorenSix/Olaf

Olaf: Overly Lightweight Acoustic Fingerprinting is a portable acoustic fingerprinting system.

License

Notifications You must be signed in to change notification settings

prj-2501/Olaf

 
 

Repository files navigation

OLAF – Overly Lightweight Acoustic Fingerprinting

Olaf is a C application / library for landmark based acoustic fingerprinting. Olaf is able to extract fingerprints from an audio stream, and either store those fingerprints in a database, or find a match between extracted fingerprints and stored fingerprints. Olaf does this efficiently in order to be used on embedded platforms, traditional computers or in web browsers via WASM.

Please be aware of the patents US7627477 B2 and US6990453 and perhaps others. They describe techniques used in algorithms implemented within Olaf. These patents limit the use of Olaf under various conditions and for several regions. Please make sure to consult your intellectual property rights specialist if you are in doubt about these restrictions. If these restrictions apply, please respect the patent holders rights. The main aim of Olaf is to serve as a learning platform on efficient (embedded) acoustic fingerprinting algorithms.

Why Olaf?

Olaf stands out for three reasons. Olaf runs on embedded devices. Olaf is fast on traditional computers. Olaf runs in the browsers.

There seem to be no lightweight acoustic fingerprinting libraries that are straightforward to run on embedded platforms. On embedded platforms memory and computational resources are severely limited. Olaf is written in portable C with these restrictions in mind. Olaf mainly targets 32-bit ARM devices such as some Teensy’s, some Arduino’s and the ESP32. Other modern embedded platforms with similar specifications and might work as well.

Olaf, being written in portable C, operates also on traditional computers. There, the efficiency of Olaf makes it run fast. On embedded devices reference fingerprints are stored in memory. On traditional computers fingerprints are stored in a high-performance key-value-store: LMDB. LMDB offers an a B+-tree based persistent storage ideal for small keys and values with low storage overhead.

Olaf works in the browser. Via Emscripten Olaf can be compiled to WASM. This makes it relatively straightforward to combine the capabilities of the Web Audio API and Olaf to create browser based audio fingerprinting applications.

Olaf in the browser
Olaf running in a browser

Olaf was featured on hackaday. There is also a small discussion about Olaf on Hacker News.

Olaf requirements

To use Olaf ffmpeg and ruby need to be installed on your system. While the core of Olaf is in pure c, a Ruby script provides an easy to use interface to its capabilities. The Ruby script converts audio (with ffmpeg), parses command line arguments and reports results in a readable format.

To install ffmpeg and ruby on a Debian like system:

apt-get install ffmpeg ruby

On macOS ruby is available by default and ffmpeg can be installed with homebrew

brew install ffmpeg 

Compilation and installation of Olaf

To compile the version with a key value store for traditional computers use the following. By default the makefile uses gcc set to the C11 standard. It should be possible to use other compilers compliant with the C11 standard as well. So either make sure gcc is installed correctly or modify the Makefile for your compiler of choice. Compilation and installation:

make
make install

To compile the in memory version e.g. for use on embedded devices. The embedded version basically swaps out the key-value store with an array and the functions to find matches within an array. The same remark about compilers as above holds.

make mem

To compile Olaf to WASM the emcc compiler from Emcripten is used. Make sure Emscripten is correctly installed and run the following:

make web

python3 -m http.server --directory wasm #start a web browser
open "http://localhost:8000/spectrogram.html" #open the url in standard browser

By default the Olaf web version matches with fingerprints defined in a header file.

Compilation with Zig

Alternatively you can build Olaf with zig. Zig is a programming language which also ships with a c compiler. In this project Zig is employed as an easy to use cross compiler.

A zig.build file is provided. Just call “zig build” to compile for your platform. On an M1 mac it gives the following:

zig build
file zig-out/bin/olaf_c # Mach-O 64-bit executable arm64

The power of Zig is, however, more obvious when an other target is provided. For a list of platforms, please consult the output of zig targets. To build a windows executable run the following commands. Afterwards, pre-built windows executable can be found in pre-built

zig build -Dtarget=x86_64-windows-gnu -Drelease-small
file zig-out/bin/olaf_c.exe # PE32+ executable (console) x86-64, for MS Windows

Zig allso supports WebAssembly as a target platform as an alterative to Emscripten. The zig.build file includes a conditional to build the memory db for WASM. To get a WASM binary call the following:

zig build -Dtarget=wasm32-wasi-musl -Drelease-small
file zig-out/bin/olaf_c.wasm #WebAssembly (wasm) binary module version 0x1 (MVP)

Olaf on Docker

Olaf can be containerized with Docker. The provided Dockerfile is based on Alpine linux and installs only the needed requirements. A way to build the container and run Olaf:

docker build -t olaf:1.0 .
wget "https://filesamples.com/samples/audio/mp3/sample3.mp3"
docker run -v $HOME/.olaf/docker_dbs:/root/.olaf -v $PWD:/root/audio olaf:1.0 olaf store sample3.mp3
docker run -v $HOME/.olaf/docker_dbs:/root/.olaf -v $PWD:/root/audio olaf:1.0 olaf stats

In this case the $HOME/.olaf/docker_dbs is mapped to /root/.olaf in the container to store the database. The audio enters tye system via a mapping of $PWD to /root/audio in the container, which is also the work dir in the container. So relative paths with respect to $PWD of the host can be resolved in the container, absolute paths can not.

Olaf Usage

Olaf provides a command line interface it can be called using olaf subapplication [argument…]. For each task Olaf has to perform, there is a subapplication. There are subapplications to store fingerprints and to query audio fragments. A typical interaction with olaf could look like this:

Olaf interaction

Store fingerprints

The store command extracts fingerprints from an audio file and stores them in a reference database. The incoming audio is decoded and resampled using ffmpeg. ffmpeg needs to be installed on your system and available on the path.

olaf store audio_item...

The audio_item can be:

  1. An audio file: @olaf store audio.mp3@, if the audio file contains multiple channels they are mixed to a mono.
  2. A video file. The first audio stream is extracted from the video container and used as input: @olaf store video.mkv@
  3. A folder name: Olaf attempts to recursively find all audio files within the folder. It does this with a limited allowlist of known audio file name extensions. @olaf store /home/user/Music@
  4. A text file: The text file should contain a list of file names. The following commands recursively finds all mp3 within the current directory and subsequently stores them in the reference database.
find . -name "*.mp3" > list.txt
olaf store list.txt

Internally each audio stream is given an identifier using a one time Jenkins Hash function. This identifier is returned when a match is found. A list connecting these identifiers to file names is also stored automatically.

Query fingerprints

The query command extracts fingerprints and compares them with what is in the database

olaf query query_fragment.opus

See http://www.ffmpeg.org/ffmpeg-devices.html#Input-Devices

ffmpeg -f avfoundation -list_devices true -i ""

ffmpeg -f avfoundation -i “none:default” -ac 1 -ar 16000 -f f32le -acodec pcm_f32le pipe:1 | olaf query 

Monitor

The monitor command splits the query file into steps of x seconds. When working in steps of 5 seconds, then the first five seconds are matched with the reference database and matches are reported. Subsequently it goes on with the next 5 seconds and so forth. This is practical if an unsegmented audio file needs to be matched with the reference database.

	olaf monitor audio_item.mp3...

Deduplicate a collection

This command finds duplicate audio content in a folder. First each audio file is stored in the reference database. Next each file is used as a query and matched with the reference database. The file should, evidently, match itself but self-matches are ignored, leaving only duplicates.

A duplicate means that audio from the original is found in another file. The start and stop times of the found fragment are reported. If the match reports a start of nearly zero and a duration similar to the duration of the original audio file then a ‘full duplicate’ is found: it is almost certainly the same exact track. If only a couple of seconds are reported it means that only a couple of seconds of the original audio are found in the duplicate.

	olaf dedup field_recordings/archive

There is also the dedupm command which first operates similarly to the dedup command but uses the monitor command in stead of the query command. This is useful if you expect partial matches.

Database stats

To get statistics on the database use stats. It prints information on the b-tree structure backing the storage.

olaf stats

Delete fingerprints

Deletion of fingerprints is similar to adding prints:

olaf delete item.mp3

Note that it is currently unclear what the performance implications are when adding and removing many items to the db. In other words: how balanced the B+ tree remains with many leaves removed.

Limitations

  • Currently a complete audio file is read in memory in order to process it. While the algorithm allows streaming, the current implementation does not allow to store incoming streams. Now you can choose whether to compile in the single or stream reader: choose either olaf_reader_single.c or olaf_reader_stream.c, which is a bit slower.
  • Only one write process: LMDB limits the number of processes that can write to the key-value store to a single process. Attempting to write to the key-value store while an other write-process is busy should put the process automatically in a wait state untile the write lock is released (by the other process). Reading can be done frome multiple processes at the same time.
  • Audio decoding is done externally. The core of Olaf does fingerprinting. ffmpeg or similar audio decoding software is required to decode and resample various audio formats.
  • Removing items from the reference database is currently not supported. The underlying database does support deletion of items so it should be relatively easy to add this functionality.
  • Performance implications when removing many items from the database are currently unclear. In other words: how balanced does the B+tree remain when removing many leaves.
  • Olaf is single threaded. The main reasons are simplicity and limitations of embedded platforms. The single threaded design keeps the code simple. On embedded platforms with single core CPU’s multithreading makes no sense.
    On traditional computers there might be a performance gain by implementing multi-threading. However, the time spent on decoding audio and storing fingerprints is much larger than analysis/extraction so the gain might be limited. As an work-around multiple processes can be used simultaniously to query the database.

Further Reading

Some relevant reading material about (landmark based) acoustic fingerprinting. The order gives an idea of relevance to the Olaf project.

  1. Wang, Avery L. An Industrial-Strength Audio Search Algorithm (2003)
  2. Six, Joren and Leman, Marc Panako – A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification (2014)
  3. Cano, Pedro and Batlle, Eloi and Kalker, Ton and Haitsma, Jaap A Review of Audio Fingerprinting (2005)
  4. Arzt, Andreas and Bock, Sebastian and Widmer, Gerhard Fast Identification of Piece and Score Position via Symbolic Fingerprinting (2012)
  5. Fenet, Sebastien and Richard, Gael and Grenier, Yves A Scalable Audio Fingerprint Method with Robustness to Pitch-Shifting (2011)
  6. Ellis, Dan and Whitman, Brian and Porter, Alastair Echoprint – An Open Music Identification Service (2011)
  7. Sonnleitner, Reinhard and Widmer, Gerhard Quad-based Audio Fingerprinting Robust To Time And Frequency Scaling (2014)
  8. Sonnleitner, Reinhard and Widmer, Gerhard __
    Robust Quad-Based Audio Fingerprinting__
    (2015)

The programming style of Olaf attempts to use an OOP inspired way to organize code and divide responsibilities and interfaces.
For more information see on this style consult this document about OOP in C. Also
of interest is the Modern C book by Jens Gustedt.

Credits

Olaf by Joren Six at IPEM, Ghent University.

About

Olaf: Overly Lightweight Acoustic Fingerprinting is a portable acoustic fingerprinting system.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • JavaScript 65.8%
  • C 32.6%
  • Ruby 1.2%
  • Makefile 0.2%
  • Zig 0.1%
  • Dockerfile 0.1%