Olaf is a C application / library for landmark based acoustic fingerprinting. Olaf is able to extract fingerprints from an audio stream, and either store those fingerprints in a database, or find a match between extracted fingerprints and stored fingerprints. Olaf does this efficiently in order to be used on embedded platforms, traditional computers or in web browsers via WASM.
Please be aware of the patents US7627477 B2 and US6990453 and perhaps others. They describe techniques used in algorithms implemented within Olaf. These patents limit the use of Olaf under various conditions and for several regions. Please make sure to consult your intellectual property rights specialist if you are in doubt about these restrictions. If these restrictions apply, please respect the patent holders rights. The main aim of Olaf is to serve as a learning platform on efficient (embedded) acoustic fingerprinting algorithms.
Olaf stands out for three reasons. Olaf runs on embedded devices. Olaf is fast on traditional computers. Olaf runs in the browsers.
There seem to be no lightweight acoustic fingerprinting libraries that are straightforward to run on embedded platforms. On embedded platforms memory and computational resources are severely limited. Olaf is written in portable C with these restrictions in mind. Olaf mainly targets 32-bit ARM devices such as some Teensy’s, some Arduino’s and the ESP32. Other modern embedded platforms with similar specifications and might work as well.
Olaf, being written in portable C, operates also on traditional computers. There, the efficiency of Olaf makes it run fast. On embedded devices reference fingerprints are stored in memory. On traditional computers fingerprints are stored in a high-performance key-value-store: LMDB. LMDB offers an a B+-tree based persistent storage ideal for small keys and values with low storage overhead.
Olaf works in the browser. Via Emscripten Olaf can be compiled to WASM. This makes it relatively straightforward to combine the capabilities of the Web Audio API and Olaf to create browser based audio fingerprinting applications.
Olaf was featured on hackaday. There is also a small discussion about Olaf on Hacker News.
To use Olaf ffmpeg
and ruby
need to be installed on your system. While the core of Olaf is in pure c, a Ruby script provides an easy to use interface to its capabilities. The Ruby script converts audio (with ffmpeg), parses command line arguments and reports results in a readable format.
To install ffmpeg and ruby on a Debian like system:
apt-get install ffmpeg ruby
On macOS ruby is available by default and ffmpeg
can be installed with homebrew
brew install ffmpeg
To compile the version with a key value store for traditional computers use the following. By default the makefile uses gcc set to the C11 standard. It should be possible to use other compilers compliant with the C11 standard as well. So either make sure gcc is installed correctly or modify the Makefile for your compiler of choice. Compilation and installation:
make make install
To compile the in memory version e.g. for use on embedded devices. The embedded version basically swaps out the key-value store with an array and the functions to find matches within an array. The same remark about compilers as above holds.
make mem
To compile Olaf to WASM the emcc compiler from Emcripten is used. Make sure Emscripten is correctly installed and run the following:
make web python3 -m http.server --directory wasm #start a web browser open "http://localhost:8000/spectrogram.html" #open the url in standard browser
By default the Olaf web version matches with fingerprints defined in a header file.
Alternatively you can build Olaf with zig. Zig is a programming language which also ships with a c compiler. In this project Zig is employed as an easy to use cross compiler.
A zig.build
file is provided. Just call “zig build” to compile for your platform. On an M1 mac it gives the following:
zig build file zig-out/bin/olaf_c # Mach-O 64-bit executable arm64
The power of Zig is, however, more obvious when an other target is provided. For a list of platforms, please consult the output of zig targets
. To build a windows executable run the following commands. Afterwards, pre-built windows executable can be found in pre-built
zig build -Dtarget=x86_64-windows-gnu -Drelease-small file zig-out/bin/olaf_c.exe # PE32+ executable (console) x86-64, for MS Windows
Zig allso supports WebAssembly as a target platform as an alterative to Emscripten. The zig.build
file includes a conditional to build the memory db for WASM. To get a WASM binary call the following:
zig build -Dtarget=wasm32-wasi-musl -Drelease-small file zig-out/bin/olaf_c.wasm #WebAssembly (wasm) binary module version 0x1 (MVP)
Olaf can be containerized with Docker. The provided Dockerfile is based on Alpine linux and installs only the needed requirements. A way to build the container and run Olaf:
docker build -t olaf:1.0 . wget "https://filesamples.com/samples/audio/mp3/sample3.mp3" docker run -v $HOME/.olaf/docker_dbs:/root/.olaf -v $PWD:/root/audio olaf:1.0 olaf store sample3.mp3 docker run -v $HOME/.olaf/docker_dbs:/root/.olaf -v $PWD:/root/audio olaf:1.0 olaf stats
In this case the $HOME/.olaf/docker_dbs
is mapped to /root/.olaf
in the container to store the database. The audio enters tye system via a mapping of $PWD
to /root/audio
in the container, which is also the work dir in the container. So relative paths with respect to $PWD
of the host can be resolved in the container, absolute paths can not.
Olaf provides a command line interface it can be called using olaf subapplication [argument…]. For each task Olaf has to perform, there is a subapplication. There are subapplications to store fingerprints and to query audio fragments. A typical interaction with olaf could look like this:
The store command extracts fingerprints from an audio file and stores them in a reference database. The incoming audio is decoded and resampled using ffmpeg. ffmpeg needs to be installed on your system and available on the path.
olaf store audio_item...
The audio_item can be:
- An audio file:
@olaf store audio.mp3
@, if the audio file contains multiple channels they are mixed to a mono. - A video file. The first audio stream is extracted from the video container and used as input:
@olaf store video.mkv
@ - A folder name: Olaf attempts to recursively find all audio files within the folder. It does this with a limited allowlist of known audio file name extensions.
@olaf store /home/user/Music
@ - A text file: The text file should contain a list of file names. The following commands recursively finds all mp3 within the current directory and subsequently stores them in the reference database.
find . -name "*.mp3" > list.txt olaf store list.txt
Internally each audio stream is given an identifier using a one time Jenkins Hash function. This identifier is returned when a match is found. A list connecting these identifiers to file names is also stored automatically.
The query command extracts fingerprints and compares them with what is in the database
olaf query query_fragment.opus
See http://www.ffmpeg.org/ffmpeg-devices.html#Input-Devices
ffmpeg -f avfoundation -list_devices true -i ""
ffmpeg -f avfoundation -i “none:default” -ac 1 -ar 16000 -f f32le -acodec pcm_f32le pipe:1 | olaf query
The monitor command splits the query file into steps of x seconds. When working in steps of 5 seconds, then the first five seconds are matched with the reference database and matches are reported. Subsequently it goes on with the next 5 seconds and so forth. This is practical if an unsegmented audio file needs to be matched with the reference database.
olaf monitor audio_item.mp3...
This command finds duplicate audio content in a folder. First each audio file is stored in the reference database. Next each file is used as a query and matched with the reference database. The file should, evidently, match itself but self-matches are ignored, leaving only duplicates.
A duplicate means that audio from the original is found in another file. The start and stop times of the found fragment are reported. If the match reports a start of nearly zero and a duration similar to the duration of the original audio file then a ‘full duplicate’ is found: it is almost certainly the same exact track. If only a couple of seconds are reported it means that only a couple of seconds of the original audio are found in the duplicate.
olaf dedup field_recordings/archive
There is also the dedupm
command which first operates similarly to the dedup
command but uses the monitor command in stead of the query command. This is useful if you expect partial matches.
To get statistics on the database use stats
. It prints information on the b-tree structure backing the storage.
olaf stats
Deletion of fingerprints is similar to adding prints:
olaf delete item.mp3
Note that it is currently unclear what the performance implications are when adding and removing many items to the db. In other words: how balanced the B+ tree remains with many leaves removed.
Currently a complete audio file is read in memory in order to process it. While the algorithm allows streaming, the current implementation does not allow to store incoming streams.Now you can choose whether to compile in the single or stream reader: choose eitherolaf_reader_single.c
orolaf_reader_stream.c
, which is a bit slower.- Only one write process: LMDB limits the number of processes that can write to the key-value store to a single process. Attempting to write to the key-value store while an other write-process is busy should put the process automatically in a wait state untile the write lock is released (by the other process). Reading can be done frome multiple processes at the same time.
- Audio decoding is done externally. The core of Olaf does fingerprinting. ffmpeg or similar audio decoding software is required to decode and resample various audio formats.
Removing items from the reference database is currently not supported. The underlying database does support deletion of items so it should be relatively easy to add this functionality.- Performance implications when removing many items from the database are currently unclear. In other words: how balanced does the B+tree remain when removing many leaves.
- Olaf is single threaded. The main reasons are simplicity and limitations of embedded platforms. The single threaded design keeps the code simple. On embedded platforms with single core CPU’s multithreading makes no sense.
On traditional computers there might be a performance gain by implementing multi-threading. However, the time spent on decoding audio and storing fingerprints is much larger than analysis/extraction so the gain might be limited. As an work-around multiple processes can be used simultaniously to query the database.
Some relevant reading material about (landmark based) acoustic fingerprinting. The order gives an idea of relevance to the Olaf project.
- Wang, Avery L. An Industrial-Strength Audio Search Algorithm (2003)
- Six, Joren and Leman, Marc Panako – A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification (2014)
- Cano, Pedro and Batlle, Eloi and Kalker, Ton and Haitsma, Jaap A Review of Audio Fingerprinting (2005)
- Arzt, Andreas and Bock, Sebastian and Widmer, Gerhard Fast Identification of Piece and Score Position via Symbolic Fingerprinting (2012)
- Fenet, Sebastien and Richard, Gael and Grenier, Yves A Scalable Audio Fingerprint Method with Robustness to Pitch-Shifting (2011)
- Ellis, Dan and Whitman, Brian and Porter, Alastair Echoprint – An Open Music Identification Service (2011)
- Sonnleitner, Reinhard and Widmer, Gerhard Quad-based Audio Fingerprinting Robust To Time And Frequency Scaling (2014)
- Sonnleitner, Reinhard and Widmer, Gerhard __
Robust Quad-Based Audio Fingerprinting__ (2015)
The programming style of Olaf attempts to use an OOP inspired way to organize code and divide responsibilities and interfaces.
For more information see on this style consult this document about OOP in C. Also
of interest is the Modern C book by Jens Gustedt.
- PFFFT a pretty fast FFT library. BSD license
- LMDB Lightning Memory-Mapped Database, a fast key value store with a permissive software license (the OpenLDAP Public License) by Howard Chu.
- Hash table by Simon Howard, ISC lisence
Olaf by Joren Six at IPEM, Ghent University.