Music and Audio Research Group (MARG), Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea¹ Interdisciplinary Program in Artificial Intelligence, Seoul National University² Artificial Intelligence Institute, Seoul National University³
Fig 1. Training Single-Instrument Encoder
Fig 2. Training Multi-Instrument Encoder
Fig 3. Retrieving similar instruments from the instrument library using the proposed method.
The proposed method consists of the Single-Instrument Encoder and the Multi-Instrument Encoder. The Single-Instrument Encoder extracts an instrument embedding from single-track audio of that instrument. Using the instrument embeddings computed by the Single-Instrument Encoder as a set of target embeddings, the Multi-Instrument Encoder is trained to estimate the embeddings of all instruments in a mixture. This page presents samples from our proposed dataset and results from our Instrument Encoder models.
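The relationship between the two encoders described above can be sketched as a small data-preparation step. This is only an illustrative sketch: `make_training_example` and `toy_encoder` are hypothetical names, the toy encoder is a placeholder rather than the trained Single-Instrument Encoder, and forming the mixture as a sample-wise sum of the tracks is an assumption.

```python
def make_training_example(tracks, single_encoder):
    """Pair a mixture with the per-track embeddings that the
    Multi-Instrument Encoder would be trained to estimate."""
    # Assumption: the mixture is the sample-wise sum of equal-length tracks.
    mixture = [sum(samples) for samples in zip(*tracks)]
    # One target embedding per single-instrument track.
    targets = [single_encoder(track) for track in tracks]
    return mixture, targets


def toy_encoder(track):
    """Placeholder for the Single-Instrument Encoder (NOT the real model).
    Maps a track to a 2-dimensional vector: [mean, peak amplitude]."""
    mean = sum(track) / len(track)
    peak = max(abs(sample) for sample in track)
    return [mean, peak]


# Two made-up single-instrument tracks of four samples each.
tracks = [
    [0.0, 0.5, -0.5, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
mixture, targets = make_training_example(tracks, toy_encoder)
```

Here `mixture` would be the input to the Multi-Instrument Encoder and `targets` the set of embeddings it is asked to estimate, one per instrument in the mixture.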
The goal of our method is to retrieve the instruments used in a piece of reference music from a library of musical instrument samples. At inference, as depicted in Fig 3, we use the samples from Nlakh-single as the library of musical instruments.
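The retrieval step can be illustrated with a minimal nearest-neighbour sketch. It assumes that each library instrument already has an embedding and that similarity is measured with cosine similarity; the choice of measure, the function names, and the 3-dimensional toy vectors are assumptions made for illustration, not values produced by the models.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(estimated_embeddings, library):
    """For each estimated embedding, return the name of the most
    similar instrument in the library."""
    results = []
    for query in estimated_embeddings:
        best_name = max(
            library,
            key=lambda name: cosine_similarity(query, library[name]),
        )
        results.append(best_name)
    return results


# Made-up library embeddings (illustrative only).
library = {
    "guitar_acoustic_15": [1.0, 0.1, 0.0],
    "organ_electronic_104": [0.0, 1.0, 0.2],
    "reed_acoustic_11": [0.1, 0.0, 1.0],
}
# Made-up embeddings standing in for Multi-Instrument Encoder outputs.
estimated = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3]]
retrieved = retrieve(estimated, library)
```

In this toy setup each estimated embedding is matched independently, so the same library instrument could in principle be returned more than once.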
The following are some retrieval results from the Multi-Instrument Encoder.
The examples show three cases: (a) a perfect match, (b) a prediction that covers only part of the ground truth, and (c) a prediction that contains instruments different from the ground truth.
As the examples show, the model predicts well which instruments are present in the multi-track music. Furthermore, even when a prediction is wrong, the retrieved instrument has characteristics similar to the correct one, because the two lie close together in the embedding space.
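The point about wrong predictions lying close to the correct instrument can be made concrete with a small distance comparison. The sketch below uses Euclidean distance and made-up 3-dimensional vectors; both are assumptions for illustration, and the numbers are not embeddings produced by the models.

```python
import math

# Made-up embeddings (illustrative only): two similar bass instruments
# and one unrelated reed instrument.
embeddings = {
    "bass_electronic_27": [0.90, 0.10, 0.05],
    "bass_electronic_25": [0.85, 0.15, 0.05],
    "reed_acoustic_37": [0.05, 0.20, 0.95],
}


def embedding_distance(name_a, name_b):
    """Euclidean distance between two named embeddings."""
    return math.dist(embeddings[name_a], embeddings[name_b])


# Distance from the ground-truth instrument to a similar-sounding
# instrument and to an unrelated one.
near = embedding_distance("bass_electronic_27", "bass_electronic_25")
far = embedding_distance("bass_electronic_27", "reed_acoustic_37")
```

With vectors like these, `near` is much smaller than `far`, which is the situation in which a wrong retrieval can still sound similar to the ground truth.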
Case (a). Perfect match
Ground truth:
Instrument : guitar_acoustic_15
Instrument : organ_electronic_104
Instrument : reed_acoustic_11
Instrument : string_acoustic_5
Prediction:
Instrument : guitar_acoustic_15
Instrument : organ_electronic_104
Instrument : reed_acoustic_11
Instrument : string_acoustic_5
Case (b). Predicts only part of the ground truth
Ground truth:
Instrument : guitar_acoustic_10
Instrument : reed_acoustic_37
Instrument : keyboard_acoustic_4
Prediction:
Instrument : guitar_acoustic_10
Instrument : reed_acoustic_37
Case (c). Predicts instruments different from the ground truth
Instrument : bass_electronic_27
Instrument : keyboard_synthetic_0
Instrument : guitar_acoustic_10
Instrument : bass_electronic_25
Instrument : keyboard_synthetic_0