Music and Audio Research Group (MARG), Department of Intelligence and Information, Seoul National University, Seoul, Republic of Korea¹
Interdisciplinary Program in Artificial Intelligence, Seoul National University²
Artificial Intelligence Institute, Seoul National University³

Fig 1. Training Single-Instrument Encoder

Fig 2. Training Multi-Instrument Encoder

Fig 3. Retrieving similar instruments from the instrument library using the proposed method.
The proposed method consists of the Single-Instrument Encoder and the Multi-Instrument Encoder. The Single-Instrument Encoder extracts an instrument embedding from single-track audio of an instrument. Using the instrument embeddings computed by the Single-Instrument Encoder as a set of target embeddings, the Multi-Instrument Encoder is trained to estimate multiple instrument embeddings. This page presents samples from our proposed dataset and results from our Instrument Encoder models.
The goal of our method is to retrieve the instruments used in a piece of reference music from a library of musical instrument samples. At the inference stage, depicted in Fig 3, we use the samples from Nlakh-single as the library of musical instruments.
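The retrieval step can be illustrated with a minimal sketch. It assumes that retrieval is a nearest-neighbor search using cosine similarity between each estimated embedding and the library embeddings; the metric actually used by the method is not stated here, so this is an assumption. The function name `retrieve_instruments` and the three-dimensional toy embeddings are hypothetical and chosen only for illustration; the instrument names are taken from the examples on this page.

```python
import numpy as np

def retrieve_instruments(query_embeddings, library_embeddings, library_names):
    """Return, for each query embedding, the name of the most similar library item.

    Assumption: similarity is cosine similarity. The toy data below is made up.
    """
    # L2-normalize rows so that the dot product equals cosine similarity.
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    lib = library_embeddings / np.linalg.norm(library_embeddings, axis=1, keepdims=True)
    similarity = q @ lib.T  # shape: (num_queries, num_library_items)
    best = similarity.argmax(axis=1)  # index of the closest library item per query
    return [library_names[i] for i in best]

# Hypothetical library: one embedding per single-instrument sample.
library_names = ["guitar_acoustic_15", "organ_electronic_104", "reed_acoustic_11"]
library_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Hypothetical embeddings, standing in for the output of the Multi-Instrument Encoder.
estimated = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.8],
])

retrieved = retrieve_instruments(estimated, library_embeddings, library_names)
print(retrieved)
```

With real models, `library_embeddings` would come from the Single-Instrument Encoder applied to each library sample, and `estimated` from the Multi-Instrument Encoder applied to the reference music.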
The following are some retrieval results from the Multi-Instrument Encoder.
The examples show three cases: (a) a perfect match, (b) a prediction that covers only part of the ground truth, and (c) a prediction of instruments different from the ground truth.
As the examples show, the model can correctly predict which instruments are present in the multi-track music. Furthermore, even when the prediction is wrong, the retrieved instrument has characteristics similar to those of the correct one, because the two are close to each other in the embedding space.
Case (a). Perfect match

Instrument : guitar_acoustic_15

Instrument : organ_electronic_104

Instrument : reed_acoustic_11

Instrument : string_acoustic_5

Instrument : guitar_acoustic_15

Instrument : organ_electronic_104

Instrument : reed_acoustic_11

Instrument : string_acoustic_5

Case (b). Predicts only some part of the ground truth

Instrument : guitar_acoustic_10

Instrument : reed_acoustic_37

Instrument : keyboard_acoustic_4

Instrument : guitar_acoustic_10

Instrument : reed_acoustic_37

Case (c). Predicts instruments different from the ground truth

Instrument : bass_electronic_27

Instrument : keyboard_synthetic_0

Instrument : guitar_acoustic_10
