MIR-1K was the first public dataset created specifically for singing voice separation, but rapid advances in computing technology call for higher-quality datasets. We therefore constructed the publicly available iKala dataset, which contains:
- Wavfile: 252 audio clips of 30 s each (44.1 kHz, 16-bit WAV format, with music recorded on the left channel and voice on the right channel)
- PitchLabel: pitch contour annotations (32 ms frames, LAB format with continuous MIDI pitches or 0 for non-vocal breaks; convertible to JAMS with ikala_melody_parser.py)
- Lyrics: lyrics with timestamps (LAB format with start time in ms, end time in ms, word, and, for Chinese lyrics, pronunciation; convertible to JAMS with ikala_lyrics_parser.py)
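The three file types listed above can be read with a few lines of Python. The following is a minimal sketch, not the dataset's official tooling. The function names are hypothetical. The exact LAB layout is an assumption: one pitch value per line for PitchLabel files, and whitespace-separated fields for Lyrics files. Frame i is also assumed to start at i × 32 ms.

```python
import array
import sys
import wave

FRAME_SECONDS = 0.032  # 32 ms pitch frames, as stated in the list above


def split_channels(wav_file):
    """Return (music, voice, sample_rate) from a 16-bit stereo WAV file."""
    with wave.open(wav_file, "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Samples are interleaved L, R, L, R, ...: left is music, right is voice.
    return samples[0::2], samples[1::2], rate


def midi_to_hz(midi):
    """Convert a continuous MIDI pitch to Hz; 0 marks a non-vocal frame."""
    return 0.0 if midi <= 0 else 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def parse_pitch_label(text):
    """Parse whitespace-separated MIDI values (assumed one per line).

    Returns a list of (frame_start_seconds, frequency_hz) tuples.
    """
    values = [float(token) for token in text.split()]
    return [(i * FRAME_SECONDS, midi_to_hz(m)) for i, m in enumerate(values)]


def parse_lyrics_label(text):
    """Parse lines assumed to look like: start_ms end_ms word [pronunciation].

    Returns (start_ms, end_ms, word, pronunciation_or_None) tuples.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pronunciation = " ".join(fields[3:]) or None
        entries.append((int(fields[0]), int(fields[1]), fields[2], pronunciation))
    return entries
```

Summing the two returned channels sample by sample gives a mono mixture, with the right channel serving as the ground-truth voice for evaluating a separation model.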
Each clip is named in the form SongId_ClipId, where ClipId is either verse or chorus. The clips are sampled from 206 iKala songs featuring professional singers and musicians. Please see the following table for a detailed comparison between the two datasets:
Number of clips
Voice recorded separately
Pitch contour annotations
Voice type annotations
Lyrics with speech
Lyrics with timestamps
Separate chorus and verse
For Chinese lyrics (except those sung in Taiwanese, see table below), pronunciations in IPA are provided to assist lyrics-informed separation. Thanks to Lien-Chiao Lin and Hung-Shin Lee for their contributions.
iKala hired six studio singers to perform the songs. The singer-to-song-ID mapping can be found in id_mapping.txt. This information can be used to develop singer-independent or (supervised) singer-dependent models for source separation.
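A singer-independent evaluation needs training and test sets that share no singer. The sketch below shows one way to build such a split from clip names of the form SongId_ClipId. The function names and the example IDs are hypothetical. The layout of id_mapping.txt is not described here, so the mapping is assumed to have been loaded already into a dictionary from song ID to singer ID.

```python
def parse_clip_name(name):
    """Split 'SongId_ClipId' (with or without a file extension) into its parts."""
    stem = name.rsplit(".", 1)[0]  # drop an extension such as '.wav' if present
    song_id, clip_id = stem.split("_", 1)
    if clip_id not in ("verse", "chorus"):
        raise ValueError(f"unexpected ClipId in {name!r}")
    return song_id, clip_id


def singer_independent_split(clips, song_to_singer, test_singers):
    """Partition clips so that no singer appears in both train and test.

    song_to_singer: dict from song ID to singer ID (assumed to be loaded
    from id_mapping.txt by the caller).
    test_singers: set of singer IDs held out for testing.
    """
    train, test = [], []
    for clip in clips:
        song_id, _ = parse_clip_name(clip)
        singer = song_to_singer[song_id]
        (test if singer in test_singers else train).append(clip)
    return train, test


# Hypothetical IDs, for illustration only.
mapping = {"song01": "singerA", "song02": "singerB"}
clips = ["song01_verse.wav", "song01_chorus.wav", "song02_verse.wav"]
train_clips, test_clips = singer_independent_split(clips, mapping, {"singerB"})
```

For a singer-dependent model, the same mapping can instead be used to group clips by singer and split within each group.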