The iKala dataset comprises 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for MIREX). The music accompaniment and the singing voice are recorded in the left and right channels respectively and can be found under the Wavfile directory. In addition, the human-labeled pitch contours and timestamped lyrics can be found under the PitchLabel and Lyrics directories respectively.


NEWS (2017-12-04): The iKala dataset is no longer available, as iKala has grown into an Internet video service and is no longer in the online karaoke business. Researchers are encouraged to work on the SiSEC datasets instead.

To request the iKala dataset:

Once your request is approved, it may take up to five working days for the download link to arrive by email.


This dataset was created by a team of researchers at iKala, the MIR Lab, and the MACLab.


The dataset was presented in the following paper:

  • T.-S. Chan, T.-C. Yeh, Z.-C. Fan, H.-W. Chen, L. Su, Y.-H. Yang, and R. Jang,
    Vocal activity informed singing voice separation with the iKala dataset,
    in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2015, pp. 718-722.
    doi: 10.1109/ICASSP.2015.7178063 (preprint)

It has also been used in the following papers:

  • T.-S. T. Chan and Y.-H. Yang,
    Complex and quaternionic principal component pursuit and its application to audio separation,
    IEEE Signal Process. Lett., vol. 23, no. 2, pp. 287-291, 2016.
    doi: 10.1109/LSP.2016.2514845 (preprint) (code)
  • T.-S. T. Chan and Y.-H. Yang,
    Informed group-sparse representation for singing voice separation,
    IEEE Signal Process. Lett., vol. 24, no. 2, pp. 156-160, 2017.
    doi: 10.1109/LSP.2017.2647810 (preprint) (code)
  • Z.-C. Fan, T.-S. T. Chan, Y.-H. Yang, and J.-S. R. Jang,
    Music signal processing using vector product neural networks,
    in Proc. IJCNN Workshop Deep Learning for Music, 2017, pp. 26-30.
    doi: 10.13140/RG.2.2.22227.99364/1