Informed Singing Voice Separation Results (ICASSP 2015)

Prologue

Due to copyright restrictions, we are not authorized to upload the audio for iKala. However, the audio for MIR-1K is available (click ♬ to listen).

Notation

Signals

  • Vox: the singing voice signal
  • Mix: the mixture signal (mixed at 0 dB SNR)
  • PV: the pitch vector in semitones (human labeled)
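As a rough illustration of what "mixed at 0 dB SNR" means, the sketch below scales the accompaniment so that the voice and the accompaniment have equal average power before summing them. The function name `mix_at_snr` and the use of plain Python lists are assumptions made for this example; the page does not say how the datasets' mixtures were actually normalized.

```python
import math

def mix_at_snr(vox, music, snr_db=0.0):
    """Sum voice and accompaniment at a target voice-to-music power ratio.

    Hypothetical helper: SNR is taken here to mean average voice power
    over average accompaniment power, expressed in dB.
    """
    p_vox = sum(v * v for v in vox) / len(vox)
    p_music = sum(m * m for m in music) / len(music)
    # Gain applied to the accompaniment so that p_vox / (gain^2 * p_music)
    # equals 10^(snr_db / 10).
    gain = math.sqrt(p_vox / (p_music * 10 ** (snr_db / 10.0)))
    return [v + gain * m for v, m in zip(vox, music)]

# Toy signals: the accompaniment is twice as loud as the voice, so it is
# halved before mixing at 0 dB.
mix = mix_at_snr([1.0, -1.0, 1.0, -1.0], [2.0, 2.0, -2.0, -2.0])
print(mix)
```

At 0 dB the two sources contribute equal power, so neither dominates the mixture on average.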

Algorithms

  • RPCA: the original RPCA [1] with no masks (least informed)
  • RPCAs: RPCAs [2] with vocal/non-vocal masks (informed)
  • RPCAs-IBM: RPCAs with ideal binary masks (most informed)
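To make the "most informed" setting more concrete, here is a minimal sketch of an ideal binary mask under its common definition: a time-frequency bin is assigned to the voice when the voice magnitude exceeds the accompaniment magnitude. The helper names and the nested-list spectrograms are assumptions for illustration only. How [2] feeds such masks into RPCAs, and the exact threshold it uses, are not described on this page.

```python
def ideal_binary_mask(vox_mag, music_mag):
    """Return 1 where the voice magnitude exceeds the accompaniment, else 0.

    Both inputs are magnitude spectrograms stored as lists of rows
    (frequency bins) of equal shape. Hypothetical helper.
    """
    return [
        [1 if v > m else 0 for v, m in zip(v_row, m_row)]
        for v_row, m_row in zip(vox_mag, music_mag)
    ]

def apply_mask(mask, mix_mag):
    """Keep only the mixture bins that the mask assigns to the voice."""
    return [
        [b * x for b, x in zip(b_row, x_row)]
        for b_row, x_row in zip(mask, mix_mag)
    ]

# Toy 2x2 spectrograms (frequency x time).
vox_mag = [[3.0, 1.0], [0.0, 2.0]]
music_mag = [[1.0, 2.0], [1.0, 1.0]]
mix_mag = [[4.0, 3.0], [1.0, 3.0]]

mask = ideal_binary_mask(vox_mag, music_mag)
print(mask)
print(apply_mask(mask, mix_mag))
```

Because the mask is computed from the clean sources, it is only available in an oracle setting and serves as an upper reference for what mask-based information can achieve.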

1  RPCAs on the iKala Dataset

Here are the RPCAs results on three excerpts from the iKala dataset.
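The excerpts are scored in NSDR (normalized signal-to-distortion ratio), which is usually defined as the SDR of the separated voice minus the SDR of the unprocessed mixture, both measured against the clean voice. The sketch below uses a simplified projection-based SDR rather than the full BSS Eval decomposition, so it will not reproduce the reported numbers exactly. The function names are assumptions for this example.

```python
import math

def sdr_db(estimate, reference):
    """Simplified SDR in dB (not the full BSS Eval measure).

    The estimate is projected onto the reference; whatever is left over
    counts as distortion. Undefined when the estimate is a perfect
    scaled copy of the reference (zero distortion energy).
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    target_energy = sum(t * t for t in target)
    error_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    return 10.0 * math.log10(target_energy / error_energy)

def nsdr_db(estimate, reference, mixture):
    """Improvement in SDR of the estimate over the raw mixture."""
    return sdr_db(estimate, reference) - sdr_db(mixture, reference)

# Toy example: the mixture carries as much interference as voice,
# while the estimate keeps only a tenth of the interference amplitude.
vox = [1.0, -1.0, 1.0, -1.0]
mix = [2.0, 0.0, 2.0, 0.0]
est = [1.1, -0.9, 1.1, -0.9]
print(round(nsdr_db(est, vox, mix), 2))
```

A positive NSDR means the separation moved the signal closer to the clean voice than the mixture was to begin with.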

1.1  Results for 45305_chorus.wav

On this mixture, the results improve as more information is supplied:

Vox demo1-vox
Vox (Spectrogram) demo1-vox-spectrogram
Mix demo1-mix
Mix (Spectrogram) demo1-mix-spectrogram
PV demo1-pv
RPCA (NSDR = 2.80 dB) demo1-rpca
RPCAs (NSDR = 4.81 dB) demo1-rpcas
RPCAs-IBM (NSDR = 10.45 dB) demo1-rpcas-ibm

1.2  Results for 31113_verse.wav

Again, the results are better with more information, but the improvements are smaller because this mixture is harder to separate:

Vox demo2-vox
Vox (Spectrogram) demo2-vox-spectrogram
Mix demo2-mix
Mix (Spectrogram) demo2-mix-spectrogram
PV demo2-pv
RPCA (NSDR = 2.34 dB) demo2-rpca
RPCAs (NSDR = 3.41 dB) demo2-rpcas
RPCAs-IBM (NSDR = 5.70 dB) demo2-rpcas-ibm

1.3  Results for 54191_verse.wav

Once again, the results are better with more information, but the improvements are smaller because this mixture is also hard to separate:

Vox demo3-vox
Vox (Spectrogram) demo3-vox-spectrogram
Mix demo3-mix
Mix (Spectrogram) demo3-mix-spectrogram
PV demo3-pv
RPCA (NSDR = 1.84 dB) demo3-rpca
RPCAs (NSDR = 2.86 dB) demo3-rpcas
RPCAs-IBM (NSDR = 5.77 dB) demo3-rpcas-ibm

2  RPCAs on the MIR-1K Dataset

For comparison, here are the RPCAs results on three excerpts from the MIR-1K dataset.

2.1  Results for amy_1_07.wav

The results are clearly better with more information. This excerpt is the easiest to improve because there is plenty of silence at the end:

Vox demo4-vox
Vox (Spectrogram) demo4-vox-spectrogram
Mix demo4-mix
Mix (Spectrogram) demo4-mix-spectrogram
PV demo4-pv
RPCA (NSDR = 4.72 dB) demo4-rpca
RPCAs (NSDR = 9.04 dB) demo4-rpcas
RPCAs-IBM (NSDR = 14.48 dB) demo4-rpcas-ibm

2.2  Results for khair_5_05.wav

Again, the results are better with extra information, but the improvements are smaller because there is less silence in this excerpt:

Vox demo5-vox
Vox (Spectrogram) demo5-vox-spectrogram
Mix demo5-mix
Mix (Spectrogram) demo5-mix-spectrogram
PV demo5-pv
RPCA (NSDR = 1.60 dB) demo5-rpca
RPCAs (NSDR = 2.29 dB) demo5-rpcas
RPCAs-IBM (NSDR = 5.56 dB) demo5-rpcas-ibm

2.3  Results for stool_2_08.wav

Once again, the results are better with more information. However, the RPCAs-IBM result (11.63 dB versus 6.50 dB for RPCAs) indicates that there is still considerable room for improvement, particularly when the voice is softer than the music:

Vox demo6-vox
Vox (Spectrogram) demo6-vox-spectrogram
Mix demo6-mix
Mix (Spectrogram) demo6-mix-spectrogram
PV demo6-pv
RPCA (NSDR = 5.95 dB) demo6-rpca
RPCAs (NSDR = 6.50 dB) demo6-rpcas
RPCAs-IBM (NSDR = 11.63 dB) demo6-rpcas-ibm
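
To compare the six excerpts from Sections 1 and 2 side by side, the short script below tabulates the NSDR gain from RPCA to RPCAs and the remaining gap from RPCAs to RPCAs-IBM. The NSDR values are copied from this page. The dictionary layout and the `gains` helper are assumptions made for this example.

```python
# NSDR in dB per excerpt, in the order (RPCA, RPCAs, RPCAs-IBM),
# copied from the results listed on this page.
results = {
    "45305_chorus.wav": (2.80, 4.81, 10.45),
    "31113_verse.wav": (2.34, 3.41, 5.70),
    "54191_verse.wav": (1.84, 2.86, 5.77),
    "amy_1_07.wav": (4.72, 9.04, 14.48),
    "khair_5_05.wav": (1.60, 2.29, 5.56),
    "stool_2_08.wav": (5.95, 6.50, 11.63),
}

def gains(rpca, rpcas, ibm):
    """Return (gain from masks, remaining gap to the ideal mask) in dB."""
    return round(rpcas - rpca, 2), round(ibm - rpcas, 2)

print(f"{'excerpt':<18}{'RPCAs - RPCA':>14}{'IBM - RPCAs':>14}")
for name, scores in results.items():
    mask_gain, ibm_gap = gains(*scores)
    print(f"{name:<18}{mask_gain:>14.2f}{ibm_gap:>14.2f}")
```

The first column shows how much the vocal/non-vocal masks help on each excerpt, and the second shows how far RPCAs remains from the ideal-mask reference.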

References

[1] P.-S. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-Johnson, “Singing-voice separation from monaural recordings using robust principal component analysis,” in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2012, pp. 57-60.

[2] T.-S. Chan, T.-C. Yeh, Z.-C. Fan, H.-W. Chen, L. Su, Y.-H. Yang, and R. Jang, “Vocal activity informed singing voice separation with the iKala dataset,” in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2015, pp. 718-722.