A Medium-Scale Dataset for Music Emotion Recognition
Developed by Yi-Hsuan Yang
Ref: "Ranking-based Emotion Recognition for Music Organization and Retrieval," accepted for publication, IEEE Trans. Audio, Speech, and Language Processing.
- This dataset was developed in the course of our work on ranking-based methods for improving the dimensional music emotion recognition approach.
- From the dimensional perspective, emotions are points in a Cartesian coordinate system with, for example, valence and arousal as the dimensions.
- Therefore, the annotations provided here are numerical values in [-1, 1] rather than class labels.
- Only valence (how positive or negative the perceived emotion is) is annotated.
- 1240 Chinese pop songs
- Each song is represented by the 30-second segment starting from its 30th second.
- 22,050 Hz sampling frequency, 16-bit precision, mono channel.
- Due to copyright issues, the audio files cannot be distributed. One may use the program provided by Dan Ellis to synthesize an approximate audio signal from the MFCC features (see the sketch after this list).
- An online subjective test was conducted from Aug. 2008 to Nov. 2008 to collect the emotion annotations.
- A total of 666 subjects participated in the subjective test, so each song is annotated by 4.3 subjects on average.
- Each subject was invited to annotate 8 randomly selected music pieces using both rating- and ranking-based measures: for rating, a scroll bar with end points denoting 0 and 100 is used; for ranking, subjects are asked to make pairwise comparisons of the emotions of songs (see Fig. 1).
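
Since the audio itself cannot be shared, the two audio-side steps above may be worth sketching in code: cutting the 30-second segment and resynthesizing audio from the MFCCs. Dan Ellis's inversion program is MATLAB; the minimal sketch below instead uses librosa's comparable (lossy, Griffin-Lim-based) inversion, and all paths and parameter choices are illustrative assumptions.

    # Minimal sketch, assuming Python with librosa and soundfile installed.
    import librosa
    import soundfile as sf

    SR = 22050  # dataset format: 22,050 Hz, 16-bit, mono

    def load_segment(path):
        """Load the 30-second mono segment starting at a song's 30th second."""
        y, _ = librosa.load(path, sr=SR, mono=True, offset=30.0, duration=30.0)
        return y

    def resynthesize(mfcc, out_path="approx.wav"):
        """Synthesize an audible approximation of a clip from its MFCC matrix.
        This substitutes librosa's inversion for Dan Ellis's MATLAB tool."""
        y = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=SR)
        sf.write(out_path, y, SR, subtype="PCM_16")  # 16-bit PCM output
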
Fig. 1. Left: the music emotion tournament groups eight randomly chosen music pieces into seven matches; bold lines indicate the winner of each match. Right: the resulting preference matrix (partial), with entry (i,j) painted black to indicate that piece i is ranked higher than piece j. The global ordering f>b>c=h>a=d=e=g can then be estimated by a greedy algorithm.
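
The greedy estimation of the global ordering can be sketched as follows. This shows one common greedy heuristic (repeatedly take the remaining piece with the largest wins-minus-losses score, letting ties share a rank), not necessarily the exact algorithm of the paper, and the binary preference-matrix encoding is an assumption.

    # Sketch of a greedy ordering from a binary preference matrix P,
    # where P[i, j] = 1 means piece i won a comparison against piece j.
    import numpy as np

    def greedy_order(P):
        """Return a rank per piece; rank 0 = highest (most positive) valence."""
        remaining = set(range(P.shape[0]))
        ranks = np.empty(P.shape[0], dtype=int)
        r = 0
        while remaining:
            idx = sorted(remaining)
            sub = P[np.ix_(idx, idx)]
            score = sub.sum(axis=1) - sub.sum(axis=0)  # wins minus losses
            for k in np.flatnonzero(score == score.max()):
                ranks[idx[k]] = r          # tied pieces share a rank
                remaining.discard(idx[k])
            r += 1
        return ranks
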
Feature extraction:
- Extracts the average tempo of the music and a 60-bin rhythm histogram to describe the general rhythmic content of the music.
- MIR toolbox 1.2: extracts 3 sensory dissonance features (roughness, irregularity, inharmonicity), 2 pitch features (pitch salience and the centroid of the chromagram), and 3 tonal features (mode, harmonic change, key clarity); the mean and standard deviation are taken to integrate the short-term features.
- Extracts mel-frequency cepstral coefficients (MFCC), a representation of the short-term (e.g., 23 ms) power spectrum of an audio signal; the mean and standard deviation are taken to integrate the short-term features (see the sketch below).
- Extracts spectral features including spectral flatness measures and spectral crest factors.
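
As an illustration of the short-term feature extraction and the mean/standard-deviation temporal integration, here is a minimal Python sketch using librosa; the dataset itself was built with the tools listed above, and the frame size and number of coefficients here are assumptions chosen to match the ~23 ms frames mentioned.

    # Sketch: song-level MFCC features via mean/std temporal integration.
    import numpy as np
    import librosa

    def song_level_mfcc(path, sr=22050, n_mfcc=20):
        """Summarize short-term MFCCs of a clip by their mean and std."""
        y, sr = librosa.load(path, sr=sr, mono=True)
        # 512 samples at 22,050 Hz is roughly a 23 ms frame
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=512, hop_length=256)
        # Temporal integration: mean and standard deviation over frames
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
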
- You may dichotomize the numerical values to use this dataset for training an emotion classifier (see the sketch after this list).
- You may go here to fetch annotations of both arousal and valence for a smaller dataset consisting of 60 English pop songs.
- A demonstration of a user interface for emotion-based music retrieval is available on YouTube.
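
For the dichotomization mentioned above, a minimal sketch follows; thresholding the [-1, 1] valence values at 0 is an illustrative choice, not one prescribed by the dataset.

    # Sketch: turn continuous valence annotations into binary class labels.
    import numpy as np

    def dichotomize(valence, threshold=0.0):
        """Map valence in [-1, 1] to labels: 1 = positive, 0 = negative."""
        return (np.asarray(valence, dtype=float) > threshold).astype(int)

    # e.g., dichotomize([-0.4, 0.7, 0.1]) -> array([0, 1, 1])
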
Y.-H. Yang et al., “A regression approach to music emotion recognition,” IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.
J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161–1178, 1980.
Y.-H. Yang et al., “Music emotion recognition: The role of individuality,” Proc. ACM Int. Workshop on Human-centered Multimedia, 2007.
Y.-H. Yang et al., “Mr. Emo: Music retrieval in the emotion plane,” Proc. ACM Int. Conf. Multimedia, pp. 1003–1004, 2008.