A Dataset for Personalized Music Emotion Recognition
Developed by Yi-Hsuan Yang
Ref: Prediction of the Distribution of Perceived Music Emotions Using Discrete Samples, submitted.
- This dataset was developed in the course of building a personalized music emotion recognition system
- From the dimensional perspective, emotions are points in a Cartesian coordinate system with, for example, valence and arousal as the dimensions
- Therefore, the annotations provided here are numerical values in [-1, 1], rather than class labels
- all_in_one: all files are in the Matlab .mat format
- X.mat [features]: harmonic, pitch, spectral, temporal, and rhythmic features
- Y.mat [annotations]: (60 x 4); valence, arousal, std of valence, std of arousal
- PY.mat [annotations of each participant]: (160 x 30); the order of the annotations is [v1 a1 v2 a2 ... v15 a15]
- L.mat [participant id]: (160 x 1); maps the 160 participants to the 99 subjects
- C.mat [participant to song set]: (160 x 1); maps the 160 participants to the 4 song sets
- P.mat [user information]: (160 x 15); 160 participants, 15 features (demographic properties, music experiences, and the Big Five personality traits) (see usrdata_format.txt)
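For convenience, here is a minimal Python sketch for loading these files with SciPy (the filenames and array shapes are those listed above; the variable names stored inside the .mat files are assumed to match the filenames and may differ in practice):

    import numpy as np
    import scipy.io as sio

    # Load annotations and metadata (shapes as documented above).
    Y  = sio.loadmat('Y.mat')['Y']    # (60, 4): valence, arousal, std of valence, std of arousal
    PY = sio.loadmat('PY.mat')['PY']  # (160, 30): [v1 a1 v2 a2 ... v15 a15] per participant
    L  = sio.loadmat('L.mat')['L']    # (160, 1): participant -> subject id (99 subjects)
    C  = sio.loadmat('C.mat')['C']    # (160, 1): participant -> song set (4 sets)
    P  = sio.loadmat('P.mat')['P']    # (160, 15): demographics, music experience, Big Five
    X  = sio.loadmat('X.mat')['X']    # acoustic features of the 60 songs

    # Reshape the per-participant annotations to (participant, song-within-set, [valence, arousal]).
    PY = PY.reshape(160, 15, 2)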
- 60 English pop songs
- Each song is represented by a 30-second segment manually trimmed from the chorus section
- 22,050 Hz sampling frequency, 16-bit precision, mono channel
- Each song is annotated by 40 subjects
- A total of 99 subjects (46 male, 53 female, with no restrictions on their background) are recruited from the campus for the annotation
- We partition the dataset into four subsets, each with 15 music pieces whose emotions are roughly uniformly distributed in the emotion plane, and randomly select one of the subsets for each subject to annotate
- The subjects are asked to annotate the AV values with a graphical interface called “AnnoEmo” in a computer lab
- The subjects are asked to annotate the perceived emotion (i.e., the emotion expressed by the music)
- We employ the following toolboxes and algorithms for feature extraction:
  - MIR toolbox 1.2: extracts 3 sensory dissonance features (roughness, irregularity, inharmonicity), 2 pitch features (pitch salience and the centroid of the chromagram), and 3 tonal features (mode, harmonic change, key clarity); the mean and standard deviation are taken for temporal aggregation.
  - Pitch features: from the pitch and pitch strength time series estimated by SWIPE, 5 pitch features are extracted, including the tonic, main pitch class, octave range of the dominant pitch, main tonal interval relation, and the overall pitch strength.
  - Spectral features including spectral flatness measures and spectral crest factors.
  - Mel-frequency cepstral coefficients (MFCC), a representation of the short-term (e.g., 23 ms) power spectrum of an audio signal; the mean and standard deviation are taken to integrate the short-term features.
  - The 12-D octave-based spectral contrast, capturing the relative energy distribution of the harmonic components in the spectrum.
  - Temporal features including zero-crossing rate, temporal centroid, and log attack time.
  - The average tempo of the music and a 60-bin rhythm histogram describing its general rhythmic properties.
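The features in X.mat were extracted with the MATLAB toolboxes listed above; purely as an illustration of the mean/standard-deviation temporal aggregation applied to short-term features such as MFCC, here is a rough Python sketch using librosa (not the toolchain used for this dataset; the filename is hypothetical):

    import numpy as np
    import librosa

    # Load a 30-second clip in the dataset's format: 22,050 Hz, mono.
    y, sr = librosa.load('song_001.wav', sr=22050, mono=True)  # hypothetical filename

    # Short-term MFCCs over ~23 ms frames (512 samples at 22,050 Hz), then
    # mean/std aggregation over time, as described for the MFCC features above.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=512, hop_length=256)
    mfcc_vector = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # 26-D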
- You may dichotomize the numerical values to use this dataset for training an emotion classifier (see the sketch below)
- A demonstration of a user interface for emotion-based music retrieval is available on YouTube
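One possible way to dichotomize the annotations, as suggested above (a sketch assuming Y.mat has been loaded into a 60 x 4 array Y as in the loading example; the threshold at 0 is a common choice, not part of the dataset):

    import numpy as np

    # Columns 0 and 1 of Y are the mean valence and arousal in [-1, 1].
    valence, arousal = Y[:, 0], Y[:, 1]

    # Split at 0 into binary labels (1 = positive valence / high arousal).
    valence_label = (valence > 0).astype(int)
    arousal_label = (arousal > 0).astype(int)

    # Or map each song to one of the four quadrants of the emotion plane (0-3).
    quadrant = 2 * arousal_label + valence_label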
Y.-H. Yang et al., “A regression approach to music emotion recognition,” IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.
Y.-H. Yang et al., “Music emotion recognition: The role of individuality,” in Proc. ACM Int. Workshop on Human-Centered Multimedia, 2007.
J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161–1178, 1980.
P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research. New York: Oxford University Press, 2001.
G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
D. N. Jiang, L. Lu, H. J. Zhang, J. H. Tao, and L. H. Cai, “Music type classification by spectral contrast features,” in Proc. IEEE Int. Conf. Multimedia and Expo, 2002.
Y.-H. Yang et al., “Mr. Emo: Music retrieval in the emotion plane,” in Proc. ACM Int. Conf. Multimedia, pp. 1003–1004, 2008.