A Dataset for Personalized Music Emotion Recognition

Developed by Yi-Hsuan Yang, National Taiwan University.
Ref:  Prediction of the Distribution of Perceived Music Emotions Using Discrete Samples, submitted.


  • This dataset was developed as part of building a personalized music emotion recognition system [1], [2]
  • From the dimensional perspective, emotions are points in a Cartesian coordinate system with, for example, valence and arousal as the dimensions [3]
  • Therefore, the annotations provided here are numerical values in [-1, 1], rather than class labels


  • all_in_one: all files are in the Matlab .mat format
  • X.mat [features]: including harmonic, pitch, spectral, temporal, and rhythmic features
  • Y.mat [annotations]: (60 x 4);  valence, arousal, std of valence, std of arousal
  • PY.mat [annotations of each participant]: (160 x 30); each row is ordered as [v1 a1 v2 a2 ... v15 a15]
  • L.mat [participant id]: (160 x 1); 160 participants, 99 subjects
  • C.mat [participant to song set]: (160 x 1); 160 participants, 4 song sets
  • P.mat [user information]: (160 x 15); 160 participants, 15 features (demographic properties, music experience, and the Big Five personality traits [2]); see usrdata_format.txt
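
As a rough illustration of the PY.mat row layout described above, the sketch below splits one interleaved row into (valence, arousal) pairs. It assumes the row has already been read into a flat Python sequence (for example with scipy.io.loadmat, which is not shown here); the function name split_av and the placeholder values are hypothetical and not part of the dataset.

```python
# Sketch: split one PY.mat row into (valence, arousal) pairs.
# Assumption: the row is a flat sequence laid out as [v1, a1, ..., v15, a15].

def split_av(row):
    """Return a list of (valence, arousal) tuples from an interleaved row."""
    if len(row) % 2 != 0:
        raise ValueError("expected an even number of values")
    valence = row[0::2]  # v1, v2, ..., v15
    arousal = row[1::2]  # a1, a2, ..., a15
    return list(zip(valence, arousal))

# Placeholder data only; real rows come from PY.mat.
example_row = [0.1 * (i % 5) - 0.2 for i in range(30)]
pairs = split_av(example_row)
print(len(pairs))  # one pair per song in the participant's song set
```

The k-th pair then corresponds to the k-th song of the song set given for that participant in C.mat.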

Music Collection

  • 60 English pop songs (list_filename)
  • Each song is represented by the 30-sec segment manually trimmed from the chorus section
  • 22,050 Hz sampling frequency, 16-bit precision, and mono channel

Emotion Annotation

  • Each song is annotated by 40 subjects
  • A total of 99 subjects (46 male, 53 female, no restrictions on background) are recruited from the campus for annotation
  • We partition the dataset into four subsets, each with 15 music pieces whose emotions are roughly uniformly distributed in the emotion plane, and randomly select one of the subsets for a subject to annotate
  • The subjects are asked to annotate the AV values with a graphical interface called “AnnoEmo” [2] in a computer lab
  • The subjects are asked to annotate the perceived emotion [4]
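
One plausible way to relate the per-participant annotations to per-song summary values (mean and standard deviation of valence and arousal) is sketched below. This is an assumption-laden illustration, not the procedure used to build Y.mat: the use of the sample standard deviation, the function name aggregate_song_set, and the toy data are all hypothetical.

```python
from statistics import mean, stdev

def aggregate_song_set(py_rows, song_sets, target_set):
    """Per-song (mean_v, mean_a, std_v, std_a) over participants of one song set.

    py_rows   : rows laid out as [v1, a1, v2, a2, ...]
    song_sets : song-set id of each row (same length as py_rows)
    Assumption: sample standard deviation; the source does not say which is used.
    """
    rows = [r for r, s in zip(py_rows, song_sets) if s == target_set]
    if len(rows) < 2:
        raise ValueError("need at least two participants to compute a std")
    stats = []
    for k in range(0, len(rows[0]), 2):
        v = [r[k] for r in rows]      # valence ratings for this song
        a = [r[k + 1] for r in rows]  # arousal ratings for this song
        stats.append((mean(v), mean(a), stdev(v), stdev(a)))
    return stats

# Toy data: three participants, two songs per row (real rows hold 15 songs).
py_rows = [[0.5, 0.2, -0.5, 0.0],
           [0.3, 0.4, -0.1, 0.2],
           [0.9, 0.9, 0.9, 0.9]]
song_sets = [1, 1, 2]
stats = aggregate_song_set(py_rows, song_sets, 1)
print(stats[0])  # summary for the first song of song set 1
```

How the 15 songs of each song set map to the 60 rows of Y.mat is not stated here, so the sketch keeps the results per song set.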

Feature Extraction

  • We employ the following toolboxes and algorithms for feature extraction



    • MIR toolbox 1.2: Extracts 3 sensory dissonance features (roughness, irregularity, inharmonicity), 2 pitch features (salient pitch and the centroid of the chromagram), and 3 tonal features (mode, harmonic change, key clarity). The mean and standard deviation are taken for temporal aggregation.
    • SWIPE and SWIPE’: Extracts features of the pitch and pitch strength time series estimated by these algorithms.
    • Marsyas 0.1: Extracts 5 pitch features including tonic, main pitch class, octave range of the dominant pitch, main tonal interval relation, and the overall pitch strength [5].
    • Marsyas 0.2: Extracts timbral features including spectral flatness measures and spectral crest factors.
    • MA toolbox: Extracts Mel-frequency cepstral coefficients (MFCC), a representation of the short-term (e.g. 23 ms) power spectrum of an audio signal. The mean and standard deviation are taken to integrate the short-term features.
    • Spectral contrast: Extracts the 12-D octave-based spectral contrast to capture the relative energy distribution of the harmonic components in the spectrum [6].
    • Sound Description Toolbox: Extracts temporal features including zero-crossing rate, temporal centroid, and log attack time.
    • Rhythm pattern extractor: Extracts the average tempo of the music and a 60-bin rhythm histogram to describe its general rhythm.
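
The mean-and-standard-deviation aggregation mentioned for several of the toolboxes above can be sketched as follows. This is a minimal illustration with made-up frame values; the function name aggregate_frames, the use of the population standard deviation, and the order in which means and deviations are concatenated are assumptions, not details taken from the toolboxes.

```python
from statistics import mean, pstdev

def aggregate_frames(frames):
    """Collapse frame-level feature vectors into one clip-level vector.

    frames : list of equal-length feature vectors, one per short-term frame
    Returns [mean_1, ..., mean_d, std_1, ..., std_d].
    Assumption: population standard deviation (pstdev).
    """
    if not frames:
        raise ValueError("no frames given")
    columns = list(zip(*frames))  # one tuple per feature dimension
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]

# Two toy frames with two features each (placeholder values).
frames = [[1.0, 2.0],
          [3.0, 6.0]]
clip_vector = aggregate_frames(frames)
print(clip_vector)  # [2.0, 4.0, 1.0, 2.0]
```

A d-dimensional frame-level feature therefore becomes a 2d-dimensional clip-level feature.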


  • You may dichotomize the numerical values to use this dataset in training an emotion classifier
  • A demonstration of a user interface for emotion-based music retrieval is available on YouTube [7]
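
The dichotomization suggested above could look like the sketch below, which turns valence and arousal values in [-1, 1] into binary labels and then into one of four quadrant classes. The threshold of 0, the handling of values exactly at the threshold, the quadrant numbering, and the function names are all assumptions made for illustration.

```python
def dichotomize(value, threshold=0.0):
    """Map a value in [-1, 1] to a binary label: 1 if above threshold, else 0.

    Assumption: values exactly at the threshold fall in the low class.
    """
    return 1 if value > threshold else 0

def quadrant(valence, arousal, threshold=0.0):
    """Combine the two binary labels into a quadrant class from 1 to 4.

    Assumed numbering: 1 = high valence, high arousal, then counterclockwise.
    """
    v = dichotomize(valence, threshold)
    a = dichotomize(arousal, threshold)
    return {(1, 1): 1, (0, 1): 2, (0, 0): 3, (1, 0): 4}[(v, a)]

print(quadrant(0.5, 0.5))    # 1
print(quadrant(-0.2, -0.7))  # 3
```

Using only dichotomize on one dimension gives a two-class problem (for example positive versus negative valence), while quadrant gives a four-class problem.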


[1] Y.-H. Yang et al., “A regression approach to music emotion recognition,” IEEE Trans. Audio, Speech and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.

[2] Y.-H. Yang et al., “Music emotion recognition: The role of individuality,” Proc. ACM Int. Workshop on Human-centered Multimedia, pp. 13–21, 2007.

[3] J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161–1178, 1980.

[4] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research. New York: Oxford University Press, 2001.

[5] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech & Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.

[6] D. N. Jiang, L. Lu, H. J. Zhang, J. H. Tao, and L. H. Cai, “Music type classification by spectral contrast features,” Proc. IEEE Int. Conf. Multimedia Expo, pp. 113–116, 2002.

[7] Y.-H. Yang et al., “Mr. Emo: Music retrieval in the emotion plane,” Proc. ACM Int. Conf. Multimedia, pp. 1003–1004, 2008.


Any feedback or comments are welcome.

(last update: 2010/12/25)