Music Emotion Recognition: The Role of Individuality

Y.-H. Yang, Y.-F. Su, Y.-C. Lin, and H.-H. Chen, "Music emotion recognition: The role of individuality,"
in Proc. International Workshop on Human-centered Multimedia 2007 (HCM'07), in conjunction with ACM Multimedia.

[full text]


Data Sets

  • Source audio files are available upon request.

  • All_in_one (zip)

    X.mat (features): (60 x 45)
    60 songs, 45 features [psy15 + marsyas30] (linearly scaled)

    Y.mat (annotations): (60 x 4)
    60 songs, 4 values [valence, arousal, std of valence, std of arousal]
    This is the average annotation from the subjective test, used to train the general regressor.

    P.mat (user information): (160 x 15)
    160 participants, 15 features [demographic properties, music experience, the Big Five personality traits]
    Some people took part more than once, so the 160 rows correspond to 99 distinct persons.

    PY.mat (annotations of each participant): (160 x 30)
    160 participants, 30 annotations.
    Each participant annotates the [valence, arousal] of every song in one song set (15 songs), which results in 30 annotations.
    The order of the annotation is: [v1 a1 v2 a2 ... v15 a15]
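
    The interleaved order above can be unpacked into per-song pairs. The following is a minimal sketch, assuming one row of PY has already been read into a flat Python list; the function name `unpack_annotations` and the toy values are hypothetical and not part of the data set.

```python
def unpack_annotations(row):
    """Split one PY row [v1, a1, ..., v15, a15] into (valence, arousal) pairs."""
    if len(row) % 2 != 0:
        raise ValueError("expected an even number of values (valence/arousal pairs)")
    # Even positions hold valence, odd positions hold arousal.
    return list(zip(row[0::2], row[1::2]))


# Toy row with made-up values, only to show the layout (not real annotations).
toy_row = list(range(30))
pairs = unpack_annotations(toy_row)
```

    Here `pairs[k]` holds the (valence, arousal) annotation of the (k+1)-th song in the participant's song set.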

    L.mat (participant id): (160 x 1)
    160 participants, 99 persons
    The id of each participant. Participants with identical ids are the same person.
    (The persons who annotated all 60 songs are those with ids 20, 52, 66, 67, 69, 70.)

    C.mat (participant to song set): (160 x 1)
    160 participants, 4 song sets
    The 60 songs are divided into 4 sets (1 to 4).
    C records which set each of the 160 participants annotated.
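
    L and C together tell which persons covered all four song sets. The sketch below assumes both columns have been flattened into plain Python lists of equal length; the function name `full_annotators` and the toy ids are hypothetical.

```python
from collections import defaultdict


def full_annotators(ids, sets, n_sets=4):
    """Return the sorted ids of persons who annotated every song set."""
    seen = defaultdict(set)
    for person_id, set_id in zip(ids, sets):
        seen[person_id].add(set_id)
    return sorted(pid for pid, covered in seen.items() if len(covered) == n_sets)


# Toy data (not the real L and C): person 1 covers sets 1 to 4, the others do not.
toy_ids = [1, 1, 1, 1, 2, 3, 3]
toy_sets = [1, 2, 3, 4, 2, 1, 1]
result = full_annotators(toy_ids, toy_sets)
```

    Applied to the real L and C, this should reproduce the six ids listed under L.mat, provided the files match the description above.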

    feat_list.txt    usrdata_format.txt    song_list.txt

    By combining X and Y, one can train a general regressor.
    By combining P, C, X, and PY, one can implement the group-wise MER (GWMER) approach.
    By combining L, C, X, and PY, one can implement the personalized MER (PMER) approach.
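
    As an illustration of the last combination, the sketch below gathers the (feature, annotation) pairs of one person from L, C, X, and PY. It is not the authors' implementation. It ASSUMES that song set k holds rows 15*(k-1) to 15*k-1 of X, which is a guess made for illustration only; the actual membership should be taken from song_list.txt. All function names are hypothetical, and the inputs are assumed to be plain Python lists.

```python
def songs_in_set(set_id, set_size=15):
    """ASSUMPTION: set k maps to a contiguous block of X rows (0-based indices)."""
    start = (set_id - 1) * set_size
    return list(range(start, start + set_size))


def person_training_data(person_id, L, C, PY, X):
    """Collect feature rows and (valence, arousal) targets for one person."""
    features, targets = [], []
    for pid, set_id, row in zip(L, C, PY):
        if pid != person_id:
            continue
        # PY rows are interleaved as [v1, a1, ..., v15, a15].
        for song, v, a in zip(songs_in_set(set_id), row[0::2], row[1::2]):
            features.append(X[song])
            targets.append((v, a))
    return features, targets


# Toy stand-ins for the real matrices (made-up values).
toy_X = [[float(i)] for i in range(60)]      # 60 songs, 1 feature each
toy_L = [7, 8, 7]                            # person 7 appears twice
toy_C = [1, 2, 3]                            # and annotated sets 1 and 3
toy_PY = [[0] * 30, [1] * 30, [2] * 30]
feats, targs = person_training_data(7, toy_L, toy_C, toy_PY, toy_X)
```

    The resulting lists can be passed to any regression routine to fit a per-person model; a group-wise variant would select rows by attributes in P instead of by id.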

    Software: 'AnnoEmo'

    We designed a user interface called 'AnnoEmo' for the subjective test, written in Java.
    AnnoEmo is mainly composed of UserDataFrame and AVplane.

    UserDataFrame (Fig) collects personal information.
    AVplane (Fig) lets a participant annotate the AV values of each song by clicking a point in the emotion plane displayed on screen. After a click, a rectangle is formed at the chosen point. The participant can then click on the rectangle to listen to the associated music, or drag and drop the rectangle elsewhere, since after listening to other songs the participant may want to modify the annotation of previous songs. Once formed, a rectangle remains throughout the subjective test. Therefore, as the participant annotates more songs, more rectangles appear on the emotion plane, making it easy to compare the annotations of different music samples.
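
    The mapping from a mouse click to AV values can be sketched as follows. This is not taken from the AnnoEmo source (which is written in Java); it ASSUMES that the emotion plane fills the whole panel, that valence and arousal both range over [-1, 1], and that the screen y coordinate grows downward. The function name `click_to_av` is hypothetical.

```python
def click_to_av(x, y, width, height):
    """Map a pixel position inside the panel to (valence, arousal) in [-1, 1]."""
    if not (0 <= x <= width and 0 <= y <= height):
        raise ValueError("click lies outside the emotion plane")
    valence = 2.0 * x / width - 1.0     # left edge -> -1, right edge -> +1
    arousal = 1.0 - 2.0 * y / height    # top edge -> +1, bottom edge -> -1
    return valence, arousal


center = click_to_av(200, 200, 400, 400)        # middle of a 400 x 400 panel
upper_right = click_to_av(300, 100, 400, 400)   # positive valence and arousal
```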

    source code     readme   (latest update: 2007.7.20)

    We have tried to make the code understandable and less buggy.
    Kindly report comments or bugs to Yi-Hsuan Yang :)


    The paper presents a systematic approach to individual and group difference issues relevant to the design of music genre/emotion classification research.

    It is certainly interesting to consider these kinds of issues with regard to musical affect classification, and the authors are to be admired for their attempt to deal, in a methodical fashion, with a very complex and difficult-to-define problem.

    While the approach presented in the paper seems to be sound, the current method, and, most significantly, the scale of the study, are insufficient to capture essential aspects of the problem the authors have set out to address.


    Any feedback or comments are welcome!