
Typical music emotion classification (MEC) approaches categorize emotions into classes and apply pattern recognition methods to train a classifier. However, categorized emotions are too ambiguous for efficient music retrieval. In this paper, we model emotions as continuous variables composed of arousal and valence values (AV values) and formulate MEC as a regression problem. Multiple linear regression, support vector regression, and AdaBoost.RT are adopted, and their prediction accuracy is evaluated. Since the regression approach is inherently continuous, it is free of the ambiguity problem found in its categorical counterparts.

Methods

1) Multiple linear regression

2) Support vector regression

→ A tutorial on support vector regression.

→ LIBSVM: a library for support vector machines.

[parameters: arousal -c 0.5 -g 0.0078125 -p 0.125]

[parameters: valence -c 4 -g 0.0078125 -p 0.25]

3) AdaBoost.RT

→ AdaBoost.RT: a boosting algorithm for regression problems.

[parameters:

threshold

number of iterations: 30

threshold
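
As a rough illustration of item 2), the parameter strings listed above appear to follow LIBSVM's command-line options, where -c is the cost, -g is the kernel gamma, and -p is the epsilon of the epsilon-SVR loss function. The sketch below assembles an svm-train command from them. The -s 3 (epsilon-SVR) and -t 2 (RBF kernel) flags, the helper name, and the pairing of each parameter set with DBR_A.txt or DBR_V.txt are assumptions; the source lists only -c, -g, and -p.

```python
# Parameter sets copied from the list above; the keys are LIBSVM flag names.
SVR_PARAMS = {
    "arousal": {"c": 0.5, "g": 0.0078125, "p": 0.125},
    "valence": {"c": 4, "g": 0.0078125, "p": 0.25},
}

# Assumed pairing of each target with its LIBSVM-format file.
DATA_FILES = {"arousal": "DBR_A.txt", "valence": "DBR_V.txt"}


def svm_train_command(target):
    """Build an svm-train argument list for one target (hypothetical helper)."""
    params = SVR_PARAMS[target]
    return [
        "svm-train",
        "-s", "3",  # assumption: epsilon-SVR
        "-t", "2",  # assumption: RBF kernel
        "-c", str(params["c"]),
        "-g", str(params["g"]),
        "-p", str(params["p"]),
        DATA_FILES[target],
    ]


if __name__ == "__main__":
    for target in SVR_PARAMS:
        print(" ".join(svm_train_command(target)))
```

Note that the shared gamma value 0.0078125 equals 2^-7, which is consistent with a power-of-two grid search, although the source does not say how the parameters were chosen.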

Results

1)

→ Inherently continuous,

→

→ Allows more efficient music retrieval and management.

→ One can also easily convert the regression results to binary or quaternary ones if a categorical taxonomy is required.

2)

→ Has

→ Learns the prediction rules according to the ground truth and can be trained to reach optimal performance.

→

3) The R^2 statistic reaches

Classification accuracy reaches
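
To make items 1) and 3) more concrete, the sketch below computes an R^2 statistic for a set of predictions and converts predicted AV values into binary or quaternary labels. The function names, the zero thresholds, and the quadrant numbering are illustrative assumptions; the source does not specify how the conversion was done.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def to_binary(value, threshold=0.0):
    """Binary label for one dimension: 1 if value >= threshold, else 0.

    Splitting at zero is an assumption.
    """
    return 1 if value >= threshold else 0


def to_quadrant(valence, arousal):
    """Quaternary label: quadrant of the AV plane.

    Quadrants are numbered 1-4 counterclockwise, starting from positive
    valence and positive arousal (an assumed convention).
    """
    if arousal >= 0:
        return 1 if valence >= 0 else 2
    return 3 if valence < 0 else 4
```

A predictor that always outputs the mean of the ground truth gets R^2 = 0 under this definition, and a perfect predictor gets R^2 = 1.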

Data Sets

Music clips are trimmed to 25 seconds and converted to a uniform format (22,050 Hz, 16 bits, mono-channel PCM WAV). The music database contains 195 popular songs from Western, Chinese, and Japanese albums. Subjects (mostly college students) are asked to listen to a subset of the music dataset and to choose two values, each ranging from -1.0 to 1.0 in 11 levels, to indicate their feelings about the AV values of each music sample. The ground truth is set as the mean of the AV values given by all subjects. On average, more than ten pairs of AV values are collected from the subjective test for each music sample.
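
The ground-truth computation described above can be sketched as follows. The 11 rating levels are assumed to be evenly spaced from -1.0 to 1.0 in steps of 0.2, and the use of the sample standard deviation is also an assumption. The ratings in the example are made-up illustrative values, not data from the study.

```python
from statistics import mean, stdev

# Assumed rating scale: 11 evenly spaced levels from -1.0 to 1.0.
LEVELS = [round(-1.0 + 0.2 * i, 1) for i in range(11)]


def ground_truth(ratings):
    """Average the (valence, arousal) pairs given by all subjects for one clip.

    Returns (mean_v, mean_a, std_v, std_a). The sample standard deviation
    is an assumption; the source does not say which estimator was used.
    """
    valences = [v for v, _ in ratings]
    arousals = [a for _, a in ratings]
    return mean(valences), mean(arousals), stdev(valences), stdev(arousals)


if __name__ == "__main__":
    # Hypothetical ratings from four subjects for a single clip.
    example = [(0.2, 0.6), (0.4, 0.8), (0.0, 0.6), (0.2, 0.4)]
    print(ground_truth(example))
```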

feat_list.txt

list of the 114 features

DBR.txt

193 records, 114 features; format: [v a sv sa f1 f2 ... f114]

v: valence (-5 to 5)

a: arousal (-5 to 5)

sv: standard deviation of valence (from subjective test)

sa: standard deviation of arousal (from subjective test)

f: features (without normalization)

f1-f28(28): DWCH features

f29-f58(30): Marsyas features

f59-f102(44): PsySound features

f103-f114(12): Spectral contrast features

f64,f71-f84(15): PsySound-15 features
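
A record in DBR.txt could be read along the following lines. This is a minimal sketch that assumes whitespace-separated numeric fields in the order given above; the actual delimiter is not stated here, and the function and constant names are hypothetical.

```python
# 1-based, inclusive feature index ranges taken from the list above.
FEATURE_GROUPS = {
    "DWCH": range(1, 29),
    "Marsyas": range(29, 59),
    "PsySound": range(59, 103),
    "SpectralContrast": range(103, 115),
    "PsySound15": [64] + list(range(71, 85)),
}


def parse_dbr_line(line):
    """Split one record into its labels and 114 raw features.

    Assumes whitespace-separated fields: v a sv sa f1 ... f114.
    """
    fields = [float(x) for x in line.split()]
    if len(fields) != 4 + 114:
        raise ValueError(f"expected 118 fields, got {len(fields)}")
    v, a, sv, sa = fields[:4]
    return {"v": v, "a": a, "sv": sv, "sa": sa, "features": fields[4:]}


def select_group(features, group):
    """Pick the features of one group; f1 is stored at list index 0."""
    return [features[i - 1] for i in FEATURE_GROUPS[group]]
```

Since the features are stored without normalization, any scaling would have to be applied after this step.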

DBR_A.txt

in SVM format: [a 1:f1 2:f2 ... 114:f114]

DBR_V.txt

in SVM format: [v 1:f1 2:f2 ... 114:f114]
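
The SVM-format lines in DBR_A.txt and DBR_V.txt pair a label with index:value entries. The sketch below writes a label and feature list in that layout and parses such a line back. The helper names are hypothetical, and the number formatting in the actual files may differ.

```python
def to_svm_line(label, features):
    """Format one record as 'label 1:f1 2:f2 ...' with 1-based indices."""
    pairs = " ".join(f"{i}:{x}" for i, x in enumerate(features, start=1))
    return f"{label} {pairs}"


def parse_svm_line(line):
    """Parse 'label idx:value ...' into (label, {idx: value})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        idx, value = pair.split(":")
        features[int(idx)] = float(value)
    return float(label), features
```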

Any feedback or comments are welcome!

affige@gmail.com

http://mpac.ee.ntu.edu.tw/~yihsuan/