Session 1: Presentations by professors

10:00 to 10:30

Web Science at Southampton

Thomas Irvine
University of Southampton, UK

10:30 to 11:00

Machine Learning for Creative AI Applications in Music

Yi-Hsuan Yang
Academia Sinica, Taiwan

11:00 to 11:30

Modeling Temporal Tonal Relations in Polyphonic Music through Deep Networks with a Novel Image-Based Representation

Ching-hua Chuan
University of Miami, USA

11:30 to 12:00

Synthesis by Analysis: A Methodology for Automated Music Generation

Satoru Fukayama
Advanced Industrial Science and Technology (AIST), Japan

Session 2: Presentations by students

13:00 to 13:30

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

Hao-Wen Dong
Academia Sinica, Taiwan

13:30 to 14:00

A Formula for Music Similarity: Utilising Score-Based Recommendation

Anna Kent-Muller
University of Southampton, UK

14:00 to 14:30

Singing Voice Correction using Canonical Time Warping and Domain Adaptive Features

Yin-Jyun Luo
Academia Sinica, Taiwan

14:30 to 15:00

Constructions of Online Identity: Active and Reflexive Identity Work on Spotify

Clarissa Brough
University of Southampton, UK

10:00-10:30

Thomas Irvine

Web Science at Southampton

abstract

The World Wide Web has affected the lives of everyone, even those who have never used a website, by transforming governments, businesses, civil society and individual lives. The Web Science Institute (WSI) is located at the intersection between technology and society, researching how the Web is changing the world and the world is changing the Web and providing a bridge between the two. Our vision is that the Web Science Institute will be a globally recognised authority on the development and social impact of Web technologies, offering analysis, tools, data and advice to government, business and civil society. The WSI draws together world-leading researchers from across the University of Southampton in a range of interdisciplinary research activities. Our members include researchers from the social and computational sciences, the humanities, medicine and health sciences, business and law and the natural sciences. We run major international research programmes funded by UK Research Councils, the EU, government and industrial partners. Our current research programmes are being shaped increasingly by the challenges and opportunities afforded by Artificial Intelligence. Our Executive Director, Professor Dame Wendy Hall, recently authored the UK government’s official review on prospects for the UK’s growing AI industries.

bio

Thomas Irvine is Associate Professor of Music and a Non-Executive Director of the Web Science Institute. He is an intellectual historian with interests in the intersections of sound, music and science on a global scale. His book Listening to China: Sound and the Sino-Western Encounter, 1770-1839 is forthcoming from the University of Chicago Press. As a Mid-Career Fellow of the British Academy he spent part of the 2015-2016 academic year as a visiting scholar at National Chiao Tung University.


10:30-11:00

Yi-Hsuan Yang

Machine Learning for Creative AI Applications in Music

abstract

In this talk, I will briefly introduce two ongoing projects in our lab at Academia Sinica on creative applications in music: the singing voice separation project and the DJnet project. The first project aims to separate the singing voice from the musical accompaniment, which can serve as a pre-processing step for many music-related applications. The second project aims to create an AI DJ that knows how to manipulate, sample, and sequence musical pieces to create a personalized playlist. Hao-Wen Dong will present some results of another ongoing project, called GenMusic (music generation), in the afternoon session. The goal of these projects is to enrich the way people create and interact with music in their daily lives, using the latest machine learning (deep learning) techniques.

bio

Yi-Hsuan Yang is an Associate Research Fellow with Academia Sinica. He received his Ph.D. degree in Communication Engineering from National Taiwan University in 2010. He is also a Joint-Appointment Associate Professor with National Cheng Kung University, Taiwan. His research interests include music information retrieval, affective computing, multimedia, and machine learning. Dr. Yang was a recipient of the 2011 IEEE Signal Processing Society Young Author Best Paper Award, the 2012 ACM Multimedia Grand Challenge First Prize, the 2014 Ta-You Wu Memorial Research Award of the Ministry of Science and Technology, Taiwan, and the 2015 Best Conference Paper Award of the IEEE Multimedia Communications Technical Committee. He is an author of the book Music Emotion Recognition (CRC Press, 2011). In 2014, he served as a Technical Program Co-Chair of the International Society for Music Information Retrieval Conference (ISMIR). In 2016, he began his term as an Associate Editor for the IEEE Transactions on Affective Computing and the IEEE Transactions on Multimedia. Dr. Yang is a senior member of the IEEE.

slide

https://www.slideshare.net/affige/machine-learning-for-creative-ai-applications-in-music


11:00-11:30

Ching-hua Chuan

Modeling Temporal Tonal Relations in Polyphonic Music through Deep Networks with a Novel Image-Based Representation

abstract

We propose an end-to-end approach for modeling polyphonic music in a deep neural network with a novel graphical representation based on music theory. Despite the success of deep learning in various applications, it remains a challenge to incorporate existing domain knowledge in a network without affecting its training routines. In this work, we present an approach for predictive music modeling and music generation that incorporates domain knowledge in its representation: music is transformed into a 2D representation, inspired by the Tonnetz from music theory, which graphically encodes the relationships between pitches. This representation is incorporated in a deep network structure consisting of multilayered convolutional neural networks (CNN, for learning an efficient abstract encoding of the representation) and recurrent neural networks with long short-term memory cells (LSTM, for capturing temporal dependencies in music sequences). Experimental results show that the Tonnetz representation produces musical sequences that are more tonally stable and contain more repeated patterns than sequences generated by piano-roll-based models.
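
To make the model structure concrete, here is a minimal PyTorch sketch of a CNN+LSTM stack over a 2D Tonnetz-style grid. The grid dimensions, channel counts and layer sizes are illustrative assumptions, not the architecture reported in the talk.

```python
# Minimal sketch of a CNN+LSTM predictive model over a Tonnetz-style 2D
# representation. Grid size, channel counts and layer sizes are illustrative
# assumptions, not the exact architecture described in the abstract.
import torch
import torch.nn as nn

class TonnetzCNNLSTM(nn.Module):
    def __init__(self, grid_h=12, grid_w=24, hidden=256):
        super().__init__()
        # CNN encoder applied to each time step's Tonnetz "image"
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 32 * (grid_h // 4) * (grid_w // 4)
        # LSTM captures temporal dependencies across time steps
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        # Predict the next time step's grid (per-cell note-on probability)
        self.head = nn.Linear(hidden, grid_h * grid_w)

    def forward(self, x):                      # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        z = self.cnn(x.flatten(0, 1))          # (b*t, 32, H/4, W/4)
        z = z.flatten(1).view(b, t, -1)        # (b, t, feat_dim)
        out, _ = self.lstm(z)                  # (b, t, hidden)
        return torch.sigmoid(self.head(out))   # (b, t, H*W)

model = TonnetzCNNLSTM()
dummy = torch.rand(2, 16, 1, 12, 24)           # 2 sequences, 16 time steps
print(model(dummy).shape)                      # torch.Size([2, 16, 288])
```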

bio

Ching-Hua Chuan is a research associate professor of interactive media at the University of Miami. She received her Ph.D. in computer science from the University of Southern California Viterbi School of Engineering (Los Angeles, CA, USA). Dr. Chuan's research interests include artificial intelligence, machine learning, music information retrieval and audio signal processing. She has published refereed articles in journals and conferences on audio content analysis, style-specific music generation, machine learning applications, and music and multimedia information retrieval. She was the recipient of the best new investigator paper award at the Grace Hopper Celebration of Women in Computing in 2010.

slide

Click me to download


11:30-12:00

Satoru Fukayama

Synthesis by Analysis: A Methodology for Automated Music Generation

abstract

This talk aims at eliciting discussion on how we can generate interesting music with a set of algorithmic procedures, including the question of how we regard human creativity. The talk begins by reviewing previous discussions of creativity, which indicate that studying prior work and applying methodical techniques can explore possible solutions and lead to creative activities. Based on this point of view, I will present my attempts at automated music generation, in which the methods used can be regarded as a "Synthesis by Analysis" approach: an existing musical piece is decomposed into components, and a novel piece is generated by recombining them in various ways. Finally, future work on this methodology is discussed, with emphasis on machine learning techniques.
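
As a toy illustration of the decompose-and-recombine idea (not the method presented in the talk), the Python sketch below segments a note sequence into bars and chains bars that originally followed bars ending on the same pitch. The bar length and the first-order transition rule are simplified assumptions for illustration only.

```python
# Toy "synthesis by analysis" pipeline: analyse an existing piece into
# bar-sized components, then synthesise a new piece by recombining them.
# Bar segmentation and the transition rule are simplified assumptions.
import random
from collections import defaultdict

piece = [60, 62, 64, 65, 67, 65, 64, 62,      # MIDI pitches, 4 notes per bar
         60, 64, 67, 72, 71, 67, 64, 62,
         60, 62, 64, 65, 67, 69, 71, 72]

def analyse(notes, bar_len=4):
    """Decompose into bars and record which bars may follow which,
    keyed by the pitch a bar ends on."""
    bars = [notes[i:i + bar_len] for i in range(0, len(notes), bar_len)]
    transitions = defaultdict(list)
    for prev, nxt in zip(bars, bars[1:]):
        transitions[prev[-1]].append(nxt)
    return bars, transitions

def synthesise(bars, transitions, n_bars=6, seed=0):
    """Recombine the analysed components into a novel bar sequence."""
    rng = random.Random(seed)
    out = [rng.choice(bars)]
    for _ in range(n_bars - 1):
        candidates = transitions.get(out[-1][-1]) or bars
        out.append(rng.choice(candidates))
    return [p for bar in out for p in bar]

bars, transitions = analyse(piece)
print(synthesise(bars, transitions))
```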

bio

Satoru Fukayama received his Ph.D. degree in Information Science and Technology from the University of Tokyo in 2013. He is currently a Senior Researcher at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. His primary interests are in the theory and applications of automated music generation leveraging machine learning. He is also a composer and has studied harmony and counterpoint with Etsuo Kawasaki and Kenji Kunikoshi. He has received awards including the IPSJ Yamashita SIG Research Award, a Specially Selected Paper Award and several Best Presentation Awards from the Information Processing Society of Japan.


13:00-13:30

Hao-Wen Dong

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

abstract

Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, yet collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so imposing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in their underlying assumptions and accordingly their network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it.
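
To illustrate the difference between the jamming and composer models, here is a minimal PyTorch sketch: per-track generators with independent latent codes versus a single shared generator. The piano-roll dimensions, latent size and MLP generators are illustrative assumptions; the actual MuseGAN uses convolutional generator/discriminator pairs and an explicit bar-level temporal structure.

```python
# Sketch contrasting the "jamming" and "composer" ideas: independent
# per-track generators versus one shared generator. Dimensions and the
# MLP generators are illustrative assumptions, not the MuseGAN networks.
import torch
import torch.nn as nn

N_TRACKS, N_BARS, N_STEPS, N_PITCHES = 5, 4, 48, 84
OUT = N_BARS * N_STEPS * N_PITCHES

def make_generator(z_dim=128):
    return nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                         nn.Linear(512, OUT), nn.Sigmoid())

class JammingModel(nn.Module):
    """Each track has its own generator and its own latent vector."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.gens = nn.ModuleList(make_generator(z_dim) for _ in range(N_TRACKS))

    def forward(self, zs):                       # zs: (batch, tracks, z_dim)
        rolls = [g(zs[:, i]) for i, g in enumerate(self.gens)]
        return torch.stack(rolls, 1).view(-1, N_TRACKS, N_BARS, N_STEPS, N_PITCHES)

class ComposerModel(nn.Module):
    """One shared latent vector; a single generator emits all tracks."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.gen = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                                 nn.Linear(512, N_TRACKS * OUT), nn.Sigmoid())

    def forward(self, z):                        # z: (batch, z_dim)
        return self.gen(z).view(-1, N_TRACKS, N_BARS, N_STEPS, N_PITCHES)

batch = 2
print(JammingModel()(torch.randn(batch, N_TRACKS, 128)).shape)
print(ComposerModel()(torch.randn(batch, 128)).shape)
```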

bio

Hao-Wen Dong is currently a research assistant in the Music and Audio Computing (MAC) Lab at Academia Sinica under the supervision of Dr. Yi-Hsuan Yang. He completed his bachelor's degree in Electrical Engineering at National Taiwan University. He is currently working on symbolic music generation using deep learning.

slide

Click me to download


13:30-14:00

Anna Kent-Muller

A Formula for Music Similarity: Utilising Score-Based Recommendation

abstract

Similarity plays an important role in determining whether two pieces of music have a relationship. Currently, audio-based methods are the dominant approach for music similarity analysis. These methods struggle to extract high-level musical features, meaning that such features are not incorporated into similarity comparisons. I propose that score-based music analysis can provide a more useful similarity comparison than audio analysis. This paper examines the perceived audibility of score-based approaches to musical similarity through a listening study. Melody, Riemannian theory, and Schenkerian voice-leading were found to align with auditory perceptions of similarity. Aspects of derivation and internal similarity were also found to be important for similarity judgements.

bio

Anna Kent-Muller is a second-year PhD student in the Web Science Institute. Her research focuses on musical similarity, with particular interests in music recommendation and copyright infringement. In October 2017 she presented a paper in Shanghai at the Digital Libraries for Musicology workshop on how digital techniques can transform the field of musicology. In the coming months she will present a paper on her concept of 'Big Musicology' at the Netherlands Institute for Permanent Access to Digital Research Resources (DANS). This will lead to a research placement at Utrecht University (October-December 2018).

slide

Click me to download


14:00-14:30

Yin-Jyun Luo

Singing Voice Correction using Canonical Time Warping and Domain Adaptive Features

abstract

Expressive singing voice correction is an appealing but challenging problem. A promising solution is an accurate temporal alignment that synchronizes two singing recordings, whose performance depends on both a robust time-warping algorithm and suitable feature representations for alignment. We thereby divide this talk into two parts. In the first part, we propose to address the problem of singing voice correction by canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and a pitch-corrected singing voice is synthesized back through the vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Moreover, we demonstrate the applicability of the proposed method in a practical, real-world scenario. In the second part, we demonstrate the capability of domain-adaptive representation learning for this problem. Our proposed model contains a variational autoencoder (VAE) that encodes monophonic singing into a latent space, and the resulting latent representations, along with the model parameters, are then reused to regularize the representation learning of polyphonic singing. Experiments on cross-domain music alignment, namely monophonic-to-polyphonic alignment of singing voice, show that the learned representations lead to higher alignment accuracy than conventional features.
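
As a simplified stand-in for the CTW pipeline, the sketch below aligns two synthetic pitch contours with plain DTW (one of the baselines mentioned above) and maps the reference pitch onto the amateur timeline. The contours, parameters and frame-wise mapping are illustrative assumptions; a real system would resynthesize the corrected pitch through a vocoder.

```python
# Alignment-then-correction sketch using plain DTW on synthetic pitch
# contours; the talk's method uses canonical time warping and real
# vocoder features, so treat everything below as an illustrative stand-in.
import numpy as np
import librosa

# Reference ("professional") and amateur f0 contours in Hz.
t_ref = np.linspace(0, 4, 400)
f0_ref = 220 + 40 * np.sin(2 * np.pi * 0.5 * t_ref)

t_ama = np.linspace(0, 4, 520)                 # sung slower (time-stretched)
f0_ama = 0.97 * (220 + 40 * np.sin(2 * np.pi * 0.45 * t_ama))  # flat and drifting

# DTW over 1-D pitch features; librosa expects shape (n_features, n_frames).
D, wp = librosa.sequence.dtw(X=f0_ref[np.newaxis, :], Y=f0_ama[np.newaxis, :])

# Map the reference pitch onto the amateur timeline via the warping path
# (the path is returned end-to-start, so reverse it first).
ref_on_ama = np.copy(f0_ama)
for i_ref, i_ama in wp[::-1]:
    ref_on_ama[i_ama] = f0_ref[i_ref]

# The corrected contour ref_on_ama would then drive a vocoder for
# resynthesis; here we just report how far off the amateur contour was.
cents_off = 1200 * np.log2(f0_ama / ref_on_ama)
print(f"mean deviation before correction: {np.abs(cents_off).mean():.1f} cents")
```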

bio

Yin-Jyun Luo is currently a research assistant in the Music and Culture Technology Lab led by Dr. Li Su at the Institute of Information Science, Academia Sinica, Taiwan. He received a Master of Science in Music Technology from National Chiao Tung University, Taiwan, and will soon begin his PhD at the Agency for Science, Technology and Research (A*STAR), Singapore. Yin-Jyun's research focuses on deep learning and its applications in music signal processing.

slide

Click me to download


14:30-15:00

Clarissa Brough

Constructions of Online Identity: Active and Reflexive Identity Work on Spotify

abstract

For some, the emergence of interactive music technologies during the twenty-first century has transformed the dissemination and consumption of music. One of the most recent innovations has been the development and increase in on-demand music streaming platforms, such as Spotify, Pandora and Apple Music. These streaming services are now popular sources for listening to, sharing, rating and recommending music. Music is a powerful resource for individual and collective identities, so what do these music streaming platforms mean for our identity work? In this paper, I will briefly outline some of the ways that music accumulated on a particular platform, Spotify, can enable users to perform active identity work. I also convey how Spotify attempts to reflect a user’s online identity by generating personalised recommendations, an act that I term ‘profile construction’. Ultimately, I demonstrate how this reflexive identity could potentially culminate in processes of self-fashioning through music being governed by the technology of recommender systems.

bio

Clarissa Brough is a second-year PhD student in Web Science and Music. She has a keen interest in the study of popular music, modes of music consumption and, more broadly, gender and identity work. Her current research, funded by the Engineering and Physical Sciences Research Council, studies music streaming platforms, which are becoming increasingly popular sources for listening to, sharing, rating and recommending music. In particular, she investigates how music streaming platforms with integrated recommender systems have the potential to significantly influence how processes of self-fashioning through music are achieved today. This is a highly interdisciplinary project, supervised by academics in Music, Sociology and Business.

slide

Click me to download

Back

Get in touch

楊奕軒 Yi-Hsuan Yang

  • affige@gmail.com

謝宗翰 Bill Hsieh

  • bill317996@gmail.com