SESSION 2: 2.00 - 4.00 pm
Session chair: Gaël Richard (Télécom ParisTech)

2.00 - 2.25 pm
Music Identification via Vocabulary Tree with MFCC Peaks
Tianjing Xu (Beihang University); Adams Wei Yu (Beihang University); Xianglong Liu (Beihang University); Bo Lang (Beihang University)

In this paper, a Vocabulary Tree based framework is proposed for music identification, whose target is to recognize a fragment from a song database. The key to high recognition precision within this framework is a novel feature, namely MFCC Peaks, which combines MFCC and Spectral Peaks features. Our approach consists of three stages. We first build the Vocabulary Tree with 2 million MFCC Peaks features extracted from hundreds of songs. Each song in the database is then quantized into words by traveling from the root down to a leaf. Given a query fragment, we apply the same quantization procedure, score the archive according to the TF-IDF scheme, and return the best matches. The experimental results demonstrate that our proposed feature has strong identification and generalization ability. Further trials show that our approach scales well with the size of the database. Comparison also demonstrates that while our algorithm achieves approximately the same retrieval precision as other state-of-the-art methods, it costs less time and memory.
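As a rough illustration of the retrieval stage described above (not the authors' implementation), the TF-IDF scoring step can be sketched in Python. The vocabulary-tree quantization is assumed to have already happened, so each song and the query fragment are represented as lists of hypothetical leaf-word indices:

```python
import math
from collections import Counter

def tfidf_scores(db_docs, query_words):
    """Score database songs against a query fragment using TF-IDF over
    vocabulary-tree 'words' (here, arbitrary integer leaf indices).
    db_docs maps song name -> list of word indices."""
    n = len(db_docs)
    # Document frequency: in how many songs each word appears.
    df = Counter()
    for words in db_docs.values():
        df.update(set(words))

    def vec(words):
        tf = Counter(words)
        total = sum(tf.values())
        # Standard tf * idf weighting; words unseen in the database are dropped.
        return {w: (c / total) * math.log(n / df[w])
                for w, c in tf.items() if df.get(w)}

    q = vec(query_words)
    scores = {}
    for name, words in db_docs.items():
        d = vec(words)
        # Cosine similarity between the query and song tf-idf vectors.
        dot = sum(q[w] * d[w] for w in q if w in d)
        nq = math.sqrt(sum(v * v for v in q.values()))
        nd = math.sqrt(sum(v * v for v in d.values()))
        scores[name] = dot / (nq * nd) if nq and nd else 0.0
    return scores
```

The best match is then simply the song with the highest cosine score, mirroring the "score the archive and return the best matches" step of the abstract.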

2.25 - 2.50 pm
Finding Geographically Representative Music via Social Media
Charles Parker (Eastman Kodak Company); Dhiraj Joshi (Eastman Kodak Company); Phoury Lei (Eastman Kodak Company); Jiebo Luo (Eastman Kodak Company)

People can draw a myriad of semantic associations with music. The semantics can be geographical, ethnographical, society- or time-driven, or simply personal. For certain types of music, however, this semantic association is more prominent and coherent across most people. Such music can often serve as an ideal accompaniment for a user activity or setting (that shares the semantics of the music), especially in media authoring applications. Among the strongest associations a piece of music can have is with the geographical area from which it originates. With video-sharing on sites such as YouTube having become a norm, one would expect that music videos tagged with a geographic location keyword are representative of the respective geographical theme. However, in the past few years, the proliferation of Western pop culture throughout the world has resulted in the popularity of ethnic pop (resembling Western pop) that sounds quite distinct from traditional regional music. While a human expert may easily distinguish between such ethnic pop and traditional regional music, the problem of automatically differentiating between them is still new. The problem becomes more challenging with similarities in music from many different regions. In this paper, we attempt to automatically identify music with strong geographical semantics (that is, "traditional-sounding music" for different geographical regions), using only music gathered from social media sources as our training and testing data. We also explore the use of hierarchical clustering to discover relationships between the music of different cultures, again using only social media.
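The hierarchical clustering mentioned at the end of the abstract can be sketched as a small average-linkage agglomerative clustering routine. This is a generic toy stand-in, not the authors' method: the points below are hypothetical per-region audio feature vectors, and the linkage choice is an assumption:

```python
import math

def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering on feature vectors
    (e.g., audio descriptors summarizing a region's music).
    Returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Average pairwise Euclidean distance between two clusters.
        return sum(math.dist(points[a], points[b])
                   for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters
```

Recording the merge order (rather than stopping at a fixed cluster count) yields the dendrogram from which cross-cultural relationships between regional musics could be read off.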

2.50 - 3.15 pm
Music Genre Classification using Explicit Semantic Analysis
Kamelia Aryafar (Drexel University); Ali Shokoufandeh (Drexel University)

Music genre classification is the assignment of a piece of music to one of a set of human-created categorical labels, and has traditionally been performed manually. Automatic music genre classification, a fundamental problem in the music information retrieval community, has been gaining attention with the growth of the digital music industry. Most current genre classification methods are based on the extraction of short-time features in combination with high-level audio features. However, the representation of short-time features, computed over time windows, in a semantic space has received little attention. This paper proposes a vector space model of mel-frequency cepstral coefficients (MFCCs) that can, in turn, be used by a supervised learning schema for music genre classification. Inspired by explicit semantic analysis of textual documents using term frequency-inverse document frequency (tf-idf), a semantic space model is proposed to represent music samples. The effectiveness of this representation is then demonstrated in music genre classification using various machine learning algorithms, including support vector machines (SVMs) and k-nearest neighbor classification. Our preliminary results suggest that the proposed method is comparable to genre classification methods that use low-level audio features.
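To make the tf-idf vector space model concrete, here is a minimal sketch, under the assumption that each music sample has already been reduced to a "document" of quantized MFCC codewords (the integer codes below are hypothetical stand-ins for the paper's terms), classified with a simple k-nearest-neighbor vote rather than the paper's SVM:

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a tf-idf vector (term -> weight dict) for each document
    of quantized MFCC codewords, with smoothed idf."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: (c / len(d)) * math.log((1 + n) / (1 + df[t]))
                     for t, c in tf.items()})
    return vecs

def knn_genre(train_vecs, labels, query_vec, k=1):
    """Predict a genre label by majority vote among the k training
    samples most cosine-similar to the query."""
    def cos(a, b):
        dot = sum(a[t] * b[t] for t in a if t in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cos(train_vecs[i], query_vec),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

In practice the kNN step here could be swapped for an SVM over the same tf-idf vectors, matching the classifiers evaluated in the paper.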

3.15 - 4.00 pm
PANEL - Bridging Opportunities for the Music and Multimedia Domains
Panelists: Ye Wang (National University of Singapore), Jean Bresson (IRCAM), Alan Hanjalic (Delft University of Technology), Gerald Friedland (UC Berkeley), Douglas Eck (Google Inc.)
Moderator: Cynthia Liem (Delft University of Technology)
