DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection

**g, Li**; Liu, Bo; Choi, Jaeyoung; Janin, Adam; Bernd, Julia; Mahoney, Michael W.; Friedland, Gerald

arXiv:1607.04378 (cs)

[Submitted on 15 Jul 2016]

Abstract: This paper presents a novel two-phase method for audio representation, Discriminative and Compact Audio Representation (DCAR), and evaluates its performance at detecting events in consumer-produced videos. In the first phase of DCAR, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discriminative and compact by formulating an optimization problem on Grassmannian manifolds, which we found represents the structure of audio effectively.
Our experiments used the YLI-MED dataset (an open TRECVID-style video corpus based on YFCC100M), which includes ten events. The results show that the proposed DCAR representation consistently outperforms state-of-the-art audio representations. DCAR's advantage over i-vector, mv-vector, and GMM representations is significant for both easier and harder discrimination tasks. We discuss how these performance differences across easy and hard cases follow from how each type of model leverages (or doesn't leverage) the intrinsic structure of the data. Furthermore, DCAR shows a particularly notable accuracy advantage on events where humans have more difficulty classifying the videos, i.e., events with lower mean annotator confidence.
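
As an illustration of the first phase only, the following is a minimal sketch of fitting one GMM per audio track with plain expectation-maximization. It is not the authors' implementation: the function name `fit_diag_gmm`, the diagonal covariances, the 13-dimensional MFCC-like frames, the number of components, and the synthetic data are all assumptions made for the example, since the abstract does not state them. The second phase (the optimization on Grassmannian manifolds) is not sketched here.

```python
import numpy as np


def fit_diag_gmm(frames, n_components=4, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM to one track's feature frames with EM.

    frames: array of shape (n_frames, n_dims), e.g. MFCC-like vectors
            (an assumption; the abstract does not name the features).
    Returns (weights, means, variances).
    """
    rng = np.random.default_rng(seed)
    n, _ = frames.shape
    # Initialise means from randomly chosen frames and share the global variance.
    means = frames[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(frames.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: log(weight * Gaussian density) for every frame/component pair.
        diff = frames[:, None, :] - means[None, :, :]            # (n, k, d)
        log_prob = -0.5 * (
            np.sum(diff ** 2 / variances, axis=2)
            + np.sum(np.log(2.0 * np.pi * variances), axis=1)
        )
        log_resp = np.log(weights) + log_prob
        log_resp -= log_resp.max(axis=1, keepdims=True)          # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ frames) / nk[:, None]
        diff = frames[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6

    return weights, means, variances


# Synthetic stand-in for one track: two well-separated groups of frames.
rng = np.random.default_rng(1)
track = np.vstack([
    rng.normal(-3.0, 1.0, size=(200, 13)),
    rng.normal(3.0, 1.0, size=(200, 13)),
])
weights, means, variances = fit_diag_gmm(track, n_components=2)
```

Each track then yields a small set of components (weight, mean, variance), which is the kind of per-track summary the second phase would go on to refine.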

Comments:	An abbreviated version of this paper will be published in ACM Multimedia 2016
Subjects:	Sound (cs.SD); Multimedia (cs.MM)
ACM classes:	H.5.1
Cite as:	arXiv:1607.04378 [cs.SD]
	(or arXiv:1607.04378v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1607.04378

Submission history

From: Li** **g Dr.
[v1] Fri, 15 Jul 2016 04:28:14 UTC (518 KB)
