NCJRS Virtual Library

SingFake: Singing Voice Deepfake Detection

NCJ Number
308810
Author(s)
Yongyi Zang; You Zhang; Mojtaba Heydari; Zhiyao Duan
Date Published
2023
Annotation

This paper proposes the Singing Voice Deepfake Detection (SVDD) task and introduces a dataset named SingFake. It describes the researchers’ methodology, data collection and analysis, and study results, and discusses another report on a similar topic that deals with clean singing voices mixed with instrumental music for Chinese songs under controlled conditions.

Abstract

The rise of singing voice synthesis presents critical challenges to artists and industry stakeholders over unauthorized voice usage. Unlike synthesized speech, synthesized singing voices are typically released in songs containing strong background music that may hide synthesis artifacts. Additionally, singing voices present different acoustic and linguistic characteristics from speech utterances. These unique properties make singing voice deepfake detection a relevant but significantly different problem from synthetic speech detection. In this work, the authors propose the singing voice deepfake detection task. They first present SingFake, the first curated in-the-wild dataset consisting of 28.93 hours of bonafide and 29.40 hours of deepfake song clips in five languages from 40 singers. The authors provide a train/validation/test split where the test sets include various scenarios. They then use SingFake to evaluate four state-of-the-art speech countermeasure systems trained on speech utterances. They find that these systems lag significantly behind their performance on speech test data. When trained on SingFake, either using separated vocal tracks or song mixtures, these systems show substantial improvement. However, the authors’ evaluations also identify challenges associated with unseen singers, communication codecs, languages, and musical contexts, calling for dedicated research into singing voice deepfake detection. The SingFake dataset and related resources are available. (Published Abstract Provided)