This paper reports on experiments that demonstrate the importance of feature selection and of generalization to deepfake methods that deviate from the training distribution; it presents the CtrSVDD dataset, curated for controlled singing voice deepfake detection with enhanced controllability, diversity, and data openness.
This paper discusses the impact of recent advances in singing voice synthesis and conversion, and the resulting need for singing voice deepfake detection (SVDD) models. It introduces CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. The deepfake vocals are synthesized with state-of-the-art methods from publicly accessible singing voice datasets; the collection comprises 47.64 hours of bonafide and 260.34 hours of deepfake singing vocals, spanning 14 deepfake methods and 164 singer identities. The CtrSVDD benchmark dataset was curated for controlled SVDD with enhanced controllability, diversity, and data openness, in the hope of accelerating SVDD research. The paper describes the CtrSVDD dataset design, the baseline systems, and the experiments and results. The CtrSVDD dataset, baseline system implementations, and trained model weights are publicly accessible.