This paper reports on experiments that demonstrate the importance of feature selection and the need for generalization towards deepfake methods that deviate from the training distribution. It presents the CtrSVDD dataset, which was curated for controlled singing voice deepfake detection with enhanced controllability, diversity, and data openness.
This paper discusses the impact of recent advances in singing voice synthesis and conversion, and the resulting need for singing voice deepfake detection (SVDD) models. It introduces the CtrSVDD dataset, a large-scale, diverse collection of bonafide and deepfake singing vocals built from publicly accessible singing voice datasets, with the deepfakes synthesized using cutting-edge methods. The dataset includes 47.64 hours of bonafide and 260.34 hours of deepfake singing vocals, spanning 14 deepfake methods and 164 singer identities. The CtrSVDD benchmark was curated for controlled SVDD with enhanced controllability, diversity, and data openness, in the hope of accelerating SVDD research. The paper describes the CtrSVDD dataset design, the baseline systems, and the experiments and their results. The CtrSVDD dataset, baseline system implementations, and trained model weights are publicly accessible.
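The durations stated above imply a strong class imbalance, which matters when training or evaluating a detector on the data. The short Python sketch below works out the total duration and per-class shares from the two reported figures only. The helper name `class_balance` is hypothetical, and the shares are by duration, not by utterance count, since the abstract gives no utterance counts or split sizes.

```python
# Durations reported in the abstract, in hours.
BONAFIDE_HOURS = 47.64
DEEPFAKE_HOURS = 260.34


def class_balance(bonafide: float, deepfake: float) -> dict:
    """Hypothetical helper: total duration and per-class shares by duration."""
    total = bonafide + deepfake
    return {
        "total_hours": round(total, 2),
        "bonafide_share": round(bonafide / total, 4),
        "deepfake_share": round(deepfake / total, 4),
        # Hours of deepfake audio per hour of bonafide audio.
        "deepfake_to_bonafide": round(deepfake / bonafide, 2),
    }


stats = class_balance(BONAFIDE_HOURS, DEEPFAKE_HOURS)
for key, value in stats.items():
    print(f"{key}: {value}")
```

Running it gives a total of 307.98 hours, of which deepfake vocals make up roughly 84.5% by duration (about 5.46 hours of deepfake audio per hour of bonafide audio).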
Similar Publications
- Utilizing Derivatizing Agents for the Differentiation of Cannabinoid isomers in Complex Food, Beverage and Personal-care Product Matrices by Ambient Ionization Mass Spectrometry
- Determining the Precision of High-Throughput Sequencing and Its Influence on Aptamer Selection
- Prosecuting Cold Cases Using DNA