
Towards understanding attack specific conditions and their contributions in spoofing detection

In this work, Bhusan Chettri analyses the performance of voice spoofing detection systems under varied spoofing attack conditions. The work focuses on replay spoofing attacks, which involve playing back pre-recorded samples of a target speaker to a voice-authentication system.

Bhusan Chettri

February 17, 2023

Transcript

  1. Analysing Replay Spoofing Countermeasure Performance Under Varied Conditions
     Bhusan Chettri1, Bob L. Sturm2, Emmanouil Benetos1
     1School of EECS, Queen Mary University of London, United Kingdom
     2School of EECS, KTH Royal Institute of Technology, Stockholm, Sweden
     September 18, 2018
  2. Automatic speaker verification (ASV): is the speaker who they claim to be?
     APPLICATION: user authentication (e.g. banks, call centres, smartphones).
  3. Genuine vs. replayed speech
     The RedDots [1] corpus was used for simulating replay attacks, from which the ASVspoof 2017 corpus [2] was created. Factors of interest in a replay attack: Acoustic Environment (AE), Playback Device (PD) and Recording Device (RD).
     [1] K. A. Lee et al., “The RedDots Data Collection for Speaker Recognition”, in Interspeech 2015.
     [2] H. Delgado et al., “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements”, in Speaker Odyssey 2018.
  4. Replay spoofing corpus
     Table 1: Statistics of the ASVspoof 2017 2.0 corpus [1]. RC denotes replay configurations, i.e. unique combinations of acoustic environment, playback device and recording device in a replay attack. Dur: duration in hours.

     Subset        # Spk   # RC   # Genuine   # Replay     Dur
     Train            10      3        1507       1507    2.22
     Development       8     10         760        950    1.44
     Evaluation       24     57        1298      12008   11.94
     Total            42     61        3565      14465   15.60

     [1] H. Delgado et al., “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements”, in Speaker Odyssey 2018.
  5. Research goals (motivated by [1])
     Which factor influences detection the most: the acoustic environment (AE), the playback device (PD) or the recording device (RD)? How do AE, RD and PD interact in a replay attack? Analyse countermeasures under different replay conditions.
     [1] H. Delgado et al., “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements”, in Speaker Odyssey 2018.
  6. Experimental setup and results
     Table 2: Performance in terms of Equal Error Rate (EER), the official metric of the ASVspoof 2017 challenge: the operating point where the false acceptance rate and the false rejection rate are equal.

     Id   System         Features                                     EER%
     1    Baseline [1]   Constant-Q Cepstral Coefficients (CQCC)      12.2
     2    CNN2           Power spectrograms                           27.8
     3    GMM1           Mel-Frequency Cepstral Coefficients (MFCC)   27.8
          GMM2           Inverted MFCCs (IMFCC)                       18.3
     4    SVM1           i-vectors [2] derived from MFCCs             24.6
          SVM2           i-vectors derived from IMFCCs                16.3
     5    Fused4         Score-level fusion of systems in rows 3-4    11.0

     [1] H. Delgado et al., “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements”, in Speaker Odyssey 2018.
     [2] N. Dehak et al., “Front-End Factor Analysis for Speaker Verification”, IEEE TASLP, 2011.
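The EER used throughout these tables can be illustrated numerically. Below is a minimal sketch (not the challenge's official scoring tool) that scans candidate thresholds over synthetic detection scores; the function name and toy scores are my own for illustration:

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Equal Error Rate: operating point where the false acceptance
    rate (spoof accepted) equals the false rejection rate (genuine
    rejected). Returns the average of FAR and FRR at the threshold
    where they are closest."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    eer, best_gap = 1.0, np.inf
    for t in thresholds:
        far = np.mean(spoof_scores >= t)    # spoof trials wrongly accepted
        frr = np.mean(genuine_scores < t)   # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer

# Toy example: overlapping score distributions
print(compute_eer(np.array([2.0, 3.0, 4.0, 5.0]),
                  np.array([0.0, 1.0, 2.0, 3.0])))  # 0.25
```

Real evaluations (e.g. the ASVspoof scoring scripts) interpolate the detection error trade-off curve rather than scanning raw thresholds, but the operating-point intuition is the same.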
  7. Impact under different qualities of replay attack
     Table 3: Countermeasure performance (EER%) under "low" and "high" quality replay attack conditions in the evaluation subset.

     Quality   ID     Baseline   Fused4   CNN2   GMM1   GMM2   SVM1   SVM2
     Low       RC15       8.0     10.7    19.9    8.0   23.2   13.6   24.2
               RC16       9.0      6.6    21.4   12.5   13.9   13.3   18.0
               RC19      10.5      8.5    49.9    7.0   23.0    3.5   26.0
     High      RC55      15.0     11.0     9.8   42.9    3.8   47.0    3.5
               RC56      36.0     29.2    22.5   43.4   26.5   48.1   22.1
               RC57      33.0     27.4    26.6   44.3   26.4   49.6   22.3

     GMM2 shows superior performance on high-quality replay attacks but the opposite on low-quality attacks.
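Table 3 also reports the fused system (Fused4), described earlier as a score-level fusion of the GMM and SVM systems. The deck does not give the fusion details, so the sketch below is only a generic weighted-average fusion with a hypothetical function name, not the authors' exact scheme:

```python
import numpy as np

def fuse_scores(system_scores, weights=None):
    """Score-level fusion: combine per-trial detection scores from
    several systems into one score via a weighted sum. `system_scores`
    has shape (n_systems, n_trials). Defaults to equal weights."""
    S = np.asarray(system_scores, dtype=float)
    if weights is None:
        weights = np.full(S.shape[0], 1.0 / S.shape[0])
    return np.asarray(weights, dtype=float) @ S

# Two systems, two trials, equal weights -> per-trial averages
print(fuse_scores([[1.0, 2.0], [3.0, 4.0]]))  # [2. 3.]
```

In practice fusion weights are usually trained on a development set (e.g. with logistic regression) rather than fixed to be equal.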
  8. Frame-level analysis of the IMFCC-based GMM2 system
     [Figure 1: Log energy and log-likelihood distributions across the first 100 frames for a replayed (left) and a genuine (right) example in RC55 using GMM2. Green profile: log-likelihood difference; orange profile: spoof GMM log-likelihood; blue: genuine GMM log-likelihood.]
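The figure contrasts the log-likelihoods a genuine-speech GMM and a spoof GMM assign to each frame, with their difference used as the detection score. A minimal sketch of this two-GMM scoring scheme, using synthetic 2-D "frames" in place of real MFCC/IMFCC features (all data here is made up for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-ins for genuine and replayed feature frames
genuine_frames = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
spoof_frames = rng.normal(loc=4.0, scale=1.0, size=(500, 2))

# One GMM per class, trained on pooled frames of that class
gmm_genuine = GaussianMixture(n_components=2, random_state=0).fit(genuine_frames)
gmm_spoof = GaussianMixture(n_components=2, random_state=0).fit(spoof_frames)

def llr_score(utterance_frames):
    """Average per-frame log-likelihood ratio; positive -> genuine.
    GaussianMixture.score returns the mean log-likelihood per sample."""
    return gmm_genuine.score(utterance_frames) - gmm_spoof.score(utterance_frames)

print(llr_score(rng.normal(0.0, 1.0, size=(100, 2))) > 0)  # genuine-like input
```

Plotting the two per-frame log-likelihoods (via `score_samples`) and their difference over the first 100 frames would reproduce the kind of profiles shown in Figure 1.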
  9. Conclusions
     1. It is difficult to analyse which factors influence replay spoofing detection the most, and how they interact.
     2. It is inappropriate to claim that ambient and reverberation noise are key to replay detection, contradicting the findings of [1].
     3. IMFCCs seem to perform well on high-quality attacks but worse on low-quality attacks, another contradiction.
     4. We find that models also tend to exploit data-specific attributes during class prediction.
     [1] H. Delgado et al., “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements”, in Speaker Odyssey 2018.