of Automatic Speaker Recognition Systems

Automatic Speaker Recognition
• Two types of task: Speaker Identification, Speaker Verification
• Two types of spoken text: Text Dependent, Text Independent
• Input utterances: Unknown utterance, Target speaker utterance

Figure 2 – Phases in Automatic Speaker Recognition Systems
• Training phase: Feature extraction → Modelling (UBM, TVM) → Background models
• Enrolment phase: Feature extraction → Model adaptation → Speaker models
• Testing phase: Feature extraction, then
  ✔ Verification: Compare against claimed model → Accept or reject
  ✔ Identification: Compare against all models in the system → Highest scoring model
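The two testing-phase decisions can be sketched in a few lines. This is a minimal illustration, not the system described on the slide: it assumes each speaker model and each test utterance has already been reduced to a fixed-length vector, and it uses cosine similarity as the score. The names `cosine_score`, `verify`, `identify`, the toy vectors, and the threshold are all hypothetical.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_vec, claimed_model, threshold):
    """Verification: compare against the claimed model, then accept or reject."""
    return cosine_score(test_vec, claimed_model) >= threshold

def identify(test_vec, models):
    """Identification: compare against all models, return the highest scoring one."""
    return max(models, key=lambda name: cosine_score(test_vec, models[name]))

# Toy enrolled speaker models (hypothetical values).
speaker_models = {
    "alice": [1.0, 0.1, 0.0],
    "bob": [0.0, 1.0, 0.2],
}
test_utterance = [0.9, 0.2, 0.0]

best_match = identify(test_utterance, speaker_models)
accepted_as_alice = verify(test_utterance, speaker_models["alice"], threshold=0.8)
accepted_as_bob = verify(test_utterance, speaker_models["bob"], threshold=0.8)
```

Verification needs a threshold because it answers a yes/no question about one claimed identity, whereas identification only ranks the enrolled models.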
models (GMM-UBM) [1]
• GMM-supervector + SVM [1]
• Joint factor analysis [2]
• i-vectors (state of the art) [2]
• Deep neural networks [3]

1. T. Kinnunen and H. Li. “An overview of text-independent speaker recognition: from features to supervectors”, Speech Communication, 2010.
2. N. Dehak, P.J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet. “Front-End Factor Analysis for Speaker Verification”, IEEE TASLP, 2011.
3. F. Richardson, D. Reynolds, and N. Dehak. “Deep neural network approaches to speaker and language recognition”, IEEE Signal Processing Letters, October 2015.
Need for ASV anti-spoofing?
- ASV systems are vulnerable to spoofing attacks [1]
- Commercial applications: Adobe TTS, Lyrebird [2,3]

• Spoofing attacks:
  ✔ Impersonation
  ✔ Replay
  ✔ Text-to-Speech
  ✔ Voice conversion

Difficulty level: spoofer perspective
1. Replay
2. Text-to-Speech
3. Voice conversion
4. Impersonation

Difficulty level: research perspective
1. Text-to-Speech
2. Voice conversion
3. Replay
4. Impersonation

Where do we stand?
1. Replay
2. Impersonation
Our main focus

1. Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre, and H. Li. “Spoofing and countermeasures for speaker verification: a survey”, Speech Communication, 2015.
2. https://lyrebird.ai/
3. https://helpx.adobe.com/audition/using/text-to-speeech.html
SCMC APGDF

Fig. 3: Single feature-based anti-spoofing system
• Train / Dev / Eval data → Parameterization
• Training features → GMM models: Genuine GMM, Spoofed GMM
• Dev/Eval features + GMM models → Log likelihood ratio score → Decision: Genuine or Spoofed

Fig. 4: Score fusion based anti-spoofing system
• Single-feature systems: MFCC, IMFCC, LFCC, RFCC, LPCC, SCMC, APGDF
• Individual system scores → Score fusion (AVG, LS, KNN, LASSO) → Fused score → Decision: Genuine or Spoofed

• Primary system: KNN fusion of IMFCC, MFCC, LFCC, RFCC, SCMC; 512 mixture components
• Contrastive 1: KNN fusion of all 7 single-feature systems
• Contrastive 2: LS fusion of all 7 single-feature systems
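The scoring path in the two figures can be illustrated with a toy example. This is only a sketch under stated assumptions: features are 1-D, the two GMMs have two hand-set components each rather than 512 trained ones, the decision threshold of 0 is arbitrary, and only the simplest fusion rule (AVG) is shown. The function names and the parameters of `GENUINE_GMM` and `SPOOFED_GMM` are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of 1-D frames under a GMM.

    gmm is a (weights, means, variances) tuple; log-sum-exp keeps the sum stable.
    """
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)

def llr_score(frames, genuine_gmm, spoofed_gmm):
    """Log likelihood ratio: positive values favour the genuine model."""
    return gmm_avg_loglik(frames, genuine_gmm) - gmm_avg_loglik(frames, spoofed_gmm)

def average_fusion(system_scores):
    """AVG fusion: mean of the individual system scores."""
    return sum(system_scores) / len(system_scores)

def decide(score, threshold=0.0):
    """Threshold the (fused) score; the threshold value is an assumption."""
    return "genuine" if score > threshold else "spoofed"

# Hand-set toy models (hypothetical; real models would be trained on features).
GENUINE_GMM = ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
SPOOFED_GMM = ([0.5, 0.5], [4.0, 5.0], [1.0, 1.0])

genuine_like = llr_score([0.2, 0.8, 0.5], GENUINE_GMM, SPOOFED_GMM)
spoof_like = llr_score([4.4, 4.9, 5.1], GENUINE_GMM, SPOOFED_GMM)

# Pretend a second single-feature system produced a weaker score, then fuse.
fused = average_fusion([genuine_like, 0.5 * genuine_like])
```

In a fused system each single-feature front end would contribute its own log likelihood ratio, and the fusion rule would replace the per-system decision.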
%), on ASVspoof 2017 development and evaluation data.

System          Development set   Evaluation set
Baseline        11.4              30.6
Primary         1.9 ± 0.73        34.78
Contrastive 1   2.12 ± 0.76       37.65
Contrastive 2   3.25 ± 0.84       36.33
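Assuming the percentages above are equal error rates (EER, the metric used in ASVspoof 2017), the following sketch shows one simple way such a number can be computed from detection scores. It sweeps every observed score as a threshold and reports the point where false acceptance and false rejection rates are closest. The function name and the toy scores are hypothetical, and official scoring tools may interpolate differently.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    A trial is accepted as genuine when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return eer

# Toy scores (hypothetical): higher means "more likely genuine".
genuine = [2.0, 1.5, 0.3, 1.1]
spoofed = [-1.0, 0.5, -0.2, 0.1]

eer_percent = 100.0 * equal_error_rate(genuine, spoofed)
```

With these toy scores one genuine trial and one spoofed trial overlap, so a quarter of each class is misclassified at the balancing threshold.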
Research collaboration: Sheffield University & University of Eastern Finland.
3. Literature review: ASV spoofing.
4. Actively supervised: 18 supervision logs.
5. Paper submitted to Interspeech 2017.
6. Multivariate analysis work (ongoing).
replay spoofing attacks.
2. Alternative applications of speaker models: spoken language learning, entertainment.
3. Investigating neural network approaches to anti-spoofing.