Diving Deep into Speech Accent Classification: A Case Study

Binary classification for Speech Accent Archive with facebook/wav2vec2-base-960h model

May 27, 2023

After training on the full sample for 10 epochs (about 1 hour on the NVIDIA Tesla P100 GPU available to Kaggle users), accuracy increased from 26% to about 92.7%:

Source: author, speech_accent_classification | Kaggle
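For reference, accuracy here is the share of clips whose predicted class matches the true label. Below is a minimal sketch of that calculation from raw model scores, assuming two classes. The function name, the column order, and the toy numbers are illustrative and are not taken from the notebook.

```python
from typing import Sequence


def accuracy_from_logits(
    logits: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Return the fraction of rows whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    # Index of the largest score in each row is the predicted class.
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)


# Toy scores for four clips; columns are assumed to be [native, foreign].
toy_logits = [[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.1, 0.4]]
toy_labels = [0, 1, 1, 1]
print(accuracy_from_logits(toy_logits, toy_labels))  # 3 of 4 correct -> 0.75
```

A function of this shape could be adapted to a training loop's metric hook, but how the notebook actually computes its metric is not shown here.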

Here is an example of a classification pipeline based on two audio samples (foreign and native):
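A minimal sketch of such a pipeline, assuming the Hugging Face `transformers` library and its `audio-classification` pipeline task. It is not the notebook's exact code: the helper names, the repository id, and the file names are hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, List

Prediction = Dict[str, object]  # e.g. {"label": "foreign", "score": 0.97}


def load_classifier(model_id: str) -> Callable[[str], List[Prediction]]:
    """Build an audio-classification pipeline for a fine-tuned checkpoint."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline

    return pipeline("audio-classification", model=model_id)


def classify_clips(
    classifier: Callable[[str], List[Prediction]], paths: Iterable[str]
) -> Dict[str, Prediction]:
    """Run the classifier on each clip and keep only the top-scoring label."""
    results = {}
    for path in paths:
        scores = classifier(path)  # list of {"label": ..., "score": ...}
        results[path] = max(scores, key=lambda item: item["score"])
    return results


# Hypothetical usage (repository id and file names are placeholders):
# clf = load_classifier("your-username/your-accent-model")
# for path, best in classify_clips(clf, ["native.wav", "foreign.wav"]).items():
#     print(path, best["label"], round(best["score"], 3))
```

Passing the classifier in as an argument keeps the selection logic independent of the model, so the same helper works with any callable that returns label/score dictionaries.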

Finally, the model is saved to a Hugging Face repository.
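One way to do this is sketched below, assuming the `push_to_hub` method that `transformers` models and feature extractors provide and an already authenticated Hugging Face session. The helper name, commit message, checkpoint path, and repository id are hypothetical.

```python
def publish_to_hub(
    model,
    feature_extractor,
    repo_id: str,
    commit_message: str = "Upload fine-tuned accent classifier",
) -> str:
    """Upload the model weights and the preprocessing config to one repository."""
    # Both objects are expected to expose push_to_hub, as transformers
    # models and feature extractors do. Authentication is assumed to be
    # configured beforehand (for example with `huggingface-cli login`).
    model.push_to_hub(repo_id, commit_message=commit_message)
    feature_extractor.push_to_hub(repo_id, commit_message=commit_message)
    return repo_id


# Hypothetical usage (paths and repository id are placeholders):
# from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification
# model = Wav2Vec2ForSequenceClassification.from_pretrained("path/to/checkpoint")
# extractor = Wav2Vec2FeatureExtractor.from_pretrained("path/to/checkpoint")
# publish_to_hub(model, extractor, "your-username/your-accent-model")
```

Uploading the feature extractor alongside the weights lets the repository be loaded later by its id alone, without re-creating the preprocessing settings by hand.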

An example of how to use the model from the repository website:

Source: author, screenshot from facebook/wav2vec2-base-960h · Hugging Face

As expected, my own recorded speech sample is classified as foreign (non-native) with high confidence.

I hope you find these results useful. If you have questions or comments, feel free to leave them below or reach me directly on LinkedIn or Twitter.
