Diving Deep into Speech Accent Classification: A Case Study
Binary classification on the Speech Accent Archive with the facebook/wav2vec2-base-960h model

After training on the full sample for 10 epochs (about 1 hour on the NVIDIA Tesla P100 GPU available to Kaggle users), accuracy increased from 26% to about 92.7%.
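For reference, accuracy here is the share of evaluation clips whose highest-scoring class matches the true label. Below is a minimal sketch of that calculation, assuming two classes and one row of logits per clip. The function name `accuracy_from_logits`, the label mapping, and the toy numbers are illustrative assumptions, not taken from the original notebook.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose arg-max class index equals the true label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = 0
    for row, label in zip(logits, labels):
        # The index of the largest logit is the predicted class.
        predicted = max(range(len(row)), key=lambda i: row[i])
        correct += int(predicted == int(label))
    return correct / len(labels)


# Toy example with made-up numbers; assumed mapping: 0 = native, 1 = foreign.
toy_logits = [[2.0, -1.0], [0.1, 0.9], [1.5, 0.3], [-0.2, 0.4]]
toy_labels = [0, 1, 1, 1]
toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
```

The same logic can be wrapped in a `compute_metrics` callback if you train with the `transformers` `Trainer`, which passes the logits and labels of the evaluation set to that callback.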

Here is an example of a classification pipeline based on two audio samples (foreign and native):
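A minimal sketch of such a pipeline, using the `audio-classification` task from the `transformers` library. The model ID and the two file names in the usage comment are placeholders (assumptions), and the label names you get back depend on how the fine-tuned model was configured. Passing a file path to the pipeline also assumes `ffmpeg` is installed on the system.

```python
def top_prediction(predictions):
    """Return (label, score) for the highest-scoring entry.

    `predictions` is a list of {"label": str, "score": float} dicts, which is
    the shape the audio-classification pipeline returns for one input file.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify_files(model_id, audio_paths):
    """Run the audio-classification pipeline on each file and keep the top class."""
    # Imported lazily so top_prediction can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    return {path: top_prediction(classifier(path)) for path in audio_paths}


# Usage (placeholders: substitute the fine-tuned model's Hub ID and real files):
# results = classify_files("your-username/your-accent-classifier",
#                          ["foreign.wav", "native.wav"])
# for path, (label, score) in results.items():
#     print(f"{path}: {label} ({score:.3f})")
```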


Finally, the model is saved to a repository on the Hugging Face Hub.
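Saving usually comes down to calling `push_to_hub` on both the model and the feature extractor, so that the repository holds everything needed for inference. Here is a hedged sketch, assuming you are already authenticated (for example via `huggingface_hub.login()`) and that `repo_id` is a placeholder for your own repository name. The wrapper function `push_model_and_extractor` is a hypothetical helper, not part of any library.

```python
def push_model_and_extractor(model, feature_extractor, repo_id):
    """Upload the fine-tuned weights and the preprocessing config to one Hub repo."""
    # transformers models and feature extractors both expose push_to_hub(repo_id).
    model.push_to_hub(repo_id)
    feature_extractor.push_to_hub(repo_id)
    return repo_id


# Usage (placeholder repository name, assumes `model` and `feature_extractor`
# are the objects used during fine-tuning):
# push_model_and_extractor(model, feature_extractor, "your-username/your-accent-classifier")
```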
An example of how to use the model from the repository website:

As expected, my recorded speech sample is classified as a foreign (non-native) speaker with high confidence.
I hope these results are useful to you. If you have questions or comments, feel free to write in the comments below or reach me directly through LinkedIn or Twitter.