While researching how to remove the translator's parts from a lecture recording, I came across the term speaker diarization. It's the task of working out who spoke when: splitting the audio into segments and labelling each one by speaker.
The translation was cute at first, but after listening to many of Stephen Tong's videos, I now find the translator's parts very annoying and a waste of time.
So this shall be my project.
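
As a rough sketch of where this is headed: assuming some diarization tool has already produced a list of (start, end, speaker) segments, dropping the translator comes down to keeping only the lecturer's intervals and merging the ones that sit close together. The labels `SPEAKER_00` and `SPEAKER_01`, the helper name `keep_speaker`, the `max_gap` value, and the timings are all made up for illustration; they are not output from a real recording or any particular tool.

```python
from typing import List, Tuple

# (start_sec, end_sec, speaker_label); hypothetical diarization output format
Segment = Tuple[float, float, str]


def keep_speaker(
    segments: List[Segment], speaker: str, max_gap: float = 0.3
) -> List[Tuple[float, float]]:
    """Return merged (start, end) intervals belonging to `speaker`.

    Intervals separated by at most `max_gap` seconds are joined so the
    resulting audio is not chopped into tiny pieces.
    """
    # Keep only the wanted speaker's non-empty segments, in time order.
    kept = sorted((s, e) for s, e, who in segments if who == speaker and e > s)

    merged: List[Tuple[float, float]] = []
    for start, end in kept:
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Made-up example: SPEAKER_00 is the lecturer, SPEAKER_01 the translator.
segments = [
    (0.0, 4.2, "SPEAKER_00"),
    (4.2, 8.0, "SPEAKER_01"),
    (8.0, 12.5, "SPEAKER_00"),
    (12.6, 15.0, "SPEAKER_00"),
    (15.0, 19.0, "SPEAKER_01"),
]
lecturer_only = keep_speaker(segments, "SPEAKER_00")
print(lecturer_only)
```

The resulting intervals could then be handed to an audio tool to cut and join the lecturer's parts. Which speaker label belongs to the translator would still need to be worked out by ear or some other way.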