What about classifying who is speaking and just muting the audio when Lex is speaking? First, extract a lot of short samples (e.g. 2-5 seconds of audio) from a lot of his podcasts and label each one as 1 (Lex speaking) or 0 (other person speaking).
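A rough sketch of the slicing and muting part, assuming the podcast is already loaded as a mono NumPy array. The function names and the 3-second clip length are just placeholders I picked, not anything standard:

```python
import numpy as np

def split_into_clips(audio, sample_rate, clip_seconds=3.0):
    """Cut a mono signal into consecutive, non-overlapping clips of equal length."""
    clip_len = int(clip_seconds * sample_rate)
    n_clips = len(audio) // clip_len  # a trailing partial clip is dropped
    return np.asarray(audio[: n_clips * clip_len]).reshape(n_clips, clip_len)

def mute_labeled_clips(audio, sample_rate, labels, clip_seconds=3.0):
    """Return a copy of the audio with every clip labeled 1 replaced by silence."""
    clip_len = int(clip_seconds * sample_rate)
    out = np.array(audio, dtype=np.float64, copy=True)
    for i, label in enumerate(labels):
        if label == 1:
            out[i * clip_len : (i + 1) * clip_len] = 0.0
    return out
```

The labels would come from hand labeling at first, and later from whatever classifier ends up working.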
Take those samples and convert them to a frequency spectrum. For each sample, average the values over the duration of the sample (or use max, min, whatever). Group the values into frequency bins (e.g. 100 Hz, 120 Hz, 140 Hz), and filter out everything outside the range of human speech.
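Something like this could do the spectrum step with plain NumPy. The 80-4000 Hz limits, the 20 Hz bin width, and the frame size are my own guesses for illustration, so treat them as knobs to tune rather than known-good values:

```python
import numpy as np

def spectrum_features(clip, sample_rate, f_lo=80.0, f_hi=4000.0,
                      bin_hz=20.0, frame_len=2048, hop=1024):
    """Time-averaged magnitude spectrum of a mono clip, grouped into fixed-width bins.

    Returns (bin_start_frequencies, features).
    """
    clip = np.asarray(clip, dtype=np.float64)
    if len(clip) < frame_len:
        raise ValueError("clip is shorter than one analysis frame")

    # Cut the clip into overlapping, windowed frames.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(clip) - frame_len) // hop
    frames = np.stack([clip[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])

    # Magnitude spectrum of each frame, then average over time.
    mags = np.abs(np.fft.rfft(frames, axis=1))
    avg = mags.mean(axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    # Group FFT frequencies into bins of width bin_hz; anything outside
    # [f_lo, f_hi) lands outside the valid bin indices and is ignored.
    edges = np.arange(f_lo, f_hi + bin_hz, bin_hz)
    idx = np.digitize(freqs, edges) - 1
    n_bins = len(edges) - 1
    feats = np.zeros(n_bins)
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            feats[b] = avg[sel].mean()
    return edges[:-1], feats
```

Swapping `mags.mean(axis=0)` for `mags.max(axis=0)` gives the max variant instead of the average.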
What you then have is a training set: the features are the amplitudes at each frequency bin, and the target is 1 (Lex is speaking) or 0 (somebody else is speaking).
Use your ML or deep learning algorithm of choice and see if you can get useful results out of it.
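As a baseline before reaching for anything deep, a plain logistic regression is probably the simplest thing to try. Here is a minimal NumPy-only sketch. `train_logistic` and `predict` are names I made up, and the data below is random placeholder data standing in for real labeled spectrum features, so it says nothing about how well this works on actual podcast audio:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on standardized features."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-9  # avoid division by zero for constant features
    Xs = (X - mu) / sigma
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))  # predicted P(label == 1)
        grad = p - y
        w -= lr * (Xs.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, mu, sigma

def predict(X, model):
    """Return 1 where the model thinks label 1 (Lex) is more likely, else 0."""
    w, b, mu, sigma = model
    z = ((np.asarray(X, dtype=np.float64) - mu) / sigma) @ w + b
    return (z > 0).astype(int)

# Placeholder data: 20 fake "frequency bin" features per clip, with class 1
# shifted in the first five bins. Real features would come from labeled clips.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 20))
X1 = rng.normal(0.0, 1.0, size=(100, 20))
X1[:, :5] += 2.0
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(100), np.ones(100)])

model = train_logistic(X, y)
accuracy = (predict(X, model) == y).mean()
print(f"training accuracy on placeholder data: {accuracy:.2f}")
```

If this baseline separates the two speakers at all on real clips, it is worth holding out a few whole episodes for testing, since clips from the same episode are not independent.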