How lazy can you possibly get? Well, this is my story of how to transcribe the text from boring videos and check it for keywords before even watching the video or listening to the audio.
To start with, the first cornerstone was to actually get the video. Most streaming players use HLS, which relies heavily on playlist files with the m3u8 extension (those who remember playlists in WinAmp might recognize it). The playlist lists all the video segments that will be streamed, and its URL serves as the base URL for them.
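To make the playlist idea concrete, here is a minimal sketch of how segment URLs can be resolved from an m3u8 file. The playlist text, the example.org URL and the function name `segment_urls` are made up for illustration; real playlists carry more tags and may point to further playlists.

```python
from urllib.parse import urljoin


def segment_urls(playlist_url: str, playlist_text: str) -> list[str]:
    """Resolve every segment URI in a media playlist against the playlist URL."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are URIs.
        if not line or line.startswith("#"):
            continue
        urls.append(urljoin(playlist_url, line))
    return urls


# Hypothetical playlist content, only for demonstration.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    for url in segment_urls("http://example.org/stream/playlist.m3u8", SAMPLE):
        print(url)
```

Relative segment names are joined with the playlist's own location, which is why knowing the m3u8 URL is enough to reach the whole stream.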
If you hit “Play” on the media player with the DevTools Network tab open, you’d see a request for the .m3u8 playlist, followed by requests for the individual segments.
After some time googling around for Python/PHP HTTP bindings to fetch the content, the best solution turned out to be plain ffmpeg:
ffmpeg -i http://example.org/playlist.m3u8 -c copy -bsf:a aac_adtstoasc output.mp4
Once done, you can check the video for consistency (either by inspecting it with ffmpeg -i, or simply by scrubbing through the video).
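If you'd rather not scrub through the file by hand, the check can be roughly automated by comparing the reported duration with what you expect. This is only a sketch: it assumes ffprobe (shipped alongside ffmpeg) is on your PATH, and the helper names and the 5-second tolerance are my own choices.

```python
import subprocess


def ffprobe_duration_cmd(path: str) -> list[str]:
    """Build an ffprobe call that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def media_duration_seconds(path: str) -> float:
    """Run ffprobe and parse the duration it prints."""
    result = subprocess.run(
        ffprobe_duration_cmd(path), capture_output=True, text=True, check=True
    )
    return float(result.stdout.strip())


def looks_complete(actual: float, expected: float, tolerance: float = 5.0) -> bool:
    """True if the downloaded length is within `tolerance` seconds of the expected one."""
    return abs(actual - expected) <= tolerance


if __name__ == "__main__":
    # 1.5 hours expressed in seconds; adjust to the stream you grabbed.
    print(looks_complete(media_duration_seconds("output.mp4"), expected=5400.0))
```

A matching duration does not prove every segment is intact, but it catches downloads that were cut short.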
To keep the rest of the procedure lightweight, we strip the video and keep only an mp3 audio stream, using `ffmpeg` once again:
ffmpeg -i output.mp4 -b:a 192K -vn music.mp3
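If you end up doing this for more than one video, the two ffmpeg calls can be wrapped in a small script. A minimal sketch, assuming ffmpeg is on your PATH; the function names and default file names are placeholders of mine, while the flags are the same ones used in the commands above.

```python
import subprocess


def download_cmd(playlist_url: str, video_path: str) -> list[str]:
    """Copy the HLS stream into an mp4 container without re-encoding."""
    return ["ffmpeg", "-i", playlist_url, "-c", "copy",
            "-bsf:a", "aac_adtstoasc", video_path]


def extract_audio_cmd(video_path: str, audio_path: str, bitrate: str = "192K") -> list[str]:
    """Drop the video stream (-vn) and encode the audio at the given bitrate."""
    return ["ffmpeg", "-i", video_path, "-b:a", bitrate, "-vn", audio_path]


def run_pipeline(playlist_url: str, video_path: str = "output.mp4",
                 audio_path: str = "music.mp3") -> None:
    """Run both steps in order; check=True stops on the first ffmpeg failure."""
    for cmd in (download_cmd(playlist_url, video_path),
                extract_audio_cmd(video_path, audio_path)):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("http://example.org/playlist.m3u8")
```

Passing the command as a list avoids shell quoting problems with URLs that contain `&` or `?`.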
Now that the mp3 is ready to be checked, Amazon Transcribe kicks in, but you need to store your mp3 somewhere first. The easiest way is to get yourself an S3 bucket from Amazon and give Transcribe the S3 URL of the file.
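For those who prefer scripting this over clicking through the console, here is a rough sketch using boto3. It assumes boto3 is installed and AWS credentials are already configured; the bucket, key and job names are placeholders, and the `en-US` language code is my assumption about the recording, not something fixed by the workflow.

```python
def job_params(job_name: str, bucket: str, key: str,
               max_speakers: int = 0, language: str = "en-US") -> dict:
    """Build the arguments for Transcribe's start_transcription_job call."""
    params = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": "mp3",
        "LanguageCode": language,  # assumption: English audio
    }
    # Speaker identification is optional and needs a speaker count of at least 2.
    if max_speakers >= 2:
        params["Settings"] = {
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        }
    return params


def upload_and_transcribe(local_path: str, bucket: str, key: str,
                          job_name: str, max_speakers: int = 0) -> None:
    """Upload the mp3 to S3, then start a transcription job that points at it."""
    import boto3  # third-party; imported here so the helper above works without it

    boto3.client("s3").upload_file(local_path, bucket, key)
    boto3.client("transcribe").start_transcription_job(
        **job_params(job_name, bucket, key, max_speakers)
    )


if __name__ == "__main__":
    # Placeholder names; replace them with your own bucket and job name.
    upload_and_transcribe("music.mp3", "my-bucket", "music.mp3",
                          "boring-video-1", max_speakers=2)
```

The job runs asynchronously, so the script returns right away; the status can be polled later with `get_transcription_job`.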
The overall result for the same 1.5-hour video converted into transcribed text, with speaker identification enabled or disabled: approximately 25-30 minutes to get a 1.5 MB JSON file of the text, with separate spk_1|spk_2 labels and time codes.
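And since the whole point was checking for keywords without watching anything, here is a minimal sketch of searching the resulting JSON. It relies on the `results.items` list that Transcribe puts in its output, where each spoken word has `start_time` and `alternatives`; the sample data and the `find_keywords` name are invented for the demo.

```python
import json


def find_keywords(transcript: dict, keywords) -> list[tuple[str, float]]:
    """Return (keyword, start time in seconds) for every spoken match."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for item in transcript["results"]["items"]:
        # Punctuation items carry no time codes, so skip them.
        if item.get("type") != "pronunciation":
            continue
        word = item["alternatives"][0]["content"].lower()
        if word in wanted:
            hits.append((word, float(item["start_time"])))
    return hits


# Tiny hand-written sample in the shape of a Transcribe result (not real output).
SAMPLE = json.loads("""
{"results": {"items": [
  {"type": "pronunciation", "start_time": "12.10", "end_time": "12.40",
   "alternatives": [{"content": "The"}]},
  {"type": "pronunciation", "start_time": "12.50", "end_time": "13.00",
   "alternatives": [{"content": "Budget"}]},
  {"type": "punctuation", "alternatives": [{"content": "."}]}
]}}
""")

if __name__ == "__main__":
    for word, seconds in find_keywords(SAMPLE, ["budget"]):
        print(f"{word} at {seconds:.1f}s")
```

With the time codes in hand you can jump straight to the interesting part of the recording and skip the rest.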