Score:1

I want FFmpeg to stop recording from the mic when it detects silence below a threshold

I am trying to write a real-time speech-to-text transcription script. The script below works for a prerecorded WAV file.

 from asrecognition import ASREngine
 asr = ASREngine("tr", model_path="mpoyraz/wav2vec2-xls-r-300m-cv6-turkish")
 audio_paths = ["prerecorded.wav"]
 transcriptions = asr.transcribe(audio_paths)
 print(transcriptions)

But I want to capture voice from the microphone. When silence has been detected for a while, recording must stop and the audio must be piped to the speech recognition engine for transcription. Then recording from the mic must restart.

I thought FFmpeg could achieve it, but how?

Score:1

You could achieve this with a combination of PulseAudio and FFmpeg:

**Code based on the Python pulsectl library**

    import pulsectl

    pulse = pulsectl.Pulse("Test1")

  1. Retrieve `pulse.sink_input_list()`, which is non-empty only while a stream is attached to a sink, and store it in `pulseSinkInputInfoList`.

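    import subprocess

    pulseSinkInputInfoList = pulse.sink_input_list()
    # Monitor source of the sink that the first stream is attached to
    monName = pulse.sink_info(pulseSinkInputInfoList[0].sink).monitor_source_name
    for pulseSourceInfo in pulse.source_list():
        if pulseSourceInfo.name == monName:
            mos = pulseSourceInfo.index
            while True:
                # Peak level (0.0 to 1.0) seen on the source during a 0.2 s window
                peak = pulse.get_peak_sample(mos, 0.2)
                if peak > 0:
                    # Sound detected: record with ffmpeg (runs until ffmpeg is stopped).
                    # The input name is an example; substitute your own source name.
                    subprocess.run(["ffmpeg", "-f", "pulse",
                                    "-i", "alsa_input.pci-0000_00_1b.0.analog-stereo",
                                    "-ac", "1", "recording.m4a"])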
    

This is really a Stack Overflow topic, so I kept it short, just to give you the gist. The basic idea is to use PulseAudio to detect silence (or its opposite) in the "peak" line, and then run an ffmpeg command to record a fragment. To my knowledge you'll end up with a bunch of short clips, which you can join afterwards using ffmpeg's concat protocol.
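The peak-polling idea can be sketched as a small function that does not touch any hardware. This is only a sketch under assumptions: the name `segment_by_silence`, the threshold of 0.02 and the number of silent polls are illustrative and not part of pulsectl or ffmpeg. The input is a list of peak values such as those returned by `pulse.get_peak_sample`.

```python
def segment_by_silence(peaks, threshold=0.02, max_silent_polls=5):
    """Return (start, end) index pairs of spans that contain sound.

    peaks: peak levels (0.0 to 1.0), one per polling interval.
    A span is closed once `max_silent_polls` consecutive polls stay
    at or below `threshold`. `end` is exclusive and leaves out the
    trailing silence.
    """
    segments = []
    start = None   # index where the current span began, None while idle
    silent = 0     # consecutive quiet polls inside the current span
    for i, peak in enumerate(peaks):
        if peak > threshold:
            if start is None:
                start = i          # sound begins: this is where recording would start
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= max_silent_polls:
                # long enough silence: this is where recording would stop
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:
        # input ended while a span was still open
        segments.append((start, len(peaks) - silent))
    return segments
```

In a live loop the same state machine would presumably start ffmpeg when a span opens, stop it when the span closes, hand the resulting file to `asr.transcribe`, and then go back to polling.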

See the FFmpeg documentation on the pulse input device and the concat protocol.

An implementation of that protocol in Python can be found here
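For joining the fragments, a minimal sketch of building the ffmpeg call for the concat protocol might look like the following. The helper name `concat_command` and the file names are hypothetical. Note that the concat protocol joins files at the byte level, so it suits formats such as MPEG-TS; M4A clips like `recording.m4a` would likely need to be recorded in, or converted to, such a format first, or joined with ffmpeg's concat demuxer instead.

```python
def concat_command(parts, output):
    """Build an ffmpeg argument list that joins `parts` via the concat protocol."""
    if not parts:
        raise ValueError("need at least one input file")
    # The concat protocol takes all inputs as one string separated by '|'
    source = "concat:" + "|".join(parts)
    # -c copy joins the streams without re-encoding them
    return ["ffmpeg", "-i", source, "-c", "copy", output]
```

The returned list can be passed to `subprocess.run(..., check=True)`, for example `subprocess.run(concat_command(["clip1.ts", "clip2.ts"], "joined.ts"), check=True)`.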

Thank you for the idea. Though I'm not a professional coder, Python is a library-rich programming language. I found this article, which is worth reading: https://medium.com/saarthi-ai/who-spoke-when-build-your-own-speaker-diarization-module-from-scratch-e7d725ee279
kanehekili
I didn't know something like this exists. Interesting read, thanks for the link