You could achieve this with a combination of PulseAudio and ffmpeg:
**Code based on the Python pulsectl lib**

    import pulsectl

    pulse = pulsectl.Pulse("Test1")

Retrieve the sink input list, which is only populated while a stream is connected to a sink, and store it in pulseSinkInputInfoList:

    pulseSinkInputInfoList = pulse.sink_input_list()
    # Monitor source of the sink that the first stream plays to
    monName = pulse.sink_info(pulseSinkInputInfoList[0].sink).monitor_source_name
    sources = pulse.source_list()
    for pulseSourceInfo in sources:
        if pulseSourceInfo.name == monName:
            while True:
                mos = pulseSourceInfo.index
                # Peak level (0 to 1.0) sampled over 0.2 seconds
                peak = pulse.get_peak_sample(mos, 0.2)
                if peak > 0:
                    pass  # sound detected: execute ffmpeg here

Execute ffmpeg like:

    ffmpeg -f pulse -i alsa_input.pci-0000_00_1b.0.analog-stereo -ac 1 recording.m4a
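To make the "detect a peak, then call ffmpeg" step a bit more concrete, here is a minimal sketch of how I would wire the two together. The helper names (`build_record_cmd`, `record_on_sound`, `main`), the clip length, the clip file names and the choice to record from the monitor source are all my own assumptions for illustration. You could just as well pass your microphone's source name instead.

```python
import subprocess


def build_record_cmd(source_name, out_path, seconds):
    """Return an ffmpeg argument list that records `seconds` of mono audio."""
    return [
        "ffmpeg", "-y",                    # -y: overwrite an existing file
        "-f", "pulse", "-i", source_name,  # read from a PulseAudio source
        "-ac", "1",                        # downmix to one channel
        "-t", str(seconds),                # stop after this many seconds
        out_path,
    ]


def record_on_sound(get_peak, source_name, clip_seconds=5, max_clips=3,
                    run=subprocess.run):
    """Poll get_peak() and record one clip each time it reports sound."""
    clips = []
    while len(clips) < max_clips:
        if get_peak() > 0:
            out_path = f"clip_{len(clips):03d}.m4a"
            run(build_record_cmd(source_name, out_path, clip_seconds), check=True)
            clips.append(out_path)
    return clips


def main():
    # Imported here so the helpers above work without pulsectl installed.
    import pulsectl  # third-party: pip install pulsectl

    with pulsectl.Pulse("peak-recorder") as pulse:
        sink_inputs = pulse.sink_input_list()
        if not sink_inputs:
            raise SystemExit("no stream is currently playing to a sink")
        # Assumption: watch and record the monitor of the first stream's sink.
        monitor = pulse.sink_info(sink_inputs[0].sink).monitor_source_name
        clips = record_on_sound(
            lambda: pulse.get_peak_sample(monitor, 0.2),  # blocks ~0.2 s per call
            monitor,
        )
        print(clips)
```

Call `main()` to try it. Because `get_peak_sample` blocks for the given timeout, the loop does not spin at full speed while everything is silent.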
This is more of a Stack Overflow topic, so I kept it short, just to give you the gist of it.
The basic idea is to use PulseAudio to detect silence (or the opposite) via the "peak" line, and then execute an ffmpeg command to record a fragment. To my knowledge you'll end up with a bunch of short clips, which you can join afterwards using ffmpeg's concat protocol.
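For the joining step, here is a rough sketch. Note that it uses ffmpeg's concat demuxer (`-f concat` with a list file) rather than the `concat:` protocol. As far as I know, the protocol joins files at the byte level, which suits formats like MPEG-TS but usually not MP4-based files such as .m4a. The function names and the `clips.txt` / `joined.m4a` file names are placeholders of my own.

```python
import subprocess
from pathlib import Path


def concat_list_text(clips):
    """Build the list-file content for the concat demuxer, one clip per line.

    Assumes the file names contain no single quotes (they would need escaping).
    """
    return "".join(f"file '{clip}'\n" for clip in clips)


def build_concat_cmd(list_path, out_path):
    """Return an ffmpeg argument list that joins the listed clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # -safe 0: also accept absolute paths
        "-i", list_path,
        "-c", "copy",                  # copy the streams instead of re-encoding
        out_path,
    ]


def join_clips(clips, out_path="joined.m4a", list_path="clips.txt",
               run=subprocess.run):
    """Write the list file and run ffmpeg to produce one joined file."""
    Path(list_path).write_text(concat_list_text(clips))
    run(build_concat_cmd(list_path, out_path), check=True)
    return out_path
```

Something like `join_clips(["clip_000.m4a", "clip_001.m4a"])` would then produce `joined.m4a`, provided all clips were recorded with the same codec settings.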
Info on ffmpeg's pulse input device and concat protocol
An implementation of that protocol in Python can be found here