Vosk's basic example took forever and transcribed nothing. Whisper is at least producing text, but the "medium" model has taken 1.5 hours for 30 minutes of audio, which works out to about three times real time. It did say it would take double the time, so I guess that's sort of in the ballpark?
Anyway, the future is not as futuristic as one might expect.