docker build -t stt-service .
docker run -it –rm –name stt-container stt-service
docker exec stt-container touch /tmp/sl5_record.trigger
Trying to containerize the application with Docker is a fantastic “fancy” step. It’s the ultimate way to solve the “it works on my machine” problem by packaging the application and all its dependencies into a single, portable image.
However, we will run into some fundamental challenges because this application is designed to interact with the host’s desktop (audio, keyboard). This is something Docker is explicitly designed to prevent.
How to Build and Run the Docker Image¶
Build the image: Open a terminal in your project root and run:
docker build -t stt-service .
Run the container:
docker run -it --rm --name stt-container stt-service
The Result: What Works and What (Critically) Does Not¶
With some luck, the container will build and run. You should see the log output from dictation_service.py
indicating that it has started, loaded the models, and is now waiting.
This is a partial success! The core Python application and its dependencies are running in a perfectly isolated environment.
HOWEVER, the application is now fundamentally broken due to Docker’s design:
NO Microphone Access: The container is isolated from your host’s hardware. The
sounddevice
library will fail when it tries to find an input device.Workaround (Linux only): You can try to mount the host’s sound device into the container by adding
--device /dev/snd
to yourdocker run
command. This is complex and host-specific.
NO Typing Output (
xdotool
): The container has no access to your host’s desktop environment or windows. It cannot “type” text into another application. This functionality is completely broken by design.NO Desktop Notifications (
notify-send
): Same as above. The container cannot send notifications to your host’s desktop.NO File Trigger (
inotify
): Theinotify
-based file trigger will not work as you expect. You cannot simplytouch /tmp/sl5_record.trigger
on your host machine. You would have to use a separate command to create the file inside the running container:docker exec stt-container touch /tmp/sl5_record.trigger
Conclusion: “Fancy” but Fundamentally Incompatible¶
Creating this Dockerfile proves that the application’s core logic can be packaged. However, it also proves that the application’s current design—relying on direct hardware (mic) and desktop (typing, notifications) interaction—is fundamentally incompatible with containerization.
To make this truly work in Docker, the application would need to be re-architected:
Instead of listening to a local mic, it would need to accept an audio stream over the network (e.g., via a web API).
Instead of typing text with
xdotool
, it would need to return the transcribed text via that same web API.