# Whisper Transcription TUI
A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both cloud-based API and local processing.
## Features
- **Dual Transcription Methods:**
  - OpenAI Whisper API (fast, cloud-based, billed per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection:** Browse and select single or multiple audio/video files
- **Persistent Configuration:** Remembers API keys and the output directory between sessions
- **Beautiful TUI:** Modern terminal interface similar to OpenCode
- **Markdown Output:** Transcriptions saved as markdown files with metadata headers
- **Multi-file Processing:** Batch-transcribe multiple files sequentially
## Supported Formats

### Audio

MP3, WAV, M4A, FLAC, OGG, WMA, AAC

### Video

MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V
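In code, checking whether a file is eligible boils down to comparing its extension against these lists; a minimal sketch (the `is_supported` function name is illustrative, not the app's actual API):

```python
from pathlib import Path

# Extension whitelists from the lists above (lowercase, with leading dots)
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v"}

def is_supported(path: str) -> bool:
    """Return True if the file's extension is in the supported set."""
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS
```

The lowercase comparison means `meeting.MP4` and `meeting.mp4` are treated the same.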
## Installation

### Prerequisites

- Python 3.8 or higher
- FFmpeg (for audio/video processing)

### FFmpeg Installation
#### Windows

```bash
# Using Chocolatey
choco install ffmpeg

# Or using Scoop
scoop install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

#### macOS

```bash
brew install ffmpeg
```

#### Linux

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch
sudo pacman -S ffmpeg
```
### Python Package Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd 00_Whisper
   ```

2. Create and activate a virtual environment:

   ```bash
   # Windows
   python -m venv venv
   venv\Scripts\activate

   # macOS/Linux
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Configuration

### OpenAI API Key Setup

1. Get your API key from the OpenAI Platform
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key is saved to a `.env` file and reused in future sessions

⚠️ **Important:** Never commit your `.env` file to version control. It's already in `.gitignore`.
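The persistence step can be sketched with the standard library alone (the real app may well use `python-dotenv`; `save_api_key` and `load_api_key` are hypothetical names):

```python
from pathlib import Path
from typing import Optional

def save_api_key(key: str, env_path: str = ".env") -> None:
    """Persist the key, replacing any existing OPENAI_API_KEY line."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    lines = [l for l in lines if not l.startswith("OPENAI_API_KEY=")]
    lines.append(f"OPENAI_API_KEY={key}")
    path.write_text("\n".join(lines) + "\n")

def load_api_key(env_path: str = ".env") -> Optional[str]:
    """Read the key back from the .env file, if present."""
    path = Path(env_path)
    if not path.exists():
        return None
    for line in path.read_text().splitlines():
        if line.startswith("OPENAI_API_KEY="):
            return line.split("=", 1)[1]
    return None
```

Rewriting the whole file (rather than appending) keeps the `.env` from accumulating stale key lines.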
### Local Whisper Model

You can configure the local Whisper model size by editing your `.env` file:

```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```

Larger models are more accurate but require more memory and time.
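Reading that setting back might look like this (a sketch; `resolve_model_name` is a hypothetical helper, and the resolved name would typically be passed to `whisper.load_model`):

```python
import os

# The sizes listed in the .env comment above
VALID_MODELS = ("tiny", "base", "small", "medium", "large")

def resolve_model_name(default: str = "base") -> str:
    """Read WHISPER_MODEL from the environment, falling back to the default."""
    name = os.environ.get("WHISPER_MODEL", default).strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(f"Unknown Whisper model: {name!r} (choose from {VALID_MODELS})")
    return name
```

Validating here gives a clear error message instead of a failed model download later.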
## Usage

### Running the Application

```bash
python main.py
```
### Basic Workflow

1. **Select Output Directory** (first time only)
   - Navigate using the arrow keys
   - Press "Select" to choose a directory for transcription outputs

2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue
   - Add one file or several
   - Click "Continue" when done

3. **Choose Transcription Method**
   - Select between the OpenAI API (fast, paid) and Local Whisper (free, slower)
   - If using OpenAI and no key is configured, you'll be prompted to enter one

4. **View Progress**
   - Monitor transcription progress in real time
   - Each file is processed sequentially

5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory
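The sequential batch step above can be sketched as a simple loop (names like `transcribe_batch` are illustrative, not the app's actual API):

```python
from pathlib import Path
from typing import Callable, List, Tuple

def transcribe_batch(
    files: List[str],
    transcribe: Callable[[str], str],  # backend: path in, transcript text out
    output_dir: str,
) -> List[Tuple[str, bool]]:
    """Process files one at a time; return (path, succeeded) for each."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    for f in files:
        try:
            text = transcribe(f)
            target = out / f"{Path(f).stem}_transcription.md"
            target.write_text(text)
            results.append((f, True))
        except Exception:
            results.append((f, False))  # keep going: one failure shouldn't stop the batch
    return results
```

Catching per-file exceptions is what lets the queue continue past a corrupt or unreadable file.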
## Output Format

Each transcription is saved as a markdown file named `<original filename>_transcription.md`:
```markdown
# Transcription: meeting.mp4

**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847

---

[Transcribed text here...]
```
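A sketch of how such a header might be assembled (the helper name `format_transcription` is hypothetical, not the app's actual formatter API):

```python
from datetime import datetime

def format_transcription(source: str, text: str, duration: str, language: str) -> str:
    """Build a markdown document matching the layout shown above."""
    word_count = len(text.split())
    header = "\n".join([
        f"# Transcription: {source}",
        "",
        f"**Source File:** {source}",
        f"**Date:** {datetime.now().isoformat()}",  # ISO timestamp, as in the example
        f"**Duration:** {duration}",
        f"**Language:** {language}",
        f"**Word Count:** {word_count}",
        "",
        "---",
        "",
    ])
    return header + "\n" + text
```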
## Keyboard Shortcuts

- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate the selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit the application at any time
## Troubleshooting

### "API key not found" error

Make sure you've entered your OpenAI API key when prompted, and check that your `.env` file contains:

```bash
OPENAI_API_KEY=your_key_here
```

### "File format not supported" error

Check that your file's extension is in the supported formats list. The app relies on proper file extensions to recognize formats.

### Local Whisper is very slow

This is normal for larger model sizes. Try the "tiny" or "base" model in your `.env`:

```bash
WHISPER_MODEL=tiny
```

### FFmpeg not found error

Ensure FFmpeg is installed and on your system PATH. Test by running:

```bash
ffmpeg -version
```
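The same check can be done programmatically (a sketch, not the app's actual startup logic):

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on the system PATH."""
    return shutil.which("ffmpeg") is not None
```

`shutil.which` performs the same PATH lookup the shell does, so it agrees with running `ffmpeg -version` by hand.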
### Permission denied errors

Ensure the output directory is writable and the application has permission to create files there.
## Project Structure

```
.
├── main.py               # Entry point
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── .gitignore            # Git ignore rules
├── README.md             # This file
└── src/
    ├── __init__.py
    ├── app.py            # Main TUI application
    ├── config.py         # Configuration management
    ├── transcriber.py    # Transcription service
    ├── formatter.py      # Markdown formatting
    └── file_handler.py   # File handling utilities
```
## Development

### Running Tests

```bash
# Coming soon
```

### Building Documentation

```bash
# Coming soon
```
## Performance Tips

- **Use Local Whisper for small files:** Local processing is free and quick enough for short audio
- **Use the OpenAI API for accuracy:** The cloud model handles edge cases better
- **Set an appropriate model size:** Larger models are more accurate but slower and use more memory
- **Batch your processing:** Queue multiple files to amortize setup time
## API Costs

### OpenAI Whisper API

- Pricing at the time of writing: $0.006 per minute of audio
- Minimum charge per request

Check OpenAI's pricing page for current rates.
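Estimating a file's cost before uploading is simple arithmetic (a sketch; the default rate is an assumption baked in here — always check OpenAI's pricing page):

```python
def estimate_api_cost(duration_seconds: float, rate_per_minute: float = 0.006) -> float:
    """Estimate the API cost in USD for an audio file of the given length.

    The default rate is an assumption; pass the current per-minute rate explicitly.
    """
    return round(duration_seconds / 60 * rate_per_minute, 4)
```

For example, a 10-minute recording at the default rate comes to 10 × $0.006 = $0.06.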
## Limitations

- Local Whisper models can be large (roughly 3 GB for "large")
- First-time local Whisper use downloads the model (~140 MB for "base")
- The OpenAI API requires an internet connection and active API credits
- Batch processing is sequential (one file at a time)
## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

## License

This project is provided as-is for educational and personal use.

## Support

For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.

## Acknowledgments

- OpenAI Whisper for the transcription technology
- Textual for the TUI framework
- OpenCode for UI/UX inspiration