# Whisper Transcription TUI

A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both cloud-based API and local processing.

## Features

- **Dual Transcription Methods**:
  - OpenAI Whisper API (fast, cloud-based, paid per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection**: Browse and select single or multiple audio/video files
- **Persistent Configuration**: Remember API keys and output directory between sessions
- **Beautiful TUI**: Modern terminal interface similar to OpenCode
- **Markdown Output**: Transcriptions saved as markdown with metadata headers
- **Multi-file Processing**: Batch transcribe multiple files sequentially

## Supported Formats

### Audio
MP3, WAV, M4A, FLAC, OGG, WMA, AAC

### Video
MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V

## Installation

### Prerequisites

- Python 3.8 or higher
- FFmpeg (for audio/video processing)

### FFmpeg Installation

#### Windows

```bash
# Using Chocolatey
choco install ffmpeg

# Or using Scoop
scoop install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

#### macOS

```bash
brew install ffmpeg
```

#### Linux

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch
sudo pacman -S ffmpeg
```

### Python Package Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd 00_Whisper
   ```

2. Create and activate a virtual environment:

   ```bash
   # Windows
   python -m venv venv
   venv\Scripts\activate

   # macOS/Linux
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Configuration

### OpenAI API Key Setup

1. Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key will be saved to the `.env` file and reused in future sessions

⚠️ **Important**: Never commit your `.env` file to version control. It's already in `.gitignore`.

### Local Whisper Model

You can configure the local Whisper model size by editing your `.env` file:

```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```

Larger models are more accurate but require more memory and time.

## Usage

### Running the Application

```bash
python main.py
```

### Basic Workflow

1. **Select Output Directory** (first time only)
   - Navigate using arrow keys
   - Press "Select" to choose a directory for transcription outputs
2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue
   - Add multiple files or just one
   - Click "Continue" when done
3. **Choose Transcription Method**
   - Select between OpenAI API (fast, paid) and Local Whisper (free, slower)
   - If using OpenAI and no key is configured, you'll be prompted to enter it
4. **View Progress**
   - Monitor transcription progress in real time
   - Each file is processed sequentially
5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory

### Output Format

Each transcription is saved as a markdown file named `{filename}_transcription.md`:

```markdown
# Transcription: meeting.mp4

**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847

---

[Transcribed text here...]
```
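As an illustration, a header like this takes only a few lines of Python to produce. This is a minimal sketch: `format_transcription` and its parameters are hypothetical names chosen for the example, not the actual interface of `src/formatter.py`.

```python
from datetime import datetime
from pathlib import Path


def format_transcription(source: Path, text: str, duration: str, language: str) -> str:
    """Render transcribed text as markdown with the metadata header shown above."""
    return (
        f"# Transcription: {source.name}\n\n"
        f"**Source File:** {source.name}\n"
        f"**Date:** {datetime.now().isoformat()}\n"
        f"**Duration:** {duration}\n"
        f"**Language:** {language}\n"
        f"**Word Count:** {len(text.split())}\n\n"
        "---\n\n"
        f"{text}\n"
    )


# Hypothetical usage, following the documented naming scheme:
source = Path("meeting.mp4")
markdown = format_transcription(source, "[Transcribed text here...]", "1:23:45", "en")
Path(f"{source.stem}_transcription.md").write_text(markdown, encoding="utf-8")
```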
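The transcribed text itself comes from one of the two methods described in the workflow. Under the same caveat (function names are illustrative, and this assumes the official `openai` v1+ client and the `openai-whisper` package rather than the project's actual `src/transcriber.py`), the two paths look roughly like this:

```python
import os


def transcribe_api(path: str) -> str:
    """Cloud method: upload the file to OpenAI's hosted whisper-1 model."""
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(model="whisper-1", file=audio).text


def transcribe_local(path: str) -> str:
    """Offline method: run the open-source Whisper model on this machine."""
    import whisper

    model = whisper.load_model(os.getenv("WHISPER_MODEL", "base"))
    return model.transcribe(path)["text"]
```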
## Keyboard Shortcuts

- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate the selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit the application at any time

## Troubleshooting

### "API key not found" error

Make sure you've entered your OpenAI API key when prompted. Check that your `.env` file contains:

```
OPENAI_API_KEY=your_key_here
```

### "File format not supported" error

Check that your file extension is in the supported formats list. The app relies on proper file extensions to recognize formats.

### Local Whisper is very slow

This is normal for larger model sizes. Try using the "tiny" or "base" model in your `.env`:

```
WHISPER_MODEL=tiny
```

### FFmpeg not found error

Ensure FFmpeg is installed and in your system PATH. Test by running:

```bash
ffmpeg -version
```

### Permission denied errors

Ensure the output directory is writable and the application has permission to create files there.

## Project Structure

```
.
├── main.py              # Entry point
├── requirements.txt     # Python dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
├── README.md            # This file
└── src/
    ├── __init__.py
    ├── app.py           # Main TUI application
    ├── config.py        # Configuration management
    ├── transcriber.py   # Transcription service
    ├── formatter.py     # Markdown formatting
    └── file_handler.py  # File handling utilities
```

## Development

### Running Tests

```bash
# Coming soon
```

### Building Documentation

```bash
# Coming soon
```

## Performance Tips

1. **Use Local Whisper for Small Files**: Local processing is free and reasonably fast for short audio files
2. **Use the OpenAI API for Accuracy**: The cloud version handles edge cases better
3. **Set an Appropriate Model Size**: Larger models are more accurate but slower and use more memory
4. **Batch Processing**: Process multiple files in one session to amortize setup time

## API Costs

### OpenAI Whisper API

- Pricing at the time of writing: $0.006 per minute of audio, rounded to the nearest second

Check [OpenAI Pricing](https://openai.com/pricing/) for current rates.

## Limitations

- Local Whisper models can be large (roughly 3 GB for "large")
- First-time local Whisper setup downloads the model (about 150 MB for "base")
- The OpenAI API requires an internet connection and active API credits
- Batch processing is sequential (one file at a time)

## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

## License

This project is provided as-is for educational and personal use.

## Support

For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for transcription technology
- [Textual](https://github.com/textualize/textual) for the TUI framework
- [OpenCode](https://opencode.ai) for UI/UX inspiration