NicholaiVogel c5ccd14048 feat: Initial Whisper Transcription TUI implementation (2025-10-20 16:43:35 -06:00)
- Added Textual-based TUI with file selection and progress monitoring
- Implemented transcription service with OpenAI API and local Whisper backends
- Added markdown formatter for transcription output
- Configuration management for persistent API keys and output directory
- Comprehensive README with installation and usage instructions
- Support for multi-file batch processing
- Beautiful terminal UI with modal dialogs for user input


# Whisper Transcription TUI
A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both cloud-based API and local processing.
## Features
- **Dual Transcription Methods**:
  - OpenAI Whisper API (fast, cloud-based, paid per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection**: Browse and select single or multiple audio/video files
- **Persistent Configuration**: Remember API keys and output directory between sessions
- **Beautiful TUI**: Modern terminal interface similar to OpenCode
- **Markdown Output**: Transcriptions saved as markdown with metadata headers
- **Multi-file Processing**: Batch transcribe multiple files sequentially
## Supported Formats
### Audio
MP3, WAV, M4A, FLAC, OGG, WMA, AAC
### Video
MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V
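
The app recognizes a file's format by its extension. A minimal membership check could look like the following sketch (the `SUPPORTED_EXTENSIONS` set and `is_supported` helper are illustrative names, not the app's actual API):

```python
from pathlib import Path

# Extensions from the lists above; lowercase for case-insensitive matching.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac",          # audio
    ".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v",  # video
}

def is_supported(path: str) -> bool:
    """Return True if the file extension is a supported audio/video format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```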
## Installation
### Prerequisites
- Python 3.8 or higher
- FFmpeg (for audio/video processing)
### FFmpeg Installation
#### Windows
```bash
# Using Chocolatey
choco install ffmpeg
# Or using Scoop
scoop install ffmpeg
# Or download from https://ffmpeg.org/download.html
```
#### macOS
```bash
brew install ffmpeg
```
#### Linux
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Fedora
sudo dnf install ffmpeg
# Arch
sudo pacman -S ffmpeg
```
### Python Package Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd 00_Whisper
```
2. Create and activate a virtual environment:
```bash
# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
## Configuration
### OpenAI API Key Setup
1. Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key will be saved to `.env` file and reused in future sessions
⚠️ **Important**: Never commit your `.env` file to version control. It's already in `.gitignore`.
### Local Whisper Model
You can configure the local Whisper model size by editing your `.env` file:
```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```
Larger models are more accurate but require more memory and time.
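
Reading that setting with a safe fallback could be as simple as this sketch (the `get_model_name` helper is a hypothetical name; only the `WHISPER_MODEL` variable and its values come from the config above):

```python
import os

VALID_MODELS = ("tiny", "base", "small", "medium", "large")

def get_model_name() -> str:
    """Read WHISPER_MODEL from the environment, falling back to 'base'."""
    model = os.environ.get("WHISPER_MODEL", "base").strip().lower()
    return model if model in VALID_MODELS else "base"
```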
## Usage
### Running the Application
```bash
python main.py
```
### Basic Workflow
1. **Select Output Directory** (first time only)
   - Navigate using arrow keys
   - Press "Select" to choose a directory for transcription outputs
2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue
   - Add multiple files or just one
   - Click "Continue" when done
3. **Choose Transcription Method**
   - Select between OpenAI API (fast, paid) or Local Whisper (free, slower)
   - If using OpenAI and no key is configured, you'll be prompted to enter it
4. **View Progress**
   - Monitor transcription progress in real-time
   - Each file is processed sequentially
5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory
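
The batch step above boils down to a sequential loop over the queued files. A simplified sketch, where the `transcribe` callable and the progress reporting are placeholders rather than the app's actual interface:

```python
from typing import Callable

def process_queue(files: list[str], transcribe: Callable[[str], str]) -> dict[str, str]:
    """Transcribe each queued file in order, collecting results per file."""
    results: dict[str, str] = {}
    for i, path in enumerate(files, start=1):
        print(f"[{i}/{len(files)}] transcribing {path} ...")
        try:
            results[path] = transcribe(path)
        except Exception as err:  # one bad file should not abort the batch
            print(f"  failed: {err}")
    return results
```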
### Output Format
Each transcription is saved as a markdown file named after its source file: `<original_filename>_transcription.md` (e.g. `meeting_transcription.md` for `meeting.mp4`)
```markdown
# Transcription: meeting.mp4
**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847
---
[Transcribed text here...]
```
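
A formatter producing this header could look like the following sketch (the function name and signature are assumptions, not the project's actual `formatter.py` API):

```python
from datetime import datetime

def format_transcription(source: str, text: str, duration: str, language: str) -> str:
    """Render a transcription as markdown with a metadata header."""
    header = (
        f"# Transcription: {source}\n"
        f"**Source File:** {source}\n"
        f"**Date:** {datetime.now().isoformat()}\n"
        f"**Duration:** {duration}\n"
        f"**Language:** {language}\n"
        f"**Word Count:** {len(text.split())}\n"
        "---\n"
    )
    return header + text
```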
## Keyboard Shortcuts
- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit application at any time
## Troubleshooting
### "API key not found" error
Make sure you've entered your OpenAI API key when prompted. Check that your `.env` file contains:
```
OPENAI_API_KEY=your_key_here
```
### "File format not supported" error
Check that your file extension is in the supported formats list. The app needs proper file extensions to recognize formats.
### Local Whisper is very slow
This is normal for larger model sizes. Try using the "tiny" or "base" model in your `.env`:
```
WHISPER_MODEL=tiny
```
### FFmpeg not found error
Ensure FFmpeg is installed and in your system PATH. Test by running:
```bash
ffmpeg -version
```
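
The same check can be made programmatically at startup, e.g. with `shutil.which` (the helper name is illustrative):

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if the ffmpeg binary is found on the system PATH."""
    return shutil.which("ffmpeg") is not None
```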
### Permission denied errors
Ensure the output directory is writable and the application has permission to create files there.
## Project Structure
```
.
├── main.py              # Entry point
├── requirements.txt     # Python dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
├── README.md            # This file
└── src/
    ├── __init__.py
    ├── app.py           # Main TUI application
    ├── config.py        # Configuration management
    ├── transcriber.py   # Transcription service
    ├── formatter.py     # Markdown formatting
    └── file_handler.py  # File handling utilities
```
## Development
### Running Tests
```bash
# Coming soon
```
### Building Documentation
```bash
# Coming soon
```
## Performance Tips
1. **Use Local Whisper for Short Files**: Local processing is free, and short clips finish quickly even on modest hardware
2. **Use OpenAI API for Accuracy**: The cloud version handles edge cases better
3. **Set Appropriate Model Size**: Larger models are more accurate but slower and use more memory
4. **Batch Processing**: Process multiple files to amortize setup time
## API Costs
### OpenAI Whisper API
- Current pricing: $0.006 per minute of audio (at the time of writing)
Check [OpenAI Pricing](https://openai.com/pricing/) for current rates.
## Limitations
- Local Whisper models can be large (up to 3GB for "large")
- First-time local Whisper setup downloads the model (~1-2GB for "base")
- OpenAI API requires internet connection and active API credits
- Batch processing is sequential (one file at a time)
## Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues.
## License
This project is provided as-is for educational and personal use.
## Support
For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.
## Acknowledgments
- [OpenAI Whisper](https://github.com/openai/whisper) for transcription technology
- [Textual](https://github.com/textualize/textual) for the TUI framework
- [OpenCode](https://opencode.ai) for UI/UX inspiration