# Whisper Transcription TUI

A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both the cloud-based API and local processing.

## Features

- **Dual Transcription Methods**:
  - OpenAI Whisper API (fast, cloud-based, paid per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection**: Browse and select single or multiple audio/video files
- **Persistent Configuration**: API keys and the output directory are remembered between sessions
- **Beautiful TUI**: Modern terminal interface similar to OpenCode
- **Markdown Output**: Transcriptions are saved as markdown with metadata headers
- **Multi-file Processing**: Batch-transcribe multiple files sequentially

## Supported Formats

### Audio

MP3, WAV, M4A, FLAC, OGG, WMA, AAC

### Video

MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V

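As a sketch of how extension-based format detection might work (the sets below mirror the lists above; the app's actual logic may differ):

```python
from pathlib import Path

# Extensions taken from the supported-format lists above.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"}
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v"}

def is_supported(path: str) -> bool:
    """Return True if the file's extension is a supported audio/video format."""
    return Path(path).suffix.lower() in AUDIO_EXTS | VIDEO_EXTS
```

Note the `.lower()` call: detection is case-insensitive, so `TALK.MP3` is accepted as readily as `talk.mp3`.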
## Installation

### Prerequisites

- Python 3.8 or higher
- FFmpeg (for audio/video processing)

### FFmpeg Installation

#### Windows

```bash
# Using Chocolatey
choco install ffmpeg

# Or using Scoop
scoop install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

#### macOS

```bash
brew install ffmpeg
```

#### Linux

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch
sudo pacman -S ffmpeg
```

### Python Package Installation

1. Clone the repository:

```bash
git clone <repository-url>
cd 00_Whisper
```

2. Create and activate a virtual environment:

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

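For reference, a plausible `requirements.txt` for this stack might look like the following (package names are assumptions based on the features described, not the project's actual pinned list):

```text
textual
openai
openai-whisper
python-dotenv
```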
## Configuration

### OpenAI API Key Setup

1. Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key is saved to the `.env` file and reused in future sessions

⚠️ **Important**: Never commit your `.env` file to version control. It is already listed in `.gitignore`.

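A minimal sketch of how the key might be persisted to `.env`, assuming plain `KEY=value` lines (the function names are illustrative; the app's `config.py` may work differently):

```python
from pathlib import Path
from typing import Optional

def save_api_key(key: str, env_path: str = ".env") -> None:
    """Write OPENAI_API_KEY into the .env file, replacing any existing entry."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    lines = [l for l in lines if not l.startswith("OPENAI_API_KEY=")]
    lines.append(f"OPENAI_API_KEY={key}")
    path.write_text("\n".join(lines) + "\n")

def load_api_key(env_path: str = ".env") -> Optional[str]:
    """Read OPENAI_API_KEY back from the .env file, if present."""
    path = Path(env_path)
    if not path.exists():
        return None
    for line in path.read_text().splitlines():
        if line.startswith("OPENAI_API_KEY="):
            return line.split("=", 1)[1]
    return None
```

Rewriting the whole file rather than appending keeps the `.env` free of stale duplicate entries when the key changes.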
### Local Whisper Model

You can configure the local Whisper model size by editing your `.env` file:

```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```

Larger models are more accurate but require more memory and time.

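A sketch of how the app might read and validate this setting (assuming a plain environment lookup; the real `config.py` may load it via python-dotenv):

```python
import os

# The model sizes listed in the .env comments above.
VALID_MODELS = {"tiny", "base", "small", "medium", "large"}

def get_whisper_model(default: str = "base") -> str:
    """Read WHISPER_MODEL from the environment, falling back to the default."""
    model = os.environ.get("WHISPER_MODEL", default).strip().lower()
    if model not in VALID_MODELS:
        raise ValueError(f"Unsupported Whisper model: {model!r}")
    return model
```

Validating against the known set catches typos (e.g. `WHISPER_MODEL=larg`) at startup rather than failing mid-transcription.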
## Usage

### Running the Application

```bash
python main.py
```

### Basic Workflow

1. **Select Output Directory** (first time only)
   - Navigate using the arrow keys
   - Press "Select" to choose a directory for transcription output

2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue, one file or several
   - Click "Continue" when done

3. **Choose Transcription Method**
   - Select between the OpenAI API (fast, paid) and Local Whisper (free, slower)
   - If you choose OpenAI and no key is configured, you'll be prompted to enter one

4. **View Progress**
   - Monitor transcription progress in real time
   - Each file is processed sequentially

5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory

### Output Format

Each transcription is saved as a markdown file named after the source file, e.g. `meeting_transcription.md` for `meeting.mp4`:

```markdown
# Transcription: meeting.mp4

**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847

---

[Transcribed text here...]
```

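The layout above could be produced by a formatter along these lines (a sketch; the field names mirror the example, but the real `formatter.py` may differ):

```python
from datetime import datetime

def format_transcription(filename: str, text: str,
                         duration: str, language: str) -> str:
    """Render a transcription as markdown with a metadata header."""
    word_count = len(text.split())
    return (
        f"# Transcription: {filename}\n\n"
        f"**Source File:** {filename}\n"
        f"**Date:** {datetime.now().isoformat()}\n"
        f"**Duration:** {duration}\n"
        f"**Language:** {language}\n"
        f"**Word Count:** {word_count}\n\n"
        "---\n\n"
        f"{text}\n"
    )
```

Keeping the metadata above a `---` rule means downstream tools can split the header from the transcript body with a single string split.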
## Keyboard Shortcuts

- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate the selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit the application at any time

## Troubleshooting

### "API key not found" error

Make sure you've entered your OpenAI API key when prompted. Check that your `.env` file contains:

```
OPENAI_API_KEY=your_key_here
```

### "File format not supported" error

Check that your file's extension is in the supported formats list. The app relies on proper file extensions to recognize formats.

### Local Whisper is very slow

This is normal for larger model sizes. Try the "tiny" or "base" model in your `.env`:

```
WHISPER_MODEL=tiny
```

### FFmpeg not found error

Ensure FFmpeg is installed and on your system PATH. Test by running:

```bash
ffmpeg -version
```

### Permission denied errors

Ensure the output directory is writable and the application has permission to create files there.

## Project Structure

```
.
├── main.py              # Entry point
├── requirements.txt     # Python dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
├── README.md            # This file
└── src/
    ├── __init__.py
    ├── app.py           # Main TUI application
    ├── config.py        # Configuration management
    ├── transcriber.py   # Transcription service
    ├── formatter.py     # Markdown formatting
    └── file_handler.py  # File handling utilities
```

## Development

### Running Tests

```bash
# Coming soon
```

### Building Documentation

```bash
# Coming soon
```

## Performance Tips

1. **Use Local Whisper for Small Files**: Local processing is free, and short files finish quickly even on modest hardware
2. **Use the OpenAI API for Accuracy**: The cloud version handles difficult audio and edge cases better
3. **Set an Appropriate Model Size**: Larger models are more accurate but slower and use more memory
4. **Batch Processing**: Queue multiple files to amortize setup time

## API Costs

### OpenAI Whisper API

- Pricing at the time of writing: $0.006 per minute of audio
- Billed by audio duration per request

Check [OpenAI Pricing](https://openai.com/pricing/) for current rates.

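As a worked example of per-minute billing (the rate below is illustrative; always check the pricing page for current figures):

```python
def estimate_cost(duration_seconds: float, rate_per_minute: float) -> float:
    """Estimate the API cost of transcribing one file, billed per minute of audio."""
    return (duration_seconds / 60.0) * rate_per_minute

# A 90-minute recording at $0.006/min costs 90 * 0.006 = $0.54.
```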
## Limitations

- Local Whisper models can be large (up to ~3 GB for "large")
- First-time local Whisper use downloads the model (roughly 140 MB for "base", up to ~3 GB for "large")
- The OpenAI API requires an internet connection and active API credits
- Batch processing is sequential (one file at a time)

## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

## License

This project is provided as-is for educational and personal use.

## Support

For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription technology
- [Textual](https://github.com/textualize/textual) for the TUI framework
- [OpenCode](https://opencode.ai) for UI/UX inspiration