# Whisper Transcription TUI
A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both cloud-based API and local processing.
## Features
- **Dual Transcription Methods:**
  - OpenAI Whisper API (fast, cloud-based, billed per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection:** Browse and select single or multiple audio/video files
- **Persistent Configuration:** Remembers API keys and the output directory between sessions
- **Beautiful TUI:** Modern terminal interface similar to OpenCode
- **Markdown Output:** Transcriptions saved as markdown files with metadata headers
- **Multi-file Processing:** Batch-transcribe multiple files sequentially
## Supported Formats

### Audio

MP3, WAV, M4A, FLAC, OGG, WMA, AAC

### Video

MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V
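In code, checking whether a file is eligible boils down to comparing its extension against these lists; a minimal sketch (the `is_supported` function name is illustrative, not the app's actual API):

```python
from pathlib import Path

# Extension whitelists from the lists above (lowercase, with leading dots)
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v"}

def is_supported(path: str) -> bool:
    """Return True if the file's extension is in the supported set."""
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS
```

The lowercase comparison means `meeting.MP4` and `meeting.mp4` are treated the same.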
## Installation

### Prerequisites

- Python 3.8 or higher
- FFmpeg (for audio/video processing)

### FFmpeg Installation
#### Windows

```bash
# Using Chocolatey
choco install ffmpeg

# Or using Scoop
scoop install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

#### macOS

```bash
brew install ffmpeg
```

#### Linux

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch
sudo pacman -S ffmpeg
```
### Python Package Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd 00_Whisper
   ```

2. Create and activate a virtual environment:

   ```bash
   # Windows
   python -m venv venv
   venv\Scripts\activate

   # macOS/Linux
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Configuration

### OpenAI API Key Setup

1. Get your API key from the OpenAI Platform
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key is saved to a `.env` file and reused in future sessions

⚠️ **Important:** Never commit your `.env` file to version control. It's already in `.gitignore`.
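The persistence step can be sketched with the standard library alone (the real app may well use `python-dotenv`; `save_api_key` and `load_api_key` are hypothetical names):

```python
from pathlib import Path
from typing import Optional

def save_api_key(key: str, env_path: str = ".env") -> None:
    """Persist the key, replacing any existing OPENAI_API_KEY line."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    lines = [l for l in lines if not l.startswith("OPENAI_API_KEY=")]
    lines.append(f"OPENAI_API_KEY={key}")
    path.write_text("\n".join(lines) + "\n")

def load_api_key(env_path: str = ".env") -> Optional[str]:
    """Read the key back from the .env file, if present."""
    path = Path(env_path)
    if not path.exists():
        return None
    for line in path.read_text().splitlines():
        if line.startswith("OPENAI_API_KEY="):
            return line.split("=", 1)[1]
    return None
```

Rewriting the whole file (rather than appending) keeps the `.env` from accumulating stale key lines.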
### Local Whisper Model

You can configure the local Whisper model size by editing your `.env` file:

```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```

Larger models are more accurate but require more memory and time.
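Reading that setting back might look like this (a sketch; `resolve_model_name` is a hypothetical helper, and the resolved name would typically be passed to `whisper.load_model`):

```python
import os

# The sizes listed in the .env comment above
VALID_MODELS = ("tiny", "base", "small", "medium", "large")

def resolve_model_name(default: str = "base") -> str:
    """Read WHISPER_MODEL from the environment, falling back to the default."""
    name = os.environ.get("WHISPER_MODEL", default).strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(f"Unknown Whisper model: {name!r} (choose from {VALID_MODELS})")
    return name
```

Validating here gives a clear error message instead of a failed model download later.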
## Usage

### Running the Application

```bash
python main.py
```
### Basic Workflow

1. **Select Output Directory** (first time only)
   - Navigate using the arrow keys
   - Press "Select" to choose a directory for transcription outputs

2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue
   - Add one file or several
   - Click "Continue" when done

3. **Choose Transcription Method**
   - Select between the OpenAI API (fast, paid) and Local Whisper (free, slower)
   - If using OpenAI and no key is configured, you'll be prompted to enter one

4. **View Progress**
   - Monitor transcription progress in real time
   - Each file is processed sequentially

5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory
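The sequential batch step above can be sketched as a simple loop (names like `transcribe_batch` are illustrative, not the app's actual API):

```python
from pathlib import Path
from typing import Callable, List, Tuple

def transcribe_batch(
    files: List[str],
    transcribe: Callable[[str], str],  # backend: path in, transcript text out
    output_dir: str,
) -> List[Tuple[str, bool]]:
    """Process files one at a time; return (path, succeeded) for each."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    for f in files:
        try:
            text = transcribe(f)
            target = out / f"{Path(f).stem}_transcription.md"
            target.write_text(text)
            results.append((f, True))
        except Exception:
            results.append((f, False))  # keep going: one failure shouldn't stop the batch
    return results
```

Catching per-file exceptions is what lets the queue continue past a corrupt or unreadable file.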
## Output Format

Each transcription is saved as a markdown file named `<original filename>_transcription.md`:
```markdown
# Transcription: meeting.mp4

**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847

---

[Transcribed text here...]
```
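A sketch of how such a header might be assembled (the helper name `format_transcription` is hypothetical, not the app's actual formatter API):

```python
from datetime import datetime

def format_transcription(source: str, text: str, duration: str, language: str) -> str:
    """Build a markdown document matching the layout shown above."""
    word_count = len(text.split())
    header = "\n".join([
        f"# Transcription: {source}",
        "",
        f"**Source File:** {source}",
        f"**Date:** {datetime.now().isoformat()}",  # ISO timestamp, as in the example
        f"**Duration:** {duration}",
        f"**Language:** {language}",
        f"**Word Count:** {word_count}",
        "",
        "---",
        "",
    ])
    return header + "\n" + text
```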
## Keyboard Shortcuts

- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate the selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit the application at any time
## Troubleshooting

### "API key not found" error

Make sure you've entered your OpenAI API key when prompted, and check that your `.env` file contains:

```bash
OPENAI_API_KEY=your_key_here
```

### "File format not supported" error

Check that your file's extension is in the supported formats list. The app relies on proper file extensions to recognize formats.

### Local Whisper is very slow

This is normal for larger model sizes. Try the "tiny" or "base" model in your `.env`:

```bash
WHISPER_MODEL=tiny
```

### FFmpeg not found error

Ensure FFmpeg is installed and on your system PATH. Test by running:

```bash
ffmpeg -version
```
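The same check can be done programmatically (a sketch, not the app's actual startup logic):

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on the system PATH."""
    return shutil.which("ffmpeg") is not None
```

`shutil.which` performs the same PATH lookup the shell does, so it agrees with running `ffmpeg -version` by hand.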
### Permission denied errors

Ensure the output directory is writable and the application has permission to create files there.
## Project Structure

```
.
├── main.py               # Entry point
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── .gitignore            # Git ignore rules
├── README.md             # This file
└── src/
    ├── __init__.py
    ├── app.py            # Main TUI application
    ├── config.py         # Configuration management
    ├── transcriber.py    # Transcription service
    ├── formatter.py      # Markdown formatting
    └── file_handler.py   # File handling utilities
```
## Development

### Running Tests

```bash
# Coming soon
```

### Building Documentation

```bash
# Coming soon
```
## Performance Tips

- **Use Local Whisper for small files:** Local processing is free and quick enough for short audio
- **Use the OpenAI API for accuracy:** The cloud model handles edge cases better
- **Set an appropriate model size:** Larger models are more accurate but slower and use more memory
- **Batch your processing:** Queue multiple files to amortize setup time
## API Costs

### OpenAI Whisper API

- Pricing at the time of writing: $0.006 per minute of audio
- Minimum charge per request

Check OpenAI's pricing page for current rates.
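Estimating a file's cost before uploading is simple arithmetic (a sketch; the default rate is an assumption baked in here — always check OpenAI's pricing page):

```python
def estimate_api_cost(duration_seconds: float, rate_per_minute: float = 0.006) -> float:
    """Estimate the API cost in USD for an audio file of the given length.

    The default rate is an assumption; pass the current per-minute rate explicitly.
    """
    return round(duration_seconds / 60 * rate_per_minute, 4)
```

For example, a 10-minute recording at the default rate comes to 10 × $0.006 = $0.06.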
## Limitations

- Local Whisper models can be large (roughly 3 GB for "large")
- First-time local Whisper use downloads the model (~140 MB for "base")
- The OpenAI API requires an internet connection and active API credits
- Batch processing is sequential (one file at a time)
## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

## License

This project is provided as-is for educational and personal use.

## Support

For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.

## Acknowledgments

- OpenAI Whisper for the transcription technology
- Textual for the TUI framework
- OpenCode for UI/UX inspiration