feat: Initial Whisper Transcription TUI implementation
- Added Textual-based TUI with file selection and progress monitoring
- Implemented transcription service with OpenAI API and local Whisper backends
- Added markdown formatter for transcription output
- Configuration management for persistent API keys and output directory
- Comprehensive README with installation and usage instructions
- Support for multi-file batch processing
- Beautiful terminal UI with modal dialogs for user input
Parent: 7e405f7354
Commit: c5ccd14048
.env.example (new file, 8 lines)
@@ -0,0 +1,8 @@
# OpenAI API Key (required for API-based transcription)
# Get your key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_api_key_here

# Optional: Whisper Model Size for local transcription
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
.gitignore (vendored, 55 lines)
@@ -1,2 +1,55 @@
opencode
# Environment and Configuration
.env
.env.local
config.json

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual Environments
venv/
ENV/
env/
.venv

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Test and coverage
.pytest_cache/
.coverage
htmlcov/

# Output
transcriptions/
output/

README.md (new file, 258 lines)
@@ -0,0 +1,258 @@
# Whisper Transcription TUI

A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both cloud-based API and local processing.

## Features

- **Dual Transcription Methods** (sketched below):
  - OpenAI Whisper API (fast, cloud-based, paid per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection**: Browse and select single or multiple audio/video files
- **Persistent Configuration**: Remembers API keys and the output directory between sessions
- **Beautiful TUI**: Modern terminal interface similar to OpenCode
- **Markdown Output**: Transcriptions saved as markdown with metadata headers
- **Multi-file Processing**: Batch transcribe multiple files sequentially

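The two methods map directly onto the `TranscriptionService` added in `src/transcriber.py` in this commit. A minimal sketch of driving it programmatically (the file name and the `sk-...` key are placeholders):

```python
from pathlib import Path

from src.transcriber import TranscriptionService

service = TranscriptionService()

# Cloud backend: needs an OpenAI API key.
service.set_openai_backend("sk-...")          # placeholder key
# Offline backend instead: downloads the chosen model on first use.
# service.set_local_backend("base")

result = service.transcribe(Path("meeting.mp4"))  # placeholder file
print(result["text"], result.get("language"), result.get("duration"))
```
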
## Supported Formats

### Audio
MP3, WAV, M4A, FLAC, OGG, WMA, AAC

### Video
MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V

## Installation

### Prerequisites

- Python 3.9 or higher (the code uses built-in generic type hints)
- FFmpeg (for audio/video processing)

### FFmpeg Installation

#### Windows
```bash
# Using Chocolatey
choco install ffmpeg

# Or using Scoop
scoop install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

#### macOS
```bash
brew install ffmpeg
```

#### Linux
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch
sudo pacman -S ffmpeg
```

### Python Package Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd 00_Whisper
```

2. Create and activate a virtual environment:
```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

## Configuration

### OpenAI API Key Setup

1. Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key will be saved to the `.env` file and reused in future sessions

⚠️ **Important**: Never commit your `.env` file to version control. It's already in `.gitignore`.

### Local Whisper Model

You can configure the local Whisper model size by editing your `.env` file:

```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```

Larger models are more accurate but require more memory and time.

## Usage

### Running the Application

```bash
python main.py
```

### Basic Workflow

1. **Select Output Directory** (first time only)
   - Navigate using arrow keys
   - Press "Select" to choose a directory for transcription outputs

2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue
   - Add multiple files or just one
   - Click "Continue" when done

3. **Choose Transcription Method**
   - Select between OpenAI API (fast, paid) or Local Whisper (free, slower)
   - If using OpenAI and no key is configured, you'll be prompted to enter it

4. **View Progress**
   - Monitor transcription progress in real time
   - Each file is processed sequentially

5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory

### Output Format

Each transcription is saved as a markdown file named `{filename}_transcription.md`:

```markdown
# Transcription: meeting.mp4

**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847

---

[Transcribed text here...]
```

## Keyboard Shortcuts

- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate the selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit the application at any time

## Troubleshooting

### "API key not found" error
Make sure you've entered your OpenAI API key when prompted. Check that your `.env` file contains:
```
OPENAI_API_KEY=your_key_here
```

### "File format not supported" error
Check that your file's extension is in the supported formats list. The app relies on the file extension to detect the format.

### Local Whisper is very slow
This is normal for larger model sizes. Try the "tiny" or "base" model in your `.env`:
```
WHISPER_MODEL=tiny
```

### FFmpeg not found error
Ensure FFmpeg is installed and on your system PATH. Test by running:
```bash
ffmpeg -version
```

### Permission denied errors
Ensure the output directory is writable and the application has permission to create files there.

## Project Structure

```
.
├── main.py             # Entry point
├── requirements.txt    # Python dependencies
├── .env.example        # Environment template
├── .gitignore          # Git ignore rules
├── README.md           # This file
└── src/
    ├── __init__.py
    ├── app.py          # Main TUI application
    ├── config.py       # Configuration management
    ├── transcriber.py  # Transcription service
    ├── formatter.py    # Markdown formatting
    └── file_handler.py # File handling utilities
```

## Development

### Running Tests
```bash
# Coming soon
```

### Building Documentation
```bash
# Coming soon
```

## Performance Tips

1. **Use Local Whisper for Small Files**: Local processing is free and reasonably fast for short audio
2. **Use OpenAI API for Accuracy**: The cloud model tends to handle difficult audio better
3. **Set an Appropriate Model Size**: Larger models are more accurate but slower and use more memory
4. **Batch Processing**: Queue multiple files to amortize setup time

## API Costs

### OpenAI Whisper API
- Priced per minute of audio (roughly $0.006 per minute at the time of writing, so a one-hour recording costs about $0.36)
- Minimum charge per request

Check [OpenAI Pricing](https://openai.com/pricing/) for current rates.

## Limitations

- Local Whisper models can be large (up to ~3 GB for "large")
- First-time local Whisper setup downloads the model (roughly 140 MB for "base")
- OpenAI API requires an internet connection and active API credits
- Batch processing is sequential (one file at a time)

## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

## License

This project is provided as-is for educational and personal use.

## Support

For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for transcription technology
- [Textual](https://github.com/textualize/textual) for the TUI framework
- [OpenCode](https://opencode.ai) for UI/UX inspiration

main.py (new file, 26 lines)
@@ -0,0 +1,26 @@
#!/usr/bin/env python3
"""Whisper Transcription TUI - Main entry point."""
import sys

# Import via the src package so the relative imports inside src/ resolve correctly.
from src.app import TranscriptionApp


def main() -> None:
    """Run the transcription application."""
    try:
        app = TranscriptionApp()
        app.run()
    except KeyboardInterrupt:
        print("\nApplication interrupted by user.")
        sys.exit(0)
    except Exception as e:
        print(f"Error: {str(e)}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
requirements.txt (modified, 19 to 17 lines)
@@ -1,19 +1,17 @@
-# Minimal requirements for Phase 1 MVP
-# UI / tray choices are optional at this stage - keep core deps minimal
+# TUI Transcription Application Dependencies

-# For local whisper (optional when implemented)
-# faster-whisper
+# Terminal UI Framework
+textual>=0.40.0

-# For audio capture (choose one)
-# sounddevice
-# pyaudio
+# OpenAI API and Whisper
+openai>=1.0.0
+openai-whisper>=20240314

-# OpenAI API client
-openai>=0.27.0
+# Configuration and Environment
+python-dotenv>=1.0.0

-# Optional: a GUI toolkit for Phase 2
-# PySide6
+# Utilities and Formatting
+rich>=13.0.0

-# For packaging and utilities
-rich
-python-dotenv
+# Audio processing
+ffmpeg-python>=0.2.1
src/__init__.py (new file, 1 line)
@@ -0,0 +1 @@
"""Whisper Transcription TUI Application"""
src/app.py (new file, 444 lines)
@@ -0,0 +1,444 @@
"""Main Textual TUI application for transcription."""
from pathlib import Path
from typing import Optional

from textual.app import App, ComposeResult
from textual.containers import Container, Vertical
from textual.message import Message
from textual.screen import ModalScreen, Screen
from textual.widgets import (
    Button,
    DirectoryTree,
    Footer,
    Header,
    Input,
    Label,
    RichLog,
    Select,
    Static,
)

from .config import ConfigManager
from .file_handler import FileHandler
from .formatter import MarkdownFormatter
from .transcriber import TranscriptionService


class ApiKeyModal(ModalScreen):
    """Modal screen for entering OpenAI API key."""

    def __init__(self, config: ConfigManager):
        """Initialize API key modal."""
        super().__init__()
        self.config = config
        self.api_key: Optional[str] = None

    def compose(self) -> ComposeResult:
        """Compose modal widgets."""
        yield Vertical(
            Label("OpenAI API Key Required"),
            Label("Enter your OpenAI API key (get it from https://platform.openai.com/api-keys):"),
            Input(id="api_key_input", password=True),
            Container(
                Button("Save", id="save_api", variant="primary"),
                Button("Cancel", id="cancel_api"),
            ),
            id="api_key_modal",
        )

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "save_api":
            api_key_input = self.query_one("#api_key_input", Input)
            if api_key_input.value.strip():
                self.api_key = api_key_input.value.strip()
                self.config.set_api_key(self.api_key)
                # Dismiss with the key so the caller's push_screen callback receives it.
                self.dismiss(self.api_key)
            else:
                self.app.notify("API key cannot be empty", timeout=2)
        elif event.button.id == "cancel_api":
            self.dismiss(None)


class MethodSelectModal(ModalScreen):
    """Modal screen for selecting transcription method."""

    def __init__(self):
        """Initialize method selection modal."""
        super().__init__()
        self.selected_method: Optional[str] = None

    def compose(self) -> ComposeResult:
        """Compose modal widgets."""
        yield Vertical(
            Label("Select Transcription Method"),
            Select(
                options=[
                    ("OpenAI Whisper API (fast, costs money)", "openai"),
                    ("Local Whisper (free, slower)", "local"),
                ],
                id="method_select",
            ),
            Container(
                Button("Select", id="select_method", variant="primary"),
                Button("Cancel", id="cancel_method"),
            ),
            id="method_modal",
        )

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "select_method":
            select = self.query_one("#method_select", Select)
            if select.value != Select.BLANK:
                self.selected_method = select.value
                self.dismiss(self.selected_method)
            else:
                self.app.notify("Please select a method", timeout=2)
        elif event.button.id == "cancel_method":
            self.dismiss(None)


class OutputDirModal(ModalScreen):
    """Modal screen for selecting output directory."""

    def __init__(self, config: ConfigManager):
        """Initialize output directory modal."""
        super().__init__()
        self.config = config
        self.selected_dir: Optional[Path] = None

    def compose(self) -> ComposeResult:
        """Compose modal widgets."""
        yield Vertical(
            Label("Select Output Directory"),
            DirectoryTree("/", id="dir_tree"),
            Container(
                Button("Select", id="select_dir", variant="primary"),
                Button("Cancel", id="cancel_dir"),
            ),
            id="output_dir_modal",
        )

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "select_dir":
            tree = self.query_one("#dir_tree", DirectoryTree)
            if tree.cursor_node:
                # DirectoryTree nodes carry a DirEntry; the filesystem path is on .path.
                self.selected_dir = Path(tree.cursor_node.data.path)
                is_valid, error = FileHandler.validate_directory(self.selected_dir)
                if is_valid:
                    self.config.set_output_directory(self.selected_dir)
                    self.dismiss(self.selected_dir)
                else:
                    self.app.notify(f"Error: {error}", timeout=3)
            else:
                self.app.notify("Please select a directory", timeout=2)
        elif event.button.id == "cancel_dir":
            self.dismiss(None)


class FileSelectScreen(Screen):
    """Screen for selecting files to transcribe."""

    class FileSelected(Message):
        """Message posted when the user confirms their file selection."""

        def __init__(self, files: list[Path]):
            """Initialize message."""
            super().__init__()
            self.files = files

    def __init__(self, config: ConfigManager):
        """Initialize file selection screen."""
        super().__init__()
        self.config = config
        self.selected_files: list[Path] = []

    def compose(self) -> ComposeResult:
        """Compose screen widgets."""
        yield Header()
        yield Vertical(
            Label("Select files to transcribe (Supported: MP3, WAV, M4A, FLAC, MP4, AVI, MKV, MOV)"),
            DirectoryTree("/", id="file_tree"),
            Static(id="file_info"),
            Container(
                Button("Add File", id="add_file", variant="primary"),
                Button("Continue", id="continue_btn", variant="success"),
                Button("Cancel", id="cancel_btn", variant="error"),
            ),
            id="file_select_container",
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when screen is mounted."""
        self.query_one("#file_tree", DirectoryTree).focus()

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "add_file":
            tree = self.query_one("#file_tree", DirectoryTree)
            if tree.cursor_node:
                file_path = Path(tree.cursor_node.data.path)
                is_valid, error = FileHandler.validate_file(file_path)
                if is_valid:
                    if file_path not in self.selected_files:
                        self.selected_files.append(file_path)
                        self._update_file_info()
                    else:
                        self.app.notify("File already added", timeout=2)
                else:
                    self.app.notify(f"Error: {error}", timeout=2)
            else:
                self.app.notify("Please select a file", timeout=2)
        elif event.button.id == "continue_btn":
            if self.selected_files:
                # Bubbles up to TranscriptionApp.on_file_select_screen_file_selected.
                self.post_message(self.FileSelected(self.selected_files))
            else:
                self.app.notify("Please select at least one file", timeout=2)
        elif event.button.id == "cancel_btn":
            self.app.exit()

    def _update_file_info(self) -> None:
        """Update file information display."""
        info = self.query_one("#file_info", Static)
        file_list = "\n".join([f"✓ {f.name}" for f in self.selected_files])
        info.update(f"Selected files:\n{file_list}\n\nTotal: {len(self.selected_files)}")


class ProgressScreen(Screen):
    """Screen showing transcription progress."""

    def __init__(self, files: list[Path], method: str, config: ConfigManager):
        """Initialize progress screen."""
        super().__init__()
        self.files = files
        self.method = method
        self.config = config
        self.service = TranscriptionService()
        self.results: list[tuple[Path, str]] = []

    def compose(self) -> ComposeResult:
        """Compose screen widgets."""
        yield Header()
        yield Vertical(
            Label("Transcription Progress"),
            RichLog(id="progress_log", markup=True),
            Container(
                Button("View Results", id="view_results", variant="primary"),
                Button("Exit", id="exit_btn", variant="error"),
                id="progress_controls",
            ),
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when screen is mounted."""
        # NOTE: transcription runs on the event-loop thread, so the UI is
        # unresponsive while each file is being processed.
        self.app.call_later(self._run_transcription)

    def _run_transcription(self) -> None:
        """Run transcription on all files."""
        log = self.query_one("#progress_log", RichLog)

        # Set up the transcription backend
        try:
            if self.method == "openai":
                api_key = self.config.get_api_key()
                if not api_key:
                    log.write("[red]Error: No API key configured[/red]")
                    return
                self.service.set_openai_backend(api_key)
                log.write("[green]Using OpenAI Whisper API[/green]")
            else:
                model_size = self.config.get_whisper_model()
                self.service.set_local_backend(model_size)
                log.write(f"[green]Using Local Whisper ({model_size})[/green]")
        except Exception as e:
            log.write(f"[red]Error initializing backend: {str(e)}[/red]")
            return

        output_dir = self.config.get_output_directory()
        if not output_dir:
            log.write("[red]Error: Output directory not configured[/red]")
            return

        # Process each file
        for i, file_path in enumerate(self.files, 1):
            log.write(f"\n[yellow]Processing {i}/{len(self.files)}: {file_path.name}[/yellow]")

            try:
                # Transcribe
                result = self.service.transcribe(file_path)
                log.write("[green]✓ Transcribed[/green]")

                # Format as markdown
                markdown = MarkdownFormatter.format_transcription(
                    result["text"],
                    file_path,
                    result.get("duration", 0.0),
                    result.get("language", "en"),
                )

                # Save to file
                output_filename = MarkdownFormatter.get_output_filename(file_path)
                output_path = FileHandler.get_output_path(file_path, output_dir, output_filename)
                output_path.write_text(markdown, encoding="utf-8")
                log.write(f"[green]✓ Saved to {output_path.name}[/green]")

                self.results.append((file_path, str(output_path)))
            except Exception as e:
                log.write(f"[red]✗ Error: {str(e)}[/red]")

        log.write(f"\n[cyan]Completed {len(self.results)}/{len(self.files)} files[/cyan]")

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "view_results":
            self.app.push_screen(ResultsScreen(self.results))
        elif event.button.id == "exit_btn":
            self.app.exit()


class ResultsScreen(Screen):
    """Screen displaying transcription results."""

    def __init__(self, results: list[tuple[Path, str]]):
        """Initialize results screen."""
        super().__init__()
        self.results = results

    def compose(self) -> ComposeResult:
        """Compose screen widgets."""
        yield Header()
        yield Vertical(
            Label("Transcription Results"),
            RichLog(id="results_log", markup=True),
            Container(
                Button("Back", id="back_btn"),
                Button("Exit", id="exit_btn", variant="error"),
            ),
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when screen is mounted."""
        log = self.query_one("#results_log", RichLog)
        log.write("[cyan]Transcription Results[/cyan]\n")
        for source, output in self.results:
            log.write(f"[green]✓[/green] {source.name}")
            log.write(f"  → {Path(output).name}\n")

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "back_btn":
            self.app.pop_screen()
        elif event.button.id == "exit_btn":
            self.app.exit()


class TranscriptionApp(App):
    """Main transcription application."""

    CSS = """
    Screen {
        layout: vertical;
    }

    #api_key_modal {
        width: 60;
        height: 12;
        border: solid green;
    }

    #method_modal {
        width: 50;
        height: 10;
        border: solid blue;
    }

    #output_dir_modal {
        width: 80;
        height: 20;
        border: solid purple;
    }

    #file_select_container {
        width: 100%;
        height: 100%;
    }

    DirectoryTree {
        width: 1fr;
        height: 1fr;
    }

    #file_info {
        width: 100%;
        height: auto;
        border: solid $accent;
        padding: 1;
    }

    #progress_log {
        width: 100%;
        height: 1fr;
        border: solid $accent;
        padding: 1;
    }

    Container {
        height: auto;
        margin: 1;
    }

    Button {
        margin-right: 1;
    }

    Label {
        margin-bottom: 1;
    }
    """

    def __init__(self):
        """Initialize application."""
        super().__init__()
        self.config = ConfigManager()

    def on_mount(self) -> None:
        """Called when app is mounted."""
        self.title = "Whisper Transcription TUI"
        self._check_setup()

    def _check_setup(self) -> None:
        """Check if setup is needed."""
        if not self.config.output_directory_configured():
            # push_screen accepts a callback that receives the modal's dismiss value.
            self.push_screen(OutputDirModal(self.config), self._output_dir_set)
        else:
            self.push_screen(FileSelectScreen(self.config))

    def _output_dir_set(self, result: Optional[Path]) -> None:
        """Called when the output directory modal is dismissed."""
        self.push_screen(FileSelectScreen(self.config))

    def on_file_select_screen_file_selected(self, message: FileSelectScreen.FileSelected) -> None:
        """Handle file selection."""
        self.push_screen(MethodSelectModal(), self._method_selected(message.files))

    def _method_selected(self, files: list[Path]):
        """Return a dismiss callback for the method selection modal."""
        def handler(method: Optional[str]) -> None:
            if method == "openai":
                if not self.config.api_key_configured():
                    self.push_screen(
                        ApiKeyModal(self.config),
                        lambda _key: self._start_transcription(files, "openai"),
                    )
                else:
                    self._start_transcription(files, "openai")
            elif method == "local":
                self._start_transcription(files, "local")
        return handler

    def _start_transcription(self, files: list[Path], method: str) -> None:
        """Start transcription process."""
        self.push_screen(ProgressScreen(files, method, self.config))
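The modals above rely on Textual's screen-dismissal pattern: a `ModalScreen` returns a value via `dismiss()`, and `push_screen()` routes that value to a callback. A minimal, self-contained sketch of just that pattern (not part of the commit):

```python
from textual.app import App, ComposeResult
from textual.screen import ModalScreen
from textual.widgets import Button, Label


class ConfirmModal(ModalScreen[bool]):
    """Tiny modal that answers a yes/no question."""

    def compose(self) -> ComposeResult:
        yield Label("Proceed?")
        yield Button("Yes", id="yes")
        yield Button("No", id="no")

    def on_button_pressed(self, event: Button.Pressed) -> None:
        # dismiss() pops the modal and hands its argument to the push_screen callback.
        self.dismiss(event.button.id == "yes")


class DemoApp(App):
    def on_mount(self) -> None:
        self.push_screen(ConfirmModal(), self.handle_answer)

    def handle_answer(self, answer: bool) -> None:
        self.notify(f"Answer: {answer}")


if __name__ == "__main__":
    DemoApp().run()
```
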
src/config.py (new file, 93 lines)
@@ -0,0 +1,93 @@
"""Configuration management for API keys and output directory."""
import json
import os
from pathlib import Path
from typing import Optional

from dotenv import load_dotenv, set_key


class ConfigManager:
    """Manages configuration including API keys and output directory preferences."""

    def __init__(self):
        """Initialize config manager and load existing configuration."""
        self.env_path = Path(".env")
        self.config_path = Path("config.json")
        self.config_data: dict = {}
        load_dotenv()
        self._load_config()

    def _load_config(self) -> None:
        """Load configuration from config.json if it exists."""
        if self.config_path.exists():
            try:
                with open(self.config_path, "r") as f:
                    self.config_data = json.load(f)
            except (json.JSONDecodeError, IOError):
                self.config_data = {}

    def _save_config(self) -> None:
        """Save configuration to config.json."""
        with open(self.config_path, "w") as f:
            json.dump(self.config_data, f, indent=2)

    def get_api_key(self) -> Optional[str]:
        """
        Get OpenAI API key from environment.

        Returns:
            The API key if set, None otherwise.
        """
        return os.getenv("OPENAI_API_KEY")

    def set_api_key(self, api_key: str) -> None:
        """
        Save API key to .env file.

        Args:
            api_key: The API key to save.
        """
        if not self.env_path.exists():
            self.env_path.write_text("# OpenAI Configuration\n")
        set_key(str(self.env_path), "OPENAI_API_KEY", api_key)
        os.environ["OPENAI_API_KEY"] = api_key

    def get_output_directory(self) -> Optional[Path]:
        """
        Get the configured output directory.

        Returns:
            Path to output directory if configured, None otherwise.
        """
        output_dir = self.config_data.get("output_directory")
        if output_dir:
            return Path(output_dir)
        return None

    def set_output_directory(self, directory: Path) -> None:
        """
        Save output directory preference to config.

        Args:
            directory: The output directory path.
        """
        self.config_data["output_directory"] = str(directory.resolve())
        self._save_config()

    def get_whisper_model(self) -> str:
        """
        Get the Whisper model size from environment.

        Returns:
            Model size (default: "base").
        """
        return os.getenv("WHISPER_MODEL", "base")

    def api_key_configured(self) -> bool:
        """Check if API key is configured."""
        return bool(self.get_api_key())

    def output_directory_configured(self) -> bool:
        """Check if output directory is configured."""
        return bool(self.get_output_directory())
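For reference, a short sketch of how `ConfigManager` splits persistence between `.env` (API key, model size) and `config.json` (output directory). The key and directory below are placeholders:

```python
from pathlib import Path

from src.config import ConfigManager

config = ConfigManager()

if not config.api_key_configured():
    config.set_api_key("sk-...")  # placeholder; written to .env via set_key

if not config.output_directory_configured():
    config.set_output_directory(Path("transcriptions"))  # stored in config.json

print(config.get_whisper_model())  # falls back to "base" when WHISPER_MODEL is unset
```
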
src/file_handler.py (new file, 121 lines)
@@ -0,0 +1,121 @@
"""File handling and validation for audio/video files."""
from pathlib import Path
from typing import List

SUPPORTED_AUDIO_FORMATS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac",
}

SUPPORTED_VIDEO_FORMATS = {
    ".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v",
}

SUPPORTED_FORMATS = SUPPORTED_AUDIO_FORMATS | SUPPORTED_VIDEO_FORMATS


class FileHandler:
    """Handles file operations for transcription."""

    @staticmethod
    def is_supported_file(file_path: Path) -> bool:
        """
        Check if file is a supported audio/video format.

        Args:
            file_path: Path to file.

        Returns:
            True if file is supported, False otherwise.
        """
        return file_path.suffix.lower() in SUPPORTED_FORMATS

    @staticmethod
    def validate_file(file_path: Path) -> tuple[bool, str]:
        """
        Validate that file exists and is supported.

        Args:
            file_path: Path to file.

        Returns:
            Tuple of (is_valid, error_message).
        """
        if not file_path.exists():
            return False, f"File not found: {file_path}"

        if not file_path.is_file():
            return False, f"Path is not a file: {file_path}"

        if not FileHandler.is_supported_file(file_path):
            return False, f"Unsupported format: {file_path.suffix}"

        return True, ""

    @staticmethod
    def validate_directory(directory: Path) -> tuple[bool, str]:
        """
        Validate that directory exists and is writable.

        Args:
            directory: Path to directory.

        Returns:
            Tuple of (is_valid, error_message).
        """
        try:
            directory.mkdir(parents=True, exist_ok=True)
            test_file = directory / ".write_test"
            test_file.write_text("test")
            test_file.unlink()
            return True, ""
        except Exception as e:
            return False, f"Directory not writable: {str(e)}"

    @staticmethod
    def get_output_path(file_path: Path, output_dir: Path, filename: str) -> Path:
        """
        Get the output path for a transcription.

        Args:
            file_path: Source audio/video file.
            output_dir: Output directory.
            filename: Output filename.

        Returns:
            Full output path.
        """
        output_path = output_dir / filename
        counter = 1
        base_stem = output_path.stem
        suffix = output_path.suffix

        while output_path.exists():
            output_path = output_dir / f"{base_stem}_{counter}{suffix}"
            counter += 1

        return output_path

    @staticmethod
    def get_supported_formats_display() -> str:
        """
        Get display string of supported formats.

        Returns:
            Formatted string listing supported formats.
        """
        audio = ", ".join(sorted(SUPPORTED_AUDIO_FORMATS))
        video = ", ".join(sorted(SUPPORTED_VIDEO_FORMATS))
        return f"Audio: {audio}\nVideo: {video}"

    @staticmethod
    def filter_supported_files(paths: List[Path]) -> List[Path]:
        """
        Filter a list of paths to only supported files.

        Args:
            paths: List of file paths.

        Returns:
            Filtered list of supported files.
        """
        return [p for p in paths if p.is_file() and FileHandler.is_supported_file(p)]
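A small sketch of the collision handling in `FileHandler.get_output_path`: rerunning a transcription never overwrites an earlier output file, it gets a numeric suffix instead (file names below are placeholders):

```python
from pathlib import Path

from src.file_handler import FileHandler

out_dir = Path("transcriptions")  # placeholder output directory
name = "meeting_transcription.md"

first = FileHandler.get_output_path(Path("meeting.mp4"), out_dir, name)
out_dir.mkdir(parents=True, exist_ok=True)
first.write_text("first run")

second = FileHandler.get_output_path(Path("meeting.mp4"), out_dir, name)
print(first.name, second.name)  # meeting_transcription.md meeting_transcription_1.md
```
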
src/formatter.py (new file, 79 lines)
@@ -0,0 +1,79 @@
"""Markdown formatting for transcriptions."""
from datetime import datetime
from pathlib import Path


class MarkdownFormatter:
    """Formats transcriptions as markdown with metadata headers."""

    @staticmethod
    def format_transcription(
        text: str,
        source_file: Path,
        duration: float = 0.0,
        language: str = "en",
    ) -> str:
        """
        Format transcription as markdown with metadata.

        Args:
            text: The transcription text.
            source_file: Path to the source audio/video file.
            duration: Duration of the audio in seconds.
            language: Language code of the transcription.

        Returns:
            Formatted markdown string.
        """
        timestamp = datetime.now().isoformat()
        word_count = len(text.split())

        markdown = f"""# Transcription: {source_file.name}

**Source File:** {source_file.name}
**Date:** {timestamp}
**Duration:** {MarkdownFormatter._format_duration(duration)}
**Language:** {language}
**Word Count:** {word_count}

---

{text}
"""
        return markdown

    @staticmethod
    def _format_duration(seconds: float) -> str:
        """
        Format duration in seconds to human-readable format.

        Args:
            seconds: Duration in seconds.

        Returns:
            Formatted duration string (e.g., "1:23:45").
        """
        if seconds <= 0:
            return "Unknown"

        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        secs = int(seconds % 60)

        if hours > 0:
            return f"{hours}:{minutes:02d}:{secs:02d}"
        return f"{minutes}:{secs:02d}"

    @staticmethod
    def get_output_filename(source_file: Path) -> str:
        """
        Generate markdown filename from source file.

        Args:
            source_file: Path to the source audio/video file.

        Returns:
            Filename with .md extension.
        """
        return f"{source_file.stem}_transcription.md"
src/transcriber.py (new file, 141 lines)
@@ -0,0 +1,141 @@
"""Transcription service supporting both OpenAI API and local Whisper."""
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional

import whisper
from openai import OpenAI


class TranscriberBackend(ABC):
    """Abstract base class for transcription backends."""

    @abstractmethod
    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe audio file.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with 'text' and optional 'language' and 'duration' keys.
        """
        pass


class OpenAITranscriber(TranscriberBackend):
    """OpenAI Whisper API transcriber."""

    def __init__(self, api_key: str):
        """
        Initialize OpenAI transcriber.

        Args:
            api_key: OpenAI API key.
        """
        self.client = OpenAI(api_key=api_key)

    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe using OpenAI API.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with transcription result.
        """
        with open(file_path, "rb") as audio_file:
            transcript = self.client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                response_format="verbose_json",
                language="en",
            )

        return {
            "text": transcript.text,
            "language": getattr(transcript, "language", "en"),
            "duration": getattr(transcript, "duration", 0.0),
        }


class LocalWhisperTranscriber(TranscriberBackend):
    """Local Whisper model transcriber."""

    def __init__(self, model_size: str = "base"):
        """
        Initialize local Whisper transcriber.

        Args:
            model_size: Size of the Whisper model (tiny, base, small, medium, large).
        """
        self.model_size = model_size
        self.model = whisper.load_model(model_size)

    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe using local Whisper model.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with transcription result.
        """
        result = self.model.transcribe(
            str(file_path),
            language="en",
            fp16=False,
        )

        return {
            "text": result["text"],
            "language": result.get("language", "en"),
            "duration": result.get("duration", 0.0),
        }


class TranscriptionService:
    """Main transcription service that manages both backends."""

    def __init__(self):
        """Initialize transcription service."""
        self.transcriber: Optional[TranscriberBackend] = None

    def set_openai_backend(self, api_key: str) -> None:
        """
        Set OpenAI as the transcription backend.

        Args:
            api_key: OpenAI API key.
        """
        self.transcriber = OpenAITranscriber(api_key)

    def set_local_backend(self, model_size: str = "base") -> None:
        """
        Set local Whisper as the transcription backend.

        Args:
            model_size: Size of the Whisper model.
        """
        self.transcriber = LocalWhisperTranscriber(model_size)

    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe audio file using configured backend.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with transcription result.

        Raises:
            RuntimeError: If no backend is configured.
        """
        if not self.transcriber:
            raise RuntimeError("No transcription backend configured")

        return self.transcriber.transcribe(file_path)
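The removed requirements file mentioned `faster-whisper` as a possible local engine; with the `TranscriberBackend` abstraction above, adding it later would only mean one more subclass. A hypothetical sketch (faster-whisper is not a dependency of this commit, and the API usage shown is an assumption):

```python
from pathlib import Path

from faster_whisper import WhisperModel  # assumed extra dependency, not in requirements.txt

from src.transcriber import TranscriberBackend


class FasterWhisperTranscriber(TranscriberBackend):
    """Hypothetical CTranslate2-based backend returning the same result dict."""

    def __init__(self, model_size: str = "base"):
        self.model = WhisperModel(model_size)

    def transcribe(self, file_path: Path) -> dict:
        segments, info = self.model.transcribe(str(file_path), language="en")
        text = "".join(segment.text for segment in segments)
        return {"text": text, "language": info.language, "duration": info.duration}
```
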