feat: Initial Whisper Transcription TUI implementation
- Added Textual-based TUI with file selection and progress monitoring
- Implemented transcription service with OpenAI API and local Whisper backends
- Added markdown formatter for transcription output
- Configuration management for persistent API keys and output directory
- Comprehensive README with installation and usage instructions
- Support for multi-file batch processing
- Beautiful terminal UI with modal dialogs for user input
parent 7e405f7354
commit c5ccd14048
.env.example (new file)
@@ -0,0 +1,8 @@
# OpenAI API Key (required for API-based transcription)
# Get your key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_api_key_here

# Optional: Whisper Model Size for local transcription
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
.gitignore (vendored)
@@ -1,2 +1,55 @@
-opencode
+# Environment and Configuration
 .env
+.env.local
+config.json
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# Virtual Environments
+venv/
+ENV/
+env/
+.venv
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Test and coverage
+.pytest_cache/
+.coverage
+htmlcov/
+
+# Output
+transcriptions/
+output/
README.md (new file)
@@ -0,0 +1,258 @@

# Whisper Transcription TUI

A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both cloud-based API and local processing.

## Features

- **Dual Transcription Methods**:
  - OpenAI Whisper API (fast, cloud-based, paid per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection**: Browse and select single or multiple audio/video files
- **Persistent Configuration**: Remember API keys and output directory between sessions
- **Beautiful TUI**: Modern terminal interface similar to OpenCode
- **Markdown Output**: Transcriptions saved as markdown with metadata headers
- **Multi-file Processing**: Batch transcribe multiple files sequentially

## Supported Formats

### Audio
MP3, WAV, M4A, FLAC, OGG, WMA, AAC

### Video
MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V

## Installation

### Prerequisites

- Python 3.9 or higher (the source uses built-in generic annotations such as `list[Path]`)
- FFmpeg (for audio/video processing)

### FFmpeg Installation

#### Windows
```bash
# Using Chocolatey
choco install ffmpeg

# Or using Scoop
scoop install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

#### macOS
```bash
brew install ffmpeg
```

#### Linux
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch
sudo pacman -S ffmpeg
```

### Python Package Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd 00_Whisper
```

2. Create and activate a virtual environment:
```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

## Configuration

### OpenAI API Key Setup

1. Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key will be saved to the `.env` file and reused in future sessions

⚠️ **Important**: Never commit your `.env` file to version control. It's already in `.gitignore`.

### Local Whisper Model

You can configure the local Whisper model size by editing your `.env` file:

```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```

Larger models are more accurate but require more memory and time.
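
Because the model name is read from the environment (see `get_whisper_model` in `src/config.py`), you can also override it for a single run without editing `.env`, e.g. `WHISPER_MODEL=small python main.py` in a POSIX shell.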

## Usage

### Running the Application

```bash
python main.py
```

### Basic Workflow

1. **Select Output Directory** (first time only)
   - Navigate using arrow keys
   - Press "Select" to choose a directory for transcription outputs

2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue
   - Add multiple files or just one
   - Click "Continue" when done

3. **Choose Transcription Method**
   - Select between OpenAI API (fast, paid) or Local Whisper (free, slower)
   - If using OpenAI and no key is configured, you'll be prompted to enter it

4. **View Progress**
   - Monitor transcription progress in real time
   - Each file is processed sequentially

5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory

### Output Format

Each transcription is saved as a markdown file with the format: `{filename}_transcription.md`

```markdown
# Transcription: meeting.mp4

**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847

---

[Transcribed text here...]
```

## Keyboard Shortcuts

- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit the application at any time

## Troubleshooting

### "API key not found" error
Make sure you've entered your OpenAI API key when prompted. Check that your `.env` file contains:
```
OPENAI_API_KEY=your_key_here
```

### "File format not supported" error
Check that your file extension is in the supported formats list. The app relies on proper file extensions to recognize formats.

### Local Whisper is very slow
This is normal for larger model sizes. Try using the "tiny" or "base" model in your `.env`:
```
WHISPER_MODEL=tiny
```

### FFmpeg not found error
Ensure FFmpeg is installed and on your system PATH. Test by running:
```bash
ffmpeg -version
```

### Permission denied errors
Ensure the output directory is writable and the application has permission to create files there.

## Project Structure

```
.
├── main.py              # Entry point
├── requirements.txt     # Python dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
├── README.md            # This file
└── src/
    ├── __init__.py
    ├── app.py           # Main TUI application
    ├── config.py        # Configuration management
    ├── transcriber.py   # Transcription service
    ├── formatter.py     # Markdown formatting
    └── file_handler.py  # File handling utilities
```

## Development

### Running Tests
```bash
# Coming soon
```

### Building Documentation
```bash
# Coming soon
```

## Performance Tips

1. **Use Local Whisper for Small Files**: Local processing is free and fast for small audio files
2. **Use OpenAI API for Accuracy**: The cloud version handles edge cases better
3. **Set an Appropriate Model Size**: Larger models are more accurate but slower and use more memory
4. **Batch Processing**: Process multiple files to amortize setup time

## API Costs

### OpenAI Whisper API
- Current pricing: $0.006 per minute of audio (about $0.36 for a one-hour recording)
- Minimum charge per request

Check [OpenAI Pricing](https://openai.com/pricing/) for current rates.

## Limitations

- Local Whisper models can be large (up to ~3GB for "large")
- First-time local Whisper setup downloads the model (~150MB for "base")
- OpenAI API requires an internet connection and active API credits
- Batch processing is sequential (one file at a time)

## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

## License

This project is provided as-is for educational and personal use.

## Support

For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for transcription technology
- [Textual](https://github.com/textualize/textual) for the TUI framework
- [OpenCode](https://opencode.ai) for UI/UX inspiration
main.py (new file)
@@ -0,0 +1,26 @@
#!/usr/bin/env python3
"""Whisper Transcription TUI - Main entry point."""
import sys
from pathlib import Path

# Add src to path for imports
sys.path.insert(0, str(Path(__file__).parent / "src"))

from app import TranscriptionApp


def main() -> None:
    """Run the transcription application."""
    try:
        app = TranscriptionApp()
        app.run()
    except KeyboardInterrupt:
        print("\nApplication interrupted by user.")
        sys.exit(0)
    except Exception as e:
        print(f"Error: {str(e)}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
requirements.txt
@@ -1,19 +1,17 @@
-# Minimal requirements for Phase 1 MVP
-# UI / tray choices are optional at this stage - keep core deps minimal
+# TUI Transcription Application Dependencies
 
-# For local whisper (optional when implemented)
-# faster-whisper
+# Terminal UI Framework
+textual>=0.40.0
 
-# For audio capture (choose one)
-# sounddevice
-# pyaudio
+# OpenAI API and Whisper
+openai>=1.0.0
+openai-whisper>=20240314
 
-# OpenAI API client
-openai>=0.27.0
+# Configuration and Environment
+python-dotenv>=1.0.0
 
-# Optional: a GUI toolkit for Phase 2
-# PySide6
+# Utilities and Formatting
+rich>=13.0.0
 
-# For packaging and utilities
-rich
-python-dotenv
+# Audio processing
+ffmpeg-python>=0.2.1
src/__init__.py (new file)
@@ -0,0 +1 @@
"""Whisper Transcription TUI Application"""
src/app.py (new file)
@@ -0,0 +1,444 @@
"""Main Textual TUI application for transcription."""
from pathlib import Path
from typing import Optional

from textual.app import ComposeResult, App
from textual.containers import Container, Vertical
from textual.message import Message
from textual.screen import Screen, ModalScreen
from textual.widgets import (
    Header,
    Footer,
    Static,
    Input,
    Button,
    Label,
    DirectoryTree,
    SelectionList,
    Select,
    RichLog,
)
from textual.widgets.selection_list import Selection

from .config import ConfigManager
from .file_handler import FileHandler
from .transcriber import TranscriptionService
from .formatter import MarkdownFormatter


class ApiKeyModal(ModalScreen):
    """Modal screen for entering OpenAI API key."""

    def __init__(self, config: ConfigManager):
        """Initialize API key modal."""
        super().__init__()
        self.config = config
        self.api_key: Optional[str] = None

    def compose(self) -> ComposeResult:
        """Compose modal widgets."""
        yield Vertical(
            Label("OpenAI API Key Required"),
            Label("Enter your OpenAI API key (get it from https://platform.openai.com/api-keys):"),
            Input(id="api_key_input", password=True),
            Container(
                Button("Save", id="save_api", variant="primary"),
                Button("Cancel", id="cancel_api"),
            ),
            id="api_key_modal",
        )

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "save_api":
            api_key_input = self.query_one("#api_key_input", Input)
            if api_key_input.value.strip():
                self.api_key = api_key_input.value.strip()
                self.config.set_api_key(self.api_key)
                self.dismiss(self.api_key)
            else:
                self.app.notify("API key cannot be empty", timeout=2)
        elif event.button.id == "cancel_api":
            self.dismiss(None)


class MethodSelectModal(ModalScreen):
    """Modal screen for selecting transcription method."""

    def __init__(self):
        """Initialize method selection modal."""
        super().__init__()
        self.selected_method: Optional[str] = None

    def compose(self) -> ComposeResult:
        """Compose modal widgets."""
        yield Vertical(
            Label("Select Transcription Method"),
            Select(
                options=[
                    ("OpenAI Whisper API (fast, costs money)", "openai"),
                    ("Local Whisper (free, slower)", "local"),
                ],
                id="method_select",
            ),
            Container(
                Button("Select", id="select_method", variant="primary"),
                Button("Cancel", id="cancel_method"),
            ),
            id="method_modal",
        )

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "select_method":
            select = self.query_one("#method_select", Select)
            if select.value != Select.BLANK:
                self.selected_method = select.value
                self.dismiss(self.selected_method)
            else:
                self.app.notify("Please select a method", timeout=2)
        elif event.button.id == "cancel_method":
            self.dismiss(None)


class OutputDirModal(ModalScreen):
    """Modal screen for selecting output directory."""

    def __init__(self, config: ConfigManager):
        """Initialize output directory modal."""
        super().__init__()
        self.config = config
        self.selected_dir: Optional[Path] = None

    def compose(self) -> ComposeResult:
        """Compose modal widgets."""
        yield Vertical(
            Label("Select Output Directory"),
            DirectoryTree("/", id="dir_tree"),
            Container(
                Button("Select", id="select_dir", variant="primary"),
                Button("Cancel", id="cancel_dir"),
            ),
            id="output_dir_modal",
        )

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "select_dir":
            tree = self.query_one("#dir_tree", DirectoryTree)
            if tree.cursor_node:
                self.selected_dir = Path(tree.cursor_node.data.path)
                is_valid, error = FileHandler.validate_directory(self.selected_dir)
                if is_valid:
                    self.config.set_output_directory(self.selected_dir)
                    self.dismiss(self.selected_dir)
                else:
                    self.app.notify(f"Error: {error}", timeout=3)
            else:
                self.app.notify("Please select a directory", timeout=2)
        elif event.button.id == "cancel_dir":
            self.dismiss(None)


class FileSelectScreen(Screen):
    """Screen for selecting files to transcribe."""

    class FileSelected(Message):
        """Message posted when files have been chosen."""

        def __init__(self, files: list[Path]):
            """Initialize message."""
            super().__init__()
            self.files = files

    def __init__(self, config: ConfigManager):
        """Initialize file selection screen."""
        super().__init__()
        self.config = config
        self.selected_files: list[Path] = []

    def compose(self) -> ComposeResult:
        """Compose screen widgets."""
        yield Header()
        yield Vertical(
            Label("Select files to transcribe (Supported: MP3, WAV, M4A, FLAC, MP4, AVI, MKV, MOV)"),
            DirectoryTree("/", id="file_tree"),
            Static(id="file_info"),
            Container(
                Button("Add File", id="add_file", variant="primary"),
                Button("Continue", id="continue_btn", variant="success"),
                Button("Cancel", id="cancel_btn", variant="error"),
            ),
            id="file_select_container",
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when screen is mounted."""
        self.query_one("#file_tree", DirectoryTree).focus()

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "add_file":
            tree = self.query_one("#file_tree", DirectoryTree)
            if tree.cursor_node:
                file_path = Path(tree.cursor_node.data.path)
                is_valid, error = FileHandler.validate_file(file_path)
                if is_valid:
                    if file_path not in self.selected_files:
                        self.selected_files.append(file_path)
                        self._update_file_info()
                    else:
                        self.app.notify("File already added", timeout=2)
                else:
                    self.app.notify(f"Error: {error}", timeout=2)
            else:
                self.app.notify("Please select a file", timeout=2)
        elif event.button.id == "continue_btn":
            if self.selected_files:
                self.app.post_message(self.FileSelected(self.selected_files))
            else:
                self.app.notify("Please select at least one file", timeout=2)
        elif event.button.id == "cancel_btn":
            self.app.exit()

    def _update_file_info(self) -> None:
        """Update file information display."""
        info = self.query_one("#file_info", Static)
        file_list = "\n".join([f"✓ {f.name}" for f in self.selected_files])
        info.update(f"Selected files:\n{file_list}\n\nTotal: {len(self.selected_files)}")


class ProgressScreen(Screen):
    """Screen showing transcription progress."""

    def __init__(self, files: list[Path], method: str, config: ConfigManager):
        """Initialize progress screen."""
        super().__init__()
        self.files = files
        self.method = method
        self.config = config
        self.service = TranscriptionService()
        self.results: list[tuple[Path, str]] = []

    def compose(self) -> ComposeResult:
        """Compose screen widgets."""
        yield Header()
        yield Vertical(
            Label("Transcription Progress"),
            RichLog(id="progress_log", markup=True),
            Container(
                Button("View Results", id="view_results", variant="primary"),
                Button("Exit", id="exit_btn", variant="error"),
                id="progress_controls",
            ),
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when screen is mounted."""
        self.app.call_later(self._run_transcription)

    def _run_transcription(self) -> None:
        """Run transcription on all files."""
        log = self.query_one("#progress_log", RichLog)

        # Setup transcription service
        try:
            if self.method == "openai":
                api_key = self.config.get_api_key()
                if not api_key:
                    log.write("[red]Error: No API key configured[/red]")
                    return
                self.service.set_openai_backend(api_key)
                log.write("[green]Using OpenAI Whisper API[/green]")
            else:
                model_size = self.config.get_whisper_model()
                self.service.set_local_backend(model_size)
                log.write(f"[green]Using Local Whisper ({model_size})[/green]")
        except Exception as e:
            log.write(f"[red]Error initializing backend: {str(e)}[/red]")
            return

        output_dir = self.config.get_output_directory()
        if not output_dir:
            log.write("[red]Error: Output directory not configured[/red]")
            return

        # Process each file
        for i, file_path in enumerate(self.files, 1):
            log.write(f"\n[yellow]Processing {i}/{len(self.files)}: {file_path.name}[/yellow]")

            try:
                # Transcribe
                result = self.service.transcribe(file_path)
                log.write("[green]✓ Transcribed[/green]")

                # Format as markdown
                markdown = MarkdownFormatter.format_transcription(
                    result["text"],
                    file_path,
                    result.get("duration", 0.0),
                    result.get("language", "en"),
                )

                # Save to file
                output_filename = MarkdownFormatter.get_output_filename(file_path)
                output_path = FileHandler.get_output_path(file_path, output_dir, output_filename)
                output_path.write_text(markdown, encoding="utf-8")
                log.write(f"[green]✓ Saved to {output_path.name}[/green]")

                self.results.append((file_path, str(output_path)))
            except Exception as e:
                log.write(f"[red]✗ Error: {str(e)}[/red]")

        log.write(f"\n[cyan]Completed {len(self.results)}/{len(self.files)} files[/cyan]")

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "view_results":
            self.app.push_screen(ResultsScreen(self.results))
        elif event.button.id == "exit_btn":
            self.app.exit()


class ResultsScreen(Screen):
    """Screen displaying transcription results."""

    def __init__(self, results: list[tuple[Path, str]]):
        """Initialize results screen."""
        super().__init__()
        self.results = results

    def compose(self) -> ComposeResult:
        """Compose screen widgets."""
        yield Header()
        yield Vertical(
            Label("Transcription Results"),
            RichLog(id="results_log", markup=True),
            Container(
                Button("Back", id="back_btn"),
                Button("Exit", id="exit_btn", variant="error"),
            ),
        )
        yield Footer()

    def on_mount(self) -> None:
        """Called when screen is mounted."""
        log = self.query_one("#results_log", RichLog)
        log.write("[cyan]Transcription Results[/cyan]\n")
        for source, output in self.results:
            log.write(f"[green]✓[/green] {source.name}")
            log.write(f"  → {Path(output).name}\n")

    def on_button_pressed(self, event: Button.Pressed) -> None:
        """Handle button press."""
        if event.button.id == "back_btn":
            self.app.pop_screen()
        elif event.button.id == "exit_btn":
            self.app.exit()


class TranscriptionApp(App):
    """Main transcription application."""

    CSS = """
    Screen {
        layout: vertical;
    }

    #api_key_modal {
        width: 60;
        height: 12;
        border: solid green;
    }

    #method_modal {
        width: 50;
        height: 10;
        border: solid blue;
    }

    #output_dir_modal {
        width: 80;
        height: 20;
        border: solid purple;
    }

    #file_select_container {
        width: 100%;
        height: 100%;
    }

    DirectoryTree {
        width: 1fr;
        height: 1fr;
    }

    #file_info {
        width: 100%;
        height: auto;
        border: solid $accent;
        padding: 1;
    }

    #progress_log {
        width: 100%;
        height: 1fr;
        border: solid $accent;
        padding: 1;
    }

    Container {
        height: auto;
        margin: 1;
    }

    Button {
        margin-right: 1;
    }

    Label {
        margin-bottom: 1;
    }
    """

    def __init__(self):
        """Initialize application."""
        super().__init__()
        self.config = ConfigManager()

    def on_mount(self) -> None:
        """Called when app is mounted."""
        self.title = "Whisper Transcription TUI"
        self._check_setup()

    def _check_setup(self) -> None:
        """Check if setup is needed."""
        if not self.config.output_directory_configured():
            self.push_screen(OutputDirModal(self.config), self._output_dir_set)
        else:
            self.push_screen(FileSelectScreen(self.config))

    def _output_dir_set(self, result: Optional[Path] = None) -> None:
        """Called when the output directory has been set."""
        self.push_screen(FileSelectScreen(self.config))

    def on_file_select_screen_file_selected(self, message: FileSelectScreen.FileSelected) -> None:
        """Handle file selection."""
        self.push_screen(MethodSelectModal(), self._method_selected(message.files))

    def _method_selected(self, files: list[Path]):
        """Return a dismiss handler for method selection."""
        def handler(selected_method: Optional[str]) -> None:
            if selected_method == "openai":
                if not self.config.api_key_configured():
                    self.push_screen(
                        ApiKeyModal(self.config),
                        lambda _key: self._start_transcription(files, "openai"),
                    )
                else:
                    self._start_transcription(files, "openai")
            elif selected_method == "local":
                self._start_transcription(files, "local")
        return handler

    def _start_transcription(self, files: list[Path], method: str) -> None:
        """Start transcription process."""
        self.push_screen(ProgressScreen(files, method, self.config))
src/config.py (new file)
@@ -0,0 +1,93 @@
"""Configuration management for API keys and output directory."""
import json
import os
from pathlib import Path
from typing import Optional

from dotenv import load_dotenv, set_key


class ConfigManager:
    """Manages configuration including API keys and output directory preferences."""

    def __init__(self):
        """Initialize config manager and load existing configuration."""
        self.env_path = Path(".env")
        self.config_path = Path("config.json")
        self.config_data: dict = {}
        load_dotenv()
        self._load_config()

    def _load_config(self) -> None:
        """Load configuration from config.json if it exists."""
        if self.config_path.exists():
            try:
                with open(self.config_path, "r") as f:
                    self.config_data = json.load(f)
            except (json.JSONDecodeError, IOError):
                self.config_data = {}

    def _save_config(self) -> None:
        """Save configuration to config.json."""
        with open(self.config_path, "w") as f:
            json.dump(self.config_data, f, indent=2)

    def get_api_key(self) -> Optional[str]:
        """
        Get OpenAI API key from environment.

        Returns:
            The API key if set, None otherwise.
        """
        return os.getenv("OPENAI_API_KEY")

    def set_api_key(self, api_key: str) -> None:
        """
        Save API key to .env file.

        Args:
            api_key: The API key to save.
        """
        if not self.env_path.exists():
            self.env_path.write_text("# OpenAI Configuration\n")
        set_key(str(self.env_path), "OPENAI_API_KEY", api_key)
        os.environ["OPENAI_API_KEY"] = api_key

    def get_output_directory(self) -> Optional[Path]:
        """
        Get the configured output directory.

        Returns:
            Path to output directory if configured, None otherwise.
        """
        output_dir = self.config_data.get("output_directory")
        if output_dir:
            return Path(output_dir)
        return None

    def set_output_directory(self, directory: Path) -> None:
        """
        Save output directory preference to config.

        Args:
            directory: The output directory path.
        """
        self.config_data["output_directory"] = str(directory.resolve())
        self._save_config()

    def get_whisper_model(self) -> str:
        """
        Get the Whisper model size from environment.

        Returns:
            Model size (default: "base").
        """
        return os.getenv("WHISPER_MODEL", "base")

    def api_key_configured(self) -> bool:
        """Check if API key is configured."""
        return bool(self.get_api_key())

    def output_directory_configured(self) -> bool:
        """Check if output directory is configured."""
        return bool(self.get_output_directory())
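
A minimal sketch of how `ConfigManager` ties together, used outside the TUI (the paths below are hypothetical):

```python
from pathlib import Path
from src.config import ConfigManager

cfg = ConfigManager()
cfg.set_output_directory(Path.home() / "transcriptions")  # persisted to config.json
print(cfg.get_whisper_model())   # "base" unless WHISPER_MODEL is set in .env
print(cfg.api_key_configured())  # True once OPENAI_API_KEY is saved
```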
src/file_handler.py (new file)
@@ -0,0 +1,121 @@
"""File handling and validation for audio/video files."""
from pathlib import Path
from typing import List

SUPPORTED_AUDIO_FORMATS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac",
}

SUPPORTED_VIDEO_FORMATS = {
    ".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v",
}

SUPPORTED_FORMATS = SUPPORTED_AUDIO_FORMATS | SUPPORTED_VIDEO_FORMATS


class FileHandler:
    """Handles file operations for transcription."""

    @staticmethod
    def is_supported_file(file_path: Path) -> bool:
        """
        Check if file is a supported audio/video format.

        Args:
            file_path: Path to file.

        Returns:
            True if file is supported, False otherwise.
        """
        return file_path.suffix.lower() in SUPPORTED_FORMATS

    @staticmethod
    def validate_file(file_path: Path) -> tuple[bool, str]:
        """
        Validate that file exists and is supported.

        Args:
            file_path: Path to file.

        Returns:
            Tuple of (is_valid, error_message).
        """
        if not file_path.exists():
            return False, f"File not found: {file_path}"

        if not file_path.is_file():
            return False, f"Path is not a file: {file_path}"

        if not FileHandler.is_supported_file(file_path):
            return False, f"Unsupported format: {file_path.suffix}"

        return True, ""

    @staticmethod
    def validate_directory(directory: Path) -> tuple[bool, str]:
        """
        Validate that directory exists and is writable.

        Args:
            directory: Path to directory.

        Returns:
            Tuple of (is_valid, error_message).
        """
        try:
            directory.mkdir(parents=True, exist_ok=True)
            test_file = directory / ".write_test"
            test_file.write_text("test")
            test_file.unlink()
            return True, ""
        except Exception as e:
            return False, f"Directory not writable: {str(e)}"

    @staticmethod
    def get_output_path(file_path: Path, output_dir: Path, filename: str) -> Path:
        """
        Get the output path for a transcription.

        Args:
            file_path: Source audio/video file.
            output_dir: Output directory.
            filename: Output filename.

        Returns:
            Full output path.
        """
        output_path = output_dir / filename
        counter = 1
        base_stem = output_path.stem
        suffix = output_path.suffix

        while output_path.exists():
            output_path = output_dir / f"{base_stem}_{counter}{suffix}"
            counter += 1

        return output_path

    @staticmethod
    def get_supported_formats_display() -> str:
        """
        Get display string of supported formats.

        Returns:
            Formatted string listing supported formats.
        """
        audio = ", ".join(sorted(SUPPORTED_AUDIO_FORMATS))
        video = ", ".join(sorted(SUPPORTED_VIDEO_FORMATS))
        return f"Audio: {audio}\nVideo: {video}"

    @staticmethod
    def filter_supported_files(paths: List[Path]) -> List[Path]:
        """
        Filter a list of paths to only supported files.

        Args:
            paths: List of file paths.

        Returns:
            Filtered list of supported files.
        """
        return [p for p in paths if p.is_file() and FileHandler.is_supported_file(p)]
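
A short sketch of the collision handling in `get_output_path` (file names are hypothetical): if `meeting_transcription.md` already exists in the output directory, successive calls yield `meeting_transcription_1.md`, `meeting_transcription_2.md`, and so on.

```python
from pathlib import Path
from src.file_handler import FileHandler

out = FileHandler.get_output_path(
    Path("meeting.mp4"),         # source file
    Path("transcriptions"),      # output directory
    "meeting_transcription.md",  # desired filename
)
print(out)  # transcriptions/meeting_transcription.md, or a numbered variant
```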
src/formatter.py (new file)
@@ -0,0 +1,79 @@
"""Markdown formatting for transcriptions."""
from datetime import datetime
from pathlib import Path


class MarkdownFormatter:
    """Formats transcriptions as markdown with metadata headers."""

    @staticmethod
    def format_transcription(
        text: str,
        source_file: Path,
        duration: float = 0.0,
        language: str = "en",
    ) -> str:
        """
        Format transcription as markdown with metadata.

        Args:
            text: The transcription text.
            source_file: Path to the source audio/video file.
            duration: Duration of the audio in seconds.
            language: Language code of the transcription.

        Returns:
            Formatted markdown string.
        """
        timestamp = datetime.now().isoformat()
        word_count = len(text.split())

        markdown = f"""# Transcription: {source_file.name}

**Source File:** {source_file.name}
**Date:** {timestamp}
**Duration:** {MarkdownFormatter._format_duration(duration)}
**Language:** {language}
**Word Count:** {word_count}

---

{text}
"""
        return markdown

    @staticmethod
    def _format_duration(seconds: float) -> str:
        """
        Format duration in seconds to human-readable format.

        Args:
            seconds: Duration in seconds.

        Returns:
            Formatted duration string (e.g., "1:23:45").
        """
        if seconds <= 0:
            return "Unknown"

        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        secs = int(seconds % 60)

        if hours > 0:
            return f"{hours}:{minutes:02d}:{secs:02d}"
        return f"{minutes}:{secs:02d}"

    @staticmethod
    def get_output_filename(source_file: Path) -> str:
        """
        Generate markdown filename from source file.

        Args:
            source_file: Path to the source audio/video file.

        Returns:
            Filename with .md extension.
        """
        return f"{source_file.stem}_transcription.md"
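
For reference, a quick illustration of the formatter helpers (values are illustrative):

```python
from pathlib import Path
from src.formatter import MarkdownFormatter

print(MarkdownFormatter.get_output_filename(Path("meeting.mp4")))
# meeting_transcription.md

# 5025 seconds formats as hours:minutes:seconds
print(MarkdownFormatter._format_duration(5025.0))
# 1:23:45
```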
src/transcriber.py (new file)
@@ -0,0 +1,141 @@
"""Transcription service supporting both OpenAI API and local Whisper."""
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional

import whisper
from openai import OpenAI


class TranscriberBackend(ABC):
    """Abstract base class for transcription backends."""

    @abstractmethod
    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe audio file.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with 'text' and optional 'language' and 'duration' keys.
        """
        pass


class OpenAITranscriber(TranscriberBackend):
    """OpenAI Whisper API transcriber."""

    def __init__(self, api_key: str):
        """
        Initialize OpenAI transcriber.

        Args:
            api_key: OpenAI API key.
        """
        self.client = OpenAI(api_key=api_key)

    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe using OpenAI API.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with transcription result.
        """
        with open(file_path, "rb") as audio_file:
            transcript = self.client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                response_format="verbose_json",
                language="en",
            )

        return {
            "text": transcript.text,
            "language": getattr(transcript, "language", "en"),
            "duration": getattr(transcript, "duration", 0.0),
        }


class LocalWhisperTranscriber(TranscriberBackend):
    """Local Whisper model transcriber."""

    def __init__(self, model_size: str = "base"):
        """
        Initialize local Whisper transcriber.

        Args:
            model_size: Size of the Whisper model (tiny, base, small, medium, large).
        """
        self.model_size = model_size
        self.model = whisper.load_model(model_size)

    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe using local Whisper model.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with transcription result.
        """
        result = self.model.transcribe(
            str(file_path),
            language="en",
            fp16=False,
        )

        return {
            "text": result["text"],
            "language": result.get("language", "en"),
            "duration": result.get("duration", 0.0),
        }


class TranscriptionService:
    """Main transcription service that manages both backends."""

    def __init__(self):
        """Initialize transcription service."""
        self.transcriber: Optional[TranscriberBackend] = None

    def set_openai_backend(self, api_key: str) -> None:
        """
        Set OpenAI as the transcription backend.

        Args:
            api_key: OpenAI API key.
        """
        self.transcriber = OpenAITranscriber(api_key)

    def set_local_backend(self, model_size: str = "base") -> None:
        """
        Set local Whisper as the transcription backend.

        Args:
            model_size: Size of the Whisper model.
        """
        self.transcriber = LocalWhisperTranscriber(model_size)

    def transcribe(self, file_path: Path) -> dict:
        """
        Transcribe audio file using configured backend.

        Args:
            file_path: Path to audio/video file.

        Returns:
            Dictionary with transcription result.

        Raises:
            RuntimeError: If no backend is configured.
        """
        if not self.transcriber:
            raise RuntimeError("No transcription backend configured")

        return self.transcriber.transcribe(file_path)
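
A minimal sketch of driving `TranscriptionService` directly, assuming an audio file at a hypothetical path:

```python
from pathlib import Path
from src.transcriber import TranscriptionService

service = TranscriptionService()
service.set_local_backend("base")  # or: service.set_openai_backend(api_key)
result = service.transcribe(Path("meeting.mp3"))
print(result["text"], result.get("language"), result.get("duration"))
```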