feat: Initial Whisper Transcription TUI implementation

- Added Textual-based TUI with file selection and progress monitoring

- Implemented transcription service with OpenAI API and local Whisper backends

- Added markdown formatter for transcription output

- Configuration management for persistent API keys and output directory

- Comprehensive README with installation and usage instructions

- Support for multi-file batch processing

- Beautiful terminal UI with modal dialogs for user input
Author: NicholaiVogel · 2025-10-20 16:43:35 -06:00
Parent: 7e405f7354 · Commit: c5ccd14048
11 changed files with 1237 additions and 15 deletions

.env.example (new file, 8 lines)

@@ -0,0 +1,8 @@
# OpenAI API Key (required for API-based transcription)
# Get your key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_api_key_here
# Optional: Whisper Model Size for local transcription
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base

.gitignore (vendored, 55 lines changed)

@@ -1,2 +1,55 @@
opencode
# Environment and Configuration
.env
.env.local
config.json
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Virtual Environments
venv/
ENV/
env/
.venv
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Test and coverage
.pytest_cache/
.coverage
htmlcov/
# Output
transcriptions/
output/

README.md (new file, 258 lines)

@@ -0,0 +1,258 @@
# Whisper Transcription TUI
A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both cloud-based API and local processing.
## Features
- **Dual Transcription Methods**:
- OpenAI Whisper API (fast, cloud-based, paid per minute)
- Local Whisper (free, offline, slower)
- **Flexible File Selection**: Browse and select single or multiple audio/video files
- **Persistent Configuration**: Remember API keys and output directory between sessions
- **Beautiful TUI**: Modern terminal interface similar to OpenCode
- **Markdown Output**: Transcriptions saved as markdown with metadata headers
- **Multi-file Processing**: Batch transcribe multiple files sequentially
## Supported Formats
### Audio
MP3, WAV, M4A, FLAC, OGG, WMA, AAC
### Video
MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V
## Installation
### Prerequisites
- Python 3.9 or higher (the source uses built-in generic type hints such as `list[Path]`)
- FFmpeg (for audio/video processing)
### FFmpeg Installation
#### Windows
```bash
# Using Chocolatey
choco install ffmpeg
# Or using Scoop
scoop install ffmpeg
# Or download from https://ffmpeg.org/download.html
```
#### macOS
```bash
brew install ffmpeg
```
#### Linux
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Fedora
sudo dnf install ffmpeg
# Arch
sudo pacman -S ffmpeg
```
### Python Package Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd 00_Whisper
```
2. Create and activate a virtual environment:
```bash
# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
## Configuration
### OpenAI API Key Setup
1. Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key will be saved to `.env` file and reused in future sessions
⚠️ **Important**: Never commit your `.env` file to version control. It's already in `.gitignore`.
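For reference, the app persists the key with `python-dotenv`; a minimal sketch of what `src/config.py` does under the hood (the key value is a placeholder):
```python
import os
from pathlib import Path
from dotenv import load_dotenv, set_key

env_path = Path(".env")
if not env_path.exists():
    env_path.write_text("# OpenAI Configuration\n")

# Write or update the key in .env and expose it to the current process
set_key(str(env_path), "OPENAI_API_KEY", "sk-your-key-here")
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"

load_dotenv()  # later runs pick the key up from .env automatically
print("key configured:", bool(os.getenv("OPENAI_API_KEY")))
```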
### Local Whisper Model
You can configure the local Whisper model size by editing your `.env` file:
```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```
Larger models are more accurate but require more memory and time.
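As a rough sketch of how the setting is consumed (mirroring `src/config.py` and `src/transcriber.py`; the audio path is illustrative):
```python
import os

import whisper  # provided by the openai-whisper package
from dotenv import load_dotenv

load_dotenv()
model_size = os.getenv("WHISPER_MODEL", "base")  # falls back to "base"

# Larger sizes trade speed and memory for accuracy; the model downloads on first use
model = whisper.load_model(model_size)
result = model.transcribe("example.mp3", language="en", fp16=False)
print(result["text"])
```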
## Usage
### Running the Application
```bash
python main.py
```
### Basic Workflow
1. **Select Output Directory** (first time only)
- Navigate using arrow keys
- Press "Select" to choose a directory for transcription outputs
2. **Select Files**
- Navigate the directory tree to find your audio/video files
- Click "Add File" to add files to the queue
- Add multiple files or just one
- Click "Continue" when done
3. **Choose Transcription Method**
- Select between OpenAI API (fast, paid) or Local Whisper (free, slower)
- If using OpenAI and no key is configured, you'll be prompted to enter it
4. **View Progress**
- Monitor transcription progress in real-time
- Each file is processed sequentially
5. **Review Results**
- See list of successfully transcribed files
- Output files are saved as markdown in your configured directory
### Output Format
Each transcription is saved as a markdown file with the format: `{filename}_transcription.md`
```markdown
# Transcription: meeting.mp4
**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847
---
[Transcribed text here...]
```
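That header is produced by `MarkdownFormatter` in `src/formatter.py`; a minimal sketch of calling it directly (run from the repository root; the source path is illustrative):
```python
from pathlib import Path

from src.formatter import MarkdownFormatter

source = Path("meeting.mp4")
markdown = MarkdownFormatter.format_transcription(
    "Transcribed text here...",
    source,
    duration=5025.0,  # seconds; rendered as 1:23:45
    language="en",
)
output_name = MarkdownFormatter.get_output_filename(source)  # meeting_transcription.md
Path(output_name).write_text(markdown, encoding="utf-8")
```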
## Keyboard Shortcuts
- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit application at any time
## Troubleshooting
### "API key not found" error
Make sure you've entered your OpenAI API key when prompted. Check that your `.env` file contains:
```
OPENAI_API_KEY=your_key_here
```
### "File format not supported" error
Check that your file extension is in the supported formats list. The app needs proper file extensions to recognize formats.
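Support is decided purely by extension; a quick check along the lines of `src/file_handler.py` (run from the repository root; the path is illustrative):
```python
from pathlib import Path

from src.file_handler import SUPPORTED_FORMATS, FileHandler

candidate = Path("lecture.m4a")
is_valid, error = FileHandler.validate_file(candidate)
if not is_valid:
    print(error)                      # e.g. "Unsupported format: .xyz" or "File not found: ..."
    print(sorted(SUPPORTED_FORMATS))  # every extension the app recognizes
```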
### Local Whisper is very slow
This is normal for larger model sizes. Try using the "tiny" or "base" model in your `.env`:
```
WHISPER_MODEL=tiny
```
### FFmpeg not found error
Ensure FFmpeg is installed and in your system PATH. Test by running:
```bash
ffmpeg -version
```
### Permission denied errors
Ensure the output directory is writable and the application has permission to create files there.
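The app runs a small write test when you pick a directory; you can reproduce the same check manually (a sketch using `FileHandler.validate_directory` from this repo, with an illustrative path):
```python
from pathlib import Path

from src.file_handler import FileHandler

target = Path("~/transcriptions").expanduser()  # illustrative directory
ok, error = FileHandler.validate_directory(target)
print("writable" if ok else f"not writable: {error}")
```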
## Project Structure
```
.
├── main.py              # Entry point
├── requirements.txt     # Python dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
├── README.md            # This file
└── src/
    ├── __init__.py
    ├── app.py           # Main TUI application
    ├── config.py        # Configuration management
    ├── transcriber.py   # Transcription service
    ├── formatter.py     # Markdown formatting
    └── file_handler.py  # File handling utilities
```
## Development
### Running Tests
```bash
# Coming soon
```
### Building Documentation
```bash
# Coming soon
```
## Performance Tips
1. **Use Local Whisper for Small Files**: Local processing is free and fast for small audio files
2. **Use OpenAI API for Accuracy**: The cloud version handles edge cases better
3. **Set Appropriate Model Size**: Larger models are more accurate but slower and use more memory
4. **Batch Processing**: Process multiple files to amortize setup time
## API Costs
### OpenAI Whisper API
- Billed per minute of audio (roughly $0.006 per minute for `whisper-1` at the time of writing, or about $0.36 for a one-hour file)
- Minimum charge per request
Check [OpenAI Pricing](https://openai.com/pricing/) for current rates.
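As a rough back-of-the-envelope estimate (the rate is an assumption based on the figure above; confirm against the pricing page):
```python
# Estimate the API cost of one recording, assuming ~$0.006 per minute of audio
duration_minutes = 83.75   # e.g. a 1:23:45 recording
rate_per_minute = 0.006    # assumed rate; check OpenAI's pricing page for current values
print(f"~${duration_minutes * rate_per_minute:.2f}")  # ~$0.50
```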
## Limitations
- Local Whisper models can be large (up to 3GB for "large")
- First-time local Whisper setup downloads the selected model (roughly 150 MB for "base")
- OpenAI API requires internet connection and active API credits
- Batch processing is sequential (one file at a time)
## Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues.
## License
This project is provided as-is for educational and personal use.
## Support
For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.
## Acknowledgments
- [OpenAI Whisper](https://github.com/openai/whisper) for transcription technology
- [Textual](https://github.com/textualize/textual) for the TUI framework
- [OpenCode](https://opencode.ai) for UI/UX inspiration

main.py (new file, 26 lines)

@@ -0,0 +1,26 @@
#!/usr/bin/env python3
"""Whisper Transcription TUI - Main entry point."""
import sys
# src/ is a package (it has an __init__.py); import through the package so the
# relative imports inside src/app.py keep working.
from src.app import TranscriptionApp
def main() -> None:
"""Run the transcription application."""
try:
app = TranscriptionApp()
app.run()
except KeyboardInterrupt:
print("\nApplication interrupted by user.")
sys.exit(0)
except Exception as e:
print(f"Error: {str(e)}", file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
main()

requirements.txt (modified)

@@ -1,19 +1,17 @@
-# Minimal requirements for Phase 1 MVP
-# UI / tray choices are optional at this stage - keep core deps minimal
+# TUI Transcription Application Dependencies
-# For local whisper (optional when implemented)
-# faster-whisper
+# Terminal UI Framework
+textual>=0.40.0
-# For audio capture (choose one)
-# sounddevice
-# pyaudio
+# OpenAI API and Whisper
+openai>=1.0.0
+openai-whisper>=20240314
-# OpenAI API client
-openai>=0.27.0
+# Configuration and Environment
+python-dotenv>=1.0.0
-# Optional: a GUI toolkit for Phase 2
-# PySide6
+# Utilities and Formatting
+rich>=13.0.0
-# For packaging and utilities
-rich
-python-dotenv
+# Audio processing
+ffmpeg-python>=0.2.1

src/__init__.py (new file, 1 line)

@@ -0,0 +1 @@
"""Whisper Transcription TUI Application"""

src/app.py (new file, 444 lines)

@@ -0,0 +1,444 @@
"""Main Textual TUI application for transcription."""
from pathlib import Path
from typing import Optional
from textual.app import ComposeResult, App
from textual.containers import Container, Vertical
from textual.screen import Screen, ModalScreen
from textual.widgets import (
Header,
Footer,
Static,
Input,
Button,
Label,
DirectoryTree,
SelectionList,
Select,
RichLog,
)
from textual.message import Message
from .config import ConfigManager
from .file_handler import FileHandler
from .transcriber import TranscriptionService
from .formatter import MarkdownFormatter
class ApiKeyModal(ModalScreen):
"""Modal screen for entering OpenAI API key."""
def __init__(self, config: ConfigManager):
"""Initialize API key modal."""
super().__init__()
self.config = config
self.api_key: Optional[str] = None
def compose(self) -> ComposeResult:
"""Compose modal widgets."""
yield Vertical(
Label("OpenAI API Key Required"),
Label("Enter your OpenAI API key (get it from https://platform.openai.com/api-keys):"),
Input(id="api_key_input", password=True),
Container(
Button("Save", id="save_api", variant="primary"),
Button("Cancel", id="cancel_api"),
),
id="api_key_modal",
)
def on_button_pressed(self, event: Button.Pressed) -> None:
"""Handle button press."""
if event.button.id == "save_api":
api_key_input = self.query_one("#api_key_input", Input)
if api_key_input.value.strip():
self.api_key = api_key_input.value.strip()
self.config.set_api_key(self.api_key)
# Dismiss with the key so the caller's push_screen callback receives it
self.dismiss(self.api_key)
else:
self.app.notify("API key cannot be empty", timeout=2)
elif event.button.id == "cancel_api":
self.dismiss(None)
class MethodSelectModal(ModalScreen):
"""Modal screen for selecting transcription method."""
def __init__(self):
"""Initialize method selection modal."""
super().__init__()
self.selected_method: Optional[str] = None
def compose(self) -> ComposeResult:
"""Compose modal widgets."""
yield Vertical(
Label("Select Transcription Method"),
Select(
options=[
("OpenAI Whisper API (fast, costs money)", "openai"),
("Local Whisper (free, slower)", "local"),
],
id="method_select",
),
Container(
Button("Select", id="select_method", variant="primary"),
Button("Cancel", id="cancel_method"),
),
id="method_modal",
)
def on_button_pressed(self, event: Button.Pressed) -> None:
"""Handle button press."""
if event.button.id == "select_method":
select = self.query_one("#method_select", Select)
if select.value != Select.BLANK:
self.selected_method = select.value
# Dismiss with the chosen method so the caller's callback receives it
self.dismiss(self.selected_method)
else:
self.app.notify("Please select a method", timeout=2)
elif event.button.id == "cancel_method":
self.dismiss(None)
class OutputDirModal(ModalScreen):
"""Modal screen for selecting output directory."""
def __init__(self, config: ConfigManager):
"""Initialize output directory modal."""
super().__init__()
self.config = config
self.selected_dir: Optional[Path] = None
def compose(self) -> ComposeResult:
"""Compose modal widgets."""
yield Vertical(
Label("Select Output Directory"),
DirectoryTree("/", id="dir_tree"),
Container(
Button("Select", id="select_dir", variant="primary"),
Button("Cancel", id="cancel_dir"),
),
id="output_dir_modal",
)
def on_button_pressed(self, event: Button.Pressed) -> None:
"""Handle button press."""
if event.button.id == "select_dir":
tree = self.query_one("#dir_tree", DirectoryTree)
if tree.cursor_node:
# DirectoryTree nodes carry a DirEntry; its .path attribute is the selected Path
self.selected_dir = Path(tree.cursor_node.data.path)
is_valid, error = FileHandler.validate_directory(self.selected_dir)
if is_valid:
self.config.set_output_directory(self.selected_dir)
# Dismiss with the directory so the caller's push_screen callback fires
self.dismiss(self.selected_dir)
else:
self.app.notify(f"Error: {error}", timeout=3)
else:
self.app.notify("Please select a directory", timeout=2)
elif event.button.id == "cancel_dir":
self.dismiss(None)
class FileSelectScreen(Screen):
"""Screen for selecting files to transcribe."""
def __init__(self, config: ConfigManager):
"""Initialize file selection screen."""
super().__init__()
self.config = config
self.selected_files: list[Path] = []
def compose(self) -> ComposeResult:
"""Compose screen widgets."""
yield Header()
yield Vertical(
Label("Select files to transcribe (Supported: MP3, WAV, M4A, FLAC, MP4, AVI, MKV, MOV)"),
DirectoryTree("/", id="file_tree"),
Static(id="file_info"),
Container(
Button("Add File", id="add_file", variant="primary"),
Button("Continue", id="continue_btn", variant="success"),
Button("Cancel", id="cancel_btn", variant="error"),
),
id="file_select_container",
)
yield Footer()
def on_mount(self) -> None:
"""Called when screen is mounted."""
self.query_one("#file_tree", DirectoryTree).focus()
def on_button_pressed(self, event: Button.Pressed) -> None:
"""Handle button press."""
if event.button.id == "add_file":
tree = self.query_one("#file_tree", DirectoryTree)
if tree.cursor_node:
# DirectoryTree nodes carry a DirEntry; its .path attribute is the selected Path
file_path = Path(tree.cursor_node.data.path)
is_valid, error = FileHandler.validate_file(file_path)
if is_valid:
if file_path not in self.selected_files:
self.selected_files.append(file_path)
self._update_file_info()
else:
self.app.notify("File already added", timeout=2)
else:
self.app.notify(f"Error: {error}", timeout=2)
else:
self.app.notify("Please select a file", timeout=2)
elif event.button.id == "continue_btn":
if self.selected_files:
self.app.post_message(self.FileSelected(self.selected_files))
else:
self.app.notify("Please select at least one file", timeout=2)
elif event.button.id == "cancel_btn":
self.app.exit()
def _update_file_info(self) -> None:
"""Update file information display."""
info = self.query_one("#file_info", Static)
file_list = "\n".join([f"{f.name}" for f in self.selected_files])
info.update(f"Selected files:\n{file_list}\n\nTotal: {len(self.selected_files)}")
class FileSelected(Message):
"""Message posted when the user confirms the selected files."""
def __init__(self, files: list[Path]):
"""Initialize message."""
super().__init__()
self.files = files
class ProgressScreen(Screen):
"""Screen showing transcription progress."""
def __init__(self, files: list[Path], method: str, config: ConfigManager):
"""Initialize progress screen."""
super().__init__()
self.files = files
self.method = method
self.config = config
self.service = TranscriptionService()
self.results: list[tuple[Path, str]] = []
def compose(self) -> ComposeResult:
"""Compose screen widgets."""
yield Header()
yield Vertical(
Label("Transcription Progress"),
RichLog(id="progress_log", markup=True),
Container(
Button("View Results", id="view_results", variant="primary"),
Button("Exit", id="exit_btn", variant="error"),
id="progress_controls",
),
)
yield Footer()
def on_mount(self) -> None:
"""Called when screen is mounted."""
self.app.call_later(self._run_transcription)
def _run_transcription(self) -> None:
"""Run transcription on all files."""
log = self.query_one("#progress_log", RichLog)
# Setup transcription service
try:
if self.method == "openai":
api_key = self.config.get_api_key()
if not api_key:
log.write("[red]Error: No API key configured[/red]")
return
self.service.set_openai_backend(api_key)
log.write("[green]Using OpenAI Whisper API[/green]")
else:
model_size = self.config.get_whisper_model()
self.service.set_local_backend(model_size)
log.write(f"[green]Using Local Whisper ({model_size})[/green]")
except Exception as e:
log.write(f"[red]Error initializing backend: {str(e)}[/red]")
return
output_dir = self.config.get_output_directory()
if not output_dir:
log.write("[red]Error: Output directory not configured[/red]")
return
# Process each file
for i, file_path in enumerate(self.files, 1):
log.write(f"\n[yellow]Processing {i}/{len(self.files)}: {file_path.name}[/yellow]")
try:
# Transcribe
result = self.service.transcribe(file_path)
log.write(f"[green]✓ Transcribed[/green]")
# Format as markdown
markdown = MarkdownFormatter.format_transcription(
result["text"],
file_path,
result.get("duration", 0.0),
result.get("language", "en"),
)
# Save to file
output_filename = MarkdownFormatter.get_output_filename(file_path)
output_path = FileHandler.get_output_path(file_path, output_dir, output_filename)
output_path.write_text(markdown, encoding="utf-8")
log.write(f"[green]✓ Saved to {output_path.name}[/green]")
self.results.append((file_path, str(output_path)))
except Exception as e:
log.write(f"[red]✗ Error: {str(e)}[/red]")
log.write(f"\n[cyan]Completed {len(self.results)}/{len(self.files)} files[/cyan]")
def on_button_pressed(self, event: Button.Pressed) -> None:
"""Handle button press."""
if event.button.id == "view_results":
self.app.push_screen(ResultsScreen(self.results))
elif event.button.id == "exit_btn":
self.app.exit()
class ResultsScreen(Screen):
"""Screen displaying transcription results."""
def __init__(self, results: list[tuple[Path, str]]):
"""Initialize results screen."""
super().__init__()
self.results = results
def compose(self) -> ComposeResult:
"""Compose screen widgets."""
yield Header()
yield Vertical(
Label("Transcription Results"),
RichLog(id="results_log", markup=True),
Container(
Button("Back", id="back_btn"),
Button("Exit", id="exit_btn", variant="error"),
),
)
yield Footer()
def on_mount(self) -> None:
"""Called when screen is mounted."""
log = self.query_one("#results_log", RichLog)
log.write("[cyan]Transcription Results[/cyan]\n")
for source, output in self.results:
log.write(f"[green]✓[/green] {source.name}")
log.write(f"{Path(output).name}\n")
def on_button_pressed(self, event: Button.Pressed) -> None:
"""Handle button press."""
if event.button.id == "back_btn":
self.app.pop_screen()
elif event.button.id == "exit_btn":
self.app.exit()
class TranscriptionApp(App):
"""Main transcription application."""
CSS = """
Screen {
layout: vertical;
}
#api_key_modal {
width: 60;
height: 12;
border: solid green;
}
#method_modal {
width: 50;
height: 10;
border: solid blue;
}
#output_dir_modal {
width: 80;
height: 20;
border: solid purple;
}
#file_select_container {
width: 100%;
height: 100%;
}
DirectoryTree {
width: 1fr;
height: 1fr;
}
#file_info {
width: 100%;
height: auto;
border: solid $accent;
padding: 1;
}
#progress_log {
width: 100%;
height: 1fr;
border: solid $accent;
padding: 1;
}
Container {
height: auto;
margin: 1;
}
Button {
margin-right: 1;
}
Label {
margin-bottom: 1;
}
"""
def __init__(self):
"""Initialize application."""
super().__init__()
self.config = ConfigManager()
def on_mount(self) -> None:
"""Called when app is mounted."""
self.title = "Whisper Transcription TUI"
self._check_setup()
def _check_setup(self) -> None:
"""Check if setup is needed."""
if not self.config.output_directory_configured():
# push_screen (not push_screen_wait) takes a callback that receives the dismiss result
self.push_screen(OutputDirModal(self.config), self._output_dir_set)
else:
self.push_screen(FileSelectScreen(self.config))
def _output_dir_set(self, _selected_dir: Optional[Path]) -> None:
"""Called when the output directory modal is dismissed."""
self.push_screen(FileSelectScreen(self.config))
def on_file_select_screen_file_selected(self, message: FileSelectScreen.FileSelected) -> None:
"""Handle file selection."""
self.push_screen(MethodSelectModal(), self._method_selected(message.files))
def _method_selected(self, files: list[Path]):
"""Return a dismiss callback for the method selection modal."""
def handler(method: Optional[str]) -> None:
if method == "openai":
if not self.config.api_key_configured():
# Only start once a key has actually been provided
self.push_screen(ApiKeyModal(self.config), lambda key: self._start_transcription(files, "openai") if key else None)
else:
self._start_transcription(files, "openai")
elif method == "local":
self._start_transcription(files, "local")
return handler
def _start_transcription(self, files: list[Path], method: str) -> None:
"""Start transcription process."""
self.push_screen(ProgressScreen(files, method, self.config))

src/config.py (new file, 93 lines)

@@ -0,0 +1,93 @@
"""Configuration management for API keys and output directory."""
import json
import os
from pathlib import Path
from typing import Optional
from dotenv import load_dotenv, set_key
class ConfigManager:
"""Manages configuration including API keys and output directory preferences."""
def __init__(self):
"""Initialize config manager and load existing configuration."""
self.env_path = Path(".env")
self.config_path = Path("config.json")
self.config_data: dict = {}
load_dotenv()
self._load_config()
def _load_config(self) -> None:
"""Load configuration from config.json if it exists."""
if self.config_path.exists():
try:
with open(self.config_path, "r") as f:
self.config_data = json.load(f)
except (json.JSONDecodeError, IOError):
self.config_data = {}
def _save_config(self) -> None:
"""Save configuration to config.json."""
with open(self.config_path, "w") as f:
json.dump(self.config_data, f, indent=2)
def get_api_key(self) -> Optional[str]:
"""
Get OpenAI API key from environment.
Returns:
The API key if set, None otherwise.
"""
return os.getenv("OPENAI_API_KEY")
def set_api_key(self, api_key: str) -> None:
"""
Save API key to .env file.
Args:
api_key: The API key to save.
"""
if not self.env_path.exists():
self.env_path.write_text("# OpenAI Configuration\n")
set_key(str(self.env_path), "OPENAI_API_KEY", api_key)
os.environ["OPENAI_API_KEY"] = api_key
def get_output_directory(self) -> Optional[Path]:
"""
Get the configured output directory.
Returns:
Path to output directory if configured, None otherwise.
"""
output_dir = self.config_data.get("output_directory")
if output_dir:
return Path(output_dir)
return None
def set_output_directory(self, directory: Path) -> None:
"""
Save output directory preference to config.
Args:
directory: The output directory path.
"""
self.config_data["output_directory"] = str(directory.resolve())
self._save_config()
def get_whisper_model(self) -> str:
"""
Get the Whisper model size from environment.
Returns:
Model size (default: "base").
"""
return os.getenv("WHISPER_MODEL", "base")
def api_key_configured(self) -> bool:
"""Check if API key is configured."""
return bool(self.get_api_key())
def output_directory_configured(self) -> bool:
"""Check if output directory is configured."""
return bool(self.get_output_directory())
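A minimal usage sketch of the manager above (run from the repository root so `.env` and `config.json` sit next to `main.py`):
```python
# Sketch: persisting and reading settings with ConfigManager
from pathlib import Path

from src.config import ConfigManager

config = ConfigManager()
config.set_output_directory(Path("transcriptions"))  # stored in config.json as an absolute path
print(config.get_output_directory())                 # e.g. /path/to/repo/transcriptions
print(config.get_whisper_model())                    # "base" unless WHISPER_MODEL is set
print(config.api_key_configured())                   # True once OPENAI_API_KEY is in .env
```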

src/file_handler.py (new file, 121 lines)

@@ -0,0 +1,121 @@
"""File handling and validation for audio/video files."""
from pathlib import Path
from typing import List
SUPPORTED_AUDIO_FORMATS = {
".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac",
}
SUPPORTED_VIDEO_FORMATS = {
".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v",
}
SUPPORTED_FORMATS = SUPPORTED_AUDIO_FORMATS | SUPPORTED_VIDEO_FORMATS
class FileHandler:
"""Handles file operations for transcription."""
@staticmethod
def is_supported_file(file_path: Path) -> bool:
"""
Check if file is a supported audio/video format.
Args:
file_path: Path to file.
Returns:
True if file is supported, False otherwise.
"""
return file_path.suffix.lower() in SUPPORTED_FORMATS
@staticmethod
def validate_file(file_path: Path) -> tuple[bool, str]:
"""
Validate that file exists and is supported.
Args:
file_path: Path to file.
Returns:
Tuple of (is_valid, error_message).
"""
if not file_path.exists():
return False, f"File not found: {file_path}"
if not file_path.is_file():
return False, f"Path is not a file: {file_path}"
if not FileHandler.is_supported_file(file_path):
return False, f"Unsupported format: {file_path.suffix}"
return True, ""
@staticmethod
def validate_directory(directory: Path) -> tuple[bool, str]:
"""
Validate that directory exists and is writable.
Args:
directory: Path to directory.
Returns:
Tuple of (is_valid, error_message).
"""
try:
directory.mkdir(parents=True, exist_ok=True)
test_file = directory / ".write_test"
test_file.write_text("test")
test_file.unlink()
return True, ""
except Exception as e:
return False, f"Directory not writable: {str(e)}"
@staticmethod
def get_output_path(file_path: Path, output_dir: Path, filename: str) -> Path:
"""
Get the output path for a transcription.
Args:
file_path: Source audio/video file.
output_dir: Output directory.
filename: Output filename.
Returns:
Full output path.
"""
output_path = output_dir / filename
counter = 1
base_stem = output_path.stem
suffix = output_path.suffix
while output_path.exists():
output_path = output_dir / f"{base_stem}_{counter}{suffix}"
counter += 1
return output_path
@staticmethod
def get_supported_formats_display() -> str:
"""
Get display string of supported formats.
Returns:
Formatted string listing supported formats.
"""
audio = ", ".join(sorted(SUPPORTED_AUDIO_FORMATS))
video = ", ".join(sorted(SUPPORTED_VIDEO_FORMATS))
return f"Audio: {audio}\nVideo: {video}"
@staticmethod
def filter_supported_files(paths: List[Path]) -> List[Path]:
"""
Filter a list of paths to only supported files.
Args:
paths: List of file paths.
Returns:
Filtered list of supported files.
"""
return [p for p in paths if p.is_file() and FileHandler.is_supported_file(p)]

src/formatter.py (new file, 79 lines)

@@ -0,0 +1,79 @@
"""Markdown formatting for transcriptions."""
from datetime import datetime
from pathlib import Path
class MarkdownFormatter:
"""Formats transcriptions as markdown with metadata headers."""
@staticmethod
def format_transcription(
text: str,
source_file: Path,
duration: float = 0.0,
language: str = "en",
) -> str:
"""
Format transcription as markdown with metadata.
Args:
text: The transcription text.
source_file: Path to the source audio/video file.
duration: Duration of the audio in seconds.
language: Language code of the transcription.
Returns:
Formatted markdown string.
"""
timestamp = datetime.now().isoformat()
word_count = len(text.split())
markdown = f"""# Transcription: {source_file.name}
**Source File:** {source_file.name}
**Date:** {timestamp}
**Duration:** {MarkdownFormatter._format_duration(duration)}
**Language:** {language}
**Word Count:** {word_count}
---
{text}
"""
return markdown
@staticmethod
def _format_duration(seconds: float) -> str:
"""
Format duration in seconds to human-readable format.
Args:
seconds: Duration in seconds.
Returns:
Formatted duration string (e.g., "1:23:45").
"""
if seconds <= 0:
return "Unknown"
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
if hours > 0:
return f"{hours}:{minutes:02d}:{secs:02d}"
return f"{minutes}:{secs:02d}"
@staticmethod
def get_output_filename(source_file: Path) -> str:
"""
Generate markdown filename from source file.
Args:
source_file: Path to the source audio/video file.
Returns:
Filename with .md extension.
"""
return f"{source_file.stem}_transcription.md"

src/transcriber.py (new file, 141 lines)

@@ -0,0 +1,141 @@
"""Transcription service supporting both OpenAI API and local Whisper."""
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional
import whisper
from openai import OpenAI
class TranscriberBackend(ABC):
"""Abstract base class for transcription backends."""
@abstractmethod
def transcribe(self, file_path: Path) -> dict:
"""
Transcribe audio file.
Args:
file_path: Path to audio/video file.
Returns:
Dictionary with 'text' and optional 'language' and 'duration' keys.
"""
pass
class OpenAITranscriber(TranscriberBackend):
"""OpenAI Whisper API transcriber."""
def __init__(self, api_key: str):
"""
Initialize OpenAI transcriber.
Args:
api_key: OpenAI API key.
"""
self.client = OpenAI(api_key=api_key)
def transcribe(self, file_path: Path) -> dict:
"""
Transcribe using OpenAI API.
Args:
file_path: Path to audio/video file.
Returns:
Dictionary with transcription result.
"""
with open(file_path, "rb") as audio_file:
transcript = self.client.audio.transcriptions.create(
model="whisper-1",
file=audio_file,
response_format="verbose_json",
language="en",
)
return {
"text": transcript.text,
"language": getattr(transcript, "language", "en"),
"duration": getattr(transcript, "duration", 0.0),
}
class LocalWhisperTranscriber(TranscriberBackend):
"""Local Whisper model transcriber."""
def __init__(self, model_size: str = "base"):
"""
Initialize local Whisper transcriber.
Args:
model_size: Size of the Whisper model (tiny, base, small, medium, large).
"""
self.model_size = model_size
self.model = whisper.load_model(model_size)
def transcribe(self, file_path: Path) -> dict:
"""
Transcribe using local Whisper model.
Args:
file_path: Path to audio/video file.
Returns:
Dictionary with transcription result.
"""
result = self.model.transcribe(
str(file_path),
language="en",
fp16=False,
)
segments = result.get("segments") or []
return {
"text": result["text"],
"language": result.get("language", "en"),
# whisper's result has no top-level duration; derive it from the final segment
"duration": float(segments[-1]["end"]) if segments else 0.0,
}
class TranscriptionService:
"""Main transcription service that manages both backends."""
def __init__(self):
"""Initialize transcription service."""
self.transcriber: Optional[TranscriberBackend] = None
def set_openai_backend(self, api_key: str) -> None:
"""
Set OpenAI as the transcription backend.
Args:
api_key: OpenAI API key.
"""
self.transcriber = OpenAITranscriber(api_key)
def set_local_backend(self, model_size: str = "base") -> None:
"""
Set local Whisper as the transcription backend.
Args:
model_size: Size of the Whisper model.
"""
self.transcriber = LocalWhisperTranscriber(model_size)
def transcribe(self, file_path: Path) -> dict:
"""
Transcribe audio file using configured backend.
Args:
file_path: Path to audio/video file.
Returns:
Dictionary with transcription result.
Raises:
RuntimeError: If no backend is configured.
"""
if not self.transcriber:
raise RuntimeError("No transcription backend configured")
return self.transcriber.transcribe(file_path)
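A minimal usage sketch of the service above (the file path and key are placeholders; the local backend downloads its model on first use):
```python
# Sketch: transcribing one file with either backend
from pathlib import Path

from src.transcriber import TranscriptionService

service = TranscriptionService()

# Option A: local Whisper (free, offline, slower)
service.set_local_backend("base")

# Option B: OpenAI API (requires a valid key)
# service.set_openai_backend("sk-your-key-here")

result = service.transcribe(Path("meeting.mp4"))
print(result["language"], result["duration"])
print(result["text"][:200])
```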