# Whisper Transcription TUI

A modern terminal user interface (TUI) application for transcribing audio and video files using OpenAI's Whisper, with support for both the cloud-based API and local processing.

## Features

- **Dual Transcription Methods**:
  - OpenAI Whisper API (fast, cloud-based, paid per minute)
  - Local Whisper (free, offline, slower)
- **Flexible File Selection**: Browse and select single or multiple audio/video files
- **Persistent Configuration**: API keys and the output directory are remembered between sessions
- **Beautiful TUI**: Modern terminal interface similar to OpenCode
- **Markdown Output**: Transcriptions are saved as markdown with metadata headers
- **Multi-file Processing**: Batch-transcribe multiple files sequentially

## Supported Formats

### Audio

MP3, WAV, M4A, FLAC, OGG, WMA, AAC

### Video

MP4, AVI, MKV, MOV, WebM, FLV, WMV, M4V

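As a sketch of how extension-based format detection might work (the sets below mirror the lists above; the app's actual logic may differ):

```python
from pathlib import Path

# Extensions taken from the supported-format lists above.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"}
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v"}

def is_supported(path: str) -> bool:
    """Return True if the file's extension is a supported audio/video format."""
    return Path(path).suffix.lower() in AUDIO_EXTS | VIDEO_EXTS
```

Note the `.lower()` call: detection is case-insensitive, so `TALK.MP3` is accepted as readily as `talk.mp3`.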
## Installation

### Prerequisites

- Python 3.8 or higher
- FFmpeg (for audio/video processing)

### FFmpeg Installation

#### Windows

```bash
# Using Chocolatey
choco install ffmpeg

# Or using Scoop
scoop install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

#### macOS

```bash
brew install ffmpeg
```

#### Linux

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch
sudo pacman -S ffmpeg
```

### Python Package Installation

1. Clone the repository:

```bash
git clone <repository-url>
cd 00_Whisper
```

2. Create and activate a virtual environment:

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

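For reference, a plausible `requirements.txt` for this stack might look like the following (package names are assumptions based on the features described, not the project's actual pinned list):

```text
textual
openai
openai-whisper
python-dotenv
```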
## Configuration

### OpenAI API Key Setup

1. Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. The first time you select the "OpenAI Whisper API" method, you'll be prompted to enter your API key
3. The key is saved to the `.env` file and reused in future sessions

⚠️ **Important**: Never commit your `.env` file to version control. It is already listed in `.gitignore`.

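A minimal sketch of how the key might be persisted to `.env`, assuming plain `KEY=value` lines (the function names are illustrative; the app's `config.py` may work differently):

```python
from pathlib import Path
from typing import Optional

def save_api_key(key: str, env_path: str = ".env") -> None:
    """Write OPENAI_API_KEY into the .env file, replacing any existing entry."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    lines = [l for l in lines if not l.startswith("OPENAI_API_KEY=")]
    lines.append(f"OPENAI_API_KEY={key}")
    path.write_text("\n".join(lines) + "\n")

def load_api_key(env_path: str = ".env") -> Optional[str]:
    """Read OPENAI_API_KEY back from the .env file, if present."""
    path = Path(env_path)
    if not path.exists():
        return None
    for line in path.read_text().splitlines():
        if line.startswith("OPENAI_API_KEY="):
            return line.split("=", 1)[1]
    return None
```

Rewriting the whole file rather than appending keeps the `.env` free of stale duplicate entries when the key changes.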
### Local Whisper Model

You can configure the local Whisper model size by editing your `.env` file:

```bash
# Options: tiny, base, small, medium, large
# Default: base
WHISPER_MODEL=base
```

Larger models are more accurate but require more memory and time.

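A sketch of how the app might read and validate this setting (assuming a plain environment lookup; the real `config.py` may load it via python-dotenv):

```python
import os

# The model sizes listed in the .env comments above.
VALID_MODELS = {"tiny", "base", "small", "medium", "large"}

def get_whisper_model(default: str = "base") -> str:
    """Read WHISPER_MODEL from the environment, falling back to the default."""
    model = os.environ.get("WHISPER_MODEL", default).strip().lower()
    if model not in VALID_MODELS:
        raise ValueError(f"Unsupported Whisper model: {model!r}")
    return model
```

Validating against the known set catches typos (e.g. `WHISPER_MODEL=larg`) at startup rather than failing mid-transcription.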
## Usage

### Running the Application

```bash
python main.py
```

### Basic Workflow

1. **Select Output Directory** (first time only)
   - Navigate using the arrow keys
   - Press "Select" to choose a directory for transcription output

2. **Select Files**
   - Navigate the directory tree to find your audio/video files
   - Click "Add File" to add files to the queue, one file or several
   - Click "Continue" when done

3. **Choose Transcription Method**
   - Select between the OpenAI API (fast, paid) and Local Whisper (free, slower)
   - If you choose OpenAI and no key is configured, you'll be prompted to enter one

4. **View Progress**
   - Monitor transcription progress in real time
   - Each file is processed sequentially

5. **Review Results**
   - See the list of successfully transcribed files
   - Output files are saved as markdown in your configured directory

### Output Format

Each transcription is saved as a markdown file named after the source file, e.g. `meeting_transcription.md` for `meeting.mp4`:

```markdown
# Transcription: meeting.mp4

**Source File:** meeting.mp4
**Date:** 2025-01-20T10:30:45.123456
**Duration:** 1:23:45
**Language:** en
**Word Count:** 2847

---

[Transcribed text here...]
```

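The layout above could be produced by a formatter along these lines (a sketch; the field names mirror the example, but the real `formatter.py` may differ):

```python
from datetime import datetime

def format_transcription(filename: str, text: str,
                         duration: str, language: str) -> str:
    """Render a transcription as markdown with a metadata header."""
    word_count = len(text.split())
    return (
        f"# Transcription: {filename}\n\n"
        f"**Source File:** {filename}\n"
        f"**Date:** {datetime.now().isoformat()}\n"
        f"**Duration:** {duration}\n"
        f"**Language:** {language}\n"
        f"**Word Count:** {word_count}\n\n"
        "---\n\n"
        f"{text}\n"
    )
```

Keeping the metadata above a `---` rule means downstream tools can split the header from the transcript body with a single string split.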
## Keyboard Shortcuts

- **Tab**: Navigate between buttons and widgets
- **Enter**: Activate the selected button
- **Arrow Keys**: Navigate menus and trees
- **Ctrl+C**: Exit the application at any time

## Troubleshooting

### "API key not found" error

Make sure you've entered your OpenAI API key when prompted. Check that your `.env` file contains:

```
OPENAI_API_KEY=your_key_here
```

### "File format not supported" error

Check that your file's extension is in the supported formats list. The app relies on proper file extensions to recognize formats.

### Local Whisper is very slow

This is normal for larger model sizes. Try the "tiny" or "base" model in your `.env`:

```
WHISPER_MODEL=tiny
```

### FFmpeg not found error

Ensure FFmpeg is installed and on your system PATH. Test by running:

```bash
ffmpeg -version
```

### Permission denied errors

Ensure the output directory is writable and the application has permission to create files there.

## Project Structure

```
.
├── main.py              # Entry point
├── requirements.txt     # Python dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
├── README.md            # This file
└── src/
    ├── __init__.py
    ├── app.py           # Main TUI application
    ├── config.py        # Configuration management
    ├── transcriber.py   # Transcription service
    ├── formatter.py     # Markdown formatting
    └── file_handler.py  # File handling utilities
```

## Development

### Running Tests

```bash
# Coming soon
```

### Building Documentation

```bash
# Coming soon
```

## Performance Tips

1. **Use Local Whisper for Small Files**: Local processing is free, and short files finish quickly even on modest hardware
2. **Use the OpenAI API for Accuracy**: The cloud version handles difficult audio and edge cases better
3. **Set an Appropriate Model Size**: Larger models are more accurate but slower and use more memory
4. **Batch Processing**: Queue multiple files to amortize setup time

## API Costs

### OpenAI Whisper API

- Pricing at the time of writing: $0.006 per minute of audio
- Billed by audio duration per request

Check [OpenAI Pricing](https://openai.com/pricing/) for current rates.

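As a worked example of per-minute billing (the rate below is illustrative; always check the pricing page for current figures):

```python
def estimate_cost(duration_seconds: float, rate_per_minute: float) -> float:
    """Estimate the API cost of transcribing one file, billed per minute of audio."""
    return (duration_seconds / 60.0) * rate_per_minute

# A 90-minute recording at $0.006/min costs 90 * 0.006 = $0.54.
```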
## Limitations

- Local Whisper models can be large (up to ~3 GB for "large")
- First-time local Whisper use downloads the model (roughly 140 MB for "base", up to ~3 GB for "large")
- The OpenAI API requires an internet connection and active API credits
- Batch processing is sequential (one file at a time)

## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

## License

This project is provided as-is for educational and personal use.

## Support

For issues, questions, or feature requests, please open an issue on GitHub or contact the maintainers.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription technology
- [Textual](https://github.com/textualize/textual) for the TUI framework
- [OpenCode](https://opencode.ai) for UI/UX inspiration