
# Qdrant Semantic Search for Obsidian

An Obsidian plugin that indexes your entire vault into Qdrant for semantic search. It generates text embeddings with Ollama or OpenAI and can extract text from PDFs and images through the Text Extractor plugin.

## Features

- **Semantic Search**: Find content by meaning, not just keywords
- **Multiple Embedding Providers**:
  - Ollama (local, free) - default
  - OpenAI (cloud, paid)
- **Rich Content Support**:
  - Markdown files with frontmatter parsing
  - Code files with syntax highlighting
  - PDFs via the Text Extractor plugin
  - Images with OCR via the Text Extractor plugin
- **Hybrid Chunking**: Smart text splitting on headings with size-based fallback
- **Real-time Indexing**: Automatic indexing of file changes
- **Graph Visualization**: View document relationships (planned)
- **Comprehensive Settings**: Full control over indexing and search behavior

## Installation

### Prerequisites

1. **Qdrant**: You need a Qdrant instance running
   - Local: `docker run -p 6333:6333 qdrant/qdrant`
   - Cloud: Sign up at [Qdrant Cloud](https://cloud.qdrant.io/)

2. **Ollama** (recommended for local embeddings):
   ```bash
   # Install Ollama
   curl -fsSL https://ollama.ai/install.sh | sh

   # Pull an embedding model
   ollama pull nomic-embed-text
   ```

3. **Text Extractor Plugin** (optional, for PDF/image support):
   - Install from Community Plugins
   - Enables PDF text extraction and image OCR

### Plugin Installation

1. Download the latest release from GitHub
2. Extract `main.js`, `manifest.json`, and `styles.css` to your vault's `.obsidian/plugins/obsidian-qdrant/` folder
3. Enable the plugin in **Settings → Community plugins**

## Configuration

### Basic Setup

1. Open **Settings → Community plugins → Qdrant Semantic Search**
2. Configure your Qdrant connection:
   - **URL**: `http://localhost:6333` (local) or your Qdrant Cloud URL
   - **API Key**: Leave empty for local, add your key for cloud
3. Choose your embedding provider:
   - **Ollama**: Set model name (e.g., `nomic-embed-text`)
   - **OpenAI**: Add your API key and select model

### Advanced Settings

#### Indexing Configuration
- **Include Patterns**: File types to index (default: `*.md`, `*.txt`, `*.pdf`, `*.png`, `*.jpg`)
- **Exclude Patterns**: File patterns to skip
- **Max File Size**: Skip files larger than this (default: 10MB)
- **Ignored Folders**: Folders to skip (default: `.obsidian`, `.git`, `node_modules`)
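
To make the interaction between these four settings concrete, here is a minimal sketch of a file filter. The names `IndexFilter`, `globToRegExp`, and `shouldIndex` are hypothetical, and two details are assumptions rather than documented behavior: patterns are matched against the file name only, and 10MB is read as 10 * 1024 * 1024 bytes.

```typescript
// Hypothetical filter shape; the plugin's real settings keys may differ.
interface IndexFilter {
  includePatterns: string[];
  excludePatterns: string[];
  ignoredFolders: string[];
  maxFileSizeBytes: number;
}

// Defaults taken from the list above (10MB assumed to be 10 * 1024 * 1024 bytes).
const DEFAULT_FILTER: IndexFilter = {
  includePatterns: ["*.md", "*.txt", "*.pdf", "*.png", "*.jpg"],
  excludePatterns: [],
  ignoredFolders: [".obsidian", ".git", "node_modules"],
  maxFileSizeBytes: 10 * 1024 * 1024,
};

// Convert a simple glob (only "*" is a wildcard) into an anchored, case-insensitive RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*");
  return new RegExp("^" + escaped + "$", "i");
}

// Decide whether a vault-relative path (using "/" separators) should be indexed.
function shouldIndex(
  path: string,
  sizeBytes: number,
  filter: IndexFilter = DEFAULT_FILTER
): boolean {
  const segments = path.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const folders = segments.slice(0, -1);

  // Skip anything located inside an ignored folder, at any depth.
  if (folders.some((folder) => filter.ignoredFolders.indexOf(folder) !== -1)) return false;
  // Skip files above the size limit.
  if (sizeBytes > filter.maxFileSizeBytes) return false;

  const matchesAny = (patterns: string[]): boolean =>
    patterns.some((pattern) => globToRegExp(pattern).test(fileName));
  return matchesAny(filter.includePatterns) && !matchesAny(filter.excludePatterns);
}
```

With these defaults, `shouldIndex("notes/todo.md", 1024)` passes the filter, while a file under `.obsidian/` or an 11 MiB PDF is skipped.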

#### Chunking Settings
- **Target Tokens**: Ideal chunk size (default: 500)
- **Overlap Tokens**: Overlap between chunks (default: 100)
- **Max Tokens**: Hard limit per chunk (default: 800)
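
The following sketch shows one way the three values could work together in heading-first chunking with a size-based fallback. It is not the plugin's actual chunker: the function names are hypothetical, tokens are approximated by whitespace-separated words, and heading-like lines inside fenced code blocks are not treated specially.

```typescript
// Hypothetical option names mirroring the settings above.
interface ChunkOptions {
  targetTokens: number;  // ideal chunk size
  overlapTokens: number; // tokens shared by consecutive chunks
  maxTokens: number;     // hard limit per chunk
}

const DEFAULT_CHUNKING: ChunkOptions = { targetTokens: 500, overlapTokens: 100, maxTokens: 800 };

// Assumption: one whitespace-separated word counts as one token.
function toWords(text: string): string[] {
  return text.split(/\s+/).filter((word) => word.length > 0);
}

// Split markdown into sections, starting a new section at each heading line.
function splitOnHeadings(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections.filter((section) => section.trim().length > 0);
}

// Fallback for oversized sections: a sliding window of words with overlap.
// Line breaks inside the section are collapsed to single spaces.
function splitBySize(text: string, opts: ChunkOptions): string[] {
  const words = toWords(text);
  const size = Math.min(opts.targetTokens, opts.maxTokens);
  const step = Math.max(1, size - opts.overlapTokens);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // the last window reached the end
  }
  return chunks;
}

// Keep a section whole when it fits under maxTokens, otherwise split it by size.
function chunkMarkdown(markdown: string, opts: ChunkOptions = DEFAULT_CHUNKING): string[] {
  const chunks: string[] = [];
  for (const section of splitOnHeadings(markdown)) {
    if (toWords(section).length <= opts.maxTokens) chunks.push(section);
    else chunks.push(...splitBySize(section, opts));
  }
  return chunks;
}
```

In this sketch, a section under the hard limit is kept whole even if it exceeds the target, which preserves heading context. Only sections above the limit are cut into target-sized windows, each starting `targetTokens - overlapTokens` words after the previous one.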

#### Graph Visualization
- **Enable Graph View**: Show document relationships
- **Similarity Threshold**: Minimum similarity for edges (default: 0.7)
- **Max Nodes**: Maximum nodes to display (default: 100)
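
As an illustration of how a similarity threshold and a node cap might turn embeddings into graph edges, here is a small sketch. It assumes cosine similarity and one vector per document (for example, an average of its chunk vectors). Both are assumptions, as are the names `DocVector`, `Edge`, and `buildEdges`.

```typescript
// Hypothetical types; one vector per document is an assumption.
interface DocVector {
  id: string;
  vector: number[];
}

interface Edge {
  source: string;
  target: string;
  similarity: number;
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Compare every pair among the first maxNodes documents and keep edges at or above the threshold.
function buildEdges(docs: DocVector[], threshold = 0.7, maxNodes = 100): Edge[] {
  const nodes = docs.slice(0, maxNodes); // naive cap: keep the first maxNodes documents
  const edges: Edge[] = [];
  nodes.forEach((a, i) => {
    nodes.slice(i + 1).forEach((b) => {
      const similarity = cosineSimilarity(a.vector, b.vector);
      if (similarity >= threshold) {
        edges.push({ source: a.id, target: b.id, similarity });
      }
    });
  });
  return edges;
}
```

The pairwise comparison grows quadratically with the node count (4,950 pairs at 100 nodes), which is one reason a node cap is useful.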

## Usage

### Commands

- **Semantic search**: Open the search modal
- **Index current file**: Index the currently open file
- **Full reindex vault**: Reindex all files
- **Clear index**: Remove all indexed data
- **Open graph view**: Show document relationships (when implemented)

### Search Interface

1. Use **Ctrl+P** (or **Cmd+P** on Mac) to open the Command Palette
2. Type "Semantic search" and press Enter
3. Enter your search query
4. Browse results with keyboard navigation:
   - **Arrow keys**: Navigate results
   - **Enter**: Open selected result
   - **Escape**: Close search

### Status Bar

The plugin shows indexing progress in the status bar:
- **Ready**: System is ready
- **Indexing X%**: Shows progress during full reindex
- **Error**: Click to see error details

## Architecture

### Components

- **Extractors**: Parse different file types (markdown, code, PDFs, images)
- **Chunkers**: Split text into semantic chunks
- **Embedding Providers**: Generate vector embeddings
- **Qdrant Client**: Store and search vectors
- **Indexing Queue**: Manage background indexing
- **Search UI**: Provide search interface

### Data Flow

1. **File Change** → File Watcher → Indexing Queue
2. **Extract** → Chunk → Embed → Store in Qdrant
3. **Search Query** → Embed → Search Qdrant → Display Results
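
Steps 2 and 3 can be sketched as two small functions over abstract interfaces. Everything named here (`Point`, `Embedder`, `VectorStore`, `buildPoints`, `indexFile`, `semanticSearch`) is hypothetical and only illustrates the flow. The plugin's real classes may be organized differently.

```typescript
// Hypothetical interfaces; they only illustrate the flow described above.
interface Point {
  id: string;
  vector: number[];
  payload: { path: string; chunkIndex: number; text: string };
}

interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}

interface VectorStore {
  upsert(points: Point[]): Promise<void>;
  search(vector: number[], limit: number): Promise<Point[]>;
}

// Pair each chunk with its vector. The "path#index" id is a readable placeholder:
// Qdrant itself accepts only unsigned integers or UUIDs as point IDs, so a real
// implementation would derive one of those from this key.
function buildPoints(path: string, chunks: string[], vectors: number[][]): Point[] {
  if (chunks.length !== vectors.length) {
    throw new Error("Expected exactly one vector per chunk");
  }
  return chunks.map((text, chunkIndex) => ({
    id: path + "#" + chunkIndex,
    vector: vectors[chunkIndex] ?? [],
    payload: { path, chunkIndex, text },
  }));
}

// Step 2: Extract (done by the caller) -> Chunk -> Embed -> Store.
async function indexFile(
  path: string,
  text: string,
  chunk: (text: string) => string[],
  embedder: Embedder,
  store: VectorStore
): Promise<number> {
  const chunks = chunk(text);
  if (chunks.length === 0) return 0;
  const vectors = await embedder.embed(chunks);
  await store.upsert(buildPoints(path, chunks, vectors));
  return chunks.length;
}

// Step 3: Search Query -> Embed -> Search the store.
async function semanticSearch(
  query: string,
  embedder: Embedder,
  store: VectorStore,
  limit = 10
): Promise<Point[]> {
  const [queryVector] = await embedder.embed([query]);
  if (queryVector === undefined) throw new Error("Embedder returned no vector");
  return store.search(queryVector, limit);
}
```

The query must be embedded with the same model that produced the stored vectors. Otherwise the query vector and the stored vectors are not comparable.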

### Collection Schema

Each vault gets a collection named `vault_<sanitized_name>_<model>`. Points contain:
- **Vector**: Embedding from your chosen model
- **Payload**: Rich metadata (path, title, tags, chunk info, etc.)
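
A rough sketch of how such a name and payload could be built follows. The sanitization rule (lowercase, with non-alphanumeric runs replaced by `_`) and the payload field names are assumptions for illustration. The README does not specify either.

```typescript
// Hypothetical payload shape; the field names are illustrative only.
interface ChunkPayload {
  path: string;
  title: string;
  tags: string[];
  chunkIndex: number;
  chunkCount: number;
}

// Assumed sanitization: lowercase, collapse non-alphanumeric runs to "_", trim "_" at both ends.
function sanitize(part: string): string {
  return part
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Build a collection name following the vault_<sanitized_name>_<model> pattern.
function collectionName(vaultName: string, model: string): string {
  return "vault_" + sanitize(vaultName) + "_" + sanitize(model);
}

// Example payload for the second of five chunks of a note.
const examplePayload: ChunkPayload = {
  path: "projects/plan.md",
  title: "Plan",
  tags: ["project"],
  chunkIndex: 1,
  chunkCount: 5,
};
```

Under these assumptions, `collectionName("My Notes", "nomic-embed-text")` yields `vault_my_notes_nomic_embed_text`. Including the model in the name presumably keeps vectors from different models apart, since embedding models can differ in vector size and a Qdrant collection is created with a fixed one.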

## Troubleshooting

### Common Issues

#### "Indexing system not ready"
- Check Qdrant connection in settings
- Verify embedding provider configuration
- Check console for error messages

#### "No results found"
- Ensure files are indexed (check status bar)
- Try a full reindex
- Verify your search query isn't too specific

#### "Ollama connection failed"
- Ensure Ollama is running: `ollama serve`
- Check that the model is installed: `ollama list`
- Verify URL in settings (default: `http://localhost:11434`)
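
The checks above can also be scripted. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists installed models, and reports which step failed. The helper names (`hasModel`, `checkOllama`, `FetchLike`) are hypothetical, and the fetch function is passed in so the sketch does not depend on a particular runtime.

```typescript
// Relevant part of the response from Ollama's GET /api/tags endpoint.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// Minimal response and fetch shapes so the sketch does not depend on DOM typings.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<HttpResponseLike>;

// Installed model names carry a tag suffix such as ":latest", so compare with and without it.
function hasModel(tags: OllamaTagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model || m.name.split(":")[0] === model);
}

// Return "ok" or a human-readable hint about what to check next.
async function checkOllama(
  fetchFn: FetchLike,
  model: string,
  baseUrl = "http://localhost:11434"
): Promise<string> {
  const url = baseUrl.replace(/\/+$/, "") + "/api/tags";
  const response = await fetchFn(url).catch(() => undefined);
  if (response === undefined) {
    return "Cannot reach Ollama at " + baseUrl + '; is "ollama serve" running?';
  }
  if (!response.ok) {
    return "Ollama responded with HTTP " + response.status;
  }
  const tags = (await response.json()) as OllamaTagsResponse;
  return hasModel(tags, model)
    ? "ok"
    : 'Model "' + model + '" is not installed; try "ollama pull ' + model + '"';
}
```

In an environment with a global `fetch`, calling `checkOllama(fetch, "nomic-embed-text")` resolves to `"ok"` when the server is reachable and the model is installed.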

#### "OpenAI connection failed"
- Verify API key is correct
- Check that you have credits or quota remaining
- Ensure model name is valid

### Performance Tips

1. **Batch Size**: Increase for faster indexing if you have memory to spare
2. **Concurrency**: Higher values speed up processing but may overwhelm Qdrant or the embedding service
3. **File Filters**: Exclude unnecessary files to speed up indexing
4. **Chunk Size**: Larger chunks mean fewer vectors but less precise search results

### Debugging

Open the developer console (**Ctrl+Shift+I**, or **Cmd+Option+I** on Mac) to see detailed logs:
- Indexing progress and errors
- Search query processing
- Qdrant API calls
- Embedding generation

## Development

### Building from Source

```bash
git clone <repository>
cd obsidian-qdrant
npm install
npm run dev    # Watch mode
npm run build  # Production build
```

### Project Structure

```
src/
├── types.ts             # TypeScript interfaces
├── settings.ts          # Settings and defaults
├── main.ts              # Plugin entry point
├── qdrant/              # Qdrant client and collection management
├── embeddings/          # Embedding providers (Ollama, OpenAI)
├── extractors/          # Content extractors
├── chunking/            # Text chunking logic
├── indexing/            # Indexing orchestration
├── search/              # Search UI components
├── graph/               # Graph visualization
└── ui/                  # Settings UI
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- [Qdrant](https://qdrant.tech/) for the vector database
- [Ollama](https://ollama.ai/) for local embeddings
- [OpenAI](https://openai.com/) for cloud embeddings
- [Text Extractor](https://github.com/scambier/obsidian-text-extractor) for PDF/image support
- [Obsidian](https://obsidian.md/) for the amazing note-taking platform

## Roadmap

- [ ] Graph visualization with D3.js
- [ ] Hybrid search (dense + sparse vectors)
- [ ] More embedding providers (Cohere, Mistral, etc.)
- [ ] Advanced filtering in search
- [ ] Search result ranking improvements
- [ ] Mobile support
- [ ] Plugin API for other plugins
- [ ] Export/import index data
- [ ] Search analytics and insights