obsidian-qdrant/README.md
Nicholai 38889c1d65 Initial implementation of Qdrant Semantic Search plugin
- Complete plugin architecture with modular design
- Qdrant client with HTTP integration using requestUrl
- Ollama and OpenAI embedding providers with batching
- Hybrid chunking (semantic + size-based fallback)
- Content extractors for markdown, code, PDFs, and images
- Real-time indexing with file watcher and queue
- Search modal with keyboard navigation
- Comprehensive settings UI with connection testing
- Graph visualization framework (basic implementation)
- Full TypeScript types and error handling
- Desktop-only plugin with status bar integration
- Complete documentation and setup guide

Features implemented:
- Semantic search with vector embeddings
- Multiple embedding providers (Ollama/OpenAI)
- Rich content extraction (markdown, code, PDFs, images)
- Smart chunking with heading-based splits
- Real-time file indexing with progress tracking
- Standalone search interface
- Comprehensive settings and configuration
- Graph view foundation for document relationships
- Full error handling and logging
- Complete documentation and troubleshooting guide

Ready for testing with Qdrant instance and embedding provider setup.
2025-10-23 08:48:29 -06:00

# Qdrant Semantic Search for Obsidian
An Obsidian plugin that indexes your entire vault into Qdrant for semantic search. It uses Ollama or OpenAI to generate text embeddings and can extract text from PDFs and images via the Text Extractor plugin.
## Features
- **Semantic Search**: Find content by meaning, not just keywords
- **Multiple Embedding Providers**:
  - Ollama (local, free) - default
  - OpenAI (cloud, paid)
- **Rich Content Support**:
  - Markdown files with frontmatter parsing
  - Code files with syntax highlighting
  - PDFs via Text Extractor plugin
  - Images with OCR via Text Extractor plugin
- **Hybrid Chunking**: Smart text splitting on headings with size-based fallback
- **Real-time Indexing**: Automatic indexing of file changes
- **Graph Visualization**: View document relationships (planned)
- **Comprehensive Settings**: Full control over indexing and search behavior
## Installation
### Prerequisites
1. **Qdrant**: You need a Qdrant instance running
   - Local: `docker run -p 6333:6333 qdrant/qdrant`
   - Cloud: Sign up at [Qdrant Cloud](https://cloud.qdrant.io/)
2. **Ollama** (recommended for local embeddings):
   ```bash
   # Install Ollama
   curl -fsSL https://ollama.ai/install.sh | sh
   # Pull an embedding model
   ollama pull nomic-embed-text
   ```
3. **Text Extractor Plugin** (optional, for PDF/image support):
   - Install from Community Plugins
   - Enables PDF text extraction and image OCR
### Plugin Installation
1. Download the latest release from GitHub
2. Extract `main.js`, `manifest.json`, and `styles.css` to your vault's `.obsidian/plugins/obsidian-qdrant/` folder
3. Enable the plugin in **Settings → Community plugins**
## Configuration
### Basic Setup
1. Open **Settings → Community plugins → Qdrant Semantic Search**
2. Configure your Qdrant connection:
   - **URL**: `http://localhost:6333` (local) or your Qdrant Cloud URL
   - **API Key**: Leave empty for local, add your key for cloud
3. Choose your embedding provider:
   - **Ollama**: Set model name (e.g., `nomic-embed-text`)
   - **OpenAI**: Add your API key and select model
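
A quick way to sanity-check the URL and API key is to list the collections on the Qdrant instance. The sketch below only builds that request. It assumes Qdrant's REST API (`GET /collections`, with the key sent in an `api-key` header); `QdrantSettings`, `HttpRequest`, and `buildCollectionsRequest` are hypothetical names used for illustration, not the plugin's actual code.

```typescript
// Hypothetical settings shape for illustration only.
interface QdrantSettings {
  url: string;    // e.g. "http://localhost:6333"
  apiKey: string; // empty for a local instance
}

interface HttpRequest {
  url: string;
  method: "GET";
  headers: Record<string, string>;
}

// Build a request that lists collections. A 200 response suggests that
// the URL is reachable and the key (if any) was accepted.
function buildCollectionsRequest(settings: QdrantSettings): HttpRequest {
  const base = settings.url.trim().replace(/\/+$/, ""); // drop trailing slashes
  const headers: Record<string, string> = {};
  if (settings.apiKey.trim() !== "") {
    headers["api-key"] = settings.apiKey.trim();
  }
  return { url: `${base}/collections`, method: "GET", headers };
}

const localRequest = buildCollectionsRequest({ url: "http://localhost:6333/", apiKey: "" });
console.log(localRequest.url); // http://localhost:6333/collections
```

The returned object could be handed to `fetch` or, inside Obsidian, to `requestUrl`; which one the plugin's own connection test uses internally is not shown here.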
### Advanced Settings
#### Indexing Configuration
- **Include Patterns**: File types to index (default: `*.md`, `*.txt`, `*.pdf`, `*.png`, `*.jpg`)
- **Exclude Patterns**: File patterns to skip
- **Max File Size**: Skip files larger than this (default: 10MB)
- **Ignored Folders**: Folders to skip (default: `.obsidian`, `.git`, `node_modules`)
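
To make the interaction of these settings concrete, here is a minimal sketch of a file filter. It is an assumption about how the rules could combine, not the plugin's implementation: `IndexFilter`, `globToRegExp`, and `shouldIndex` are hypothetical names, the glob support is limited to `*`, and 10MB is taken to mean 10 × 1024 × 1024 bytes.

```typescript
// Hypothetical settings shape; field names are illustrative.
interface IndexFilter {
  includePatterns: string[];
  excludePatterns: string[];
  ignoredFolders: string[];
  maxFileSizeBytes: number;
}

const defaultFilter: IndexFilter = {
  includePatterns: ["*.md", "*.txt", "*.pdf", "*.png", "*.jpg"],
  excludePatterns: [],
  ignoredFolders: [".obsidian", ".git", "node_modules"],
  maxFileSizeBytes: 10 * 1024 * 1024, // assumed meaning of "10MB"
};

// Convert a simple glob ("*" matches any run of characters) into an
// anchored, case-insensitive regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Decide whether a vault-relative path (with "/" separators) should be indexed.
function shouldIndex(path: string, sizeBytes: number, filter: IndexFilter): boolean {
  const segments = path.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const folders = segments.slice(0, -1);

  if (folders.some((folder) => filter.ignoredFolders.includes(folder))) return false;
  if (sizeBytes > filter.maxFileSizeBytes) return false;
  // Exclude patterns are matched against the full path, include patterns against the file name.
  if (filter.excludePatterns.some((p) => globToRegExp(p).test(path))) return false;
  return filter.includePatterns.some((p) => globToRegExp(p).test(fileName));
}

console.log(shouldIndex("notes/idea.md", 2048, defaultFilter)); // true
console.log(shouldIndex(".obsidian/workspace.md", 2048, defaultFilter)); // false
```

Checking ignored folders and file size before any pattern matching keeps the cheap rejections first, which matters when a full reindex walks every file in the vault.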
#### Chunking Settings
- **Target Tokens**: Ideal chunk size (default: 500)
- **Overlap Tokens**: Overlap between chunks (default: 100)
- **Max Tokens**: Hard limit per chunk (default: 800)
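
The three values above can be read as follows: text is first split on headings, and any section longer than **Max Tokens** falls back to overlapping windows of **Target Tokens**. The sketch below illustrates that reading under two loud assumptions: whitespace-separated words stand in for real tokens, and all names (`ChunkOptions`, `chunkMarkdown`, and the helpers) are hypothetical rather than taken from the plugin.

```typescript
interface ChunkOptions {
  targetTokens: number;  // ideal chunk size
  overlapTokens: number; // tokens shared between neighboring windows
  maxTokens: number;     // sections above this size are windowed
}

const defaultChunkOptions: ChunkOptions = { targetTokens: 500, overlapTokens: 100, maxTokens: 800 };

// Assumption: whitespace-separated words approximate model tokens.
function roughTokens(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Start a new section at every markdown heading line.
function splitOnHeadings(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections.filter((s) => s.trim().length > 0);
}

// Size-based fallback: windows of targetTokens, each overlapping the
// previous one by overlapTokens. Line breaks inside the section are lost.
function slidingWindows(tokens: string[], opts: ChunkOptions): string[] {
  const step = Math.max(1, opts.targetTokens - opts.overlapTokens);
  const windows: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    windows.push(tokens.slice(start, start + opts.targetTokens).join(" "));
    if (start + opts.targetTokens >= tokens.length) break; // last window reached the end
  }
  return windows;
}

function chunkMarkdown(markdown: string, opts: ChunkOptions = defaultChunkOptions): string[] {
  const chunks: string[] = [];
  for (const section of splitOnHeadings(markdown)) {
    const tokens = roughTokens(section);
    if (tokens.length <= opts.maxTokens) {
      chunks.push(section.trim()); // small enough: keep the section whole
    } else {
      chunks.push(...slidingWindows(tokens, opts));
    }
  }
  return chunks;
}
```

With the defaults, a 2,000-word section would be cut into windows of 500 words that advance 400 words at a time, so each window repeats the last 100 words of the previous one.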
#### Graph Visualization
- **Enable Graph View**: Show document relationships
- **Similarity Threshold**: Minimum similarity for edges (default: 0.7)
- **Max Nodes**: Maximum nodes to display (default: 100)
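
Since the graph view is still only a framework, the following is merely one plausible interpretation of these settings: documents become nodes, and an edge is drawn when the cosine similarity of two document vectors reaches the threshold. `DocVector`, `GraphEdge`, `cosineSimilarity`, and `buildEdges` are hypothetical names, and how the plugin would derive one vector per document is not specified here.

```typescript
interface DocVector {
  path: string;
  vector: number[];
}

interface GraphEdge {
  source: string;
  target: string;
  similarity: number;
}

// Cosine similarity of two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Keep at most maxNodes documents and connect every pair whose
// similarity is at or above the threshold.
function buildEdges(docs: DocVector[], threshold = 0.7, maxNodes = 100): GraphEdge[] {
  const nodes = docs.slice(0, maxNodes);
  const edges: GraphEdge[] = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      if (!a || !b) continue;
      const similarity = cosineSimilarity(a.vector, b.vector);
      if (similarity >= threshold) {
        edges.push({ source: a.path, target: b.path, similarity });
      }
    }
  }
  return edges;
}
```

The pairwise comparison grows quadratically with the number of nodes, which is one reason a cap such as **Max Nodes** is useful.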
## Usage
### Commands
- **Semantic search**: Open the search modal
- **Index current file**: Index the currently open file
- **Full reindex vault**: Reindex all files
- **Clear index**: Remove all indexed data
- **Open graph view**: Show document relationships (when implemented)
### Search Interface
1. Press **Ctrl+P** (or **Cmd+P** on Mac) to open the Command Palette
2. Type "Semantic search" and press Enter
3. Enter your search query
4. Browse results with keyboard navigation:
   - **Arrow keys**: Navigate results
   - **Enter**: Open selected result
   - **Escape**: Close search
### Status Bar
The plugin shows indexing progress in the status bar:
- **Ready**: System is ready
- **Indexing X%**: Shows progress during full reindex
- **Error**: Click to see error details
## Architecture
### Components
- **Extractors**: Parse different file types (markdown, code, PDFs, images)
- **Chunkers**: Split text into semantic chunks
- **Embedding Providers**: Generate vector embeddings
- **Qdrant Client**: Store and search vectors
- **Indexing Queue**: Manage background indexing
- **Search UI**: Provide search interface
### Data Flow
1. **File Change** → File Watcher → Indexing Queue
2. **Extract** → Chunk → Embed → Store in Qdrant
3. **Search Query** → Embed → Search Qdrant → Display Results
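
The flow above can be sketched end to end with the stages passed in as functions. This is an illustration, not the plugin's code: `IndexedPoint`, `Stages`, `MemoryStore`, `indexFile`, and `semanticSearch` are hypothetical names, and the in-memory store (ranking by dot product) merely stands in for Qdrant.

```typescript
interface IndexedPoint {
  id: string;
  vector: number[];
  payload: { path: string; chunkIndex: number; text: string };
}

// Each stage is injected so that extractors, chunkers, and providers can be swapped.
interface Stages {
  extract: (path: string, raw: string) => string;
  chunk: (text: string) => string[];
  embed: (texts: string[]) => Promise<number[][]>;
}

// Minimal in-memory stand-in for Qdrant.
class MemoryStore {
  private points: IndexedPoint[] = [];

  // Replace points that share an ID, then append the new ones.
  async upsert(points: IndexedPoint[]): Promise<void> {
    const ids = new Set(points.map((p) => p.id));
    this.points = this.points.filter((p) => !ids.has(p.id)).concat(points);
  }

  // Rank stored points by dot product with the query vector.
  async search(query: number[], limit: number): Promise<IndexedPoint[]> {
    const score = (v: number[]): number =>
      v.reduce((sum, x, i) => sum + x * (query[i] ?? 0), 0);
    return [...this.points]
      .sort((a, b) => score(b.vector) - score(a.vector))
      .slice(0, limit);
  }
}

// Steps 1 and 2: extract, chunk, embed, store. Returns the number of points written.
async function indexFile(path: string, raw: string, stages: Stages, store: MemoryStore): Promise<number> {
  const chunks = stages.chunk(stages.extract(path, raw));
  const vectors = await stages.embed(chunks);
  const points: IndexedPoint[] = chunks.map((text, i) => ({
    id: `${path}#${i}`,
    vector: vectors[i] ?? [],
    payload: { path, chunkIndex: i, text },
  }));
  await store.upsert(points);
  return points.length;
}

// Step 3: embed the query with the same provider, then search the store.
async function semanticSearch(query: string, stages: Stages, store: MemoryStore, limit = 5): Promise<IndexedPoint[]> {
  const [vector] = await stages.embed([query]);
  return store.search(vector ?? [], limit);
}
```

Note that the query must be embedded with the same model that produced the stored vectors; otherwise the similarity scores are not meaningful.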
### Collection Schema
Each vault gets a collection named `vault_<sanitized_name>_<model>`. Points contain:
- **Vector**: Embedding from your chosen model
- **Payload**: Rich metadata (path, title, tags, chunk info, etc.)
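
As an illustration of the naming scheme and point layout, the sketch below derives a collection name and wraps points in the body that Qdrant's upsert endpoint (`PUT /collections/<name>/points`) expects. The sanitization rule and the payload field names are assumptions, since the exact rules are not spelled out here; `sanitizeNamePart`, `collectionName`, `ChunkPayload`, `QdrantPoint`, and `upsertBody` are hypothetical names.

```typescript
// Assumed sanitization rule: lowercase, collapse runs of non-alphanumeric
// characters into "_", and trim leading/trailing underscores.
function sanitizeNamePart(part: string): string {
  return part
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

function collectionName(vaultName: string, model: string): string {
  return `vault_${sanitizeNamePart(vaultName)}_${sanitizeNamePart(model)}`;
}

// Illustrative payload fields based on the list above; the real payload may differ.
interface ChunkPayload {
  path: string;
  title: string;
  tags: string[];
  chunkIndex: number;
  text: string;
}

interface QdrantPoint {
  id: string; // Qdrant accepts UUID strings or unsigned integers as point IDs
  vector: number[];
  payload: ChunkPayload;
}

// Wrap points in the JSON body shape used by Qdrant's upsert-points call.
function upsertBody(points: QdrantPoint[]): { points: QdrantPoint[] } {
  return { points };
}

console.log(collectionName("My Notes", "nomic-embed-text")); // vault_my_notes_nomic_embed_text
```

Including the model in the collection name keeps vectors from different embedding models apart, which matters because models generally differ in vector size and are not comparable with each other.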
## Troubleshooting
### Common Issues
#### "Indexing system not ready"
- Check Qdrant connection in settings
- Verify embedding provider configuration
- Check console for error messages
#### "No results found"
- Ensure files are indexed (check status bar)
- Try a full reindex
- Verify your search query isn't too specific
#### "Ollama connection failed"
- Ensure Ollama is running: `ollama serve`
- Check model is installed: `ollama list`
- Verify URL in settings (default: `http://localhost:11434`)
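
The model check can also be done over HTTP: Ollama's `GET /api/tags` endpoint lists locally installed models. The sketch below only inspects such a response. The partial response shape (`{ models: [{ name }] }`) is an assumption about Ollama's API as commonly documented, and `hasModel` is a hypothetical helper.

```typescript
// Assumed (partial) shape of the JSON returned by GET <ollama-url>/api/tags.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// Return true if the requested model is installed. Ollama reports names with
// a tag suffix (e.g. "nomic-embed-text:latest"), so a bare name matches any tag.
function hasModel(tags: OllamaTagsResponse, wanted: string): boolean {
  return tags.models.some((m) =>
    wanted.includes(":") ? m.name === wanted : m.name.split(":")[0] === wanted
  );
}

const sampleTags: OllamaTagsResponse = {
  models: [{ name: "nomic-embed-text:latest" }, { name: "llama3:8b" }],
};
console.log(hasModel(sampleTags, "nomic-embed-text")); // true
console.log(hasModel(sampleTags, "mxbai-embed-large")); // false
```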
#### "OpenAI connection failed"
- Verify API key is correct
- Check you have credits/quota
- Ensure model name is valid
### Performance Tips
1. **Batch Size**: Increase for faster indexing if memory allows
2. **Concurrency**: Higher values speed up processing but may overwhelm Qdrant or the embedding service
3. **File Filters**: Exclude unnecessary files to speed up indexing
4. **Chunk Size**: Larger chunks mean fewer vectors but less precise search results
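
To show how batch size and concurrency interact, here is a minimal sketch of batched processing with a concurrency cap. It is not the plugin's indexing queue; `toBatches` and `mapWithConcurrency` are hypothetical helpers, and the worker stands in for an embedding or upsert call.

```typescript
// Split items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const size = Math.max(1, Math.floor(batchSize));
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run worker over all items with at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.max(1, Math.min(Math.floor(limit), items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

A larger batch size means fewer, heavier requests to the embedding provider, while a higher concurrency limit means more requests at once; raising both together is what tends to overwhelm a local Ollama or Qdrant instance.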
### Debugging
Open the developer console (**Ctrl+Shift+I**, or **Cmd+Option+I** on Mac) to see detailed logs:
- Indexing progress and errors
- Search query processing
- Qdrant API calls
- Embedding generation
## Development
### Building from Source
```bash
git clone <repository>
cd obsidian-qdrant
npm install
npm run dev # Watch mode
npm run build # Production build
```
### Project Structure
```
src/
├── types.ts # TypeScript interfaces
├── settings.ts # Settings and defaults
├── main.ts # Plugin entry point
├── qdrant/ # Qdrant client and collection management
├── embeddings/ # Embedding providers (Ollama, OpenAI)
├── extractors/ # Content extractors
├── chunking/ # Text chunking logic
├── indexing/ # Indexing orchestration
├── search/ # Search UI components
├── graph/ # Graph visualization
└── ui/ # Settings UI
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
MIT License - see the LICENSE file for details.
## Acknowledgments
- [Qdrant](https://qdrant.tech/) for the vector database
- [Ollama](https://ollama.ai/) for local embeddings
- [OpenAI](https://openai.com/) for cloud embeddings
- [Text Extractor](https://github.com/scambier/obsidian-text-extractor) for PDF/image support
- [Obsidian](https://obsidian.md/) for the amazing note-taking platform
## Roadmap
- [ ] Graph visualization with D3.js
- [ ] Hybrid search (dense + sparse vectors)
- [ ] More embedding providers (Cohere, Mistral, etc.)
- [ ] Advanced filtering in search
- [ ] Search result ranking improvements
- [ ] Mobile support
- [ ] Plugin API for other plugins
- [ ] Export/import index data
- [ ] Search analytics and insights