Qdrant Semantic Search for Obsidian
A powerful Obsidian plugin that indexes your entire vault into Qdrant for semantic search. It uses Ollama or OpenAI for text embeddings, with support for PDF and image text extraction via the Text Extractor plugin.
Features
- Semantic Search: Find content by meaning, not just keywords
- Multiple Embedding Providers:
- Ollama (local, free) - default
- OpenAI (cloud, paid)
- Rich Content Support:
- Markdown files with frontmatter parsing
- Code files with syntax highlighting
- PDFs via Text Extractor plugin
- Images with OCR via Text Extractor plugin
- Hybrid Chunking: Smart text splitting on headings with size-based fallback
- Real-time Indexing: Automatic indexing of file changes
- Graph Visualization: View document relationships (planned)
- Comprehensive Settings: Full control over indexing and search behavior
Installation
Prerequisites
- Qdrant: You need a Qdrant instance running:
  - Local: `docker run -p 6333:6333 qdrant/qdrant`
  - Cloud: Sign up at Qdrant Cloud
- Ollama (recommended for local embeddings):

  ```bash
  # Install Ollama
  curl -fsSL https://ollama.ai/install.sh | sh

  # Pull an embedding model
  ollama pull nomic-embed-text
  ```

- Text Extractor Plugin (optional, for PDF/image support):
- Install from Community Plugins
- Enables PDF text extraction and image OCR
Plugin Installation
- Download the latest release from GitHub
- Extract `main.js`, `manifest.json`, and `styles.css` to your vault's `.obsidian/plugins/obsidian-qdrant/` folder
- Enable the plugin in Settings → Community plugins
Configuration
Basic Setup
- Open Settings → Community plugins → Qdrant Semantic Search
- Configure your Qdrant connection:
  - URL: `http://localhost:6333` (local) or your Qdrant Cloud URL
  - API Key: Leave empty for local, add your key for cloud
- Choose your embedding provider:
  - Ollama: Set model name (e.g., `nomic-embed-text`)
  - OpenAI: Add your API key and select model
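To verify the Qdrant URL and API key outside Obsidian, a quick probe of Qdrant's REST API can help. A minimal sketch, assuming Node 18+ for `fetch` (the helper names here are illustrative, not the plugin's; `GET /collections` and the `api-key` header are standard Qdrant REST conventions):

```typescript
// Probe a Qdrant instance by listing its collections.
// Cloud instances expect the key in an "api-key" header;
// local instances typically need no key at all.
function qdrantHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["api-key"] = apiKey; // omit entirely for local instances
  return headers;
}

async function testQdrantConnection(url: string, apiKey?: string): Promise<boolean> {
  try {
    const res = await fetch(`${url.replace(/\/$/, "")}/collections`, {
      headers: qdrantHeaders(apiKey),
    });
    return res.ok;
  } catch {
    return false; // network error: wrong URL or Qdrant not running
  }
}
```

With the Docker container from the prerequisites running, `await testQdrantConnection("http://localhost:6333")` should resolve to `true`.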
Advanced Settings
Indexing Configuration
- Include Patterns: File types to index (default: `*.md`, `*.txt`, `*.pdf`, `*.png`, `*.jpg`)
- Exclude Patterns: File patterns to skip
- Max File Size: Skip files larger than this (default: 10 MB)
- Ignored Folders: Folders to skip (default: `.obsidian`, `.git`, `node_modules`)
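The include/exclude patterns are simple globs matched against file names. As an illustrative sketch of how such patterns typically map onto paths (not the plugin's actual matcher):

```typescript
// Convert a simple glob like "*.md" into a RegExp.
// Handles only "*" wildcards; illustrative, not the plugin's real matcher.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex chars
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Exclude patterns win over include patterns; matching is on the basename.
function shouldIndex(path: string, include: string[], exclude: string[]): boolean {
  const name = path.split("/").pop() ?? path;
  if (exclude.some((g) => globToRegExp(g).test(name))) return false;
  return include.some((g) => globToRegExp(g).test(name));
}
```

So with the defaults, `notes/todo.md` is indexed while `assets/video.mp4` is skipped.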
Chunking Settings
- Target Tokens: Ideal chunk size (default: 500)
- Overlap Tokens: Overlap between chunks (default: 100)
- Max Tokens: Hard limit per chunk (default: 800)
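To make these three knobs concrete, here is an illustrative sketch of hybrid chunking, not the plugin's actual code: split on markdown headings first, then fall back to size-based splitting with overlap for oversized sections, approximating tokens as whitespace-separated words:

```typescript
// Rough token estimate: whitespace-separated words.
// A real implementation may use a proper tokenizer.
function approxTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Size-based fallback: fixed-size windows that overlap by `overlap` words.
function splitBySize(words: string[], target: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += target - overlap) {
    chunks.push(words.slice(start, start + target).join(" "));
    if (start + target >= words.length) break;
  }
  return chunks;
}

// Hybrid chunking: heading-based sections first, size-based fallback
// only when a section exceeds the hard token limit.
function hybridChunk(text: string, target = 500, overlap = 100, max = 800): string[] {
  const sections = text.split(/\n(?=#{1,6}\s)/); // split before markdown headings
  const chunks: string[] = [];
  for (const section of sections) {
    if (approxTokens(section) <= max) {
      chunks.push(section.trim());
    } else {
      chunks.push(...splitBySize(section.split(/\s+/).filter(Boolean), target, overlap));
    }
  }
  return chunks.filter((c) => c.length > 0);
}
```

The defaults mirror the settings above: sections under Max Tokens become one chunk each, and anything larger is windowed at Target Tokens with Overlap Tokens of shared context between neighbors.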
Graph Visualization
- Enable Graph View: Show document relationships
- Similarity Threshold: Minimum similarity for edges (default: 0.7)
- Max Nodes: Maximum nodes to display (default: 100)
Usage
Commands
- Semantic search: Open the search modal
- Index current file: Index the currently open file
- Full reindex vault: Reindex all files
- Clear index: Remove all indexed data
- Open graph view: Show document relationships (when implemented)
Search Interface
- Use Ctrl+P (or Cmd+P on Mac) to open Command Palette
- Type "Semantic search" and press Enter
- Enter your search query
- Browse results with keyboard navigation:
- Arrow keys: Navigate results
- Enter: Open selected result
- Escape: Close search
Status Bar
The plugin shows indexing progress in the status bar:
- Ready: System is ready
- Indexing X%: Shows progress during full reindex
- Error: Click to see error details
Architecture
Components
- Extractors: Parse different file types (markdown, code, PDFs, images)
- Chunkers: Split text into semantic chunks
- Embedding Providers: Generate vector embeddings
- Qdrant Client: Store and search vectors
- Indexing Queue: Manage background indexing
- Search UI: Provide search interface
Data Flow
- File Change → File Watcher → Indexing Queue
- Extract → Chunk → Embed → Store in Qdrant
- Search Query → Embed → Search Qdrant → Display Results
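The search leg of that flow is essentially one embedding call plus one Qdrant request. A sketch assuming Qdrant's standard `POST /collections/<name>/points/search` endpoint, with the `embed` callback standing in for whichever provider is configured:

```typescript
interface SearchHit {
  id: string | number;
  score: number;
  payload?: Record<string, unknown>;
}

// Qdrant search request body: query vector, result limit, and a flag
// asking Qdrant to return the stored payload with each hit.
function buildSearchBody(vector: number[], limit = 10): string {
  return JSON.stringify({ vector, limit, with_payload: true });
}

async function semanticSearch(
  qdrantUrl: string,
  collection: string,
  embed: (text: string) => Promise<number[]>, // provider-specific (Ollama/OpenAI)
  query: string,
  limit = 10,
): Promise<SearchHit[]> {
  const vector = await embed(query); // Search Query → Embed
  const res = await fetch(`${qdrantUrl}/collections/${collection}/points/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildSearchBody(vector, limit), // → Search Qdrant
  });
  if (!res.ok) throw new Error(`Qdrant search failed: ${res.status}`);
  const data = await res.json();
  return data.result as SearchHit[]; // → Display Results
}
```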
Collection Schema
Each vault gets a collection named `vault_<sanitized_name>_<model>`. Points contain:
- Vector: Embedding from your chosen model
- Payload: Rich metadata (path, title, tags, chunk info, etc.)
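As a sketch of how the `vault_<sanitized_name>_<model>` name can be derived (the exact sanitization rules are an assumption; the plugin's may differ):

```typescript
// Derive a Qdrant collection name from a vault and model name.
// Assumed sanitization: lowercase, non-alphanumeric runs collapsed to "_".
function collectionName(vaultName: string, model: string): string {
  const sanitize = (s: string) =>
    s.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
  return `vault_${sanitize(vaultName)}_${sanitize(model)}`;
}
```

Encoding the model in the name keeps vectors from different embedding models (which have different dimensions) in separate collections.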
Troubleshooting
Common Issues
"Indexing system not ready"
- Check Qdrant connection in settings
- Verify embedding provider configuration
- Check console for error messages
"No results found"
- Ensure files are indexed (check status bar)
- Try a full reindex
- Verify your search query isn't too specific
"Ollama connection failed"
- Ensure Ollama is running: `ollama serve`
- Check the model is installed: `ollama list`
- Verify the URL in settings (default: `http://localhost:11434`)
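To isolate whether the problem is Ollama or the plugin, you can hit the embedding endpoint directly. A probe sketch assuming Node 18+ for `fetch` (Ollama serves `POST /api/embeddings` with a `model` and `prompt`; the helper names are ours):

```typescript
// Build the request for Ollama's embedding endpoint.
// POST /api/embeddings with { model, prompt } returns { embedding: number[] }.
function embeddingRequest(baseUrl: string, model: string, prompt: string) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/api/embeddings`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt }),
    },
  };
}

// Returns the embedding dimension on success, throws otherwise.
async function probeOllama(baseUrl = "http://localhost:11434"): Promise<number> {
  const { url, init } = embeddingRequest(baseUrl, "nomic-embed-text", "hello");
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const { embedding } = await res.json();
  return embedding.length; // e.g. 768 for nomic-embed-text
}
```

If `probeOllama()` throws, the issue is on the Ollama side; if it returns a dimension but search still fails, look at the plugin's Qdrant configuration instead.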
"OpenAI connection failed"
- Verify API key is correct
- Check you have credits/quota
- Ensure model name is valid
Performance Tips
- Batch Size: Increase for faster indexing (if you have memory)
- Concurrency: Higher values for faster processing (but may overwhelm services)
- File Filters: Exclude unnecessary files to speed up indexing
- Chunk Size: Larger chunks = fewer vectors but less precise search
Debugging
Enable developer console (Ctrl+Shift+I) to see detailed logs:
- Indexing progress and errors
- Search query processing
- Qdrant API calls
- Embedding generation
Development
Building from Source
```bash
git clone <repository>
cd obsidian-qdrant
npm install

npm run dev    # Watch mode
npm run build  # Production build
```
Project Structure
```
src/
├── types.ts      # TypeScript interfaces
├── settings.ts   # Settings and defaults
├── main.ts       # Plugin entry point
├── qdrant/       # Qdrant client and collection management
├── embeddings/   # Embedding providers (Ollama, OpenAI)
├── extractors/   # Content extractors
├── chunking/     # Text chunking logic
├── indexing/     # Indexing orchestration
├── search/       # Search UI components
├── graph/        # Graph visualization
└── ui/           # Settings UI
```
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
MIT License - see LICENSE file for details.
Acknowledgments
- Qdrant for the vector database
- Ollama for local embeddings
- OpenAI for cloud embeddings
- Text Extractor for PDF/image support
- Obsidian for the amazing note-taking platform
Roadmap
- Graph visualization with D3.js
- Hybrid search (dense + sparse vectors)
- More embedding providers (Cohere, Mistral, etc.)
- Advanced filtering in search
- Search result ranking improvements
- Mobile support
- Plugin API for other plugins
- Export/import index data
- Search analytics and insights