Qdrant Semantic Search for Obsidian
A powerful Obsidian plugin that indexes your entire vault into Qdrant for semantic search. It uses Ollama or OpenAI for text embeddings, with support for PDF and image text extraction via the Text Extractor plugin.
Features
- Semantic Search: Find content by meaning, not just keywords
- Multiple Embedding Providers:
- Ollama (local, free) - default
- OpenAI (cloud, paid)
- Rich Content Support:
- Markdown files with frontmatter parsing
- Code files with syntax highlighting
- PDFs via Text Extractor plugin
- Images with OCR via Text Extractor plugin
- Hybrid Chunking: Smart text splitting on headings with size-based fallback
- Real-time Indexing: Automatic indexing of file changes
- Graph Visualization: View document relationships (planned)
- Comprehensive Settings: Full control over indexing and search behavior
Installation
Prerequisites
- Qdrant: You need a Qdrant instance running:
  - Local: `docker run -p 6333:6333 qdrant/qdrant`
  - Cloud: Sign up at Qdrant Cloud
- Ollama (recommended for local embeddings):

  ```bash
  # Install Ollama
  curl -fsSL https://ollama.ai/install.sh | sh

  # Pull an embedding model
  ollama pull nomic-embed-text
  ```

- Text Extractor Plugin (optional, for PDF/image support):
- Install from Community Plugins
- Enables PDF text extraction and image OCR
Plugin Installation
- Download the latest release from GitHub
- Extract `main.js`, `manifest.json`, and `styles.css` to your vault's `.obsidian/plugins/obsidian-qdrant/` folder
- Enable the plugin in Settings → Community plugins
Configuration
Basic Setup
- Open Settings → Community plugins → Qdrant Semantic Search
- Configure your Qdrant connection:
  - URL: `http://localhost:6333` (local) or your Qdrant Cloud URL
  - API Key: Leave empty for local, add your key for cloud
- Choose your embedding provider:
  - Ollama: Set model name (e.g., `nomic-embed-text`)
  - OpenAI: Add your API key and select model
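To verify the Qdrant URL and API key outside Obsidian, a quick probe of Qdrant's REST API can help. A minimal sketch, assuming Node 18+ for `fetch` (the helper names here are illustrative, not the plugin's; `GET /collections` and the `api-key` header are standard Qdrant REST conventions):

```typescript
// Probe a Qdrant instance by listing its collections.
// Cloud instances expect the key in an "api-key" header;
// local instances typically need no key at all.
function qdrantHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["api-key"] = apiKey; // omit entirely for local instances
  return headers;
}

async function testQdrantConnection(url: string, apiKey?: string): Promise<boolean> {
  try {
    const res = await fetch(`${url.replace(/\/$/, "")}/collections`, {
      headers: qdrantHeaders(apiKey),
    });
    return res.ok;
  } catch {
    return false; // network error: wrong URL or Qdrant not running
  }
}
```

With the Docker container from the prerequisites running, `await testQdrantConnection("http://localhost:6333")` should resolve to `true`.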
Advanced Settings
Indexing Configuration
- Include Patterns: File types to index (default: `*.md`, `*.txt`, `*.pdf`, `*.png`, `*.jpg`)
- Exclude Patterns: File patterns to skip
- Max File Size: Skip files larger than this (default: 10 MB)
- Ignored Folders: Folders to skip (default: `.obsidian`, `.git`, `node_modules`)
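The include/exclude patterns are simple globs matched against file names. As an illustrative sketch of how such patterns typically map onto paths (not the plugin's actual matcher):

```typescript
// Convert a simple glob like "*.md" into a RegExp.
// Handles only "*" wildcards; illustrative, not the plugin's real matcher.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex chars
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Exclude patterns win over include patterns; matching is on the basename.
function shouldIndex(path: string, include: string[], exclude: string[]): boolean {
  const name = path.split("/").pop() ?? path;
  if (exclude.some((g) => globToRegExp(g).test(name))) return false;
  return include.some((g) => globToRegExp(g).test(name));
}
```

So with the defaults, `notes/todo.md` is indexed while `assets/video.mp4` is skipped.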
Chunking Settings
- Target Tokens: Ideal chunk size (default: 500)
- Overlap Tokens: Overlap between chunks (default: 100)
- Max Tokens: Hard limit per chunk (default: 800)
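To make these three knobs concrete, here is an illustrative sketch of hybrid chunking, not the plugin's actual code: split on markdown headings first, then fall back to size-based splitting with overlap for oversized sections, approximating tokens as whitespace-separated words:

```typescript
// Rough token estimate: whitespace-separated words.
// A real implementation may use a proper tokenizer.
function approxTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Size-based fallback: fixed-size windows that overlap by `overlap` words.
function splitBySize(words: string[], target: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += target - overlap) {
    chunks.push(words.slice(start, start + target).join(" "));
    if (start + target >= words.length) break;
  }
  return chunks;
}

// Hybrid chunking: heading-based sections first, size-based fallback
// only when a section exceeds the hard token limit.
function hybridChunk(text: string, target = 500, overlap = 100, max = 800): string[] {
  const sections = text.split(/\n(?=#{1,6}\s)/); // split before markdown headings
  const chunks: string[] = [];
  for (const section of sections) {
    if (approxTokens(section) <= max) {
      chunks.push(section.trim());
    } else {
      chunks.push(...splitBySize(section.split(/\s+/).filter(Boolean), target, overlap));
    }
  }
  return chunks.filter((c) => c.length > 0);
}
```

The defaults mirror the settings above: sections under Max Tokens become one chunk each, and anything larger is windowed at Target Tokens with Overlap Tokens of shared context between neighbors.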
Graph Visualization
- Enable Graph View: Show document relationships
- Similarity Threshold: Minimum similarity for edges (default: 0.7)
- Max Nodes: Maximum nodes to display (default: 100)
Usage
Commands
- Semantic search: Open the search modal
- Index current file: Index the currently open file
- Full reindex vault: Reindex all files
- Clear index: Remove all indexed data
- Open graph view: Show document relationships (when implemented)
Search Interface
- Use Ctrl+P (or Cmd+P on Mac) to open Command Palette
- Type "Semantic search" and press Enter
- Enter your search query
- Browse results with keyboard navigation:
- Arrow keys: Navigate results
- Enter: Open selected result
- Escape: Close search
Status Bar
The plugin shows indexing progress in the status bar:
- Ready: System is ready
- Indexing X%: Shows progress during full reindex
- Error: Click to see error details
Architecture
Components
- Extractors: Parse different file types (markdown, code, PDFs, images)
- Chunkers: Split text into semantic chunks
- Embedding Providers: Generate vector embeddings
- Qdrant Client: Store and search vectors
- Indexing Queue: Manage background indexing
- Search UI: Provide search interface
Data Flow
- File Change → File Watcher → Indexing Queue
- Extract → Chunk → Embed → Store in Qdrant
- Search Query → Embed → Search Qdrant → Display Results
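The search leg of that flow is essentially one embedding call plus one Qdrant request. A sketch assuming Qdrant's standard `POST /collections/<name>/points/search` endpoint, with the `embed` callback standing in for whichever provider is configured:

```typescript
interface SearchHit {
  id: string | number;
  score: number;
  payload?: Record<string, unknown>;
}

// Qdrant search request body: query vector, result limit, and a flag
// asking Qdrant to return the stored payload with each hit.
function buildSearchBody(vector: number[], limit = 10): string {
  return JSON.stringify({ vector, limit, with_payload: true });
}

async function semanticSearch(
  qdrantUrl: string,
  collection: string,
  embed: (text: string) => Promise<number[]>, // provider-specific (Ollama/OpenAI)
  query: string,
  limit = 10,
): Promise<SearchHit[]> {
  const vector = await embed(query); // Search Query → Embed
  const res = await fetch(`${qdrantUrl}/collections/${collection}/points/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildSearchBody(vector, limit), // → Search Qdrant
  });
  if (!res.ok) throw new Error(`Qdrant search failed: ${res.status}`);
  const data = await res.json();
  return data.result as SearchHit[]; // → Display Results
}
```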
Collection Schema
Each vault gets a collection named `vault_<sanitized_name>_<model>`. Points contain:
- Vector: Embedding from your chosen model
- Payload: Rich metadata (path, title, tags, chunk info, etc.)
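As a sketch of how the `vault_<sanitized_name>_<model>` name can be derived (the exact sanitization rules are an assumption; the plugin's may differ):

```typescript
// Derive a Qdrant collection name from a vault and model name.
// Assumed sanitization: lowercase, non-alphanumeric runs collapsed to "_".
function collectionName(vaultName: string, model: string): string {
  const sanitize = (s: string) =>
    s.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
  return `vault_${sanitize(vaultName)}_${sanitize(model)}`;
}
```

Encoding the model in the name keeps vectors from different embedding models (which have different dimensions) in separate collections.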
Troubleshooting
Common Issues
"Indexing system not ready"
- Check Qdrant connection in settings
- Verify embedding provider configuration
- Check console for error messages
"No results found"
- Ensure files are indexed (check status bar)
- Try a full reindex
- Verify your search query isn't too specific
"Ollama connection failed"
- Ensure Ollama is running: `ollama serve`
- Check the model is installed: `ollama list`
- Verify the URL in settings (default: `http://localhost:11434`)
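To isolate whether the problem is Ollama or the plugin, you can hit the embedding endpoint directly. A probe sketch assuming Node 18+ for `fetch` (Ollama serves `POST /api/embeddings` with a `model` and `prompt`; the helper names are ours):

```typescript
// Build the request for Ollama's embedding endpoint.
// POST /api/embeddings with { model, prompt } returns { embedding: number[] }.
function embeddingRequest(baseUrl: string, model: string, prompt: string) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/api/embeddings`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt }),
    },
  };
}

// Returns the embedding dimension on success, throws otherwise.
async function probeOllama(baseUrl = "http://localhost:11434"): Promise<number> {
  const { url, init } = embeddingRequest(baseUrl, "nomic-embed-text", "hello");
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const { embedding } = await res.json();
  return embedding.length; // e.g. 768 for nomic-embed-text
}
```

If `probeOllama()` throws, the issue is on the Ollama side; if it returns a dimension but search still fails, look at the plugin's Qdrant configuration instead.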
"OpenAI connection failed"
- Verify API key is correct
- Check you have credits/quota
- Ensure model name is valid
Performance Tips
- Batch Size: Increase for faster indexing (if you have memory)
- Concurrency: Higher values for faster processing (but may overwhelm services)
- File Filters: Exclude unnecessary files to speed up indexing
- Chunk Size: Larger chunks = fewer vectors but less precise search
Debugging
Enable developer console (Ctrl+Shift+I) to see detailed logs:
- Indexing progress and errors
- Search query processing
- Qdrant API calls
- Embedding generation
Development
Building from Source
```bash
git clone <repository>
cd obsidian-qdrant
npm install

npm run dev    # Watch mode
npm run build  # Production build
```
Project Structure
```
src/
├── types.ts      # TypeScript interfaces
├── settings.ts   # Settings and defaults
├── main.ts       # Plugin entry point
├── qdrant/       # Qdrant client and collection management
├── embeddings/   # Embedding providers (Ollama, OpenAI)
├── extractors/   # Content extractors
├── chunking/     # Text chunking logic
├── indexing/     # Indexing orchestration
├── search/       # Search UI components
├── graph/        # Graph visualization
└── ui/           # Settings UI
```
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
MIT License - see LICENSE file for details.
Acknowledgments
- Qdrant for the vector database
- Ollama for local embeddings
- OpenAI for cloud embeddings
- Text Extractor for PDF/image support
- Obsidian for the amazing note-taking platform
Roadmap
- Graph visualization with D3.js
- Hybrid search (dense + sparse vectors)
- More embedding providers (Cohere, Mistral, etc.)
- Advanced filtering in search
- Search result ranking improvements
- Mobile support
- Plugin API for other plugins
- Export/import index data
- Search analytics and insights