This commit resolves several critical issues that prevented the plugin from working correctly with Qdrant and adds essential metadata to indexed chunks.

**Settings & Configuration:**
- Fix settings initialization using deep merge instead of shallow Object.assign
  - Prevents nested settings from being lost during load
  - Ensures all default values are properly preserved
- Add orchestrator reinitialization when settings are saved
  - Ensures QdrantClient and embedding providers use updated settings
  - Fixes issue where plugin used localhost instead of saved HTTPS URL

**UUID Generation:**
- Fix generateDeterministicUUID() creating invalid UUIDs
  - Was generating 35-character UUIDs instead of proper 36-character format
  - Now correctly creates valid UUID v4 format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
  - Properly generates segment 5 (12 hex chars) from combined hash data
  - Fixes segment 4 to start with 8/9/a/b per UUID spec
  - Resolves Qdrant API rejections: "value X is not a valid point ID"

**Chunk Metadata:**
- Add chunk_text field to ChunkMetadata type
  - Stores the actual text content of each chunk in Qdrant payload
  - Essential for displaying search results and content preview
- Add model name to chunk metadata
  - Populates model field with embedding provider name (e.g., "nomic-embed-text")
  - Enables tracking which model generated each embedding
  - Supports future multi-model collections

**Debug Logging:**
- Add logging for settings loading and URL tracking
- Add logging for QdrantClient initialization
- Add logging for orchestrator creation with settings

**Documentation:**
- Add CLAUDE.md with comprehensive architecture documentation
  - Build commands and development workflow
  - Core components and data processing pipeline
  - Important implementation details and debugging guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is an Obsidian plugin that provides semantic search over vault contents by indexing documents into a Qdrant vector database using Ollama (local) or OpenAI (cloud) embeddings. It supports markdown, text files, and, through the Text Extractor plugin, PDFs and images (via OCR).
Build & Development Commands
# Install dependencies
npm install
# Development mode (watch mode with hot reload)
npm run dev
# Production build
npm run build
# Type checking (run before committing)
npm run build
Development Workflow
- The plugin uses esbuild for bundling (configured in esbuild.config.mjs)
- Entry point is main.ts, which bundles all src/**/*.ts files into main.js
- To test the plugin, symlink or copy main.js, manifest.json, and styles.css to your vault's .obsidian/plugins/obsidian-qdrant/ folder
- TypeScript strict mode is enabled; all types must be properly defined
Architecture
Core Components (Event-Driven Pipeline)
The plugin follows an event-driven architecture with these key components:
- IndexingOrchestrator (src/indexing/orchestrator.ts): Central coordinator
  - Initializes all subsystems and manages their lifecycle
  - Coordinates the indexing pipeline from file changes to vector storage
  - Key initialization sequence: load manifest → get embedding dimension → initialize Qdrant collection → start file watching
- FileWatcher (src/indexing/fileWatcher.ts): Monitors vault changes
  - Listens to Obsidian vault events (create, modify, delete)
  - Filters files based on settings (include/exclude patterns, ignored folders, max file size)
  - Adds qualifying files to IndexingQueue
- IndexingQueue (src/indexing/indexQueue.ts): Async job processor
  - Processes files through the pipeline: Extract → Chunk → Embed → Upsert
  - Manages concurrency and batch processing
  - Tracks progress and error handling
- FileManifest (src/indexing/manifest.ts): Index state tracker
  - Stores metadata about indexed files (mtime, size, hash, chunk count)
  - Determines which files need re-indexing (changed since last index)
  - Identifies orphaned files (in index but deleted from vault)
  - Persisted to vault's .obsidian/plugins/obsidian-qdrant/ folder
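The FileManifest checks described above (what needs re-indexing, what is orphaned) can be sketched as follows. This is a minimal illustration, not the plugin's code: the `SketchManifestEntry` and `SketchFileStat` shapes and both function names are hypothetical, and the sketch compares only mtime and size, whereas the real manifest may also compare the stored hash.

```typescript
// Hypothetical entry shape, based on the fields listed for FileManifest.
interface SketchManifestEntry {
  mtime: number;
  size: number;
  hash: string;
  chunkCount: number;
}

// Hypothetical view of a vault file's current state.
interface SketchFileStat {
  path: string;
  mtime: number;
  size: number;
}

// A file needs re-indexing if it was never indexed or has changed since.
function needsReindex(
  manifest: Map<string, SketchManifestEntry>,
  file: SketchFileStat
): boolean {
  const entry = manifest.get(file.path);
  if (entry === undefined) {
    return true;
  }
  return entry.mtime !== file.mtime || entry.size !== file.size;
}

// Orphans are paths present in the manifest but no longer in the vault.
function findOrphans(
  manifest: Map<string, SketchManifestEntry>,
  vaultPaths: string[]
): string[] {
  const live = new Set(vaultPaths);
  return Array.from(manifest.keys()).filter((path) => !live.has(path));
}
```

Comparing mtime and size first is cheap; a content hash is only worth computing when those two disagree or cannot be trusted.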
Data Processing Pipeline
Extract → Chunk → Embed → Store
- Extractors (src/extractors/): Extract content from different file types
  - MarkdownExtractor: Parses frontmatter, headings, links, tags
  - TextExtractor: Handles plain text and code files
  - TextExtractorPlugin: Integrates with Text Extractor plugin for PDFs/images
  - Each returns ExtractedContent with text and rich metadata
- Chunker (src/chunking/chunker.ts): Split content into semantic chunks
  - HybridChunker: Splits on markdown headings first, then by token count
  - Uses simple word-based tokenizer (GPT-style would require large dependency)
  - Maintains overlap between chunks for context continuity
  - Each chunk includes metadata: path, title, tags, heading hierarchy, position
- Embedding Providers (src/embeddings/): Generate vector embeddings
  - Factory pattern: createEmbeddingProvider() returns appropriate provider
  - OllamaEmbeddingProvider: Uses local Ollama server (default: nomic-embed-text)
  - OpenAIEmbeddingProvider: Uses OpenAI API
  - Batching and concurrency control for efficiency
- Qdrant Client (src/qdrant/client.ts): Vector storage operations
  - Wraps Qdrant REST API using Obsidian's requestUrl()
  - Handles collection management (create, ensure, delete)
  - Point operations (upsert, search, delete, recommend)
  - Uses cosine distance for similarity
- CollectionManager (src/qdrant/collection.ts): Collection lifecycle
  - Creates collections with naming: {vaultName}_{modelName} (sanitized)
  - Ensures correct vector dimensions match embedding provider
  - Configures sparse vectors for hybrid search (future feature)
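The word-based chunking with overlap described for the Chunker can be illustrated with a small sliding-window sketch. It is an assumption-laden example rather than HybridChunker itself: the function name and the `maxTokens` and `overlap` parameters are hypothetical, one whitespace-separated word counts as one token, and the heading-first split is omitted.

```typescript
// Split text into chunks of at most `maxTokens` words, where consecutive
// chunks share `overlap` words. Hypothetical helper, not the plugin's API.
function chunkByWords(text: string, maxTokens: number, overlap: number): string[] {
  if (maxTokens <= 0 || overlap < 0 || overlap >= maxTokens) {
    throw new Error("require maxTokens > 0 and 0 <= overlap < maxTokens");
  }
  // Simple word-based tokenizer: split on runs of whitespace.
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  const step = maxTokens - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxTokens).join(" "));
    // Stop once this window has reached the end of the text.
    if (start + maxTokens >= words.length) {
      break;
    }
  }
  return chunks;
}
```

For example, `chunkByWords("a b c d e f g", 4, 1)` yields two chunks, "a b c d" and "d e f g", with the shared word "d" carrying context across the boundary.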
Search Flow
- User opens SearchModal (src/search/searchModal.ts)
- Query text is embedded using the same provider as indexing
- Query vector is searched against Qdrant collection
- Results rendered by ResultRenderer (src/search/resultRenderer.ts)
- User can click result to open file at specific chunk position
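To make the "query vector is searched against the collection" step concrete, here is an in-memory stand-in for what a cosine-similarity search computes. In the plugin the ranking is done by Qdrant, not by local code; `SketchPoint`, `cosineSimilarity`, and `topK` are hypothetical names used only for illustration.

```typescript
// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical stored point: an ID, its embedding, and the chunk text payload.
interface SketchPoint {
  id: string;
  vector: number[];
  chunkText: string;
}

interface SketchHit {
  id: string;
  score: number;
  chunkText: string;
}

// Score every point against the query and keep the k best matches.
function topK(query: number[], points: SketchPoint[], k: number): SketchHit[] {
  return points
    .map((point) => ({
      id: point.id,
      score: cosineSimilarity(query, point.vector),
      chunkText: point.chunkText,
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The query must be embedded with the same model as the indexed chunks; vectors from different models differ in dimension and meaning, so their similarity scores are not comparable.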
Settings & Configuration
- Settings stored in vault's .obsidian/plugins/obsidian-qdrant/data.json
- Defaults in src/settings.ts (especially important: DEFAULT_SETTINGS)
- Critical: The orchestrator must use this.settings loaded from disk, not DEFAULT_SETTINGS
- Settings tab (src/ui/settingsTab.ts) provides UI for all configuration
- Validation function ensures settings are complete before initialization
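The commit message describes replacing a shallow Object.assign with a deep merge so that nested settings survive loading. A minimal sketch of that idea follows; the `SketchSettings` shape, the `SKETCH_DEFAULTS` values, and the `deepMerge` helper are hypothetical stand-ins for the real PluginSettings and DEFAULT_SETTINGS.

```typescript
// Hypothetical, trimmed-down settings shape; the real PluginSettings has more fields.
interface SketchSettings {
  qdrant: { url: string; apiKey: string };
  embedding: { provider: string; model: string };
}

// Assumed defaults, for illustration only.
const SKETCH_DEFAULTS: SketchSettings = {
  qdrant: { url: "http://localhost:6333", apiKey: "" },
  embedding: { provider: "ollama", model: "nomic-embed-text" },
};

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively overlay `saved` on `defaults` without mutating either argument.
function deepMerge<T>(defaults: T, saved: unknown): T {
  if (!isPlainObject(defaults) || !isPlainObject(saved)) {
    // Leaf value: prefer the saved value when one exists.
    return (saved === undefined ? defaults : saved) as T;
  }
  const result: Record<string, unknown> = { ...defaults };
  for (const key of Object.keys(saved)) {
    result[key] = deepMerge(result[key], saved[key]);
  }
  return result as T;
}
```

With a saved file containing only `{ qdrant: { url: "https://qdrant.example.com" } }` (a placeholder URL), `Object.assign({}, SKETCH_DEFAULTS, saved)` would replace the whole `qdrant` object and drop `apiKey`, while `deepMerge(SKETCH_DEFAULTS, saved)` keeps `apiKey` and overrides only `url`.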
Type System
All types defined in src/types.ts:
- PluginSettings: Complete configuration structure
- ChunkMetadata: Rich metadata stored with each vector point
- SearchResult, SearchOptions: Search interface
- IndexingProgress: Real-time progress tracking
- FileManifestEntry: Index state per file
- Interface contracts: EmbeddingProviderInterface, ExtractorInterface, etc.
Important Implementation Details
Qdrant Point IDs
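- Must be valid 36-character UUIDs; Qdrant accepts only unsigned integers or UUIDs as point IDs and rejects anything else ("value X is not a valid point ID")
- Generated deterministically by generateDeterministicUUID() from hashed chunk data, not by crypto.randomUUID()
- Because the ID is derived from the chunk rather than drawn at random, each chunk keeps the same ID across re-indexing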
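For an ID to persist across re-indexing it has to be derived from stable inputs, and the commit message gives the target layout as xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx with y in 8/9/a/b. The sketch below shows one way to produce such an ID. It is not the plugin's generateDeterministicUUID(): the function names, the choice of path plus chunk index as input, and the FNV-1a hash (used here only to keep the example dependency-free; it is not cryptographic) are all assumptions.

```typescript
// 32-bit FNV-1a hash over UTF-16 code units; the seed varies the starting state
// so that several different words can be derived from one input string.
function fnv1a(input: string, seed: number): number {
  let hash = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical helper: the same (path, chunkIndex) always yields the same ID.
function sketchDeterministicUUID(path: string, chunkIndex: number): string {
  const key = `${path}#${chunkIndex}`;
  // Four 32-bit words give 32 hex characters.
  const hex = [0, 1, 2, 3]
    .map((seed) => fnv1a(key, seed).toString(16).padStart(8, "0"))
    .join("");
  // Variant nibble must be 8, 9, a, or b.
  const variant = "89ab".charAt(parseInt(hex.charAt(16), 16) & 0x3);
  return [
    hex.slice(0, 8), // segment 1: 8 hex chars
    hex.slice(8, 12), // segment 2: 4 hex chars
    "4" + hex.slice(13, 16), // segment 3: version nibble 4 plus 3 hex chars
    variant + hex.slice(17, 20), // segment 4: variant nibble plus 3 hex chars
    hex.slice(20, 32), // segment 5: 12 hex chars
  ].join("-");
}
```

Counting segment lengths (8 + 4 + 4 + 4 + 12 plus four hyphens = 36) is a quick way to catch the 35-character bug the commit message mentions.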
Settings Initialization Timing
- Settings are loaded async in main.ts:onload()
- Orchestrator created after settings load with new IndexingOrchestrator(this.app, this.settings)
- Bug to watch for: Orchestrator components must reference the passed settings, not DEFAULT_SETTINGS
File Change Handling
- Create/Modify: Add to queue with 'update' action
- Delete: Add to queue with 'delete' action (removes points from Qdrant)
- Rename: Treated as delete old + create new
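The mapping from vault events to queue actions listed above can be written as a small pure function. The types and the `jobsForEvent` name are hypothetical; the real FileWatcher and IndexingQueue interfaces may look different.

```typescript
type QueueAction = "update" | "delete";

// Hypothetical job shape placed on the indexing queue.
interface QueueJob {
  path: string;
  action: QueueAction;
}

// Hypothetical normalized form of a vault event.
type VaultEvent =
  | { kind: "create" | "modify" | "delete"; path: string }
  | { kind: "rename"; oldPath: string; path: string };

// Translate one vault event into the queue jobs it implies.
function jobsForEvent(event: VaultEvent): QueueJob[] {
  switch (event.kind) {
    case "create":
    case "modify":
      return [{ path: event.path, action: "update" }];
    case "delete":
      return [{ path: event.path, action: "delete" }];
    case "rename":
      // A rename is handled as: delete the old path, then index the new one.
      return [
        { path: event.oldPath, action: "delete" },
        { path: event.path, action: "update" },
      ];
  }
}
```

Emitting the delete before the update for a rename means the old path's points are removed before the new path is indexed.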
Error Handling
- Connection failures are caught and surfaced via status bar
- File extraction errors are logged but don't stop the queue
- Progress callback and error callback allow UI updates
Text Extractor Plugin Integration
- Optional dependency detected at runtime
- Falls back gracefully if not installed
- Uses Obsidian plugin API to access text-extractor methods
Common Development Patterns
Adding a New Extractor
- Implement ExtractorInterface in src/extractors/
- Add to ExtractorManager.initializeExtractors() priority list
- Update getExtractorStatus() to report the new extractor
Adding a New Embedding Provider
- Extend BaseEmbeddingProvider in src/embeddings/
- Implement embed(), getDimension(), getName(), testConnection()
- Add to createEmbeddingProvider() factory switch
- Add provider enum value to src/types.ts:EmbeddingProvider
- Add settings interface and update PluginSettings
- Add UI controls in src/ui/settingsTab.ts
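The shape of a new provider might look like the sketch below. The method names come from the steps above, but their signatures are guesses, and `SketchBaseEmbeddingProvider`, `FakeEmbeddingProvider`, and `createSketchProvider` are hypothetical stand-ins for BaseEmbeddingProvider and createEmbeddingProvider(). The fake provider needs no network access, which also makes it usable for manual pipeline checks.

```typescript
// Hypothetical base class; the real BaseEmbeddingProvider may differ.
abstract class SketchBaseEmbeddingProvider {
  abstract embed(texts: string[]): Promise<number[][]>;
  abstract getDimension(): number;
  abstract getName(): string;
  abstract testConnection(): Promise<boolean>;
}

// Offline provider that builds a character-bucket histogram per text.
class FakeEmbeddingProvider extends SketchBaseEmbeddingProvider {
  private readonly dimension: number;

  constructor(dimension: number = 8) {
    super();
    this.dimension = dimension;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vector = new Array<number>(this.dimension).fill(0);
      for (let i = 0; i < text.length; i++) {
        vector[text.charCodeAt(i) % this.dimension] += 1;
      }
      return vector;
    });
  }

  getDimension(): number {
    return this.dimension;
  }

  getName(): string {
    return "fake-embed";
  }

  async testConnection(): Promise<boolean> {
    return true; // Nothing to connect to.
  }
}

// Factory switch: each new provider gets one more case here.
function createSketchProvider(kind: string): SketchBaseEmbeddingProvider {
  switch (kind) {
    case "fake":
      return new FakeEmbeddingProvider();
    default:
      throw new Error(`Unknown embedding provider: ${kind}`);
  }
}
```

Whatever getDimension() returns has to match the vector size of the Qdrant collection, which is why the orchestrator reads the embedding dimension before initializing the collection.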
Debugging Indexing Issues
- Check console logs: orchestrator logs initialization steps
- Verify settings: this.settings should match what's in data.json
- Check manifest: FileManifest tracks what's been indexed
- Test connections: Use settings tab "Test Connection" buttons
- Monitor status bar: Shows progress and errors
Testing & Validation
There are no automated tests yet. The manual testing workflow is:
- Build the plugin: npm run build
- Copy to test vault's plugin folder
- Configure settings (Qdrant URL, embedding provider)
- Test connection buttons in settings
- Index a single file via command palette
- Verify in Qdrant dashboard that points were created
- Run semantic search and verify results
- Test file modifications and deletions
Known Issues & Limitations
- Graph visualization not yet implemented (UI placeholder exists)
- Hybrid search (sparse + dense) configured but not exposed in search UI
- Simple word-based tokenizer doesn't match exact GPT token counts
- No rate limiting on API calls (relies on provider batch/concurrency settings)
- File manifest doesn't handle vault renames (would require full reindex)