Nextcloud + Elasticsearch Discovery File Explorer
A discovery-first file explorer built with Next.js (App Router, TypeScript, Tailwind, shadcn UI) that connects to Nextcloud via WebDAV and indexes metadata/content into Elasticsearch (BM25 baseline, optional semantic hybrid search). Includes upload/download, folder tree browsing, global search, Sentry instrumentation, tags + history, and a Qdrant vector-store UI with UMAP + deck.gl visualization.
TL;DR (Quick Start)
Prereqs:
- Node 18+ and npm
- Docker (for Elasticsearch/Kibana/Tika/Qdrant services)
Setup:
- Configure environment
  - Copy .env.example to .env.local and fill in your values (Nextcloud creds + Elasticsearch endpoint).
  - A pre-populated .env.local is included for your provided Nextcloud host (update if needed).
- Provision services (Elasticsearch, Tika, Qdrant)
  - See docs/docker/compose.samples.md for ready-to-use Compose samples and TrueNAS Scale notes.
- Create Elasticsearch index
  npm run create:index
- Ingest Nextcloud into ES (BM25 baseline)
  npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts
- Run the app
  npm run dev, then open the reported URL (e.g., http://localhost:3000)
Why this exists
Nextcloud is the system of record for files; this app provides a discovery UX (fast search, filters, previews later) by indexing normalized documents into Elasticsearch. Apache Tika can extract plain text for rich BM25 search, and optional OpenAI-compatible embeddings enable semantic hybrid search. Qdrant support enables browsing and visualizing collections of vectors, including UMAP-based projection and interactive selection.
Architecture
- UI: Next.js App Router (TypeScript), Tailwind CSS, shadcn UI
- Data
  - Nextcloud WebDAV for browse/upload/download
  - Elasticsearch for indexing and search (BM25 baseline)
  - Apache Tika server for content extraction (optional but recommended)
  - Qdrant (vector database) for collections + embeddings visualization
  - OpenAI-compatible embeddings (optional) for dense vectors/hybrid queries
- Observability: Sentry (client/server/edge) with logs and spans
Features
- Browse Nextcloud directories via a lazy-loading sidebar tree
- View files with size/type and actions in a table
- Upload files into the current folder
- Download/proxy via Next.js route
- Global search box (BM25 baseline) with optional “Semantic” toggle for hybrid search (when embeddings are configured)
- Markdown editing (CodeMirror 6) via GET/PUT content route
- Tags management: update tags and view tag history (append-only) for a file
- Optimistic UX: rename/copy/delete with optimistic updates and rollback on error
- Qdrant page: browse collections and points; UMAP + deck.gl scatter visualization with lasso selection and drill-through placeholder
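The optimistic rename/delete behaviour listed above can be sketched as a pure state transition that keeps the previous list as a rollback snapshot. This is a minimal illustration, not the app's actual code: FileItem, Op, applyOptimistic, and runOptimistic are hypothetical names, and copy is omitted for brevity.

```typescript
// Hypothetical shapes for illustration; the app's real types may differ.
type FileItem = { path: string; name: string };
type Op =
  | { kind: "rename"; from: string; to: string }
  | { kind: "delete"; path: string };

// Derive the display name from the last path segment.
const baseName = (p: string): string => p.split("/").filter(Boolean).pop() ?? "";

// Compute the optimistic list and keep the untouched original for rollback.
function applyOptimistic(items: FileItem[], op: Op): { next: FileItem[]; rollback: FileItem[] } {
  if (op.kind === "rename") {
    const { from, to } = op;
    const next = items.map((f) => (f.path === from ? { path: to, name: baseName(to) } : f));
    return { next, rollback: items };
  }
  const { path } = op;
  return { next: items.filter((f) => f.path !== path), rollback: items };
}

// Show the change immediately, call the server, and restore the snapshot on failure.
async function runOptimistic(
  items: FileItem[],
  op: Op,
  commit: (op: Op) => Promise<void>, // e.g. a POST to /api/files/rename or /api/files/delete
  setItems: (items: FileItem[]) => void,
): Promise<boolean> {
  const { next, rollback } = applyOptimistic(items, op);
  setItems(next);
  try {
    await commit(op);
    return true;
  } catch {
    setItems(rollback);
    return false;
  }
}
```

Because applyOptimistic never mutates its input, the rollback value is simply the original array, which keeps the error path trivial.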
Project Layout (highlights)
- src/lib
  - env.ts: zod-validated env loader with flags
  - paths.ts: normalization/helpers for DAV paths
  - elasticsearch.ts: ES client, ensureIndex, bm25Search, knnSearch, hybridSearch, helpers
  - webdav.ts: Nextcloud WebDAV wrapper (list/create/upload/download/stat/move/copy/delete/read/write) with spans
  - embeddings.ts: OpenAI-compatible embeddings client
  - qdrant.ts: Qdrant client wrapper (list collections, scroll/list points)
- src/app/api
  - folders/list, folders/create
  - files/list, files/upload, files/download
  - files/rename (MOVE), files/copy (COPY), files/delete (DELETE)
  - files/content (GET/PUT) for Markdown/text editing
  - files/tags (POST update tags), files/tags/history (GET)
  - qdrant/collections (GET), qdrant/points (GET)
  - search/query (BM25 and optional hybrid)
- UI components/pages
  - editor/markdown-editor.tsx
  - files/file-row-actions.tsx, files/file-table.tsx, files/tags-dialog.tsx
  - qdrant/page.tsx with embedding-scatter.tsx (UMAP + deck.gl)
  - app/loading.tsx (skeletons)
- scripts
  - create-index.ts: creates ES index + alias
  - ingest-nextcloud.ts: crawl Nextcloud → optional Tika → index into ES
- docs
  - docs/elasticsearch/mappings.json: canonical baseline mapping
  - docs/docker/compose.samples.md: Compose samples & TrueNAS Scale notes
- Sentry init
  - instrumentation-client.ts, sentry.server.config.ts, sentry.edge.config.ts
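To illustrate what "normalization/helpers for DAV paths" might involve, here is a small sketch. The function names (normalizeDavPath, joinUnderRoot, breadcrumbs) are assumptions made for this example and may not match what paths.ts actually exports.

```typescript
// Collapse duplicate slashes, ensure one leading slash, drop any trailing slash.
function normalizeDavPath(input: string): string {
  const parts = input.split("/").filter((p) => p.length > 0 && p !== ".");
  return "/" + parts.join("/");
}

// Join a root and a relative path, rejecting ".." so callers cannot escape the root.
function joinUnderRoot(root: string, relative: string): string {
  if (relative.split("/").includes("..")) {
    throw new Error("Path traversal is not allowed");
  }
  return normalizeDavPath(`${root}/${relative}`);
}

// Split a path into breadcrumb entries, each carrying its cumulative path.
function breadcrumbs(path: string): { label: string; path: string }[] {
  const parts = normalizeDavPath(path).split("/").filter(Boolean);
  return parts.map((label, i) => ({ label, path: "/" + parts.slice(0, i + 1).join("/") }));
}
```

For example, joinUnderRoot("/remote.php/dav/files/admin", "Docs//notes.md") yields "/remote.php/dav/files/admin/Docs/notes.md".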
Environment Variables
Populate .env.local (not committed). See .env.example for the full list.
Required:
- NEXTCLOUD_BASE_URL: e.g. https://your-nextcloud.example.com
- NEXTCLOUD_USERNAME: WebDAV user
- NEXTCLOUD_APP_PASSWORD: App password generated in Nextcloud (not login password)
- NEXTCLOUD_ROOT_PATH: e.g. /remote.php/dav/files/admin
- ELASTICSEARCH_URL: e.g. https://elastic.fortura.cc (your testing cluster) or http://localhost:9200
- ELASTICSEARCH_INDEX: default files
- ELASTICSEARCH_ALIAS: default files_current
Optional:
- TIKA_BASE_URL: e.g. http://localhost:9998 (if using Apache Tika for extraction)
- QDRANT_URL, QDRANT_API_KEY: for Qdrant collections/points and visualization
- SENTRY_DSN: if provided, Sentry is enabled
- OPENAI_API_BASE, OPENAI_API_KEY, OPENAI_EMBEDDING_MODEL, EMBEDDING_DIM: Enable embeddings + semantic hybrid
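The validation and feature flags implied by these variables could look roughly like the sketch below. The project's loader uses zod; this version uses hand-rolled checks instead so that it runs without dependencies, and the AppEnv shape, loadEnv name, and flag names are assumptions for illustration only.

```typescript
// Hypothetical config shape; the real loader in src/lib/env.ts uses zod and may differ.
type AppEnv = {
  nextcloudBaseUrl: string;
  nextcloudUsername: string;
  nextcloudAppPassword: string;
  nextcloudRootPath: string;
  elasticsearchUrl: string;
  elasticsearchIndex: string;
  elasticsearchAlias: string;
  flags: { tika: boolean; qdrant: boolean; sentry: boolean; embeddings: boolean };
};

// Normalize URLs so later string concatenation does not produce double slashes.
const stripTrailingSlash = (url: string): string => url.replace(/\/+$/, "");

function loadEnv(raw: Record<string, string | undefined>): AppEnv {
  const required = (key: string): string => {
    const value = raw[key]?.trim();
    if (!value) throw new Error(`Missing required env var: ${key}`);
    return value;
  };
  const has = (key: string): boolean => Boolean(raw[key]?.trim());
  return {
    nextcloudBaseUrl: stripTrailingSlash(required("NEXTCLOUD_BASE_URL")),
    nextcloudUsername: required("NEXTCLOUD_USERNAME"),
    nextcloudAppPassword: required("NEXTCLOUD_APP_PASSWORD"),
    nextcloudRootPath: required("NEXTCLOUD_ROOT_PATH"),
    elasticsearchUrl: stripTrailingSlash(required("ELASTICSEARCH_URL")),
    // Index and alias fall back to the documented defaults.
    elasticsearchIndex: raw["ELASTICSEARCH_INDEX"]?.trim() || "files",
    elasticsearchAlias: raw["ELASTICSEARCH_ALIAS"]?.trim() || "files_current",
    // Optional integrations are switched on only when their variables are present.
    flags: {
      tika: has("TIKA_BASE_URL"),
      qdrant: has("QDRANT_URL"),
      sentry: has("SENTRY_DSN"),
      embeddings:
        has("OPENAI_API_BASE") && has("OPENAI_API_KEY") &&
        has("OPENAI_EMBEDDING_MODEL") && has("EMBEDDING_DIM"),
    },
  };
}
```

In a Node process this would typically be called once at startup as loadEnv(process.env), so that a missing variable fails fast rather than surfacing later as a WebDAV or search error.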
Dependencies and Local Services (Docker)
See docs/docker/compose.samples.md for ready-to-use services:
- Elasticsearch (single-node) + Kibana
- Apache Tika
- Qdrant
Harden for production (auth, TLS, resource limits). Update .env.local to point to your services.
Install & Run
Install dependencies:
npm install
Run development server:
npm run dev
# Next.js will print the URL, for example http://localhost:3000
Build and start:
npm run build
npm start
Initialize Elasticsearch
Create the index and alias (files/files_current):
npm run create:index
# internally: tsx -r dotenv/config -r tsconfig-paths/register scripts/create-index.ts
To recreate:
npm run create:index -- --recreate
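As a rough picture of what the create step sends to Elasticsearch, the sketch below builds a create-index body that registers the alias in the same request. The field names are illustrative assumptions; docs/elasticsearch/mappings.json remains the canonical mapping, and buildIndexBody is a hypothetical helper.

```typescript
// Build a create-index request body (for PUT /<index>) that also registers the alias.
// Field names are illustrative; docs/elasticsearch/mappings.json is the canonical mapping.
function buildIndexBody(alias: string, embeddingDim?: number) {
  const properties: Record<string, Record<string, unknown>> = {
    path: { type: "keyword" },
    name: { type: "text" },
    mimeType: { type: "keyword" },
    size: { type: "long" },
    modifiedAt: { type: "date" },
    tags: { type: "keyword" },
    content: { type: "text" }, // text extracted by Tika; text fields are scored with BM25 by default
  };
  if (embeddingDim !== undefined) {
    // dims must equal the length of the vectors the embedding model returns.
    properties["embedding"] = {
      type: "dense_vector",
      dims: embeddingDim,
      index: true,
      similarity: "cosine",
    };
  }
  return { aliases: { [alias]: {} }, mappings: { properties } };
}
```

Searching through the alias (files_current) rather than the concrete index is what makes a later reindex-and-swap possible without touching application code.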
Ingest Nextcloud → Elasticsearch
Crawl Nextcloud via WebDAV, optionally extract content with Tika, index into ES:
npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts
Restrict to a subtree:
npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts -- --root=/remote.php/dav/files/admin/SomeFolder
After ingestion, try searching in the UI (BM25).
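The final indexing step of the crawl can be pictured as building an Elasticsearch Bulk API payload. This sketch assumes a simplified document shape (FileDoc) and a hypothetical toBulkBody helper; the real script's fields and batching may differ.

```typescript
// Hypothetical normalized document; the real script's fields may differ.
type FileDoc = { path: string; name: string; size: number; mimeType: string; content?: string };

// The Bulk API expects newline-delimited JSON: an action line followed by the
// document source, with a trailing newline after the last line.
function toBulkBody(index: string, docs: FileDoc[]): string {
  const lines: string[] = [];
  for (const doc of docs) {
    // Using the path as _id makes re-ingestion overwrite instead of duplicating.
    lines.push(JSON.stringify({ index: { _index: index, _id: doc.path } }));
    lines.push(JSON.stringify(doc));
  }
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

The resulting string would be sent to the _bulk endpoint with Content-Type: application/x-ndjson. Elasticsearch limits _id to 512 bytes, so very deep paths would need to be hashed rather than used directly.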
Qdrant: Collections and Embeddings Visualization
- Navigate to /qdrant to browse collections and points.
- Toggle “Vectors: on” to retrieve vectors and render the UMAP + deck.gl scatter.
- Click a point to open it (drill-through placeholder), or lasso to select a set of points.
- Configure:
  - QDRANT_URL (e.g., https://vectors.biohazardvfx.com)
  - QDRANT_API_KEY (if required by the service)
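Behind the "Vectors: on" toggle, listing points comes down to a Qdrant scroll request with or without vectors. The sketch below builds such a request against Qdrant's REST scroll endpoint; it assumes raw REST rather than whichever client the wrapper in qdrant.ts uses, and buildScrollRequest is a hypothetical helper.

```typescript
type ScrollRequest = { url: string; method: "POST"; headers: Record<string, string>; body: string };

// Build a request for POST /collections/{name}/points/scroll without sending it.
function buildScrollRequest(
  baseUrl: string,
  collection: string,
  opts: { limit?: number; withVectors?: boolean; offset?: string | number; apiKey?: string } = {},
): ScrollRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (opts.apiKey) headers["api-key"] = opts.apiKey; // Qdrant reads the key from this header
  const body: Record<string, unknown> = {
    limit: opts.limit ?? 100,
    with_payload: true,
    with_vector: opts.withVectors ?? false, // vectors are large; request them only for the scatter
  };
  // Pass next_page_offset from the previous response to fetch the following page.
  if (opts.offset !== undefined) body["offset"] = opts.offset;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/collections/${encodeURIComponent(collection)}/points/scroll`,
    method: "POST",
    headers,
    body: JSON.stringify(body),
  };
}
```

Keeping with_vector off by default keeps table browsing light; the UMAP projection only needs vectors for the points actually being plotted.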
Tags & History
- Open the “Tags” action on a file to edit tags (comma-separated).
- Tag changes are persisted to ES (tags field) and appended as events in the files_events index.
- Use “History” in the dialog to see the latest tag updates for the file.
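The two writes described above (update the tags field, append an event) can be sketched as one function that produces both payloads. The TagEvent shape, the action name, and the helper names are assumptions for illustration; the actual event schema in files_events may differ.

```typescript
// Hypothetical event shape; the real files_events schema may differ.
type TagEvent = { path: string; action: "tags.update"; from: string[]; to: string[]; at: string };

// Parse the dialog's comma-separated input: trim, drop blanks, de-duplicate, keep order.
function parseTags(input: string): string[] {
  const seen = new Set<string>();
  for (const raw of input.split(",")) {
    const tag = raw.trim();
    if (tag) seen.add(tag);
  }
  return Array.from(seen);
}

// Produce the partial document update and the append-only history event together.
function buildTagUpdate(path: string, previous: string[], input: string, now: Date = new Date()) {
  const tags = parseTags(input);
  const event: TagEvent = { path, action: "tags.update", from: previous, to: tags, at: now.toISOString() };
  return {
    docUpdate: { tags }, // partial update applied to the file's document
    event, // indexed as a new document in files_events and never modified afterwards
  };
}
```

Because each event records both the previous and the new tag set, the history view can be rebuilt from files_events alone, sorted by the at timestamp.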
Optional: Semantic Hybrid Search
To enable, configure embeddings:
- OPENAI_API_BASE (e.g., https://api.openai.com/v1 or your compatible endpoint)
- OPENAI_API_KEY
- OPENAI_EMBEDDING_MODEL (e.g., text-embedding-3-large)
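- EMBEDDING_DIM (must match the vector length the model returns, e.g., 1536 for text-embedding-3-small; text-embedding-3-large returns 3072 unless a smaller size is requested)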
Current code path supports:
- Hybrid search in API when semantic: true in requests; UI has a “Semantic” toggle.
- A future backfill script can be added to compute vectors for existing docs (planned).
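One way a hybrid request can be expressed in Elasticsearch 8.x is a search body that carries both a BM25 query and a top-level knn clause, whose scores Elasticsearch then combines. The sketch below is an assumption about how hybridSearch could build its body, not the project's actual implementation; the field names (name, content, embedding) and the boost are illustrative.

```typescript
// Build a search body: BM25 only by default, BM25 + kNN when semantic search is requested.
function buildSearchBody(
  q: string,
  opts: { semantic?: boolean; queryVector?: number[]; size?: number } = {},
): Record<string, unknown> {
  const size = opts.size ?? 10;
  const body: Record<string, unknown> = {
    size,
    // Lexical part: match the query against name (boosted) and extracted content.
    query: { multi_match: { query: q, fields: ["name^2", "content"] } },
  };
  if (opts.semantic && opts.queryVector) {
    // Vector part: approximate nearest neighbours on the dense_vector field.
    body["knn"] = {
      field: "embedding",
      query_vector: opts.queryVector,
      k: size,
      num_candidates: Math.max(100, size * 10), // must be at least k
    };
  }
  return body;
}
```

The queryVector would come from embedding the user's query with the same model used at index time. If no vector is available the body silently falls back to plain BM25, which matches the "optional" nature of the toggle.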
Sentry
- Files:
  - instrumentation-client.ts (client init)
  - sentry.server.config.ts (node runtime)
  - sentry.edge.config.ts (edge runtime)
- Enable by setting SENTRY_DSN in .env.local.
- Logging enabled (consoleLoggingIntegration), capturing console.log/warn/error.
- Spans instrument WebDAV and ES operations with meaningful op/name and attributes.
API Endpoints
- GET /api/folders/list?path=/abs/path → { folders[], files[] }
- POST /api/folders/create body: { path: "/abs/path" } → { ok: true }
- GET /api/files/list?path=/abs/path&page=1&perPage=50 → { total, page, perPage, items[] }
- POST /api/files/upload (multipart) form-data: file, destPath
- GET /api/files/download?path=/abs/path → stream download
- POST /api/files/rename → { from, to }
- POST /api/files/copy → { from, to }
- POST /api/files/delete → { path }
- GET/PUT /api/files/content → { path } and { path, content, mimeType? }
- POST /api/files/tags → { path, tags[] }
- GET /api/files/tags/history?path=... → history events
- GET /api/qdrant/collections → Qdrant collections
- GET /api/qdrant/points?collection=... → Qdrant points (with/without vectors)
- POST /api/search/query → body: { q, filters?, sort?, page?, perPage?, semantic? }
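As a usage illustration for the search endpoint, the sketch below assembles a request for POST /api/search/query from the documented body fields. The SearchParams type, the default page size, and the buildSearchRequest helper are assumptions; the shapes of filters and sort are not specified here, so they are left untyped.

```typescript
// Body fields as listed for /api/search/query; filters and sort shapes are not documented here.
type SearchParams = {
  q: string;
  filters?: unknown;
  sort?: unknown;
  page?: number;
  perPage?: number;
  semantic?: boolean;
};

type RequestInitLike = { method: "POST"; headers: Record<string, string>; body: string };

// Fill in assumed defaults and serialize the body; the caller passes the result to fetch.
function buildSearchRequest(params: SearchParams): { url: string; init: RequestInitLike } {
  const payload = { page: 1, perPage: 50, semantic: false, ...params };
  return {
    url: "/api/search/query",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}
```

A caller would then run something like const { url, init } = buildSearchRequest({ q: "budget", semantic: true }) followed by fetch(url, init) and read the JSON response.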
UI Usage
- Navigate folders via the left sidebar. Root defaults to NEXTCLOUD_ROOT_PATH.
- Breadcrumbs show the current path; click to navigate.
- Global search queries ES (BM25). Toggle “Semantic” to blend vector similarity (when enabled).
- Use “Upload” to send a file to the current folder.
- Use file actions (row menu) to Edit content, Download, Rename/Move, Copy, Delete, and Tags.
- Open /qdrant for vector collections; toggle vectors to visualize embeddings.
Testing
- Unit tests (Vitest): npm run test or npm run test:unit
  - Example: tests/unit/tags-dialog.test.ts
- E2E (Playwright): npm run test:e2e (requires npx playwright install first)
- Integration tests: planned for API route handlers with mocks.
Known Caveats / TODO
- Breadcrumb segments may include technical path prefixes if navigating above the user root (optional improvement).
- Embeddings backfill script & “Find similar” API/UI are planned.
- deck.gl typing simplifications used for portability; visualization is functional but can be enhanced with richer interactions.
- Tests (integration/E2E) and additional UX polish can be expanded.
Development Notes
- Code style: TypeScript, ESLint (Next config), Tailwind v4.
- shadcn components added (Slate), most primitives included.
- ES types via @elastic/elasticsearch estypes.
- Stream interop handled for Node → Web streams in downloads and uploads.
- Zod-based env loader normalizes URLs and validates required vars.
Troubleshooting
- Search 500s:
  - Ensure Elasticsearch is running and reachable at ELASTICSEARCH_URL.
  - Run npm run create:index to create the index and alias.
  - If using Tika for content, ensure TIKA_BASE_URL is set and the service is healthy (optional).
- WebDAV failures:
  - Verify NEXTCLOUD_BASE_URL, NEXTCLOUD_USERNAME, NEXTCLOUD_APP_PASSWORD, and NEXTCLOUD_ROOT_PATH.
  - Confirm the user has permission for the target path.
- Qdrant:
  - Ensure QDRANT_URL and QDRANT_API_KEY (if required) are set.
  - Toggle “Vectors: on” to include vectors in point queries for visualization.
- Sentry issues:
  - Ensure SENTRY_DSN is set; check networking/outbound restrictions.
Scripts
- npm run create:index: Create/recreate Elasticsearch index and alias
- scripts/ingest-nextcloud.ts: Crawl Nextcloud → Tika (optional) → Elasticsearch
- npm run test: Run unit tests (Vitest)
- npm run test:e2e: Run Playwright E2E tests
Security
- Keep .env.local out of source control (already in .gitignore).
- Use Nextcloud App Passwords, not login passwords.
- Harden Elasticsearch/Kibana/Tika/Qdrant for production (auth, TLS, resource limits).
Built per implementation_plan.md with a deterministic order of rollout and Sentry instrumentation aligned to organizational rules.