Nextcloud + Elasticsearch Discovery File Explorer

A discovery-first file explorer built with Next.js (App Router, TypeScript, Tailwind, shadcn UI) that connects to Nextcloud via WebDAV and indexes metadata/content into Elasticsearch (BM25 baseline, optional semantic hybrid search). Includes upload/download, folder tree browsing, global search, Sentry instrumentation, tags + history, and a Qdrant vector-store UI with UMAP + deck.gl visualization.

TL;DR (Quick Start)

Prereqs:

  • Node 18+ and npm
  • Docker (for Elasticsearch/Kibana/Tika/Qdrant services)

Setup:

  1. Configure environment

    • Copy .env.example to .env.local and fill in your values (Nextcloud creds + Elasticsearch endpoint).
    • A pre-populated .env.local is included for your provided Nextcloud host (update if needed).
  2. Provision services (Elasticsearch, Tika, Qdrant)

    • See docs/docker/compose.samples.md for ready-to-use Compose samples and TrueNAS Scale notes.
  3. Create Elasticsearch index

    • npm run create:index
  4. Ingest Nextcloud into ES (BM25 baseline)

    • npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts
  5. Run the app

    • npm run dev

Why this exists

Nextcloud is the system of record for files; this app provides a discovery UX (fast search and filters now, previews later) by indexing normalized documents into Elasticsearch. Apache Tika can extract plain text for rich BM25 search, and optional OpenAI-compatible embeddings enable semantic hybrid search. Qdrant support enables browsing and visualizing collections of vectors, including UMAP-based projection and interactive selection.

Architecture

  • UI: Next.js App Router (TypeScript), Tailwind CSS, shadcn UI
  • Data
    • Nextcloud WebDAV for browse/upload/download
    • Elasticsearch for indexing and search (BM25 baseline)
    • Apache Tika server for content extraction (optional but recommended)
    • Qdrant (vector database) for collections + embeddings visualization
    • OpenAI-compatible embeddings (optional) for dense vectors/hybrid queries
  • Observability: Sentry (client/server/edge) with logs and spans

Features

  • Browse Nextcloud directories via a lazy-loading sidebar tree
  • View files with size/type and actions in a table
  • Upload files into the current folder
  • Download/proxy via Next.js route
  • Global search box (BM25 baseline) with optional “Semantic” toggle for hybrid search (when embeddings are configured)
  • Markdown editing (CodeMirror 6) via GET/PUT content route
  • Tags management: update tags and view tag history (append-only) for a file
  • Optimistic UX: rename/copy/delete with optimistic updates and rollback on error
  • Qdrant page: browse collections and points; UMAP + deck.gl scatter visualization with lasso selection and drill-through placeholder

Project Layout (highlights)

  • src/lib
    • env.ts: zod-validated env loader with flags
    • paths.ts: normalization/helpers for DAV paths
    • elasticsearch.ts: ES client, ensureIndex, bm25Search, knnSearch, hybridSearch, helpers
    • webdav.ts: Nextcloud WebDAV wrapper (list/create/upload/download/stat/move/copy/delete/read/write) with spans
    • embeddings.ts: OpenAI-compatible embeddings client
    • qdrant.ts: Qdrant client wrapper (list collections, scroll/list points)
  • src/app/api
    • folders/list, folders/create
    • files/list, files/upload, files/download
    • files/rename (MOVE), files/copy (COPY), files/delete (DELETE)
    • files/content (GET/PUT) for Markdown/text editing
    • files/tags (POST update tags), files/tags/history (GET)
    • qdrant/collections (GET), qdrant/points (GET)
    • search/query (BM25 and optional hybrid)
  • UI components/pages
    • editor/markdown-editor.tsx
    • files/file-row-actions.tsx, files/file-table.tsx, files/tags-dialog.tsx
    • qdrant/page.tsx with embedding-scatter.tsx (UMAP + deck.gl)
    • app/loading.tsx (skeletons)
  • scripts
    • create-index.ts: creates ES index + alias
    • ingest-nextcloud.ts: crawl Nextcloud → optional Tika → index into ES
  • docs
    • docs/elasticsearch/mappings.json: canonical baseline mapping
    • docs/docker/compose.samples.md: Compose samples & TrueNAS Scale notes
  • Sentry init
    • instrumentation-client.ts, sentry.server.config.ts, sentry.edge.config.ts

Environment Variables

Populate .env.local (not committed). See .env.example for full list.

Required:

  • NEXTCLOUD_BASE_URL: e.g. https://your-nextcloud.example.com
  • NEXTCLOUD_USERNAME: WebDAV user
  • NEXTCLOUD_APP_PASSWORD: App password generated in Nextcloud (not login password)
  • NEXTCLOUD_ROOT_PATH: e.g. /remote.php/dav/files/admin
  • ELASTICSEARCH_URL: e.g. https://elastic.fortura.cc (your testing cluster) or http://localhost:9200
  • ELASTICSEARCH_INDEX: default files
  • ELASTICSEARCH_ALIAS: default files_current

Optional:

  • TIKA_BASE_URL: e.g. http://localhost:9998 (if using Apache Tika for extraction)
  • QDRANT_URL, QDRANT_API_KEY: for Qdrant collections/points and visualization
  • SENTRY_DSN: if provided, Sentry is enabled
  • OPENAI_API_BASE, OPENAI_API_KEY, OPENAI_EMBEDDING_MODEL, EMBEDDING_DIM: Enable embeddings + semantic hybrid
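
To illustrate how these variables might be validated and turned into feature flags, here is a dependency-free sketch. It is not the project's loader (src/lib/env.ts uses zod); the AppEnv shape, the loadEnv name, and the rule that embeddings need all four OPENAI_*/EMBEDDING_DIM values are assumptions made for the example.

```typescript
type EnvSource = Record<string, string | undefined>;

// Hypothetical shape; the real loader's output type is not shown in this README.
interface AppEnv {
  nextcloudBaseUrl: string;
  nextcloudUsername: string;
  nextcloudAppPassword: string;
  nextcloudRootPath: string;
  elasticsearchUrl: string;
  elasticsearchIndex: string;
  elasticsearchAlias: string;
  tikaBaseUrl: string | undefined;
  flags: { tika: boolean; qdrant: boolean; sentry: boolean; embeddings: boolean };
}

// Trim whitespace and trailing slashes so URLs join predictably.
function normalizeUrl(value: string): string {
  return value.trim().replace(/\/+$/, "");
}

function loadEnv(source: EnvSource): AppEnv {
  const missing: string[] = [];
  const optional = (key: string): string | undefined => {
    const value = (source[key] ?? "").trim();
    return value === "" ? undefined : value;
  };
  const required = (key: string): string => {
    const value = optional(key);
    if (value === undefined) missing.push(key);
    return value ?? "";
  };

  const tika = optional("TIKA_BASE_URL");
  const env: AppEnv = {
    nextcloudBaseUrl: normalizeUrl(required("NEXTCLOUD_BASE_URL")),
    nextcloudUsername: required("NEXTCLOUD_USERNAME"),
    nextcloudAppPassword: required("NEXTCLOUD_APP_PASSWORD"),
    nextcloudRootPath: normalizeUrl(required("NEXTCLOUD_ROOT_PATH")),
    elasticsearchUrl: normalizeUrl(required("ELASTICSEARCH_URL")),
    // Defaults taken from the list above.
    elasticsearchIndex: optional("ELASTICSEARCH_INDEX") ?? "files",
    elasticsearchAlias: optional("ELASTICSEARCH_ALIAS") ?? "files_current",
    tikaBaseUrl: tika === undefined ? undefined : normalizeUrl(tika),
    flags: {
      tika: tika !== undefined,
      qdrant: optional("QDRANT_URL") !== undefined,
      sentry: optional("SENTRY_DSN") !== undefined,
      // Assumption: embeddings are enabled only when all four values are set.
      embeddings: ["OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_EMBEDDING_MODEL", "EMBEDDING_DIM"]
        .every((key) => optional(key) !== undefined),
    },
  };

  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  return env;
}
```

Collecting every missing key before throwing means a misconfigured .env.local is reported in one pass rather than one variable at a time.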

Dependencies and Local Services (Docker)

See docs/docker/compose.samples.md for ready-to-use services:

  • Elasticsearch (single-node) + Kibana
  • Apache Tika
  • Qdrant

Harden for production (auth, TLS, resource limits). Update .env.local to point to your services.

Install & Run

Install dependencies:

npm install

Run development server:

npm run dev
# Next.js will print the URL, for example http://localhost:3000

Build and start:

npm run build
npm start

Initialize Elasticsearch

Create the index and alias (files/files_current):

npm run create:index
# internally: tsx -r dotenv/config -r tsconfig-paths/register scripts/create-index.ts

To recreate:

npm run create:index -- --recreate

Ingest Nextcloud → Elasticsearch

Crawl Nextcloud via WebDAV, optionally extract content with Tika, index into ES:

npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts

Restrict to a subtree:

npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts -- --root=/remote.php/dav/files/admin/SomeFolder

After ingestion, try searching in the UI (BM25).
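
The optional Tika step of the ingest can be sketched as below. It relies on Apache Tika Server's PUT /tika endpoint with an Accept: text/plain header; the extractText name and the injected FetchLike type are hypothetical and exist only so the example stays self-contained. The real scripts/ingest-nextcloud.ts may handle this differently.

```typescript
// Minimal structural type for the HTTP call, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: Uint8Array },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Returns extracted plain text, or undefined when Tika is not configured
// or the document yields no text.
async function extractText(
  tikaBaseUrl: string | undefined,
  bytes: Uint8Array,
  fetchImpl: FetchLike,
): Promise<string | undefined> {
  if (!tikaBaseUrl) return undefined; // Tika is optional

  const url = `${tikaBaseUrl.replace(/\/+$/, "")}/tika`;
  const response = await fetchImpl(url, {
    method: "PUT",
    headers: { Accept: "text/plain" },
    body: bytes,
  });
  if (!response.ok) {
    throw new Error(`Tika responded with HTTP ${response.status}`);
  }

  const text = (await response.text()).trim();
  return text.length > 0 ? text : undefined;
}
```

When TIKA_BASE_URL is unset, the function returns undefined and the document would be indexed with metadata only.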

Qdrant: Collections and Embeddings Visualization

  • Navigate to /qdrant to browse collections and points.
  • Toggle “Vectors: on” to retrieve vectors and render the UMAP + deck.gl scatter.
  • Click a point to open it (drill-through placeholder), or use the lasso to select a set of points.
  • Configure: set QDRANT_URL and, if your instance requires it, QDRANT_API_KEY in .env.local.
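
For reference, listing points with or without vectors maps naturally onto Qdrant's scroll endpoint (POST /collections/{name}/points/scroll). The sketch below only builds the request; buildScrollCall and its option names are hypothetical, and src/lib/qdrant.ts may use a client library instead of raw HTTP.

```typescript
interface ScrollOptions {
  limit?: number;
  withVectors?: boolean; // mirrors the "Vectors: on" toggle
  offset?: string | number; // next_page_offset from a previous response
}

interface HttpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildScrollCall(
  baseUrl: string,
  collection: string,
  options: ScrollOptions = {},
  apiKey?: string,
): HttpCall {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["api-key"] = apiKey; // Qdrant's API key header

  const body: Record<string, unknown> = {
    limit: options.limit ?? 100,
    with_payload: true,
    with_vector: options.withVectors ?? false,
  };
  if (options.offset !== undefined) body["offset"] = options.offset;

  return {
    url: `${baseUrl.replace(/\/+$/, "")}/collections/${encodeURIComponent(collection)}/points/scroll`,
    method: "POST",
    headers,
    body: JSON.stringify(body),
  };
}
```

Leaving vectors off by default keeps responses small; they are only needed when the scatter plot is rendered.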

Tags & History

  • Open the “Tags” action on a file to edit tags (comma-separated).
  • Tag changes are persisted to ES (tags field) and appended as events in the files_events index.
  • Use “History” in the dialog to see the latest tag updates for the file.
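
The tag flow described above (comma-separated input, a tags field on the document, and an append-only event) could look roughly like this. The TagEvent shape and both function names are assumptions for illustration; the actual fields stored in files_events are not documented here.

```typescript
// Split comma-separated input into trimmed, de-duplicated tags (order preserved).
function parseTags(input: string): string[] {
  const tags: string[] = [];
  for (const raw of input.split(",")) {
    const tag = raw.trim();
    if (tag !== "" && tags.indexOf(tag) === -1) tags.push(tag);
  }
  return tags;
}

// Hypothetical event document; one is appended per update and never modified.
interface TagEvent {
  type: "tags.updated";
  path: string;
  previous: string[];
  next: string[];
  at: string; // ISO 8601 timestamp
}

function buildTagEvent(
  path: string,
  previous: string[],
  next: string[],
  now: Date = new Date(),
): TagEvent {
  return { type: "tags.updated", path, previous, next, at: now.toISOString() };
}
```

Keeping both the previous and the next tag lists in each event lets the history view show what changed without replaying earlier events.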

Semantic Hybrid Search (optional)

To enable semantic hybrid search, configure embeddings:

  • OPENAI_API_BASE (e.g., https://api.openai.com/v1 or your compatible endpoint)
  • OPENAI_API_KEY
  • OPENAI_EMBEDDING_MODEL (e.g., text-embedding-3-large)
  • EMBEDDING_DIM (must match the model's output size, e.g., 3072 for text-embedding-3-large or 1536 for text-embedding-3-small)

Current code path supports:

  • Hybrid search in API when semantic: true in requests; UI has a “Semantic” toggle.
  • A backfill script that computes vectors for already-indexed documents is planned but not yet included.
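
One way to express a hybrid query is Elasticsearch 8.x's top-level knn clause next to a regular query, in which case the scores of both are combined. The sketch below builds such a request body. The field names name, content, and embedding are placeholders (the real mapping lives in docs/elasticsearch/mappings.json), and the project's own hybridSearch helper may be built differently.

```typescript
interface HybridOptions {
  semantic?: boolean;
  queryVector?: number[]; // embedding of the query text, when available
  page?: number;
  perPage?: number;
  k?: number;
}

function buildSearchBody(q: string, options: HybridOptions = {}): Record<string, unknown> {
  const page = Math.max(1, options.page ?? 1);
  const perPage = options.perPage ?? 20;

  // BM25 baseline: always present.
  const body: Record<string, unknown> = {
    from: (page - 1) * perPage,
    size: perPage,
    query: { multi_match: { query: q, fields: ["name^2", "content"] } },
  };

  // Add the vector clause only when requested and a query vector exists.
  if (options.semantic && options.queryVector && options.queryVector.length > 0) {
    const k = options.k ?? perPage;
    body["knn"] = {
      field: "embedding",
      query_vector: options.queryVector,
      k,
      // Must be at least k; Elasticsearch caps it at 10000.
      num_candidates: Math.min(10000, Math.max(100, k * 10)),
    };
  }
  return body;
}
```

Falling back to the BM25-only body when no vector is available keeps search working if the embeddings endpoint is down or unconfigured.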

Sentry

  • Files:
    • instrumentation-client.ts (client init)
    • sentry.server.config.ts (node runtime)
    • sentry.edge.config.ts (edge runtime)
  • Enable by setting SENTRY_DSN in .env.local.
  • Logging enabled (consoleLoggingIntegration), capturing console.log/warn/error.
  • Spans instrument WebDAV and ES operations with meaningful op/name and attributes.

API Endpoints

  • GET /api/folders/list?path=/abs/path → { folders[], files[] }
  • POST /api/folders/create → body: { path: "/abs/path" }; response: { ok: true }
  • GET /api/files/list?path=/abs/path&page=1&perPage=50 → { total, page, perPage, items[] }
  • POST /api/files/upload (multipart) form-data: file, destPath
  • GET /api/files/download?path=/abs/path → stream download
  • POST /api/files/rename → body: { from, to }
  • POST /api/files/copy → body: { from, to }
  • POST /api/files/delete → body: { path }
  • GET/PUT /api/files/content → GET takes { path }; PUT takes { path, content, mimeType? }
  • POST /api/files/tags → body: { path, tags[] }
  • GET /api/files/tags/history?path=... → history events
  • GET /api/qdrant/collections → Qdrant collections
  • GET /api/qdrant/points?collection=... → Qdrant points (with/without vectors)
  • POST /api/search/query → body: { q, filters?, sort?, page?, perPage?, semantic? }
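
As a usage illustration, a client could assemble calls to two of these routes as shown below. The helper names are hypothetical, and the types chosen for filters and sort are guesses, since this README does not spell out their shape.

```typescript
// Assumed request shape for POST /api/search/query.
interface SearchRequest {
  q: string;
  filters?: Record<string, unknown>; // shape not documented here
  sort?: string; // type is an assumption
  page?: number;
  perPage?: number;
  semantic?: boolean;
}

interface JsonCall {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the paginated listing URL, encoding the DAV path as a query value.
function filesListUrl(path: string, page: number = 1, perPage: number = 50): string {
  return `/api/files/list?path=${encodeURIComponent(path)}&page=${page}&perPage=${perPage}`;
}

// Build the arguments for a JSON POST to the search route.
function searchCall(request: SearchRequest): JsonCall {
  return {
    url: "/api/search/query",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(request),
    },
  };
}
```

In a browser or server component, the result can be passed straight to fetch, for example fetch(call.url, call.init) followed by response.json().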

UI Usage

  • Navigate folders via the left sidebar. Root defaults to NEXTCLOUD_ROOT_PATH.
  • Breadcrumbs show the current path; click to navigate.
  • Global search queries ES (BM25). Toggle “Semantic” to blend vector similarity (when enabled).
  • Use “Upload” to send a file to the current folder.
  • Use file actions (row menu) to Edit content, Download, Rename/Move, Copy, Delete, and Tags.
  • Open /qdrant for vector collections; toggle vectors to visualize embeddings.

Testing

  • Unit tests (Vitest):
    • npm run test or npm run test:unit
    • Example: tests/unit/tags-dialog.test.ts
  • E2E (Playwright):
    • npm run test:e2e (requires npx playwright install first)
  • Integration tests: planned for API route handlers with mocks.

Known Caveats / TODO

  • Breadcrumb segments may include technical path prefixes if navigating above the user root (optional improvement).
  • Embeddings backfill script & “Find similar” API/UI are planned.
  • deck.gl typing simplifications used for portability; visualization is functional but can be enhanced with richer interactions.
  • Integration/E2E test coverage and UX polish are still to be expanded.
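
The breadcrumb caveat above can be made concrete with a small sketch that derives crumbs relative to NEXTCLOUD_ROOT_PATH. The Crumb type, the buildBreadcrumbs name, and the "Home" label are hypothetical; this is one possible approach, not the app's current behavior.

```typescript
interface Crumb {
  label: string;
  path: string;
}

function stripTrailingSlash(value: string): string {
  return value.replace(/\/+$/, "");
}

function buildBreadcrumbs(rootPath: string, currentPath: string): Crumb[] {
  const root = stripTrailingSlash(rootPath);
  const current = stripTrailingSlash(currentPath);
  // Require a "/" boundary so /files/admin does not match /files/administrator.
  const underRoot = current === root || current.indexOf(`${root}/`) === 0;

  const crumbs: Crumb[] = underRoot ? [{ label: "Home", path: root || "/" }] : [];
  let accumulated = underRoot ? root : "";
  const remainder = underRoot ? current.slice(root.length) : current;

  for (const segment of remainder.split("/")) {
    if (segment === "") continue;
    accumulated = `${accumulated}/${segment}`;
    crumbs.push({ label: segment, path: accumulated });
  }
  return crumbs;
}
```

Inside the user root, the technical prefix collapses into a single "Home" crumb; above the root, every segment is still shown, which is the behavior the caveat describes.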

Development Notes

  • Code style: TypeScript, ESLint (Next config), Tailwind v4.
  • shadcn components added (Slate), most primitives included.
  • ES Types via @elastic/elasticsearch estypes.
  • Stream interop handled for Node → Web streams in downloads and uploads.
  • Zod-based env loader normalizes URLs and validates required vars.

Troubleshooting

  • Search 500s:
    • Ensure Elasticsearch is running and reachable at ELASTICSEARCH_URL.
    • Run npm run create:index to create the index and alias.
    • If using Tika for content, ensure TIKA_BASE_URL is set and service is healthy (optional).
  • WebDAV failures:
    • Verify NEXTCLOUD_BASE_URL, NEXTCLOUD_USERNAME, NEXTCLOUD_APP_PASSWORD, and NEXTCLOUD_ROOT_PATH.
    • Confirm the user has permission for the target path.
  • Qdrant:
    • Ensure QDRANT_URL and QDRANT_API_KEY (if required) are set.
    • Toggle “Vectors: on” to include vectors in point queries for visualization.
  • Sentry issues:
    • Ensure SENTRY_DSN is set; check networking/outbound restrictions.

Scripts

  • npm run create:index: Create/recreate Elasticsearch index and alias
  • scripts/ingest-nextcloud.ts: Crawl Nextcloud → Tika (optional) → Elasticsearch
  • npm run test: Run unit tests (Vitest)
  • npm run test:e2e: Run Playwright E2E tests

Security

  • Keep .env.local out of source control (already in .gitignore).
  • Use Nextcloud App Passwords, not login passwords.
  • Harden Elasticsearch/Kibana/Tika/Qdrant for production (auth, TLS, resource limits).

Built per implementation_plan.md with a deterministic order of rollout and Sentry instrumentation aligned to organizational rules.
