# Nextcloud + Elasticsearch Discovery File Explorer

A discovery-first file explorer built with Next.js (App Router, TypeScript, Tailwind, shadcn UI) that connects to Nextcloud via WebDAV and indexes metadata/content into Elasticsearch (BM25 baseline, optional semantic hybrid search). Includes upload/download, folder tree browsing, global search, and Sentry instrumentation.

## TL;DR (Quick Start)

Prereqs:
- Node 18+ and npm
- Docker (for Elasticsearch/Kibana/Tika services)

Setup:
1) Configure environment
   - Copy `.env.example` to `.env.local` and fill in your values (Nextcloud creds + Elasticsearch endpoint).
   - A pre-populated `.env.local` is included for your provided Nextcloud host (update if needed).
2) Provision Elasticsearch + Tika (Docker sample below)
3) Create Elasticsearch index
   - `npm run create:index`
4) Ingest Nextcloud into ES (BM25 baseline)
   - `npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts`
5) Run the app
   - `npm run dev` then open the reported URL (e.g., http://localhost:3000)

## Why this exists

Nextcloud is the system of record for files; this app provides a discovery UX (fast search, filters, previews later) by indexing normalized documents into Elasticsearch. Apache Tika can extract plain text for rich BM25 search, and optional OpenAI-compatible embeddings enable semantic hybrid search.

## Architecture

- UI: Next.js App Router (TypeScript), Tailwind CSS, shadcn UI
- Data
  - Nextcloud WebDAV for browse/upload/download
  - Elasticsearch for indexing and search (BM25 baseline)
  - Apache Tika server for content extraction (optional but recommended)
  - OpenAI-compatible embeddings (optional) for dense vectors/hybrid queries
- Observability: Sentry (client/server/edge) with logs and spans

## Features

- Browse Nextcloud directories via a lazy-loading sidebar tree
- View files with size/type and actions in a table
- Upload files into the current folder
- Download/proxy via Next.js route
- Global search box (BM25 baseline) with optional “Semantic” toggle for hybrid search (when embeddings are configured)
- Sentry logging/tracing sprinkled around WebDAV and Elasticsearch calls

## Project Layout (highlights)

- src/lib
  - env.ts: zod-validated env loader with flags
  - paths.ts: normalization/helpers for DAV paths
  - elasticsearch.ts: ES client, ensureIndex, bm25Search, knnSearch, hybridSearch
  - webdav.ts: Nextcloud WebDAV wrapper (list/create/upload/download/stat) with spans
  - embeddings.ts: OpenAI-compatible embeddings client
- src/app/api
  - folders/list, folders/create
  - files/list, files/upload, files/download
  - search/query
- scripts
  - create-index.ts: creates ES index + alias
  - ingest-nextcloud.ts: crawl Nextcloud → optional Tika → index into ES
- docs/elasticsearch/mappings.json: canonical baseline mapping
- Sentry init
  - instrumentation-client.ts, sentry.server.config.ts, sentry.edge.config.ts

## Environment Variables

Populate `.env.local` (not committed). See `.env.example` for full list.

Required:
- NEXTCLOUD_BASE_URL: e.g. https://your-nextcloud.example.com
- NEXTCLOUD_USERNAME: WebDAV user
- NEXTCLOUD_APP_PASSWORD: App password generated in Nextcloud (not login password)
- NEXTCLOUD_ROOT_PATH: e.g. `/remote.php/dav/files/admin`
- ELASTICSEARCH_URL: e.g. http://localhost:9200
- ELASTICSEARCH_INDEX: default `files`
- ELASTICSEARCH_ALIAS: default `files_current`

Optional:
- TIKA_BASE_URL: e.g. http://localhost:9998 (if using Apache Tika for extraction)
- SENTRY_DSN: if provided, Sentry is enabled
- OPENAI_API_BASE, OPENAI_API_KEY, OPENAI_EMBEDDING_MODEL, EMBEDDING_DIM: Enable embeddings + semantic hybrid

## Dependencies and Local Services (Docker)

Below is a sample docker-compose for local development. Adjust versions for your environment. For TrueNAS Scale, translate this into an appropriate app configuration.

```yaml
version: "3.9"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.2
    container_name: es
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    ports:
      - "9200:9200"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200/_cluster/health"]
      interval: 10s
      timeout: 5s
      retries: 10
  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.2
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      elasticsearch:
        condition: service_healthy
  tika:
    image: apache/tika:latest-full
    container_name: tika
    ports:
      - "9998:9998"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9998"]
      interval: 10s
      timeout: 5s
      retries: 10
```

Notes:
- Security is relaxed in this dev config (xpack security disabled). Harden for production.
- Set `TIKA_BASE_URL=http://localhost:9998` to enable content extraction.

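
As a rough illustration of the content-extraction step that `TIKA_BASE_URL` enables, the sketch below sends a file's bytes to a Tika server and asks for plain text back. It relies on Tika Server's `PUT /tika` endpoint with an `Accept: text/plain` header. The helper names `buildTikaRequest` and `extractText` are hypothetical and are not taken from this repository; the actual ingest script may structure this differently.

```typescript
// Hypothetical helper types and names; not part of this repository.
interface TikaRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: ArrayBuffer };
}

// Build the HTTP request for Tika Server's PUT /tika endpoint.
// `Accept: text/plain` asks Tika to return extracted text instead of XHTML.
function buildTikaRequest(
  baseUrl: string,
  bytes: ArrayBuffer,
  contentType?: string,
): TikaRequest {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const url = `${baseUrl.replace(/\/+$/, "")}/tika`;
  const headers: Record<string, string> = { Accept: "text/plain" };
  // A Content-Type hint is optional; Tika detects the type when it is absent.
  if (contentType) headers["Content-Type"] = contentType;
  return { url, init: { method: "PUT", headers, body: bytes } };
}

// Send the request with the global fetch available in Node 18+.
async function extractText(
  baseUrl: string,
  bytes: ArrayBuffer,
  contentType?: string,
): Promise<string> {
  const { url, init } = buildTikaRequest(baseUrl, bytes, contentType);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Tika responded with HTTP ${res.status}`);
  }
  return (await res.text()).trim();
}
```

With the compose file above running, a call such as `await extractText("http://localhost:9998", fileBytes, "application/pdf")` would be expected to return the document's text, which could then be stored in a content field for BM25 search.
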
## Install & Run

Install dependencies:

```bash
npm install
```

Run development server:

```bash
npm run dev
# Next.js will print the URL, for example http://localhost:3000
```

Build and start:

```bash
npm run build
npm start
```

## Initialize Elasticsearch

Create the index and alias (files/files_current):

```bash
npm run create:index
# internally: tsx -r dotenv/config -r tsconfig-paths/register scripts/create-index.ts
```

To recreate:

```bash
npm run create:index -- --recreate
```

## Ingest Nextcloud → Elasticsearch

Crawl Nextcloud via WebDAV, optionally extract content with Tika, index into ES:

```bash
npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts
```

Restrict to a subtree:

```bash
npx tsx -r dotenv/config -r tsconfig-paths/register scripts/ingest-nextcloud.ts -- --root=/remote.php/dav/files/admin/SomeFolder
```

After ingestion, try searching in the UI (BM25).

## Optional: Semantic Hybrid Search

To enable, configure embeddings:
- OPENAI_API_BASE (e.g., https://api.openai.com/v1 or your compatible endpoint)
- OPENAI_API_KEY
- OPENAI_EMBEDDING_MODEL (e.g., text-embedding-3-large)
- EMBEDDING_DIM (e.g., 3072, the default output size of text-embedding-3-large; it must match the length of the vectors your model returns)

Current code path supports:
- Hybrid search in API when `semantic: true` in requests; UI has a “Semantic” toggle.
- A future backfill script can be added to compute vectors for existing docs (planned).

## Sentry

- Files:
  - instrumentation-client.ts (client init)
  - sentry.server.config.ts (node runtime)
  - sentry.edge.config.ts (edge runtime)
- Enable by setting SENTRY_DSN in `.env.local`.
- Logging enabled (`consoleLoggingIntegration`), capturing console.log/warn/error.
- Spans instrument WebDAV and ES operations with meaningful op/name and attributes.

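
To make the span pattern from the Sentry notes concrete, here is a minimal sketch of wrapping a WebDAV call in a span that carries an `op`, a `name`, and attributes. In the app the span starter would presumably be `Sentry.startSpan` from `@sentry/nextjs`; a recording stand-in is injected here so the example runs without the SDK. The helper names (`davSpanOptions`, `traced`), the `http.client` op value, and the attribute keys are illustrative assumptions, not the repository's actual conventions.

```typescript
// Options passed to the span starter. The op/name/attributes fields follow
// the shape described in the Sentry notes; exact values are assumptions.
interface SpanOptions {
  op: string;
  name: string;
  attributes?: Record<string, string | number | boolean>;
}

// Anything that runs a callback inside a span. A real SDK function such as
// Sentry.startSpan is assumed to fit this shape; verify against your version.
type StartSpan = <T>(options: SpanOptions, callback: () => T) => T;

// Hypothetical helper: derive span options for a WebDAV operation.
function davSpanOptions(method: string, path: string): SpanOptions {
  return {
    op: "http.client",
    name: `WebDAV ${method} ${path}`,
    attributes: { "dav.method": method, "dav.path": path },
  };
}

// Hypothetical helper: run `work` inside a span started by `startSpan`.
function traced<T>(startSpan: StartSpan, options: SpanOptions, work: () => T): T {
  return startSpan(options, work);
}

// Stand-in span starter that records the options and runs the callback.
const recorded: SpanOptions[] = [];
const recordingStartSpan: StartSpan = (options, callback) => {
  recorded.push(options);
  return callback();
};

// Example: trace a (fake) directory listing.
const listing = traced(
  recordingStartSpan,
  davSpanOptions("PROPFIND", "/Documents"),
  () => ["a.txt", "b.pdf"],
);
console.log(recorded[0].name, listing.length);
```

Injecting the span starter keeps the WebDAV and Elasticsearch wrappers testable without a DSN, at the cost of one extra parameter.
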
## API Endpoints

- GET `/api/folders/list?path=/abs/path` → `{ folders[], files[] }`
- POST `/api/folders/create` body: `{ path: "/abs/path" }` → `{ ok: true }`
- GET `/api/files/list?path=/abs/path&page=1&perPage=50` → `{ total, page, perPage, items[] }`
- POST `/api/files/upload` (multipart) form-data: `file`, `destPath`
- GET `/api/files/download?path=/abs/path` → stream download
- POST `/api/search/query` → body: `{ q, filters?, sort?, page?, perPage?, semantic? }`

## UI Usage

- Navigate folders via the left sidebar. Root defaults to `NEXTCLOUD_ROOT_PATH`.
- Breadcrumbs show the current path; click to navigate.
- Global search queries ES (BM25). Toggle “Semantic” to blend vector similarity (when enabled).
- Use “Upload” to send a file to the current folder.
- Click a file name or download action to retrieve it.

## Known Caveats / TODO

- When starting from `NEXTCLOUD_ROOT_PATH`, breadcrumb segments may include technical path prefixes (e.g., `/remote.php`, `/dav`) that aren’t browsable independently. This can be adjusted to start breadcrumbs from the user root only.
- Embeddings backfill script & “Find similar” API/UI are planned.
- Tests (unit/integration/E2E) and TrueNAS-specific compose notes can be added next.

## Development Notes

- Code style: TypeScript, ESLint (Next config), Tailwind v4.
- shadcn components added (Slate), most primitives included.
- ES Types via `@elastic/elasticsearch` estypes.
- Stream interop handled for Node → Web streams in downloads and uploads.
- Zod-based env loader normalizes URLs and validates required vars.

## Troubleshooting

- Search 500s:
  - Ensure Elasticsearch is running and reachable at `ELASTICSEARCH_URL`.
  - Run `npm run create:index` to create the index and alias.
  - If using Tika for content, ensure `TIKA_BASE_URL` is set and service is healthy (optional).
- WebDAV failures:
  - Verify `NEXTCLOUD_BASE_URL`, `NEXTCLOUD_USERNAME`, `NEXTCLOUD_APP_PASSWORD`, and `NEXTCLOUD_ROOT_PATH`.
  - Confirm the user has permission for the target path.
- Sentry issues:
  - Ensure `SENTRY_DSN` is set; check networking/outbound restrictions.

## Scripts

- `npm run create:index`: Create/recreate Elasticsearch index and alias
- `scripts/ingest-nextcloud.ts`: Crawl Nextcloud → Tika (optional) → Elasticsearch

## Security

- Keep `.env.local` out of source control (already in .gitignore).
- Use Nextcloud App Passwords, not login passwords.
- Harden Elasticsearch/Kibana/Tika for production (auth, TLS, resource limits).

---

Built per implementation_plan.md with a deterministic order of rollout and Sentry instrumentation aligned to organizational rules.