TrueNAS Maintenance Log
Date: 2025-10-01
TL;DR
- Fixed Redis not starting due to bad container args. Set persistence and memory policy via env and verified.
- Stopped Postgres from ignoring tuned configs by removing the CLI override and explicitly setting sane values.
- Tuned ZFS dataset and host kernel settings for DB workloads.
- Verified results inside running pods.
1) Baseline snapshot script
Collected a fast system snapshot for Nextcloud troubleshooting.
sudo bash /tmp/nc_sysdump.sh
Why: one-shot view of OS, CPU, memory, ZFS, ARC, datasets, k3s pods, open ports, THP, swappiness, timers, and quick Redis/Postgres presence checks. [1]
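The script itself is not reproduced here; a minimal sketch of the kind of read-only dump it performs (contents assumed, not the actual script):
#!/bin/sh
# Hypothetical sketch in the spirit of /tmp/nc_sysdump.sh
uname -a; nproc; free -h                                          # OS, CPU, memory
zpool list; arc_summary 2>/dev/null | head -n 20                  # pools and ARC
zfs list -o name,used,avail,recordsize | grep -Ei 'nextcloud|pg'  # relevant datasets
k3s kubectl get pods -A 2>/dev/null | grep -i nextcloud           # app platform state
ss -tlnp                                                          # open ports
cat /sys/kernel/mm/transparent_hugepage/enabled                   # THP
sysctl vm.swappiness                                              # swappiness
systemctl list-timers --no-pager | head -n 10                     # timers
command -v redis-cli; command -v psql                             # quick presence checks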
2) ZFS and host tuning for Postgres
Applied ZFS dataset properties and kernel flags appropriate for OLTP.
PGDATA="Pool2/ix-applications/releases/nextcloud/volumes/ix_volumes/pgData"
sudo zfs set recordsize=8K atime=off compression=lz4 logbias=latency primarycache=all "$PGDATA"
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled >/dev/null
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag >/dev/null
# Persist THP disable and low swappiness
sudo tee /etc/systemd/system/disable-thp.service >/dev/null <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now disable-thp.service
sudo sysctl vm.swappiness=1
echo 'vm.swappiness=1' | sudo tee /etc/sysctl.d/99-redis-db.conf >/dev/null
sudo sysctl --system
Why: 8K recordsize matches the PG page size and reduces read-modify-write churn; logbias=latency reduces ZIL latency; THP off avoids latency spikes for PG; low swappiness keeps hot pages in RAM. [2][3][4]
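A quick read-back to confirm the dataset and kernel settings stuck (same $PGDATA as above):
zfs get -o property,value recordsize,atime,compression,logbias,primarycache "$PGDATA"
cat /sys/kernel/mm/transparent_hugepage/enabled   # expect: always madvise [never]
sysctl vm.swappiness                              # expect: vm.swappiness = 1
systemctl status disable-thp.service --no-pager   # unit enabled, exited OK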
3) Redis: persistence and memory policy
Initial failure was caused by passing raw --style flags in the container args; the Bitnami entrypoint treated them as shell options and crashed. Fixed by removing the args and using env-based config.
Bad args removed
NS=ix-nextcloud
DEP=nextcloud-redis
k3s kubectl -n $NS patch deploy $DEP --type=json -p='[
{"op":"remove","path":"/spec/template/spec/containers/0/args"}
]'
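If the patch fails because the path does not exist, the deployment simply had no args set; this check confirms either way (empty output means none present):
k3s kubectl -n $NS get deploy $DEP -o jsonpath='{.spec.template.spec.containers[0].args}'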
Good settings applied via env
k3s kubectl -n $NS set env deploy/$DEP \
REDIS_APPENDONLY=yes \
REDIS_APPENDFSYNC=everysec \
REDIS_MAXMEMORY=8gb \
REDIS_MAXMEMORY_POLICY=allkeys-lru
k3s kubectl -n $NS rollout restart deploy/$DEP
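Optionally block until the rollout settles before checking anything:
k3s kubectl -n $NS rollout status deploy/$DEP --timeout=120s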
Verification
NS=ix-nextcloud
POD=$(k3s kubectl -n $NS get pods | awk '/nextcloud-redis/{print $1; exit}')
REDIS_PASS=$(k3s kubectl -n $NS get secret nextcloud-redis-creds -o jsonpath='{.data.REDIS_PASSWORD}' | base64 -d)
k3s kubectl -n $NS exec -it "$POD" -- sh -lc "/opt/bitnami/redis/bin/redis-cli -a \"$REDIS_PASS\" INFO | egrep 'aof_enabled|maxmemory|maxmemory_policy'"
# Output:
# maxmemory:8589934592
# maxmemory_human:8.00G
# maxmemory_policy:allkeys-lru
# aof_enabled:1
Why: Bitnami Redis prefers env variables to configure persistence and memory policy. This avoids shell parsing issues and persists across restarts. [5][6][7]
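For reference, those env vars should surface in redis.conf roughly as the directives below (standard Redis syntax; the exact mapping is the chart's responsibility, so treat this as illustrative):
appendonly yes
appendfsync everysec
maxmemory 8gb
maxmemory-policy allkeys-lru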
4) Postgres: stop the CLI override, then tune
Symptom: shared_buffers kept showing 1 GB and pg_settings.source = 'command line'. Root cause was a -c shared_buffers=1024MB passed via deployment. That always wins over postgresql.conf, conf.d, and ALTER SYSTEM.
Remove or replace CLI args
NS=ix-nextcloud
DEP=nextcloud-postgres
# Remove args if present
k3s kubectl -n $NS patch deploy $DEP --type=json -p='[
{"op":"remove","path":"/spec/template/spec/containers/0/args"}
]' || true
# Replace with tuned args explicitly
k3s kubectl -n $NS patch deploy $DEP --type=json -p='[
{"op":"add","path":"/spec/template/spec/containers/0/args","value":
["-c","shared_buffers=16GB",
"-c","max_connections=200",
"-c","wal_compression=on",
"-c","max_wal_size=8GB",
"-c","random_page_cost=1.25"]}]'
k3s kubectl -n $NS rollout restart deploy/$DEP
Resource limit raised in App UI
- Memory limit increased to 24 GiB to allow 16 GiB buffers without OOM.
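To double-check the new limit from the CLI as well as the App UI (jsonpath as a convenience):
k3s kubectl -n $NS get deploy $DEP \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits}'
# expect something like {"memory":"24Gi"}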
Verification inside pod
SEC=nextcloud-postgres-creds
DBUSER=$(k3s kubectl -n $NS get secret $SEC -o jsonpath='{.data.POSTGRES_USER}' | base64 -d)
DBPASS=$(k3s kubectl -n $NS get secret $SEC -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d)
DBNAME=$(k3s kubectl -n $NS get secret $SEC -o jsonpath='{.data.POSTGRES_DB}' | base64 -d)
POD=$(k3s kubectl -n $NS get pods -o name | sed -n 's|pod/||p' | grep -E '^nextcloud-postgres' | head -1)
k3s kubectl -n $NS exec -it "$POD" -- bash -lc \
"PGPASSWORD='$DBPASS' psql -h 127.0.0.1 -U '$DBUSER' -d '$DBNAME' -Atc \
\"select name,setting,unit,source from pg_settings
where name in ('shared_buffers','effective_cache_size','wal_compression','max_wal_size','random_page_cost')
order by name;\""
Expected results after change:
- shared_buffers: 16GB, with source = command line
- effective_cache_size: 40GB, from conf.d
- wal_compression=on, max_wal_size=8GB, random_page_cost=1.25
Cgroup limit check
k3s kubectl -n $NS exec "$POD" -- sh -lc 'cat /sys/fs/cgroup/memory.max || cat /sys/fs/cgroup/memory/memory.limit_in_bytes'
# 25769803776
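Sanity arithmetic on that figure:
echo $((25769803776 / 1024 / 1024 / 1024))   # 24 -> the 24 GiB limit, in bytes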
Huge pages status
k3s kubectl -n $NS exec -it "$POD" -- bash -lc \
"psql -Atc \"show huge_pages;\" -U '$DBUSER' -h 127.0.0.1 -d '$DBNAME'"
# off
Why: precedence is CLI args over config files. Removing or replacing the CLI flag is the only way to make buffers larger than 1 GB take effect in this chart. The resource limit must also allow it. [8][9]
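A small illustration of that precedence, runnable in psql (illustrative only; shared_buffers changes require a restart regardless):
-- ALTER SYSTEM writes postgresql.auto.conf, but a command-line -c flag still wins
ALTER SYSTEM SET shared_buffers = '8GB';
-- even after a restart, the source stays 'command line' while the flag is present
SELECT name, setting, source FROM pg_settings WHERE name = 'shared_buffers';
-- shared_buffers | 2097152 | command line   (2097152 x 8kB pages = the 16GB from the -c flag)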
5) Small cleanups and guardrails
- Created a helper to reapply Redis tuning quickly:
cat >/root/reapply-redis-tuning.sh <<'EOF'
#!/bin/sh
NS=ix-nextcloud
DEP=nextcloud-redis
k3s kubectl -n $NS set env deploy/$DEP \
  REDIS_APPENDONLY=yes \
  REDIS_APPENDFSYNC=everysec \
  REDIS_MAXMEMORY=8gb \
  REDIS_MAXMEMORY_POLICY=allkeys-lru
k3s kubectl -n $NS rollout restart deploy/$DEP
EOF
chmod +x /root/reapply-redis-tuning.sh
- Verified Nextcloud’s Redis password comes from the correct secret key, REDIS_PASSWORD, after earlier key-name misses.
Why: quick reapply for tunables, fewer fat-fingered loops.
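To avoid key-name misses in the future, list a secret's keys before decoding anything (jq assumed to be available on the host):
k3s kubectl -n ix-nextcloud get secret nextcloud-redis-creds -o json | jq -r '.data | keys[]'
# REDIS_PASSWORD should appear among the keys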
Validation snapshots
Redis quick state
connected_clients:11
used_memory_human:1.46M
maxmemory_human:8.00G
maxmemory_policy:allkeys-lru
aof_enabled:1
aof_last_write_status:ok
instantaneous_ops_per_sec:95
evicted_keys:0
role:master
Postgres quick state
- shared_buffers now controlled via CLI and aligned with the resource limit
- effective_cache_size=40GB from conf.d
- wal_compression=on, max_wal_size=8GB, random_page_cost=1.25 confirmed
Known gotchas encountered
- Exec’d into wrong pods/containers repeatedly. Use namespace and label selectors, and pass -c only when the pod actually has multiple containers. [10]
- Bitnami Redis ignores raw -- flags when they are passed incorrectly in args. Use the env variables the chart supports.
- Postgres role confusion: the default superuser is not always postgres in this chart. Use the credentials from nextcloud-postgres-creds. [11]
Next actions
- Optional: set effective_io_concurrency=256 and maintenance_work_mem=2GB via conf.d, only if not already present on the CLI, then restart (a sketch follows this list).
- Consider shared_buffers at 25% of cgroup memory for mixed workloads. 16 GB against a 24 GiB limit is fine if the pod has headroom. [12]
- Keep work_mem moderate to avoid per-query memory blowups; the current 128MB is aggressive if concurrency spikes.
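A minimal conf.d sketch for the optional settings (file name assumed; place it wherever the chart mounts conf.d, then restart):
# conf.d/99-extra-tuning.conf (name assumed)
effective_io_concurrency = 256
maintenance_work_mem = 2GB
Afterwards, confirm in pg_settings that their source is the conf.d file rather than command line.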
Appendix: Handy one-liners
Show who is forcing PG settings
select name,setting,source,sourcefile
from pg_settings
where name in ('shared_buffers','effective_cache_size','wal_compression','max_wal_size','random_page_cost')
order by name;
Show current pod memory limit
cat /sys/fs/cgroup/memory.max || cat /sys/fs/cgroup/memory/memory.limit_in_bytes
Redis sanity
REDISCLI_AUTH="$REDIS_PASS" redis-cli INFO | egrep -i 'used_memory_human|maxmemory_human|maxmemory_policy|aof_enabled|evicted_keys'
Footnotes
[1] The snapshot script prints OS, CPU, memory, ZFS pools and ARC, datasets matching Nextcloud and the DB, app platform state, network listeners, THP, swappiness, timers, and versions. A good first move before any tuning.
[2] ZFS recordsize=8K matches the Postgres 8 KB page size; atime=off avoids metadata writes; compression=lz4 is typically net positive for WAL and heap; logbias=latency optimizes synchronous intent logging. These are standard PG-on-ZFS choices.
[3] Transparent Huge Pages can cause latency spikes during memory allocation and compaction. Postgres recommends never. Persisted via the systemd unit above and verified with huge_pages=off in PG.
[4] vm.swappiness=1 favors keeping hot working sets in memory. DB nodes typically set this low to avoid writeback storms.
[5] The TrueNAS Bitnami chart maps well-known env vars like REDIS_APPENDONLY and REDIS_MAXMEMORY_POLICY into redis.conf, avoiding brittle args parsing.
[6] appendonly yes with appendfsync everysec gives durability with good throughput. It is the sane default for Nextcloud caching plus locking patterns.
[7] allkeys-lru prevents unbounded memory growth and prioritizes hot keys. With maxmemory 8gb, eviction is predictable.
[8] Postgres configuration precedence, highest first: command-line -c flags, then ALTER SYSTEM (postgresql.auto.conf), then postgresql.conf and its includes. If the container passes -c shared_buffers=1024MB, it overrides everything else.
[9] With a 24 GiB cgroup limit, shared_buffers=16GB is aggressive but acceptable if app memory and FS cache stay healthy. Monitor for OOMKilled events and PG memory stats.
[10] When kubectl says "container not found," the pod likely has a single container with a different name than you assumed. Use kubectl -n NS get pod POD -o jsonpath='{.spec.containers[*].name}' to confirm.
[11] The Bitnami PG image often creates the app user as the primary DB user. The secret holds the authoritative POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB to use.
[12] Rule of thumb: shared_buffers at 20–25 percent of RAM for mixed workloads, higher only if the rest of the stack is memory-light and you monitor for OOM. effective_cache_size can be 2–3x buffers.