docs: rewrite local-dev-postgres-setup for container approach; bump pg16 -> pg18

Land the Postgres dev cluster recipe Jan provisioned on delandtj-home
(rootless podman + pgvector/pgvector:pg18, PG 18.4, pgvector 0.8.2) and
align all live ADR 0002 / Phase 2 sub-plan references from pg16 to pg18.

- docs/plans/local-dev-postgres-setup.md -- rewritten end-to-end:
  podman container vestige-pg with --restart=always, named volume
  vestige-pgdata, PGDATA=/var/lib/postgresql/data/pgdata, port mapping
  127.0.0.1:5432:5432, two-password split (superuser + app role),
  pgvector preinstalled, CREATE EXTENSION vector handled at setup,
  day-to-day commands, password rotation, dev-grade backup/restore,
  teardown, boot-persistence notes for rootless podman. Old native
  Arch install recipe moved to Out-of-scope (covered by image now).

- docs/adr/0002-phase-2-execution.md -- the open-thread mention of
  pgvector/pgvector:pg16 in the Follow-ups section now reads pg18.

- docs/plans/0002c-migrations.md -- container example in the local
  dev section updated to pg18.

- docs/plans/0002d-store-impl-bodies.md -- testcontainers GenericImage
  tag pg16 -> pg18; prose reference updated.

- docs/plans/0002h-testing-and-benches.md -- test harness now uses
  pg18 throughout: the testcontainers Postgres builder, the
  image-caching prose, and the CI workflow example.

The archival master plan (docs/plans/0002-phase-2-postgres-backend.md)
keeps its original pg16 references intentionally; the supersession
notice already points readers to the live sub-plans.
Jan De Landtsheer 2026-05-27 15:08:35 +02:00
parent fc0681ed0f
commit 21f0b29bae
GPG key ID: 95CD37F0C226040B
5 changed files with 198 additions and 72 deletions

@ -504,7 +504,7 @@ own migrations.
- Validate local Postgres dev cluster before PR C work begins. Recipe at
`docs/plans/local-dev-postgres-setup.md` is correct but needs to be applied
on this machine (delandtj-home): cluster is not initdb'd, pgvector is not
installed. Containerized `pgvector/pgvector:pg16` is a viable alternative
installed. Containerized `pgvector/pgvector:pg18` is a viable alternative
if pgvector packaging is friction. See open discussion thread.
### Phase 4 sketch: `sharing_rules` and the precedence chain

@ -843,7 +843,7 @@ podman run --rm -d --name vestige-pg \
-e POSTGRES_USER=vestige \
-e POSTGRES_DB=vestige \
-p 5432:5432 \
docker.io/pgvector/pgvector:pg16
docker.io/pgvector/pgvector:pg18
export DATABASE_URL="postgresql://vestige:devpw@127.0.0.1:5432/vestige"
```

@ -1612,7 +1612,7 @@ use vestige_core::storage::postgres::PgMemoryStore;
#[tokio::test]
async fn round_trip_crud_search_scheduling_edges() {
let docker = clients::Cli::default();
let image = GenericImage::new("pgvector/pgvector", "pg16")
let image = GenericImage::new("pgvector/pgvector", "pg18")
.with_env_var("POSTGRES_PASSWORD", "test")
.with_env_var("POSTGRES_DB", "vestige_test")
.with_exposed_port(5432);
@ -1759,7 +1759,7 @@ This sub-plan is complete when ALL of the following hold:
and the `Visibility` enum is exported alongside it. The SQLite
backend reads and writes the same four fields.
8. The `tests/postgres_round_trip.rs` integration test passes against
a `pgvector/pgvector:pg16` container (insert / get / update / delete
a `pgvector/pgvector:pg18` container (insert / get / update / delete
/ fts_search / vector_search / get_scheduling / update_scheduling
/ add_edge / get_edges / remove_edge / get_neighbors / cascade
delete).

@ -166,12 +166,12 @@ use vestige_core::storage::postgres::PgMemoryStore;
pub async fn fresh_pg_store(
embedder: Arc<dyn Embedder>,
) -> Result<(PgMemoryStore, ContainerAsync<Postgres>)> {
// pgvector/pgvector:pg16 is the official pgvector image built on the
// postgres:16 base. testcontainers-modules::postgres::Postgres targets
// pgvector/pgvector:pg18 is the official pgvector image built on the
// postgres:18 base. testcontainers-modules::postgres::Postgres targets
// the upstream postgres image by default; we override name + tag.
let container = Postgres::default()
.with_name("pgvector/pgvector")
.with_tag("pg16")
.with_tag("pg18")
.start()
.await?;
@ -867,7 +867,7 @@ Requirements:
the `docker_available()` check in `common/mod.rs`. The test output
includes a `docker unavailable; skip` line per test so the developer
knows the tests were not silently dropped.
- The pgvector image (`pgvector/pgvector:pg16`) is pulled on first run;
- The pgvector image (`pgvector/pgvector:pg18`) is pulled on first run;
~200 MB. A pre-pulled image keeps the per-run overhead at the cold-start
container boot (~2-5 seconds).
@ -920,7 +920,7 @@ async fn build_bench(rows: usize) -> Bench {
let embedder = TestEmbedder::new_768();
let container = Postgres::default()
.with_name("pgvector/pgvector")
.with_tag("pg16")
.with_tag("pg18")
.start()
.await
.unwrap();
@ -1092,7 +1092,7 @@ Notes:
- The Postgres feature tests should run in a separate CI matrix entry to
isolate failures and skip them entirely on platforms (Windows runners
if any) where the pgvector image is not available.
- Cache the `pgvector/pgvector:pg16` image between runs. The
- Cache the `pgvector/pgvector:pg18` image between runs. The
`docker/setup-buildx-action` cache or a simple `docker pull` step before
the test step keeps cold-start under the existing CI time budget.
- Skip CI: contributors without Docker can still merge changes that do
@ -1113,7 +1113,7 @@ jobs:
# no `postgres` service block needed; testcontainers manages its own
steps:
- uses: actions/checkout@v4
- run: docker pull pgvector/pgvector:pg16
- run: docker pull pgvector/pgvector:pg18
- uses: dtolnay/rust-toolchain@stable
- run: cargo test -p vestige-core --features postgres-backend --test '*'
```

@ -1,27 +1,55 @@
# Local Dev Postgres Setup (Arch / CachyOS)
# Local Dev Postgres Setup (container, hybrid approach)
**Status**: Applied on this machine on 2026-04-21
**Related**: docs/plans/0002-phase-2-postgres-backend.md, docs/adr/0001-pluggable-storage-and-network-access.md
**Status**: Applied on this machine on 2026-05-27 (rootless podman, Postgres 18.4 + pgvector 0.8.2).
**Related**: docs/plans/0002-phase-2-postgres-backend.md, docs/adr/0002-phase-2-execution.md, docs/adr/0001-pluggable-storage-and-network-access.md
Purpose: capture the minimum, repeatable steps to stand up a Postgres 18 instance on a local Arch/CachyOS box for Phase 2 (`PgMemoryStore`) development, `sqlx prepare`, and manual migration testing. This is a single-operator dev recipe, not a production runbook.
Purpose: capture the minimum, repeatable steps to stand up a long-lived
Postgres 18 + pgvector instance on a local Linux dev box for Phase 2
(`PgMemoryStore`) development, `sqlx prepare`, and manual migration
testing. This is a single-operator dev recipe, not a production runbook.
ADR 0002 picked the **hybrid container** approach over a native install:
the `pgvector/pgvector:pg18` image ships pgvector pre-installed, matches
the image testcontainers will use in the Phase 2 test harness, and avoids
the AUR/build-from-source friction of native pgvector packaging on Arch.
---
## Current state on this machine
- Package: `postgresql` 18.3-2 (pacman). Pulls `postgresql-libs`, `libxslt`.
- Service: `postgresql.service`, enabled + active.
- Listens on: `127.0.0.1:5432` and `[::1]:5432` only (default `listen_addresses = 'localhost'`).
- Data dir: `/var/lib/postgres/data`, owner `postgres:postgres`.
- Auth (`pg_hba.conf`, Arch defaults): `peer` for local socket, `scram-sha-256` for host 127.0.0.1/::1.
- Runtime: rootless `podman` 5.8.2 (Arch). `docker` 29.5.1 also installed but unused.
- Image: `docker.io/pgvector/pgvector:pg18` (PostgreSQL 18.4, pgvector 0.8.2).
- Container: `vestige-pg`, `--restart=always`, port `127.0.0.1:5432:5432`.
- Volume: named podman volume `vestige-pgdata`, mounted at
`/var/lib/postgresql/data` inside the container; `PGDATA` points at the
`/var/lib/postgresql/data/pgdata` subdirectory so initdb always targets
an empty directory, even if the mount point itself is not empty
(Postgres refuses to initdb into a non-empty directory).
- Listens on: `127.0.0.1:5432` only (port mapping is bound to loopback).
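- Auth: `scram-sha-256` for host (TCP) connections; the image leaves
local connections from inside the container on `trust`, which is why
the `podman exec ... psql -U postgres` commands in this doc need no
password.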
### Database + role
- Database: `vestige`, UTF8, owner `vestige`.
- Role: `vestige` with `LOGIN CREATEDB` (no superuser, no replication, no cross-db).
- Schema `public` re-owned to `vestige`, plus default privileges so any future tables / sequences / functions in `public` are fully owned and granted to `vestige`.
- Database: `vestige`, UTF8, owner `vestige`, `LC_COLLATE=C.UTF-8`, `LC_CTYPE=C.UTF-8`.
- Role: `vestige` with `LOGIN CREATEDB` (no superuser, no replication).
- Schema `public` re-owned to `vestige` with full default privileges on
future tables / sequences / functions.
- Extension: `vector` (pgvector 0.8.2) installed in the `vestige`
database by the superuser at setup time.
Net effect: the `vestige` role can create, alter, drop, and grant freely inside the `vestige` database -- enough for `sqlx::migrate!`, ad-hoc schema work, and the full Phase 2 `MemoryStore` surface. It cannot create extensions (see Phase 2 followups below) and cannot touch other databases.
Net effect: the `vestige` role can create, alter, drop, and grant freely
inside the `vestige` database -- enough for `sqlx::migrate!`, ad-hoc
schema work, and the full Phase 2 `MemoryStore` surface. It cannot create
extensions; the superuser handled `CREATE EXTENSION vector` already.
### Passwords
Two passwords live in the dev user's home, mode 600:
- `~/.vestige_pg_superpw` -- the `postgres` superuser password inside the
container. Used for one-shot admin tasks (creating roles, installing
extensions, password rotation). Day-to-day app traffic does NOT use it.
- `~/.vestige_pg_pw` -- the `vestige` role password. This is the one the
Phase 2 backend, `sqlx prepare`, and ad-hoc `psql` invocations use.
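Both files are expected to stay at mode 600. A minimal sketch of a check for that, assuming GNU coreutils `stat` (the `-c` flag); the `check_secret_file` name is hypothetical and not part of the recipe:

```sh
# Hypothetical helper: succeed only if the file exists and its mode is
# exactly 600. Assumes GNU coreutils `stat -c`, which is standard on Linux.
check_secret_file() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$file")
    if [ "$mode" != "600" ]; then
        echo "bad mode $mode on $file (expected 600)" >&2
        return 1
    fi
}
```

Possible usage: `check_secret_file ~/.vestige_pg_pw && check_secret_file ~/.vestige_pg_superpw`.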
### Connection
@ -29,13 +57,8 @@ Net effect: the `vestige` role can create, alter, drop, and grant freely inside
postgresql://vestige:<password>@127.0.0.1:5432/vestige
```
Password lives at `~/.vestige_pg_pw`, mode 600, owned by the dev user (no sudo needed to read it). Read with:
```sh
cat ~/.vestige_pg_pw
```
Recommended dev shell export (keep this OUT of the repo; use `.env` + gitignore or a shell rc):
Recommended dev shell export (keep this OUT of the repo; use `.env` +
gitignore or a shell rc):
```sh
export DATABASE_URL="postgresql://vestige:$(cat ~/.vestige_pg_pw)@127.0.0.1:5432/vestige"
@ -45,109 +68,212 @@ export DATABASE_URL="postgresql://vestige:$(cat ~/.vestige_pg_pw)@127.0.0.1:5432
## Reproduce from scratch
On a fresh Arch / CachyOS box with passwordless sudo:
On a fresh Linux box with `podman` installed and `python3` available:
```sh
# 1. Install
sudo pacman -S --noconfirm postgresql
# 1. Pull the image
podman pull docker.io/pgvector/pgvector:pg18
# 2. Initialize the cluster (UTF8, scram-sha-256 for host, peer for local)
sudo -iu postgres initdb \
--locale=C.UTF-8 --encoding=UTF8 \
-D /var/lib/postgres/data \
--auth-host=scram-sha-256 --auth-local=peer
# 2. Create a persistent named volume
podman volume create vestige-pgdata
# 3. Start + enable
sudo systemctl enable --now postgresql
# 3. Generate the superuser password and stash it (mode 600)
SUPER_PW=$(python3 -c 'import secrets,string; a=string.ascii_letters+string.digits; print("".join(secrets.choice(a) for _ in range(32)))')
umask 077
printf '%s' "$SUPER_PW" > ~/.vestige_pg_superpw
chmod 600 ~/.vestige_pg_superpw
# 4. Generate a password and stash it in the dev user's home (mode 600)
# 4. Start the container
podman run -d \
--name vestige-pg \
--restart=always \
-p 127.0.0.1:5432:5432 \
-e POSTGRES_PASSWORD="$SUPER_PW" \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-v vestige-pgdata:/var/lib/postgresql/data \
docker.io/pgvector/pgvector:pg18
unset SUPER_PW
# 5. Wait for ready
until podman exec vestige-pg pg_isready -U postgres -h 127.0.0.1 >/dev/null 2>&1; do
sleep 1
done
# 6. Generate the vestige role password and stash it (mode 600)
VESTIGE_PW=$(python3 -c 'import secrets,string; a=string.ascii_letters+string.digits; print("".join(secrets.choice(a) for _ in range(32)))')
umask 077
printf '%s' "$VESTIGE_PW" > ~/.vestige_pg_pw
chmod 600 ~/.vestige_pg_pw
# 5. Create role + database + grants
sudo -u postgres psql -v ON_ERROR_STOP=1 <<SQL
# 7. Create role + database + grants + extension (runs as superuser inside the container)
podman exec -i vestige-pg psql -U postgres -v ON_ERROR_STOP=1 <<SQL
CREATE ROLE vestige WITH LOGIN CREATEDB PASSWORD '${VESTIGE_PW}';
CREATE DATABASE vestige OWNER vestige ENCODING 'UTF8';
CREATE DATABASE vestige OWNER vestige ENCODING 'UTF8'
TEMPLATE template0 LC_COLLATE 'C.UTF-8' LC_CTYPE 'C.UTF-8';
GRANT ALL PRIVILEGES ON DATABASE vestige TO vestige;
SQL
sudo -u postgres psql -d vestige -v ON_ERROR_STOP=1 <<'SQL'
podman exec -i vestige-pg psql -U postgres -d vestige -v ON_ERROR_STOP=1 <<'SQL'
GRANT ALL ON SCHEMA public TO vestige;
ALTER SCHEMA public OWNER TO vestige;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO vestige;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO vestige;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON FUNCTIONS TO vestige;
CREATE EXTENSION IF NOT EXISTS vector;
SQL
# 6. Smoke test
PGPASSWORD="$VESTIGE_PW" psql -h 127.0.0.1 -U vestige -d vestige \
-c 'SELECT current_user, current_database(), version();'
unset VESTIGE_PW
# 8. Smoke test as the vestige role
PGPASSWORD="$(cat ~/.vestige_pg_pw)" psql -h 127.0.0.1 -U vestige -d vestige \
-c "SELECT current_user, current_database(), version();" \
-c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';" \
-c "SELECT '[1,2,3]'::vector <-> '[3,2,1]'::vector AS l2_distance;"
```
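The readiness loop in step 5 waits forever if the container never comes up. A bounded variant could be sketched as follows; `wait_for` is a hypothetical helper name, and the one-second polling interval simply mirrors the original loop:

```sh
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND once per second until it succeeds,
# giving up with exit status 1 after roughly TIMEOUT_SECONDS.
wait_for() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "timed out after ${timeout_s}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

Possible usage in place of the bare `until` loop: `wait_for 60 podman exec vestige-pg pg_isready -U postgres -h 127.0.0.1`. The 60-second budget is an arbitrary choice, not a measured value.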
---
## Phase 2 followups (before PgMemoryStore works)
## Boot persistence (rootless podman)
The cluster above is bare Postgres. Phase 2 needs `pgvector`:
`--restart=always` restarts the container whenever it exits, but podman
has no daemon to bring containers back after a reboot: rootless podman
containers do NOT auto-start on system boot unless the dev user has
lingering enabled:
```sh
# Install the extension package
sudo pacman -S --noconfirm pgvector
# Enable it in the vestige database (must run as postgres; vestige is not superuser)
sudo -u postgres psql -d vestige -c 'CREATE EXTENSION IF NOT EXISTS vector;'
sudo loginctl enable-linger "$USER"
```
Verify:
After that, the `podman-restart.service` user unit handles restart of
`--restart=always` containers when the user session starts at boot:
```sh
PGPASSWORD="$(cat ~/.vestige_pg_pw)" psql -h 127.0.0.1 -U vestige -d vestige \
-c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
systemctl --user enable --now podman-restart.service
```
Notes:
Skip both if you prefer to start the cluster manually each session with
`podman start vestige-pg`.
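To confirm whether lingering is actually on, `loginctl show-user` can be queried for the `Linger` property. A small sketch, with `linger_enabled` as a hypothetical helper name:

```sh
# Hypothetical helper: succeed if lingering is enabled for the given user
# (defaults to $USER). If loginctl fails, for example because the user has
# no session and no lingering, the output is empty and this reports "off".
linger_enabled() {
    user=${1:-$USER}
    [ "$(loginctl show-user "$user" --property=Linger 2>/dev/null)" = "Linger=yes" ]
}
```

Possible usage: `linger_enabled || echo "lingering is off; vestige-pg will not start at boot"`.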
- `pgvector` must be available on the server before `sqlx::migrate!` runs, or the Phase 2 migration that declares typed `Vector` columns will fail.
- Testcontainer-based Phase 2 integration tests use `pgvector/pgvector:pg16` and are independent of this local cluster. This local cluster is for `sqlx prepare`, `cargo run -- migrate --to postgres`, and manual poking.
- `sqlx prepare` needs `DATABASE_URL` pointed at this cluster with `vestige` migrations already applied. Run from `crates/vestige-core/`.
---
## Day-to-day operation
```sh
# Status
podman ps --filter name=vestige-pg
# Logs (follow)
podman logs -f vestige-pg
# psql as the app role
PGPASSWORD="$(cat ~/.vestige_pg_pw)" psql -h 127.0.0.1 -U vestige -d vestige
# psql as the superuser (for grants, extensions, role admin)
podman exec -it vestige-pg psql -U postgres
# Stop / start
podman stop vestige-pg
podman start vestige-pg
# Restart in place
podman restart vestige-pg
```
---
## Password rotation
```sh
# Rotate the vestige role password
NEW_PW=$(python3 -c 'import secrets,string; a=string.ascii_letters+string.digits; print("".join(secrets.choice(a) for _ in range(32)))')
umask 077
printf '%s' "$NEW_PW" > ~/.vestige_pg_pw
chmod 600 ~/.vestige_pg_pw
sudo -u postgres psql -v ON_ERROR_STOP=1 \
podman exec -i vestige-pg psql -U postgres -v ON_ERROR_STOP=1 \
-c "ALTER ROLE vestige WITH PASSWORD '${NEW_PW}';"
unset NEW_PW
# Rotate the superuser password (less common)
NEW_SUPER=$(python3 -c 'import secrets,string; a=string.ascii_letters+string.digits; print("".join(secrets.choice(a) for _ in range(32)))')
umask 077
printf '%s' "$NEW_SUPER" > ~/.vestige_pg_superpw
chmod 600 ~/.vestige_pg_superpw
podman exec -i vestige-pg psql -U postgres -v ON_ERROR_STOP=1 \
-c "ALTER ROLE postgres WITH PASSWORD '${NEW_SUPER}';"
unset NEW_SUPER
```
Then re-export `DATABASE_URL` in any live shells.
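Re-exporting is less error-prone if the URL is always rebuilt from the password file. A minimal sketch, assuming the connection parameters from this doc; `vestige_database_url` is a hypothetical helper name:

```sh
# Hypothetical helper: print the connection URL for the vestige role,
# reading the password from the given file (default ~/.vestige_pg_pw).
# No URL-encoding is done; that is only safe because the generator in
# this doc emits letters and digits exclusively.
vestige_database_url() {
    pw_file=${1:-$HOME/.vestige_pg_pw}
    if [ ! -r "$pw_file" ]; then
        echo "cannot read $pw_file" >&2
        return 1
    fi
    printf 'postgresql://vestige:%s@127.0.0.1:5432/vestige\n' "$(cat "$pw_file")"
}
```

Possible usage after a rotation: `export DATABASE_URL="$(vestige_database_url)"`.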
---
## Backup and restore (dev-grade)
`pg_dump` writes a plain-text SQL dump to host disk. For dev data this is
enough; the production runbook lives in `0002i-runbook.md`.
```sh
# Dump
PGPASSWORD="$(cat ~/.vestige_pg_pw)" pg_dump -h 127.0.0.1 -U vestige -d vestige \
--format=plain --no-owner > vestige-$(date +%Y%m%d-%H%M%S).sql
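# Restore (drops + recreates). The superuser re-creates the extension
# first, because the vestige role cannot run CREATE EXTENSION.
podman exec -i vestige-pg psql -U postgres -v ON_ERROR_STOP=1 \
-c 'DROP DATABASE IF EXISTS vestige;' \
-c 'CREATE DATABASE vestige OWNER vestige ENCODING UTF8 TEMPLATE template0;'
podman exec -i vestige-pg psql -U postgres -d vestige -v ON_ERROR_STOP=1 \
-c 'CREATE EXTENSION IF NOT EXISTS vector;'
PGPASSWORD="$(cat ~/.vestige_pg_pw)" psql -h 127.0.0.1 -U vestige -d vestige < vestige-DUMP.sql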
```
The named volume `vestige-pgdata` persists outside the container; the
container can be `podman rm`'d and recreated without losing data, as
long as the volume stays in place.
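The restore command uses `vestige-DUMP.sql` as a placeholder. Because the dump command names files `vestige-YYYYMMDD-HHMMSS.sql`, the newest dump sorts last, so picking it can be sketched like this (`latest_dump` is a hypothetical helper name):

```sh
# Hypothetical helper: print the newest vestige-*.sql dump in a directory
# (default: current directory). Relies on the timestamped names sorting
# chronologically; shell glob expansion is already sorted.
latest_dump() {
    dir=${1:-.}
    latest=""
    for f in "$dir"/vestige-*.sql; do
        [ -e "$f" ] || continue   # glob matched nothing
        latest=$f                 # last match in sorted order wins
    done
    [ -n "$latest" ] || return 1
    printf '%s\n' "$latest"
}
```

Possible usage: feed `"$(latest_dump)"` to the final `psql` restore command instead of the placeholder file name.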
---
## Teardown
Destroys the cluster and all data in it:
```sh
sudo systemctl disable --now postgresql
sudo pacman -Rns postgresql postgresql-libs
sudo rm -rf /var/lib/postgres
rm -f ~/.vestige_pg_pw
podman stop vestige-pg
podman rm vestige-pg
podman volume rm vestige-pgdata
podman rmi docker.io/pgvector/pgvector:pg18
rm -f ~/.vestige_pg_pw ~/.vestige_pg_superpw
```
`enable-linger` and the user systemd unit can be undone with
`sudo loginctl disable-linger "$USER"` and
`systemctl --user disable podman-restart.service` if you turned them on.
---
## Notes for Phase 2
- `pgvector` is preinstalled in the image; the `CREATE EXTENSION vector`
in step 7 above makes it available inside the `vestige` DB. The
extension must be loaded BEFORE `sqlx::migrate!` runs the Phase 2
migration that declares typed `Vector` columns, otherwise the
migration fails.
- Testcontainer-based Phase 2 integration tests use the same
`pgvector/pgvector:pg18` image and spin up fresh containers per run;
they are independent of this long-lived cluster. This cluster exists
for `sqlx prepare`, `cargo run -- migrate --to postgres`, and manual
poking.
- `sqlx prepare` needs `DATABASE_URL` pointed at this cluster with
`vestige` migrations already applied. Run from `crates/vestige-core/`.
---
## Out of scope for this doc
- TLS, client-cert auth, non-localhost access. Phase 3 exposes the Vestige HTTP API over the network, not Postgres directly.
- Backups, PITR, WAL archiving. For dev data: `pg_dump -h 127.0.0.1 -U vestige vestige > vestige.sql`.
- Replication, PgBouncer, tuned `postgresql.conf`. Defaults are fine for Phase 2 development.
- Making this the canonical Vestige backend. By default Vestige still uses SQLite; this cluster exists so the `postgres-backend` feature can be built and tested locally.
- TLS, client-cert auth, non-localhost access. Phase 3 exposes the
Vestige HTTP API over the network, not Postgres directly.
- PITR, WAL archiving, replication, PgBouncer, tuned `postgresql.conf`.
Defaults are fine for Phase 2 development.
- Native (non-container) Postgres install. The prior version of this
doc covered native Arch packaging; superseded by ADR 0002's hybrid
decision.
- Making this the canonical Vestige backend. By default Vestige still
uses SQLite; this cluster exists so the `postgres-backend` feature
can be built and tested locally.