omnigraph/docker/entrypoint.sh

#!/bin/sh
set -eu

SERVER_BIN="/usr/local/bin/omnigraph-server"

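# Explicit container arguments take precedence: they are handed to the
# server unchanged and none of the OMNIGRAPH_* resolution below runs.
if [ "$#" -gt 0 ]; then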
  exec "$SERVER_BIN" "$@"
fi
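
# Example (a sketch, assuming this script is the image's ENTRYPOINT; the image
# name and file path are placeholders):
#   docker run <image> --config /etc/omnigraph.yaml --bind 127.0.0.1:9000
#     -> omnigraph-server --config /etc/omnigraph.yaml --bind 127.0.0.1:9000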

# Listen address for every mode below; unset or empty falls back to 0.0.0.0:8080.
bind="${OMNIGRAPH_BIND:-0.0.0.0:8080}"

# Cluster mode first, and exclusive (the server's mode-inference rule 0):
# a deployment serves from cluster state XOR omnigraph.yaml, never a merge.
# Fail fast here with the same contract the server enforces.
if [ -n "${OMNIGRAPH_CLUSTER:-}" ]; then
  if [ -n "${OMNIGRAPH_TARGET_URI:-}" ] || [ -n "${OMNIGRAPH_CONFIG:-}" ] || [ -n "${OMNIGRAPH_TARGET:-}" ]; then
    echo "OMNIGRAPH_CLUSTER is an exclusive boot source; unset OMNIGRAPH_TARGET_URI/OMNIGRAPH_CONFIG/OMNIGRAPH_TARGET" >&2
    exit 64
  fi
  set -- --cluster "${OMNIGRAPH_CLUSTER}" --bind "${bind}"
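  # Empty, 0, false, and FALSE leave --require-all-graphs off; any other
  # value enables it, including spellings such as "no" or "False".
  case "${OMNIGRAPH_REQUIRE_ALL_GRAPHS:-}" in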
    ""|0|false|FALSE) ;;
    *) set -- "$@" --require-all-graphs ;;
  esac
  exec "$SERVER_BIN" "$@"
fi
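
# Illustrative cluster-mode mapping with OMNIGRAPH_BIND unset (the
# /srv/cluster and /etc/omnigraph.yaml paths are hypothetical):
#   OMNIGRAPH_CLUSTER=/srv/cluster
#     -> omnigraph-server --cluster /srv/cluster --bind 0.0.0.0:8080
#   OMNIGRAPH_CLUSTER=/srv/cluster OMNIGRAPH_REQUIRE_ALL_GRAPHS=1
#     -> omnigraph-server --cluster /srv/cluster --bind 0.0.0.0:8080 --require-all-graphs
#   OMNIGRAPH_CLUSTER=/srv/cluster OMNIGRAPH_CONFIG=/etc/omnigraph.yaml
#     -> refused with a message on stderr, exit status 64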

# URI mode: OMNIGRAPH_TARGET_URI is passed as the positional argument, which
# wins over any `graphs` block in the config (see resolve_target_uri). If
# OMNIGRAPH_CONFIG is also set, it is forwarded as --config purely to supply
# a policy file, so the two compose; without it, only the URI and --bind are
# passed. OMNIGRAPH_TARGET is not forwarded in this mode.
if [ -n "${OMNIGRAPH_TARGET_URI:-}" ]; then
  exec "$SERVER_BIN" "${OMNIGRAPH_TARGET_URI}" \
    ${OMNIGRAPH_CONFIG:+--config "$OMNIGRAPH_CONFIG"} \
    --bind "${bind}"
fi
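
# Illustrative URI-mode mapping with OMNIGRAPH_BIND unset (the graph URI and
# policy path are hypothetical):
#   OMNIGRAPH_TARGET_URI=/data/graph
#     -> omnigraph-server /data/graph --bind 0.0.0.0:8080
#   OMNIGRAPH_TARGET_URI=/data/graph OMNIGRAPH_CONFIG=/etc/omnigraph/policy.yaml
#     -> omnigraph-server /data/graph --config /etc/omnigraph/policy.yaml --bind 0.0.0.0:8080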

# Config mode: boot from the --config file, adding --target only when
# OMNIGRAPH_TARGET is set.
if [ -n "${OMNIGRAPH_CONFIG:-}" ]; then
  if [ -n "${OMNIGRAPH_TARGET:-}" ]; then
    exec "$SERVER_BIN" --config "${OMNIGRAPH_CONFIG}" --target "${OMNIGRAPH_TARGET}" --bind "${bind}"
  fi
  exec "$SERVER_BIN" --config "${OMNIGRAPH_CONFIG}" --bind "${bind}"
fi
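
# Illustrative config-mode mapping (the path and the target name "prod" are
# hypothetical):
#   OMNIGRAPH_CONFIG=/etc/omnigraph.yaml
#     -> omnigraph-server --config /etc/omnigraph.yaml --bind 0.0.0.0:8080
#   OMNIGRAPH_CONFIG=/etc/omnigraph.yaml OMNIGRAPH_TARGET=prod OMNIGRAPH_BIND=127.0.0.1:9000
#     -> omnigraph-server --config /etc/omnigraph.yaml --target prod --bind 127.0.0.1:9000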

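# No boot source was provided: print the env-var contract and exit with
# status 64 (EX_USAGE in sysexits.h).
cat >&2 <<'EOF'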
omnigraph-server container startup requires one of:
  - OMNIGRAPH_CLUSTER     (serve a cluster directory's applied revision;
                           exclusive — cannot combine with the others)
  - OMNIGRAPH_TARGET_URI
  - OMNIGRAPH_CONFIG

Optional:
  - OMNIGRAPH_BIND (default: 0.0.0.0:8080)
  - OMNIGRAPH_REQUIRE_ALL_GRAPHS (cluster mode: fail startup unless every
    applied graph is healthy)
  - OMNIGRAPH_TARGET (used with OMNIGRAPH_CONFIG)
  - OMNIGRAPH_CONFIG (may also accompany OMNIGRAPH_TARGET_URI to add a
    policy file; the URI still comes from OMNIGRAPH_TARGET_URI)
EOF
exit 64