* feat(webhooks): durable retrying delivery for final webhooks
Final webhook nodes were fired inline with a single best-effort httpx POST
(run_integrations._execute_webhook_node). On a transient error the failure was
swallowed at three levels, so ARQ never retried and the final call report was
permanently lost -- leaving downstream receivers stuck (e.g. a CRM showing a
call as still "in conversation").
Replace the one-shot POST with a durable, idempotent delivery pipeline modelled
on the campaign retry pattern (persisted row + scheduled_for + bounded attempts):
- New webhook_deliveries table (WebhookDeliveryModel) is the source of truth.
Payload is rendered once and frozen so retries are deterministic; secrets are
not stored -- the credential is referenced by uuid and re-resolved at send time.
- run_integrations now persists a delivery row and enqueues deliver_webhook with
a deterministic ARQ job id instead of sending inline.
- deliver_webhook (new ARQ task) sends the request and:
* 2xx -> succeeded
* transient -> retry with capped exponential backoff (RequestError /
5xx / 408 / 425 / 429), up to max_attempts then dead_letter
* permanent 4xx -> dead_letter immediately (no pointless looping)
It is idempotent: a non-pending delivery is a no-op, so a duplicate enqueue or
sweeper re-injection can't double-send.
- sweep_webhook_deliveries cron (every 5 min) re-enqueues overdue pending
deliveries so nothing is lost to a worker restart / Redis flush.
- Stable X-Dograh-Delivery-Id / Workflow-Run-Id / Attempt headers let receivers
dedupe retried deliveries.
- enqueue_job now forwards ARQ job options (_job_id, _defer_by); failures log
repr(e) so empty-message errors like ConnectTimeout are diagnosable.
Config via DEFAULT_WEBHOOK_DELIVERY_CONFIG (env-overridable): max_attempts=5,
base_delay=30s, max_delay=600s, timeout=30s.
Tests cover payload rendering, persist+enqueue, success, transient retry,
retryable 5xx, permanent 4xx dead-letter, attempt exhaustion, and idempotency.
Migration verified to apply/rollback against Postgres; table/enum/indexes confirmed.
* fix(webhooks): atomic claim, safe success-recording, sweep paging, migration cleanup
Address review feedback on the webhook delivery pipeline:
- deliver_webhook now atomically claims a delivery (conditional UPDATE that
leases scheduled_for) before sending, so concurrent ARQ executions can't
double-send (the prior status=='pending' read was non-atomic).
- Recording success is moved out of the dead-letter try-block: if the receiver
accepted the payload (2xx) but the success DB-write fails, the row is left
pending for the sweeper to reconcile instead of being dead-lettered.
- The sweep keyset-paginates by id so a backlog over the page size is fully
drained, and logs the true re-enqueued total.
- Migration downgrade drops the enum via op.execute(DROP TYPE IF EXISTS ...)
instead of the deprecated op.get_bind().
* fix(webhooks): idempotent delivery creation and drop secret custom headers
Address the remaining review feedback:
- Add a (workflow_run_id, webhook_node_id) unique constraint and make
create_webhook_delivery a get-or-create returning (delivery, created). A
retried run_integrations now reuses the existing row instead of creating and
sending a duplicate final webhook; only a freshly-created row is enqueued.
- Stop persisting secret-looking custom headers (Authorization, X-API-Key,
Cookie, ...) in plaintext on the delivery row: they are dropped with a warning
pointing at the credential store (which is re-resolved securely at send time).
Non-secret custom headers are unaffected.
* fix(webhooks): harden idempotency key, secret-header match, sweep reclaim id
Address follow-up review feedback:
- webhook_node_id is now NOT NULL so a NULL can't slip past the
(workflow_run_id, webhook_node_id) unique constraint and create duplicates.
- Secret-header filtering matches normalized markers (auth/token/secret/cookie/
api-key/...) instead of an exact name list, catching variants like
X-Custom-Auth-Token while leaving benign headers (e.g. X-Idempotency-Key).
- The sweeper re-enqueues with a reclaim-specific job id (the lease timestamp)
so reconciling a delivered-but-unrecorded row isn't deduped against the
original attempt's already-completed ARQ job. The atomic claim still ensures
at most one send.
* fix(webhooks): scope delivery rows to workflow org
---------
Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
ChatwootWidget's visibility effect took the synchronous fast path whenever
`window.$chatwoot` was truthy. But the SDK assigns `$chatwoot` (with
`toggleBubbleVisibility`) synchronously in run(), while `.woot--bubble-holder`
is only appended later, after the widget iframe finishes loading. Calling
`toggleBubbleVisibility("show")` in that gap makes the SDK dereference
`null.classList` — an intermittent crash on navigating to /workflow that
surfaces via the React error boundary (follow-up to #483).
Gate the immediate-apply path on the bubble holder actually being in the DOM;
when it's absent, fall through to the existing `chatwoot:ready` listener, which
the SDK fires only after the holder is created.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: auto-assign per-worktree backend port via VS Code folderOpen task
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: remove .conductor dev setup (moved to native git worktrees)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: drain active calls before rolling updates
* Use provisional VAD interruption strategy
* feat: wire provisional VAD configuration
* chore: refactor user turn strategies
* chore: bump pipecat
* fix: reject misrouted smallwebrtc runs on the telephony websocket
A smallwebrtc (browser/WebRTC) workflow run is established through the WebRTC
signaling endpoint, not the PSTN telephony websocket. When such a run reached
_handle_telephony_websocket it read no "provider" from initial_context and
closed with an opaque "Provider type not found". Detect smallwebrtc runs and
close with a clear reason pointing to the signaling endpoint, without setting
the run to running or invoking a telephony provider. Also store the provider on
smallwebrtc runs at creation so they are self-describing, and make the generic
no-provider close reason include the run id and mode.
Closes#433
* fix: merge workflow run initial context defaults
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
* feat(twilio): add Answering Machine Detection (AMD) support via telephony config
Closes#339
* chore: regenerate OpenAPI spec to fix drift-check
The openapi.json snapshot had drifted from the FastAPI app definition
because main gained new organization endpoints (billing, credits,
context) after this branch was created. Regenerate it with
'python -m scripts.dump_docs_openapi' to bring it back in sync.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add provider-level AMD hooks
* fix: handle db error while persisting amd result
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Sabiha Khan <sabihak89@gmail.com>
Co-authored-by: Sabiha Khan <87858386+chewwbaka@users.noreply.github.com>
* feat(scripts): generate REDIS_PASSWORD on setup, plumb through compose
Per the discussion on #453, this takes the recommended path of extending
the setup scripts rather than introducing a parallel compose file.
- scripts/setup_remote.sh now generates REDIS_PASSWORD alongside
OSS_JWT_SECRET and POSTGRES_PASSWORD and writes it to the rendered
.env (with a short comment noting it can be rotated, unlike the
postgres password which is baked into the volume on first init).
- scripts/start_docker.sh now generates REDIS_PASSWORD on first run
if missing, mirroring the existing OSS_JWT_SECRET pattern (reuses
generate_secret, which falls back through python3 → openssl →
/dev/urandom).
- docker-compose.yaml and docker-compose-local.yaml now interpolate
${REDIS_PASSWORD:-redissecret} in the redis --requirepass, the redis
healthcheck, and the api REDIS_URL.
The :-redissecret fallback preserves backwards compatibility for users
with an existing .env that predates this change — they keep the old
value until they regenerate. New installs (via either script) get a
secure random hex.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Harden local Docker secret setup
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
* update tuner documentation
* docs: update Tuner integration walkthrough video
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: mohamed salem <259547077+mohamedsalem-bot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>