mirror of
https://github.com/dograh-hq/dograh.git
synced 2026-07-04 10:52:17 +02:00
* feat(webhooks): durable retrying delivery for final webhooks
Final webhook nodes were fired inline with a single best-effort httpx POST
(run_integrations._execute_webhook_node). On a transient error the failure was
swallowed at three levels, so ARQ never retried and the final call report was
permanently lost -- leaving downstream receivers stuck (e.g. a CRM showing a
call as still "in conversation").
Replace the one-shot POST with a durable, idempotent delivery pipeline modelled
on the campaign retry pattern (persisted row + scheduled_for + bounded attempts):
- New webhook_deliveries table (WebhookDeliveryModel) is the source of truth.
Payload is rendered once and frozen so retries are deterministic; secrets are
not stored -- the credential is referenced by uuid and re-resolved at send time.
- run_integrations now persists a delivery row and enqueues deliver_webhook with
a deterministic ARQ job id instead of sending inline.
- deliver_webhook (new ARQ task) sends the request and:
  * 2xx -> succeeded
  * transient (RequestError / 5xx / 408 / 425 / 429) -> retry with capped
    exponential backoff, up to max_attempts, then dead_letter
  * permanent 4xx -> dead_letter immediately (no pointless looping)
  It is idempotent: a non-pending delivery is a no-op, so a duplicate enqueue or
  sweeper re-injection can't double-send.
- sweep_webhook_deliveries cron (every 5 min) re-enqueues overdue pending
deliveries so nothing is lost to a worker restart / Redis flush.
- Stable X-Dograh-Delivery-Id / Workflow-Run-Id / Attempt headers let receivers
dedupe retried deliveries.
- enqueue_job now forwards ARQ job options (_job_id, _defer_by); failures log
repr(e) so empty-message errors like ConnectTimeout are diagnosable.
Config via DEFAULT_WEBHOOK_DELIVERY_CONFIG (env-overridable): max_attempts=5,
base_delay=30s, max_delay=600s, timeout=30s.
Tests cover payload rendering, persist+enqueue, success, transient retry,
retryable 5xx, permanent 4xx dead-letter, attempt exhaustion, and idempotency.
Migration verified to apply/rollback against Postgres; table/enum/indexes confirmed.
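
The retry policy described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the names `backoff_delay`, `classify`, and `Outcome` are hypothetical, the doubling schedule is assumed (the message only says "capped exponential backoff"), and treating any other non-2xx status (for example 3xx) as permanent is also an assumption. The numeric defaults are the ones quoted for DEFAULT_WEBHOOK_DELIVERY_CONFIG.

```python
from enum import Enum
from typing import Optional

# Defaults quoted in the commit message; the constant names are hypothetical.
MAX_ATTEMPTS = 5
BASE_DELAY_S = 30
MAX_DELAY_S = 600
RETRYABLE_STATUS = {408, 425, 429}


class Outcome(str, Enum):
    SUCCEEDED = "succeeded"
    RETRY = "retry"
    DEAD_LETTER = "dead_letter"


def backoff_delay(attempt: int) -> int:
    """Seconds to wait after the given 1-based attempt (assumed doubling schedule)."""
    return min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_DELAY_S)


def classify(status_code: Optional[int], attempt: int) -> Outcome:
    """Decide what to do after one send attempt.

    status_code is None when the request failed at the transport level
    (the case the commit message calls RequestError).
    """
    if status_code is not None and 200 <= status_code < 300:
        return Outcome.SUCCEEDED
    transient = (
        status_code is None
        or status_code >= 500
        or status_code in RETRYABLE_STATUS
    )
    # Transient failures are retried until the attempt budget is used up.
    if transient and attempt < MAX_ATTEMPTS:
        return Outcome.RETRY
    # Permanent failures and exhausted retries both end in dead_letter.
    return Outcome.DEAD_LETTER
```

With these assumed defaults the delays after attempts 1 through 4 would be 30, 60, 120, and 240 seconds; the 600-second cap only matters if max_attempts is raised.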
* fix(webhooks): atomic claim, safe success-recording, sweep paging, migration cleanup
Address review feedback on the webhook delivery pipeline:
- deliver_webhook now atomically claims a delivery (conditional UPDATE that
leases scheduled_for) before sending, so concurrent ARQ executions can't
double-send (the prior status=='pending' read was non-atomic).
- Recording success is moved out of the dead-letter try-block: if the receiver
accepted the payload (2xx) but the success DB-write fails, the row is left
pending for the sweeper to reconcile instead of being dead-lettered.
- The sweep keyset-paginates by id so a backlog over the page size is fully
drained, and logs the true re-enqueued total.
- Migration downgrade drops the enum via op.execute(DROP TYPE IF EXISTS ...)
instead of the deprecated op.get_bind().
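
The atomic claim can be illustrated with a conditional UPDATE whose row count tells the caller whether it won. The sketch below uses an in-memory SQLite table so it runs on its own; the real project targets Postgres, and the column layout, integer timestamps, the 60-second lease, and the function name `claim_delivery` are all assumptions made for the example.

```python
import sqlite3


def claim_delivery(conn: sqlite3.Connection, delivery_id: int, now: int,
                   lease_seconds: int = 60) -> bool:
    """Try to claim a pending, due delivery by pushing scheduled_for forward.

    The WHERE clause and the SET run as one statement, so two workers cannot
    both match the same due row: the second one sees the leased scheduled_for.
    """
    cur = conn.execute(
        "UPDATE webhook_deliveries "
        "SET scheduled_for = ? "
        "WHERE id = ? AND status = 'pending' AND scheduled_for <= ?",
        (now + lease_seconds, delivery_id, now),
    )
    conn.commit()
    return cur.rowcount == 1


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE webhook_deliveries ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
        "scheduled_for INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO webhook_deliveries VALUES (1, 'pending', 100)")
    first = claim_delivery(conn, 1, now=100)    # wins the claim
    second = claim_delivery(conn, 1, now=100)   # lease is in the future, loses
    later = claim_delivery(conn, 1, now=161)    # lease expired, reclaimable
    return [first, second, later]
```

The third call shows why a row left pending after a failed success-write can still be picked up later: once the lease passes, the same conditional UPDATE matches again.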
* fix(webhooks): idempotent delivery creation and drop secret custom headers
Address the remaining review feedback:
- Add a (workflow_run_id, webhook_node_id) unique constraint and make
create_webhook_delivery a get-or-create returning (delivery, created). A
retried run_integrations now reuses the existing row instead of creating and
sending a duplicate final webhook; only a freshly-created row is enqueued.
- Stop persisting secret-looking custom headers (Authorization, X-API-Key,
Cookie, ...) in plaintext on the delivery row: they are dropped with a warning
pointing at the credential store (which is re-resolved securely at send time).
Non-secret custom headers are unaffected.
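
A get-or-create backed by the unique constraint might look like the sketch below. It is written against SQLite so it is self-contained; the table layout, column types, and the function name `get_or_create_delivery` are assumptions, and the actual create_webhook_delivery may be implemented differently.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the example needs.
SCHEMA = """
CREATE TABLE webhook_deliveries (
    id INTEGER PRIMARY KEY,
    workflow_run_id INTEGER NOT NULL,
    webhook_node_id TEXT NOT NULL,
    payload TEXT NOT NULL,
    UNIQUE (workflow_run_id, webhook_node_id)
)
"""


def get_or_create_delivery(conn: sqlite3.Connection, workflow_run_id: int,
                           webhook_node_id: str, payload: str):
    """Return (delivery_id, created) for the given run and webhook node."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO webhook_deliveries "
        "(workflow_run_id, webhook_node_id, payload) VALUES (?, ?, ?)",
        (workflow_run_id, webhook_node_id, payload),
    )
    # rowcount is 0 when the unique constraint caused the insert to be skipped.
    created = cur.rowcount == 1
    row = conn.execute(
        "SELECT id FROM webhook_deliveries "
        "WHERE workflow_run_id = ? AND webhook_node_id = ?",
        (workflow_run_id, webhook_node_id),
    ).fetchone()
    conn.commit()
    return row[0], created
```

A caller would enqueue the delivery job only when `created` is true, which matches the "only a freshly-created row is enqueued" behaviour described above.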
* fix(webhooks): harden idempotency key, secret-header match, sweep reclaim id
Address follow-up review feedback:
- webhook_node_id is now NOT NULL so a NULL can't slip past the
(workflow_run_id, webhook_node_id) unique constraint and create duplicates.
- Secret-header filtering matches normalized markers (auth/token/secret/cookie/
api-key/...) instead of an exact name list, catching variants like
X-Custom-Auth-Token while leaving benign headers (e.g. X-Idempotency-Key).
- The sweeper re-enqueues with a reclaim-specific job id (the lease timestamp)
so reconciling a delivered-but-unrecorded row isn't deduped against the
original attempt's already-completed ARQ job. The atomic claim still ensures
at most one send.
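
Marker-based secret-header filtering could be sketched as below. The marker tuple is a guess that covers only the markers named in the message (its "..." is left unfilled), and the normalization rule (lowercase, strip non-alphanumerics) and the name `split_headers` are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple

# Assumed marker list, compared against the normalized header name.
SECRET_MARKERS = ("auth", "token", "secret", "cookie", "apikey")


def _normalize(name: str) -> str:
    """Lowercase and drop separators so X-API-Key and x_api_key compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def split_headers(headers: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (headers safe to persist, names of dropped secret-looking headers)."""
    safe: Dict[str, str] = {}
    dropped: List[str] = []
    for name, value in headers.items():
        normalized = _normalize(name)
        if any(marker in normalized for marker in SECRET_MARKERS):
            dropped.append(name)  # a caller would log a warning here
        else:
            safe[name] = value
    return safe, dropped
```

Note that a bare "key" is deliberately not a marker in this sketch, which is what lets X-Idempotency-Key through while X-API-Key is still caught by "apikey".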
* fix(webhooks): scope delivery rows to workflow org
---------
Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
| Name | | |
|---|---|---|
| dto_fixtures | ||
| integrations | ||
| support | ||
| telephony | ||
| __init__.py | ||
| conftest.py | ||
| test_active_calls.py | ||
| test_add_call_disposition_code.py | ||
| test_aggregation_fix.py | ||
| test_ai_model_configuration_v2.py | ||
| test_auth_depends.py | ||
| test_azure_realtime_wrapper.py | ||
| test_azure_speech_service_factory.py | ||
| test_camb_tts_integration.py | ||
| test_campaign_call_dispatcher.py | ||
| test_campaign_tasks.py | ||
| test_cartesia_stt_service_factory.py | ||
| test_cartesia_tts_service_factory.py | ||
| test_circuit_breaker.py | ||
| test_custom_tools.py | ||
| test_custom_tools_context_integration.py | ||
| test_deepgram_flux_service_factory.py | ||
| test_display_options_evaluator.py | ||
| test_dograh_embedding_service.py | ||
| test_dograh_managed_correlation.py | ||
| test_dograh_sdk.py | ||
| test_dograh_sdk_typed.py | ||
| test_dograh_stt_service_factory.py | ||
| test_dto.py | ||
| test_from_number_pool_isolation.py | ||
| test_gemini_json_schema_adapter.py | ||
| test_gemini_live_reconnect_tool_results.py | ||
| test_get_backend_endpoints.py | ||
| test_google_stt_service_factory.py | ||
| test_google_tts_service_factory.py | ||
| test_google_vertex_llm_service_factory.py | ||
| test_grok_realtime_wrapper.py | ||
| test_huggingface_stt_service_factory.py | ||
| test_inworld_tts_service_factory.py | ||
| test_is_private_ip_candidate.py | ||
| test_json_parser.py | ||
| test_knowledge_base_processing_embeddings.py | ||
| test_layout.py | ||
| test_masked_key_rejection.py | ||
| test_mcp_auth.py | ||
| test_mcp_create_workflow.py | ||
| test_mcp_custom_tool_manager.py | ||
| test_mcp_docs_search.py | ||
| test_mcp_get_workflow.py | ||
| test_mcp_instructions_drift.py | ||
| test_mcp_integration.py | ||
| test_mcp_save_workflow.py | ||
| test_mcp_tool_creation.py | ||
| test_mcp_tool_definition.py | ||
| test_mcp_tool_route.py | ||
| test_mcp_tool_session.py | ||
| test_message_sanitization.py | ||
| test_minimax_service_factory.py | ||
| test_mps_service_key_client.py | ||
| test_node_specs.py | ||
| test_onboarding_state.py | ||
| test_openai_realtime_initial_context.py | ||
| test_openai_tts_service_factory.py | ||
| test_organization_usage_billing.py | ||
| test_pipecat_engine_callbacks.py | ||
| test_pipecat_engine_context_update.py | ||
| test_pipecat_engine_end_call.py | ||
| test_pipecat_engine_node_switch_with_user_speech.py | ||
| test_pipecat_engine_tool_calls.py | ||
| test_pipecat_engine_transition_mute.py | ||
| test_pipecat_engine_variable_extraction.py | ||
| test_pipeline_cancellation.py | ||
| test_posthog_client.py | ||
| test_pre_call_fetch.py | ||
| test_public_agent_routes.py | ||
| test_public_embed_cors.py | ||
| test_public_signaling_origin.py | ||
| test_qa_analysis_non_dict_response.py | ||
| test_quota_service.py | ||
| test_realtime_feedback_events.py | ||
| test_realtime_feedback_observer.py | ||
| test_realtime_message_append.py | ||
| test_recording_router_processor.py | ||
| test_resolve_effective_config.py | ||
| test_run_integrations_webhook.py | ||
| test_run_pipeline_realtime_turn_config.py | ||
| test_run_usage_response.py | ||
| test_s3_signed_url.py | ||
| test_sarvam_service_factory.py | ||
| test_sdk_sync.py | ||
| test_smallest_service_factory.py | ||
| test_speaches_service_factory.py | ||
| test_telephony_factory.py | ||
| test_telephony_routes.py | ||
| test_template_renderer.py | ||
| test_text_and_audio_playback.py | ||
| test_text_chat_logs.py | ||
| test_text_chat_session_service.py | ||
| test_tool_schema.py | ||
| test_trigger_path_validation.py | ||
| test_ts_bridge.py | ||
| test_tts_endframe_with_audio_write_failure.py | ||
| test_ultravox_realtime_wrapper.py | ||
| test_unregistered_function_call.py | ||
| test_user_configuration_upsert.py | ||
| test_user_configured_service_url_security.py | ||
| test_user_email_case_insensitive.py | ||
| test_user_idle_handler.py | ||
| test_user_muting_during_bot_speech.py | ||
| test_voicemail_detector.py | ||
| test_workflow_create_route.py | ||
| test_workflow_graph_constraints.py | ||
| test_workflow_list_route.py | ||
| test_workflow_qa_masking.py | ||
| test_workflow_run_billing.py | ||
| test_workflow_text_chat.py | ||
| test_workflow_versioning.py | ||