dograh/api/db/db_client.py

from api.db.agent_trigger_client import AgentTriggerClient
from api.db.api_key_client import APIKeyClient
from api.db.campaign_client import CampaignClient
from api.db.embed_token_client import EmbedTokenClient
from api.db.folder_client import FolderClient
from api.db.integration_client import IntegrationClient
from api.db.knowledge_base_client import KnowledgeBaseClient
from api.db.organization_client import OrganizationClient
from api.db.organization_configuration_client import OrganizationConfigurationClient
from api.db.organization_usage_client import OrganizationUsageClient
from api.db.reports_client import ReportsClient
from api.db.telephony_configuration_client import TelephonyConfigurationClient
from api.db.telephony_phone_number_client import TelephonyPhoneNumberClient
from api.db.tool_client import ToolClient
from api.db.user_client import UserClient
from api.db.webhook_credential_client import WebhookCredentialClient
from api.db.webhook_delivery_client import WebhookDeliveryClient
from api.db.workflow_client import WorkflowClient
from api.db.workflow_recording_client import WorkflowRecordingClient
from api.db.workflow_run_client import WorkflowRunClient
from api.db.workflow_run_text_session_client import WorkflowRunTextSessionClient
from api.db.workflow_template_client import WorkflowTemplateClient


class DBClient(
    WorkflowClient,
    WorkflowRunClient,
    WorkflowRunTextSessionClient,
    UserClient,
    OrganizationClient,
    OrganizationConfigurationClient,
    OrganizationUsageClient,
    IntegrationClient,
    WorkflowTemplateClient,
    CampaignClient,
    ReportsClient,
    APIKeyClient,
    EmbedTokenClient,
    AgentTriggerClient,
    WebhookCredentialClient,
    WebhookDeliveryClient,
    ToolClient,
    KnowledgeBaseClient,
    WorkflowRecordingClient,
    TelephonyConfigurationClient,
    TelephonyPhoneNumberClient,
    FolderClient,
):
    """
    Unified database client that combines all specialized database operations.

    This client inherits from:
    - WorkflowClient: handles workflow and workflow definition operations
    - WorkflowRunClient: handles workflow run operations
    - WorkflowRunTextSessionClient: handles workflow run text session operations
    - UserClient: handles user and user configuration operations
    - OrganizationClient: handles organization operations
    - OrganizationConfigurationClient: handles organization configuration operations
    - OrganizationUsageClient: handles organization usage reporting aggregates
    - IntegrationClient: handles integration operations
    - WorkflowTemplateClient: handles workflow template operations
    - CampaignClient: handles campaign operations
    - ReportsClient: handles reports and analytics operations
    - APIKeyClient: handles API key operations
    - EmbedTokenClient: handles embed token and session operations
    - AgentTriggerClient: handles agent trigger operations for API-based call triggering
    - WebhookCredentialClient: handles webhook credential operations
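    - WebhookDeliveryClient: handles durable outbound webhook delivery records
    - ToolClient: handles tool operations for reusable HTTP API tools
    - KnowledgeBaseClient: handles knowledge base document and vector search operations
    - WorkflowRecordingClient: handles workflow recording operations
    - TelephonyConfigurationClient: handles telephony configuration operations
    - TelephonyPhoneNumberClient: handles telephony phone number operations
    - FolderClient: handles folder operations for grouping workflows (agents)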
    """

    pass
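
The composition pattern described in the docstring can be sketched in isolation. This is a minimal illustration under stated assumptions, not the Dograh implementation: `BaseDBClient`, `UserOps`, `FolderOps`, `UnifiedClient`, and the in-memory `store` are hypothetical stand-ins, and this file does not confirm whether the real specialized clients share a common base class.

```python
from typing import Dict, Optional


class BaseDBClient:
    """Hypothetical shared base: owns the one resource every mixin uses."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn
        # In-memory stand-in for a real database, keyed by table name.
        self.store: Dict[str, Dict[int, str]] = {}


class UserOps(BaseDBClient):
    """Hypothetical specialized client for user rows."""

    def create_user(self, user_id: int, name: str) -> None:
        self.store.setdefault("users", {})[user_id] = name

    def get_user(self, user_id: int) -> Optional[str]:
        return self.store.get("users", {}).get(user_id)


class FolderOps(BaseDBClient):
    """Hypothetical specialized client for folder rows."""

    def create_folder(self, folder_id: int, name: str) -> None:
        self.store.setdefault("folders", {})[folder_id] = name

    def get_folder(self, folder_id: int) -> Optional[str]:
        return self.store.get("folders", {}).get(folder_id)


class UnifiedClient(UserOps, FolderOps):
    """Adds no behaviour of its own; it only combines the specialized clients."""


# One instance exposes the methods of every specialized client.
client = UnifiedClient("sqlite://demo")
client.create_user(1, "ada")
client.create_folder(7, "inbox")

# The method resolution order shows the shared base appearing exactly once.
mro_names = [cls.__name__ for cls in UnifiedClient.__mro__]
```

Because every specialized class in this sketch derives from the same base, Python's method resolution order runs `BaseDBClient.__init__` once, so all mixed-in methods operate on the same `store`. Method names must stay unique across the specialized classes; if two of them define the same name, the one listed first in the base-class tuple silently wins.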