refactor: Update GitHub connector to use gitingest CLI

- Refactored GitHubConnector to utilize gitingest CLI via subprocess, improving performance and avoiding async issues with Celery.
- Updated ingestion method to handle repository digests more efficiently, including error handling for subprocess execution.
- Adjusted GitHub indexer to call the new synchronous ingestion method.
- Clarified documentation regarding the optional nature of the Personal Access Token for public repositories.
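
The pattern described above can be sketched roughly as follows: a synchronous helper runs a CLI through `subprocess` with basic error handling, and async code calls it via `asyncio.to_thread`. This is a minimal illustration, not the connector's actual code. The names `run_cli` and `ingest` are hypothetical, and because the commit excerpt does not show the gitingest arguments, the command is passed in as a parameter rather than guessed.

```python
import asyncio
import subprocess
import sys
from typing import List, Optional


def run_cli(cmd: List[str], timeout: float = 300.0) -> Optional[str]:
    """Run a command synchronously; return its stdout, or None on failure.

    Hypothetical helper: the real connector's command line and error
    handling are not shown in this commit excerpt.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,  # collect stdout/stderr instead of inheriting
            text=True,            # decode output to str
            timeout=timeout,      # raise TimeoutExpired if the process hangs
            check=True,           # raise CalledProcessError on non-zero exit
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # Non-zero exit, timeout, or missing executable: signal "no digest".
        return None
    return result.stdout


async def ingest(cmd: List[str]) -> Optional[str]:
    # to_thread runs the blocking call in a worker thread,
    # so the event loop stays free for other awaitables.
    return await asyncio.to_thread(run_cli, cmd)


if __name__ == "__main__":
    # Stand-in command; a real caller would pass the gitingest invocation.
    print(asyncio.run(ingest([sys.executable, "-c", "print('digest')"])))
```

Returning `None` on failure lets the caller keep a simple `if not digest:` check, as the indexer hunk does.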
Anish Sarkar 2026-01-20 23:24:33 +05:30
parent 49b8a46d10
commit 35888144eb
8 changed files with 221 additions and 256 deletions

@@ -173,8 +173,13 @@ async def index_github_repos(
logger.info(f"Ingesting repository: {repo_full_name}")
try:
# Ingest the entire repository
-digest = await github_client.ingest_repository(repo_full_name)
+# Run gitingest via subprocess (isolated from event loop)
+# Using to_thread to not block the async database operations
+import asyncio
+digest = await asyncio.to_thread(
+    github_client.ingest_repository, repo_full_name
+)
if not digest:
logger.warning(