feat: enhance Google Drive file metadata handling

- Updated Google Drive API calls to include md5Checksum in file metadata retrieval for improved content tracking.
- Added logic to check for rename-only updates based on md5Checksum, optimizing document processing by preventing unnecessary ETL operations for unchanged content.
- Enhanced existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.
This commit is contained in:
Anish Sarkar 2026-01-17 16:24:53 +05:30
parent 49efc50767
commit f538d59ca3
5 changed files with 138 additions and 4 deletions

View file

@ -47,7 +47,7 @@ class GoogleDriveClient:
async def list_files(
self,
query: str = "",
fields: str = "nextPageToken, files(id, name, mimeType, modifiedTime, size, webViewLink, parents, owners, createdTime, description)",
fields: str = "nextPageToken, files(id, name, mimeType, modifiedTime, md5Checksum, size, webViewLink, parents, owners, createdTime, description)",
page_size: int = 100,
page_token: str | None = None,
) -> tuple[list[dict[str, Any]], str | None, str | None]: