feat(connector): add Amazon Athena connector via Glue Data Catalog (#309)

* feat(connector): add Amazon Athena connector via Glue Data Catalog

* fix(athena): address reviewer feedback

* fix(athena): wire scope discovery, fix normalizeDriver, tighten types and tests

* fix(athena): honor databases scope, wire sql-analysis dialect, harden config resolution

- introspect() limits to the configured `databases` scope instead of scanning
  every Glue database in the account (docs promised this; connector ignored it)
- add athena -> athena to sql-analysis SQLGLOT_DIALECTS so `ktx sql` and MCP
  read-only validation parse Athena SQL under the Trino grammar, not postgres
- stringConfigValue coerces a resolved-empty `env:` reference to undefined so
  optional fields fall back to their defaults (workgroup 'primary', catalog
  'AwsDataCatalog') instead of ''
- drop trailing whitespace in dialect.test.ts
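
As an illustration of the `stringConfigValue` change above, here is a minimal sketch of the coercion, not the connector's actual code. The function name `stringConfigValueSketch`, the `Env` type, and the `ATHENA_*` variable names are hypothetical, and the environment is passed in as a plain record so the example stays self-contained. This sketch also treats a literal empty string as unset, which is an assumption beyond what the commit message states.

```typescript
// Hypothetical sketch: names and signatures are assumptions for illustration.
type Env = Record<string, string | undefined>;

// Resolve a config value that may be a literal or an `env:NAME` reference.
// An unset or empty result becomes undefined so callers can apply defaults.
function stringConfigValueSketch(raw: unknown, env: Env): string | undefined {
  if (typeof raw !== 'string') return undefined;
  const resolved = raw.startsWith('env:') ? env[raw.slice('env:'.length)] : raw;
  if (resolved === undefined) return undefined;
  const trimmed = resolved.trim();
  return trimmed === '' ? undefined : trimmed;
}

// ATHENA_WORKGROUP is set but empty; ATHENA_CATALOG has a real value.
const env: Env = { ATHENA_WORKGROUP: '', ATHENA_CATALOG: 'MyCatalog' };

// `??` only falls back on undefined/null, so returning '' would skip the default.
const workgroup = stringConfigValueSketch('env:ATHENA_WORKGROUP', env) ?? 'primary';
const catalog = stringConfigValueSketch('env:ATHENA_CATALOG', env) ?? 'AwsDataCatalog';
```

Returning `undefined` rather than `''` is what lets a single `??` at the call site express the default.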

* fix(athena): integrate with main's SQL/non-SQL dialect split and add dialect notes

Rebase onto main, which introduced the KtxDialect (core) vs KtxSqlDialect
(SQL-only) split for MongoDB:
- KtxAthenaDialect implements KtxSqlDialect; the connector resolves it via
  getSqlDialectForDriver so SQL-generation methods stay in scope
- add authored athena.md SQL notes for the sql_dialect_notes MCP tool, required
  now that athena resolves to the athena sqlglot dialect (dialect-notes coverage
  is derived from the warehouse-driver registry)
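
The core vs SQL-only split described above might look roughly like the sketch below. The interfaces, the registry, and the `quoteIdentifier` method are hypothetical stand-ins (suffixed `Sketch`); the real `KtxDialect`, `KtxSqlDialect`, and `getSqlDialectForDriver` are not shown in this commit view and may differ.

```typescript
// Hypothetical shapes: the real interfaces may carry different members.
interface KtxDialectSketch {
  readonly driver: string;
}

// SQL-only capabilities live on the narrower interface.
interface KtxSqlDialectSketch extends KtxDialectSketch {
  quoteIdentifier(name: string): string;
}

class AthenaDialectSketch implements KtxSqlDialectSketch {
  readonly driver = 'athena';

  // Trino-style double-quoted identifiers, assumed here for query statements.
  quoteIdentifier(name: string): string {
    return `"${name.replace(/"/g, '""')}"`;
  }
}

// Only SQL drivers are registered; a non-SQL driver such as mongodb is absent.
const SQL_DIALECTS: Record<string, KtxSqlDialectSketch> = {
  athena: new AthenaDialectSketch(),
};

// Returning the SQL-only type keeps SQL-generation methods in scope for callers.
function getSqlDialectForDriverSketch(driver: string): KtxSqlDialectSketch {
  const dialect = SQL_DIALECTS[driver];
  if (dialect === undefined) {
    throw new Error(`No SQL dialect registered for driver "${driver}"`);
  }
  return dialect;
}
```

Resolving through a SQL-specific lookup means a connector never has to narrow a core dialect at each call site.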

---------

Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Patel Dhrit 2026-07-02 06:00:26 -07:00 committed by GitHub
parent 6d01030745
commit fe7e6bd1fa
24 changed files with 2047 additions and 6 deletions

@@ -243,6 +243,7 @@ describe('setup databases step', () => {
{ value: 'mysql', label: 'MySQL' },
{ value: 'clickhouse', label: 'ClickHouse' },
{ value: 'sqlserver', label: 'SQL Server' },
{ value: 'athena', label: 'Amazon Athena' },
{ value: 'mongodb', label: 'MongoDB' },
{ value: 'sqlite', label: 'SQLite' },
{ value: 'duckdb', label: 'DuckDB' },
@@ -618,6 +619,29 @@ describe('setup databases step', () => {
},
],
},
{
  driver: 'athena',
  textValues: ['', 'us-east-1', 's3://my-bucket/athena-results/', '', ''],
  expectedTextPrompts: [
    {
      message: connectionNamePrompt('Amazon Athena'),
      placeholder: 'athena-warehouse',
      initialValue: 'athena-warehouse',
    },
    {
      message: 'AWS region\nFor example us-east-1.',
    },
    {
      message: 'S3 staging directory\nAthena writes query results here. For example s3://my-bucket/athena-results/.',
    },
    {
      message: 'Athena workgroup (optional)\nPress Enter to use the default workgroup "primary".',
    },
    {
      message: 'Glue Data Catalog name (optional)\nPress Enter to use the default "AwsDataCatalog".',
    },
  ],
},
];
for (const testCase of cases) {
@@ -1967,6 +1991,40 @@ describe('setup databases step', () => {
expect(project.config.connections['clickhouse-warehouse']).not.toHaveProperty('schemas');
});
it('maps Athena scripted database schema input to databases field', async () => {
  await writeFile(
    join(tempDir, 'ktx.yaml'),
    [
      'connections:',
      '  athena-warehouse:',
      '    driver: athena',
      '    region: us-east-1',
      '    s3_staging_dir: s3://my-bucket/athena-results/',
      '',
    ].join('\n'),
    'utf-8',
  );
  await runKtxSetupDatabasesStep(
    {
      projectDir: tempDir,
      inputMode: 'disabled',
      skipDatabases: false,
      databaseConnectionIds: ['athena-warehouse'],
      databaseSchemas: ['analytics', 'raw'],
    },
    makeIo().io,
    { testConnection: vi.fn(async () => 0), scanConnection: vi.fn(async () => 0) },
  );
  const project = await loadKtxProject({ projectDir: tempDir });
  expect(project.config.connections['athena-warehouse']).toMatchObject({
    driver: 'athena',
    databases: ['analytics', 'raw'],
  });
  expect(project.config.connections['athena-warehouse']).not.toHaveProperty('schemas');
});
it('does not prompt for a bootstrap BigQuery dataset before scope discovery', async () => {
const prompts = makePromptAdapter({
multiselectValues: [['bigquery']],