# Structured Data Diagnostic Service Technical Specification
## Overview
This specification describes a new invokable service for diagnosing and analyzing structured data within TrustGraph. The service extracts functionality from the existing `tg-load-structured-data` command-line tool and exposes it as a request/response service, enabling programmatic access to data type detection and descriptor generation capabilities.
The service supports three primary operations:
1. **Data Type Detection**: Analyze a data sample to determine its format (CSV, JSON, or XML)
2. **Descriptor Generation**: Generate a TrustGraph structured data descriptor for a given data sample and type
3. **Combined Diagnosis**: Perform both type detection and descriptor generation in sequence
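
To make the first operation concrete, the following is a minimal sketch of how format detection over a text sample might work. The function name `detect_format` and the return values (`"json"`, `"xml"`, `"csv"`, `"unknown"`) are assumptions made for illustration; they are not taken from the existing `tg-load-structured-data` code, whose actual heuristics may differ.

```python
import csv
import json
import xml.etree.ElementTree as ET


def detect_format(sample: str) -> str:
    """Guess the format of a text sample (hypothetical helper, not TrustGraph API).

    Returns one of "json", "xml", "csv", or "unknown".
    """
    text = sample.strip()
    if not text:
        return "unknown"

    # JSON: the sample starts like an object or array and parses cleanly.
    # A sample truncated mid-document will fail this check and fall through.
    if text[0] in "{[":
        try:
            json.loads(text)
            return "json"
        except json.JSONDecodeError:
            pass

    # XML: the sample starts with a tag and is well-formed.
    if text[0] == "<":
        try:
            ET.fromstring(text)
            return "xml"
        except ET.ParseError:
            pass

    # CSV: let the standard library look for a consistent delimiter.
    try:
        csv.Sniffer().sniff(text, delimiters=",;\t|")
        return "csv"
    except csv.Error:
        return "unknown"
```

For example, `detect_format("name,age\nalice,30\nbob,25\n")` would classify the sample as CSV. Because the sketch requires JSON and XML samples to parse completely, a real implementation working on truncated samples would likely need more tolerant checks.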
## Goals
- **Modularize Data Analysis**: Extract data diagnosis logic from the CLI into reusable service components
- **Enable Programmatic Access**: Provide API-based access to data analysis capabilities
- **Support Multiple Data Formats**: Handle CSV, JSON, and XML data formats consistently
- **Generate Accurate Descriptors**: Produce structured data descriptors that accurately map source data to TrustGraph schemas
- **Maintain Backward Compatibility**: Ensure existing CLI functionality continues to work
- **Enable Service Composition**: Allow other services to leverage data diagnosis capabilities
- **Improve Testability**: Separate business logic from the CLI so it can be tested in isolation
- **Support Streaming Analysis**: Enable analysis of data samples without loading entire files
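
The streaming goal can be illustrated with a small sketch that reads only a bounded prefix of a file for analysis. The helper name `read_sample` and the default limit are hypothetical choices for this example, not part of the specification.

```python
def read_sample(path: str, max_chars: int = 64 * 1024) -> str:
    """Return at most max_chars characters from the start of a text file.

    Hypothetical helper: only the prefix is read, so the cost is bounded
    regardless of the total file size.
    """
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read(max_chars)
```

A caller would pass the returned sample to the diagnostic service rather than the whole file. Note that a prefix may cut a record or document in half, so downstream analysis has to tolerate an incomplete final record.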
## Background
Currently, the `tg-load-structured-data` command provides comprehensive functionality for analyzing structured data and generating descriptors. However, this functionality is tightly coupled to the CLI, which limits its reusability.
Current limitations include:
- Data diagnosis logic embedded in CLI code
- No programmatic access to type detection and descriptor generation
- Difficult to integrate diagnosis capabilities into other services
- Limited ability to compose data analysis workflows
This specification addresses these gaps by creating a dedicated service for structured data diagnosis. By exposing these capabilities as a service, TrustGraph can:
- Enable other services to analyze data programmatically
- Support more complex data processing pipelines
- Facilitate integration with external systems
- Improve maintainability through separation of concerns
## Technical Design
### Architecture
The structured data diagnostic service requires the following technical components:
1. **Diagnostic Service Processor**
   - Handles incoming diagnosis requests
   - Orchestrates type detection and descriptor generation
   - Returns structured responses with diagnosis results
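
The processor's responsibilities can be sketched as a small dispatcher over the three operations. Everything named here is an assumption for illustration: the operation strings (`"detect-type"`, `"generate-descriptor"`, `"diagnose"`), the `DiagnosisRequest` and `DiagnosisResponse` shapes, and the injected `detect` and `generate` callables. The actual TrustGraph message schema and descriptor format are not defined in this section, so descriptor generation is left as a pluggable function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DiagnosisRequest:
    operation: str                    # assumed operation names, see handle()
    sample: str                       # data sample to analyze
    data_type: Optional[str] = None   # required for "generate-descriptor"


@dataclass
class DiagnosisResponse:
    data_type: Optional[str] = None
    descriptor: Optional[dict] = None
    error: Optional[str] = None


class DiagnosticProcessor:
    """Hypothetical processor that orchestrates detection and generation."""

    def __init__(
        self,
        detect: Callable[[str], str],
        generate: Callable[[str, str], dict],
    ) -> None:
        self._detect = detect
        self._generate = generate

    def handle(self, request: DiagnosisRequest) -> DiagnosisResponse:
        if request.operation == "detect-type":
            return DiagnosisResponse(data_type=self._detect(request.sample))

        if request.operation == "generate-descriptor":
            if request.data_type is None:
                return DiagnosisResponse(error="data_type is required")
            descriptor = self._generate(request.sample, request.data_type)
            return DiagnosisResponse(
                data_type=request.data_type, descriptor=descriptor
            )

        if request.operation == "diagnose":
            # Combined diagnosis: detect first, then feed the result forward.
            data_type = self._detect(request.sample)
            descriptor = self._generate(request.sample, data_type)
            return DiagnosisResponse(data_type=data_type, descriptor=descriptor)

        return DiagnosisResponse(error=f"unknown operation: {request.operation}")
```

Injecting the detection and generation functions keeps the processor free of format-specific logic, which matches the testability goal: each piece can be exercised with simple stubs.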