JSON Parser Service

1. Executive Summary

The JSON Parser Service is a central utility service that provides comprehensive JSON processing capabilities for the Augment-It platform. It handles parsing, validation, formatting, schema validation, and transformation of JSON data from multiple sources including AI model responses, configuration files, API requests/responses, and user-generated content. The service ensures consistent JSON handling across all microfrontends while providing advanced features like error recovery, partial parsing, and intelligent type inference.

2. Background & Motivation

Problem Statement

JSON processing is scattered across the Augment-It platform. Components handle errors, validation, and formatting inconsistently, which makes data processing fragile and gives users poor feedback when JSON is malformed.

Current Limitations

Inconsistent Error Handling: Different components handle JSON parsing failures differently
No Graceful Degradation: Failed parsing often results in complete component failures
Limited Validation: Basic JSON.parse() calls without schema validation or content verification
Poor User Feedback: Generic error messages don't help users fix JSON syntax issues
Code Duplication: Similar JSON processing logic repeated across multiple components
AI Response Challenges: AI-generated JSON often contains formatting issues or embedded content

Why This Solution

Centralized Processing: Single source of truth for JSON handling logic
Intelligent Parsing: Handle common JSON formatting issues automatically
Rich Validation: Schema-based validation with detailed error reporting
AI Response Optimization: Specialized handling for AI-generated content
Developer Experience: Comprehensive tooling for JSON editing and validation

3. Goals & Non-Goals

Goals

Robust Parsing: Handle malformed JSON with intelligent error recovery
Schema Validation: Validate JSON against predefined schemas with detailed error reporting
AI Response Handling: Specialized processing for AI model outputs (GPT, Claude, Perplexity)
Pretty Formatting: Consistent JSON formatting and syntax highlighting support
Template Processing: Handle JSON templates with variable substitution
Performance: Efficient processing of large JSON objects and arrays
Developer Tools: Integration with code editors and validation UIs

Non-Goals

YAML/XML Support: Focus only on JSON format (other parsers handle different formats)
Database Integration: Pure parsing service without persistence logic
Real-time Collaboration: No collaborative editing features
Binary Data: JSON text processing only, no binary format support

4. Technical Design

High-Level Architecture

graph TD
    A[JSON Input] --> B[JSON Parser Service]
    B --> C[Syntax Analyzer]
    C --> D[Error Recovery Engine]
    D --> E[Schema Validator]
    E --> F[Type Inference Engine]
    F --> G[Formatter|Beautifier]
    G --> H[Template Processor]
    H --> I[Structured Output]
    J[Validation Schemas] --> E
    K[Formatting Rules] --> G
    L[Template Variables] --> H
    M[AI Response Handler] --> B
    N[Configuration Parser] --> B
    O[User Input Validator] --> B

Core Components

1. Advanced JSON Parser

Responsibility: Parse JSON with intelligent error recovery and detailed error reporting
Features:
- Standard JSON parsing with enhanced error messages
- Recovery from common formatting issues (trailing commas, unquoted keys, etc.)
- Line-by-line error reporting with context
- Partial parsing for large nested objects
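
Line and column reporting can be derived from the character offset that a parse failure reports. The following is a minimal sketch of that mapping, not the service's actual implementation; the helper names mirror `getLineColumn` and `getContext` used later in this document, but their bodies and the 20-character context radius are assumptions.

```typescript
// Hypothetical helpers: the bodies and the default radius are illustrative assumptions.
interface SourceLocation {
  line: number;   // 1-based line number
  column: number; // 1-based column number
}

function getLineColumn(input: string, position: number): SourceLocation {
  // Clamp so an out-of-range offset still yields a usable location.
  const end = Math.max(0, Math.min(position, input.length));
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < end; i++) {
    if (input.charAt(i) === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: end - lineStart + 1 };
}

function getContext(input: string, position: number, radius = 20): string {
  // Return the text surrounding the offset so an error message can quote it.
  const start = Math.max(0, position - radius);
  const end = Math.min(input.length, position + radius);
  return input.slice(start, end);
}

const sample = '{\n  "a": 1,\n  "b": oops\n}';
console.log(getLineColumn(sample, sample.indexOf("oops"))); // { line: 3, column: 8 }
```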

2. AI Response Processor

Responsibility: Handle JSON embedded in AI model responses
Features:
- Extract JSON from markdown code blocks
- Clean up AI-generated formatting inconsistencies
- Handle mixed JSON/text responses
- Support for multiple AI model response formats
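
When a response mixes prose and several JSON objects, one possible approach is a brace-balanced scan that skips string literals and keeps only spans that `JSON.parse` accepts. The sketch below illustrates that idea under stated assumptions: `extractJSONCandidates` and `findBalancedEnd` are hypothetical names, and only top-level objects (not bare arrays) are treated as candidates so that citation markers such as `[1]` are not picked up.

```typescript
// Hypothetical helpers: not part of the service API described in this document.
function findBalancedEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (ch === "\\") i++;                 // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i;            // index of the matching closer
    }
  }
  return -1; // no balanced closer found
}

function extractJSONCandidates(text: string): string[] {
  const candidates: string[] = [];
  let i = 0;
  while (i < text.length) {
    if (text.charAt(i) !== "{") {
      i++;
      continue;
    }
    const end = findBalancedEnd(text, i);
    if (end === -1) {
      i++;
      continue;
    }
    const candidate = text.slice(i, end + 1);
    try {
      JSON.parse(candidate);                // bracket types are validated here
      candidates.push(candidate);
      i = end + 1;                          // resume after the accepted span
    } catch {
      i++;                                  // not valid JSON; keep scanning
    }
  }
  return candidates;
}
```

The scan is quadratic in the worst case (many unbalanced opening braces), which is usually acceptable for chat-sized responses but worth bounding with an input size limit.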

3. Schema Validation Engine

Responsibility: Validate JSON against predefined schemas
Features:
- JSON Schema Draft 7 compliance
- Custom validation rules
- Detailed validation error reporting
- Schema inference from sample data
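
Schema inference from sample data can be sketched as a recursive walk that maps each value to a small subset of JSON Schema keywords (`type`, `items`, `properties`, `required`). This is an illustration only: `inferSchema`, `InferredSchema`, and `JSONValue` are hypothetical names, arrays are assumed to be homogeneous, and every observed key is assumed to be required.

```typescript
// Hypothetical types and function; only a subset of Draft 7 keywords is produced.
type JSONValue = null | boolean | number | string | JSONValue[] | { [key: string]: JSONValue };

interface InferredSchema {
  type: "null" | "boolean" | "integer" | "number" | "string" | "array" | "object";
  items?: InferredSchema;
  properties?: Record<string, InferredSchema>;
  required?: string[];
}

function inferSchema(value: JSONValue): InferredSchema {
  if (value === null) return { type: "null" };
  if (Array.isArray(value)) {
    // Assumption: the first element is representative of the whole array.
    const first = value[0];
    return first === undefined ? { type: "array" } : { type: "array", items: inferSchema(first) };
  }
  if (typeof value === "boolean") return { type: "boolean" };
  if (typeof value === "number") return { type: Number.isInteger(value) ? "integer" : "number" };
  if (typeof value === "string") return { type: "string" };

  // Remaining case: a plain object. Every observed key is marked as required.
  const properties: Record<string, InferredSchema> = {};
  for (const [key, child] of Object.entries(value)) {
    properties[key] = inferSchema(child);
  }
  return { type: "object", properties, required: Object.keys(value) };
}

console.log(JSON.stringify(inferSchema({ id: 1, tags: ["a"], score: 1.5 }), null, 2));
```

A production version would likely merge schemas across several samples before deciding which keys are required.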

4. Template Processing Engine

Responsibility: Process JSON templates with variable substitution
Features:
- Mustache-style template syntax ({{variable}})
- Nested object traversal
- Conditional logic support
- Safe evaluation with XSS protection
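
Substitution with nested object traversal can be sketched as a single pass over the template that tracks whether the cursor is inside a string literal. Values placed inside a string are JSON-escaped; bare placeholders are emitted as complete JSON values. `renderTemplate` and `resolvePath` are hypothetical names, dotted paths such as `{{user.name}}` are an assumed syntax, and conditional logic is not covered.

```typescript
// Hypothetical helpers: an illustration of escaping-aware substitution, not the service's engine.
function resolvePath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split(".")) {
    // Follow own properties only, so "__proto__" or "constructor" cannot be reached.
    if (typeof current !== "object" || current === null ||
        !Object.prototype.hasOwnProperty.call(current, key)) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

function renderTemplate(template: string, variables: Record<string, unknown>): string {
  let out = "";
  let inString = false;
  let i = 0;
  while (i < template.length) {
    if (template.startsWith("{{", i)) {
      const close = template.indexOf("}}", i + 2);
      if (close === -1) throw new Error(`Unclosed placeholder at offset ${i}`);
      const path = template.slice(i + 2, close).trim();
      const value = resolvePath(variables, path);
      if (value === undefined) throw new Error(`Missing template variable: ${path}`);
      if (inString) {
        // Inside a string literal: embed escaped text without the outer quotes.
        const text = typeof value === "string" ? value : JSON.stringify(value);
        out += JSON.stringify(text).slice(1, -1);
      } else {
        // Bare placeholder: emit a complete JSON value.
        out += JSON.stringify(value);
      }
      i = close + 2;
      continue;
    }
    const ch = template.charAt(i);
    if (inString && ch === "\\") {
      out += template.slice(i, i + 2); // copy the escape sequence as a unit
      i += 2;
      continue;
    }
    if (ch === '"') inString = !inString;
    out += ch;
    i++;
  }
  return out;
}
```

Because values are always serialized with `JSON.stringify`, a variable containing quotes or backslashes cannot terminate the surrounding string and inject extra keys.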

API Specifications

Primary Interfaces

typescript

interface JSONParserOptions {
  strict?: boolean; // Default: false - allows relaxed parsing
  recoveryMode?: boolean; // Default: true - attempt error recovery
  maxDepth?: number; // Default: 100 - prevent stack overflow
  allowComments?: boolean; // Default: true - strip JSON comments
  allowTrailingCommas?: boolean; // Default: true
  allowUnquotedKeys?: boolean; // Default: false
  schema?: JSONSchema; // Optional schema validation
  templateVariables?: Record<string, any>; // For template processing
  formatOptions?: FormatOptions;
}

interface ParseResult<T = any> {
  success: boolean;
  data?: T;
  formatted?: string; // Pretty-printed JSON
  errors: ParseError[];
  warnings: ParseWarning[];
  metadata: {
    originalLength: number;
    formattedLength: number;
    processingTime: number;
    depth: number;
    keyCount: number;
    recoveryAttempts: number;
  };
}

interface ParseError {
  line: number;
  column: number;
  position: number;
  message: string;
  code: ErrorCode;
  severity: 'error' | 'warning' | 'info';
  suggestion?: string;
  context?: string; // Surrounding text for context
}

interface ValidationResult {
  valid: boolean;
  errors: ValidationError[];
  warnings: ValidationWarning[];
  schema?: JSONSchema;
}

// Main parsing functions (signatures only; `declare` keeps them valid TypeScript without bodies)
declare function parseJSON<T = any>(input: string, options?: JSONParserOptions): Promise<ParseResult<T>>;
declare function validateJSON(input: string, schema: JSONSchema): Promise<ValidationResult>;
declare function formatJSON(input: string, options?: FormatOptions): Promise<string>;
declare function processTemplate(template: string, variables: Record<string, any>): Promise<string>;
declare function extractJSONFromAIResponse(response: string, modelType?: 'gpt' | 'claude' | 'perplexity'): Promise<ParseResult[]>;

Core Implementation

typescript

// Based on existing implementations from RequestEditor.tsx and response handlers
class JSONParser {
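  // schema and templateVariables have no defaults, so they stay optional
  private options: Required<Omit<JSONParserOptions, 'schema' | 'templateVariables'>> &
    Pick<JSONParserOptions, 'schema' | 'templateVariables'>;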
  
  constructor(options: JSONParserOptions = {}) {
    this.options = {
      strict: false,
      recoveryMode: true,
      maxDepth: 100,
      allowComments: true,
      allowTrailingCommas: true,
      allowUnquotedKeys: false,
      formatOptions: { indent: 2, sortKeys: false },
      ...options
    };
  }

  public async parse<T = any>(input: string): Promise<ParseResult<T>> {
    const startTime = Date.now();
    const errors: ParseError[] = [];
    const warnings: ParseWarning[] = [];
    let recoveryAttempts = 0;

    try {
      // First attempt: Standard JSON.parse
      const data = JSON.parse(input) as T;
      const formatted = this.formatData(data);
      
      return {
        success: true,
        data,
        formatted,
        errors,
        warnings,
        metadata: this.generateMetadata(input, formatted, Date.now() - startTime, recoveryAttempts)
      };
    } catch (initialError) {
      // Skip recovery in strict mode or when recovery is disabled
      if (this.options.strict || !this.options.recoveryMode) {
        return this.createErrorResult(input, initialError as SyntaxError, startTime);
      }

      // Recovery mode: try to fix common issues
      const recoveryResult = await this.attemptRecovery(input);
      recoveryAttempts = recoveryResult.attempts;
      
      if (recoveryResult.success) {
        warnings.push({
          message: 'JSON was auto-corrected during parsing',
          code: 'AUTO_RECOVERY',
          severity: 'warning',
          suggestions: recoveryResult.changes
        });
        
        const formatted = this.formatData(recoveryResult.data);
        return {
          success: true,
          data: recoveryResult.data as T,
          formatted,
          errors,
          warnings,
          // Measure the formatted output, matching the non-recovery path
          metadata: this.generateMetadata(input, formatted, Date.now() - startTime, recoveryAttempts)
        };
      }

      return this.createErrorResult(input, recoveryResult.error as SyntaxError, startTime, recoveryAttempts);
    }
  }

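  private async attemptRecovery(input: string): Promise<RecoveryResult> {
    // Each strategy carries an explicit label: the name of a bound function is
    // reported as "bound <method>", which is not useful in a change log.
    const strategies: Array<[string, (text: string) => string]> = [
      ['removeTrailingCommas', this.removeTrailingCommas.bind(this)],
      ['addMissingQuotes', this.addMissingQuotes.bind(this)],
      ['fixCommonTypos', this.fixCommonTypos.bind(this)],
      ['removeComments', this.removeComments.bind(this)],
      ['extractFromCodeBlock', this.extractFromCodeBlock.bind(this)]
    ];

    let lastError: Error = new Error('No recovery strategy produced valid JSON');
    const changes: string[] = [];
    let current = input;

    for (let i = 0; i < strategies.length; i++) {
      const [name, strategy] = strategies[i];

      // Fixes accumulate: each strategy runs on the output of the previous ones
      const corrected = strategy(current);
      if (corrected === current) continue; // nothing changed, so re-parsing is pointless
      changes.push(name);
      current = corrected;

      try {
        const data = JSON.parse(current);
        return {
          success: true,
          data,
          correctedInput: current,
          attempts: i + 1,
          changes
        };
      } catch (error) {
        lastError = error as Error;
      }
    }

    return {
      success: false,
      error: lastError,
      attempts: strategies.length,
      changes
    };
  }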

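  private removeTrailingCommas(input: string): string {
    if (!this.options.allowTrailingCommas) return input;

    // Remove trailing commas before closing braces/brackets.
    // Limitation: the regex is not string-aware, so ",}" or ",]" inside a string value is also rewritten.
    return input
      .replace(/,\s*}/g, '}')
      .replace(/,\s*]/g, ']');
  }

  private addMissingQuotes(input: string): string {
    if (!this.options.allowUnquotedKeys) return input;

    // Quote unquoted object keys (basic implementation; not string-aware)
    return input.replace(/([{,])\s*([a-zA-Z_$][a-zA-Z0-9_$]*)\s*:/g, '$1"$2":');
  }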

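  private removeComments(input: string): string {
    if (!this.options.allowComments) return input;
    
    // Remove // and /* */ comments. String literals are matched first and kept as-is,
    // so sequences such as "http://example.com" inside values are not truncated.
    return input.replace(
      /("(?:\\.|[^"\\])*")|\/\*[\s\S]*?\*\/|\/\/.*$/gm,
      (_match, stringLiteral?: string) => stringLiteral ?? ''
    );
  }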

  private extractFromCodeBlock(input: string): string {
    // Extract JSON from markdown code blocks (common in AI responses)
    const codeBlockMatch = input.match(/```(?:json)?([\s\S]*?)```/);
    return codeBlockMatch ? codeBlockMatch[1].trim() : input;
  }

  private formatData(data: any): string {
    return JSON.stringify(data, null, this.options.formatOptions?.indent || 2);
  }

  private createErrorResult(input: string, error: SyntaxError, startTime: number, recoveryAttempts = 0): ParseResult {
    const parseError = this.createDetailedError(error, input);
    
    return {
      success: false,
      errors: [parseError],
      warnings: [],
      metadata: this.generateMetadata(input, '', Date.now() - startTime, recoveryAttempts)
    };
  }

  private createDetailedError(error: SyntaxError, input: string): ParseError {
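    // Extract the character offset from the error message. Only V8-style messages
    // ("... at position 15") carry one; on other engines this falls back to offset 0.
    const match = error.message.match(/at position (\d+)/);
    const position = match ? parseInt(match[1], 10) : 0;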
    
    const { line, column } = this.getLineColumn(input, position);
    const context = this.getContext(input, position);
    
    return {
      line,
      column,
      position,
      message: this.enhanceErrorMessage(error.message),
      code: this.getErrorCode(error.message),
      severity: 'error',
      context,
      suggestion: this.generateSuggestion(error.message, context)
    };
  }

  // AI Response Processing
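  // modelType matches the interface signature but is not yet used to select a model-specific strategy
  public async extractJSONFromAIResponse(response: string, modelType?: 'gpt' | 'claude' | 'perplexity'): Promise<ParseResult[]> {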
    const results: ParseResult[] = [];
    
    // Strategy 1: Look for code blocks
    const codeBlockRegex = /```(?:json)?\s*([\s\S]*?)```/g;
    let match;
    
    while ((match = codeBlockRegex.exec(response)) !== null) {
      const jsonCandidate = match[1].trim();
      if (jsonCandidate) {
        const result = await this.parse(jsonCandidate);
        results.push(result);
      }
    }
    
    // Strategy 2: Look for a standalone JSON object.
    // The greedy pattern spans from the first "{" to the last "}", so it yields at most one
    // candidate; responses with several objects separated by prose need a brace-balanced scan.
    if (results.length === 0) {
      const objectMatch = response.match(/{[\s\S]*}/);
      if (objectMatch) {
        const result = await this.parse(objectMatch[0]);
        if (result.success) {
          results.push(result);
        }
      }
    }
    
    // Strategy 3: Try parsing the entire response
    if (results.length === 0) {
      const fullResult = await this.parse(response);
      results.push(fullResult);
    }
    
    return results;
  }

  // Template Processing
  public async processTemplate(template: string, variables: Record<string, any>): Promise<string> {
    let processed = template;
    
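    // Replace {{variable}} patterns
    Object.entries(variables).forEach(([key, value]) => {
      // Escape regex metacharacters so keys such as "user.name" or "a+b" match literally
      const escapedKey = key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      const pattern = new RegExp(`\\{\\{\\s*${escapedKey}\\s*\\}\\}`, 'g');
      // Strings are JSON-escaped without their outer quotes, assuming string placeholders
      // sit inside a quoted value ("{{name}}"); other types are serialized whole
      const replacement = typeof value === 'string' ? JSON.stringify(value).slice(1, -1) : JSON.stringify(value);
      // A replacer function keeps "$&" or "$1" in the value from being read as substitution patterns
      processed = processed.replace(pattern, () => replacement);
    });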
    
    // Validate the processed template is valid JSON
    const result = await this.parse(processed);
    if (!result.success) {
      throw new Error(`Template processing resulted in invalid JSON: ${result.errors[0]?.message}`);
    }
    
    return result.formatted || processed;
  }

  // Schema Validation
  public async validateAgainstSchema(data: any, schema: JSONSchema): Promise<ValidationResult> {
    // Implement JSON Schema validation
    // This would typically use a library like ajv
    const errors: ValidationError[] = [];
    const warnings: ValidationWarning[] = [];
    
    try {
      // Simplified validation logic - in practice would use ajv or similar
      const isValid = this.performSchemaValidation(data, schema, errors, warnings);
      
      return {
        valid: isValid,
        errors,
        warnings,
        schema
      };
    } catch (error) {
      errors.push({
        path: '',
        message: `Schema validation failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
        code: 'SCHEMA_ERROR',
        severity: 'error'
      });
      
      return {
        valid: false,
        errors,
        warnings,
        schema
      };
    }
  }
}

// Enhanced error codes
enum ErrorCode {
  SYNTAX_ERROR = 'SYNTAX_ERROR',
  UNEXPECTED_TOKEN = 'UNEXPECTED_TOKEN',
  UNEXPECTED_END = 'UNEXPECTED_END',
  INVALID_CHARACTER = 'INVALID_CHARACTER',
  MISSING_QUOTES = 'MISSING_QUOTES',
  TRAILING_COMMA = 'TRAILING_COMMA',
  SCHEMA_VIOLATION = 'SCHEMA_VIOLATION',
  TEMPLATE_ERROR = 'TEMPLATE_ERROR',
  RECOVERY_FAILED = 'RECOVERY_FAILED'
}
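
The `maxDepth` option and the `depth` and `keyCount` metadata fields are declared above, but the code that computes them is not shown. The following is a minimal sketch of one way to do it, assuming the check runs on the already-parsed value; `measureStructure` and `StructureStats` are hypothetical names.

```typescript
// Hypothetical helper: computes nesting depth and key count, and enforces a depth limit.
interface StructureStats {
  depth: number;    // deepest nesting level of objects/arrays (0 for a scalar)
  keyCount: number; // total number of object keys at every level
}

function measureStructure(value: unknown, maxDepth = 100): StructureStats {
  let depth = 0;
  let keyCount = 0;
  // An explicit stack keeps the traversal iterative, so deep input cannot overflow the call stack.
  const stack: Array<{ node: unknown; level: number }> = [{ node: value, level: 1 }];
  while (stack.length > 0) {
    const entry = stack.pop();
    if (entry === undefined) break;
    const { node, level } = entry;
    if (typeof node !== "object" || node === null) continue; // scalars add no depth
    if (level > maxDepth) {
      throw new RangeError(`JSON nesting exceeds maxDepth of ${maxDepth}`);
    }
    depth = Math.max(depth, level);
    const children: unknown[] = Array.isArray(node) ? node : Object.values(node);
    if (!Array.isArray(node)) keyCount += children.length;
    for (const child of children) {
      stack.push({ node: child, level: level + 1 });
    }
  }
  return { depth, keyCount };
}

const stats = measureStructure(JSON.parse('{"a": {"b": [1, {"c": 2}]}, "d": 3}'));
console.log(stats); // { depth: 4, keyCount: 4 }
```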

Integration Points

1. AI Response Handlers

GPT Response Processing: Extract and validate JSON from OpenAI API responses
Claude Response Processing: Handle Anthropic's response format with embedded JSON
Perplexity Processing: Parse structured responses with citations

2. Request Editor Integration

Template Validation: Ensure request templates are valid JSON with proper placeholder syntax
Real-time Validation: Provide immediate feedback during editing
Format Assistance: Auto-format and beautify JSON content

3. Configuration Management

Settings Validation: Validate application configuration JSON
API Configuration: Parse and validate API endpoint configurations
User Preferences: Handle user preference JSON structures

Error Handling

Expected Error Cases

Syntax Errors
- Missing commas, brackets, or quotes
- Trailing commas in strict mode
- Invalid escape sequences
- Unexpected characters
Semantic Errors
- Schema validation failures
- Missing required properties
- Type mismatches
- Circular references (only when serializing in-memory objects; JSON text itself cannot express them)
AI Response Issues
- Embedded JSON in text responses
- Malformed AI-generated JSON
- Mixed content types
- Encoding issues

Error Recovery Strategies

Progressive Enhancement: Try multiple parsing strategies in order of likelihood
Contextual Suggestions: Provide specific suggestions based on error type and context
Partial Success: Extract valid parts of malformed JSON when possible
User-Friendly Messages: Convert technical errors into actionable feedback
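
Recovery steps based on plain regular expressions can corrupt string values that happen to contain `//`, `,}` or similar sequences. One alternative is a single string-aware pass that copies string literals through untouched while dropping comments and trailing commas. This is a sketch under that assumption, and `relaxedToStrict` is a hypothetical name.

```typescript
// Hypothetical helper: converts relaxed JSON (comments, trailing commas) to strict JSON text.
function relaxedToStrict(input: string): string {
  let out = "";
  let i = 0;
  while (i < input.length) {
    const ch = input.charAt(i);
    const next = input.charAt(i + 1);
    if (ch === '"') {
      // Copy the whole string literal verbatim, honoring backslash escapes.
      const start = i;
      i++;
      while (i < input.length && input.charAt(i) !== '"') {
        i += input.charAt(i) === "\\" ? 2 : 1;
      }
      i++; // step past the closing quote
      out += input.slice(start, i);
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line, keeping the newline itself.
      while (i < input.length && input.charAt(i) !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip to the closing marker, or to the end if unterminated.
      const close = input.indexOf("*/", i + 2);
      i = close === -1 ? input.length : close + 2;
    } else if (ch === "}" || ch === "]") {
      // A comma directly before a closer (ignoring whitespace) is a trailing comma.
      const trimmed = out.trimEnd();
      if (trimmed.endsWith(",")) {
        out = trimmed.slice(0, -1) + out.slice(trimmed.length);
      }
      out += ch;
      i++;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const relaxedInput = '{\n  // note\n  "url": "http://x.test/a,}", /* c */\n  "list": [1, 2, ],\n}';
console.log(JSON.parse(relaxedToStrict(relaxedInput)));
```

The pass does not quote bare keys or repair other damage; it only removes the two constructs that strict JSON forbids, and leaves any remaining error for the parser to report.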

Security Considerations

Input Sanitization
- Prevent JSON injection attacks
- Limit recursion depth to prevent stack overflow
- Validate input size to prevent DoS
- Escape user-generated content in templates
Template Security
- Safe variable substitution without code execution
- XSS prevention in web contexts
- Input validation for template variables

5. Implementation Plan

Phase 1: Core JSON Processing (Weeks 1-2)

Basic Parser with Error Recovery
- Standard JSON parsing with enhanced error messages
- Common error recovery strategies
- Line-by-line error reporting
AI Response Integration
- Extract JSON from markdown code blocks
- Handle mixed JSON/text responses
- Integration with existing response handlers

Phase 2: Advanced Features (Weeks 3-4)

Schema Validation Engine
- JSON Schema Draft 7 support
- Custom validation rules
- Detailed error reporting with suggestions
Template Processing
- Variable substitution with {{}} syntax
- Safe evaluation engine
- Integration with RequestEditor

Phase 3: Developer Tools & Polish (Week 5)

Editor Integration
- CodeMirror linting integration
- Real-time validation feedback
- Syntax highlighting enhancements
Performance Optimization
- Large JSON handling
- Streaming parser for huge datasets
- Memory usage optimization

Dependencies

Internal: Shared error handling service, editor integration APIs
External: CodeMirror for editor features, potential JSON Schema library (ajv)
Development: TypeScript 5+, Jest for testing, performance benchmarking tools

Testing Strategy

Unit Tests
- All parsing scenarios (valid/invalid JSON)
- Error recovery mechanisms
- Template processing edge cases
- Schema validation accuracy
Integration Tests
- AI response processing end-to-end
- Editor component integration
- Performance with large JSON files
- Real-world malformed JSON scenarios
Performance Tests
- Parsing speed benchmarks
- Memory usage profiling
- Error recovery performance impact

6. Alternatives Considered

Third-Party JSON Libraries

JSON5: Extended JSON format with comments and trailing commas
- Pros: Built-in support for relaxed JSON parsing
- Cons: Different standard, limited ecosystem
- Decision: Incorporate features but maintain JSON compatibility

Server-Side Processing

Backend JSON Processing: Move complex parsing to server
- Pros: More processing power, centralized logic
- Cons: Network latency, reduced offline capability
- Decision: Keep client-side for responsiveness, server for heavy processing

Streaming JSON Parsers

SAX-style JSON Parsing: Process JSON without full memory loading
- Pros: Handle very large JSON files
- Cons: Complex implementation, limited use cases
- Decision: Phase 3 enhancement for specific large data scenarios

7. Open Questions

Schema Evolution: How should we handle schema versioning and migration?
Large File Handling: What's the practical limit for client-side JSON processing?
AI Model Integration: Should we have model-specific parsing strategies?
Caching Strategy: Should we cache parsed results for frequently accessed JSON?
Internationalization: How should we handle JSON with international characters and encoding issues?
Real-time Collaboration: Future consideration for collaborative JSON editing?

8. Appendix

Glossary

JSON Schema: A vocabulary that allows you to annotate and validate JSON documents
Error Recovery: Techniques to parse malformed input by making intelligent corrections
Template Substitution: Replacing placeholder variables in JSON templates with actual values
Linting: Real-time validation and error checking during editing

Revision History

v0.1.0 (2025-08-12): Initial comprehensive specification based on existing implementations
v0.0.0.1 (2025-08-09): Initial file creation