Refactored FileSystemObserver to Remove YAML Libraries

Summary

Removed all YAML libraries from the FileSystemObserver system and replaced with regex-based frontmatter parsing to fix date formatting inconsistencies and prevent infinite update loops.

Why Care

The observer system was causing inconsistent date formatting across content files, adding timestamps and quotes that violated our style guidelines. This created an infinite loop of changes and made content files difficult to read and compare. The fix ensures consistent date formatting across all content files and prevents future formatting issues.

Implementation

Changes Made

  • Removed js-yaml library imports from all observer files:
    • /tidyverse/observers/fileSystemObserver.ts
    • /tidyverse/observers/services/templateRegistry.ts
    • /tidyverse/observers/services/openGraphService.ts
    • /tidyverse/observers/services/citationService.ts
  • Replaced YAML parsing with regex-based frontmatter extraction in:
    • /tidyverse/observers/fileSystemObserver.ts - Updated extractFrontmatter function
    • /tidyverse/observers/services/openGraphService.ts - Added extractFrontmatterForOpenGraph function
    • /tidyverse/observers/services/citationService.ts - Updated extractFrontmatterAndBody function
  • Enhanced formatFrontmatter function in /tidyverse/observers/fileSystemObserver.ts:
    • Added export to make it available to other modules
    • Added neverQuoteFields list to prevent quotes around title, lede, etc.
    • Improved date field handling using the Single Source of Truth formatDate utility
  • Created and ran one-off script /tidyverse/observers/scripts/fix-date-timestamps.ts to fix existing files:
    • Processed 329 files in vocabulary directory
    • Processed 50 files in prompts directory
    • Removed timestamps and quotes from all date fields
  • Updated /tidyverse/observers/index.ts to use the correct content root path

Technical Details

  • Regex-based frontmatter parsing implementation:
typescript
// From fileSystemObserver.ts
async function extractFrontmatter(
  content: string, 
  filePath: string, 
  reportingService: ReportingService
): Promise<{frontmatter: Record<string, any> | null, hasConversions: boolean, convertedFrontmatter?: Record<string, any>}> {
  // Check if content has frontmatter
  if (!content.startsWith('---')) {
    return { frontmatter: null, hasConversions: false };
  }
  
  // Find the end of frontmatter
  const endIndex = content.indexOf('---', 3);
  if (endIndex === -1) {
    return { frontmatter: null, hasConversions: false };
  }
  
  // Extract frontmatter content
  const frontmatterContent = content.substring(3, endIndex).trim();
  
  // Parse frontmatter using regex, not YAML library
  const frontmatter: Record<string, any> = {};
  // ... parsing logic ...
}
  • Enhanced date formatting to prevent quotes and timestamps:
typescript
// From fileSystemObserver.ts
export function formatFrontmatter(frontmatter: Record<string, any>): string {
  // Fields that should never have quotes, even if they contain special characters
  const neverQuoteFields = ['title', 'lede', 'category', 'status', 'augmented_with'];
  
  // Handle date fields specially to avoid quotes and timestamps
  if (key.startsWith('date_') && value) {
    // Use the formatDate utility function from commonUtils
    const formattedDate = formatDate(value);
    yamlContent += `${key}: ${formattedDate}\n`;
  }
  // Never add quotes to certain fields, even if they contain special characters
  else if (neverQuoteFields.includes(key)) {
    yamlContent += `${key}: ${value}\n`;
  }
  // ... other formatting logic ...
}
  • Single Source of Truth date formatting:
typescript
// From commonUtils.ts
export function formatDate(dateValue: any): string | null {
  if (!dateValue || dateValue === 'null') {
    return null;
  }

  try {
    // If it's already in YYYY-MM-DD format, return it
    if (typeof dateValue === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(dateValue)) {
      return dateValue;
    }

    // Handle ISO string format with time component
    if (typeof dateValue === 'string' && dateValue.includes('T')) {
      // Just extract the date part
      return dateValue.split('T')[0];
    }

    // Format as YYYY-MM-DD
    const date = new Date(dateValue);
    const year = date.getFullYear();
    const month = String(date.getMonth() + 1).padStart(2, '0');
    const day = String(date.getDate()).padStart(2, '0');
    return `${year}-${month}-${day}`;
  } catch (error) {
    console.error(`Error formatting date:`, error);
    return null;
  }
}

Integration Points

  • The updated observer system works with all content directories:
    • /content/vocabulary/
    • /content/tooling/
    • /content/lost-in-public/prompts/
  • The fix-date-timestamps.ts script can be run on any content directory to standardize date formatting:
bash
cd tidyverse/observers/scripts && npx ts-node fix-date-timestamps.ts <directory-path>

Documentation

  • Created issue resolution document:
    • /content/lost-in-public/issue-resolution/2025-04-11_Frontmatter--Date-formatting-fix.md
    • Detailed explanation of the issue, solution, and lessons learned
    • Instructions for running the fix script on other directories
  • Key lessons documented:
    1. Avoid YAML libraries for frontmatter parsing
    2. Use regex-based parsing for better control
    3. Implement Single Source of Truth for date formatting
    4. Handle special fields (title, lede) explicitly