Frontmatter Date Formatting Fix
Issue Description
Content files across the repository had inconsistent date formatting in frontmatter:
- Some date fields contained timestamps (e.g., `2025-04-07T22:42:08.649Z`)
- Some date fields were quoted (e.g., `'2025-04-07'`)
- Some date fields had both issues
This inconsistency caused problems with the filesystem observer and content rendering.
Solution Implemented
Created a one-off script in `tidyverse/observers/scripts/fix-date-timestamps.ts` that:
- Uses the Single Source of Truth `formatDate` utility from `tidyverse/observers/utils/commonUtils.ts`
- Scans markdown files in specified directories
- Detects date fields with timestamps or quotes
- Converts all dates to the standard YYYY-MM-DD format without quotes
- Preserves all other frontmatter and content
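The detect-and-convert steps above can be sketched as a pure function over a file's text. This is a minimal illustration, not the actual contents of `fix-date-timestamps.ts`: the names `toPlainDate` and `fixFrontmatterDates` are hypothetical, and it assumes frontmatter is delimited by `---` lines and that date keys start with `date_`.

```typescript
// Hypothetical helpers for illustration; not the code in fix-date-timestamps.ts.

/** Reduce a raw YAML date value to YYYY-MM-DD, or return null if it is not recognizable. */
function toPlainDate(rawValue: string): string | null {
  // Strip surrounding quotes, then keep only the leading date part.
  const unquoted = rawValue.trim().replace(/^['"]|['"]$/g, '');
  const match = unquoted.match(/^(\d{4}-\d{2}-\d{2})(?:[T ].*)?$/);
  return match ? match[1] : null;
}

/** Rewrite date_* lines inside the frontmatter block; everything else is returned untouched. */
function fixFrontmatterDates(fileText: string): { text: string; changed: boolean } {
  // Assumption: frontmatter sits at the very top, between two --- lines.
  const fm = fileText.match(/^(---\r?\n)([\s\S]*?)(\r?\n---)/);
  if (!fm) return { text: fileText, changed: false };

  const fixedBody = fm[2]
    .split('\n')
    .map((line) => {
      const m = line.match(/^(date_\w+):\s*(.+?)\s*$/);
      if (!m) return line; // not a date field: preserve as-is
      const plain = toPlainDate(m[2]);
      return plain ? `${m[1]}: ${plain}` : line; // unrecognized values are left alone
    })
    .join('\n');

  if (fixedBody === fm[2]) return { text: fileText, changed: false };
  // Reassemble by slicing so the body after the frontmatter is never touched.
  const text = fm[1] + fixedBody + fm[3] + fileText.slice(fm[0].length);
  return { text, changed: true };
}
```

Returning a `changed` flag lets a caller skip writing files that are already in the standard format.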
Key Technical Components
- Date Detection:
  - Regex patterns to identify timestamps and quoted values:

```typescript
// Check if the date has a timestamp
if (value.includes('T') || /\d{4}-\d{2}-\d{2} \d{2}:\d{2}/.test(value)) {
  needsFixing = true;
  reason = 'timestamp';
}

// Direct check for the raw YAML content to find quoted dates
const datePattern = new RegExp(`${key}:\\s*['"]([^'"]+)['"]`);
const match = frontmatterContent.match(datePattern);
if (match) {
  needsFixing = true;
  reason = 'quotes (found in raw YAML)';
}
```

  - Direct examination of raw YAML to catch quoted dates that js-yaml automatically unquotes
- Formatting Logic:

```typescript
// Format the date properly and remove quotes
let formattedDate = formatDate(value);

// Remove quotes if they exist
if (typeof formattedDate === 'string') {
  formattedDate = formattedDate.replace(/^['"]|['"]$/g, '');
}
```
- Single Source of Truth Date Formatting:

```typescript
// From commonUtils.ts
function formatDate(dateValue: any): string | null {
  // If it's already in YYYY-MM-DD format, return it
  if (typeof dateValue === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(dateValue)) {
    return dateValue;
  }

  // Handle ISO string format with time component
  if (typeof dateValue === 'string' && dateValue.includes('T')) {
    // Just extract the date part
    return dateValue.split('T')[0];
  }

  // Format as YYYY-MM-DD
  const date = new Date(dateValue);
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, '0');
  const day = String(date.getDate()).padStart(2, '0');
  return `${year}-${month}-${day}`;
}
```
- YAML Generation:
  - Manual YAML construction to avoid js-yaml's automatic formatting
  - Special handling for date fields to prevent quotes and timestamps:

```typescript
// Handle date fields specially to avoid quotes and timestamps
if (key.startsWith('date_') && value) {
  // Format the date properly - ensure no quotes
  const formattedDate = formatDate(value);
  yamlContent += `${key}: ${formattedDate}\n`;
}
```
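One caveat about the fallback branch of `formatDate`: it reads the year, month, and day with local-time getters. If a `Date` object representing UTC midnight ever reaches that branch (for example, one produced by a YAML parser), a machine west of UTC could plausibly emit the previous day. The sketch below is a hypothetical variant, not the code in `commonUtils.ts`. It reads `Date` objects and epoch numbers in UTC, and declines free-form strings rather than guessing at their timezone; the name `formatDateUtc` is an assumption made for illustration.

```typescript
// Hypothetical UTC-based variant for illustration; not the formatDate in commonUtils.ts.
function formatDateUtc(dateValue: unknown): string | null {
  if (typeof dateValue === 'string') {
    // Strip optional surrounding quotes, then keep a leading YYYY-MM-DD if one is present.
    const unquoted = dateValue.trim().replace(/^['"]|['"]$/g, '');
    const m = unquoted.match(/^(\d{4}-\d{2}-\d{2})(?:$|[T ])/);
    return m ? m[1] : null; // free-form strings are rejected instead of parsed
  }

  // Epoch milliseconds are timezone-independent, so they can be read in UTC too.
  const date = typeof dateValue === 'number' ? new Date(dateValue) : dateValue;
  if (date instanceof Date && !Number.isNaN(date.getTime())) {
    // toISOString() always reports UTC, so the result does not depend on the machine's timezone.
    return date.toISOString().slice(0, 10);
  }
  return null;
}
```

Returning `null` for unrecognized input also makes the declared `string | null` return type meaningful, so callers can skip a field instead of writing a malformed date.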
Root Cause Analysis
The investigation revealed several critical issues:
- YAML Library Usage: The observer was using the `js-yaml` library, which was automatically:
  - Converting dates to timestamps
  - Adding quotes around strings with special characters
  - Using block scalar syntax for multi-line strings
- Infinite Loop: The observer would detect changes, fix them, but the fix would trigger another change detection, causing an endless cycle.
- Inconsistent Formatting: Different files were using different date formats, causing inconsistency across the codebase.
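The infinite loop described above is commonly broken by making the fix idempotent and skipping the write when nothing would change. The sketch below illustrates that guard under stated assumptions; it is not the observer's actual code. `FileStore`, `writeIfChanged`, and `createMemoryStore` are hypothetical names, and an in-memory store stands in for the filesystem so the example runs without touching disk.

```typescript
// Hypothetical storage interface; a real observer would wrap filesystem reads and writes.
interface FileStore {
  read(path: string): string | null;
  write(path: string, text: string): void;
}

/** Apply `format` to a file and write only if the result differs. Returns true when a write happened. */
function writeIfChanged(store: FileStore, path: string, format: (text: string) => string): boolean {
  const current = store.read(path);
  if (current === null) return false;
  const next = format(current);
  if (next === current) return false; // already normalized: no write, so no new change event
  store.write(path, next);
  return true;
}

/** In-memory store that counts writes, standing in for the filesystem. */
function createMemoryStore(files: Record<string, string>): FileStore & { writes: number } {
  const store: FileStore & { writes: number } = {
    writes: 0,
    read: (path: string): string | null => (path in files ? files[path] : null),
    write: (path: string, text: string): void => {
      files[path] = text;
      store.writes += 1;
    },
  };
  return store;
}

// Example formatter: remove quotes around date_* values. Running it twice changes nothing.
const unquoteDates = (text: string): string =>
  text.replace(/^(date_\w+):\s*['"]([^'"]+)['"]\s*$/gm, '$1: $2');
```

With this guard, the first pass over a quoted date writes once, and the change event triggered by that write finds nothing left to fix, so the cycle ends.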
Solution Details
The solution involved two major components:
- One-off Fix Script:
  - Created `fix-date-timestamps.ts` to standardize existing files
  - Fixed 329 of the 330 files processed in the vocabulary directory and processed 50 files in the prompts directory
  - Fixed all quoted dates and timestamps
- Observer Code Refactoring:
  - Removed all YAML libraries from the codebase
  - Replaced with regex-based frontmatter parsing
  - Implemented a custom `formatFrontmatter` function that:
    - Never adds quotes to title, lede, category, status, and augmented_with fields
    - Properly formats dates using the formatDate utility
    - Maintains consistent YAML formatting for arrays
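To make the refactoring concrete, here is a rough sketch of what regex-based parsing and manual serialization could look like. It is an assumption-laden illustration, not the observer's actual `formatFrontmatter`: the function names carry a `Sketch` suffix to mark them as hypothetical, only string scalars and block-style string arrays are handled, and the two-space array indentation is an arbitrary choice.

```typescript
type FrontmatterValue = string | string[];

// Fields that, per the list above, must never be quoted.
const NEVER_QUOTE = new Set(['title', 'lede', 'category', 'status', 'augmented_with']);

/** Parse `key: value` lines and block arrays with regexes. Returns null if no frontmatter is found. */
function parseFrontmatterSketch(fileText: string): Record<string, FrontmatterValue> | null {
  const fm = fileText.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!fm) return null;
  const data: Record<string, FrontmatterValue> = {};
  let currentArrayKey: string | null = null;
  for (const line of fm[1].split(/\r?\n/)) {
    const item = line.match(/^\s+-\s+(.*)$/);
    if (item && currentArrayKey) {
      (data[currentArrayKey] as string[]).push(item[1]);
      continue;
    }
    const pair = line.match(/^([A-Za-z_][\w-]*):\s*(.*)$/);
    if (!pair) continue;
    if (pair[2] === '') {
      // Simplification: a key with no inline value is treated as the start of an array.
      data[pair[1]] = [];
      currentArrayKey = pair[1];
    } else {
      data[pair[1]] = pair[2];
      currentArrayKey = null;
    }
  }
  return data;
}

/** Serialize frontmatter by hand so no library can re-quote or re-format values. */
function formatFrontmatterSketch(data: Record<string, FrontmatterValue>): string {
  const lines: string[] = [];
  for (const key of Object.keys(data)) {
    const value = data[key];
    if (Array.isArray(value)) {
      lines.push(`${key}:`);
      for (const item of value) lines.push(`  - ${item}`);
    } else if (key.startsWith('date_')) {
      // Unquoted YYYY-MM-DD: drop quotes and any time component.
      lines.push(`${key}: ${value.replace(/^['"]|['"]$/g, '').split(/[T ]/)[0]}`);
    } else if (NEVER_QUOTE.has(key)) {
      lines.push(`${key}: ${value.replace(/^['"]|['"]$/g, '')}`);
    } else {
      lines.push(`${key}: ${value}`);
    }
  }
  return `---\n${lines.join('\n')}\n---\n`;
}
```

Because the formatter's output parses back to the same data, a parse-then-format round trip on an already clean file is a no-op, which is the property the observer needs to avoid re-triggering itself.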
Execution Results
The script ran successfully:
- Processed 330 files in the vocabulary directory (329 fixed)
- Processed 50 files in the prompts directory
- Fixed all quoted dates and timestamps
Lessons Learned
- Avoid YAML Libraries: YAML libraries like `js-yaml` apply their own formatting defaults (date conversion, quoting, block scalars) and can cause unexpected formatting changes. Use regex-based parsing instead.
- Single Source of Truth: Using the existing `formatDate` utility ensured consistent date formatting across the codebase.
- Raw Content Examination: Sometimes examining the raw file content is necessary to detect formatting issues that get normalized during parsing.
- Explicit Field Handling: Some fields (like title, lede) should never have quotes, regardless of their content. This needs to be explicitly coded.
Future Considerations
- The filesystem observer has been updated to prevent these issues from occurring in new files.
- Consider adding a validation step in the CI pipeline to catch inconsistent date formatting.
- The script can be extended to process other content directories as needed.
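The CI validation idea above could start from a read-only check like the following. This is a hypothetical sketch: `findDateIssues` and `DateIssue` are illustrative names, and it assumes the same conventions as the rest of this document (frontmatter between `---` lines, date keys prefixed with `date_`, bare `YYYY-MM-DD` as the only accepted form).

```typescript
interface DateIssue {
  line: number; // 1-based line number within the file
  key: string;
  value: string;
}

/** Return every date_* frontmatter line whose value is not a bare YYYY-MM-DD. */
function findDateIssues(fileText: string): DateIssue[] {
  const fm = fileText.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!fm) return [];
  const issues: DateIssue[] = [];
  fm[1].split(/\r?\n/).forEach((text, index) => {
    const m = text.match(/^(date_\w+):\s*(.*?)\s*$/);
    if (m && !/^\d{4}-\d{2}-\d{2}$/.test(m[2])) {
      // +2 converts the 0-based index to a 1-based line and skips the opening --- line.
      issues.push({ line: index + 2, key: m[1], value: m[2] });
    }
  });
  return issues;
}
```

A CI job could run this over each markdown file and fail the build whenever the returned list is non-empty, reporting the file, line, and offending value.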
Script Location
The fix script is located at:
```text
tidyverse/observers/scripts/fix-date-timestamps.ts
```
Run with:
```bash
cd tidyverse/observers/scripts && npx ts-node fix-date-timestamps.ts <directory-path>
```