YAML Frontmatter Error Detection and Correction System
YAML Frontmatter Error Detection and Correction System
Executive Summary
Our content management system relies heavily on YAML frontmatter in markdown files to drive site functionality, metadata, and content organization. With over 700 markdown files and growing, maintaining consistency in YAML formatting has become increasingly challenging, especially with the integration of AI assistants and automated content generation tools.
Property | Desired Syntax | Irregular Syntax | Level of Problem |
title | title: Technical Specification: YAML Frontmatter Error Detection and Correction System | title: "Technical Specification: YAML Frontmatter Error Detection and Correction System" | Uncertain |
site_url | docs_url: https://developers.raycast.com/ |
We've developed a robust error detection and correction system that:
- Identifies 11 distinct types of YAML formatting issues across the content library
- Automatically corrects formatting problems while preserving content integrity
- Generates detailed reports for each type of correction
- Prevents critical errors from affecting downstream build processes
- Maintains proper quoting conventions for different property types
In our initial deployment, the system successfully processed 729 files, identifying and correcting multiple issues:
- 290 error message properties requiring proper quotes
- 555 files with improper character sets around error messages
- 265 URL properties with unnecessary quotes
- 300 files missing URL properties
- 273 UUID properties requiring quote removal
- 274 timestamp properties needing quote standardization
- 158 files with inconsistent tag syntax requiring normalization
This represents a significant improvement in content quality and build reliability with zero manual intervention required.
Technical Specification
1. System Architecture
graph TD
A[Markdown Files] --> B[Error Detection System]
B --> C{Error Types}
C --> D[Quote Issues]
C --> E[Character Sets]
C --> F[URL Formatting]
C --> G[Block Scalar]
C --> H[Duplicate Keys]
C --> I[Missing Properties]
C --> J[Tag Syntax]
D --> K[Correction Functions]
E --> K
F --> K
G --> K
H --> K
I --> K
J --> K
K --> L[Report Generation]
L --> M[Individual Reports]
L --> N[Summary Report]
M --> O[File System]
N --> O
2. Core Components
2.1 Configuration System (getUserOptions.cjs
)
javascript
const USER_OPTIONS = {
frontmatterPropertySets: {
urlProperties: ['url', 'image', 'favicon', 'og_screenshot_url', ...],
errorMessageProperties: ['jina_error', 'og_errors', 'og_error_message'],
plainTextProperties: ['description', 'og_description', 'zinger', ...],
timestampProperties: ['og_last_error', 'og_error_message', ...],
tagProperties: ['tags']
}
};
2.2 Error Cases Registry (getKnownErrorsAndFixes.cjs
)
Each error case defines:
- Detection regex pattern
- Example errors
- Proper syntax
- Prevention operations
- Correction function
- Criticality level
Example structure:
javascript
const knownErrorCases = {
unquotedErrorMessageProperty: {
detectError: /pattern/,
messageToLog: 'Description',
preventsOperations: ['operation1'],
correctionFunction: 'functionName',
isCritical: boolean
},
tagsMayHaveInconsistentSyntax: {
detectError: new RegExp(/(?:tags:\s*(?:\[.*?\]|.*?,.*?|['"].*?['"])|(?:^|\n)\s*-\s*\w+[^\S\n]+\w+)/),
exampleErrors: [
'tags: ["tag1", "tag2"]',
'tags: tag1, tag2',
'tags: "tag1", "tag2"',
'tags:\n- Tag With Spaces'
],
properSyntax: `tags:\n- tag-one\n- tag-two`,
messageToLog: 'Tags may have inconsistent syntax',
preventsOperations: ['assureYAMLPropertiesCorrect.cjs', "getCollection('tooling')"],
correctionFunction: 'assureOrFixTagSyntaxInFrontmatter',
isCritical: true
}
// ... additional cases
};
Troubleshooting quotes on urls
javascript
const knownErrorCases = {
quoteCharactersFoundOnEitherOrBothSidesOfUrl:
exampleErros: [
'og_screenshot_url: https://og-screenshots-prod.s3.amazonaws.com/1366x768/80/false/5916148b9afbd26e770c8ff3838ad81a0d97176ab6cba9887cb83e17bc3b7d80.jpeg""'
]
}
3. Error Types and Correction Strategies
3.1 Quote-Related Issues
- Unquoted Error Messagesyaml
# Before error_message: Error 404 not found # After error_message: 'Error 404 not found'
- Improper Character Setsyaml
# Before jina_error: """Error occurred""" # After jina_error: 'Error occurred'
- URL Quote Issuesyaml
# Before url: "https://example.com" # After url: https://example.com
3.2 Structural Issues
- Block Scalar Syntaxyaml
# Before description: >- Multiple lines # After description: Multiple lines
- Duplicate Keysyaml
# Before title: First description: Text title: Second # After title: Second description: Text
3.7 Tag Syntax Issues
- Array Syntax with Quotesyaml
# Before tags: ["Technology-Consultants", "Organizations"] # After tags: - Technology-Consultants - Organizations
- Comma-Separated Tagsyaml
# Before tags: Technology-Consultants, Organizations # After tags: - Technology-Consultants - Organizations
- Space-Separated Wordsyaml
# Before tags: - Technology Consultants - Organizations # After tags: - Technology-Consultants - Organizations
4. Correction Functions
4.1 Error Message Property Correction
javascript
async surroundErrorMessagePropertiesWithSingleMarkQuotes(markdownContent, markdownFilePath) {
const frontmatterData = helperFunctions.extractFrontmatter(markdownContent);
// Process each error message property
// Add single quotes if missing
// Return modified content
}
4.2 Character Set Correction
javascript
async removeImproperCharacterSetAddSingleMarkQuotes(markdownFileContent, markdownFilePath) {
// Remove multiple/mixed quotes
// Add single quotes
// Return standardized content
}
4.3 URL Property Correction
javascript
async removeAnyQuoteCharactersfromEitherOrBothSidesOfURL(markdownFileContent, markdownFilePath) {
// Remove all quotes from URL properties
// Preserve the URL itself
// Return cleaned content
}
4.7 Tag Syntax Correction
javascript
async assureOrFixTagSyntaxInFrontmatter(markdownContent, markdownFilePath) {
const frontmatterData = helperFunctions.extractFrontmatter(markdownContent);
// Process frontmatter lines
const lines = frontmatterData.frontmatterString.split('\n');
let modified = false;
let inTagsBlock = false;
let tagsArray = [];
let tagsStartIndex = -1;
// Process each line for tag formatting
for (let i = 0; i < lines.length; i++) {
const line = lines[i].trim();
if (line.startsWith('tags:')) {
inTagsBlock = true;
tagsStartIndex = i;
// Handle inline tags
const tagsContent = line.substring(5).trim();
if (tagsContent) {
const rawTags = tagsContent
.replace(/^\[|\]$/g, '') // Remove array brackets
.replace(/["']/g, '') // Remove quotes
.split(',') // Split by commas
.map(tag => tag.trim()) // Clean whitespace
.filter(tag => tag); // Remove empty
tagsArray = rawTags.map(tag =>
tag.replace(/\s+/g, '-') // Replace spaces
);
modified = true;
}
continue;
}
// Process bullet list tags
if (inTagsBlock && line.startsWith('-')) {
const tag = line.substring(1).trim()
.replace(/["']/g, '') // Remove quotes
.replace(/\s+/g, '-'); // Replace spaces
tagsArray.push(tag);
modified = true;
continue;
}
// End of tags block
if (inTagsBlock && !line.startsWith('-') && line.trim()) {
inTagsBlock = false;
}
}
// Return if no changes needed
if (!modified) {
return {
success: true,
modified: false,
filePath: markdownFilePath
};
}
// Reconstruct frontmatter with proper tag format
const beforeTags = lines.slice(0, tagsStartIndex);
const afterTags = lines.slice(tagsStartIndex + 1)
.filter(line => !line.trim().startsWith('-') || !inTagsBlock);
const formattedTags = ['tags:']
.concat(tagsArray.map(tag => `- ${tag}`));
const newFrontmatter = beforeTags
.concat(formattedTags)
.concat(afterTags)
.join('\n');
// Return modified content
return {
success: true,
modified: true,
filePath: markdownFilePath,
content: markdownContent.slice(0, frontmatterData.startIndex) +
'---\n' + newFrontmatter + '\n---' +
markdownContent.slice(frontmatterData.endIndex),
modifications: ['Reformatted tags to proper YAML bullet list syntax']
};
}
5. Helper Functions
5.1 Frontmatter Extraction
javascript
extractFrontmatter(markdownFileContent) {
// Find opening delimiter
// Extract content
// Handle missing closing delimiter
// Return frontmatter data
}
5.2 Result Standardization
javascript
createSuccessMessage(markdownFilePath, wasModified, modifications = []) {
return {
success: true,
modified: wasModified,
modifications,
filePath: markdownFilePath,
fileName: path.basename(markdownFilePath),
errors: []
};
}
6. Processing Workflow
sequenceDiagram
participant MF as Markdown Files
participant EP as Error Processor
participant CF as Correction Functions
participant RP as Report Generator
MF->>EP: Read Files
EP->>EP: Extract Frontmatter
loop Each Error Case
EP->>EP: Apply Detection Pattern
alt Error Found
EP->>CF: Apply Correction
CF->>EP: Return Modified Content
EP->>MF: Write Changes
end
end
EP->>RP: Send Results
RP->>RP: Generate Reports
7. Report Generation
7.1 Individual Error Reports
markdown
---
title: Error Type Name
date: YYYY-MM-DD
---
## Summary of Files Processed
Files processed: X
Files with issue: Y
Successful corrections: Z
### Files with Issues
[[file1]], [[file2]], ...
### Files Successfully Corrected
[[file1]], [[file2]], ...
7.2 Summary Report
markdown
# Error Processing Summary for YYYY-MM-DD
## Overview
Total Reports Generated: X
## Report Statistics
### Report_Name
- Files Processed: X
- Issues Found: Y
- Corrections Made: Z
- Success Rate: W%
## Aggregate Statistics
- Total Files Processed: X
- Total Issues Found: Y
- Total Corrections Made: Z
- Overall Success Rate: W%
8. Implementation Constraints
- Property Handling
- URL properties must never have quotes
- Error messages must have single quotes
- Timestamps must have consistent quote format
- UUIDs must not have quotes
- Error Detection
- Must check for exact property matches
- Must handle nested properties correctly
- Must preserve multiline values appropriately
- Report Generation
- Must generate reports for each error type
- Must create summary report
- Must track success rates
- File Processing
- Must handle missing delimiters gracefully
- Must preserve file content outside frontmatter
- Must maintain proper YAML structure
9. Performance Considerations
- File Processing
- Process files sequentially to manage memory
- Add delays between cases to prevent system overload
- Track and report progress regularly
- Error Detection
- Use efficient regex patterns
- Avoid unnecessary file reads
- Cache frontmatter extraction results
- Report Generation
- Write reports incrementally
- Use efficient file system operations
- Maintain consistent report format
10. Results and Impact
Our system has demonstrated significant success in maintaining YAML frontmatter quality:
- Processing Statistics
- 729 files processed
- 11 error cases checked
- 2,115 total corrections made
- Success Rates
- 100% of error message quote issues fixed
- 100% of URL quote issues resolved
- 96% of character set issues corrected
- 100% of timestamp format issues resolved
- 100% of tag syntax issues resolved
- Build Impact
- Zero YAML parsing errors in build
- Consistent metadata display
- Reliable OpenGraph data
Implementation Guidelines
For Developers
- Setup
- Clone repository
- Install dependencies
- Configure user options
- Set up report directories
- Running the System
- Execute main script
- Monitor progress
- Review reports
- Verify corrections
- Adding New Error Cases
- Define detection pattern
- Create correction function
- Add to known cases
- Test thoroughly
For AI Assistants
- Code Modification
- Preserve existing patterns
- Maintain helper functions
- Follow error handling patterns
- Document changes clearly
- Testing
- Verify pattern matches
- Check correction accuracy
- Validate report generation
- Monitor performance
- Reporting
- Use standard formats
- Include all statistics
- Document any issues
- Suggest improvements
This specification provides a comprehensive guide for understanding and extending our YAML frontmatter error detection and correction system. It ensures consistent handling of content while maintaining the integrity of our build process.