Frontmatter consistency through filesystem observer
Objective
Create a robust filesystem observer system that monitors Markdown files, validates frontmatter against predefined templates, and automatically corrects inconsistencies while preserving the most accurate metadata.
Implementation Status
This system has been successfully implemented in the tidyverse/observers directory with the following key features:
- Template-based frontmatter validation
- Automatic correction of missing required fields
- Special handling for date_created using file birthtime
- Kebab-case to snake_case property conversion
- Proper YAML formatting for tags and arrays
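As a minimal sketch of the kebab-case to snake_case conversion listed above, the helper below renames frontmatter keys and records what it renamed. The names `toSnakeCase` and `convertKeys` are hypothetical, and the rule that an existing snake_case key wins over its kebab-case twin is an assumption rather than documented behaviour.

```typescript
// Hypothetical helpers; the names and the collision rule are illustrative assumptions.
type Frontmatter = Record<string, unknown>;

function toSnakeCase(key: string): string {
  return key.replace(/-/g, "_");
}

function convertKeys(frontmatter: Frontmatter): {
  converted: Frontmatter;
  renamed: Array<[string, string]>;
} {
  const converted: Frontmatter = {};
  const renamed: Array<[string, string]> = [];
  // First pass: copy every key that contains no hyphen.
  for (const [key, value] of Object.entries(frontmatter)) {
    if (!key.includes("-")) converted[key] = value;
  }
  // Second pass: move hyphenated keys to their snake_case name,
  // keeping the existing value if that name is already taken.
  for (const [key, value] of Object.entries(frontmatter)) {
    if (!key.includes("-")) continue;
    const target = toSnakeCase(key);
    if (!(target in converted)) converted[target] = value;
    renamed.push([key, target]);
  }
  return { converted, renamed };
}
```

The `renamed` pairs are the kind of information a reporting step could log as property conversions.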
System Architecture
graph TD
A[File System] -->|File Events| B[FileSystemObserver]
B -->|Read File| C[Extract Frontmatter]
C -->|Validate| D[TemplateRegistry]
D -->|Get Template| E[Template Definitions]
C -->|Missing Fields?| F[addMissingRequiredFields]
F -->|Special Handling| G[date_created]
G -->|Compare with| H[File Birthtime]
F -->|Update| I[Write Updated File]
B -->|Log Activity| J[ReportingService]
J -->|Generate| K[Markdown Reports]
Data Flow
- File Detection:
File System (new/modified file) → FileSystemObserver (event) → Extract Frontmatter → Validate Against Template
- Field Processing:
Template Registry (find matching template) → Check Required Fields → Special Handling for date_created → Compare with File Birthtime → Keep Earlier Date
- Reporting Flow:
Observer Activity → ReportingService → Log Property Conversions → Generate Markdown Reports
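The reporting flow above could be sketched roughly as follows. The real ReportingService API is not shown in this document, so the class name `ReportingServiceSketch`, its methods, and the report layout are all assumptions made for illustration.

```typescript
// Hypothetical reporting sketch; not the actual ReportingService.
interface ConversionEntry {
  filePath: string;
  fromKey: string;
  toKey: string;
}

class ReportingServiceSketch {
  private conversions: ConversionEntry[] = [];

  // Record one property conversion performed by the observer.
  logConversion(filePath: string, fromKey: string, toKey: string): void {
    this.conversions.push({ filePath, fromKey, toKey });
  }

  // Build a Markdown report as a string; writing it to disk is left to the caller.
  generateReport(title = "Observer Report"): string {
    const lines = [
      `# ${title}`,
      "",
      `Property conversions: ${this.conversions.length}`,
      "",
    ];
    for (const c of this.conversions) {
      lines.push(`- ${c.filePath}: ${c.fromKey} → ${c.toKey}`);
    }
    return lines.join("\n");
  }
}
```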
Key Components
1. Template Registry
typescript
// Template definition pattern
interface Template {
id: string;
name: string;
description: string;
// Path matching rules
pathPatterns: string[];
// Schema definition
required: {
[key: string]: {
type: string;
description: string;
defaultValueFn?: (filePath: string) => any;
}
};
optional: {
[key: string]: {
type: string;
description: string;
defaultValueFn?: (filePath: string) => any;
}
};
}
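To illustrate how pathPatterns might drive template lookup, here is a small registry sketch. It assumes file paths are relative to the project root and use forward slashes, and it supports only a small glob subset (`**`, `*`, `?`). `TemplateRegistrySketch` and `globToRegExp` are hypothetical names; the real findTemplate may work differently.

```typescript
// Hypothetical registry sketch; only the fields needed for matching are modelled.
interface TemplateLike {
  id: string;
  pathPatterns: string[];
}

// Translate a small glob subset (**, *, ?) into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        pattern += "(?:.*/)?"; // "**/" matches zero or more directories
        i += 3;
      } else {
        pattern += ".*";
        i += 2;
      }
    } else if (ch === "*") {
      pattern += "[^/]*"; // "*" stays within one path segment
      i += 1;
    } else if (ch === "?") {
      pattern += "[^/]";
      i += 1;
    } else {
      pattern += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

class TemplateRegistrySketch {
  private templates: TemplateLike[] = [];

  register(template: TemplateLike): void {
    this.templates.push(template);
  }

  // Return the first registered template with a matching pattern, if any.
  findTemplate(filePath: string): TemplateLike | undefined {
    return this.templates.find((t) =>
      t.pathPatterns.some((p) => globToRegExp(p).test(filePath))
    );
  }
}
```

Returning the first match keeps lookup predictable when patterns overlap, at the cost of making registration order significant.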
2. File Observer
typescript
import * as fs from 'fs/promises';
import * as chokidar from 'chokidar';

class FileSystemObserver {
  private watcher: chokidar.FSWatcher;

  constructor(
    private templateRegistry: TemplateRegistry,
    private contentRoot: string
  ) {
    this.watcher = chokidar.watch(contentRoot);
  }

  // extractFrontmatter and writeUpdatedFile are omitted here for brevity
  async onFileChanged(filePath: string): Promise<void> {
    // Read file and extract frontmatter
    const content = await fs.readFile(filePath, 'utf8');
    const frontmatterResult = this.extractFrontmatter(content);
    if (!frontmatterResult.frontmatter) return;

    // Find matching template; nothing to validate if no template applies
    const template = this.templateRegistry.findTemplate(filePath);
    if (!template) return;

    // Add missing required fields
    const { updatedFrontmatter, changed } = addMissingRequiredFields(
      frontmatterResult.frontmatter,
      template,
      filePath
    );

    // Write updated file only if changes were made
    if (changed) {
      await this.writeUpdatedFile(filePath, updatedFrontmatter, frontmatterResult.content);
    }
  }
}
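The observer above relies on an extractFrontmatter step that is not shown. A minimal sketch of what it might do follows, assuming frontmatter is delimited by `---` lines at the top of the file. This version only understands flat `key: value` lines and is not a YAML parser; a real implementation would most likely delegate to one.

```typescript
// Hypothetical sketch of extractFrontmatter; handles flat "key: value" lines only.
interface FrontmatterResult {
  frontmatter: Record<string, string> | null;
  content: string;
}

function extractFrontmatter(raw: string): FrontmatterResult {
  // Capture the block between the opening and closing "---" lines, then the body.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(raw);
  if (!match) return { frontmatter: null, content: raw };

  const frontmatter: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank lines and lines without a key
    frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { frontmatter, content: match[2] ?? "" };
}
```

Returning the body separately is what lets the write step replace the frontmatter while leaving the content untouched.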
3. Special Handling for date_created
typescript
// In addMissingRequiredFields function
if (key === 'date_created') {
try {
// Get file birthtime
const fs = require('fs');
if (fs.existsSync(filePath)) {
const stats = fs.statSync(filePath);
const birthtime = stats.birthtime;
const birthtimeIso = birthtime.toISOString();
// If date_created exists, check if birthtime is earlier
if (updatedFrontmatter[key]) {
const existingDate = new Date(updatedFrontmatter[key]);
// If birthtime is earlier than the existing date_created, update it
if (birthtime < existingDate) {
console.log(`Updating date_created for ${filePath} from ${updatedFrontmatter[key]} to ${birthtimeIso} (file birthtime is earlier)`);
updatedFrontmatter[key] = birthtimeIso;
changed = true;
} else {
console.log(`Keeping existing date_created for ${filePath}: ${updatedFrontmatter[key]} (earlier than file birthtime ${birthtimeIso})`);
}
}
// If date_created doesn't exist, add it
else {
console.log(`Adding date_created for ${filePath}: ${birthtimeIso}`);
updatedFrontmatter[key] = birthtimeIso;
changed = true;
}
// Skip the standard field processing for date_created
continue;
}
} catch (error) {
console.error(`Error handling date_created for ${filePath}:`, error);
// Continue with standard processing if there was an error
}
}
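The keep-the-earlier-date rule in the snippet above can be isolated into a pure function, which makes it easier to test without touching the filesystem. `resolveDateCreated` is a hypothetical name, and treating an unparseable existing value as missing is an assumption; the original snippet keeps such values because comparisons against an invalid date are always false.

```typescript
// Hypothetical helper; the invalid-date handling is an assumption.
function resolveDateCreated(
  existing: string | undefined,
  birthtime: Date
): { value: string; changed: boolean } {
  const birthtimeIso = birthtime.toISOString();

  // No existing value: fall back to the file birthtime.
  if (!existing) return { value: birthtimeIso, changed: true };

  // Existing value cannot be parsed: replace it (assumed behaviour).
  const existingMs = new Date(existing).getTime();
  if (Number.isNaN(existingMs)) return { value: birthtimeIso, changed: true };

  // Birthtime is strictly earlier: it is the better estimate of creation time.
  if (birthtime.getTime() < existingMs) return { value: birthtimeIso, changed: true };

  // Otherwise keep what the author (or an earlier run) recorded.
  return { value: existing, changed: false };
}
```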
4. Template Definition for Tooling
typescript
const toolingTemplate = {
id: 'tooling',
name: 'Tooling Document',
description: 'Template for tooling documentation',
pathPatterns: ['content/tooling/**/*.md'],
required: {
site_uuid: {
type: 'string',
description: 'Unique identifier for the site',
defaultValueFn: () => uuidv4()
},
tags: {
type: 'array',
description: 'Categorization tags',
defaultValueFn: (filePath) => {
// Extract directory structure as tags
try {
// Extract all directory names after 'tooling'
const pathParts = filePath.split('/');
const toolingIndex = pathParts.findIndex(part => part === 'tooling');
if (toolingIndex >= 0) {
// Get all directory names after 'tooling' and before the filename
const tags = pathParts.slice(toolingIndex + 1, -1).map(tag => tag.replace(/\s+/g, '-'));
return tags.length > 0 ? tags : ['Uncategorized'];
}
return ['Uncategorized'];
} catch (error) {
console.error(`Error generating tags for ${filePath}:`, error);
return ['Uncategorized'];
}
}
},
date_created: {
type: 'date',
description: 'Creation date',
defaultValueFn: (filePath) => {
try {
// Use the Node.js fs module for synchronous operations
const fs = require('fs');
// Check if file exists
if (fs.existsSync(filePath)) {
// Get file stats to access creation time
const stats = fs.statSync(filePath);
// Use birthtime (actual file creation time) which is reliable on Mac
const timestamp = stats.birthtime;
// Return full ISO string with timezone
return timestamp.toISOString();
} else {
// Return null instead of current date
return null;
}
} catch (error) {
// Return null instead of current date
return null;
}
}
},
date_modified: {
type: 'date',
description: 'Last modified date',
defaultValueFn: (filePath) => {
// Similar to date_created but using mtime
// Implementation details...
}
}
},
optional: {
// Optional fields definition
// Implementation details...
}
};
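To make the tag derivation in the template above concrete, here is the same logic pulled out as a standalone function with a sample call. `tagsFromPath` is a hypothetical name, and splitting on backslashes as well as forward slashes is an added assumption for Windows-style paths.

```typescript
// Same idea as the tags defaultValueFn above, as a pure function.
// Splitting on both "/" and "\" is an assumption, not part of the original.
function tagsFromPath(filePath: string): string[] {
  const parts = filePath.split(/[\\/]/);
  const toolingIndex = parts.indexOf("tooling");
  if (toolingIndex < 0) return ["Uncategorized"];

  // Directories after "tooling" and before the filename become tags.
  const tags = parts
    .slice(toolingIndex + 1, -1)
    .map((dir) => dir.replace(/\s+/g, "-"));
  return tags.length > 0 ? tags : ["Uncategorized"];
}

// "content/tooling/AI Toolkit/Agents/doc.md" yields ["AI-Toolkit", "Agents"]
// "content/tooling/doc.md" yields ["Uncategorized"]
```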
Best Practices
- Reliable File Timestamps:
- Use birthtime for date_created, which is reliable on Mac systems
- Compare existing values with file timestamps and keep the earlier date
- Add proper error handling to prevent fallbacks to current date
- Frontmatter Consistency:
- Convert kebab-case properties to snake_case
- Format tags as proper YAML lists with hyphens
- Preserve content while updating frontmatter
- Reporting and Monitoring:
- Log all property conversions and validation issues
- Generate periodic reports in markdown format
- Create a final report on system shutdown
- Code Reuse and Shared Functionality:
- Extract common functionality into shared utility modules
- Implement a single source of truth for operations used across multiple templates
- All templates should use the same shared code for common operations like:
- UUID generation
- Date handling
- File stats retrieval
- Tag formatting
- Never duplicate functionality across template files
- When adding new functionality to one template, ensure it's available to all templates that need it
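One way to follow the shared-functionality guidance above is a single utility module that every template imports. The sketch below covers UUID generation, file stats retrieval, and tag formatting. The function names are hypothetical, and it uses Node's built-in `randomUUID` in place of the `uuidv4` call shown in the template, which is a substitution made here to keep the example dependency-free.

```typescript
// Hypothetical shared utility module imported by all templates.
import { randomUUID } from "node:crypto";
import { statSync } from "node:fs";

export function generateUuid(): string {
  return randomUUID();
}

// Return birthtime and mtime as ISO strings, or null if the file cannot be read.
// Returning null (not the current date) mirrors the error handling described above.
export function getFileTimes(
  filePath: string
): { created: string; modified: string } | null {
  try {
    const stats = statSync(filePath);
    return {
      created: stats.birthtime.toISOString(),
      modified: stats.mtime.toISOString(),
    };
  } catch {
    return null;
  }
}

// Render tags as a YAML block list with hyphens.
export function formatTagsAsYaml(tags: string[]): string {
  return ["tags:", ...tags.map((tag) => `  - ${tag}`)].join("\n");
}
```

With a module like this, a template's defaultValueFn for date_created could reduce to a one-line call, and a fix to timestamp handling would reach every template at once.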
Constraints and Limitations
- File System Compatibility:
- The birthtime property is reliable on Mac but may not be on all systems
- Error handling is in place to prevent incorrect timestamps
- Performance Considerations:
- Synchronous file operations are used for simplicity but may impact performance with large numbers of files
- Consider batch processing for large directories
- Template Management:
- Templates must be manually updated when frontmatter requirements change
- No automatic detection of new frontmatter patterns
- Preventing Infinite Loops:
- The observer must track files currently being processed to prevent infinite loops
- Implement async promise-based processing with proper error handling
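The batch-processing suggestion under Performance Considerations could look something like the following. `processInBatches` is a hypothetical helper, not part of the described implementation; it runs an async worker over fixed-size batches so that only a bounded number of file operations are in flight at once.

```typescript
// Hypothetical helper: process items batch by batch with an async worker.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Wait for the whole batch to finish before starting the next one.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Results come back in input order, so a caller could pass a list of file paths and pair each path with its outcome by index.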
CRITICAL: Preventing Infinite Loops
The most critical aspect of the filesystem observer implementation is preventing infinite loops. This is ABSOLUTELY ESSENTIAL for proper functioning:
typescript
class FileSystemObserver {
private processingFiles: Set<string> = new Set(); // Track files currently being processed
async onFileChanged(filePath: string): Promise<void> {
// Skip if this file is already being processed to prevent infinite loops
if (this.processingFiles.has(filePath)) {
console.log(`Skipping ${filePath} as it's already being processed (preventing loop)`);
return;
}
try {
// Mark file as being processed
this.processingFiles.add(filePath);
// Process the file...
} finally {
// CRITICAL: Always remove from processing set when done
this.processingFiles.delete(filePath);
}
}
}
Why This Is Critical
- Infinite Loop Prevention: Without this mechanism, the observer will enter an infinite loop because:
- Observer detects file change
- Observer updates file
- Update triggers another file change event
- Process repeats indefinitely
- Async Promise-Based Processing: All file processing must use async/await with proper Promise handling to ensure:
- Operations complete fully before releasing the file lock
- Error handling doesn't prevent cleanup
- File state remains consistent
- User-Triggered Changes Only: The observer should ONLY process changes that are actually made by the USER, not changes made by the observer itself.
- Resource Protection: Infinite loops can quickly:
- Consume all available CPU
- Fill up disk space with logs
- Corrupt files with partial updates
- Crash the entire application
This is not an optional feature: it is the single most important part of the implementation to get right.
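The guard described above can be factored into a small reusable wrapper, sketched here under the assumption that each file's work is expressed as an async task. `ProcessingGuard` and its members are hypothetical names.

```typescript
// Hypothetical wrapper around the processingFiles pattern.
class ProcessingGuard {
  private processingFiles = new Set<string>();
  skipped = 0; // number of events ignored because the file was in flight

  isProcessing(filePath: string): boolean {
    return this.processingFiles.has(filePath);
  }

  // Runs task unless filePath is already in flight; resolves to true if it ran.
  async run(filePath: string, task: () => Promise<void>): Promise<boolean> {
    if (this.processingFiles.has(filePath)) {
      this.skipped += 1;
      return false;
    }
    this.processingFiles.add(filePath);
    try {
      await task();
      return true;
    } finally {
      // Runs on success and on error, so an entry never leaks.
      this.processingFiles.delete(filePath);
    }
  }
}
```

A guard like this only covers events that arrive while a file is still being processed. A watcher event caused by the observer's own write may be delivered after the `finally` block has run, so it is best treated as one layer of protection and not the only one.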
Two-Phase Observer Approach
Another critical implementation detail is using a two-phase approach to prevent observer loops while still ensuring all files are properly processed:
typescript
class FileSystemObserver {
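private processingFiles: Set<string> = new Set(); // Same tracking set as in the previous snippet
private initialProcessingComplete: boolean = false;
private initialProcessingTimeout: NodeJS.Timeout | null = null;
constructor(
private templateRegistry: TemplateRegistry,
private reportingService: ReportingService, // must be a property: used in the timeout callback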
contentRoot: string,
private options: {
ignoreInitial?: boolean;
processExistingFiles?: boolean;
initialProcessingDelay?: number; // Delay in ms before switching to regular observer mode
} = {}
) {
// Set default options
this.options.initialProcessingDelay = this.options.initialProcessingDelay ?? 90000; // Default 90 seconds
// Set up initial processing timeout
if (this.options.processExistingFiles) {
console.log(`Initial processing mode active. Will switch to regular observer mode after ${this.options.initialProcessingDelay / 1000} seconds.`);
this.initialProcessingTimeout = setTimeout(() => {
console.log('Switching to regular observer mode...');
this.initialProcessingComplete = true;
// Generate a report after initial processing
this.reportingService.generateReport();
}, this.options.initialProcessingDelay);
}
}
async onFileChanged(filePath: string): Promise<void> {
// Standard loop prevention first
if (this.processingFiles.has(filePath)) {
return;
}
// Additional loop prevention for regular observer mode
if (this.initialProcessingComplete) {
// Check if this is a file we just updated
const lastModified = (await fs.stat(filePath)).mtime.getTime();
const currentTime = Date.now();
const timeSinceModification = currentTime - lastModified;
// If the file was modified very recently (within 5 seconds) and we're in regular observer mode,
// it's likely our own update, so skip it
if (timeSinceModification < 5000) {
console.log(`Skipping recently modified file ${filePath} to prevent observer loop (modified ${timeSinceModification}ms ago)`);
return;
}
}
// Process the file...
}
}
Why This Approach Is Essential
- Initial Processing Phase:
- Processes all existing files once at startup
- Runs for a fixed duration (90 seconds by default)
- Generates a comprehensive report after completion
- Regular Observer Phase:
- Automatically activates after the initial processing phase
- Only processes files that were genuinely modified by users
- Skips files whose mtime is less than 5 seconds old, on the assumption that such changes are self-triggered
- Benefits:
- Ensures all files are processed once during startup
- Automatically transitions to a stable monitoring mode
- Self-triggered changes don't cause infinite processing loops
- Provides clear logging about which phase the observer is in
- Configuration Options:
- initialProcessingDelay: Adjustable based on content directory size (default: 90 seconds)
- processExistingFiles: Can be disabled if only new changes should be processed
This two-phase approach complements the processingFiles tracking mechanism and provides an additional layer of protection against observer loops.
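An mtime check alone cannot tell a user's save from the observer's own write, since both leave a recent mtime. One possible complement, offered here as an assumption and not as part of the described implementation, is to remember a hash of the content the observer last wrote and skip events whose content still matches. `SelfWriteTracker` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical tracker; class and method names are illustrative only.
class SelfWriteTracker {
  private lastWritten = new Map<string, string>();

  private static hash(content: string): string {
    return createHash("sha256").update(content).digest("hex");
  }

  // Call when the observer writes updated content to filePath.
  recordWrite(filePath: string, content: string): void {
    this.lastWritten.set(filePath, SelfWriteTracker.hash(content));
  }

  // True when the file still holds exactly what the observer last wrote.
  isSelfWrite(filePath: string, currentContent: string): boolean {
    return this.lastWritten.get(filePath) === SelfWriteTracker.hash(currentContent);
  }
}
```

In onFileChanged, a check such as `tracker.isSelfWrite(filePath, content)` after reading the file would let genuine user edits through immediately, because any user change alters the hash.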
Next Steps
- Enhanced Validation:
- Add more sophisticated validation rules for specific field types
- Implement cross-field validation (e.g., date_created should be before date_modified)
- User Interface:
- Create a dashboard for monitoring observer activity
- Add interactive controls for managing templates
- Integration:
- Connect with build process to ensure frontmatter is valid before deployment
- Add hooks for custom processing of specific fields
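As a rough idea of what the cross-field validation under Enhanced Validation might involve, the sketch below checks that date_created does not fall after date_modified. `validateDateOrder` and its message strings are hypothetical.

```typescript
// Hypothetical validator for the date_created / date_modified ordering rule.
function validateDateOrder(frontmatter: {
  date_created?: string;
  date_modified?: string;
}): string[] {
  const issues: string[] = [];
  const { date_created, date_modified } = frontmatter;
  if (!date_created || !date_modified) return issues; // nothing to compare

  const created = new Date(date_created).getTime();
  const modified = new Date(date_modified).getTime();
  if (Number.isNaN(created) || Number.isNaN(modified)) {
    issues.push("date_created or date_modified is not a valid date");
  } else if (created > modified) {
    issues.push("date_created is later than date_modified");
  }
  return issues;
}
```

Returning a list of issues instead of throwing would let the observer log every problem for a file in one pass.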