Citation Processing for FileSystem Observer
Citation Processing System
Objective
Extend the filesystem observer to process citations in markdown files by:
- Converting numeric citations to hex format
- Ensuring proper citation formatting
- Creating/updating footnote definitions
- Maintaining a citation registry
graph TD
A[Markdown File] --> B[Parse Content]
B --> C[Extract Frontmatter & Body]
C --> D[Identify Citations]
D --> E{Numeric Citation?}
E --> |Yes| F[Generate Hex ID]
E --> |No| G{Valid Hex ID?}
G --> |No| H[Flag for Review]
G --> |Yes| I[Verify Footnote Definition]
F --> I
I --> J{Definition Exists?}
J --> |No| K[Create Definition]
J --> |Yes| L[Verify Footnotes Section]
K --> L
L --> M{Section Exists?}
M --> |No| N[Create Section]
M --> |Yes| O[Update Registry]
N --> O
O --> P[Update File]
Template Extension
Add content processing capability to the template system:
typescript
// Extended MetadataTemplate interface with content processing
interface MetadataTemplate {
id: string;
name: string;
description: string;
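// Define which files this template applies to
appliesTo: {
directories?: string[];
filePatterns?: string[];
};
// For frontmatter templates
required?: Record<string, FieldDefinition>;
optional?: Record<string, FieldDefinition>;
// Optional citation settings, used by the citations template
// (CitationConfig is the interface defined in citationService.ts)
citationConfig?: CitationConfig;
// For content processing templates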
contentProcessing?: {
enabled: boolean;
processor: (content: string, filePath: string) => Promise<{
updatedContent: string;
changed: boolean;
stats: Record<string, any>;
}>;
};
}
Citation Detection Patterns
typescript
// Regular expressions for citation detection
const citationPatterns = {
// Pattern for numeric citations with caret: [^123456]
numericWithCaret: /\[\^(\d+)\]/g,
// Pattern for numeric citations without caret: [123456]
// Negative lookahead to avoid matching markdown links [text](url) or [text][ref]
numericWithoutCaret: /\[(\d+)\](?!\[|\()/g,
// Pattern for existing hex citations: [^abcdef]
existingHexCitation: /\[\^([0-9a-f]{6})\]/g,
// Pattern for footnote definitions: [^abcdef]: Text
footnoteDefinition: /\[\^([\da-f]+)\]:\s*(.*)/g
};
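The decision nodes in the flow (numeric citation, valid hex ID, or flag for review) can be sketched as a small classifier. This is only a sketch: `classifyCitationId` and `findCaretCitations` are hypothetical names that do not appear in the spec, and treating every all-digit ID as numeric, even a six-digit one that would also pass the hex pattern, is an assumption.

```typescript
type CitationKind = 'numeric' | 'hex' | 'needs-review';

// Decide how a citation ID should be handled (assumption: digits-only always means numeric).
function classifyCitationId(id: string): CitationKind {
  if (/^\d+$/.test(id)) return 'numeric';
  if (/^[0-9a-f]{6}$/.test(id)) return 'hex';
  return 'needs-review';
}

// Collect caret citations in order of appearance; (?!:) skips footnote definitions.
function findCaretCitations(text: string): Array<{ id: string; kind: CitationKind }> {
  const pattern = /\[\^([^\]\s]+)\](?!:)/g;
  return Array.from(text.matchAll(pattern)).map(match => ({
    id: match[1],
    kind: classifyCitationId(match[1]),
  }));
}
```

Anything classified as `needs-review` corresponds to the "Flag for Review" branch and is left untouched by this sketch.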
Formatting Requirements
Let's use Obsidian's citation format. [^1]
- Spacing
- Ensure at least one space before citations
- Ensure one space or newline after citations
- Fix improper spacing
- Format Conversion
- Convert [123456] → [^123456] (add caret)
- Convert [^123456] → [^abcdef] (numeric to hex)
- Preserve existing hex citations
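The caret and spacing rules might be implemented roughly as follows. This is a sketch under assumptions: `addCarets` is a hypothetical name, the lookbehind that skips the reference half of [text][8] is an addition to the spec's pattern, and the spacing rule is applied literally, so punctuation directly after a citation is also separated by a space.

```typescript
// Add the caret to bare numeric citations such as [8].
// The lookahead skips link text like [8](url) or [8][ref]; the lookbehind
// skips the reference half of [text][8].
function addCarets(body: string): string {
  return body.replace(/(?<!\])\[(\d+)\](?![\[(])/g, '[^$1]');
}

// Normalize whitespace around caret citations, leaving footnote definitions alone.
function fixCitationSpacing(body: string): string {
  return body
    // Insert a space before a citation that directly follows a non-space character.
    .replace(/(?<=\S)(\[\^[0-9a-f]+\])(?!:)/g, ' $1')
    // Insert a space after a citation unless whitespace, ':' or end of text follows.
    .replace(/(\[\^[0-9a-f]+\])(?![\s:]|$)/g, '$1 ');
}
```

The bare-numeric pattern also matches array indices such as items[0] in code samples, so a real processor would likely need to skip code spans and code blocks before converting.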
Citation Consistency Requirements
- Numeric Citation Consistency
- All instances of the same numeric citation (e.g., all instances of [8]) MUST be converted to the same hex ID
- The system must maintain a mapping of numeric IDs to hex IDs during processing
- This mapping must be consistent within and across files
- Example: If [8] is converted to [^abcdef] in one place, all other instances of [8] must also be converted to [^abcdef]
- Citation Deduplication
- The citation registry should not contain duplicate entries for the same citation
- When processing a file with multiple instances of the same citation, only one entry should be created in the registry
- The registry should track all files where a citation appears, but not create separate entries for each appearance
- Implementation Strategy
typescript
// Create a mapping of numeric IDs to hex IDs
const numericToHexMap: Record<string, string> = {};
// Process all numeric citations in a file
numericCitations.forEach(match => {
  const numericId = match[1];
  // Generate and store a hex ID only the first time a numeric ID is seen;
  // later occurrences reuse the stored mapping
  if (!numericToHexMap[numericId]) {
    numericToHexMap[numericId] = generateHexId();
  }
});
// Replace all citations using the consistent mapping
Object.entries(numericToHexMap).forEach(([numericId, hexId]) => {
  // Replace all instances of [^numericId] with [^hexId]
});
This approach ensures that all references to the same citation are consistently converted to the same hex ID, preventing duplication in the registry and maintaining the relationship between citations and their references.
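A runnable variant of this strategy, including the replacement step, could look like the sketch below. The names `convertNumericToHex` and the `taken` parameter of `generateHexId` are assumptions made for illustration; `Math.random` is used only to keep the example dependency-free, and rejecting all-digit IDs (so a new hex ID can never be mistaken for a numeric citation) is likewise an assumption rather than a stated requirement.

```typescript
// Generate a hex ID that is unused and contains at least one letter.
function generateHexId(taken: Set<string>, length = 6): string {
  let id = '';
  do {
    id = Array.from({ length }, () => Math.floor(Math.random() * 16).toString(16)).join('');
  } while (taken.has(id) || /^\d+$/.test(id));
  return id;
}

// Replace every [^N] (citations and definitions alike) using one shared mapping.
// Passing the same Map for several files keeps the mapping consistent across them.
function convertNumericToHex(
  body: string,
  numericToHex: Map<string, string> = new Map()
): { updatedContent: string; conversionsPerformed: number } {
  const taken = new Set(numericToHex.values());
  let conversionsPerformed = 0;
  const updatedContent = body.replace(/\[\^(\d+)\]/g, (_whole: string, numericId: string) => {
    let hexId = numericToHex.get(numericId);
    if (!hexId) {
      hexId = generateHexId(taken);
      taken.add(hexId);
      numericToHex.set(numericId, hexId);
    }
    conversionsPerformed++;
    return `[^${hexId}]`;
  });
  return { updatedContent, conversionsPerformed };
}
```

Because the pattern also matches the `[^N]` prefix of a footnote definition, a citation and its definition are rewritten in the same pass and cannot drift apart.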
Implementation Steps
- Create citations template in tidyverse/observers/templates/citations.ts
- Implement citation processor function in tidyverse/observers/services/citationService.ts
- Extend FileSystemObserver to use content processing
- Add comprehensive error handling and reporting
Processing Logic
Citation-Footnote Pairing
- Check Existing Footnotes First
- For each citation, check if a corresponding footnote definition exists BEFORE conversion
- With Existing Footnote
- Convert both citation and footnote to the SAME hexCode
- Perform as atomic operation to prevent mismatches
- Example: [^123456] → [^abcdef] and [^123456]: → [^abcdef]:
- Without Existing Footnote
- Convert citation to hexCode
- Generate placeholder footnote with same hexCode
- Add placeholder at end of document
- Track change in memory for reporting
- Add footnotes section if needed
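The "Without Existing Footnote" branch can be sketched as a function that appends one placeholder definition per undefined citation. `addMissingFootnoteDefinitions` is a hypothetical name, and the default placeholder wording is an assumption; the spec only says a placeholder is generated.

```typescript
// Append a placeholder definition for every hex citation that has none.
function addMissingFootnoteDefinitions(
  body: string,
  placeholder = 'Citation text needed'
): { updatedContent: string; footnotesAdded: number } {
  // IDs that already have a definition at the start of a line.
  const defined = new Set(
    Array.from(body.matchAll(/^\[\^([0-9a-f]{6})\]:/gm)).map(match => match[1])
  );
  // IDs cited in running text; (?!:) excludes the definitions themselves.
  const cited = new Set(
    Array.from(body.matchAll(/\[\^([0-9a-f]{6})\](?!:)/g)).map(match => match[1])
  );
  const missing = Array.from(cited).filter(id => !defined.has(id));
  if (missing.length === 0) {
    return { updatedContent: body, footnotesAdded: 0 };
  }
  const lines = missing.map(id => `[^${id}]: ${placeholder}`);
  return {
    updatedContent: `${body.trimEnd()}\n\n${lines.join('\n')}\n`,
    footnotesAdded: missing.length,
  };
}
```

Running the function a second time on its own output adds nothing, which keeps repeated observer passes from piling up placeholders.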
Footnotes Section Format
When adding a new footnotes section:
markdown
# Footnotes
***
- Each element on its own line with blank lines as shown
- Add ONE LINE ABOVE the first footnote
- The section MUST be added if citations exist but no section is present
- The section should be placed at the end of the document
- If a footnote section exists but doesn't match the format, it should be preserved as-is
Footnotes Section Logic
typescript
/**
* Ensures a Footnotes section exists in the content if citations are present
* @param content - The markdown content
* @param config - Configuration for the footnotes section
* @returns Updated content with footnotes section if needed
*/
function ensureFootnotesSection(
content: string,
config: {
footnotesSectionHeader: string;
footnotesSectionSeparator: string;
}
): string {
// Check if any citations exist
const citationRegex = /\[\^([0-9a-f]+)\]/g;
const citations = [...content.matchAll(citationRegex)];
if (citations.length === 0) {
// No citations, no need for a footnotes section
return content;
}
// Check if a footnote definition exists
const footnoteDefRegex = /\[\^([0-9a-f]+)\]:/g;
const footnoteDefs = [...content.matchAll(footnoteDefRegex)];
if (footnoteDefs.length === 0) {
// No footnote definitions, no need for a section
return content;
}
// Check if a Footnotes section already exists
// (escape the header so it is matched literally rather than as a pattern)
const escapedHeader = config.footnotesSectionHeader.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const sectionRegex = new RegExp(escapedHeader, 'i');
if (content.match(sectionRegex)) {
// Section already exists
return content;
}
// Add footnotes section before the first footnote definition
const firstFootnoteDef = footnoteDefs[0];
const firstFootnotePos = content.indexOf(firstFootnoteDef[0]);
// Get the content before and after the first footnote
const contentBefore = content.substring(0, firstFootnotePos);
const contentAfter = content.substring(firstFootnotePos);
// Add the footnotes section
return `${contentBefore}\n\n${config.footnotesSectionHeader}\n\n${config.footnotesSectionSeparator}\n\n${contentAfter}`;
}
Citation Registry Integration
The citation registry (site/src/content/citations/citation-registry.json):
- Registry Loading:
- Load the citation registry at the start of processing a batch of files
- If the registry doesn't exist, create an empty registry structure
- Registry Lookup During Processing:
- When encountering an existing hex citation ([^a1b2c3]):
- Check if it exists in the registry
- If not, add it to the registry with the current file path and footnote text
- If it exists, update the registry entry with this file path if not already included
- Registry Updates During Conversion:
- When converting a numeric citation to hex:
- First check if the footnote text already exists in the registry
- If a match is found, use the existing hexCode for consistency
- If no match is found, generate a new hexCode and add it to the registry
- Registry Persistence:
- After processing each file, update the registry with any new citations
- Write the updated registry back to disk after each file to prevent data loss
- Include metadata about when the registry was last updated
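The lookup-before-generate rule under "Registry Updates During Conversion" might be sketched like this. `resolveHexCode` and `normalize` are hypothetical names, the entry shape mirrors the registry structure in this spec, and comparing footnote text case-insensitively with collapsed whitespace is an assumption about what counts as "the same" text.

```typescript
interface RegistryEntry {
  text: string;
  files: string[];
  created: string; // ISO date
  lastUpdated: string; // ISO date
}
type Citations = Record<string, RegistryEntry>;

// Assumption: texts are equal if they match after trimming, collapsing whitespace and lowercasing.
const normalize = (text: string): string => text.trim().replace(/\s+/g, ' ').toLowerCase();

// Return the hexCode for a footnote text, reusing an existing entry when the text is known.
function resolveHexCode(
  citations: Citations,
  footnoteText: string,
  filePath: string,
  newHexCode: () => string,
  now: string = new Date().toISOString()
): string {
  const wanted = normalize(footnoteText);
  const existing = Object.keys(citations).find(hex => normalize(citations[hex].text) === wanted);
  if (existing) {
    const entry = citations[existing];
    // Track the file once; never create a second entry for the same citation.
    if (!entry.files.includes(filePath)) {
      entry.files.push(filePath);
      entry.lastUpdated = now;
    }
    return existing;
  }
  const hexCode = newHexCode();
  citations[hexCode] = { text: footnoteText.trim(), files: [filePath], created: now, lastUpdated: now };
  return hexCode;
}
```

Placeholder footnote text would match across unrelated citations under this scheme, so it probably needs to be excluded from text matching.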
Registry Structure
typescript
// Named CitationRegistryData so it does not merge with the CitationRegistry class declared below
interface CitationRegistryData {
  citations: {
    [hexCode: string]: {
      text: string;
      files: string[];
      created: string; // ISO date
      lastUpdated: string; // ISO date
    }
  };
  metadata: {
    lastUpdated: string; // ISO date
    totalCitations: number;
  };
}
Citation Registry Implementation
typescript
import * as fs from 'fs';

class CitationRegistry {
  private registryPath: string;
  private registry: CitationRegistryData;
  constructor(registryPath: string) {
    this.registryPath = registryPath;
    this.registry = { citations: {}, metadata: { lastUpdated: '', totalCitations: 0 } };
  }
  async loadRegistry(): Promise<void> {
    try {
      const registryData = await fs.promises.readFile(this.registryPath, 'utf8');
      this.registry = JSON.parse(registryData);
    } catch (error) {
      // Only a missing file means "start with an empty registry"; rethrow anything
      // else (e.g. malformed JSON) so existing data is never overwritten
      if ((error as NodeJS.ErrnoException).code !== 'ENOENT') throw error;
      await this.saveRegistry();
    }
  }
  async saveRegistry(): Promise<void> {
    await fs.promises.writeFile(this.registryPath, JSON.stringify(this.registry, null, 2), 'utf8');
  }
  addCitation(hexCode: string, citationData: { text: string; files: string[] }): void {
    const now = new Date().toISOString();
    const existing = this.registry.citations[hexCode];
    if (existing) {
      // Known citation: refresh its text and file list without creating a second entry
      existing.text = citationData.text;
      citationData.files.forEach(file => this.updateCitationFiles(hexCode, file));
    } else {
      this.registry.citations[hexCode] = { ...citationData, created: now, lastUpdated: now };
      this.registry.metadata.totalCitations++;
    }
    this.registry.metadata.lastUpdated = now;
  }
  updateCitationFiles(hexCode: string, filePath: string): void {
    const entry = this.registry.citations[hexCode];
    // Record each file at most once per citation
    if (entry && !entry.files.includes(filePath)) {
      entry.files.push(filePath);
      entry.lastUpdated = new Date().toISOString();
    }
  }
  getCitation(hexCode: string): { text: string; files: string[] } | undefined {
    return this.registry.citations[hexCode];
  }
}
Complete Processing Pipeline
typescript
/**
* Process citations in a Markdown file
* @param content - The markdown file content
* @param filePath - Path to the file
* @param config - Citation configuration
* @returns Object with updated content and processing statistics
*/
export async function processCitations(
content: string,
filePath: string,
config: CitationConfig
): Promise<{
updatedContent: string;
changed: boolean;
stats: {
citationsConverted: number;
footnotesAdded: number;
footnoteSectionAdded: boolean;
}
}> {
// Get citation registry
const citationRegistry = new CitationRegistry(config.registryPath);
// Load existing registry
await citationRegistry.loadRegistry();
// Extract frontmatter and body
const frontmatterAndBody = extractFrontmatterAndBody(content);
if (!frontmatterAndBody) {
return {
updatedContent: content,
changed: false,
stats: {
citationsConverted: 0,
footnotesAdded: 0,
footnoteSectionAdded: false
}
};
}
const { frontmatter, body } = frontmatterAndBody;
// Step 1: Fix citation spacing
const bodyWithFixedSpacing = fixCitationSpacing(body);
// Step 2: Convert citations without carets
const bodyWithCarets = convertCitationsToCaret(bodyWithFixedSpacing);
// Step 3: Convert numeric citations to hex
const { updatedContent: bodyWithHexCitations, stats: conversionStats } =
convertNumericCitationsToHex(bodyWithCarets, citationRegistry);
// Step 4: Ensure all citations have footnote definitions
const { updatedContent: bodyWithFootnotes, footnotesAdded } =
ensureFootnoteDefinitions(bodyWithHexCitations, citationRegistry);
// Step 5: Ensure Footnotes section exists if needed
const hadFootnotesSection = bodyWithFootnotes.includes(config.footnotesSectionHeader);
const bodyWithFootnotesSection = ensureFootnotesSection(bodyWithFootnotes, {
footnotesSectionHeader: config.footnotesSectionHeader,
footnotesSectionSeparator: config.footnotesSectionSeparator
});
const footnoteSectionAdded = !hadFootnotesSection &&
bodyWithFootnotesSection.includes(config.footnotesSectionHeader);
// Extract citation text for all hex citations and update registry
const hexCitationRegex = /\[\^([0-9a-f]{6})\]/g;
// Deduplicate IDs so each citation is registered once per file
const hexIds = new Set(
[...bodyWithFootnotesSection.matchAll(hexCitationRegex)].map(match => match[1])
);
hexIds.forEach(hexId => {
const citationText = extractCitationText(bodyWithFootnotesSection, hexId);
if (citationText) {
citationRegistry.addCitation(hexId, {
text: citationText,
files: [filePath]
});
}
// Update citation registry with this file
citationRegistry.updateCitationFiles(hexId, filePath);
});
// Update frontmatter with citation information
const updatedFrontmatter = {
...frontmatter,
date_modified: new Date().toISOString().split('T')[0]
};
// Combine frontmatter and body
const finalContent = combineFrontmatterAndBody(
updatedFrontmatter,
bodyWithFootnotesSection
);
// Save citation registry
await citationRegistry.saveRegistry();
return {
updatedContent: finalContent,
changed: finalContent !== content,
stats: {
citationsConverted: conversionStats.conversionsPerformed,
footnotesAdded,
footnoteSectionAdded
}
};
}
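The pipeline calls `extractCitationText` without defining it. One plausible shape, offered as an assumption rather than the intended implementation, is a lookup of the single-line footnote definition for a given hex ID:

```typescript
// Return the text of the footnote definition for hexId, or undefined when the
// definition is missing or empty. Assumes definitions occupy a single line.
function extractCitationText(body: string, hexId: string): string | undefined {
  // Escape the ID so it is matched literally inside the pattern.
  const escaped = hexId.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const definition = new RegExp(`^\\[\\^${escaped}\\]:[ \\t]*(.*)$`, 'm');
  const match = definition.exec(body);
  const text = match?.[1].trim();
  return text ? text : undefined;
}
```

Returning `undefined` for an empty definition means the registry step in the pipeline skips citations whose footnote has no text yet.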
Safe Registry Update Mechanism
typescript
/**
* Safely update the citation registry with backup and error recovery
* @param registryPath - Path to the registry file
* @param updateFn - Function to update the registry data
*/
async function safelyUpdateRegistry(
registryPath: string,
updateFn: (data: any) => any
): Promise<void> {
// Create backup first
const backupPath = `${registryPath}.backup`;
try {
await fs.promises.copyFile(registryPath, backupPath);
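} catch (error) {
// If file doesn't exist, create an empty registry with the expected structure
const emptyRegistry = { citations: {}, metadata: { lastUpdated: '', totalCitations: 0 } };
await fs.promises.writeFile(registryPath, JSON.stringify(emptyRegistry, null, 2), 'utf8');
await fs.promises.copyFile(registryPath, backupPath);
}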
try {
// Read current data
const data = JSON.parse(await fs.promises.readFile(registryPath, 'utf8'));
// Apply updates
const updatedData = updateFn(data);
// Write to temporary file first
const tempPath = `${registryPath}.temp`;
await fs.promises.writeFile(tempPath, JSON.stringify(updatedData, null, 2), 'utf8');
// Rename temp file to actual file (atomic operation on most file systems)
await fs.promises.rename(tempPath, registryPath);
// Remove backup if successful
await fs.promises.unlink(backupPath);
} catch (error) {
console.error('Error updating registry:', error);
// Restore from backup on error
try {
await fs.promises.copyFile(backupPath, registryPath);
} catch (restoreError) {
console.error('Failed to restore registry from backup:', restoreError);
}
throw error;
}
}
FileSystemObserver Integration
typescript
// In fileSystemObserver.ts
async processFile(filePath: string): Promise<void> {
// Process frontmatter as before
// Process citations if markdown file
if (filePath.endsWith('.md')) {
const template = this.templateRegistry.findTemplateForFile(filePath);
if (template?.contentProcessing?.enabled) {
const content = await fs.promises.readFile(filePath, 'utf8');
const { updatedContent, changed, stats } =
await template.contentProcessing.processor(content, filePath);
if (changed) {
await fs.promises.writeFile(filePath, updatedContent, 'utf8');
this.reportingService.addProcessedFile(filePath, stats);
}
}
}
}
Complete Implementation
typescript
// In tidyverse/observers/templates/citations.ts
export const citationsTemplate: MetadataTemplate = {
id: 'citations',
name: 'Citations Template',
description: 'Template for citation processing in markdown files',
appliesTo: {
directories: [
'content/lost-in-public/prompts/**/*',
'content/specs/**/*',
// Other directories as needed
],
},
// Configuration options for citation processing
citationConfig: {
// Registry path - configurable by user
registryPath: 'site/src/content/citations/citation-registry.json',
// Hex ID configuration
hexLength: 6,
// Footnotes section configuration
footnotesSectionHeader: '# Footnotes',
footnotesSectionSeparator: '***'
},
contentProcessing: {
enabled: true,
processor: async (content: string, filePath: string) => {
// Get citation service instance with this template's configuration
const citationService = CitationService.getInstance(citationsTemplate.citationConfig);
// Process citations
const result = await citationService.processCitations(content, filePath);
return {
updatedContent: result.content,
changed: result.changed,
stats: result.stats
};
}
}
};
Citation Service
typescript
// In tidyverse/observers/services/citationService.ts
import * as path from 'path';
export interface CitationConfig {
registryPath: string;
hexLength: number;
footnotesSectionHeader: string;
footnotesSectionSeparator: string;
}
export class CitationService {
private static instance: CitationService;
private registry: CitationRegistry;
private config: CitationConfig;
static getInstance(config?: CitationConfig): CitationService {
if (!CitationService.instance || config) {
CitationService.instance = new CitationService(config);
}
return CitationService.instance;
}
constructor(config?: CitationConfig) {
// Use provided config or default values
this.config = config || {
registryPath: 'site/src/content/citations/citation-registry.json',
hexLength: 6,
footnotesSectionHeader: '# Footnotes',
footnotesSectionSeparator: '***'
};
// Resolve registry path relative to cwd if needed
this.config.registryPath = path.isAbsolute(this.config.registryPath)
? this.config.registryPath
: path.join(process.cwd(), this.config.registryPath);
this.registry = new CitationRegistry(this.config.registryPath);
}
async processCitations(content: string, filePath: string): Promise<{
content: string;
changed: boolean;
stats: Record<string, any>;
}> {
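// Delegates to the standalone processCitations pipeline function from the
// Complete Processing Pipeline section (assumed to be in scope here), using
// this.config for all configurable options, and maps its result to this
// method's return shape
const result = await processCitations(content, filePath, this.config);
return {
content: result.updatedContent,
changed: result.changed,
stats: result.stats
};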
}
// Other methods for registry management
}
[^1]: https://forum.obsidian.md/t/citations-and-bibliography/11495