Rendering Extended Markdown through AST
Unfinished Work
- Handle citations sections that appear INSIDE callouts, while still including any callout content that comes AFTER the citations section.
Context
Comparing Approaches to Extended Markdown Rendering
Previous Approach (Component-Level Transformation)
Initially, we tried handling markdown extensions (callouts, citations) at the component level:
- Let markdown pass through remark untouched
- Process nodes in Astro components
- Transform content during rendering
This approach faced challenges:
- Duplicate processing logic in components
- Inconsistent node structure preservation
- Difficulty maintaining AST hierarchy
- Potential conflicts between transformations
Current Approach (Unified Remark Pipeline)
We now use a unified remark plugin pipeline:
- Process markdown extensions during the MDAST phase
- Use proper node structure with hName and hProperties
- Render transformed nodes in components
Benefits:
- Single source of truth for transformations
- Better preservation of AST structure
- Cleaner component logic
- More maintainable codebase
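To make the hName / hProperties idea concrete, here is a minimal sketch with no dependencies. The types and the `toHastSketch` helper are hypothetical and deliberately simplified; they are not the real MDAST-to-HAST conversion logic, only an illustration of how a node's `data` fields can drive the element that gets produced.

```typescript
// Minimal stand-in types; a real project would use the `mdast` and `hast` type packages.
interface CustomMdastNode {
  type: string;
  value?: string;
  children?: CustomMdastNode[];
  data?: {
    hName?: string;
    hProperties?: Record<string, unknown>;
  };
}

interface HastText {
  type: 'text';
  value: string;
}

interface HastElement {
  type: 'element';
  tagName: string;
  properties: Record<string, unknown>;
  children: Array<HastElement | HastText>;
}

// Hypothetical helper (an assumption, not a library API): reads `data.hName`
// and `data.hProperties` to decide the tag name and attributes of the output.
function toHastSketch(node: CustomMdastNode): HastElement {
  const children: Array<HastElement | HastText> = node.children
    ? node.children.map(toHastSketch)
    : node.value !== undefined
      ? [{ type: 'text', value: node.value }]
      : [];
  return {
    type: 'element',
    tagName: node.data?.hName ?? 'div',
    properties: node.data?.hProperties ?? {},
    children,
  };
}

const citationNode: CustomMdastNode = {
  type: 'citation',
  value: '[1] First citation entry',
  data: { hName: 'div', hProperties: { className: 'citation' } },
};

const element = toHastSketch(citationNode);
```

Because the rendering hints travel with the node itself, a component that receives the node does not need to re-derive them.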
Constraints
Maintain separation of concerns:
- Remark plugins handle AST transformations
- Components handle rendering
- No duplicate processing logic
Implementation Examples
1. Citation Processing
Citation Syntax
Citations are marked in markdown using:
- A "Citations:" header line
- Numbered citation entries starting with [n]
- Citations block continues until a blank line
Example:
markdown
Citations:
[1] First citation entry
[2] Second citation entry
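The three syntax rules above can be sketched as a small standalone parser. `parseCitationBlock` and `ParsedCitation` are hypothetical names used only for illustration; the sketch assumes the header line is exactly "Citations:" and that each entry fits on one line.

```typescript
interface ParsedCitation {
  index: number; // the n in [n]
  text: string; // everything after the marker
}

// Hypothetical helper: parses one citations block that follows the syntax above.
function parseCitationBlock(block: string): ParsedCitation[] {
  const lines = block.split('\n').map((line) => line.trim());
  // The block must open with the "Citations:" header line.
  if (lines[0] !== 'Citations:') return [];

  const citations: ParsedCitation[] = [];
  for (const line of lines.slice(1)) {
    if (line === '') break; // a blank line ends the block
    const match = /^\[(\d+)\]\s*(.*)$/.exec(line);
    if (!match) continue; // ignore lines that are not numbered entries
    citations.push({ index: Number(match[1]), text: match[2] ?? '' });
  }
  return citations;
}

const parsed = parseCitationBlock(
  'Citations:\n[1] First citation entry\n[2] Second citation entry\n\nNot part of the block',
);
```

Here `parsed` holds two entries; the text after the blank line is ignored.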
Citation Plugin Structure
typescript
// remarkCitations.ts
import { visit, SKIP } from 'unist-util-visit';
import type { Root, Paragraph } from 'mdast';

// A single citation entry, rendered as <div class="citation">
type CitationNode = {
  type: 'citation';
  value: string;
  data: {
    hName: string;
    hProperties: {
      className: string;
    };
  };
};

// Wrapper for all citations, rendered as <div class="citations-container">
type CitationsContainerNode = {
  type: 'citations';
  children: CitationNode[];
  data: {
    hName: string;
    hProperties: {
      className: string;
    };
  };
};

export default function remarkCitations() {
  return (tree: Root) => {
    let citationsFound: CitationNode[] = [];

    // First pass: find and extract citations
    visit(tree, 'paragraph', (node: Paragraph, index, parent) => {
      const firstChild = node.children[0];
      if (firstChild?.type === 'text' &&
          (firstChild.value.startsWith('Citations:') || firstChild.value.includes('\n[1]'))) {
        // Extract and transform citations
        const citations = firstChild.value
          .split('\n')
          .filter(line => line.trim() && !line.startsWith('Citations:'))
          .map(citation => ({
            type: 'citation',
            value: citation.trim(),
            data: {
              hName: 'div',
              hProperties: {
                className: 'citation'
              }
            }
          } as CitationNode));
        citationsFound = citationsFound.concat(citations);

        // Remove the original paragraph, then resume at the same index so the
        // sibling that shifted into this slot is not skipped
        if (typeof index === 'number' && Array.isArray(parent?.children)) {
          parent.children.splice(index, 1);
          return [SKIP, index];
        }
      }
    });

    // Second pass: add citations container
    if (citationsFound.length > 0) {
      const citationsNode = {
        type: 'citations',
        children: citationsFound,
        data: {
          hName: 'div',
          hProperties: {
            className: 'citations-container'
          }
        }
      } as CitationsContainerNode;
      // Custom node type, so it is cast to satisfy the Root children type
      tree.children.push(citationsNode as unknown as Paragraph);
    }
  };
}
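The first pass removes paragraphs from `parent.children` while the tree is being walked. Two matching paragraphs that sit next to each other are the usual trouble spot, because a splice shifts the next sibling into the slot that was just visited. The dependency-free sketch below shows one way to sidestep that by iterating backwards; `TreeNode` and `extractNodes` are hypothetical names, and this is an alternative illustration rather than the plugin's own code.

```typescript
interface TreeNode {
  type: string;
  value?: string;
  children?: TreeNode[];
}

// Hypothetical helper: removes every node matching `shouldRemove` and returns
// the removed nodes in document order. Iterating backwards means a splice never
// shifts the position of a sibling that has not been visited yet.
function extractNodes(
  parent: TreeNode,
  shouldRemove: (node: TreeNode) => boolean,
): TreeNode[] {
  const removed: TreeNode[] = [];
  const children = parent.children ?? [];
  for (let i = children.length - 1; i >= 0; i--) {
    const child = children[i];
    if (!child) continue;
    if (shouldRemove(child)) {
      children.splice(i, 1);
      removed.unshift(child);
    } else {
      // Nested matches come before anything collected so far.
      removed.unshift(...extractNodes(child, shouldRemove));
    }
  }
  return removed;
}

const article: TreeNode = {
  type: 'root',
  children: [
    { type: 'paragraph', value: 'Intro' },
    { type: 'paragraph', value: 'Citations:\n[1] First citation entry' },
    { type: 'paragraph', value: 'Citations:\n[2] Second citation entry' },
    { type: 'paragraph', value: 'Outro' },
  ],
};

const extracted = extractNodes(
  article,
  (node) => node.type === 'paragraph' && (node.value ?? '').startsWith('Citations:'),
);
```

Both adjacent citation paragraphs end up in `extracted`, and only the intro and outro remain in `article`.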
Component Rendering
astro
// ArticleCitations.astro
---
interface Props {
  node: {
    type: string;
    children: {
      type: string;
      value: string;
    }[];
  };
}
const { node } = Astro.props;
---
<div class="citations-container">
  {node.children.map((citation) => (
    <div class="citation">{citation.value}</div>
  ))}
</div>
2. Remark Plugin Pipeline
typescript
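// OneArticle.astro
import { unified } from 'unified';
import remarkParse from 'remark-parse';
// remarkCitations, remarkBacklinks, remarkImages, and remarkCallouts are the
// project's own plugins, imported from their local modules

const processor = unified()
  .use(remarkParse)      // 1. Parse markdown to MDAST
  .use(remarkCitations)  // 2. Process citations
  .use(remarkBacklinks)  // 3. Process inline wiki-style links
  .use(remarkImages)     // 4. Process inline images
  .use(remarkCallouts);  // 5. Process container elements

// First parse to MDAST (parse is synchronous)
const mdast = processor.parse(content || '');
// Then run transformations (run is asynchronous)
const transformedMdast = await processor.run(mdast);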
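Plugin order matters because each plugin sees the tree as the previous one left it. The sketch below illustrates this without any dependencies; `runTransformers` and the two toy transformers are hypothetical stand-ins, not the project's plugins or the unified API.

```typescript
interface AstNode {
  type: string;
  value?: string;
  children?: AstNode[];
}

type Transformer = (tree: AstNode) => AstNode | Promise<AstNode>;

// Hypothetical runner: applies each transformer to the output of the previous one.
async function runTransformers(tree: AstNode, transformers: Transformer[]): Promise<AstNode> {
  let current = tree;
  for (const transform of transformers) {
    current = await transform(current);
  }
  return current;
}

// Toy stand-in for a citations plugin: retags citation paragraphs.
const tagCitations: Transformer = (tree) => ({
  ...tree,
  children: (tree.children ?? []).map((child) =>
    child.type === 'paragraph' && (child.value ?? '').startsWith('Citations:')
      ? { ...child, type: 'citations' }
      : child,
  ),
});

// Toy stand-in for a later plugin: only touches nodes that are still paragraphs.
const shoutParagraphs: Transformer = (tree) => ({
  ...tree,
  children: (tree.children ?? []).map((child) =>
    child.type === 'paragraph'
      ? { ...child, value: (child.value ?? '').toUpperCase() }
      : child,
  ),
});

const sampleTree: AstNode = {
  type: 'root',
  children: [
    { type: 'paragraph', value: 'Body text' },
    { type: 'paragraph', value: 'Citations:\n[1] First citation entry' },
  ],
};

const pipelineResult = runTransformers(sampleTree, [tagCitations, shoutParagraphs]);
```

Because the citations step runs first, the later step leaves the citation text untouched. Swapping the two would uppercase the citations paragraph before it could be recognized.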
Key Principles
- Single Responsibility
  - Each plugin handles one type of transformation
  - Clean separation between MDAST and HAST phases
  - Components only handle rendering
- Node Structure
  - Use proper MDAST/HAST node types
  - Set hName and hProperties for HTML generation
  - Maintain AST hierarchy
- Error Handling
  - Validate input at each phase
  - Preserve original content on error
  - Clear error reporting
- Debugging
  - Output AST state at each phase
  - Track transformations
  - Maintain type safety
- Component Integration
  - Clean component interfaces
  - Type-safe props
  - Minimal processing logic
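The "preserve original content on error" principle can be sketched as a small wrapper. `withFallback` is a hypothetical helper, and the JSON round-trip copy assumes the AST is plain data (no functions, cycles, or class instances).

```typescript
interface AstNode {
  type: string;
  value?: string;
  children?: AstNode[];
}

// Hypothetical wrapper: runs `transform` on a deep copy so that a failure halfway
// through cannot leave the caller's tree partially modified.
function withFallback(
  name: string,
  transform: (tree: AstNode) => AstNode,
  report: (message: string) => void = console.error,
): (tree: AstNode) => AstNode {
  return (tree) => {
    try {
      // A JSON round-trip is enough for plain-data ASTs (assumption).
      const copy = JSON.parse(JSON.stringify(tree)) as AstNode;
      return transform(copy);
    } catch (error) {
      report(`Error in ${name}: ${error instanceof Error ? error.message : String(error)}`);
      return tree; // hand back the untouched original
    }
  };
}

const reportedErrors: string[] = [];

const failingTransform = withFallback(
  'brokenPlugin',
  (tree) => {
    tree.children = []; // the mutation happens before the failure
    throw new Error('simulated failure');
  },
  (message) => reportedErrors.push(message),
);

const originalTree: AstNode = {
  type: 'root',
  children: [{ type: 'paragraph', value: 'Kept' }],
};

const recoveredTree = failingTransform(originalTree);
```

The trade-off is one extra copy per phase, which is usually acceptable for article-sized documents.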
Callout Processing Structure (2025-04-03)
Directory Structure
text
site/src/utils/markdown/callouts/
├── calloutCases.ts # Known patterns and types
├── calloutTypes.ts # TypeScript definitions
├── detectMarkdownCallouts.ts # Phase 1: Pattern detection
├── isolateCalloutContent.ts # Phase 2: Content isolation
├── transformCalloutStructure.ts # Phase 3: AST transformation
├── embedCalloutNodes.ts # Phase 4: Node embedding
└── processCalloutPipeline.ts # Pipeline orchestration
Pipeline Flow
- Detection (detectMarkdownCallouts.ts):
  - Finds blockquotes that match callout patterns
  - Returns array of detected callout nodes
  - No modifications to original nodes
- Isolation (isolateCalloutContent.ts):
  - Extracts complete content from detected nodes
  - Preserves context and relationships
  - Returns array of isolated callout content
- Transformation (transformCalloutStructure.ts):
  - Creates component structure from isolated content
  - Sets HAST properties for HTML generation
  - Returns array of transformed nodes
- Embedding (embedCalloutNodes.ts):
  - Replaces original nodes with transformed versions
  - Preserves tree structure and relationships
  - Returns modified AST
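As an illustration of the detection phase, here is a dependency-free sketch. It ASSUMES callouts are blockquotes whose first text starts with a `[!type]` marker; the actual patterns live in calloutCases.ts and may differ. `detectCalloutsSketch`, `DetectedCallout`, and `CALLOUT_MARKER` are hypothetical names.

```typescript
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

interface DetectedCallout {
  node: MdNode; // reference to the original blockquote, left unmodified
  calloutType: string;
}

// ASSUMPTION: a callout is a blockquote whose first text starts with "[!type]".
const CALLOUT_MARKER = /^\[!([\w-]+)\]/;

// Returns the value of the first text node found in depth-first order.
function firstText(node: MdNode): string | undefined {
  if (node.type === 'text') return node.value;
  for (const child of node.children ?? []) {
    const text = firstText(child);
    if (text !== undefined) return text;
  }
  return undefined;
}

// Walks the tree and collects matching blockquotes without changing any node.
function detectCalloutsSketch(tree: MdNode): DetectedCallout[] {
  const detected: DetectedCallout[] = [];
  const walk = (node: MdNode): void => {
    if (node.type === 'blockquote') {
      const match = CALLOUT_MARKER.exec(firstText(node) ?? '');
      if (match) {
        detected.push({ node, calloutType: (match[1] ?? '').toLowerCase() });
      }
    }
    for (const child of node.children ?? []) walk(child);
  };
  walk(tree);
  return detected;
}

const calloutTree: MdNode = {
  type: 'root',
  children: [
    {
      type: 'blockquote',
      children: [{ type: 'paragraph', children: [{ type: 'text', value: '[!NOTE] Remember this' }] }],
    },
    {
      type: 'blockquote',
      children: [{ type: 'paragraph', children: [{ type: 'text', value: 'Just a quote' }] }],
    },
  ],
};

const detectedCallouts = detectCalloutsSketch(calloutTree);
```

Returning references rather than copies keeps the phase read-only, so later phases decide how and when the tree changes.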
Pipeline Orchestration
typescript
// processCalloutPipeline.ts
export async function processCallouts(tree: Node): Promise<Node> {
  try {
    // Phase 1: Detection
    const detected = await detectMarkdownCallouts(tree);
    if (!detected.length) return tree;

    // Phase 2: Isolation
    const isolated = await isolateCalloutContent(detected);
    if (!isolated.length) return tree;

    // Phase 3: Transformation
    const transformed = await transformCalloutStructure(isolated);
    if (!transformed.length) return tree;

    // Phase 4: Embedding
    return await embedCalloutNodes(tree, transformed);
  } catch (error) {
    // On any failure, fall back to the unmodified tree
    console.error('Error in callout pipeline:', error);
    return tree;
  }
}
Remark Plugin Integration
typescript
// remark-callout-handler.ts
import type { Plugin } from 'unified';
import type { Root } from 'mdast';

const remarkCalloutHandler: Plugin<[], Root> = () => {
  return async (tree: Root) => {
    try {
      // Snapshot the tree before and after the callout pipeline runs
      astDebugger.writeDebugFile('0-initial-tree', tree);
      const processedTree = await processCallouts(tree);
      astDebugger.writeDebugFile('5-final-tree', processedTree);
      return processedTree;
    } catch (error) {
      console.error('Error in remark-callout:', error);
      return tree;
    }
  };
};
Debug Points
- 0-initial-tree.json: Initial MDAST
- 1-detected-callouts.json: After detection phase
- 2-isolated-callouts.json: After isolation phase
- 3-transformed-callouts.json: After transformation phase
- 4-final-tree.json: After embedding phase
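The implementation of astDebugger is not shown in these notes. The sketch below is one hypothetical way such a helper could write the debug files listed above, using only Node's standard library; `createAstDebugger`, the output directory, and the return value are all assumptions.

```typescript
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical stand-in for astDebugger; the real helper may behave differently.
function createAstDebugger(outputDir: string) {
  mkdirSync(outputDir, { recursive: true });
  return {
    // Writes `<name>.json` with the pretty-printed AST and returns the file path.
    writeDebugFile(name: string, tree: unknown): string {
      const filePath = join(outputDir, `${name}.json`);
      writeFileSync(filePath, JSON.stringify(tree, null, 2), 'utf8');
      return filePath;
    },
  };
}

// Assumed location: a scratch directory under the OS temp folder.
const debugDir = join(tmpdir(), 'ast-debug-sketch');
const astDebuggerSketch = createAstDebugger(debugDir);

const writtenPath = astDebuggerSketch.writeDebugFile('0-initial-tree', {
  type: 'root',
  children: [],
});

// Reading the file back confirms the snapshot round-trips as JSON.
const roundTripped = JSON.parse(readFileSync(writtenPath, 'utf8')) as { type: string };
```

Numbered file names keep the snapshots sorted in phase order when the directory is listed.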
Key Principles
- Each phase is independent and has a single responsibility
- Clear error handling at each phase
- Comprehensive debug output
- Original content preserved on error
- No assumptions about node structure
- Explicit type definitions
- Clear transformation tracking
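"No assumptions about node structure" and "explicit type definitions" can be combined through type guards that check a node's shape before it is used. The guard below is a minimal sketch; `BlockquoteLike` and `isBlockquoteLike` are hypothetical names, not definitions from calloutTypes.ts.

```typescript
interface BlockquoteLike {
  type: 'blockquote';
  children: unknown[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Hypothetical guard: verifies the shape at runtime instead of trusting a cast.
function isBlockquoteLike(value: unknown): value is BlockquoteLike {
  return (
    isRecord(value) &&
    value['type'] === 'blockquote' &&
    Array.isArray(value['children'])
  );
}

const candidates: unknown[] = [
  { type: 'blockquote', children: [] },
  { type: 'blockquote' }, // missing children
  { type: 'paragraph', children: [] },
  null,
];

const validBlockquotes = candidates.filter(isBlockquoteLike);
```

After the filter, `validBlockquotes` is typed as `BlockquoteLike[]`, so later phases can read `children` without further checks.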