Rendering Extended Markdown through AST

Unfinished Work

Handle citations sections that appear inside callouts, while still including any callout content that comes after the citations section.
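One possible starting point for this item is sketched below. It is not the project's implementation. It assumes the callout body is already available as an array of text lines with the blockquote markers stripped, and that the citations block follows the syntax described under Citation Syntax (a `Citations:` header, ended by a blank line). The name `splitCalloutAroundCitations` is hypothetical.

```typescript
type CalloutSplit = {
  before: string[];     // callout lines before the "Citations:" header
  citations: string[];  // raw citation lines, header excluded
  after: string[];      // callout lines after the blank line that ends the block
};

// Hypothetical helper: separates a callout body into the content before the
// citations block, the block itself, and the content that follows it.
function splitCalloutAroundCitations(lines: string[]): CalloutSplit {
  const start = lines.findIndex((line) => line.trim() === 'Citations:');
  if (start === -1) {
    return { before: [...lines], citations: [], after: [] };
  }

  // The block ends at the first blank line after the header, or at the end.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if ((lines[i] ?? '').trim() === '') {
      end = i;
      break;
    }
  }

  return {
    before: lines.slice(0, start),
    citations: lines.slice(start + 1, end),
    after: lines.slice(end + 1), // the terminating blank line is dropped
  };
}

const calloutSplit = splitCalloutAroundCitations([
  'Intro text',
  'Citations:',
  '[1] A',
  '[2] B',
  '',
  'Closing remark',
]);
console.log(calloutSplit);
```

Keeping `after` as a separate field means the trailing content can be re-attached to the callout once the citations have been extracted.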

Context

Comparing Approaches to Extended Markdown Rendering

Previous Approach (Component-Level Transformation)

Initially, we tried handling markdown extensions (callouts, citations) at the component level:

Let markdown pass through remark untouched
Process nodes in Astro components
Transform content during rendering

This approach faced challenges:

Duplicate processing logic in components
Inconsistent node structure preservation
Difficulty maintaining AST hierarchy
Potential conflicts between transformations

Current Approach (Unified Remark Pipeline)

We now use a unified remark plugin pipeline:

Process markdown extensions during the MDAST phase
Use proper node structure with hName and hProperties
Render transformed nodes in components

Benefits:

Single source of truth for transformations
Better preservation of AST structure
Cleaner component logic
More maintainable codebase
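As a rough illustration of the node structure mentioned above: in the remark ecosystem, `data.hName` and `data.hProperties` are hints that the MDAST-to-HAST conversion reads to decide which element and attributes a node becomes. The sketch below uses local stand-in types rather than the real `mdast` typings, and `makeExtensionNode` is a hypothetical helper, not part of this codebase.

```typescript
// Local stand-ins for the relevant MDAST shapes, so the sketch has no dependencies.
type HastHints = {
  hName?: string;                        // element name to emit
  hProperties?: Record<string, unknown>; // attributes for that element
};

type TextNode = { type: 'text'; value: string };

type ExtensionNode = {
  type: string;
  children: TextNode[];
  data: HastHints;
};

// Hypothetical helper: wraps text in a custom node and records how that node
// should be rendered once the tree is converted to HAST.
function makeExtensionNode(
  type: string,
  tagName: string,
  className: string,
  text: string,
): ExtensionNode {
  return {
    type,
    children: [{ type: 'text', value: text }],
    data: {
      hName: tagName,
      hProperties: { className },
    },
  };
}

const calloutNode = makeExtensionNode(
  'callout',
  'aside',
  'callout callout-note',
  'Remember to cite sources.',
);
console.log(JSON.stringify(calloutNode.data));
```

Because the rendering hints travel with the node, a component only needs to read them and does not have to re-derive the structure from raw markdown.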

Constraints

Maintain separation of concerns:

Remark plugins handle AST transformations
Components handle rendering
No duplicate processing logic

Implementation Examples

1. Citation Processing

Citation Syntax

Citations are marked in markdown using:

A "Citations:" header line
Numbered citation entries starting with [n]
The citations block continues until a blank line

Example:

markdown

Citations:
[1] First citation entry
[2] Second citation entry
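The syntax rules above can be sketched as a small standalone parser. This is an illustration only and is separate from the remark plugin shown later. `parseCitationsBlock` and `CitationEntry` are hypothetical names, and the choice to skip lines that do not match `[n] ...` (rather than reject them) is an assumption.

```typescript
type CitationEntry = { index: number; text: string };

// Hypothetical helper: parses the first citations block found in raw text.
function parseCitationsBlock(source: string): CitationEntry[] {
  const lines = source.split('\n');
  const start = lines.findIndex((line) => line.trim() === 'Citations:');
  if (start === -1) return [];

  const entries: CitationEntry[] = [];
  for (const line of lines.slice(start + 1)) {
    // A blank line ends the citations block.
    if (line.trim() === '') break;

    // Entries look like "[n] text"; anything else is skipped (assumption).
    const match = /^\[(\d+)\]\s*(.*)$/.exec(line.trim());
    if (match) {
      entries.push({ index: Number(match[1]), text: match[2] ?? '' });
    }
  }
  return entries;
}

const sampleCitations = [
  'Some paragraph.',
  '',
  'Citations:',
  '[1] First citation entry',
  '[2] Second citation entry',
  '',
  'Trailing text that is not a citation',
].join('\n');

const parsedCitations = parseCitationsBlock(sampleCitations);
console.log(parsedCitations);
```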

Citation Plugin Structure

typescript

// remarkCitations.ts
import type { Paragraph, Parent, Root } from 'mdast';
import { visit } from 'unist-util-visit';

type CitationNode = {
  type: 'citation';
  value: string;
  data: {
    hName: string;
    hProperties: {
      className: string;
    };
  };
};

type CitationsContainerNode = {
  type: 'citations';
  children: CitationNode[];
  data: {
    hName: string;
    hProperties: {
      className: string;
    };
  };
};

export default function remarkCitations() {
  return (tree: Root) => {
    let citationsFound: CitationNode[] = [];

    // First pass: find and extract citations
    visit(tree, 'paragraph', (node: Paragraph, index: number | undefined, parent: Parent | undefined) => {
      const firstChild = node.children[0];
      if (firstChild?.type === 'text' && 
          (firstChild.value.startsWith('Citations:') || firstChild.value.includes('\n[1]'))) {
        
        // Extract and transform citations
        const citations = firstChild.value
          .split('\n')
          .filter(line => line.trim() && !line.startsWith('Citations:'))
          .map(citation => ({
            type: 'citation',
            value: citation.trim(),
            data: {
              hName: 'div',
              hProperties: {
                className: 'citation'
              }
            }
          } as CitationNode));

        citationsFound = citationsFound.concat(citations);

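        // Remove the original paragraph, then resume at the same index:
        // the next sibling has shifted into this slot and would otherwise be skipped
        if (typeof index === 'number' && Array.isArray(parent?.children)) {
          parent.children.splice(index, 1);
          return index;
        }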
      }
    });

    // Second pass: add citations container
    if (citationsFound.length > 0) {
      const citationsNode = {
        type: 'citations',
        children: citationsFound,
        data: {
          hName: 'div',
          hProperties: {
            className: 'citations-container'
          }
        }
      } as CitationsContainerNode;

      tree.children.push(citationsNode as unknown as Paragraph);
    }
  };
}

Component Rendering

astro

---
// ArticleCitations.astro
interface Props {
  node: {
    type: string;
    children: {
      type: string;
      value: string;
    }[];
  };
}

const { node } = Astro.props;
---

<div class="citations-container">
  {node.children.map((citation) => (
    <div class="citation">{citation.value}</div>
  ))}
</div>

2. Remark Plugin Pipeline

typescript

// OneArticle.astro
import { unified } from 'unified';
import remarkParse from 'remark-parse';

const processor = unified()
  .use(remarkParse)           // 1. Parse markdown to MDAST
  .use(remarkCitations)       // 2. Process citations
  .use(remarkBacklinks)       // 3. Process inline wiki-style links
  .use(remarkImages)          // 4. Process inline images
  .use(remarkCallouts);       // 5. Process container elements

// First parse to MDAST (parse is synchronous, so no await is needed)
const mdast = processor.parse(content || '');

// Then run transformations
const transformedMdast = await processor.run(mdast);

Key Principles

Single Responsibility
- Each plugin handles one type of transformation
- Clean separation between MDAST and HAST phases
- Components only handle rendering
Node Structure
- Use proper MDAST/HAST node types
- Set hName and hProperties for HTML generation
- Maintain AST hierarchy
Error Handling
- Validate input at each phase
- Preserve original content on error
- Clear error reporting
Debugging
- Output AST state at each phase
- Track transformations
- Maintain type safety
Component Integration
- Clean component interfaces
- Type-safe props
- Minimal processing logic
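The "preserve original content on error" principle can be sketched as a small wrapper. This is an assumption-laden illustration rather than code from the project: `transformWithFallback` and `TreeNode` are hypothetical names, and the JSON round-trip copy assumes the tree is plain data, which holds for trees that can be written out as JSON.

```typescript
type TreeNode = { type: string; value?: string; children?: TreeNode[] };

// Hypothetical wrapper: runs a transformation on a deep copy of the tree and
// falls back to the untouched original if the transformation throws.
function transformWithFallback(
  tree: TreeNode,
  label: string,
  transform: (copy: TreeNode) => TreeNode,
): TreeNode {
  // Working on a copy means a half-applied mutation cannot leak into the original.
  const copy = JSON.parse(JSON.stringify(tree)) as TreeNode;
  try {
    return transform(copy);
  } catch (error) {
    console.error(`[${label}] transformation failed; keeping original tree`, error);
    return tree;
  }
}

const originalTree: TreeNode = {
  type: 'root',
  children: [{ type: 'text', value: 'hello' }],
};

// A transformation that mutates its input and then fails part-way through.
const afterFailure = transformWithFallback(originalTree, 'demo', (copy) => {
  copy.children = [];
  throw new Error('boom');
});
console.log(afterFailure === originalTree); // the original object is returned
```

Returning the input tree from a `catch` block alone is only safe if the failed step did not mutate it in place; copying first removes that concern at the cost of one extra traversal.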

Callout Processing Structure (2025-04-03)

Directory Structure

text

site/src/utils/markdown/callouts/
├── calloutCases.ts     # Known patterns and types
├── calloutTypes.ts     # TypeScript definitions
├── detectMarkdownCallouts.ts    # Phase 1: Pattern detection
├── isolateCalloutContent.ts     # Phase 2: Content isolation
├── transformCalloutStructure.ts  # Phase 3: AST transformation
├── embedCalloutNodes.ts         # Phase 4: Node embedding
└── processCalloutPipeline.ts    # Pipeline orchestration

Pipeline Flow

Detection (detectMarkdownCallouts.ts):
- Finds blockquotes that match callout patterns
- Returns array of detected callout nodes
- No modifications to original nodes
Isolation (isolateCalloutContent.ts):
- Extracts complete content from detected nodes
- Preserves context and relationships
- Returns array of isolated callout content
Transformation (transformCalloutStructure.ts):
- Creates component structure from isolated content
- Sets HAST properties for HTML generation
- Returns array of transformed nodes
Embedding (embedCalloutNodes.ts):
- Replaces original nodes with transformed versions
- Preserves tree structure and relationships
- Returns modified AST
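To make the detection phase more concrete, here is a minimal, dependency-free sketch. The actual patterns live in calloutCases.ts and are not shown here, so the `[!type]` marker below is an assumption, as are the names `detectCallouts`, `MdNode`, and `DetectedCallout`. The function only reads the tree and returns references to the matching blockquotes.

```typescript
type MdNode = { type: string; value?: string; children?: MdNode[] };
type DetectedCallout = { node: MdNode; calloutType: string };

// Assumption: a callout is a blockquote whose text starts with "[!type]".
const CALLOUT_MARKER = /^\[!(\w+)\]/;

// Returns the text of the first leaf under a node, or '' if there is none.
function firstText(node: MdNode): string {
  if (node.type === 'text') return node.value ?? '';
  const first = node.children?.[0];
  return first ? firstText(first) : '';
}

// Walks the tree and collects matching blockquotes without modifying anything.
function detectCallouts(tree: MdNode): DetectedCallout[] {
  const found: DetectedCallout[] = [];
  const walk = (node: MdNode): void => {
    if (node.type === 'blockquote') {
      const match = CALLOUT_MARKER.exec(firstText(node));
      if (match) {
        found.push({ node, calloutType: (match[1] ?? '').toLowerCase() });
      }
    }
    for (const child of node.children ?? []) walk(child);
  };
  walk(tree);
  return found;
}

const sampleTree: MdNode = {
  type: 'root',
  children: [
    {
      type: 'blockquote',
      children: [
        { type: 'paragraph', children: [{ type: 'text', value: '[!NOTE]\nCheck the sources.' }] },
      ],
    },
    {
      type: 'blockquote',
      children: [
        { type: 'paragraph', children: [{ type: 'text', value: 'An ordinary quote.' }] },
      ],
    },
  ],
};

const detectedCallouts = detectCallouts(sampleTree);
console.log(detectedCallouts.map((d) => d.calloutType));
```

Because detection returns references rather than copies, the later embedding phase can locate the original nodes by identity when it swaps in the transformed versions.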

Pipeline Orchestration

typescript

// processCalloutPipeline.ts
import type { Node } from 'unist';

export async function processCallouts(tree: Node): Promise<Node> {
  try {
    // Phase 1: Detection
    const detected = await detectMarkdownCallouts(tree);
    if (!detected.length) return tree;
    
    // Phase 2: Isolation
    const isolated = await isolateCalloutContent(detected);
    if (!isolated.length) return tree;
    
    // Phase 3: Transformation
    const transformed = await transformCalloutStructure(isolated);
    if (!transformed.length) return tree;
    
    // Phase 4: Embedding
    return await embedCalloutNodes(tree, transformed);
  } catch (error) {
    console.error('Error in callout pipeline:', error);
    return tree;
  }
}

Remark Plugin Integration

typescript

// remark-callout-handler.ts
import type { Root } from 'mdast';
import type { Plugin } from 'unified';

const remarkCalloutHandler: Plugin<[], Root> = () => {
  return async (tree: Root) => {
    try {
      astDebugger.writeDebugFile('0-initial-tree', tree);
      const processedTree = await processCallouts(tree);
      astDebugger.writeDebugFile('5-final-tree', processedTree);
      return processedTree;
    } catch (error) {
      console.error('Error in remark-callout:', error);
      return tree;
    }
  };
};

Debug Points

0-initial-tree.json - Initial MDAST
1-detected-callouts.json - After detection phase
2-isolated-callouts.json - After isolation phase
3-transformed-callouts.json - After transformation phase
5-final-tree.json - After embedding phase (the name written by remark-callout-handler.ts)
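The `astDebugger` used by the plugin is not shown in this document. The sketch below is one way such a helper could look, under stated assumptions: `createAstDebugger` and `DebugWriter` are hypothetical names, and the writer is injected so the example stays dependency-free. In a real setup the writer would presumably wrap a file-system call.

```typescript
type DebugWriter = (fileName: string, contents: string) => void;

// Hypothetical factory: returns an object with a writeDebugFile method that
// serializes a tree to pretty-printed JSON and hands it to the writer.
function createAstDebugger(write: DebugWriter, enabled = true) {
  return {
    writeDebugFile(name: string, tree: unknown): void {
      if (!enabled) return; // debugging can be switched off without touching call sites
      write(`${name}.json`, JSON.stringify(tree, null, 2));
    },
  };
}

// In-memory writer for demonstration purposes.
const snapshots = new Map<string, string>();
const demoDebugger = createAstDebugger((fileName, contents) => {
  snapshots.set(fileName, contents);
});

demoDebugger.writeDebugFile('0-initial-tree', { type: 'root', children: [] });
console.log([...snapshots.keys()]);
```

Numbering the snapshot names by phase keeps the files sorted in pipeline order, which makes it easier to diff the tree between adjacent phases.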

Key Principles

Each phase is independent and has a single responsibility
Clear error handling at each phase
Comprehensive debug output
Original content preserved on error
No assumptions about node structure
Explicit type definitions
Clear transformation tracking