Fetch Open Graph Data from API
OpenGraph Data Fetching Script Implementation Guide
Create a Node.js script (runFetchOpenGraphData.cjs) that processes Markdown files to fetch and update OpenGraph metadata and screenshots. This guide provides detailed specifications for implementing a robust, error-tolerant system. Use Meticulous-Constraints-for-Every-Prompt and Maintain-Consistent-Reporting-Templates for the Single Operation Process Report.
Model Responses:
json
{
  "hybridGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "openGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": {
      "url": "https://example.com/image.png"
    },
    "url": "https://example.com",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "htmlInferred": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "images": [
      "https://example.com/image1.png",
      "https://example.com/image2.png",
      "https://example.com/image3.png",
      "https://example.com/image4.png"
    ]
  },
  "requestInfo": {
    "redirects": 1,
    "host": "https://example.com",
    "responseCode": 200,
    "cache_ok": true,
    "max_cache_age": 432000000,
    "accept_lang": "en-US,en;q=0.9",
    "url": "https://example.com",
    "full_render": false,
    "use_proxy": false,
    "use_superior": false,
    "responseContentType": "text/html; charset=utf-8"
  },
  "accept_lang": "en-US,en;q=0.9",
  "is_cache": false,
  "url": "https://example.com"
}
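To make the response shape concrete, here is a minimal sketch of flattening it into the og_* fields that fetchOpenGraphData is specified to return later in this guide. The fallback order (hybridGraph, then openGraph, then htmlInferred) and the helper names pickImage and mapOpenGraphResponse are assumptions for illustration, not part of the API.

```javascript
// Hypothetical helper: openGraph.image is an object ({ url }), while
// hybridGraph.image and htmlInferred.image are plain strings.
function pickImage(image) {
  if (!image) return '';
  return typeof image === 'string' ? image : image.url || '';
}

// Hypothetical mapper; the fallback order below is an assumption.
function mapOpenGraphResponse(response, now = new Date()) {
  const sources = [response.hybridGraph, response.openGraph, response.htmlInferred]
    .filter(Boolean);
  // Return the first non-empty value for `key` across the available sections.
  const first = (key) => {
    for (const source of sources) {
      if (source[key]) return source[key];
    }
    return '';
  };
  return {
    og_title: first('title'),
    og_description: first('description'),
    og_image: pickImage(first('image')),
    og_url: first('url') || response.url || '',
    og_last_fetch: now.toISOString(),
  };
}
```

Passing the timestamp in as a parameter keeps the mapper deterministic, which makes it easier to check against fixture responses.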
Core Components
1. File System Structure
text
scripts/
build-scripts/
runFetchOpenGraphData.cjs # Main script
utils/
addReportNamingConventions.cjs # Report filename generation
addReportFrontmatterTemplate.cjs # Report frontmatter formatting
2. Environment Setup
javascript
// Required environment variables (set in .env, loaded with dotenv):
//   OPEN_GRAPH_IO_API_KEY=your_api_key
// Configuration constants
const TARGET_DIR = process.env.TARGET_DIR || '../content/tooling/AI-Toolkit';
const REPORT_OUTPUT_DIR = 'src/content/data_site';
const REPORT_NAME = 'open-graph-fetch-report';
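One way to fail fast on a missing API key is to gather the settings above in a single function. The sketch below is an assumption about how this could be organized; loadConfig and the shape of its return value are hypothetical names, not part of the specification.

```javascript
// Hypothetical helper: reads settings from an env-like object so it can be
// exercised without touching process.env. Throws when the API key is missing.
function loadConfig(env = process.env) {
  const apiKey = (env.OPEN_GRAPH_IO_API_KEY || '').trim();
  if (!apiKey) {
    throw new Error('OPEN_GRAPH_IO_API_KEY is not set');
  }
  return {
    apiKey,
    targetDir: env.TARGET_DIR || '../content/tooling/AI-Toolkit',
    reportOutputDir: 'src/content/data_site',
    reportName: 'open-graph-fetch-report',
  };
}
```

In the script itself, `require('dotenv').config()` would run before `loadConfig()` so that values from .env are present in process.env.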
3. Core Functions
A. Frontmatter Management
- Use plain text parsing (NOT gray-matter) to handle frontmatter
- Extract content between --- markers
- Preserve exact line positioning for updates
- Handle both YAML and non-YAML frontmatter gracefully
javascript
function extractFrontmatter(content) {
  // Returns: { frontmatter: Object, content: string }
  // Preserves original formatting
}

function updateMarkdownFile(filePath, frontmatter, content) {
  // Atomic write operation
  // Maintains file permissions
}
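The plain-text parsing described above could look roughly like the following. This is a sketch under stated assumptions: it reads only top-level "key: value" lines, leaves nested YAML and lists unparsed, and adds a rawFrontmatter field (not in the specified return type) so the original lines can be written back unchanged.

```javascript
// Sketch: splits a Markdown string on its leading --- markers and reads
// simple top-level "key: value" lines. Anything else is kept only as raw text.
function extractFrontmatter(content) {
  const lines = content.split('\n');
  if (lines[0].trim() !== '---') {
    return { frontmatter: {}, rawFrontmatter: '', content };
  }
  // Find the closing marker, skipping the opening one at index 0.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === '---');
  if (end === -1) {
    return { frontmatter: {}, rawFrontmatter: '', content };
  }
  const frontmatter = {};
  for (const line of lines.slice(1, end)) {
    const match = line.match(/^([A-Za-z0-9_-]+):\s*(.*)$/);
    if (match) frontmatter[match[1]] = match[2].trim();
  }
  return {
    frontmatter,
    rawFrontmatter: lines.slice(1, end).join('\n'), // assumption: extra field
    content: lines.slice(end + 1).join('\n'),
  };
}
```

Files without a leading --- marker, or with an unclosed block, come back with an empty frontmatter object and their content untouched, which is one way to satisfy the "handle gracefully" requirement.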
B. OpenGraph Data Fetching
- Implement retry logic (3 attempts)
- Handle rate limits with exponential backoff
- Validate response data structure
- Strip quotes from values
javascript
async function fetchOpenGraphData(url, filePath) {
  // Returns: Promise<{
  //   og_title: string,
  //   og_description: string,
  //   og_image: string,
  //   og_url: string,
  //   og_last_fetch: string
  // } | null>
}
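The retry and backoff requirements above can be sketched independently of the HTTP call. In this example the 1-second base delay, the doubling schedule, and the names backoffDelay and withRetry are assumptions; `task` stands in for the real API request.

```javascript
// Delay before retry number `attempt` (1-based): base, 2x base, 4x base, ...
function backoffDelay(attempt, baseMs = 1000) {
  return baseMs * 2 ** (attempt - 1);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: calls `task` up to `maxAttempts` times, sleeping
// between failed attempts and rethrowing the last error if all of them fail.
async function withRetry(task, { maxAttempts = 3, baseMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}
```

To match the `| null` return type, fetchOpenGraphData could wrap its request in withRetry and return null from a surrounding catch once the attempts are exhausted.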
C. Screenshot Fetching
- Non-blocking parallel operations
- Track in-progress fetches
- Cache results to prevent duplicates
javascript
async function fetchScreenshotUrl(url, filePath) {
  // Returns: Promise<string | null>
  //   string = screenshot URL
  //   null = fetch failed
}
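Tracking in-progress fetches and caching results can both be handled by a small wrapper around the request function. The following is a sketch, not the specified implementation: createScreenshotFetcher is a hypothetical name, and `fetcher` stands in for whatever function performs the real screenshot request.

```javascript
// Hypothetical de-duplication wrapper. `fetcher` must return a
// Promise<string | null>, matching the contract of fetchScreenshotUrl.
function createScreenshotFetcher(fetcher) {
  const inFlight = new Map(); // url -> pending Promise
  const cache = new Map();    // url -> resolved screenshot URL

  return function fetchOnce(url) {
    if (cache.has(url)) return Promise.resolve(cache.get(url));
    if (inFlight.has(url)) return inFlight.get(url);

    const pending = Promise.resolve()
      .then(() => fetcher(url))
      .then((result) => {
        if (result) cache.set(url, result); // failed fetches stay retryable
        return result;
      })
      .catch(() => null) // null = fetch failed
      .finally(() => inFlight.delete(url));

    inFlight.set(url, pending);
    return pending;
  };
}
```

Because callers receive the same pending promise for a URL that is already being fetched, several Markdown files pointing at one URL trigger a single request.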
4. Processing Logic
A. Skip Conditions
Skip OpenGraph fetch if ANY of these exist:
- image
- og_image
- og_last_error
Skip Screenshot fetch if og_screenshot exists.
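The skip rules above translate into two small predicates. This sketch assumes that "exists" means the key is present in the parsed frontmatter object regardless of its value; the function names are hypothetical.

```javascript
// Keys whose presence means the OpenGraph fetch should be skipped.
const OPEN_GRAPH_SKIP_KEYS = ['image', 'og_image', 'og_last_error'];

// Assumption: a key "exists" when it is an own property, even if empty.
const hasKey = (frontmatter, key) =>
  Object.prototype.hasOwnProperty.call(frontmatter, key);

function shouldSkipOpenGraph(frontmatter) {
  return OPEN_GRAPH_SKIP_KEYS.some((key) => hasKey(frontmatter, key));
}

function shouldSkipScreenshot(frontmatter) {
  return hasKey(frontmatter, 'og_screenshot');
}
```

Keeping the two checks separate lets a file that already has og_image still receive a screenshot on the same run.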
B. Error Handling
- Mark files with errors:
yaml
og_error: "Error message"
og_last_fetch: "2025-03-24T05:59:57.811Z"
- Categories of errors:
  - API errors (rate limits, timeouts)
  - Invalid responses
  - Missing required properties
  - Network failures
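As an illustration of the error-marking step, the sketch below produces the two frontmatter fields shown above. markFetchError is a hypothetical helper, and collapsing the message onto one line is an assumption made so the value fits a single frontmatter line.

```javascript
// Hypothetical helper: returns a copy of the frontmatter with og_error and
// og_last_fetch set. The input object is not modified.
function markFetchError(frontmatter, error, now = new Date()) {
  const message = error instanceof Error ? error.message : String(error);
  return {
    ...frontmatter,
    og_error: message.replace(/\s+/g, ' ').trim(), // keep it on one line
    og_last_fetch: now.toISOString(),
  };
}
```

The same helper can be used for every error category listed above, since each one ultimately surfaces as a message string.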
C. Statistics Tracking
javascript
const stats = {
  filesProcessed: 0,
  filesWithIssues: new Set(),
  openGraph: {
    skippedDueToYaml: 0,
    properOpenGraphDataFound: 0,
    newSuccesses: new Set(),
    newErrors: new Set()
  },
  screenshots: {
    newSuccesses: new Set(),
    errors: new Set()
  }
};
5. Report Generation
A. Report Structure
markdown
---
date: 2025-03-24
datetime: 2025-03-24T05:59:57.811Z
authors:
- Michael Staton
augmented_with: 'Windsurf on Claude 3.5 Sonnet'
category: Data-Augmentation
tags:
- Data-Augmentation
- OpenGraph
- Automation
- Content-Processing
---
## Summary of Files Processed
Files processed: <count>
Total Files with issues: <count>
Open Graph data fetches:
- Skipped bc YAML inconsistency: <count>
- Skipped bc prior Open Graph Data: <count>
- New Open Graph data: <count>
- New Screenshots: <count>
- New Errors: <count>
### Files with Issues that were skipped completely
[[path/to/file1]], [[path/to/file2]]
### Files that have new open graph data
[[path/to/file3]], [[path/to/file4]]
### Files that have a new screenshot
[[path/to/file5]], [[path/to/file6]]
### Files that OpenGraphIo returned an error for core og data:
[[path/to/file7]]
### Files that OpenGraphIo returned an error for screenshot:
[[path/to/file8]]
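The summary portion of this template can be rendered directly from the stats object in section 4C. The sketch below is illustrative only: renderSummary and wikilinks are hypothetical names, and the mapping of properOpenGraphDataFound to the "prior Open Graph Data" line is an assumption.

```javascript
// Format a Set (or array) of file paths as comma-separated [[wikilinks]].
const wikilinks = (paths) => [...paths].map((p) => `[[${p}]]`).join(', ');

// Hypothetical renderer for the summary and one of the file lists.
function renderSummary(stats) {
  return [
    '## Summary of Files Processed',
    `Files processed: ${stats.filesProcessed}`,
    `Total Files with issues: ${stats.filesWithIssues.size}`,
    'Open Graph data fetches:',
    `- Skipped bc YAML inconsistency: ${stats.openGraph.skippedDueToYaml}`,
    // Assumption: this counter backs the "prior Open Graph Data" line.
    `- Skipped bc prior Open Graph Data: ${stats.openGraph.properOpenGraphDataFound}`,
    `- New Open Graph data: ${stats.openGraph.newSuccesses.size}`,
    `- New Screenshots: ${stats.screenshots.newSuccesses.size}`,
    `- New Errors: ${stats.openGraph.newErrors.size}`,
    '### Files that have new open graph data',
    wikilinks(stats.openGraph.newSuccesses),
  ].join('\n');
}
```

The remaining sections follow the same pattern, each pairing a heading with wikilinks() over the matching Set.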
B. Report Naming Convention
Format: YYYY-MM-DD_reportName_runIndex.md
Example: 2025-03-24_open-graph-fetch-report_07.md
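A filename in this format can be assembled as shown below. This is a stand-in sketch, not the contents of addReportNamingConventions.cjs: the two-digit zero padding is inferred from the example, and the UTC date, the starting index of 1, and both function names are assumptions.

```javascript
// Build "YYYY-MM-DD_reportName_runIndex.md" using the UTC calendar date.
function buildReportFilename(reportName, runIndex, date = new Date()) {
  const day = date.toISOString().slice(0, 10);
  const index = String(runIndex).padStart(2, '0'); // 7 -> "07"
  return `${day}_${reportName}_${index}.md`;
}

// Pick the next run index given the filenames already in the report folder.
function nextRunIndex(existingNames, day, reportName) {
  const prefix = `${day}_${reportName}_`;
  const used = existingNames
    .filter((name) => name.startsWith(prefix) && name.endsWith('.md'))
    .map((name) => parseInt(name.slice(prefix.length, -3), 10))
    .filter(Number.isInteger);
  return used.length ? Math.max(...used) + 1 : 1;
}
```

For example, `buildReportFilename('open-graph-fetch-report', 7, new Date('2025-03-24T05:59:57.811Z'))` yields the example filename above.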
6. Implementation Notes
- File Safety
  - Use atomic write operations
  - Verify file existence before operations
  - Maintain proper file permissions
  - Handle concurrent access gracefully
- Performance
  - Process files in parallel
  - Implement request throttling
  - Cache API responses when possible
  - Track memory usage for large directories
- Logging
  - Use emoji indicators for visibility:
    - ✅ Success
    - ⚠️ Warning
    - ❌ Error
  - Include file names in all log messages
  - Log both to console and report
- Dependencies
  - Node.js built-ins: fs, path
  - External: dotenv (for API key)
  - Custom utils: addReportNamingConventions.cjs, addReportFrontmatterTemplate.cjs
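The file-safety notes above mention atomic writes and preserved permissions. A common way to get both with only Node.js built-ins is to write a temporary file next to the target and rename it into place. The sketch below assumes that approach; writeFileAtomic is a hypothetical name and the 0o644 default for new files is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch of an atomic write: write to a temp file in the same directory,
// then rename it over the target. On POSIX filesystems the rename replaces
// the target in one step when both paths are on the same volume.
function writeFileAtomic(filePath, data) {
  const dir = path.dirname(filePath);
  const tmpPath = path.join(dir, `.${path.basename(filePath)}.${process.pid}.tmp`);
  // Reuse the existing file's permission bits when there is one.
  const mode = fs.existsSync(filePath)
    ? fs.statSync(filePath).mode & 0o777
    : 0o644; // assumption: default for new files
  try {
    fs.writeFileSync(tmpPath, data, { encoding: 'utf8', mode });
    fs.chmodSync(tmpPath, mode); // apply exact bits regardless of umask
    fs.renameSync(tmpPath, filePath);
  } catch (error) {
    fs.rmSync(tmpPath, { force: true }); // clean up the temp file on failure
    throw error;
  }
}
```

A reader of the target file sees either the old content or the new content, never a partially written file, which is the property updateMarkdownFile needs.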
This implementation provides a robust, maintainable solution for fetching and managing OpenGraph data across a collection of Markdown files.