Implement an OpenGraph fetcher as an Obsidian Plugin

Objective
The plugin has the working name "Open Graph Fetcher" (open-graph-fetcher) and can be found on GitHub at open-graph-fetcher-obsidian-plugin, with the development branch being the most active. In the lossless-monorepo it is included as a submodule.
Working Directory
- The relative path from the lossless-monorepo is open-graph-fetcher
- The absolute path on mps' Mac is /Users/mpstaton/code/lossless-monorepo/open-graph-fetcher
Background
We have implemented this as a script several times and then ported it to a filesystem observer system, where it is working. It runs on a full directory and iterates recursively through that directory. It can be found at:
tidyverse/observers/scripts/test-opengraph.ts
tidyverse/observers/services/openGraphService.ts
tidyverse/observers/templates/tooling.ts
tidyverse/observers/watchers/toolkitWatcher.ts
How Obsidian Plugin Updates Work
The reason your version bump script doesn't automatically update the version in Obsidian's community plugin list is that Obsidian doesn't pull updates from your local manifest.json file. Here's how the update mechanism actually works:
The Update Process:
- GitHub Releases are the Source of Truth: Obsidian downloads plugin files from your GitHub releases, not from your local files or the files committed to your repository.
- Release Process Required: To update a plugin in the community list, you need to:
- Update manifest.json with the new version number
- Update versions.json with version compatibility info
- Create a GitHub release with the exact version number as its tag
- Upload the built files (manifest.json, main.js, styles.css) as binary attachments to that release
- Obsidian's Update Check:
- Obsidian reads the plugin list from community-plugins.json in the obsidian-releases repo
- When checking for updates, it pulls the manifest.json from your repo to see the latest version
- When users install or update, it downloads the files from your GitHub releases (not from the repo files)
Your Current Setup:
Your version-bump.mjs script only updates local files. It doesn't create the GitHub release that Obsidian needs to detect the update. You need to:
- Create a GitHub Release: After running your version bump script, you need to create a GitHub release with the new version as the tag
- Upload Built Files: Attach manifest.json, main.js, and styles.css to that release
- Publish the Release: This makes it available for Obsidian to detect and download
Recommended Solution:
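Consider using GitHub Actions to automate this process. The Obsidian sample plugin suggests running npm version patch, npm version minor, or npm version major. These commands bump the version in manifest.json, package.json, and versions.json and create a matching git tag. Pushing that tag can then trigger an automated release through GitHub Actions, which makes the whole process seamless.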
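The sketch below shows what has to stay in sync for a release. It assumes the manifest.json and versions.json shapes used by the Obsidian sample plugin. bumpVersion is a hypothetical helper, not the project's version-bump.mjs. It only computes the new file contents, the release tag, and the assets to attach. Creating and publishing the GitHub release remains a separate step.

```typescript
// Shape of manifest.json (only the fields this sketch touches are typed)
interface PluginManifest {
  id: string;
  version: string;
  minAppVersion: string;
  [key: string]: unknown;
}

// versions.json maps each plugin version to the minimum Obsidian version it supports
type VersionsMap = Record<string, string>;

interface ReleasePlan {
  manifest: PluginManifest;
  versions: VersionsMap;
  releaseTag: string; // must equal manifest.version exactly, with no "v" prefix
  assets: string[]; // files to attach to the GitHub release
}

// Hypothetical helper (not the actual version-bump.mjs): compute a version bump.
function bumpVersion(
  manifest: PluginManifest,
  versions: VersionsMap,
  newVersion: string
): ReleasePlan {
  if (!/^\d+\.\d+\.\d+$/.test(newVersion)) {
    throw new Error(`Expected a plain x.y.z version, got "${newVersion}"`);
  }
  return {
    manifest: { ...manifest, version: newVersion },
    versions: { ...versions, [newVersion]: manifest.minAppVersion },
    releaseTag: newVersion,
    assets: ["main.js", "manifest.json", "styles.css"],
  };
}
```

Deriving the tag from the same string that is written into manifest.json keeps the two from drifting apart.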
Task at Hand
1st Prompt
Review the files above, and then write your analysis of the key functionality and crucial functions into this file under the below header entitled "Analysis of OpenGraph fetching in the Observer System"
Analysis of OpenGraph fetching in the Observer System
Architecture Overview
The OpenGraph fetching functionality is integrated into the observer system through a modular architecture:
- Core Services
- openGraphService.ts: Manages OpenGraph metadata fetching
- reportingService.ts: Handles statistics and error reporting
- templateRegistry.ts: Manages frontmatter templates
- Key Components
- OpenGraph Service:
- Implements single source of truth for OpenGraph fields (OG_FIELDS)
- Handles API integration with OpenGraph.io
- Manages retry logic and error handling
- Normalizes OpenGraph data before storage
- File Watcher:
- Uses chokidar for file system monitoring
- Watches for add/change events
- Processes files only if they contain frontmatter
- Maintains file state tracking
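Here is a minimal sketch of the kind of file-state tracking the watcher relies on. It assumes the watcher passes each add/change event's path and file content to a callback. FileStateTracker and its methods are hypothetical names, not the observer system's actual code. The tracker remembers the content the fetcher itself wrote, so the change event caused by that write can be ignored. This is one way to avoid the infinite loop a write-triggers-change watcher can otherwise fall into.

```typescript
// Hypothetical tracker: decides whether a watcher event should be processed.
class FileStateTracker {
  // Last content this process wrote to each path
  private lastWritten = new Map<string, string>();

  // Only Markdown files that start with a frontmatter block are candidates
  isCandidate(path: string, content: string): boolean {
    return path.toLowerCase().endsWith(".md") && /^---\r?\n/.test(content);
  }

  // Skip non-candidates and events caused by our own write (identical content)
  shouldProcess(path: string, content: string): boolean {
    if (!this.isCandidate(path, content)) return false;
    return this.lastWritten.get(path) !== content;
  }

  // Call right after writing updated frontmatter back to disk
  markWritten(path: string, content: string): void {
    this.lastWritten.set(path, content);
  }
}
```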
Implementation Details
- Smart Processing Decisions
- Uses needsOpenGraph() to determine if processing is needed
- Checks for existing og_last_fetch timestamp
- Verifies presence and correctness of all required fields
- Prevents unnecessary API calls
- Data Normalization
- Standardizes OpenGraph field handling
- Converts nested objects to primitives
- Handles arrays of objects with 'url' properties
- Strips quotes and trims strings
- Error Handling
- Implements retry logic with exponential backoff
- Records errors in frontmatter
- Maintains processing statistics
- Graceful degradation for missing fields
- Performance Optimizations
- Only processes Markdown files (.md)
- Uses awaitWriteFinish to prevent race conditions
- Implements file state tracking
- Skips processing for files with valid metadata
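The decision and normalization steps above can be sketched as two small pure functions. This is an illustration, not the observer code. It assumes the source URL lives in a url frontmatter key. It also assumes the required fields are og_title, og_description, og_image, and og_url, the fields the older script returned. REQUIRED_OG_FIELDS and normalizeValue are hypothetical names.

```typescript
type Frontmatter = Record<string, unknown>;

// Hypothetical list of fields that must be present for a file to count as complete
const REQUIRED_OG_FIELDS = ["og_title", "og_description", "og_image", "og_url"];

// True when the file has a URL and lacks a fetch timestamp or any required field
function needsOpenGraph(fm: Frontmatter): boolean {
  const url = fm["url"]; // assumed frontmatter key for the source URL
  if (typeof url !== "string" || url.trim() === "") return false; // nothing to fetch
  if (!fm["og_last_fetch"]) return true; // never fetched
  return REQUIRED_OG_FIELDS.some((key) => {
    const value = fm[key];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Hypothetical normalizer: collapse an API value into a trimmed string for YAML
function normalizeValue(value: unknown): string | undefined {
  if (value === null || value === undefined) return undefined;
  if (Array.isArray(value)) {
    // Arrays (e.g. of { url } objects): keep the first usable entry
    for (const item of value) {
      const normalized = normalizeValue(item);
      if (normalized !== undefined) return normalized;
    }
    return undefined;
  }
  if (typeof value === "object") {
    // Nested objects such as { url: "..." } collapse to their url
    return normalizeValue((value as { url?: unknown }).url);
  }
  let text = String(value).trim();
  // Strip one pair of matching surrounding quotes
  const first = text[0];
  if (text.length >= 2 && (first === '"' || first === "'") && text[text.length - 1] === first) {
    text = text.slice(1, -1).trim();
  }
  return text === "" ? undefined : text;
}
```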
Integration Points
- Frontmatter Processing
- Uses custom YAML parser (not gray-matter)
- Maintains exact line positioning
- Preserves original formatting
- Handles both YAML and non-YAML frontmatter
- File System Integration
- Uses fs-extra for file operations
- Implements atomic write operations
- Maintains file permissions
- Handles recursive directory scanning
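The sketch below shows what "maintains exact line positioning" can mean in practice. It is a plain-text frontmatter update that rewrites or appends a single key: value line and leaves every other line unchanged. setFrontmatterField is a hypothetical name, and the sketch only handles single-line scalar values at the top level. It uses JSON.stringify for quoting, which produces valid YAML double-quoted strings for ordinary text.

```typescript
// Hypothetical helper: set one top-level frontmatter key without reformatting anything else.
function setFrontmatterField(markdown: string, key: string, value: string): string {
  const eol = markdown.includes("\r\n") ? "\r\n" : "\n";
  const lines = markdown.split(eol);
  const newLine = `${key}: ${JSON.stringify(value)}`;

  // No frontmatter yet: create a block at the top of the file
  if (lines[0] !== "---") {
    return ["---", newLine, "---", ...lines].join(eol);
  }
  const end = lines.indexOf("---", 1);
  if (end === -1) throw new Error("Unterminated frontmatter block");

  // Replace the existing key in place, keeping its line position
  const escapedKey = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const keyPattern = new RegExp(`^${escapedKey}\\s*:`);
  for (let i = 1; i < end; i++) {
    if (keyPattern.test(lines[i])) {
      lines[i] = newLine;
      return lines.join(eol);
    }
  }
  // Otherwise append just before the closing marker
  lines.splice(end, 0, newLine);
  return lines.join(eol);
}
```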
Key Features
- Smart Skipping Logic
- Skips files with valid OpenGraph data
- Skips files with recent fetch timestamps
- Prevents infinite processing loops
- Data Integrity
- Validates all OpenGraph fields
- Ensures proper field normalization
- Maintains consistent data structure
- Preserves original file content
- Reporting
- Tracks processing statistics
- Records errors and failures
- Maintains processing history
- Provides detailed reporting
This implementation follows the project's established patterns for observer-based processing while adding specific functionality for OpenGraph metadata fetching and normalization. The system is designed to be robust, efficient, and maintainable while providing comprehensive error handling and reporting capabilities.
2nd Prompt: Plan the Obsidian Plugin
Desired Functionality
- Settings Management
- Settings Section similar to content-farm/main.ts, where the user can configure:
- OpenGraph API Key (stored securely in Obsidian's vault)
- Base URL for OpenGraph.io API (configurable for different environments)
- Retry settings (number of attempts, backoff delay)
- Rate limiting configuration
- Cache duration settings
- Modal Interface
- OpenGraph Fetch Modal with:
- Checkbox: "Overwrite Existing Open Graph YAML properties?"
- Checkbox: "Create new YAML properties if none exists?"
- Checkbox: "Write any returned Errors into YAML?"
- Checkbox: "Write or Overwrite date for This Fetch?"
- Button: "Fetch Open Graph Data"
- Button: "Fetch Screenshot"
- Progress indicator for fetch operations
- Status message area for feedback
- Command Implementation
- Register a Command called "Fetch Open Graph Data" that:
- Opens the OpenGraph Fetch Modal
- Button: "Fetch Open Graph Data"
- Fetches Open Graph Data from OpenGraph.io using URL from YAML frontmatter
- Reviews modal settings and performs accordingly
- Handles errors gracefully and displays feedback
- Button: "Fetch Open Graph Screenshot"
- Uses OpenGraph.io screenshot API
- Handles screenshot errors separately from metadata errors
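The four modal checkboxes reduce to a merge policy. Below is a sketch of that policy as a pure function. It assumes fetched values arrive already normalized as og_* strings. FetchOptions mirrors the options object in the modal outline. mergeFetchResult is a hypothetical name, and the og_error and og_last_fetch keys are borrowed from the script spec.

```typescript
interface FetchOptions {
  overwriteExisting: boolean; // "Overwrite Existing Open Graph YAML properties?"
  createNew: boolean; // "Create new YAML properties if none exists?"
  writeErrors: boolean; // "Write any returned Errors into YAML?"
  updateFetchDate: boolean; // "Write or Overwrite date for This Fetch?"
}

type Props = Record<string, string>;

// Hypothetical merge: returns only the frontmatter keys that should be written.
function mergeFetchResult(
  existing: Props,
  fetched: Props,
  options: FetchOptions,
  now: Date,
  error?: string
): Props {
  const updates: Props = {};
  // An existing non-empty key needs overwriteExisting; a missing one needs createNew
  const allowed = (key: string): boolean => {
    const present = key in existing && existing[key] !== "";
    return present ? options.overwriteExisting : options.createNew;
  };
  for (const [key, value] of Object.entries(fetched)) {
    if (allowed(key)) updates[key] = value;
  }
  if (error !== undefined && options.writeErrors) {
    updates["og_error"] = error;
  }
  if (options.updateFetchDate) {
    updates["og_last_fetch"] = now.toISOString();
  }
  return updates;
}
```

Returning only the keys to write, rather than a whole new frontmatter object, lets the caller apply them with a line-preserving update.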
3rd Prompt: Batch Fetch for Target Directory
Okay, so this "Batch Delay" part of the modal actually belongs to a separate command and modal. The idea is a command called "Target Folder for Open Graph Fetch". It opens a modal that confirms the current working directory, counts the files that have URLs but no Open Graph data, and lists those files by file name. The user can then run the fetch as an iterative batch.
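The batch command has two jobs: finding the pending files, and working through them in paced batches. Both can be sketched independently of Obsidian's APIs. The sketch assumes each file is represented by its path plus parsed frontmatter, and it treats "has Open Graph data" as having og_title or og_image. findPendingFiles, runInBatches, and the batch size and delay parameters are hypothetical.

```typescript
interface NoteInfo {
  path: string;
  frontmatter: Record<string, unknown>;
}

// Hypothetical: files with a url but no Open Graph data yet, sorted by path for display
function findPendingFiles(notes: NoteInfo[]): NoteInfo[] {
  return notes
    .filter((n) => typeof n.frontmatter["url"] === "string" && n.frontmatter["url"] !== "")
    .filter((n) => !n.frontmatter["og_title"] && !n.frontmatter["og_image"])
    .sort((a, b) => a.path.localeCompare(b.path));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical: process items batchSize at a time, pausing delayMs between batches
async function runInBatches<T, R>(
  items: T[],
  batchSize: number,
  delayMs: number,
  worker: (item: T) => Promise<R>,
  onProgress?: (done: number, total: number) => void
): Promise<R[]> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    results.push(...(await Promise.all(batch.map((item) => worker(item)))));
    onProgress?.(results.length, items.length); // e.g. drive the modal's progress indicator
    if (start + batchSize < items.length) await sleep(delayMs);
  }
  return results;
}
```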
Implementation
Implementation Details
- File Structure
```text
src/
  main.ts            // Plugin entry point
  settings.ts        // Settings management
  modal.ts           // OpenGraph Fetch Modal
  services/
    openGraph.ts     // OpenGraph API integration
    screenshot.ts    // Screenshot fetching
  types.ts           // TypeScript interfaces
  utils.ts           // Helper functions
```
- Key Components
a. Settings Management
```typescript
// An interface (as in the Obsidian sample plugin) avoids uninitialized class fields
export interface OpenGraphPluginSettings {
  apiKey: string;
  baseUrl: string;
  retries: number;
  backoffDelay: number;
  rateLimit: number;
  cacheDuration: number;
}
```
b. OpenGraph Service
```typescript
import type { OpenGraphData } from "../types";

// Signatures only; the implementation lives in services/openGraph.ts
declare class OpenGraphService {
  private readonly apiKey: string;
  private readonly baseUrl: string;
  fetchMetadata(url: string): Promise<OpenGraphData>;
  fetchScreenshot(url: string): Promise<string | null>;
}
```
c. Modal Implementation
```typescript
import { Modal } from "obsidian";
import type { OpenGraphPluginSettings } from "./settings";

// Signatures only; the implementation lives in modal.ts
declare class OpenGraphFetchModal extends Modal {
  private settings: OpenGraphPluginSettings;
  private options: {
    overwriteExisting: boolean;
    createNew: boolean;
    writeErrors: boolean;
    updateFetchDate: boolean;
  };
  fetchOpenGraph(): Promise<void>;
  fetchScreenshot(): Promise<void>;
}
```
- Error Handling
- Implement proper error boundaries
- Handle API rate limits
- Provide user-friendly error messages
- Log errors without exposing sensitive information
- Performance Optimizations
- Implement caching for API responses
- Use debouncing for rapid fetch attempts
- Handle large files efficiently
- Implement progress indicators
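The sketch below shows how the settings could get defaults and how cacheDuration could drive a simple response cache. The default values and units are assumptions, including the https://opengraph.io/api/1.1 base URL, which should be checked against OpenGraph.io's documentation. mergeSettings follows the Object.assign({}, DEFAULT_SETTINGS, await this.loadData()) pattern from the Obsidian sample plugin. It is written here as a pure function so it runs outside Obsidian. ResponseCache is a hypothetical name.

```typescript
interface OpenGraphPluginSettings {
  apiKey: string;
  baseUrl: string;
  retries: number;
  backoffDelay: number; // assumed unit: ms before the first retry
  rateLimit: number; // assumed meaning: max requests per minute
  cacheDuration: number; // assumed unit: ms a cached response stays valid
}

// Assumed defaults, for illustration only
const DEFAULT_SETTINGS: OpenGraphPluginSettings = {
  apiKey: "",
  baseUrl: "https://opengraph.io/api/1.1",
  retries: 3,
  backoffDelay: 1000,
  rateLimit: 60,
  cacheDuration: 24 * 60 * 60 * 1000,
};

// Saved data may be missing or partial (e.g. after a new setting is added)
function mergeSettings(saved: Partial<OpenGraphPluginSettings> | null): OpenGraphPluginSettings {
  return Object.assign({}, DEFAULT_SETTINGS, saved ?? {});
}

// Hypothetical TTL cache keyed by target URL; the clock is injectable for testing
class ResponseCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it so the next fetch refreshes
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Usage inside the plugin would be along the lines of new ResponseCache(settings.cacheDuration), checked before each API call.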
Previous Script Implementation
Create a Node.js script (runFetchOpenGraphData.cjs) that processes Markdown files to fetch and update OpenGraph metadata and screenshots. This guide provides detailed specifications for implementing a robust, error-tolerant system. Use Meticulous-Constraints-for-Every-Prompt and Maintain-Consistent-Reporting-Templates for the Single Operation Process Report.
Model Responses:
```json
{
  "hybridGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "openGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": {
      "url": "https://example.com/image.png"
    },
    "url": "https://example.com",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "htmlInferred": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "images": [
      "https://example.com/image1.png",
      "https://example.com/image2.png",
      "https://example.com/image3.png",
      "https://example.com/image4.png"
    ]
  },
  "requestInfo": {
    "redirects": 1,
    "host": "https://example.com",
    "responseCode": 200,
    "cache_ok": true,
    "max_cache_age": 432000000,
    "accept_lang": "en-US,en;q=0.9",
    "url": "https://example.com",
    "full_render": false,
    "use_proxy": false,
    "use_superior": false,
    "responseContentType": "text/html; charset=utf-8"
  },
  "accept_lang": "en-US,en;q=0.9",
  "is_cache": false,
  "url": "https://example.com"
}
```
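Given a response shaped like the model above, the og_* values still have to be picked out of hybridGraph, openGraph, and htmlInferred. Below is one way to do it. It assumes hybridGraph is preferred and the other two sections serve as fallbacks. That precedence is a choice made for this sketch, not something OpenGraph.io prescribes. pickOgFields is a hypothetical name, and it covers only the fields the script writes.

```typescript
// Only the parts of the response this sketch reads
interface GraphSection {
  title?: string;
  description?: string;
  url?: string;
  image?: string | { url?: string };
}

interface OpenGraphIoResponse {
  hybridGraph?: GraphSection;
  openGraph?: GraphSection;
  htmlInferred?: GraphSection;
}

interface OgFields {
  og_title?: string;
  og_description?: string;
  og_image?: string;
  og_url?: string;
  og_last_fetch: string;
}

// First non-empty value for a field, checking sections in priority order
function firstValue(sections: (GraphSection | undefined)[], field: keyof GraphSection): string | undefined {
  for (const section of sections) {
    const raw = section?.[field];
    const value = typeof raw === "object" && raw !== null ? raw.url : raw; // { url } -> url
    if (typeof value === "string" && value.trim() !== "") return value.trim();
  }
  return undefined;
}

// Hypothetical mapper from an OpenGraph.io response to frontmatter fields
function pickOgFields(response: OpenGraphIoResponse, fetchedAt: Date): OgFields {
  const order = [response.hybridGraph, response.openGraph, response.htmlInferred];
  return {
    og_title: firstValue(order, "title"),
    og_description: firstValue(order, "description"),
    og_image: firstValue(order, "image"),
    og_url: firstValue(order, "url"),
    og_last_fetch: fetchedAt.toISOString(),
  };
}
```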
Core Components
1. File System Structure
```text
scripts/
build-scripts/
runFetchOpenGraphData.cjs # Main script
utils/
addReportNamingConventions.cjs # Report filename generation
addReportFrontmatterTemplate.cjs # Report frontmatter formatting
```
2. Environment Setup
```javascript
// Required environment variables (in a .env file loaded by dotenv):
// OPEN_GRAPH_IO_API_KEY=your_api_key
require('dotenv').config();
const OPEN_GRAPH_IO_API_KEY = process.env.OPEN_GRAPH_IO_API_KEY;

// Configuration constants
const TARGET_DIR = process.env.TARGET_DIR || '../content/tooling/AI-Toolkit';
const REPORT_OUTPUT_DIR = 'src/content/data_site';
const REPORT_NAME = 'open-graph-fetch-report';
```
3. Core Functions
A. Frontmatter Management
- Use plain text parsing (NOT gray-matter) to handle frontmatter
- Extract content between --- markers
- Preserve exact line positioning for updates
- Handle both YAML and non-YAML frontmatter gracefully
```javascript
function extractFrontmatter(content) {
  // Returns: { frontmatter: Object, content: string }
  // Preserves original formatting
}

function updateMarkdownFile(filePath, frontmatter, content) {
  // Atomic write operation
  // Maintains file permissions
}
```
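Here is a TypeScript sketch of the plain-text approach behind extractFrontmatter. It assumes only simple top-level key: value lines need to be read. Nested YAML, list items, and multi-line values are skipped, so list keys come back as empty strings. The function also returns the untouched raw lines, so an update can be written back without disturbing formatting. The ExtractedFrontmatter shape is hypothetical.

```typescript
interface ExtractedFrontmatter {
  frontmatter: Record<string, string>; // top-level scalar keys only
  rawLines: string[]; // original frontmatter lines, untouched
  content: string; // everything after the closing marker
}

// Plain-text frontmatter parser: no YAML library, no reformatting.
function extractFrontmatter(markdown: string): ExtractedFrontmatter {
  const eol = markdown.includes("\r\n") ? "\r\n" : "\n";
  const lines = markdown.split(eol);
  const none: ExtractedFrontmatter = { frontmatter: {}, rawLines: [], content: markdown };
  if (lines[0] !== "---") return none;
  const end = lines.indexOf("---", 1);
  if (end === -1) return none; // unterminated block: treat the whole file as body

  const rawLines = lines.slice(1, end);
  const frontmatter: Record<string, string> = {};
  for (const line of rawLines) {
    // Top-level "key: value" only; indented lines and list items are skipped
    const match = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    if (!match) continue;
    let value = match[2].trim();
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    frontmatter[match[1]] = value;
  }
  return { frontmatter, rawLines, content: lines.slice(end + 1).join(eol) };
}
```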
B. OpenGraph Data Fetching
- Implement retry logic (3 attempts)
- Handle rate limits with exponential backoff
- Validate response data structure
- Strip quotes from values
```javascript
async function fetchOpenGraphData(url, filePath) {
  // Returns: Promise<{
  //   og_title: string,
  //   og_description: string,
  //   og_image: string,
  //   og_url: string,
  //   og_last_fetch: string
  // } | null>
}
```
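Below is a sketch of the retry behaviour described above: three attempts with exponential backoff. It assumes failures surface as thrown errors that may carry an HTTP status, and that a 429 status is the rate-limit signal. withRetry, statusOf, and buildSiteUrl are hypothetical helpers. The /site/<encoded url>?app_id=... path reflects our understanding of OpenGraph.io's v1.1 site endpoint; check it against their documentation before relying on it.

```typescript
// Hypothetical: build an OpenGraph.io request URL (assumed v1.1 "site" endpoint shape)
function buildSiteUrl(baseUrl: string, targetUrl: string, apiKey: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/site/${encodeURIComponent(targetUrl)}?app_id=${encodeURIComponent(apiKey)}`;
}

// Read an HTTP status off an error, if the fetch layer attached one
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical: retry fn up to `attempts` times, doubling the delay after each failure
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 1000): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const status = statusOf(err);
      // 4xx responses other than 429 (rate limited) will not succeed on retry
      if (status !== undefined && status >= 400 && status < 500 && status !== 429) break;
      if (attempt < attempts - 1) await wait(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```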
C. Screenshot Fetching
- Non-blocking parallel operations
- Track in-progress fetches
- Cache results to prevent duplicates
```javascript
async function fetchScreenshotUrl(url, filePath) {
  // Returns: Promise<string | null>
  // string = screenshot URL
  // null = fetch failed
}
```
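Both in-progress tracking and result caching can be handled by remembering one promise per URL. A second request for the same URL then joins the first instead of calling the API again. The sketch assumes the actual screenshot call is passed in as fetcher. ScreenshotTracker is a hypothetical name. Only successful URLs are cached, so a failed screenshot can be retried on a later run.

```typescript
// Hypothetical tracker: at most one screenshot request in flight per URL.
class ScreenshotTracker {
  private inFlight = new Map<string, Promise<string | null>>();
  private cache = new Map<string, string>(); // successful screenshot URLs only

  constructor(private readonly fetcher: (url: string) => Promise<string | null>) {}

  get(url: string): Promise<string | null> {
    const cached = this.cache.get(url);
    if (cached !== undefined) return Promise.resolve(cached);
    const pending = this.inFlight.get(url);
    if (pending) return pending; // join the request already in progress

    const request = this.fetcher(url)
      .catch(() => null) // a thrown error is reported as null, like fetchScreenshotUrl
      .then((result) => {
        if (result !== null) this.cache.set(url, result);
        this.inFlight.delete(url);
        return result;
      });
    this.inFlight.set(url, request);
    return request;
  }
}
```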
4. Processing Logic
A. Skip Conditions
Skip OpenGraph fetch if ANY of these exist:
- image
- og_image
- og_last_error
Skip Screenshot fetch if og_screenshot exists.
B. Error Handling
- Mark files with errors:
```yaml
og_error: "Error message"
og_last_fetch: "2025-03-24T05:59:57.811Z"
```
- Categories of errors:
- API errors (rate limits, timeouts)
- Invalid responses
- Missing required properties
- Network failures
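The error categories above map naturally onto a small classifier. Its output can be written to og_error together with og_last_fetch, in the shape shown above. The sketch assumes the fetch layer reports an HTTP status when there is one, plus a flag for network failures. The category labels, FetchFailure, classifyFailure, and errorFrontmatter are all illustrative names.

```typescript
type ErrorCategory = "rate_limit" | "timeout" | "api_error" | "invalid_response" | "missing_properties" | "network";

// Hypothetical description of a failed fetch
interface FetchFailure {
  status?: number; // HTTP status, if a response arrived
  networkError?: boolean; // request never completed (DNS, offline, connection reset)
  body?: unknown; // parsed response body, if any
}

function classifyFailure(f: FetchFailure): ErrorCategory {
  if (f.networkError) return "network";
  if (f.status === 429) return "rate_limit";
  if (f.status === 408 || f.status === 504) return "timeout";
  if (f.status !== undefined && f.status >= 400) return "api_error";
  if (typeof f.body !== "object" || f.body === null) return "invalid_response";
  // A parsed object that still failed validation is missing required properties
  return "missing_properties";
}

// Frontmatter entries for a failed fetch
function errorFrontmatter(f: FetchFailure, message: string, now: Date): Record<string, string> {
  return {
    og_error: `${classifyFailure(f)}: ${message}`,
    og_last_fetch: now.toISOString(),
  };
}
```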
C. Statistics Tracking
```javascript
const stats = {
  filesProcessed: 0,
  filesWithIssues: new Set(),
  openGraph: {
    skippedDueToYaml: 0,
    properOpenGraphDataFound: 0,
    newSuccesses: new Set(),
    newErrors: new Set()
  },
  screenshots: {
    newSuccesses: new Set(),
    errors: new Set()
  }
};
```
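Because the stats use Sets, a file that is counted twice is still reported once. Below is a sketch of turning the stats object into the summary lines of the report template. It assumes the stats shape above. formatSummary is a hypothetical helper. It assumes "New Errors" counts both metadata and screenshot errors, and it strips .md from paths to match the [[path/to/file]] links in the template.

```typescript
interface FetchStats {
  filesProcessed: number;
  filesWithIssues: Set<string>;
  openGraph: {
    skippedDueToYaml: number;
    properOpenGraphDataFound: number;
    newSuccesses: Set<string>;
    newErrors: Set<string>;
  };
  screenshots: { newSuccesses: Set<string>; errors: Set<string> };
}

// Render paths as Obsidian wikilinks, e.g. [[path/to/file1]], [[path/to/file2]]
const wikilinks = (paths: Set<string>): string =>
  Array.from(paths).map((p) => `[[${p.replace(/\.md$/, "")}]]`).join(", ");

// Hypothetical helper: summary section of the report
function formatSummary(stats: FetchStats): string {
  return [
    "## Summary of Files Processed",
    `Files processed: ${stats.filesProcessed}`,
    `Total Files with issues: ${stats.filesWithIssues.size}`,
    "Open Graph data fetches:",
    `- Skipped bc YAML inconsistency: ${stats.openGraph.skippedDueToYaml}`,
    `- Skipped bc prior Open Graph Data: ${stats.openGraph.properOpenGraphDataFound}`,
    `- New Open Graph data: ${stats.openGraph.newSuccesses.size}`,
    `- New Screenshots: ${stats.screenshots.newSuccesses.size}`,
    // Assumption: "New Errors" counts both metadata and screenshot errors
    `- New Errors: ${stats.openGraph.newErrors.size + stats.screenshots.errors.size}`,
    "### Files that have new open graph data",
    wikilinks(stats.openGraph.newSuccesses),
  ].join("\n");
}
```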
5. Report Generation
A. Report Structure
```markdown
---
date: 2025-03-24
datetime: 2025-03-24T05:59:57.811Z
authors:
- Michael Staton
augmented_with: 'Windsurf on Claude 3.5 Sonnet'
category: Data-Augmentation
tags:
- Data-Augmentation
- OpenGraph
- Automation
- Content-Processing
---
## Summary of Files Processed
Files processed: <count>
Total Files with issues: <count>
Open Graph data fetches:
- Skipped bc YAML inconsistency: <count>
- Skipped bc prior Open Graph Data: <count>
- New Open Graph data: <count>
- New Screenshots: <count>
- New Errors: <count>
### Files with Issues that were skipped completely
[[path/to/file1]], [[path/to/file2]]
### Files that have new open graph data
[[path/to/file3]], [[path/to/file4]]
### Files that have a new screenshot
[[path/to/file5]], [[path/to/file6]]
### Files that OpenGraphIo returned an error for core og data:
[[path/to/file7]]
### Files that OpenGraphIo returned an error for screenshot:
[[path/to/file8]]
```
B. Report Naming Convention
Format: YYYY-MM-DD_reportName_runIndex.md
Example: 2025-03-24_open-graph-fetch-report_07.md
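Below is a sketch of the naming convention. It assumes the run index is one more than the highest index already present for that date and report name, zero-padded to two digits as in the example. It also assumes the date is taken in UTC. nextReportFileName is a hypothetical helper, not the actual addReportNamingConventions.cjs.

```typescript
// Hypothetical helper for YYYY-MM-DD_reportName_runIndex.md
function nextReportFileName(reportName: string, existingFiles: string[], now: Date): string {
  const date = now.toISOString().slice(0, 10); // UTC date, YYYY-MM-DD
  const prefix = `${date}_${reportName}_`;
  let highest = 0;
  for (const name of existingFiles) {
    if (!name.startsWith(prefix) || !name.endsWith(".md")) continue;
    const index = Number(name.slice(prefix.length, -".md".length));
    if (Number.isInteger(index) && index > highest) highest = index;
  }
  return `${prefix}${String(highest + 1).padStart(2, "0")}.md`;
}
```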
6. Implementation Notes
- File Safety
- Use atomic write operations
- Verify file existence before operations
- Maintain proper file permissions
- Handle concurrent access gracefully
- Performance
- Process files in parallel
- Implement request throttling
- Cache API responses when possible
- Track memory usage for large directories
- Logging
- Use emoji indicators for visibility:
- ✅ Success
- ⚠️ Warning
- ❌ Error
- Include file names in all log messages
- Log both to console and report
- Dependencies
- Node.js built-ins: fs, path
- External: dotenv (for API key)
- Custom utils: addReportNamingConventions.cjs, addReportFrontmatterTemplate.cjs
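One common way to combine "process files in parallel" with "implement request throttling" is a fixed-size worker pool. Files are processed concurrently, but never more than limit at a time. The sketch below uses mapWithConcurrency, a hypothetical helper, and leaves the limit to the caller (for example, derived from a rate-limit setting).

```typescript
// Hypothetical: run worker over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

If a worker throws, the returned promise rejects, but the remaining runners keep going. Wrapping the worker so it records failures in stats instead of throwing fits the report-oriented design above.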
This implementation provides a robust, maintainable solution for fetching and managing OpenGraph data across a collection of Markdown files.