Enhanced OpenGraph Service with Asynchronous Screenshot Fetching

Summary

Enhanced the OpenGraph service in the FileSystemObserver to asynchronously fetch screenshot URLs for all Markdown files with URLs, providing a fallback image source when no OpenGraph image is available.

Why Care

This improvement ensures that all content with URLs will eventually have screenshot previews, enhancing visual representation across the site without blocking the main processing flow. The implementation follows a non-blocking approach that allows the observer to continue processing files while screenshots are fetched in the background.

Implementation

Changes Made

  • Enhanced the OpenGraph service to check for the og_screenshot_url property and fetch it asynchronously if missing
  • Updated the following files:
    • /tidyverse/observers/services/openGraphService.ts: Added asynchronous screenshot URL fetching
    • /tidyverse/observers/package.json: Removed gray-matter dependency, using js-yaml instead
    • /tidyverse/observers/fileSystemObserver.ts: Integrated screenshot URL fetching

Technical Details

  • Implemented a non-blocking background process for fetching screenshot URLs:
typescript
// /tidyverse/observers/services/openGraphService.ts
function fetchScreenshotUrlInBackground(url: string, filePath: string): void {
  // Skip if we're already fetching this URL
  if (screenshotFetchInProgress.has(url)) {
    console.log(`Screenshot fetch already in progress for ${url}, skipping duplicate request`);
    return;
  }
  
  // Add to tracking set
  screenshotFetchInProgress.add(url);
  
  console.log(`Starting background screenshot fetch for ${url} (${filePath})`);
  
  // Don't await this promise - let it run in the background
  (async () => {
    try {
      const screenshotUrl = await fetchScreenshotUrl(url, filePath);
      
      if (screenshotUrl) {
        console.log(`✅ Received screenshot URL for ${url} in background process: ${screenshotUrl}`);
        
        // Read the file content
        const content = await fs.readFile(filePath, 'utf8');
        
        // Check if content has frontmatter
        if (!content.startsWith('---')) {
          console.log(`No frontmatter found in ${filePath}, cannot update screenshot URL`);
          return;
        }
        
        // Find the end of frontmatter
        const endIndex = content.indexOf('---', 3);
        if (endIndex === -1) {
          console.log(`Invalid frontmatter format in ${filePath}, cannot update screenshot URL`);
          return;
        }
        
        // Extract frontmatter content
        const frontmatterContent = content.substring(3, endIndex).trim();
        
        try {
          // Parse YAML frontmatter
          const frontmatter = yaml.load(frontmatterContent) as Record<string, any>;
          
          // Update frontmatter with screenshot URL
          frontmatter.og_screenshot_url = screenshotUrl;
          
          // Format the updated frontmatter
          let yamlContent = yaml.dump(frontmatter);
          
          // Insert updated frontmatter back into the file
          const newContent = `---\n${yamlContent}---\n\n${content.substring(endIndex + 3).trimStart()}`;
          await fs.writeFile(filePath, newContent, 'utf8');
          
          console.log(`Updated ${filePath} with screenshot URL in background process`);
        } catch (error) {
          console.error(`Error parsing frontmatter in ${filePath}:`, error);
        }
      } else {
        console.log(`⚠️ No screenshot URL found for ${url} in background process`);
      }
    } catch (error) {
      console.error(`Error in background screenshot fetch for ${url}:`, error);
    } finally {
      // Remove from tracking set when done
      screenshotFetchInProgress.delete(url);
    }
  })();
}
  • Used the correct OpenGraph.io API endpoint format for screenshots:
typescript
// /tidyverse/observers/services/openGraphService.ts
const screenshotApiUrl = `https://opengraph.io/api/1.1/screenshot/${encodeURIComponent(url)}?dimensions=lg&quality=80&accept_lang=en&use_proxy=true&app_id=${apiKey}`;
  • Removed dependency on gray-matter, using js-yaml directly for frontmatter parsing:
json
// /tidyverse/observers/package.json
"dependencies": {
  "chokidar": "^3.5.3",
  "dotenv": "^16.0.3",
  "fs-extra": "^11.1.1",
  "js-yaml": "^4.1.0",
  "minimatch": "^9.0.3",
  "node-fetch": "^2.6.9",
  "uuid": "^9.0.0"
}

Integration Points

  • The screenshot URL fetching integrates with the existing FileSystemObserver system
  • The implementation uses the same OpenGraph.io API key as the existing OpenGraph service
  • The screenshot URLs are stored in the frontmatter of Markdown files as og_screenshot_url
  • The ReportingService tracks successful and failed screenshot URL fetches

Documentation

  • The implementation follows the project's code style with comprehensive commenting
  • The OpenGraph service now checks for og_screenshot_url in frontmatter and fetches it if missing
  • The screenshot fetching happens asynchronously to avoid blocking the main process
  • The implementation uses a tracking set to prevent duplicate requests for the same URL