Pull YAML properties from a diverged content collection, and merge them.

Objective: Create a script that reads a list of files (missing URLs), finds corresponding files by filename in a source directory (`temp_old_repo/content/tooling`), extracts the `url:` property from the source file's frontmatter (if present, without using a YAML library), and inserts this `url:` line into the frontmatter of the target file (`content/tooling`).

New Script: `tidyverse/tidy-up/tidy-one-property/import-url-from-old-repo.mjs`

Detailed Steps:
- Setup & Paths:
  - Use Node.js `fs/promises` and `path`.
  - Import necessary constants (`MONOREPO_ROOT`, `CONTENT_ROOT`, `REPORTS_DIR`) from `utils/constants.cjs`.
  - Import reporting utilities (`formatRelativePath`, `writeReport`) from `utils/reportUtils.cjs`.
  - Define key paths:
    - `INPUT_REPORT_PATH`: `path.join(REPORTS_DIR, '2025-04-15_missing-url-report.md')`
    - `TARGET_BASE_DIR`: `CONTENT_ROOT` (base directory for files listed in the report; paths are relative to this)
    - `SOURCE_DIR`: `path.resolve(MONOREPO_ROOT, '../temp_old_repo/content/tooling')` (Assumption: `temp_old_repo` is one level above `lossless-monorepo`. Needs confirmation.)
- Read Input Report:
  - Read the `INPUT_REPORT_PATH` file content.
  - Parse the content to extract a list of file paths relative to `TARGET_BASE_DIR` (e.g., `tooling/AI-Toolkit/Models/Dolphin.md`). Handle potential formats (e.g., `#### [[path|name]]` or plain paths).
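The report format is not pinned down above, so parsing could be sketched roughly as follows. Both patterns here are assumptions to adjust against the real report file:

```javascript
// Sketch: pull relative file paths out of the Markdown report.
// Assumes lines like "#### [[tooling/AI-Toolkit/Models/Dolphin.md|Dolphin]]"
// or bare relative paths ending in ".md" on their own line.
function parseReportPaths(reportContent) {
  const paths = [];
  for (const line of reportContent.split('\n')) {
    // Wikilink form: [[path|display name]] (display part optional)
    const wikilink = line.match(/\[\[([^|\]]+)(?:\|[^\]]*)?\]\]/);
    if (wikilink) {
      paths.push(wikilink[1].trim());
      continue;
    }
    // Plain-path form: a line that is just a relative .md path
    const plain = line.trim();
    if (plain.endsWith('.md') && !plain.includes(' ')) {
      paths.push(plain);
    }
  }
  return paths;
}
```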
- Build Source File Index:
  - Recursively scan the `SOURCE_DIR` for all `.md` files.
  - Create a JavaScript `Map` where the key is the filename (e.g., `Dolphin.md`) and the value is the absolute path to that file within `SOURCE_DIR`. This allows fast lookups by filename only.
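A minimal sketch of that index, walking the tree manually so it works on any Node version with `fs/promises` (the `buildSourceIndex` name is an invention for illustration):

```javascript
import fs from 'fs/promises';
import path from 'path';

// Recursively index every .md file under sourceDir by its bare filename.
async function buildSourceIndex(sourceDir) {
  const index = new Map(); // key: filename (e.g. "Dolphin.md"), value: absolute path
  async function walk(dir) {
    const entries = await fs.readdir(dir, { withFileTypes: true });
    for (const entry of entries) {
      const fullPath = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        await walk(fullPath);
      } else if (entry.name.endsWith('.md')) {
        // Duplicate filenames in different folders overwrite earlier entries;
        // log collisions here if that matters for the content set.
        index.set(entry.name, fullPath);
      }
    }
  }
  await walk(sourceDir);
  return index;
}
```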
- Define Helper Functions (Manual Frontmatter Handling):
  - `async function extractUrlLine(filePath)`:
    - Reads the file content at `filePath`.
    - Manually scans lines between the first and second `---` delimiters.
    - If a line starting with `url:` (case-sensitive, ignoring leading whitespace) is found, returns that full line (e.g., `url: https://example.com`).
    - Returns `null` if no `url:` line is found within valid frontmatter delimiters or if a read error occurs.
  - `function insertUrlIntoFrontmatter(targetContent, urlLine)`:
    - Finds the indices of the first and second `---` delimiters in `targetContent`.
    - Extracts the frontmatter section. Checks whether `url:` already exists (using the regex `^\s*url:`).
    - If `url:` exists -> log the skip and return `null` (indicating no change needed).
    - If the delimiters are invalid -> log an error and return `null`.
    - Constructs the new content string by inserting `urlLine` immediately before the second `---` delimiter.
    - Returns the modified `targetContent`.
- Process Files:
  - Initialize tracking variables (files processed, matches found, URLs found, URLs inserted, errors, lists for reporting, etc.).
  - Iterate through the list of relative target paths from the input report.
  - For each `relativeTargetPath`:
    - Get `targetFilename = path.basename(relativeTargetPath)`.
    - Construct `absoluteTargetPath = path.join(TARGET_BASE_DIR, relativeTargetPath)`.
    - Look up `targetFilename` in the source file index map.
    - If no match is found in the source -> log the skip, increment the counter, add to the skip list, continue.
    - If a match is found (`absoluteSourcePath`):
      - Call `urlLine = await extractUrlLine(absoluteSourcePath)`.
      - If `urlLine` is `null` -> log that no URL was found (or a read error occurred), increment the counter, add to the skip list, continue.
      - If `urlLine` is found:
        - Try reading the target file: `targetContent = await fs.readFile(absoluteTargetPath, 'utf8')`. Handle read errors (log, increment the error count, add to the error list, continue).
        - Try inserting the URL: `newContent = insertUrlIntoFrontmatter(targetContent, urlLine)`.
        - If `newContent` is `null` (URL already existed or malformed frontmatter) -> increment the skip counter, add to the skip list, continue.
        - If `newContent` differs:
          - Try writing `newContent` back to `absoluteTargetPath`: `await fs.writeFile(absoluteTargetPath, newContent, 'utf8')`. Handle write errors (log, increment the error count, add to the error list, continue).
          - If the write succeeds -> log success, increment the success counter, add to the updated list.
- Generate Final Report:
  - Create a detailed Markdown report string summarizing the entire operation:
    - Input report path used.
    - Source directory scanned.
    - Target base directory.
    - Total files listed in the input report.
    - Number of target files processed.
    - Number of files where a matching source filename was found.
    - Number of source files where a `url:` property was found.
    - Number of target files successfully updated with a `url:`.
    - List of files updated (using `formatRelativePath`).
    - List of files skipped because `url:` already existed or the frontmatter was malformed.
    - List of files skipped because no matching source filename was found.
    - List of files skipped because the URL was not found in the source file.
    - List of files that caused read/write errors.
  - Use `await writeReport(reportString, 'import-url-from-old-repo')` to save the report.
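The report assembly could look roughly like this. The `stats` object shape, its field names, and `buildReport` itself are inventions for illustration; `writeReport` is the utility from `utils/reportUtils.cjs` described above:

```javascript
// Sketch: assemble the Markdown summary from counters and lists that the
// processing loop is assumed to have populated.
function buildReport(stats) {
  const lines = [
    '# Import url: from old repo',
    '',
    `Input report: ${stats.inputReportPath}`,
    `Source directory: ${stats.sourceDir}`,
    `Target base directory: ${stats.targetBaseDir}`,
    '',
    `Files listed in input report: ${stats.totalListed}`,
    `Target files processed: ${stats.processed}`,
    `Matching source filenames found: ${stats.matched}`,
    `Source files with a url: property: ${stats.urlsFound}`,
    `Target files updated: ${stats.updated}`,
    '',
    '## Updated files',
    ...stats.updatedFiles.map((f) => `- ${f}`),
    '',
    '## Skipped (url: already present or malformed frontmatter)',
    ...stats.skippedExisting.map((f) => `- ${f}`),
    // ...the remaining skip and error lists follow the same pattern.
  ];
  return lines.join('\n');
}

// Usage inside the main function:
// await writeReport(buildReport(stats), 'import-url-from-old-repo');
```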
```javascript
import fs from 'fs/promises';
import path from 'path';
import { fileURLToPath } from 'url';
// Derive __dirname for ES module
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
// --- Configuration ---
const MONOREPO_ROOT = path.resolve(__dirname, '../../'); // Adjust based on script location
const CURRENT_TOOLING_DIR = path.join(MONOREPO_ROOT, 'content', 'tooling');
const OLD_TOOLING_DIR = path.join(MONOREPO_ROOT, 'temp_old_repo', 'tooling'); // Path to the cloned old repo's tooling dir
// List of relative file paths missing 'url:' (from previous `find` command output)
// NOTE: Manually copy the list from the previous step's output here.
// Ensure paths are relative to CURRENT_TOOLING_DIR (e.g., './Hardware/CM5.md')
const filesMissingUrl = [
  // e.g. './Hardware/CM5.md'
];
// --- Helper Functions ---
/**
* Extracts the URL from the frontmatter of a given file content string.
* Manually searches for 'url:' line without parsing YAML.
* @param {string} fileContent - The full content of the file.
* @returns {string|null} - The extracted URL or null if not found.
*/
function extractUrlManually(fileContent) {
const lines = fileContent.split('\n');
let inFrontmatter = false;
for (const line of lines) {
if (line.trim() === '---') {
// Toggle frontmatter flag, but stop if we hit the second '---'
if (inFrontmatter) {
return null; // Reached end of frontmatter without finding url
}
inFrontmatter = true;
continue;
}
if (inFrontmatter) {
const trimmedLine = line.trim();
// Look for 'url:' specifically (case-sensitive, at start of line in frontmatter)
if (trimmedLine.startsWith('url:')) {
// Extract value after 'url:'
return trimmedLine.substring(4).trim();
}
}
}
return null; // No url found or no frontmatter
}
/**
* Inserts the urlLine into the frontmatter of the target file content.
* Inserts just before the closing '---'.
* @param {string} targetContent - The content of the file to modify.
* @param {string} urlLine - The full 'url: <value>' line to insert.
* @returns {string|null} - The modified content or null if frontmatter markers aren't found.
*/
function insertUrlManually(targetContent, urlLine) {
const lines = targetContent.split('\n');
let firstMarkerIndex = -1;
let secondMarkerIndex = -1;
// Find frontmatter delimiters
for (let i = 0; i < lines.length; i++) {
if (lines[i].trim() === '---') {
if (firstMarkerIndex === -1) {
firstMarkerIndex = i;
} else {
secondMarkerIndex = i;
break;
}
}
}
// Ensure frontmatter block exists
if (firstMarkerIndex === -1 || secondMarkerIndex === -1) {
console.warn("Could not find frontmatter delimiters (---).");
return null;
}
// Insert the urlLine just before the closing delimiter
lines.splice(secondMarkerIndex, 0, urlLine);
return lines.join('\n');
}
// --- Main Processing Logic ---
async function processFiles() {
console.log(`Starting URL restoration process...`);
let processedCount = 0;
let addedCount = 0;
let notFoundInOldRepoCount = 0;
let urlMissingInOldFileCount = 0;
let writeErrorCount = 0;
let frontmatterErrorCount = 0;
for (const relativePath of filesMissingUrl) {
processedCount++;
const currentFilePath = path.join(CURRENT_TOOLING_DIR, relativePath);
const oldFilePath = path.join(OLD_TOOLING_DIR, relativePath);
try {
// 1. Check if old file exists
await fs.access(oldFilePath); // Throws error if doesn't exist
// 2. Read old file and extract URL
const oldContent = await fs.readFile(oldFilePath, 'utf-8');
const extractedUrlValue = extractUrlManually(oldContent);
if (!extractedUrlValue) {
console.warn(`[WARN] No 'url:' found in old file: ${relativePath}`);
urlMissingInOldFileCount++;
continue; // Skip to next file
}
// Construct the full url line
const urlLineToInsert = `url: ${extractedUrlValue}`;
// 3. Read current file
const currentContent = await fs.readFile(currentFilePath, 'utf-8');
// Check if URL already exists somehow (safety check)
if (currentContent.includes('\nurl:')) {
console.log(`[SKIP] 'url:' already exists in current file: ${relativePath}`);
continue;
}
// 4. Insert URL into current file content
const updatedContent = insertUrlManually(currentContent, urlLineToInsert);
if (!updatedContent) {
console.error(`[ERROR] Failed to insert URL due to missing frontmatter markers in: ${relativePath}`);
frontmatterErrorCount++;
continue; // Skip to next file
}
// 5. Write updated content back to current file
await fs.writeFile(currentFilePath, updatedContent, 'utf-8');
console.log(`[SUCCESS] Added URL to: ${relativePath}`);
addedCount++;
} catch (error) {
if (error.code === 'ENOENT') {
console.warn(`[WARN] Old file not found: ${relativePath}`);
notFoundInOldRepoCount++;
} else {
console.error(`[ERROR] Processing ${relativePath}: ${error.message}`);
writeErrorCount++; // Assume other errors are write related for now
}
}
}
console.log(`\n--- Processing Summary ---`);
console.log(`Total files checked: ${processedCount}`);
console.log(`URLs successfully added: ${addedCount}`);
console.log(`Files not found in old repo: ${notFoundInOldRepoCount}`);
console.log(`'url:' missing in old file: ${urlMissingInOldFileCount}`);
console.log(`Errors finding frontmatter: ${frontmatterErrorCount}`);
console.log(`Other errors (read/write): ${writeErrorCount}`);
console.log(`--------------------------`);
}
processFiles();
```