Self-Updating Product Announcement Watcher

1. Overview

1.1 Purpose

An automated system for monitoring, collecting, and cataloging product releases, feature announcements, and significant milestones for the 1,400+ tools tracked in our content repository. The system will eliminate manual monitoring and ensure the lost-in-public/keeping-up/ content collection stays current without human intervention.

1.2 Problem Statement

Currently, the keeping-up directory contains manually created announcement files that are inconsistent, incomplete, and difficult to maintain:

Manual discovery of product announcements is time-consuming and error-prone
Files lack standardized frontmatter and formatting
No systematic way to track which tools have been updated
Announcements are missed or discovered weeks/months after release
Content team spends significant time on repetitive monitoring tasks

1.3 Scope

The system will:

Monitor multiple announcement sources (GitHub, RSS feeds, changelogs, blogs, YouTube, Medium)
Automatically detect new releases and announcements
Generate standardized markdown files in keeping-up/
Link announcements to existing tool documentation
Deduplicate and/or Aggregate announcements across multiple sources
Enrich content with LLM-generated summaries
Support 1,400+ tools with focus on 400+ AI-Toolkit items initially

1.4 Out of Scope (Phase 1)

Manual announcement submission interface
Social media monitoring (Twitter/X, LinkedIn)
Community forum monitoring (Discord, Reddit)
Breaking change analysis
Automated migration guide generation

2. Architecture

2.1 System Components

text

┌─────────────────────────────────────────────┐
│  Watch Configuration Layer                  │
│  - Per-tool watch configurations            │
│  - Source definitions and priorities        │
│  - Monitoring schedules and rules           │
└─────────────────────────────────────────────┘
              ↓
┌─────────────────────────────────────────────┐
│  Data Collection Services (Parallel)        │
│  1. GitHub Release API Watcher              │
│  2. RSS/Atom Feed Watcher                   │
│  3. Changelog Page Scraper                  │
│  4. OpenGraph Monitor (page change detect)  │
└─────────────────────────────────────────────┘
              ↓
┌─────────────────────────────────────────────┐
│  Event Processor & Deduplicator             │
│  - Normalizes announcement data             │
│  - Deduplicates across sources              │
│  - Enriches with metadata                   │
└─────────────────────────────────────────────┘
              ↓
┌─────────────────────────────────────────────┐
│  LLM Enrichment Service                     │
│  - Generates announcement summaries         │
│  - Extracts key features                    │
│  - Categorizes announcement types           │
└─────────────────────────────────────────────┘
              ↓
┌─────────────────────────────────────────────┐
│  Content Generator                          │
│  - Creates keeping-up/*.md files            │
│  - Links to tooling files                   │
│  - Embeds media (images, videos)            │
└─────────────────────────────────────────────┘
              ↓
┌─────────────────────────────────────────────┐
│  Existing Filesystem Observer               │
│  - Validates/enriches frontmatter           │
│  - Applies keeping-up template              │
│  - Ensures metadata consistency             │
└─────────────────────────────────────────────┘

2.2 Data Flow

Configuration Load: System reads watch configurations for each tool
Scheduled Polling: Services check sources on defined intervals (hourly, daily)
Event Detection: New releases/announcements identified and collected
Normalization: Raw data transformed into standardized format
Deduplication: Cross-source duplicate detection and merging
Enrichment: LLM adds summaries, categorization, and extracted metadata
Content Generation: Markdown files created with proper frontmatter
Observer Processing: Existing filesystem observer validates and enhances
State Persistence: Last-seen state updated to prevent reprocessing

2.3 Deployment Model

Option A: Standalone Service (Recommended)

Separate Node.js/TypeScript service
Runs on schedule via cron or GitHub Actions
Lightweight, single-purpose, easy to debug
Writes directly to content repository
State stored in simple JSON or SQLite

Option B: Integrated Observer Extension

Adds releaseWatcher alongside fileSystemObserver
Shares infrastructure and utilities
More complex but better integrated

Recommendation: Start with Option A for faster iteration, migrate to Option B once stable.

3. Technical Requirements

3.1 Watch Configuration Format

Each tool can define watch sources via sidecar YAML file:

yaml

# tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI.watch.yml
---
watch_enabled: true
tool_ref: "tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI.md"
priority: high  # high, medium, low (affects polling frequency)

sources:
  - type: github_releases
    repo: traehq/trae
    include_prereleases: false

  - type: rss
    url: https://www.trae.ai/blog/rss.xml
    filter_keywords: [release, launch, announce]

  - type: changelog_page
    url: https://www.trae.ai/changelog
    selector: ".release-item"

  - type: blog
    url: https://www.trae.ai/blog
    jina_reader: true  # Use Jina Reader for extraction

  - type: product_hunt
    url: https://www.producthunt.com/products/trae

notification_settings:
  slack_channel: "#product-updates"  # Optional
  email: updates@example.com  # Optional

3.2 Data Collection Services

3.2.1 GitHub Release Watcher

Technology: Octokit REST API
Polling Frequency: Every 6 hours for high-priority, daily for others
Rate Limits: 5,000 requests/hour (authenticated)
Data Extracted:
- Release version
- Release name and description
- Published date
- Author information
- Asset URLs (binaries, images)
- Prerelease flag

3.2.2 RSS/Atom Feed Watcher

Technology: rss-parser npm package
Polling Frequency: Every 12 hours
Data Extracted:
- Title and description
- Publication date
- Link to full article
- Author
- Categories/tags
- Enclosures (images, videos)

3.2.3 Changelog Scraper

Technology: Playwright or Puppeteer
Alternative: Jina.ai Reader API (already in use)
Polling Frequency: Daily
Change Detection: Hash-based comparison of content
Data Extracted:
- Version numbers
- Release dates
- Change descriptions
- Categorized changes (features, fixes, breaking)

3.2.4 OpenGraph Monitor

Technology: Existing OpenGraph.io integration
Use Case: Detect blog post changes, new announcement pages
Polling Frequency: Weekly
Data Extracted:
- og, og, og
- Last-modified headers
- Content hash for change detection

3.3 State Management

Track what has been processed to avoid duplicates:

json

// .state/release-watcher-state.json
{
  "tools": {
    "trae-ai": {
      "last_checked": "2025-11-15T10:30:00Z",
      "sources": {
        "github": {
          "last_release_id": "v1.2.0",
          "last_check": "2025-11-15T10:30:00Z"
        },
        "rss": {
          "last_item_guid": "https://trae.ai/blog/solo-launch",
          "last_check": "2025-11-15T10:30:00Z"
        },
        "changelog": {
          "content_hash": "a7f3b2c9...",
          "last_check": "2025-11-15T10:30:00Z"
        }
      },
      "announcements_created": [
        "lost-in-public/keeping-up/Trae Solo is released.md"
      ]
    }
  },
  "metadata": {
    "schema_version": "1.0.0",
    "last_global_update": "2025-11-15T10:30:00Z"
  }
}

3.4 Content Generation

3.4.1 Generated Markdown Format

markdown

---
date_created: 2025-11-15
date_modified: 2025-11-15
announcement_url: https://www.trae.ai/blog/product_solo_1112
tool_ref: "[[tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI]]"
announcement_type: release  # release | feature | milestone | partnership
version: 1.2.0  # if applicable
source: github_releases  # which service detected this
auto_generated: true
reviewed: false  # human review flag
tags: [AI-Toolkit, Product-Release, Code-Generators]
---

# Trae Launches SOLO Mode

Trae has announced the general availability of SOLO mode, a new way to interact with AI during development that rethinks how context works in AI-assisted coding.

## Key Features
- Context-aware code suggestions
- Improved multi-file understanding
- Enhanced debugging capabilities

## What's New
[LLM-generated summary from release notes/changelog]

## Resources
- [Official Announcement](https://www.trae.ai/blog/product_solo_1112)
- [Documentation](https://docs.trae.ai/solo)

![Trae launches SOLO mode](https://ik.imagekit.io/xvpgfijuw/lossless-content-embeds/2025-11-15_Trae-Solo-Launch--Bigger.gif)

---
*This announcement was automatically detected and generated. Last updated: 2025-11-15*

3.4.2 Filename Convention

text

{YYYY-MM-DD}_{Tool-Name}_{Announcement-Type}.md

Examples:
2025-11-15_Trae-AI_SOLO-Release.md
2025-10-21_Cursor_New-Features.md
2025-09-15_Claude_Sonnet-4-Launch.md

3.5 LLM Enrichment Service

Purpose: Generate human-readable summaries from raw release notes

Input: Raw announcement data (GitHub release body, RSS content, changelog text)

Output: Structured announcement content

Prompt Template:

text

Analyze this product announcement and generate a concise summary:

Product: {tool_name}
Source: {source_url}
Raw Content:
{raw_content}

Generate:
1. A compelling headline (max 100 chars)
2. A 2-3 sentence summary of the announcement
3. A bulleted list of 3-5 key features or changes
4. Categorize as: release | feature | milestone | partnership
5. Extract version number if present

Format as markdown.

LLM Selection: Claude Haiku for cost/speed, Sonnet for complex releases

3.6 Deduplication Strategy

Problem: Same announcement appears from multiple sources (GitHub release + blog post + RSS feed)

Solution: Multi-factor matching

URL Matching: Same announcement_url → exact duplicate
Version Matching: Same version number + tool → likely duplicate
Date Proximity: Published within 48 hours + similar title → potential duplicate
Content Similarity: Embedding-based similarity > 0.85 → probable duplicate

Action on Duplicate:

Keep the richest source (GitHub > Blog > RSS)
Merge unique information from all sources
Update frontmatter with all source URLs
Mark duplicates in state file

4. Implementation Phases

Phase 1: GitHub Releases (Weeks 1-2)

Goal: Prove the pipeline with most reliable data source

Deliverables:

GitHub Release watcher service
Basic state management
Markdown file generator
Watch configurations for 50 high-priority AI tools

Success Metrics:

Detects 100% of new GitHub releases within 6 hours
Zero duplicate announcements created
Generated files pass Filesystem Observer validation
State file correctly tracks processed releases

Phase 2: RSS/Blog Monitoring (Weeks 3-4)

Goal: Catch announcements not published to GitHub

Deliverables:

Link Filler
RSS feed watcher
Jina Reader integration for blog posts
Deduplication logic (GitHub vs RSS)
LLM enrichment service (basic summaries)
Expand to 200 tool configurations

Success Metrics:

Detects announcements 24-48 hours before manual discovery
Deduplication catches 95%+ of cross-source duplicates
LLM summaries are coherent and accurate

Phase 3: Changelog Scraping (Weeks 5-6)

Goal: Handle tools without RSS/GitHub

Deliverables:

Playwright-based changelog scraper
Change detection via content hashing
Selector configuration per tool
Fallback to Jina Reader for difficult sites
Full coverage of 400 AI-Toolkit tools

Success Metrics:

Successfully monitors 80%+ of configured changelogs
Change detection has <5% false positives
Scraper handles common anti-bot measures

Phase 4: Polish & Scale (Weeks 7-8)

Goal: Production-ready system for all 1,400+ tools

Deliverables:

Advanced LLM enrichment (key features extraction)
Automatic image/video embedding
Error handling and retry logic
Monitoring dashboard (optional)
Watch configurations for all 1,400+ tools
Documentation and runbooks

Success Metrics:

System runs unattended for 2+ weeks without issues
Content team reports 90%+ reduction in manual monitoring
Generated files require minimal human review

5. Integration with Existing Systems

5.1 Filesystem Observer Integration

The generated announcement files will trigger the existing Filesystem Observer, which will:

Validate frontmatter completeness
Apply the keeping-up template
Ensure consistent metadata
Link to related tools and content

Configuration Update Required:

typescript

// tidyverse/observers/userOptionsConfig.ts
export const keepingUpConfig = {
  enabled: true,
  template: 'templates/keeping-up.ts',
  requiredFields: [
    'date_created',
    'date_modified',
    'announcement_url',
    'tool_ref',
    'announcement_type',
    'auto_generated'
  ],
  optionalFields: [
    'version',
    'source',
    'reviewed',
    'tags'
  ],
  services: [] // No external services needed
};

5.2 Tooling Directory Bidirectional Links

When an announcement is created, update the corresponding tool file:

markdown

# tooling/AI-Toolkit/.../Trae AI.md

## Recent Announcements
- [[lost-in-public/keeping-up/2025-11-15_Trae-AI_SOLO-Release|Trae Launches SOLO Mode]] (2025-11-15)
- [[lost-in-public/keeping-up/2025-10-01_Trae-AI_New-Features|New Context Features]] (2025-10-01)

Implementation: Observer can append to a designated section, or maintain a separate index file.

5.3 State File Location

Store state files in:

text

content/.state/release-watcher/
  - global-state.json
  - github-sources.json
  - rss-sources.json
  - changelog-sources.json

Add to .gitignore:

text

.state/release-watcher/*.json

Optional: Commit a .state/release-watcher/schema.json for documentation.

6. Configuration Management

6.1 Watch Configuration Discovery

Option A: Sidecar Files (Recommended)

Place {tool-name}.watch.yml next to {tool-name}.md
Easy to discover via filesystem scan
Clear 1 relationship

Option B: Centralized Registry

Single watch-registry.yml file
Easier to manage globally
Harder to keep in sync with 1,400+ tools

Recommendation: Start with Option A for Phase 1-2, consider Option B if management becomes unwieldy.

6.2 Auto-Configuration from Existing Metadata

Many tool files already have relevant metadata:

yaml

# Extract from existing frontmatter
url: https://www.trae.ai/
parent_org: "[[organizations/ByteDance|ByteDance]]"

Auto-generation Logic:

Scan tooling/ for all .md files
Extract url from frontmatter
Check if URL is GitHub repo → create github_releases source
Check for common RSS patterns (append /rss.xml, /feed, /blog/rss)
Generate basic .watch.yml configuration
Human reviews and enables watch_enabled: true

7. Error Handling & Resilience

7.1 Failure Modes

API Rate Limiting: GitHub, RSS feeds
- Solution: Exponential backoff, distributed polling, authenticated requests
Website Structure Changes: Changelog selectors break
- Solution: Selector validation, fallback to Jina Reader, alert on failures
Network Timeouts: Services unreachable
- Solution: Retry with timeout, skip and log, alert after 3 consecutive failures
Malformed Data: Invalid RSS, unexpected JSON
- Solution: Schema validation, graceful degradation, detailed error logging
Filesystem Observer Conflicts: Concurrent writes
- Solution: File locking, atomic writes, queue-based processing

7.2 Monitoring & Alerting

Metrics to Track:

Announcements detected per day/week
Success rate by source type
Average processing time
Deduplication hit rate
LLM API costs and latency
Error counts by type

Alerting Thresholds:

Zero announcements detected for 7+ days (possible system failure)
Error rate > 20% for any source type
State file corruption
Filesystem Observer validation failures > 10%

7.3 Logging Strategy

text

logs/release-watcher/
  - 2025-11-15-detections.log    # Announcements found
  - 2025-11-15-errors.log         # Errors and warnings
  - 2025-11-15-duplicates.log     # Deduplication events
  - 2025-11-15-enrichment.log     # LLM processing

Log Format: Structured JSON for easy parsing

json

{
  "timestamp": "2025-11-15T10:30:00Z",
  "level": "info",
  "service": "github-watcher",
  "tool": "trae-ai",
  "event": "release_detected",
  "data": {
    "version": "1.2.0",
    "url": "https://github.com/traehq/trae/releases/tag/v1.2.0"
  }
}

8. Security & Privacy

8.1 API Key Management

Store in environment variables, never commit
Use separate keys for dev/prod
Rotate keys quarterly
Rate limit protection

8.2 Data Privacy

Only collect publicly available announcement data
No personal information scraped
Respect robots.txt
Honor opt-out requests

8.3 Resource Limits

Max 1,000 HTTP requests per hour per service
Timeout requests after 30 seconds
Max file size 5MB for scraped content
LLM token limits: 10k input, 2k output

9. Performance Requirements

9.1 Processing Speed

Detect GitHub releases within 6 hours of publication
Process RSS feeds within 12 hours
Generate markdown file within 5 minutes of detection
LLM enrichment completes within 30 seconds

9.2 Scalability

Support 1,400+ tool configurations
Handle 50+ announcements per day
Process 10,000+ HTTP requests per day
Store 5+ years of announcement history

9.3 Resource Usage

Max 512MB RAM during operation
Max 1GB disk space for state files
Max $50/month in LLM API costs
Max $20/month in third-party API costs

10. Future Enhancements (Post-Phase 4)

10.1 Advanced Features

Breaking change detection and impact analysis
Automated migration guide generation
Competitive analysis (compare releases across similar tools)
Trend detection (feature adoption patterns)
Social media monitoring (Twitter/X, LinkedIn)
Community sentiment analysis
Release prediction (based on historical patterns)

10.2 User Interface

Web dashboard for monitoring watch configurations
Manual announcement submission form
Bulk configuration editor
Analytics and reporting interface
Review queue for auto-generated content

10.3 Integrations

Slack notifications for important releases
Email digests (weekly summary of announcements)
Calendar events for major releases
Integration with project management tools
API for external consumption

11. Success Criteria

11.1 System Health

95%+ uptime over 30 days
<5% error rate across all sources
Zero data loss or corruption events
All generated files pass Filesystem Observer validation

11.2 Content Quality

90%+ of announcements require no manual editing
LLM summaries are accurate and coherent
Zero duplicate announcements published
Announcements link correctly to tool files

11.3 Team Impact

Content team reports 90%+ reduction in monitoring time
Announcements published 24-48 hours faster than manual process
Content team adopts system for primary announcement workflow
Product catalog completeness increases to 95%+

11.4 Cost Efficiency

Total operating cost < $100/month
Cost per announcement < $0.50
System requires <2 hours/week of maintenance
ROI positive within 3 months

12. Technical Stack Summary

12.1 Core Technologies

Runtime: Node.js 18+ / TypeScript 5+
Package Manager: npm or pnpm
State Storage: JSON files (Phase 1-3), SQLite (Phase 4+)
Scheduling: node-cron or GitHub Actions

12.2 Key Dependencies

GitHub API: @octokit/rest
RSS Parsing: rss-parser
Web Scraping: playwright or puppeteer
Content Extraction: Jina Reader API (existing)
OpenGraph: OpenGraph.io (existing)
LLM: Anthropic Claude API (existing)
YAML: js-yaml
Markdown: remark, gray-matter (existing)

12.3 Development Tools

Testing: Jest or Vitest
Linting: ESLint
Formatting: Prettier
CI/CD: GitHub Actions
Logging: winston or pino

13. Repository Structure

text

tidyverse/watchers/release-watcher/
├── src/
│   ├── index.ts                    # Main orchestrator
│   ├── config/
│   │   ├── watchConfigLoader.ts    # Load .watch.yml files
│   │   └── schemas.ts              # Zod schemas for validation
│   ├── sources/
│   │   ├── github.ts               # GitHub Release watcher
│   │   ├── rss.ts                  # RSS feed watcher
│   │   ├── changelog.ts            # Changelog scraper
│   │   └── opengraph.ts            # OpenGraph monitor
│   ├── processors/
│   │   ├── normalizer.ts           # Unify data formats
│   │   ├── deduplicator.ts         # Cross-source dedup
│   │   └── enricher.ts             # LLM enrichment
│   ├── generators/
│   │   ├── markdown.ts             # Generate .md files
│   │   └── templates.ts            # Content templates
│   ├── state/
│   │   ├── stateManager.ts         # Read/write state files
│   │   └── schemas.ts              # State schemas
│   └── utils/
│       ├── logger.ts               # Structured logging
│       ├── retry.ts                # Retry logic
│       └── hash.ts                 # Content hashing
├── tests/
│   ├── sources/
│   ├── processors/
│   └── generators/
├── config/
│   └── default.yml                 # Global configuration
├── .env.example                    # API keys template
├── package.json
├── tsconfig.json
└── README.md

14. Documentation Requirements

14.1 User Documentation

How to create watch configurations
How to review auto-generated announcements
How to opt tools in/out of monitoring
Troubleshooting common issues

14.2 Developer Documentation

Architecture overview and data flow
How to add new source types
How to customize LLM prompts
How to extend deduplication logic
Testing strategy and test data

14.3 Operational Documentation

Deployment procedures
Monitoring and alerting setup
Backup and recovery procedures
Cost optimization strategies

15. Migration Path

15.1 Existing Announcements

Audit existing keeping-up/ files
Extract announcement_url and tool references
Backfill state file to prevent re-detection
Standardize frontmatter via Filesystem Observer
Add auto_generated: false to manual announcements

15.2 Tooling Metadata Enhancement

Scan all 1,400+ tool files
Extract GitHub repos, RSS feeds from content
Add to frontmatter if not present
Generate initial watch configurations
Prioritize AI-Toolkit (400 tools) for Phase 1

16. Appendix

16.2 Reference Implementations

GitHub Release monitoring: Dependabot, Renovate
RSS aggregation: Feedly, NewsBlur
Content scraping: Mercury Parser, Readability

16.3 Example Watch Configurations

See inline examples in Section 3.1 and throughout this document.

16.4 Glossary

Announcement: Product release, feature launch, or significant milestone
Watch Configuration: YAML file defining monitoring sources for a tool
State File: JSON file tracking processed announcements
Deduplication: Process of identifying and merging duplicate announcements
Enrichment: Adding LLM-generated summaries and metadata
Keeping-Up: Content collection for product announcements

This specification is a living document and will be updated as implementation progresses and new requirements emerge.