Self-Updating Product Announcement Watcher

Self-Updating Product Announcement Watcher

1. Overview

1.1 Purpose

An automated system for monitoring, collecting, and cataloging product releases, feature announcements, and significant milestones for the 1,400+ tools tracked in our content repository. The system will eliminate manual monitoring and ensure the lost-in-public/keeping-up/ content collection stays current without human intervention.

1.2 Problem Statement

Currently, the keeping-up directory contains manually created announcement files that are inconsistent, incomplete, and difficult to maintain:
  • Manual discovery of product announcements is time-consuming and error-prone
  • Files lack standardized frontmatter and formatting
  • No systematic way to track which tools have been updated
  • Announcements are missed or discovered weeks/months after release
  • Content team spends significant time on repetitive monitoring tasks

1.3 Scope

The system will:
  • Monitor multiple announcement sources (GitHub, RSS feeds, changelogs, blogs, YouTube, Medium)
  • Automatically detect new releases and announcements
  • Generate standardized markdown files in keeping-up/
  • Link announcements to existing tool documentation
  • Deduplicate and/or Aggregate announcements across multiple sources
  • Enrich content with LLM-generated summaries
  • Support 1,400+ tools with focus on 400+ AI-Toolkit items initially

1.4 Out of Scope (Phase 1)

  • Manual announcement submission interface
  • Social media monitoring (Twitter/X, LinkedIn)
  • Community forum monitoring (Discord, Reddit)
  • Breaking change analysis
  • Automated migration guide generation

2. Architecture

2.1 System Components

text
┌─────────────────────────────────────────────┐
│  Watch Configuration Layer                  │
│  - Per-tool watch configurations            │
│  - Source definitions and priorities        │
│  - Monitoring schedules and rules           │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  Data Collection Services (Parallel)        │
│  1. GitHub Release API Watcher              │
│  2. RSS/Atom Feed Watcher                   │
│  3. Changelog Page Scraper                  │
│  4. OpenGraph Monitor (page change detect)  │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  Event Processor & Deduplicator             │
│  - Normalizes announcement data             │
│  - Deduplicates across sources              │
│  - Enriches with metadata                   │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  LLM Enrichment Service                     │
│  - Generates announcement summaries         │
│  - Extracts key features                    │
│  - Categorizes announcement types           │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  Content Generator                          │
│  - Creates keeping-up/*.md files            │
│  - Links to tooling files                   │
│  - Embeds media (images, videos)            │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  Existing Filesystem Observer               │
│  - Validates/enriches frontmatter           │
│  - Applies keeping-up template              │
│  - Ensures metadata consistency             │
└─────────────────────────────────────────────┘

2.2 Data Flow

  1. Configuration Load: System reads watch configurations for each tool
  2. Scheduled Polling: Services check sources on defined intervals (hourly, daily)
  3. Event Detection: New releases/announcements identified and collected
  4. Normalization: Raw data transformed into standardized format
  5. Deduplication: Cross-source duplicate detection and merging
  6. Enrichment: LLM adds summaries, categorization, and extracted metadata
  7. Content Generation: Markdown files created with proper frontmatter
  8. Observer Processing: Existing filesystem observer validates and enhances
  9. State Persistence: Last-seen state updated to prevent reprocessing

2.3 Deployment Model

Option A: Standalone Service (Recommended)
  • Separate Node.js/TypeScript service
  • Runs on schedule via cron or GitHub Actions
  • Lightweight, single-purpose, easy to debug
  • Writes directly to content repository
  • State stored in simple JSON or SQLite
Option B: Integrated Observer Extension
  • Adds releaseWatcher alongside fileSystemObserver
  • Shares infrastructure and utilities
  • More complex but better integrated
Recommendation: Start with Option A for faster iteration, migrate to Option B once stable.

3. Technical Requirements

3.1 Watch Configuration Format

Each tool can define watch sources via sidecar YAML file:
yaml
# tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI.watch.yml
---
watch_enabled: true
tool_ref: "tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI.md"
priority: high  # high, medium, low (affects polling frequency)

sources:
  - type: github_releases
    repo: traehq/trae
    include_prereleases: false

  - type: rss
    url: https://www.trae.ai/blog/rss.xml
    filter_keywords: [release, launch, announce]

  - type: changelog_page
    url: https://www.trae.ai/changelog
    selector: ".release-item"

  - type: blog
    url: https://www.trae.ai/blog
    jina_reader: true  # Use Jina Reader for extraction

  - type: product_hunt
    url: https://www.producthunt.com/products/trae

notification_settings:
  slack_channel: "#product-updates"  # Optional
  email: updates@example.com  # Optional

3.2 Data Collection Services

3.2.1 GitHub Release Watcher

  • Technology: Octokit REST API
  • Polling Frequency: Every 6 hours for high-priority, daily for others
  • Rate Limits: 5,000 requests/hour (authenticated)
  • Data Extracted:
    • Release version
    • Release name and description
    • Published date
    • Author information
    • Asset URLs (binaries, images)
    • Prerelease flag

3.2.2 RSS/Atom Feed Watcher

  • Technology: rss-parser npm package
  • Polling Frequency: Every 12 hours
  • Data Extracted:
    • Title and description
    • Publication date
    • Link to full article
    • Author
    • Categories/tags
    • Enclosures (images, videos)

3.2.3 Changelog Scraper

  • Technology: Playwright or Puppeteer
  • Alternative: Jina.ai Reader API (already in use)
  • Polling Frequency: Daily
  • Change Detection: Hash-based comparison of content
  • Data Extracted:
    • Version numbers
    • Release dates
    • Change descriptions
    • Categorized changes (features, fixes, breaking)

3.2.4 OpenGraph Monitor

  • Technology: Existing OpenGraph.io integration
  • Use Case: Detect blog post changes, new announcement pages
  • Polling Frequency: Weekly
  • Data Extracted:
    • og, og, og
    • Last-modified headers
    • Content hash for change detection

3.3 State Management

Track what has been processed to avoid duplicates:
json
// .state/release-watcher-state.json
{
  "tools": {
    "trae-ai": {
      "last_checked": "2025-11-15T10:30:00Z",
      "sources": {
        "github": {
          "last_release_id": "v1.2.0",
          "last_check": "2025-11-15T10:30:00Z"
        },
        "rss": {
          "last_item_guid": "https://trae.ai/blog/solo-launch",
          "last_check": "2025-11-15T10:30:00Z"
        },
        "changelog": {
          "content_hash": "a7f3b2c9...",
          "last_check": "2025-11-15T10:30:00Z"
        }
      },
      "announcements_created": [
        "lost-in-public/keeping-up/Trae Solo is released.md"
      ]
    }
  },
  "metadata": {
    "schema_version": "1.0.0",
    "last_global_update": "2025-11-15T10:30:00Z"
  }
}

3.4 Content Generation

3.4.1 Generated Markdown Format

markdown
---
date_created: 2025-11-15
date_modified: 2025-11-15
announcement_url: https://www.trae.ai/blog/product_solo_1112
tool_ref: "[[tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI]]"
announcement_type: release  # release | feature | milestone | partnership
version: 1.2.0  # if applicable
source: github_releases  # which service detected this
auto_generated: true
reviewed: false  # human review flag
tags: [AI-Toolkit, Product-Release, Code-Generators]
---

# Trae Launches SOLO Mode

Trae has announced the general availability of SOLO mode, a new way to interact with AI during development that rethinks how context works in AI-assisted coding.

## Key Features
- Context-aware code suggestions
- Improved multi-file understanding
- Enhanced debugging capabilities

## What's New
[LLM-generated summary from release notes/changelog]

## Resources
- [Official Announcement](https://www.trae.ai/blog/product_solo_1112)
- [Documentation](https://docs.trae.ai/solo)

![Trae launches SOLO mode](https://ik.imagekit.io/xvpgfijuw/lossless-content-embeds/2025-11-15_Trae-Solo-Launch--Bigger.gif)

---
*This announcement was automatically detected and generated. Last updated: 2025-11-15*

3.4.2 Filename Convention

text
{YYYY-MM-DD}_{Tool-Name}_{Announcement-Type}.md

Examples:
2025-11-15_Trae-AI_SOLO-Release.md
2025-10-21_Cursor_New-Features.md
2025-09-15_Claude_Sonnet-4-Launch.md

3.5 LLM Enrichment Service

Purpose: Generate human-readable summaries from raw release notes
Input: Raw announcement data (GitHub release body, RSS content, changelog text)
Output: Structured announcement content
Prompt Template:
text
Analyze this product announcement and generate a concise summary:

Product: {tool_name}
Source: {source_url}
Raw Content:
{raw_content}

Generate:
1. A compelling headline (max 100 chars)
2. A 2-3 sentence summary of the announcement
3. A bulleted list of 3-5 key features or changes
4. Categorize as: release | feature | milestone | partnership
5. Extract version number if present

Format as markdown.
LLM Selection: Claude Haiku for cost/speed, Sonnet for complex releases

3.6 Deduplication Strategy

Problem: Same announcement appears from multiple sources (GitHub release + blog post + RSS feed)
Solution: Multi-factor matching
  1. URL Matching: Same announcement_url → exact duplicate
  2. Version Matching: Same version number + tool → likely duplicate
  3. Date Proximity: Published within 48 hours + similar title → potential duplicate
  4. Content Similarity: Embedding-based similarity > 0.85 → probable duplicate
Action on Duplicate:
  • Keep the richest source (GitHub > Blog > RSS)
  • Merge unique information from all sources
  • Update frontmatter with all source URLs
  • Mark duplicates in state file

4. Implementation Phases

Phase 1: GitHub Releases (Weeks 1-2)

Goal: Prove the pipeline with most reliable data source
Deliverables:
  1. GitHub Release watcher service
  2. Basic state management
  3. Markdown file generator
  4. Watch configurations for 50 high-priority AI tools
Success Metrics:
  • Detects 100% of new GitHub releases within 6 hours
  • Zero duplicate announcements created
  • Generated files pass Filesystem Observer validation
  • State file correctly tracks processed releases

Phase 2: RSS/Blog Monitoring (Weeks 3-4)

Goal: Catch announcements not published to GitHub
Deliverables:
  1. Link Filler
  2. RSS feed watcher
  3. Jina Reader integration for blog posts
  4. Deduplication logic (GitHub vs RSS)
  5. LLM enrichment service (basic summaries)
  6. Expand to 200 tool configurations
Success Metrics:
  • Detects announcements 24-48 hours before manual discovery
  • Deduplication catches 95%+ of cross-source duplicates
  • LLM summaries are coherent and accurate

Phase 3: Changelog Scraping (Weeks 5-6)

Goal: Handle tools without RSS/GitHub
Deliverables:
  1. Playwright-based changelog scraper
  2. Change detection via content hashing
  3. Selector configuration per tool
  4. Fallback to Jina Reader for difficult sites
  5. Full coverage of 400 AI-Toolkit tools
Success Metrics:
  • Successfully monitors 80%+ of configured changelogs
  • Change detection has <5% false positives
  • Scraper handles common anti-bot measures

Phase 4: Polish & Scale (Weeks 7-8)

Goal: Production-ready system for all 1,400+ tools
Deliverables:
  1. Advanced LLM enrichment (key features extraction)
  2. Automatic image/video embedding
  3. Error handling and retry logic
  4. Monitoring dashboard (optional)
  5. Watch configurations for all 1,400+ tools
  6. Documentation and runbooks
Success Metrics:
  • System runs unattended for 2+ weeks without issues
  • Content team reports 90%+ reduction in manual monitoring
  • Generated files require minimal human review

5. Integration with Existing Systems

5.1 Filesystem Observer Integration

The generated announcement files will trigger the existing Filesystem Observer, which will:
  1. Validate frontmatter completeness
  2. Apply the keeping-up template
  3. Ensure consistent metadata
  4. Link to related tools and content
Configuration Update Required:
typescript
// tidyverse/observers/userOptionsConfig.ts
export const keepingUpConfig = {
  enabled: true,
  template: 'templates/keeping-up.ts',
  requiredFields: [
    'date_created',
    'date_modified',
    'announcement_url',
    'tool_ref',
    'announcement_type',
    'auto_generated'
  ],
  optionalFields: [
    'version',
    'source',
    'reviewed',
    'tags'
  ],
  services: [] // No external services needed
};
When an announcement is created, update the corresponding tool file:
markdown
# tooling/AI-Toolkit/.../Trae AI.md

## Recent Announcements
- [[lost-in-public/keeping-up/2025-11-15_Trae-AI_SOLO-Release|Trae Launches SOLO Mode]] (2025-11-15)
- [[lost-in-public/keeping-up/2025-10-01_Trae-AI_New-Features|New Context Features]] (2025-10-01)
Implementation: Observer can append to a designated section, or maintain a separate index file.

5.3 State File Location

Store state files in:
text
content/.state/release-watcher/
  - global-state.json
  - github-sources.json
  - rss-sources.json
  - changelog-sources.json
Add to .gitignore:
text
.state/release-watcher/*.json
Optional: Commit a .state/release-watcher/schema.json for documentation.

6. Configuration Management

6.1 Watch Configuration Discovery

Option A: Sidecar Files (Recommended)
  • Place {tool-name}.watch.yml next to {tool-name}.md
  • Easy to discover via filesystem scan
  • Clear 1 relationship
Option B: Centralized Registry
  • Single watch-registry.yml file
  • Easier to manage globally
  • Harder to keep in sync with 1,400+ tools
Recommendation: Start with Option A for Phase 1-2, consider Option B if management becomes unwieldy.

6.2 Auto-Configuration from Existing Metadata

Many tool files already have relevant metadata:
yaml
# Extract from existing frontmatter
url: https://www.trae.ai/
parent_org: "[[organizations/ByteDance|ByteDance]]"
Auto-generation Logic:
  1. Scan tooling/ for all .md files
  2. Extract url from frontmatter
  3. Check if URL is GitHub repo → create github_releases source
  4. Check for common RSS patterns (append /rss.xml, /feed, /blog/rss)
  5. Generate basic .watch.yml configuration
  6. Human reviews and enables watch_enabled: true

7. Error Handling & Resilience

7.1 Failure Modes

  1. API Rate Limiting: GitHub, RSS feeds
    • Solution: Exponential backoff, distributed polling, authenticated requests
  2. Website Structure Changes: Changelog selectors break
    • Solution: Selector validation, fallback to Jina Reader, alert on failures
  3. Network Timeouts: Services unreachable
    • Solution: Retry with timeout, skip and log, alert after 3 consecutive failures
  4. Malformed Data: Invalid RSS, unexpected JSON
    • Solution: Schema validation, graceful degradation, detailed error logging
  5. Filesystem Observer Conflicts: Concurrent writes
    • Solution: File locking, atomic writes, queue-based processing

7.2 Monitoring & Alerting

Metrics to Track:
  • Announcements detected per day/week
  • Success rate by source type
  • Average processing time
  • Deduplication hit rate
  • LLM API costs and latency
  • Error counts by type
Alerting Thresholds:
  • Zero announcements detected for 7+ days (possible system failure)
  • Error rate > 20% for any source type
  • State file corruption
  • Filesystem Observer validation failures > 10%

7.3 Logging Strategy

text
logs/release-watcher/
  - 2025-11-15-detections.log    # Announcements found
  - 2025-11-15-errors.log         # Errors and warnings
  - 2025-11-15-duplicates.log     # Deduplication events
  - 2025-11-15-enrichment.log     # LLM processing
Log Format: Structured JSON for easy parsing
json
{
  "timestamp": "2025-11-15T10:30:00Z",
  "level": "info",
  "service": "github-watcher",
  "tool": "trae-ai",
  "event": "release_detected",
  "data": {
    "version": "1.2.0",
    "url": "https://github.com/traehq/trae/releases/tag/v1.2.0"
  }
}

8. Security & Privacy

8.1 API Key Management

  • Store in environment variables, never commit
  • Use separate keys for dev/prod
  • Rotate keys quarterly
  • Rate limit protection

8.2 Data Privacy

  • Only collect publicly available announcement data
  • No personal information scraped
  • Respect robots.txt
  • Honor opt-out requests

8.3 Resource Limits

  • Max 1,000 HTTP requests per hour per service
  • Timeout requests after 30 seconds
  • Max file size 5MB for scraped content
  • LLM token limits: 10k input, 2k output

9. Performance Requirements

9.1 Processing Speed

  • Detect GitHub releases within 6 hours of publication
  • Process RSS feeds within 12 hours
  • Generate markdown file within 5 minutes of detection
  • LLM enrichment completes within 30 seconds

9.2 Scalability

  • Support 1,400+ tool configurations
  • Handle 50+ announcements per day
  • Process 10,000+ HTTP requests per day
  • Store 5+ years of announcement history

9.3 Resource Usage

  • Max 512MB RAM during operation
  • Max 1GB disk space for state files
  • Max $50/month in LLM API costs
  • Max $20/month in third-party API costs

10. Future Enhancements (Post-Phase 4)

10.1 Advanced Features

  • Breaking change detection and impact analysis
  • Automated migration guide generation
  • Competitive analysis (compare releases across similar tools)
  • Trend detection (feature adoption patterns)
  • Social media monitoring (Twitter/X, LinkedIn)
  • Community sentiment analysis
  • Release prediction (based on historical patterns)

10.2 User Interface

  • Web dashboard for monitoring watch configurations
  • Manual announcement submission form
  • Bulk configuration editor
  • Analytics and reporting interface
  • Review queue for auto-generated content

10.3 Integrations

  • Slack notifications for important releases
  • Email digests (weekly summary of announcements)
  • Calendar events for major releases
  • Integration with project management tools
  • API for external consumption

11. Success Criteria

11.1 System Health

  • 95%+ uptime over 30 days
  • <5% error rate across all sources
  • Zero data loss or corruption events
  • All generated files pass Filesystem Observer validation

11.2 Content Quality

  • 90%+ of announcements require no manual editing
  • LLM summaries are accurate and coherent
  • Zero duplicate announcements published
  • Announcements link correctly to tool files

11.3 Team Impact

  • Content team reports 90%+ reduction in monitoring time
  • Announcements published 24-48 hours faster than manual process
  • Content team adopts system for primary announcement workflow
  • Product catalog completeness increases to 95%+

11.4 Cost Efficiency

  • Total operating cost < $100/month
  • Cost per announcement < $0.50
  • System requires <2 hours/week of maintenance
  • ROI positive within 3 months

12. Technical Stack Summary

12.1 Core Technologies

  • Runtime: Node.js 18+ / TypeScript 5+
  • Package Manager: npm or pnpm
  • State Storage: JSON files (Phase 1-3), SQLite (Phase 4+)
  • Scheduling: node-cron or GitHub Actions

12.2 Key Dependencies

  • GitHub API: @octokit/rest
  • RSS Parsing: rss-parser
  • Web Scraping: playwright or puppeteer
  • Content Extraction: Jina Reader API (existing)
  • OpenGraph: OpenGraph.io (existing)
  • LLM: Anthropic Claude API (existing)
  • YAML: js-yaml
  • Markdown: remark, gray-matter (existing)

12.3 Development Tools

  • Testing: Jest or Vitest
  • Linting: ESLint
  • Formatting: Prettier
  • CI/CD: GitHub Actions
  • Logging: winston or pino

13. Repository Structure

text
tidyverse/watchers/release-watcher/
├── src/
│   ├── index.ts                    # Main orchestrator
│   ├── config/
│   │   ├── watchConfigLoader.ts    # Load .watch.yml files
│   │   └── schemas.ts              # Zod schemas for validation
│   ├── sources/
│   │   ├── github.ts               # GitHub Release watcher
│   │   ├── rss.ts                  # RSS feed watcher
│   │   ├── changelog.ts            # Changelog scraper
│   │   └── opengraph.ts            # OpenGraph monitor
│   ├── processors/
│   │   ├── normalizer.ts           # Unify data formats
│   │   ├── deduplicator.ts         # Cross-source dedup
│   │   └── enricher.ts             # LLM enrichment
│   ├── generators/
│   │   ├── markdown.ts             # Generate .md files
│   │   └── templates.ts            # Content templates
│   ├── state/
│   │   ├── stateManager.ts         # Read/write state files
│   │   └── schemas.ts              # State schemas
│   └── utils/
│       ├── logger.ts               # Structured logging
│       ├── retry.ts                # Retry logic
│       └── hash.ts                 # Content hashing
├── tests/
│   ├── sources/
│   ├── processors/
│   └── generators/
├── config/
│   └── default.yml                 # Global configuration
├── .env.example                    # API keys template
├── package.json
├── tsconfig.json
└── README.md

14. Documentation Requirements

14.1 User Documentation

  • How to create watch configurations
  • How to review auto-generated announcements
  • How to opt tools in/out of monitoring
  • Troubleshooting common issues

14.2 Developer Documentation

  • Architecture overview and data flow
  • How to add new source types
  • How to customize LLM prompts
  • How to extend deduplication logic
  • Testing strategy and test data

14.3 Operational Documentation

  • Deployment procedures
  • Monitoring and alerting setup
  • Backup and recovery procedures
  • Cost optimization strategies

15. Migration Path

15.1 Existing Announcements

  • Audit existing keeping-up/ files
  • Extract announcement_url and tool references
  • Backfill state file to prevent re-detection
  • Standardize frontmatter via Filesystem Observer
  • Add auto_generated: false to manual announcements

15.2 Tooling Metadata Enhancement

  • Scan all 1,400+ tool files
  • Extract GitHub repos, RSS feeds from content
  • Add to frontmatter if not present
  • Generate initial watch configurations
  • Prioritize AI-Toolkit (400 tools) for Phase 1

16. Appendix

16.2 Reference Implementations

  • GitHub Release monitoring: Dependabot, Renovate
  • RSS aggregation: Feedly, NewsBlur
  • Content scraping: Mercury Parser, Readability

16.3 Example Watch Configurations

See inline examples in Section 3.1 and throughout this document.

16.4 Glossary

  • Announcement: Product release, feature launch, or significant milestone
  • Watch Configuration: YAML file defining monitoring sources for a tool
  • State File: JSON file tracking processed announcements
  • Deduplication: Process of identifying and merging duplicate announcements
  • Enrichment: Adding LLM-generated summaries and metadata
  • Keeping-Up: Content collection for product announcements

This specification is a living document and will be updated as implementation progresses and new requirements emerge.