Self-Updating Product Announcement Watcher
1. Overview
1.1 Purpose
An automated system for monitoring, collecting, and cataloging product releases, feature announcements, and significant milestones for the 1,400+ tools tracked in our content repository. The system will eliminate manual monitoring and ensure the lost-in-public/keeping-up/ content collection stays current without human intervention.
1.2 Problem Statement
Currently, the keeping-up directory contains manually created announcement files that are inconsistent, incomplete, and difficult to maintain:
- Manual discovery of product announcements is time-consuming and error-prone
- Files lack standardized frontmatter and formatting
- No systematic way to track which tools have been updated
- Announcements are missed or discovered weeks/months after release
- Content team spends significant time on repetitive monitoring tasks
1.3 Scope
The system will:
- Monitor multiple announcement sources (GitHub, RSS feeds, changelogs, blogs, YouTube, Medium)
- Automatically detect new releases and announcements
- Generate standardized markdown files in keeping-up/
- Link announcements to existing tool documentation
- Deduplicate and aggregate announcements across multiple sources
- Enrich content with LLM-generated summaries
- Support 1,400+ tools with focus on 400+ AI-Toolkit items initially
1.4 Out of Scope (Phase 1)
- Manual announcement submission interface
- Social media monitoring (Twitter/X, LinkedIn)
- Community forum monitoring (Discord, Reddit)
- Breaking change analysis
- Automated migration guide generation
2. Architecture
2.1 System Components
text
┌─────────────────────────────────────────────┐
│ Watch Configuration Layer │
│ - Per-tool watch configurations │
│ - Source definitions and priorities │
│ - Monitoring schedules and rules │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Data Collection Services (Parallel) │
│ 1. GitHub Release API Watcher │
│ 2. RSS/Atom Feed Watcher │
│ 3. Changelog Page Scraper │
│ 4. OpenGraph Monitor (page change detect) │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Event Processor & Deduplicator │
│ - Normalizes announcement data │
│ - Deduplicates across sources │
│ - Enriches with metadata │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ LLM Enrichment Service │
│ - Generates announcement summaries │
│ - Extracts key features │
│ - Categorizes announcement types │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Content Generator │
│ - Creates keeping-up/*.md files │
│ - Links to tooling files │
│ - Embeds media (images, videos) │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Existing Filesystem Observer │
│ - Validates/enriches frontmatter │
│ - Applies keeping-up template │
│ - Ensures metadata consistency │
└─────────────────────────────────────────────┘
2.2 Data Flow
- Configuration Load: System reads watch configurations for each tool
- Scheduled Polling: Services check sources on defined intervals (from every 6 hours to weekly, depending on source type and priority)
- Event Detection: New releases/announcements identified and collected
- Normalization: Raw data transformed into standardized format
- Deduplication: Cross-source duplicate detection and merging
- Enrichment: LLM adds summaries, categorization, and extracted metadata
- Content Generation: Markdown files created with proper frontmatter
- Observer Processing: Existing filesystem observer validates and enhances
- State Persistence: Last-seen state updated to prevent reprocessing
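The stages above can be read as a chain of batch transformations. The following is a minimal sketch under that assumption; the Announcement shape, the Stage type, and the function names are hypothetical and not defined by this specification, and deduplication is reduced to exact URL matching for brevity.

```typescript
// Hypothetical types: the spec names the stages but does not define these shapes.
interface Announcement {
  tool: string;
  source: "github_releases" | "rss" | "changelog_page";
  url: string;
  title: string;
  publishedAt: string; // ISO 8601 timestamp
  summary?: string; // filled in later by the enrichment stage
}

// Every stage takes the current batch and returns the next one.
type Stage = (batch: Announcement[]) => Announcement[];

// Normalization: trim titles and strip trailing slashes so URLs compare equal.
const normalize: Stage = (batch) =>
  batch.map((item) => ({
    ...item,
    title: item.title.trim(),
    url: item.url.replace(/\/+$/, ""),
  }));

// Deduplication, simplified to exact URL matches (Section 3.6 lists more factors).
const dedupeByUrl: Stage = (batch) => {
  const seen = new Set<string>();
  return batch.filter((item) => {
    if (seen.has(item.url)) return false;
    seen.add(item.url);
    return true;
  });
};

// Runs the stages in order, feeding each stage's output into the next.
function runPipeline(events: Announcement[], stages: Stage[]): Announcement[] {
  return stages.reduce((batch, stage) => stage(batch), events);
}
```

Keeping each stage a pure function of its input batch makes the stages easy to test in isolation and to reorder while the pipeline is still being proven out.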
2.3 Deployment Model
Option A: Standalone Service (Recommended)
- Separate Node.js/TypeScript service
- Runs on schedule via cron or GitHub Actions
- Lightweight, single-purpose, easy to debug
- Writes directly to content repository
- State stored in simple JSON or SQLite
Option B: Integrated Observer Extension
- Adds releaseWatcher alongside fileSystemObserver
- Shares infrastructure and utilities
- More complex but better integrated
Recommendation: Start with Option A for faster iteration, migrate to Option B once stable.
3. Technical Requirements
3.1 Watch Configuration Format
Each tool can define watch sources via a sidecar YAML file:
yaml
# tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI.watch.yml
---
watch_enabled: true
tool_ref: "tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI.md"
priority: high # high, medium, low (affects polling frequency)
sources:
  - type: github_releases
    repo: traehq/trae
    include_prereleases: false
  - type: rss
    url: https://www.trae.ai/blog/rss.xml
    filter_keywords: [release, launch, announce]
  - type: changelog_page
    url: https://www.trae.ai/changelog
    selector: ".release-item"
  - type: blog
    url: https://www.trae.ai/blog
    jina_reader: true # Use Jina Reader for extraction
  - type: product_hunt
    url: https://www.producthunt.com/products/trae
notification_settings:
  slack_channel: "#product-updates" # Optional
  email: updates@example.com # Optional
3.2 Data Collection Services
3.2.1 GitHub Release Watcher
- Polling Frequency: Every 6 hours for high-priority, daily for others
- Rate Limits: 5,000 requests/hour (authenticated)
- Data Extracted:
- Release version
- Release name and description
- Published date
- Author information
- Asset URLs (binaries, images)
- Prerelease flag
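A minimal sketch of the polling schedule and new-release selection described above. The field names follow the release objects returned by the GitHub REST API; the function names, the medium/low interval split, and the assumption that the input list is ordered newest first are illustrative choices, not part of this specification.

```typescript
// Subset of the fields on a GitHub REST API release object.
interface GitHubRelease {
  id: number;
  tag_name: string;
  name: string | null;
  body: string | null;
  html_url: string;
  prerelease: boolean;
  draft: boolean;
  published_at: string | null;
}

type Priority = "high" | "medium" | "low";

// Section 3.2.1: every 6 hours for high priority, daily for the rest.
const POLL_INTERVAL_HOURS: Record<Priority, number> = { high: 6, medium: 24, low: 24 };

// True when enough time has passed since the last check for this priority.
function isPollDue(lastCheckIso: string, priority: Priority, now: Date): boolean {
  const elapsedMs = now.getTime() - new Date(lastCheckIso).getTime();
  return elapsedMs >= POLL_INTERVAL_HOURS[priority] * 60 * 60 * 1000;
}

// Walks a newest-first release list and collects everything published
// since the tag recorded in the state file (last_release_id).
function selectNewReleases(
  releases: GitHubRelease[],
  lastSeenTag: string | null,
  includePrereleases: boolean,
): GitHubRelease[] {
  const fresh: GitHubRelease[] = [];
  for (const release of releases) {
    if (release.tag_name === lastSeenTag) break; // everything after this was already processed
    if (release.draft) continue;
    if (release.prerelease && !includePrereleases) continue;
    fresh.push(release);
  }
  return fresh.reverse(); // oldest first, so files are generated in publication order
}
```

If the last seen tag is not on the fetched page (for example after a tag was deleted), this sketch returns the whole page, so the deduplication step still needs to guard against reprocessing.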
3.2.2 RSS/Atom Feed Watcher
- Technology: rss-parser npm package
- Polling Frequency: Every 12 hours
- Data Extracted:
- Title and description
- Publication date
- Link to full article
- Author
- Categories/tags
- Enclosures (images, videos)
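A minimal sketch of how the filter_keywords setting from the watch configuration and the last_item_guid value from the state file might be applied to parsed feed items. The FeedItem fields mirror what rss-parser commonly exposes on an item, but the type and function names here are hypothetical, and the sketch assumes the feed lists items newest first.

```typescript
// Hypothetical item shape, modeled on commonly available feed item fields.
interface FeedItem {
  title?: string;
  contentSnippet?: string;
  link?: string;
  guid?: string;
  isoDate?: string;
}

// An empty keyword list means "accept everything".
function matchesKeywords(item: FeedItem, keywords: string[]): boolean {
  if (keywords.length === 0) return true;
  const haystack = `${item.title ?? ""} ${item.contentSnippet ?? ""}`.toLowerCase();
  return keywords.some((keyword) => haystack.includes(keyword.toLowerCase()));
}

// Collects items that appear before the last processed GUID and pass the keyword filter.
function selectNewItems(items: FeedItem[], lastGuid: string | null, keywords: string[]): FeedItem[] {
  const fresh: FeedItem[] = [];
  for (const item of items) {
    const id = item.guid ?? item.link; // some feeds omit guid, so fall back to the link
    if (id !== undefined && id === lastGuid) break;
    if (matchesKeywords(item, keywords)) fresh.push(item);
  }
  return fresh;
}
```

Substring matching is deliberately loose: the keyword "release" also matches "released" and "releases", which suits announcement detection but may let through unrelated posts.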
3.2.3 Changelog Scraper
- Polling Frequency: Daily
- Change Detection: Hash-based comparison of content
- Data Extracted:
- Version numbers
- Release dates
- Change descriptions
- Categorized changes (features, fixes, breaking)
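Hash-based change detection can be sketched as follows. The choice of SHA-256, the whitespace normalization, and the function names are assumptions made for illustration; the specification only requires that content hashes be compared between runs.

```typescript
import { createHash } from "node:crypto";

// Collapse whitespace so purely cosmetic markup changes do not trigger a detection.
function normalizeContent(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// SHA-256 hex digest of the normalized changelog text.
function contentHash(text: string): string {
  return createHash("sha256").update(normalizeContent(text), "utf8").digest("hex");
}

// Compares the freshly scraped text against the hash stored in the state file.
function detectChange(currentText: string, previousHash: string | null): { changed: boolean; hash: string } {
  const hash = contentHash(currentText);
  return { changed: hash !== previousHash, hash };
}
```

A hash only signals that something changed; extracting the new entries still requires diffing the scraped items against the previous run or parsing the page with the configured selector.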
3.2.4 OpenGraph Monitor
- Technology: Existing OpenGraph.io integration
- Use Case: Detect blog post changes, new announcement pages
- Polling Frequency: Weekly
- Data Extracted:
- OpenGraph (og:*) meta tags
- Last-modified headers
- Content hash for change detection
3.3 State Management
Track what has been processed to avoid duplicates:
json
// .state/release-watcher-state.json
{
  "tools": {
    "trae-ai": {
      "last_checked": "2025-11-15T10:30:00Z",
      "sources": {
        "github": {
          "last_release_id": "v1.2.0",
          "last_check": "2025-11-15T10:30:00Z"
        },
        "rss": {
          "last_item_guid": "https://trae.ai/blog/solo-launch",
          "last_check": "2025-11-15T10:30:00Z"
        },
        "changelog": {
          "content_hash": "a7f3b2c9...",
          "last_check": "2025-11-15T10:30:00Z"
        }
      },
      "announcements_created": [
        "lost-in-public/keeping-up/Trae Solo is released.md"
      ]
    }
  },
  "metadata": {
    "schema_version": "1.0.0",
    "last_global_update": "2025-11-15T10:30:00Z"
  }
}
3.4 Content Generation
3.4.1 Generated Markdown Format
markdown
---
date_created: 2025-11-15
date_modified: 2025-11-15
announcement_url: https://www.trae.ai/blog/product_solo_1112
tool_ref: "[[tooling/AI-Toolkit/Generative AI/Code Generators/Trae AI]]"
announcement_type: release # release | feature | milestone | partnership
version: 1.2.0 # if applicable
source: github_releases # which service detected this
auto_generated: true
reviewed: false # human review flag
tags: [AI-Toolkit, Product-Release, Code-Generators]
---
# Trae Launches SOLO Mode
Trae has announced the general availability of SOLO mode, a new way to interact with AI during development that rethinks how context works in AI-assisted coding.
## Key Features
- Context-aware code suggestions
- Improved multi-file understanding
- Enhanced debugging capabilities
## What's New
[LLM-generated summary from release notes/changelog]
## Resources
- [Official Announcement](https://www.trae.ai/blog/product_solo_1112)
- [Documentation](https://docs.trae.ai/solo)

---
*This announcement was automatically detected and generated. Last updated: 2025-11-15*
3.4.2 Filename Convention
text
{YYYY-MM-DD}_{Tool-Name}_{Announcement-Type}.md
Examples:
2025-11-15_Trae-AI_SOLO-Release.md
2025-10-21_Cursor_New-Features.md
2025-09-15_Claude_Sonnet-4-Launch.md
3.5 LLM Enrichment Service
Purpose: Generate human-readable summaries from raw release notes
Input: Raw announcement data (GitHub release body, RSS content, changelog text)
Output: Structured announcement content
Prompt Template:
text
Analyze this product announcement and generate a concise summary:
Product: {tool_name}
Source: {source_url}
Raw Content:
{raw_content}
Generate:
1. A compelling headline (max 100 chars)
2. A 2-3 sentence summary of the announcement
3. A bulleted list of 3-5 key features or changes
4. Categorize as: release | feature | milestone | partnership
5. Extract version number if present
Format as markdown.
LLM Selection: Claude Haiku for cost/speed, Sonnet for complex releases
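A minimal sketch of filling the prompt template and choosing between the two model tiers. The character budget (roughly four characters per token against the 10k input-token limit in Section 8.3), the length-based complexity heuristic, and all names below are assumptions for illustration; no actual API call is shown.

```typescript
const PROMPT_TEMPLATE = `Analyze this product announcement and generate a concise summary:
Product: {tool_name}
Source: {source_url}
Raw Content:
{raw_content}
Generate:
1. A compelling headline (max 100 chars)
2. A 2-3 sentence summary of the announcement
3. A bulleted list of 3-5 key features or changes
4. Categorize as: release | feature | milestone | partnership
5. Extract version number if present
Format as markdown.`;

interface PromptInput {
  tool_name: string;
  source_url: string;
  raw_content: string;
}

// Assumed budget: about 4 characters per token, 10k input tokens.
const MAX_RAW_CHARS = 40000;

// Substitutes the three placeholders in a single pass, truncating oversized raw content.
function buildPrompt(input: PromptInput, maxRawChars: number = MAX_RAW_CHARS): string {
  const values = new Map<string, string>([
    ["tool_name", input.tool_name],
    ["source_url", input.source_url],
    ["raw_content", input.raw_content.slice(0, maxRawChars)],
  ]);
  return PROMPT_TEMPLATE.replace(/\{(\w+)\}/g, (match, key: string) => values.get(key) ?? match);
}

type ModelTier = "haiku" | "sonnet";

// Hypothetical heuristic: treat long release notes as "complex".
function chooseModelTier(rawContent: string, complexThresholdChars: number = 8000): ModelTier {
  return rawContent.length > complexThresholdChars ? "sonnet" : "haiku";
}
```

Because the substitution is a single pass over the template, placeholder-like text inside the raw release notes is inserted verbatim and never expanded.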
3.6 Deduplication Strategy
Problem: Same announcement appears from multiple sources (GitHub release + blog post + RSS feed)
Solution: Multi-factor matching
- URL Matching: Same announcement_url → exact duplicate
- Version Matching: Same version number + tool → likely duplicate
- Date Proximity: Published within 48 hours + similar title → potential duplicate
- Content Similarity: Embedding-based similarity > 0.85 → probable duplicate
Action on Duplicate:
- Keep the richest source (GitHub > Blog > RSS)
- Merge unique information from all sources
- Update frontmatter with all source URLs
- Mark duplicates in state file
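The first three matching factors and the source ranking can be sketched as below. Embedding-based similarity is not shown, since it depends on an external model; a word-overlap (Jaccard) score stands in for the "similar title" check, and its 0.5 threshold is a placeholder rather than a tuned value. All type and function names are hypothetical.

```typescript
type DedupSource = "github_releases" | "blog" | "rss";

interface Candidate {
  tool: string;
  url: string;
  version?: string;
  title: string;
  publishedAt: string; // ISO 8601
  source: DedupSource;
}

type Verdict = "exact" | "likely" | "potential" | "distinct";

// Jaccard overlap of lowercase word tokens (a stand-in for embedding similarity).
function titleSimilarity(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const setA = tokens(a);
  const setB = tokens(b);
  const shared = [...setA].filter((t) => setB.has(t)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : shared / union;
}

// Applies the factors in order of decreasing certainty.
function classifyDuplicate(a: Candidate, b: Candidate): Verdict {
  if (a.tool !== b.tool) return "distinct";
  if (a.url === b.url) return "exact";
  if (a.version !== undefined && a.version === b.version) return "likely";
  const hoursApart = Math.abs(Date.parse(a.publishedAt) - Date.parse(b.publishedAt)) / 3600000;
  if (hoursApart <= 48 && titleSimilarity(a.title, b.title) >= 0.5) return "potential";
  return "distinct";
}

// Richest source wins: GitHub > Blog > RSS.
const SOURCE_RANK: Record<DedupSource, number> = { github_releases: 0, blog: 1, rss: 2 };

function pickPrimary(a: Candidate, b: Candidate): Candidate {
  return SOURCE_RANK[a.source] <= SOURCE_RANK[b.source] ? a : b;
}
```

Returning a graded verdict instead of a boolean lets the caller merge "exact" and "likely" matches automatically while routing "potential" matches to human review.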
4. Implementation Phases
Phase 1: GitHub Releases (Weeks 1-2)
Goal: Prove the pipeline with the most reliable data source
Deliverables:
- GitHub Release watcher service
- Basic state management
- Markdown file generator
- Watch configurations for 50 high-priority AI tools
Success Metrics:
- Detects 100% of new GitHub releases within 6 hours
- Zero duplicate announcements created
- Generated files pass Filesystem Observer validation
- State file correctly tracks processed releases
Phase 2: RSS/Blog Monitoring (Weeks 3-4)
Goal: Catch announcements not published to GitHub
Deliverables:
- Link Filler
- RSS feed watcher
- Jina Reader integration for blog posts
- Deduplication logic (GitHub vs RSS)
- LLM enrichment service (basic summaries)
- Expand to 200 tool configurations
Success Metrics:
- Detects announcements 24-48 hours before manual discovery
- Deduplication catches 95%+ of cross-source duplicates
- LLM summaries are coherent and accurate
Phase 3: Changelog Scraping (Weeks 5-6)
Goal: Handle tools without RSS/GitHub
Deliverables:
- Playwright-based changelog scraper
- Change detection via content hashing
- Selector configuration per tool
- Fallback to Jina Reader for difficult sites
- Full coverage of 400 AI-Toolkit tools
Success Metrics:
- Successfully monitors 80%+ of configured changelogs
- Change detection has <5% false positives
- Scraper handles common anti-bot measures
Phase 4: Polish & Scale (Weeks 7-8)
Goal: Production-ready system for all 1,400+ tools
Deliverables:
- Advanced LLM enrichment (key features extraction)
- Automatic image/video embedding
- Error handling and retry logic
- Monitoring dashboard (optional)
- Watch configurations for all 1,400+ tools
- Documentation and runbooks
Success Metrics:
- System runs unattended for 2+ weeks without issues
- Content team reports 90%+ reduction in manual monitoring
- Generated files require minimal human review
5. Integration with Existing Systems
5.1 Filesystem Observer Integration
The generated announcement files will trigger the existing Filesystem Observer, which will:
- Validate frontmatter completeness
- Apply the keeping-up template
- Ensure consistent metadata
- Link to related tools and content
Configuration Update Required:
typescript
// tidyverse/observers/userOptionsConfig.ts
export const keepingUpConfig = {
  enabled: true,
  template: 'templates/keeping-up.ts',
  requiredFields: [
    'date_created',
    'date_modified',
    'announcement_url',
    'tool_ref',
    'announcement_type',
    'auto_generated'
  ],
  optionalFields: [
    'version',
    'source',
    'reviewed',
    'tags'
  ],
  services: [] // No external services needed
};
5.2 Tooling Directory Bidirectional Links
When an announcement is created, update the corresponding tool file:
markdown
# tooling/AI-Toolkit/.../Trae AI.md
## Recent Announcements
- [[lost-in-public/keeping-up/2025-11-15_Trae-AI_SOLO-Release|Trae Launches SOLO Mode]] (2025-11-15)
- [[lost-in-public/keeping-up/2025-10-01_Trae-AI_New-Features|New Context Features]] (2025-10-01)
Implementation: Observer can append to a designated section, or maintain a separate index file.
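A minimal sketch of the "append to a designated section" approach, together with the filename convention from Section 3.4.2 that the link targets rely on. The function names are hypothetical, and the third filename segment is treated as a short descriptive label because that is what the examples in Section 3.4.2 show.

```typescript
// Replace runs of non-alphanumeric characters with single hyphens.
function slugSegment(text: string): string {
  return text.trim().replace(/[^A-Za-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// {YYYY-MM-DD}_{Tool-Name}_{Label}.md
function buildAnnouncementFilename(date: string, toolName: string, label: string): string {
  return `${date}_${slugSegment(toolName)}_${slugSegment(label)}.md`;
}

// Wikilink list item pointing at the generated file (without the .md extension).
function buildWikilinkLine(filename: string, headline: string, date: string): string {
  const target = `lost-in-public/keeping-up/${filename.replace(/\.md$/, "")}`;
  return `- [[${target}|${headline}]] (${date})`;
}

// Inserts the link directly under the "## Recent Announcements" heading (newest first),
// creating the section if the tool file does not have one yet.
function addRecentAnnouncement(toolMarkdown: string, linkLine: string): string {
  const heading = "## Recent Announcements";
  if (toolMarkdown.includes(linkLine)) return toolMarkdown; // already linked, keep the call idempotent
  const lines = toolMarkdown.split("\n");
  const headingIndex = lines.findIndex((line) => line.trim() === heading);
  if (headingIndex === -1) {
    return `${toolMarkdown.replace(/\n*$/, "")}\n\n${heading}\n${linkLine}\n`;
  }
  lines.splice(headingIndex + 1, 0, linkLine);
  return lines.join("\n");
}
```

Making the update idempotent matters here because the Filesystem Observer may reprocess the same announcement file more than once.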
5.3 State File Location
Store state files in:
text
content/.state/release-watcher/
- global-state.json
- github-sources.json
- rss-sources.json
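- changelog-sources.json
Add to .gitignore:
text
.state/release-watcher/*.json
Optional: Commit a .state/release-watcher/schema.json for documentation. Because the pattern above also matches schema.json, committing it requires a negation rule such as !.state/release-watcher/schema.json.
6. Configuration Management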
6.1 Watch Configuration Discovery
Option A: Sidecar Files (Recommended)
- Place {tool-name}.watch.yml next to {tool-name}.md
- Easy to discover via filesystem scan
- Clear 1:1 relationship
Option B: Centralized Registry
- Single watch-registry.yml file
- Easier to manage globally
- Harder to keep in sync with 1,400+ tools
Recommendation: Start with Option A for Phase 1-2, consider Option B if management becomes unwieldy.
6.2 Auto-Configuration from Existing Metadata
Many tool files already have relevant metadata:
yaml
# Extract from existing frontmatter
url: https://www.trae.ai/
parent_org: "[[organizations/ByteDance|ByteDance]]"
Auto-generation Logic:
- Scan tooling/ for all .md files
- Extract url from frontmatter
- Check if URL is GitHub repo → create github_releases source
- Check for common RSS patterns (append /rss.xml, /feed, /blog/rss)
- Generate basic .watch.yml configuration
- Human reviews and enables watch_enabled: true
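The auto-generation steps above can be sketched as follows. Probing the candidate feed URLs over HTTP and serializing the result to YAML are left out; the function names, the default priority, and the idea of passing in already-verified feed URLs are assumptions made for illustration.

```typescript
type SourceDraft =
  | { type: "github_releases"; repo: string; include_prereleases: boolean }
  | { type: "rss"; url: string };

interface WatchConfigDraft {
  watch_enabled: boolean;
  tool_ref: string;
  priority: "high" | "medium" | "low";
  sources: SourceDraft[];
}

// Returns "owner/repo" when the URL points at a GitHub repository, otherwise null.
function githubRepoFromUrl(url: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return null; // not a valid absolute URL
  }
  if (parsed.hostname !== "github.com" && parsed.hostname !== "www.github.com") return null;
  const [owner, repo] = parsed.pathname.split("/").filter(Boolean);
  return owner && repo ? `${owner}/${repo.replace(/\.git$/, "")}` : null;
}

// Candidate feed URLs built from the common patterns; each one still has to be probed.
function rssCandidates(url: string): string[] {
  const origin = new URL(url).origin;
  return ["/rss.xml", "/feed", "/blog/rss"].map((path) => origin + path);
}

// Builds a draft that stays disabled until a human reviews it.
function draftWatchConfig(toolRef: string, url: string, verifiedFeeds: string[] = []): WatchConfigDraft {
  const sources: SourceDraft[] = [];
  const repo = githubRepoFromUrl(url);
  if (repo !== null) sources.push({ type: "github_releases", repo, include_prereleases: false });
  for (const feed of verifiedFeeds) sources.push({ type: "rss", url: feed });
  return { watch_enabled: false, tool_ref: toolRef, priority: "medium", sources };
}
```

Emitting drafts with watch_enabled set to false keeps the human review step from the list above as the only way a generated configuration goes live.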
7. Error Handling & Resilience
7.1 Failure Modes
- API Rate Limiting: GitHub, RSS feeds
- Solution: Exponential backoff, distributed polling, authenticated requests
- Website Structure Changes: Changelog selectors break
- Solution: Selector validation, fallback to Jina Reader, alert on failures
- Network Timeouts: Services unreachable
- Solution: Retry with timeout, skip and log, alert after 3 consecutive failures
- Malformed Data: Invalid RSS, unexpected JSON
- Solution: Schema validation, graceful degradation, detailed error logging
- Filesystem Observer Conflicts: Concurrent writes
- Solution: File locking, atomic writes, queue-based processing
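A minimal sketch of the retry-with-exponential-backoff approach named in the solutions above. The base delay, the delay cap, and the function names are illustrative assumptions; the default of three attempts mirrors the "3 consecutive failures" alert rule.

```typescript
// Delay doubles with each attempt (0-based) and is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the task until it succeeds or maxAttempts is reached, then rethrows the last error.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Passing the sleep function as a parameter keeps unit tests fast, since they can substitute a no-op for the real timer.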
7.2 Monitoring & Alerting
Metrics to Track:
- Announcements detected per day/week
- Success rate by source type
- Average processing time
- Deduplication hit rate
- LLM API costs and latency
- Error counts by type
Alerting Thresholds:
- Zero announcements detected for 7+ days (possible system failure)
- Error rate > 20% for any source type
- State file corruption
- Filesystem Observer validation failures > 10%
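The alerting thresholds above can be expressed as a small check over a periodic health snapshot. The snapshot shape and the alert wording are hypothetical; only the threshold values come from this section.

```typescript
interface SourceStats {
  sourceType: string;
  requests: number;
  errors: number;
}

interface HealthSnapshot {
  daysSinceLastAnnouncement: number;
  sources: SourceStats[];
  filesGenerated: number;
  validationFailures: number;
  stateFileReadable: boolean;
}

// Returns one message per breached threshold; an empty list means healthy.
function evaluateAlerts(snapshot: HealthSnapshot): string[] {
  const alerts: string[] = [];
  if (snapshot.daysSinceLastAnnouncement >= 7) {
    alerts.push("no announcements detected for 7+ days");
  }
  for (const source of snapshot.sources) {
    if (source.requests > 0 && source.errors / source.requests > 0.2) {
      alerts.push(`error rate above 20% for ${source.sourceType}`);
    }
  }
  if (!snapshot.stateFileReadable) {
    alerts.push("state file corruption");
  }
  if (snapshot.filesGenerated > 0 && snapshot.validationFailures / snapshot.filesGenerated > 0.1) {
    alerts.push("Filesystem Observer validation failures above 10%");
  }
  return alerts;
}
```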
7.3 Logging Strategy
text
logs/release-watcher/
- 2025-11-15-detections.log # Announcements found
- 2025-11-15-errors.log # Errors and warnings
- 2025-11-15-duplicates.log # Deduplication events
- 2025-11-15-enrichment.log # LLM processing
Log Format: Structured JSON for easy parsing
json
{
  "timestamp": "2025-11-15T10:30:00Z",
  "level": "info",
  "service": "github-watcher",
  "tool": "trae-ai",
  "event": "release_detected",
  "data": {
    "version": "1.2.0",
    "url": "https://github.com/traehq/trae/releases/tag/v1.2.0"
  }
}
8. Security & Privacy
8.1 API Key Management
- Store in environment variables, never commit
- Use separate keys for dev/prod
- Rotate keys quarterly
- Rate limit protection
8.2 Data Privacy
- Only collect publicly available announcement data
- No personal information scraped
- Respect robots.txt
- Honor opt-out requests
8.3 Resource Limits
- Max 1,000 HTTP requests per hour per service
- Timeout requests after 30 seconds
- Max file size 5MB for scraped content
- LLM token limits: 10k input, 2k output
9. Performance Requirements
9.1 Processing Speed
- Detect GitHub releases within 6 hours of publication for high-priority tools (within 24 hours for others, per the polling schedule in Section 3.2.1)
- Process RSS feeds within 12 hours
- Generate markdown file within 5 minutes of detection
- LLM enrichment completes within 30 seconds
9.2 Scalability
- Support 1,400+ tool configurations
- Handle 50+ announcements per day
- Process 10,000+ HTTP requests per day
- Store 5+ years of announcement history
9.3 Resource Usage
- Max 512MB RAM during operation
- Max 1GB disk space for state files
- Max $50/month in LLM API costs
- Max $20/month in third-party API costs
10. Future Enhancements (Post-Phase 4)
10.1 Advanced Features
- Breaking change detection and impact analysis
- Automated migration guide generation
- Competitive analysis (compare releases across similar tools)
- Trend detection (feature adoption patterns)
- Social media monitoring (Twitter/X, LinkedIn)
- Community sentiment analysis
- Release prediction (based on historical patterns)
10.2 User Interface
- Web dashboard for monitoring watch configurations
- Manual announcement submission form
- Bulk configuration editor
- Analytics and reporting interface
- Review queue for auto-generated content
10.3 Integrations
- Slack notifications for important releases
- Email digests (weekly summary of announcements)
- Calendar events for major releases
- Integration with project management tools
- API for external consumption
11. Success Criteria
11.1 System Health
- 95%+ uptime over 30 days
- <5% error rate across all sources
- Zero data loss or corruption events
- All generated files pass Filesystem Observer validation
11.2 Content Quality
- 90%+ of announcements require no manual editing
- LLM summaries are accurate and coherent
- Zero duplicate announcements published
- Announcements link correctly to tool files
11.3 Team Impact
- Content team reports 90%+ reduction in monitoring time
- Announcements published 24-48 hours faster than manual process
- Content team adopts system for primary announcement workflow
- Product catalog completeness increases to 95%+
11.4 Cost Efficiency
- Total operating cost < $100/month
- Cost per announcement < $0.50
- System requires <2 hours/week of maintenance
- ROI positive within 3 months
12. Technical Stack Summary
12.1 Core Technologies
- Runtime: Node.js 18+ / TypeScript 5+
- Package Manager: npm or pnpm
- State Storage: JSON files (Phase 1-3), SQLite (Phase 4+)
- Scheduling: node-cron or GitHub Actions
12.2 Key Dependencies
- GitHub API: @octokit/rest
- RSS Parsing: rss-parser
- Web Scraping: playwright or puppeteer
- Content Extraction: Jina Reader API (existing)
- OpenGraph: OpenGraph.io (existing)
- LLM: Anthropic Claude API (existing)
- YAML: js-yaml
- Markdown: remark, gray-matter (existing)
12.3 Development Tools
- Testing: Jest or Vitest
- Linting: ESLint
- Formatting: Prettier
- CI/CD: GitHub Actions
- Logging: winston or pino
13. Repository Structure
text
tidyverse/watchers/release-watcher/
├── src/
│   ├── index.ts                  # Main orchestrator
│   ├── config/
│   │   ├── watchConfigLoader.ts  # Load .watch.yml files
│   │   └── schemas.ts            # Zod schemas for validation
│   ├── sources/
│   │   ├── github.ts             # GitHub Release watcher
│   │   ├── rss.ts                # RSS feed watcher
│   │   ├── changelog.ts          # Changelog scraper
│   │   └── opengraph.ts          # OpenGraph monitor
│   ├── processors/
│   │   ├── normalizer.ts         # Unify data formats
│   │   ├── deduplicator.ts       # Cross-source dedup
│   │   └── enricher.ts           # LLM enrichment
│   ├── generators/
│   │   ├── markdown.ts           # Generate .md files
│   │   └── templates.ts          # Content templates
│   ├── state/
│   │   ├── stateManager.ts       # Read/write state files
│   │   └── schemas.ts            # State schemas
│   └── utils/
│       ├── logger.ts             # Structured logging
│       ├── retry.ts              # Retry logic
│       └── hash.ts               # Content hashing
├── tests/
│   ├── sources/
│   ├── processors/
│   └── generators/
├── config/
│   └── default.yml               # Global configuration
├── .env.example                  # API keys template
├── package.json
├── tsconfig.json
└── README.md
14. Documentation Requirements
14.1 User Documentation
- How to create watch configurations
- How to review auto-generated announcements
- How to opt tools in/out of monitoring
- Troubleshooting common issues
14.2 Developer Documentation
- Architecture overview and data flow
- How to add new source types
- How to customize LLM prompts
- How to extend deduplication logic
- Testing strategy and test data
14.3 Operational Documentation
- Deployment procedures
- Monitoring and alerting setup
- Backup and recovery procedures
- Cost optimization strategies
15. Migration Path
15.1 Existing Announcements
- Audit existing keeping-up/ files
- Extract announcement_url and tool references
- Backfill state file to prevent re-detection
- Standardize frontmatter via Filesystem Observer
- Add auto_generated: false to manual announcements
15.2 Tooling Metadata Enhancement
- Scan all 1,400+ tool files
- Extract GitHub repos, RSS feeds from content
- Add to frontmatter if not present
- Generate initial watch configurations
- Prioritize AI-Toolkit (400 tools) for Phase 1
16. Appendix
16.1 Related Specifications
16.2 Reference Implementations
- GitHub Release monitoring: Dependabot, Renovate
- RSS aggregation: Feedly, NewsBlur
- Content scraping: Mercury Parser, Readability
16.3 Example Watch Configurations
See inline examples in Section 3.1 and throughout this document.
16.4 Glossary
- Announcement: Product release, feature launch, or significant milestone
- Watch Configuration: YAML file defining monitoring sources for a tool
- State File: JSON file tracking processed announcements
- Deduplication: Process of identifying and merging duplicate announcements
- Enrichment: Adding LLM-generated summaries and metadata
- Keeping-Up: Content collection for product announcements
This specification is a living document and will be updated as implementation progresses and new requirements emerge.