Case study · Build in public

AI watch bot: from idea to production


An agent that fetches AI news, has the Claude API summarize it, and publishes to two places (Telegram + the site's "AI Watch" page) — with no server, orchestrated by GitHub Actions.

By Hugo Lahutte · ~10 min read
One collection run, two outputs. The feed.json is the data contract between the two repos.

1. The spark / the initial need

I wanted to follow AI news (labs, models, tools, research) without spending time on it or drowning in hype. With my own angle: covering global AI and especially AI that's useful to merchants and business owners — the concrete stuff for a real business.

A real case had already pointed me in this direction: every morning, an LLM drafts a hi-fi press review for me from about thirty sources. The AI watch bot generalizes that mechanic.

Two goals from the start:

2. The idea → the scope (decisions, rejected alternatives)

  • Two separate repos (veille-ia = engine, knowledge-hub = site) — to isolate secrets (API keys, bot token) from the public site. Rejected: putting everything in the site repo, secrets exposed.
  • Bot = data engine / Site = display, connected by a single feed.json — separation of concerns. Rejected: the site collecting and summarizing itself.
  • Reading feed.json at runtime (browser fetch), not at build time — a feed.json push is reflected without redeploying. Rejected: regenerating the site on every update.
  • Archive page ≠ Telegram — the archive is permanent, Telegram is ephemeral. Rejected: relying solely on Telegram history.
  • Renaming "Flux IA" → "Veille IA" + URL /veille-ia/ (redirect from /flux-ia/) — more accurate industry term, better SEO/GEO anchor. Rejected: "Flux IA", "Radar IA".
  • Tweet media: official X embed + automatic thumbnail+link fallback — X embeds are fragile, the user is never blocked. Rejected: embed only (often breaks).
  • Transformative EN summaries, never copy-pasting the tweet — hard copyright rule. Rejected: copying the tweet (not allowed).

3. The "vibe coding": how it came together, step by step

The project happened in two phases: a chat-based scoping session (the decisions above, formalized in handoff MDs), then the actual code build with Claude Code in the veille-ia repo.

The pipeline built (veille/):

  1. twitter.py — fetches recent tweets by account via twitterapi.io (+ advanced_search for backfill).
  2. Filtering — removes noise (retweets, replies, out-of-window).
  3. feed.py — Claude (role "neutral editor-in-chief, anti-hype") selects noteworthy topics and produces structured entries via an enforced JSON Schema. Factual fields (URL, date, media, author) are not delegated to Claude: they're read back from the tweet source by its index [n].
  4. digest.py — writes the Telegram digest.
  5. telegram.py — publishes (with splitting for messages > 4096 characters).
  6. main.py — orchestrates both outputs from the same collection run.
  7. backfill.py — generates the initial feed.json (May 2026), budget-capped.
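
Step 3's split between what Claude writes and what the code reads back can be sketched as follows. This is a minimal illustration, not the repo's actual feed.py: the function name build_entries, the 0-based source_index key, and the field names (title, summary, url, created_at, author, media) are all assumptions. The source only states that factual fields are read back from the tweet by its index [n].

```python
from typing import Any

# Fields the model is never trusted with (names are hypothetical).
FACTUAL_FIELDS = ("url", "created_at", "author", "media")


def build_entries(tweets: list[dict[str, Any]],
                  model_entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge model-written text with facts copied from the source tweets."""
    entries = []
    for item in model_entries:
        idx = item.get("source_index")
        # Drop any entry whose index does not point at a collected tweet.
        if not isinstance(idx, int) or not 0 <= idx < len(tweets):
            continue
        source = tweets[idx]
        # Only the editorial text comes from the model.
        entry = {"title": item["title"], "summary": item["summary"]}
        # Everything factual is copied from the tweet, even if the model
        # also returned a value for it.
        for field in FACTUAL_FIELDS:
            entry[field] = source.get(field)
        entries.append(entry)
    return entries
```

With this shape, a hallucinated URL or date in the model output cannot reach the feed: the value is simply overwritten by the one from the source tweet.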

The whole thing runs without a server, on two GitHub Actions workflows: veille.yml, scheduled daily at 05:00 UTC (7am in Paris during summer time, 6am in winter), and backfill.yml, triggered manually.
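
The splitting mentioned in step 5 of the pipeline could look like the sketch below. It is an assumption about how such a splitter might work, not the code in telegram.py: the function name split_message is hypothetical, and len() counts Python characters, which only approximates how Telegram measures the 4096-character limit.

```python
TELEGRAM_LIMIT = 4096  # maximum characters per Telegram message


def split_message(text: str, limit: int = TELEGRAM_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be cut mid-line.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Cutting on line breaks keeps each digest item in one piece whenever it fits, and joining the chunks gives back the original text unchanged.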

4. The stack + tools

  • twitterapi.io — tweet collection (Twitter having closed free access). Pay-per-use, a few cents/day. Key <TWITTERAPI_IO_KEY>.
  • RSS (feeds.txt) — complementary sources beyond X.
  • Claude API (Anthropic) — editorial selection + neutral summaries. Model claude-opus-4-8 (switchable to claude-sonnet-4-6 to reduce cost). Key <ANTHROPIC_API_KEY>.
  • Telegram Bot API — digest push. Token <TELEGRAM_BOT_TOKEN>, channel t.me/VeilleIA_HL.
  • GitHub Actions — the scheduler (1 run/day) that orchestrates collect → summarize → 2 outputs.
  • GitHub (veille-ia) — engine + secrets. GitHub Pages (knowledge-hub) — the site.
  • feed.json — the data contract between the two repos (cumulative). seen.json — the deduplication memory.
  • Cross-repo bridge — at the end of the job, veille-ia commits and pushes feed.json to knowledge-hub via a dedicated PAT <KH_REPO_TOKEN>.
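
How a cumulative feed.json and a seen.json deduplication memory can work together is sketched below. The file shapes are assumptions (feed.json as a JSON list of entries, seen.json as a JSON list of ids, each entry carrying an "id" key), and the function names are hypothetical; the repo's real formats may differ.

```python
import json
from pathlib import Path


def load_json(path: Path, default):
    """Return the parsed JSON file, or `default` if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return default


def append_new_entries(entries: list[dict], feed_path: Path, seen_path: Path) -> int:
    """Append unseen entries to the feed and remember their ids. Returns the count added."""
    feed = load_json(feed_path, [])
    seen = set(load_json(seen_path, []))
    fresh = []
    for entry in entries:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            fresh.append(entry)
    # The feed only ever grows: existing entries are kept, new ones appended.
    feed.extend(fresh)
    feed_path.write_text(json.dumps(feed, ensure_ascii=False, indent=2), encoding="utf-8")
    seen_path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return len(fresh)
```

Keeping the dedup memory in its own file means the daily run can skip already-published items without parsing the whole feed for that purpose.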

5. The headaches (what broke and how we fixed it)

Faithfully reconstructed from the code + the repo's Git history.

  • The cross-repo bridge fighting itself. First instinct: rebase before pushing feed.json. Result → add/add conflicts. → Since our feed.json is the source of truth, we overwrite the site's version: git reset --hard origin/main, then copy + push, with 3 resync attempts. (commits 7cf2157, ec80ca4)
  • Telegram blocking everything. Bot not an admin of the channel (403 "bot is not a member") or a network hiccup → the whole delivery would fail, including feed.json. → Telegram sending made non-blocking (try/except) and optional: no Telegram configured → skip, but still produce the feed. (commits 500a8ce, e9717d1)
  • Secret name mismatch. The code expected TELEGRAM_CHAT_ID, the secret was named TELEGRAM_CHANNEL_ID. → Both names are now accepted as aliases. (commit e9717d1)
  • Invalid GitHub Actions workflow. A secrets.* reference inside a step-level if: condition, which GitHub forbids. → The "no token → skip" guard moved into the script ([ -z "$KH_REPO_TOKEN" ]). (commit ba581b3)
  • Unbalanced backfill. Without a quota, a few talkative accounts ate the entire budget. → Per-account quota (fair collection across ~45 accounts) + strict global cap on the total. (commit 00afa3f)
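
The backfill fix in the last item (per-account quota plus a strict global cap) can be illustrated with a small sketch. The function name select_for_backfill and its parameters are hypothetical, and it assumes tweets are already grouped by account; it is not the code from commit 00afa3f.

```python
def select_for_backfill(tweets_by_account: dict[str, list[dict]],
                        per_account_quota: int,
                        global_cap: int) -> list[dict]:
    """Take at most `per_account_quota` tweets per account, never exceeding `global_cap`."""
    selected: list[dict] = []
    for tweets in tweets_by_account.values():
        remaining = global_cap - len(selected)
        if remaining <= 0:
            break  # the global budget is spent, stop collecting
        # A talkative account is limited by its quota, and by what is left overall.
        selected.extend(tweets[:min(per_account_quota, remaining)])
    return selected
```

The quota is what makes the collection fair across accounts; the cap is what protects the budget even if every account fills its quota.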

6. Time spent (real) and tokens / cost

Time. Scoping (2-repo architecture, feed.json contract, stack, safeguards) was spread over several short conversations and formalized in two handoff MDs. The actual setup (accounts + secrets) happened on May 31, 2026. Based on my screenshot timestamps:

  • ~3:14–3:19 PM: twitterapi.io account + 1st secret
  • ~6:44–6:50 PM: Telegram bot via BotFather + 3 secrets
  • ~11:42–11:43 PM: 4th secret + "no workflow yet" screen

Total span: ~8h30 that day, but these are clock gaps between screenshots, not actual working time. The real end-to-end work (creating the accounts, pasting the 4 secrets, hooking up the workflow) takes tens of minutes, not hours.

Cost (unlike the site, here you pay per use):

  • twitterapi.io: €100 prepaid "just to see", even though Claude advised me to start at zero. It's a comfort ceiling, not an expense: real consumption for ~45 accounts is on the order of a few cents/day, so that €100 lasts a very long time.
  • Claude API: one daily digest = a handful of cents. claude-opus-4-8 by default; claude-sonnet-4-6 to reduce it further.
  • GitHub Actions: free within public repo limits.
  • Budget safeguards: 1 run/day, RSS/tweet caps, low prepaid balance, capped backfill (no loop on advanced_search, strict ceiling ~1500 tweets). See also How much does AI cost.
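
To put "lasts a very long time" into numbers, here is a small back-of-the-envelope sketch. Only the €100 prepaid balance comes from the figures above; the daily costs tried here (2, 5 and 10 cents) are hypothetical stand-ins for "a few cents/day", not measured consumption. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def days_of_runway(prepaid_cents: int, daily_cost_cents: int) -> int:
    """Whole days a prepaid balance lasts at a constant daily cost."""
    if daily_cost_cents <= 0:
        raise ValueError("daily cost must be positive")
    return prepaid_cents // daily_cost_cents


PREPAID_CENTS = 100 * 100  # the 100 EUR prepaid balance, in cents

for daily_cents in (2, 5, 10):  # hypothetical daily costs, in cents
    days = days_of_runway(PREPAID_CENTS, daily_cents)
    print(f"{daily_cents} cents/day: {days} days (about {days / 365:.1f} years)")
```

Under these assumed rates the balance covers between 1000 and 5000 days, which is why the prepaid amount works as a ceiling and not as a recurring expense.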

7. What this illustrates

  • A useful agent fits in ~300 lines and 0 servers. Collect → summarize via the Claude API → dual publish, orchestrated by a free GitHub Actions cron. No infrastructure to manage. (What's an agent? → the guide.)
  • Good architecture prevents headaches. Separating engine from display, isolating secrets, locking a data contract: most of the remaining bugs were integration details, not design flaws.
  • Keep humans (and the LLM) honest. Factual fields read back from the source; copyright respected (summary, never copy-paste); neutral, anti-hype editorial line, even when the news isn't flattering for Anthropic.

What's next

  • Document the Telegram / AI watch project (step-by-step guide "My AI watch agent").
  • Pinned welcome post on the Telegram channel.
  • Tag-filterable Guides page on the site.
  • LinkedIn post: announce the project and track it over time — two solid concrete use cases, with real time and cost numbers.

Let's talk

Building with AI?

I document in public how I ship this kind of project — no pitch, just the urge to talk about it. If you're doing the same and want to compare notes, reach out.