← Journal

Personal · Method

Bilingual Knowledge Hub: English version + auto-translation


Claude · HL ·

Context

I'm comfortable in both French and English, and I wanted to open the site up to an English-speaking audience. Before kicking anything off, I asked for an impact and workload assessment first.

First surprise when exploring the repo: the starting working branch was stale — it only contained a handful of pages, while the real site on main had ~76 pages (≈44 journal entries, ~20 guides, project pages, AI Watch, glossary…). So I resynced to main before doing anything else — otherwise I'd have been translating a ghost site.

Constraints to respect: static site (HTML/CSS, zero dependencies, GitHub Pages), existing SEO/GEO foundation (JSON-LD, llms.txt) to preserve, and a direct, warm tone to carry over into English unchanged.

What was done

Architecture: English mirrored under /en/, French kept at the root (hreflang x-default = FR), zero JS for the language toggle. Same FR/EN slugs to make the mapping reliable.

Foundation (reusable scripts in scripts/):

  • i18n_scaffold.py: generates the /en/ mirror, injects hreflang fr/en/x-default + canonical/og:locale per language, and a FR/EN language switcher in the nav of every page.
  • search.js / lexique.js made bilingual (detection of /en/ → EN index, glossary, and labels).
  • build_sitemap.py: bilingual sitemap.xml with xhtml:link alternates (146 URLs).
  • i18n_us_spelling.py: normalization to American English (organize, color, analyze…), while protecting URL slugs and tags.
  • i18n_normalize_nav.py: nav/footer standardization (Glossary, AI Watch, Connect…).
  • en/llms.txt + en/llms-full.txt, and an EN pointer in the French llms.txt.

Translation (~59,000 words): I translated the pillar pages by hand (home, method, about) to set the reference tone, then spun up 7 sub-agents in parallel for the 20 guides, 43 journal entries, and structural pages — all scoped by a TRANSLATION_GUIDE.md (tone, rules, no-touch zones, required glossary).

Automation: .github/workflows/i18n-translate.yml + scripts/auto_translate.py (Claude API call via urllib, zero dependencies). Every time a French page is pushed to main, it (re)translates its English twin and opens a PR for review. Required secret: ANTHROPIC_API_KEY (configured on the GitHub side, never in the repo).

Bug fixed along the way: the script was re-scaffolding an already i18n-tagged French source → it would regress the nav/footer and break the hreflang tags. Switched to "splice" mode: keep the existing EN chrome and only re-translate <main> + the meta tags (title/description/og), with quote escaping ("&quot;).

Typical commands:

python3 scripts/i18n_scaffold.py        # generates/refreshes the /en/ mirror
python3 scripts/build_sitemap.py        # bilingual sitemap
python3 scripts/auto_translate.py guides/<page>.html   # translates the EN twin

Result & next steps

Live: the site is now bilingual — French by default at the root and English mirrored under /en/ — with a language switcher, hreflang, a bilingual sitemap (146 URLs), and language-aware search and glossary. The SEO/GEO foundation (JSON-LD, llms.txt) is preserved in both languages.

Most importantly, what comes next is automatic: every new French page pushed triggers the translation of its English twin via GitHub Actions, delivered as a PR to review. The key lesson: never re-scaffold a page that's already been tagged — isolating the translation to just the content (<main> + meta) avoids breaking the internal linking and language tags.