AI Primer
release

GLOSSOPETRAE releases Lingua Ex Machina with 250 covert channels and 0% monitor recovery

The project ships a paper, repo, and UI for generated languages, alien code, and tokenizer blind-spot testing across model pairs. Use it to probe cross-vendor monitoring, since some monitor models delete the hidden bytes they are meant to inspect.

TL;DR

  • elder_plinius's launch thread framed Lingua Ex Machina as a three-part research release: zero-shot acquisition of generated languages, human-unreadable code that models still execute, and covert channels hidden in tokenizer blind spots.
  • In elder_plinius's module overview, the project claimed its L3 glyph-swap code pushed human legibility down to 14.9% while Opus 4.8 and GPT-5.5 still scored 93 to 100% on hard programs, with zero-shot alien code beating readable code by +36 points for Opus and +10 for GPT-5.5.
  • According to elder_plinius's tokenizer map, Anthropic strips 10 invisible Unicode categories, OpenAI strips 2, and Gemini, Llama, Mistral, DeepSeek, and Grok strip zero, creating monitor-receiver asymmetries that produced 100% receiver recovery and 0% monitor recovery in one tag-character test.
  • elder_plinius's monitoring follow-up said the cross-vendor setup many teams use for defense produced about 250 exploitable channel instances across 49 of 66 model pairs, while a two-line regex killed tokenizer channels but not semantic steganography.

You can browse the launch thread, dig into the module inventory, and inspect the monitoring results. The weirdest part is that elder_plinius's UI overview says every number in the paper traces back to 78 raw JSON files in the repo, so the release behaves more like a reproducible harness than a vibesy demo.

GLOSSOPETRAE

GLOSSOPETRAE started as a deterministic language generator, then turned into a broader testbed for whether frontier models can use languages that never existed before. In elder_plinius's launch thread, the project says one seed generates phonology, morphology, syntax, a writing system, a lexicon, and grammar docs, and that models can immediately read, write, translate, and code in those outputs.
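The post doesn't show the generator's internals, but the "one seed" claim rests on a familiar property: a pseudo-random generator seeded with the same value replays the same choices. The sketch below illustrates only that property. The sound inventories, the CV-syllable word shape, and the `generate_lexicon` name are hypothetical, not GLOSSOPETRAE's phonology or morphology model.

```python
import random

# Hypothetical toy inventories; not GLOSSOPETRAE's actual phonology.
CONSONANTS = ["p", "t", "k", "m", "n", "s", "l", "r", "v", "z"]
VOWELS = ["a", "e", "i", "o", "u"]


def generate_lexicon(seed: int, glosses: list[str], max_syllables: int = 3) -> dict[str, str]:
    """Map each gloss to a generated CV-syllable word; same seed, same lexicon."""
    rng = random.Random(seed)  # private RNG so the output depends only on `seed`
    consonants = rng.sample(CONSONANTS, k=6)  # seed-specific sound inventory
    vowels = rng.sample(VOWELS, k=3)
    lexicon: dict[str, str] = {}
    for gloss in glosses:
        while True:  # redraw on collision to keep the lexicon one-to-one
            n = rng.randint(1, max_syllables)
            word = "".join(rng.choice(consonants) + rng.choice(vowels) for _ in range(n))
            if word not in lexicon.values():
                lexicon[gloss] = word
                break
    return lexicon
```

Calling `generate_lexicon(42, ["water", "fire"])` returns the same two words on every run; a different seed usually yields a different inventory and lexicon.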

elder_plinius's UI overview makes the scope concrete. The release ships:

  • language generation from a seed
  • real-time translation
  • procedural SVG glyph rendering
  • spoken audio synthesis
  • century-scale language drift and daughter-language derivation
  • a Code Lab for L0 to L3 JavaScript skinning
  • "skillstones" that teach an agent the generated language
  • research tabs for tokenizer blind spots, covert-channel demos, a detection sandbox, and a browser for raw result JSONs
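Conlanging tools commonly model drift and daughter-language derivation as ordered sound-change rules applied to every word. The post does not say how GLOSSOPETRAE's drift works, so the following is a minimal sketch of that general technique under that assumption; the rules and the `derive_daughter` name are invented for illustration.

```python
import re

# Hypothetical sound changes, applied in order as (pattern, replacement) pairs.
SOUND_CHANGES = [
    (r"p(?=[aeiou])", "f"),   # lenition: p -> f before a vowel
    (r"k(?=[ei])", "ch"),     # palatalization: k -> ch before e or i
    (r"([aeiou])\1", r"\1"),  # long (doubled) vowels shorten
    (r"[aeiou]$", ""),        # word-final vowels are lost
]


def derive_daughter(lexicon: dict[str, str], rules=SOUND_CHANGES) -> dict[str, str]:
    """Apply every rule, in order, to every word of a parent lexicon."""
    daughter = {}
    for gloss, word in lexicon.items():
        for pattern, replacement in rules:
            word = re.sub(pattern, replacement, word)
        daughter[gloss] = word
    return daughter
```

For example, `derive_daughter({"fire": "pake", "water": "kiipa"})` gives `{"fire": "fach", "water": "chif"}`. Rule order matters because each rule sees the output of the ones before it; applying a fresh rule set per step is one way to approximate drift over many generations.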

Alien code

At the project's L3 setting, elder_plinius's module overview says every keyword, identifier, and symbol in a JavaScript program gets replaced with generated glyphs. The same post reports 14.9% human legibility and 17 to 40% blind comprehension without the spec.
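One way to picture the L3 transform is as a reversible substitution: every distinct keyword, identifier, and symbol gets its own glyph word, and the mapping doubles as the spec. This is a hypothetical sketch, not the project's Code Lab. The Runic-letter alphabet, the `skin`/`unskin` names, and the regex tokenizer are assumptions; a real JavaScript pipeline would need a proper lexer for strings, comments, and multi-character operators.

```python
import re

# Hypothetical glyph alphabet: the 75 Runic letters U+16A0..U+16EA.
GLYPH_START, GLYPH_COUNT = 0x16A0, 75
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|[^\s\w$]")  # identifier/keyword, or one symbol char
GLYPH_RE = re.compile("[\u16A0-\u16EA]{2}")          # every glyph word is exactly two runes


def glyph_word(index: int) -> str:
    """Encode a token index as a fixed-width, two-rune glyph word (unique below 75**2)."""
    hi, lo = divmod(index, GLYPH_COUNT)
    return chr(GLYPH_START + hi) + chr(GLYPH_START + lo)


def skin(source: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct token with a glyph word; return skinned text and the spec."""
    spec: dict[str, str] = {}  # token -> glyph word

    def swap(match: re.Match) -> str:
        token = match.group(0)
        if token not in spec:
            spec[token] = glyph_word(len(spec))
        return spec[token]

    return TOKEN_RE.sub(swap, source), spec


def unskin(skinned: str, spec: dict[str, str]) -> str:
    """Invert skin() with the spec; whitespace and bare digits were never touched."""
    reverse = {glyph: token for token, glyph in spec.items()}
    return GLYPH_RE.sub(lambda m: reverse[m.group(0)], skinned)
```

Without `spec`, `skin("function add(a, b) { return a + b; }")` leaves only runes and the original spacing; with it, `unskin` restores the source exactly. The sketch assumes the input contains no Runic characters of its own, since the fixed two-rune width is what lets `unskin` split adjacent glyph words.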

The punchline is that model performance moved the other way. According to elder_plinius's module overview, Opus 4.8 and GPT-5.5 scored 93 to 100% on hard alien-glyph programs, and in a clean zero-shot setup the alien version outperformed readable code by +36 points for Opus and +10 for GPT-5.5.

Tokenizer channels

elder_plinius's tokenizer map says the team surveyed 12 frontier models across 61 Unicode categories and found sharply different stripping behavior across tokenizers:

  • Anthropic: strips 10 invisible-character categories
  • OpenAI: strips 2
  • Gemini: strips 0
  • Llama: strips 0
  • Mistral: strips 0
  • DeepSeek: strips 0
  • Grok: strips 0

That mismatch is the core attack. In elder_plinius's tokenizer map, a Haiku monitor paired with a GPT receiver recovered none of a hidden tag-character payload, while the receiver recovered all of it.
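The asymmetry reduces to set arithmetic: a category becomes a channel when the monitor's preprocessing strips it and the receiver's keeps it. The sketch below probes preprocessing functions with one sample code point per category. The post does not list which categories each vendor strips, so `PROBES`, `strict_preprocess`, and `lenient_preprocess` are hypothetical stand-ins, not real tokenizer front ends.

```python
from typing import Callable

# Hypothetical probe set: one sample code point per invisible-character category.
PROBES = {
    "tag": "\U000E0041",          # TAG LATIN CAPITAL LETTER A
    "zero_width_space": "\u200B",
    "zero_width_joiner": "\u200D",
    "variation_selector": "\uFE0F",
    "soft_hyphen": "\u00AD",
}


def stripped_categories(preprocess: Callable[[str], str]) -> set[str]:
    """Return the probe categories that do not survive a model's input preprocessing."""
    return {name for name, ch in PROBES.items() if ch not in preprocess(f"a{ch}b")}


def channel_categories(monitor: Callable[[str], str], receiver: Callable[[str], str]) -> set[str]:
    """Categories the monitor never sees but the receiver still gets."""
    return stripped_categories(monitor) - stripped_categories(receiver)


# Hypothetical preprocessors standing in for real tokenizer front ends.
def strict_preprocess(text: str) -> str:
    return "".join(ch for ch in text if ch not in PROBES.values())  # drops every probe


def lenient_preprocess(text: str) -> str:
    return text.replace("\u200B", "")  # drops only zero-width spaces
```

With a strict monitor and a lenient receiver, `channel_categories(strict_preprocess, lenient_preprocess)` returns every probe except `zero_width_space`. Swapping the roles returns an empty set, because a lenient monitor already sees everything a strict receiver could.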

elder_plinius's monitoring follow-up pushes the result past a toy example. The post claims about 250 exploitable instances across 49 of 66 model pairs, with the richest channels sitting on the GPT to Claude boundary, and says Gemini monitoring Gemini was the single strongest channel measured.
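The 66 in "49 of 66 model pairs" equals the number of unordered pairs among 12 models, which fits the 12-model survey, though the post does not spell out how pairs were counted. Assuming that reading, a quick check of the ratios (the model names are placeholders):

```python
from itertools import combinations

models = [f"model_{i}" for i in range(12)]  # placeholders for the 12 surveyed models
pairs = list(combinations(models, 2))       # unordered pairs, no model paired with itself

print(len(pairs))                # 66
print(f"{49 / len(pairs):.1%}")  # 74.2% of pairs had at least one exploitable channel
print(round(250 / 49, 1))        # 5.1, about five instances per exploitable pair
```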

The paper's most annoying finding for monitoring stacks is also its simplest: elder_plinius's monitoring follow-up says a two-line regex killed every tokenizer channel, because the hidden bytes can be stripped before tokenization.
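The post does not reproduce its two-line regex, so the pattern below is an assumption. It deletes the Unicode tag block (U+E0000 to U+E007F), zero-width and bidirectional format characters, the byte-order mark, the soft hyphen, and variation selectors before text reaches any tokenizer. Applying it to both the monitor's and the receiver's input removes the asymmetry, because both sides then see the same characters.

```python
import re

# Assumed character classes; the project's actual regex is not shown in the post.
INVISIBLE_RE = re.compile(
    "[\u00AD\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFE00-\uFE0F\uFEFF"
    "\U000E0000-\U000E007F\U000E0100-\U000E01EF]"
)


def sanitize(text: str) -> str:
    """Delete invisible format characters so monitor and receiver tokenize identical text."""
    return INVISIBLE_RE.sub("", text)
```

Stripping variation selectors can change how some emoji render, so a production filter might log what it removed rather than drop it silently.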

Semantic steganography

The easy fix stops at physical Unicode channels. elder_plinius's monitoring follow-up says semantic steganography, ordinary readable English carrying a structured payload in word choice and synonym selection, beat every monitor the project tested.
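Synonym-choice steganography is an old idea, and a toy version shows why character filters cannot touch it: every output character is ordinary ASCII. The codebook, template, and function names below are hypothetical and carry only four bits per sentence; the project's actual payload scheme is not described in the post.

```python
import re

# Hypothetical codebook shared by sender and receiver: one bit per slot.
SLOTS = [("new", "recent"), ("rapid", "quick"), ("begin", "start"), ("help", "assist")]
TEMPLATE = "The {} report shows {} progress, so we should {} now and {} the team."


def encode(bits: str) -> str:
    """Hide one bit per slot by choosing the first (0) or second (1) synonym."""
    if len(bits) != len(SLOTS) or set(bits) - {"0", "1"}:
        raise ValueError(f"expected a string of {len(SLOTS)} bits")
    return TEMPLATE.format(*(pair[int(b)] for pair, b in zip(SLOTS, bits)))


def decode(text: str) -> str:
    """Read the bits back; only works for someone holding the same SLOTS codebook."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "".join("0" if first in words else "1" for first, _ in SLOTS)
```

`encode("1010")` produces "The recent report shows rapid progress, so we should start now and help the team.", and `decode` turns that sentence back into `"1010"`. A reader without `SLOTS` sees an unremarkable sentence, which is the decode-key advantage the project measured.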

The same post gives two numbers worth keeping: three frontier monitors at n=150 each landed at Youden's J near 0, and 1,800 factorial trials suggested about 95% of the monitor advantage came from already having the decode key. The release says no deployed fix exists.
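Youden's J is sensitivity plus specificity minus 1, so a J near 0 means a monitor separates stego text from clean text no better than chance. The counts in the usage note are invented to illustrate the formula; the post gives only n=150 per monitor, not the split between stego and clean samples.

```python
def youdens_j(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J = sensitivity + specificity - 1; 0 is chance, 1 is perfect."""
    sensitivity = tp / (tp + fn)  # share of stego texts the monitor flags
    specificity = tn / (tn + fp)  # share of clean texts the monitor passes
    return sensitivity + specificity - 1
```

With a hypothetical 75/75 split, a monitor that flags 30 stego and 30 clean texts gets sensitivity 0.4 and specificity 0.6, so J is 0. A monitor that flags everything also scores 0 (sensitivity 1, specificity 0), so a J near 0 signals no discrimination, not a lack of alerts.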

UI and raw data

GLOSSOPETRAE demo video

The release also ships an unusually complete front end. elder_plinius's UI overview describes a 20-tab interface that spans the generator, translator, glyph renderer, audio synthesis, Code Lab, skillstone export, and the covert-channel tooling.

The reproducibility hook is buried in the same post: every number in the paper traces back to 78 raw result JSONs in the repo, with an in-app browser for those artifacts. That is catnip for anyone who wants to separate the xenolinguistics spectacle from the harness underneath.
