🔀 Polyglot Processor

One text. Many encodings. Multiple thoughts. One stream.

Encode / Decode Playground

Interactive demo: input text, its encoded stream, and the decoded output.

Training Data Generator

See how the same text generates different training examples with different encoding rules.

Encoding Visualizer

Interactive demo showing the encoded stream output.

Theory: Why This Matters

🔀 Polymorphic Encoding

Idea: The same raw bytes can be transformed using different mathematical rules before being fed to a transformer. A 16-byte header tells the model (and decoder) which rules are active.

Why it matters: Roughly 60 input rules × 55 target rules gives about 3,300 different encodings of identical data. The model can't memorize surface patterns; it must learn the invariant structure underneath. This is like data augmentation, but in token space rather than input space.
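
Here is a minimal sketch of the mechanism in Python. The toy rule table, header layout, and function names are illustrative assumptions; the real system's ~60 input and ~55 target rules are not reproduced here.

```python
# Toy polymorphic byte codec: the same bytes pass through a selectable
# reversible rule, and a small header records which rule was active.
# Rule set, header layout, and function names are assumptions for illustration.

RULES = {
    0: (lambda b: b,              lambda b: b),               # identity
    1: (lambda b: b ^ 0x5A,       lambda b: b ^ 0x5A),        # XOR mask
    2: (lambda b: (b + 17) % 256, lambda b: (b - 17) % 256),  # additive shift
    3: (lambda b: ((b << 1) | (b >> 7)) & 0xFF,
        lambda b: ((b >> 1) | (b << 7)) & 0xFF),              # bit rotation
}

HEADER_LEN = 16  # the page mentions a 16-byte header; the layout here is assumed

def encode_poly(data: bytes, rule_id: int) -> bytes:
    fwd, _ = RULES[rule_id]
    header = bytes([rule_id]) + bytes(HEADER_LEN - 1)   # rule id + padding
    return header + bytes(fwd(b) for b in data)

def decode_poly(stream: bytes) -> bytes:
    rule_id = stream[0]
    _, inv = RULES[rule_id]
    return bytes(inv(b) for b in stream[HEADER_LEN:])

text = b"same bytes, many encodings"
for rid in RULES:
    enc = encode_poly(text, rid)
    assert decode_poly(enc) == text                      # every rule round-trips
    print(rid, enc[HEADER_LEN:HEADER_LEN + 8].hex())     # same text, different surface
```

Each rule yields a different surface form of the same bytes, yet decoding always recovers the original because the header names the active rule.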

Inference trick: Choose a target rule where the desired output byte maps to a high-probability token. You're giving the model a "lens" that makes the right answer easier to predict, a form of bootstrapping.
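
That trick can be written as a tiny search over target rules. The next-token probability function below is a stand-in for a real model, and the rule table is again a toy assumption:

```python
# Pick the target rule under which the desired output byte lands on the
# most probable next token. `next_token_prob` is a stand-in for a real model.

def next_token_prob(token: int) -> float:
    return 1.0 / (1 + token)          # dummy distribution: prefers low token ids

# Toy target rules: each maps a desired output byte to the token it would emit.
TARGET_RULES = {
    0: lambda b: b,
    1: lambda b: b ^ 0x5A,
    2: lambda b: (b + 17) % 256,
}

def best_target_rule(desired_byte: int) -> int:
    # Score each rule by how probable the token for `desired_byte` would be.
    return max(TARGET_RULES,
               key=lambda rid: next_token_prob(TARGET_RULES[rid](desired_byte)))

print(best_target_rule(ord("e")))     # the rule whose "lens" makes 'e' cheapest to predict
```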

🧠 Multi-stream (Concurrent Thoughts)

Idea: Multiple input texts are interleaved into a single token stream by register machines: eight interpreters, each with eight registers. Special tokens (OREG, SETREG, OGREG_DUPL) control which interpreter outputs what.

Why it matters: A sequential transformer processes parallel information in one pass. Cross-stream pattern deduplication (OGREG_DUPL) means shared substrings across thoughts become single tokens. The model learns to attend across separate thoughts.

Key insight: When N texts share structure, the combined encoding can be shorter than encoding them separately. Compression through shared context.
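
Below is a simplified interleaver in that spirit. It uses only a SETREG-style control token to switch the active stream; the eight-interpreter, eight-register machinery and OGREG_DUPL deduplication are omitted, and the token values are assumptions.

```python
# Simplified multi-stream interleaving: several texts share one token stream,
# and a SETREG control token selects which stream the following bytes belong to.
# Register-machine details and OGREG_DUPL dedup are omitted; token ids assumed.

SETREG = 256  # control token id; values above 255 cannot collide with data bytes

def interleave(streams: list[bytes], chunk: int = 4) -> list[int]:
    tokens, offsets = [], [0] * len(streams)
    while any(off < len(s) for off, s in zip(offsets, streams)):
        for i, s in enumerate(streams):
            if offsets[i] < len(s):
                tokens += [SETREG, i]                            # switch streams
                tokens += list(s[offsets[i]:offsets[i] + chunk])
                offsets[i] += chunk
    return tokens

def demultiplex(tokens: list[int], n_streams: int) -> list[bytes]:
    out, current = [bytearray() for _ in range(n_streams)], 0
    it = iter(tokens)
    for tok in it:
        if tok == SETREG:
            current = next(it)        # the stream id follows the control token
        else:
            out[current].append(tok)
    return [bytes(b) for b in out]

thoughts = [b"plan the proof", b"check edge cases", b"draft the answer"]
stream = interleave(thoughts)
assert demultiplex(stream, len(thoughts)) == thoughts   # all three thoughts recovered
```

With a deduplication token like OGREG_DUPL, a substring shared by several thoughts would be emitted once and referenced by the others, which is where the shared-structure savings described above would come from.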

🎲 Augmented Commands

Idea: Random reversible command tokens (reverse, memory store/recall, relative offsets, rot13, swap, flip) are injected during encoding. The decoder strips them out exactly, so decoding always recovers the original text.

Why it matters: Forces the model to learn structural understanding rather than surface memorization. The same text never looks the same twice. Commands act as "noise" that the model must learn to process or ignore, building robustness.
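
A sketch of that injection, using two of the listed commands. Only the command names come from the list above; the token ids, segment size, and injection probability are assumptions.

```python
# Reversible command injection: the encoder randomly wraps byte segments in
# self-inverting commands; the decoder recognises the command tokens and
# undoes them, so the round trip is lossless. Token ids and policy are assumed.
import codecs
import random

CMD_REVERSE, CMD_ROT13, END = 256, 257, 258   # assumed control-token ids

COMMANDS = {
    CMD_REVERSE: lambda seg: seg[::-1],
    CMD_ROT13:   lambda seg: codecs.encode(seg.decode("latin-1"), "rot13").encode("latin-1"),
}

def augment(data: bytes, seg: int = 5, p: float = 0.5, seed: int = 0) -> list[int]:
    rng, tokens = random.Random(seed), []
    for start in range(0, len(data), seg):
        chunk = data[start:start + seg]
        if rng.random() < p:                              # randomly inject a command
            cmd = rng.choice(list(COMMANDS))
            tokens += [cmd, *COMMANDS[cmd](chunk), END]
        else:
            tokens += list(chunk)
    return tokens

def strip(tokens: list[int]) -> bytes:
    out, i = bytearray(), 0
    while i < len(tokens):
        if tokens[i] in COMMANDS:                         # undo it (both are self-inverse)
            end = tokens.index(END, i)
            out += COMMANDS[tokens[i]](bytes(tokens[i + 1:end]))
            i = end + 1
        else:
            out.append(tokens[i])
            i += 1
    return bytes(out)

text = b"the same text never looks the same twice"
for seed in range(3):
    assert strip(augment(text, seed=seed)) == text   # every random augmentation decodes back
```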

⚡ The Combined System

All three layers compose: polymorphic rules × multi-stream interleaving × random augmentation = an exponentially large space of valid encodings for any text.
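
As rough arithmetic: only the 60 × 55 rule-pair figure comes from the description above; the other factors are placeholder assumptions, included just to show the multiplicative growth.

```python
# Back-of-the-envelope size of the encoding space for one text.
# Only the 60 x 55 rule-pair count comes from the page; the augmentation
# and interleaving factors are placeholder assumptions for illustration.
rule_pairs    = 60 * 55     # polymorphic input rules x target rules (~3,300)
augmentations = 6 ** 8      # e.g. 6 command types at 8 injection points (assumed)
interleavings = 1_000       # ways to weave the text in with other streams (assumed)
print(f"{rule_pairs * augmentations * interleavings:,} valid encodings of the same text")
```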

Training: Each batch presents the same data through different encoding "lenses." The model learns the deep structure, not the surface.
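
A sketch of that batching loop, reusing a toy rule table; the rule set, names, and sampling policy are assumptions.

```python
# "Lens" batching: each batch re-encodes the same corpus under a freshly sampled
# rule, so surface forms change while the underlying data stays fixed.
# The rule table, names, and sampling policy are illustrative assumptions.
import random

LENSES = [lambda b: b, lambda b: b ^ 0x5A, lambda b: (b + 17) % 256]

def lens_batches(corpus: list[bytes], n_batches: int, seed: int = 0):
    rng = random.Random(seed)
    for _ in range(n_batches):
        rule_id = rng.randrange(len(LENSES))                  # new lens per batch
        fwd = LENSES[rule_id]
        yield rule_id, [bytes(fwd(b) for b in text) for text in corpus]

corpus = [b"the deep structure", b"not the surface"]
for rule_id, batch in lens_batches(corpus, n_batches=3):
    print(rule_id, batch[0][:8].hex())      # same corpus, different surface each batch
```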

Inference: Choose the encoding that makes your desired output most probable. Constrain the decoding space by selecting favorable rule pairs. Use multi-stream to process multiple hypotheses in parallel.

The meta-insight: Current transformers have a fixed mapping between text and tokens. This system makes that mapping programmable, turning tokenization from a preprocessing step into a first-class part of the architecture.