Chapter 8: The Constraint Space of Compression

Positioning: This chapter opens Part 3, “The AAAK Compression Language.” We temporarily leave the palace’s spatial structure behind and turn to an entirely different problem: how to compress large volumes of textual information into an extremely small token space without losing anything. This chapter does not present AAAK’s specific syntax (that is Chapter 9’s job), but rather derives what a “feasible solution” must look like through constraint satisfaction analysis.

The Nature of the Compression Problem

The previous chapters discussed how the memory palace’s spatial structure improves retrieval precision by 34%. But structure only solves the “where to find” problem, not the “how much can fit” problem.

Let us return to the core number from the preface: six months of daily AI use produces approximately 19.5 million tokens. Even after organizing data through the palace structure, when the AI needs to load baseline context at the start of a session – “who is this person, what project are they working on, who are they working with” – the data volume remains hopelessly large.

The intuitive framing of this problem is simple: can we compress a 1,000-token natural language description into 30-120 tokens while still preserving the factual content a language model would later need?

This seemingly impossible requirement is precisely what AAAK attempts to answer.

But before discussing how AAAK achieves this, it is necessary to rigorously define what “achieves” means. The most common error in engineering is not that the solution is inadequate, but that the problem was not properly defined. A vague goal produces infinitely many seemingly reasonable approaches, each of which ultimately fails along some unforeseen dimension.

Four Non-Negotiable Constraints

MemPalace’s compression requirements can be precisely stated as four constraints. These are not “nice-to-have” preferences but “must all be satisfied” hard requirements. If any one is not met, the entire approach is unusable.

Constraint 1: 30x Compression Ratio

The 19.5 million tokens of context over six months, even after memory layer filtering (discussed in detail in Chapters 14-15), still requires the L1 layer – the critical facts layer that the AI must load at every startup – to accommodate a large amount of information.

The README presents AAAK as capable of roughly 30x compression. That number needs to be unpacked. Two things must be distinguished first: the design goal discussed in the README and this chapter, and the concrete current behavior of the open-source plain-text compressor in dialect.compress(). The former asks what kind of compression language could plausibly satisfy the constraints. The latter is a current implementation with obvious heuristic selection built in.

AAAK is really doing two different kinds of work:

Structured-expression compression – removing stopwords, abbreviating entity names, and using pipe-delimited structure to express the same batch of facts more compactly. In short structured examples, this can deliver roughly 5-10x compression.
Heuristic selection – extracting key sentences, selecting topics, and truncating entities / emotions / flags. This is the current plain-text compress() path: it chooses what seems worth retaining from longer text rather than performing a strict representation-preserving rewrite of every assertion.

For long, redundant conversation logs filled with phrases like “well, I think maybe we should consider…”, the two effects combined can indeed reach the README’s 30x number. But for already compact technical descriptions, the ratio is often closer to 5-10x.

So “30x” is an upper-bound figure for verbose conversation data, not the typical ratio for every text type. More importantly, the selection steps do not happen only in _extract_key_sentence(): topics, entities, emotions, and flags also involve top-k or heuristic pruning. In other words, the current repository’s plain-text AAAK is closer to a high-compression index than to a strict lossless encoder, even though the original text remains preserved in Drawers.

That distinction is crucial for understanding AAAK: the README’s lossless AAAK is a design target; the current open-source plain-text compressor is a lossy heuristic index; the 30x figure comes from structured expression and content selection acting together.

Constraint 2: Factual Assertion Completeness

This is the fundamental dividing line between MemPalace and all “summary-based” memory systems.

When Mem0 or Zep’s system extracts “user prefers Postgres” from your conversation, it discards the entire context of you spending two hours explaining why you migrated from MongoDB. When an LLM is asked to “summarize the key points of this conversation,” it must make a judgment – what counts as a key point and what does not – and that judgment itself is an irreversible act of information discarding.

MemPalace’s position is that the ideal AAAK design target should preserve factual assertions as completely as possible. The README uses “lossless” to describe that target, but it needs both a precise operational definition and an honest separation between that target and the current heuristic implementation.

In the AAAK context, “lossless” works best as a design-constraint definition: for every factual assertion in the original text – who, did what, when, why, with what result – an ideal AAAK encoding should contain a corresponding representation such that a competent language model can reconstruct that assertion correctly.

But this definition has an important boundary: it describes the ideal constraint, not a line-by-line factual description of today’s dialect.compress(). The current plain-text implementation performs selection on key sentences, topics, entities, emotions, and flags. Assertions removed by those filters do not appear in the AAAK output. MemPalace’s safety net is that the original text remains preserved in Drawers, so AAAK functions more like a high-compression index than like the only copy of the memory. What gets lost is not the underlying stored memory, but the coverage of the compressed index.

This definition also intentionally excludes style, rhetoric, and phrasing preference. MemPalace stores memories, not literary works.

Constraint 3: Readable by Any Text Model

The implications of this constraint are more stringent than they appear on the surface.

“Any text model” means Claude, GPT-4, Gemini, Llama, Mistral – including any future large language model capable of processing natural language text. The compression format cannot depend on any specific model’s training data, special tokens, or fine-tuned behaviors.

This constraint directly eliminates all vector embedding-based compression approaches. A vector generated by OpenAI’s text-embedding-ada-002 is a meaningless sequence of floating-point numbers to Llama. Even different versions of models from the same company may have incompatible embedding spaces.

It equally eliminates all encoding schemes based on specific tokenizer behavior. Different models tokenize the same text differently – BPE, SentencePiece, WordPiece each have their variations – and any scheme that uses tokenization boundaries to encode information will break when switching models.

The deeper implication of this constraint is: the compression format must work at the “text” level, not at the “model internal representation” level. Regardless of which model reads the compressed text, it should be able to extract the information purely through language comprehension – not through some special decoding capability.

Constraint 4: No Decoder or Special Tools Required

This is the final constraint, and the one most easily underestimated.

In traditional data compression, compression and decompression are paired operations. gzip needs gunzip, zstd needs zstd -d, and every compression format comes with a decoder. In traditional computing this is not a problem, because running a decoding program has negligible cost.

But in LLM memory systems, this assumption no longer holds. When the AI loads context at the start of a session, what it sees is text. There is no intermediate layer that can run a decoding program to convert a compressed format back to natural language before passing it to the model. The compressed text must be directly understood as model input – without any preprocessing, decoding, or conversion.

More specifically, this constraint eliminates any approach that requires running additional code outside the model’s inference. The model’s context window is the only “runtime environment.” The compression format must be self-explanatory within this environment.

Elimination Analysis: What Approaches Cannot Work

With the four constraints clearly defined, we can systematically eliminate approaches that appear viable but inevitably violate at least one constraint. This elimination analysis is not meant to disparage the eliminated approaches – many of them are highly effective in other contexts – but to narrow the search space of feasible solutions.

Approach A: Binary Encoding

The most intuitive high-compression approach is binary encoding. Encoding text information into some compact binary format can theoretically achieve compression ratios far exceeding 30x. Protocol Buffers, MessagePack, CBOR, and similar formats are widely used in inter-system communication, with compression efficiency far exceeding text.

Violates Constraint 3. No large language model has been trained to understand binary formats. When you place a protobuf-encoded byte stream into GPT-4’s context window, the model sees gibberish. It cannot extract any information from it, much like an English-only reader confronted with Chinese braille.

The root of the problem is that large language models’ training corpora are text, and their world models are built on statistical learning of text patterns. Binary formats lie outside the coverage of these world models.

Approach B: JSON Compression

Since binary is out, what about organizing information as structured JSON? JSON is a text format, all models have seen large amounts of JSON training data, and understanding it is not a problem.

{
  "team": {
    "name": "Driftwood",
    "lead": "Priya",
    "members": [
      {"name": "Kai", "role": "backend", "tenure": "3yr"},
      {"name": "Soren", "role": "frontend"},
      {"name": "Maya", "role": "infrastructure"},
      {"name": "Leo", "role": "junior", "status": "new"}
    ]
  },
  "project": "saas_analytics",
  "sprint": "auth_migration_to_clerk",
  "decision": {
    "by": "Kai",
    "choice": "clerk_over_auth0",
    "reasons": ["pricing", "developer_experience"]
  }
}

Violates Constraint 1. Count the tokens: this JSON is approximately 180 tokens. The information it carries would require approximately 250 tokens in natural language. The compression ratio is less than 1.5x. JSON’s syntactic overhead – curly braces, square brackets, quotation marks, colons, commas, key names – occupies substantial space, all of which are tokens the model must process but that carry no actual information.

You can optimize by shortening key names and removing indentation, but JSON’s structural redundancy is inherent. In "name":, only the value is useful; the key’s meaning can be inferred from context. But JSON format requires you to write out every key.

The deeper problem is that JSON was designed for machine parsing; its redundancy is intentional – for explicitness and fault tolerance. What we need is a format designed for LLM reading. LLMs have contextual inference capabilities and do not need that much explicit annotation.

Approach C: LLM Summarization

Have a language model read the original text and output a concise summary. This is the approach adopted by most commercial AI memory systems.

Summary: Priya leads a team working on SaaS analytics.
Key members include backend developer Kai (3yr) and
junior developer Leo (new). Currently migrating auth to Clerk.

Violates Constraint 2. Summarization is the definition of lossy compression. In the summary above, Soren and Maya have completely disappeared. The specific decision attribution “Kai recommended Clerk over Auth0 based on pricing and DX” is also lost.

You can instruct the model to “not omit any details,” but this merely changes the information loss from explicit to implicit. The model still must judge what constitutes a “detail” – and that judgment itself is a lossy operation. Moreover, the more thorough the summary, the lower the compression ratio, ultimately approaching verbatim reproduction of the original.

Another fatal problem with summaries is irreversibility. When you discover that a summary omitted a critical detail, you cannot recover it from the summary itself – you must return to the original text. This means a summary cannot serve as the sole compressed representation; at best it can serve as an index. But MemPalace already has a better indexing system (the palace structure) and does not need another lossy index to supplement it.

Approach D: Custom Encoding Table

Design a custom encoding table: assign short codes to common concepts and use a lookup table for encoding and decoding. Similar to Morse code, but oriented toward semantics rather than letters.

For example: T1=Priya T2=Kai T3=Soren R1=backend R2=frontend P1=Driftwood, then use T1(lead,P1) T2(R1,3yr) T3(R2) to represent team structure.

Violates Constraint 4. The model reading this text does not know that T1 is Priya or that R1 is backend. It needs an encoding table – a decoder – to understand these codes. And the encoding table itself is text that must be loaded into the context, consuming additional tokens and further reducing the effective compression ratio.

More seriously, the encoding table is bound to a specific dataset. A different set of people, a different project, requires a new encoding table. This turns the system into a stateful component requiring maintenance, violating MemPalace’s design philosophy of “simple enough that it cannot go wrong.”

Of course, if the encoding is sufficiently intuitive – such as using PRI for Priya and KAI for Kai – then even without an explicit encoding table, the model can infer what entity each code corresponds to. But at that point, the “encoding” is no longer an arbitrary symbol mapping but rather an abbreviation system based on natural language intuition. This distinction is important because it points toward the direction of feasible solutions.

Approach E: Vector Embedding Compression

Encode text as vector embeddings, using 384-dimensional or 768-dimensional float arrays to “memorize” semantic information.

Violates both Constraint 3 and Constraint 4. As discussed earlier, embedding vectors are meaningless numbers to other models. Moreover, recovering original information from embeddings requires a decoder (or at least a similarity matching engine), which also violates Constraint 4.

Vector embeddings have their place in MemPalace – ChromaDB uses them to power semantic search – but they are a retrieval tool, not a storage format. This distinction is strictly maintained in the design.

The Shape of Feasible Solutions

After eliminating five approaches, the constraint space narrows dramatically. Let us reverse-engineer the characteristics a feasible solution must possess from the elimination results:

Must be a text format – because Constraint 3 requires readability by any text model, and the common capability intersection of all text models is understanding text.

Must be self-explanatory – because Constraint 4 requires no decoder; the compressed text itself must carry enough context for a model to understand its meaning.

Must preserve factual assertions as completely as possible – because Constraint 2 is about factual completeness; every entity, relationship, attribute, and event should have a corresponding representation after compression, even if today’s heuristic compressor does not always achieve that ideal in full.

Must be extremely compact – because Constraint 1 requires a 30x compression ratio; each token must carry information density far exceeding natural English.

Putting these four characteristics together, a conclusion begins to emerge: the feasible solution must be some form of extremely abbreviated natural language.

Not a newly invented encoding – because that would require a decoder. Not an entirely new syntax – because models have not seen it in their training data. It must be English (or more precisely, an extremely condensed form of any natural language), leveraging the language comprehension capabilities that large language models already possess to “decode” the abbreviations.

This derivation has an important property: it is not reverse-engineered from AAAK’s design but forward-derived from the constraints. Even if MemPalace’s designer had never existed, any engineer facing the same four constraints, conducting the same elimination analysis, would arrive at the same conclusion – the feasible solution must be extremely abbreviated English.

Information Redundancy in Natural Language

Since the feasible solution must be based on natural language abbreviation, a natural follow-up question is: how much redundancy does natural language actually contain that can be removed?

Information theory provides a quantitative answer. Claude Shannon estimated in his 1951 experiment that the information entropy of English is approximately 1.0-1.5 bits per character, while the maximum entropy of the English alphabet is approximately 4.7 bits per character. This means approximately 70% of English text is redundant – it exists to help humans process language (grammatical markers, function words, morphological inflections), not to convey independent information.

But Shannon’s estimate was at the character level. At the word level, sources of redundancy are more diverse:

Grammatical redundancy. In “The team is currently working on the project,” “the,” “is,” “currently,” “on,” and the second “the” are all grammatical function words that carry no information about the team, the work, or the project. Remove them, and “team working project” still conveys the core semantics.

Rhetorical redundancy. In “Kai, who has been with the team for three years and has extensive experience in backend development, recommended Clerk,” the phrases “who has been with the team for” and “and has extensive experience in” are rhetorical embellishments. The core information is “Kai(backend,3yr) rec:Clerk.”

Explanatory redundancy. When you say “Priya manages the Driftwood team,” “manages” implies that Priya is the leader. In a compressed representation, “PRI(lead)” suffices – the additional semantics conveyed by the verb “manage” (supervision, decision authority, reporting relationships) can be inferred by the model from the “lead” role label alone.

Narrative redundancy. Natural language tends toward linear narration – background first, then events, then conclusions. A compressed representation can break this order, directly listing facts in a structured manner, letting the model reconstruct narratives as needed.

These redundancies combined provide ample room for the 30x compression required by Constraint 1. But the key point is: removing redundancy is not the same as losing information. Redundancy is extra packaging around information; removing the packaging does not change the contents, provided the receiver has the ability to reconstruct complete understanding from the bare contents.

Large language models happen to possess this ability. Trained on trillions of words of text, they have deeply internalized English grammar rules, semantic relationships, and world knowledge. When they see “KAI(backend,3yr),” they do not need anyone to tell them this means “Kai is a backend developer with 3 years of experience” – their language model automatically completes this inference.

This is the core insight of the feasible solution: the large language model itself is the decoder. No external decoding program is needed, because the model’s language comprehension capability is the decoding capability. Constraint 4 appears to eliminate all approaches requiring a decoder, but in reality, it only eliminates approaches requiring an external decoder – approaches that leverage the model itself as the decoder perfectly satisfy this constraint.

From Constraints to Design Space

To summarize this chapter’s derivation chain:

Problem definition: Compress large volumes of contextual information into an extremely small token space for instant LLM loading.
Constraint definition: 30x compression, zero loss, model-universal, no decoder required. All four constraints are indispensable.
Elimination analysis: Binary (model-unreadable), JSON (insufficient compression ratio), LLM summarization (lossy), custom encoding (requires decoder), vector embeddings (model-unreadable and requires decoder) – all eliminated.
Forward derivation: The feasible solution must be text-format, self-explanatory, preserve all facts, and be extremely compact. The intersection: extremely abbreviated natural language.
Theoretical foundation: Natural language’s 70% redundancy provides room for high-ratio compression, while the LLM’s language comprehension capability serves as an implicit decoder.

The significance of this derivation chain is that it repositions AAAK from a seemingly arbitrary “invention” to the only feasible design direction under strict constraints. AAAK was not designed because it was “clever,” but because under the given four constraints, the solution space had only one remaining corner.

Of course, “extremely abbreviated natural language” is still a large design space. What to abbreviate, what to preserve, what symbols to use for marking structure – these specific syntactic decisions still have considerable degrees of freedom. The next chapter will dive into AAAK’s specific syntax design, analyzing every choice it makes within this narrowed design space.

But before that, it is worth remembering one thing: AAAK works not because its syntax is particularly ingenious, but because it correctly identified the shape of the constraint space. Good engineering never starts from the solution – it starts from the constraints.

Keyboard shortcuts

MemPalace: First Principles of AI Memory