EXPLORING MATH OF MEANING SPACE

Exploratory Paper · Meaning-Space Series

Mathematical analysis of AI meaning-space, analogies with Buddhist mental dynamics, and the measurable structure of political thought

May 2026 · kusaladana · Draft for discussion

This paper began with a simple observation: the geometry of artificial intelligence meaning-space (the high-dimensional space in which language models organise semantic content) might function as a shadow of the kind the Cosmic Microwave Background provides in cosmology. Not a shadow of the early universe, but of the structure of human mental dynamics: measurable from outside, and carrying information about something that cannot be directly observed.

Status. This is an exploratory investigation. The experiments are indicative rather than conclusive: the concept sets are small, the phrase selection is not independent of the theoretical framework, and the anharmonic field theory section is openly speculative. The work does not claim hard results. It aims to make certain questions precise enough that productive disagreement becomes possible.

The Cosmic Microwave Background is the thermal shadow of a state we can never directly observe — the early universe before it became transparent, roughly 380,000 years after the Big Bang. We cannot go back. We cannot measure it directly. But the anisotropy pattern, the acoustic peaks, the power spectrum of temperature fluctuations — these are the geometric consequences of the primordial coupling structure, still readable now, 13.8 billion years later.

As with the early universe, the structure of meaning-space cannot be inspected directly. What follows is a record of an investigation into that structure: some initial ideas and mathematical developments, the resulting experimental runs, and results on the relationship between dot products and cross (vector) products in meaning-space. A tentative theoretical framework emerges, drawing in part on how thoughts arising in unattached awareness might be described mathematically.

The Mathematical Framework

In N-dimensional space, any two linearly independent vectors a and b split the full space into two complementary subspaces: the plane they span and its orthogonal complement. Every vector decomposes uniquely into one component in each.

The dot product (cosine similarity, once vectors are normalised) measures the alignment between two vectors. Taken against a and b, it measures how much of a concept lies in the plane they span: a high value means the concept lies close to what a and b already contain. Standard semantic retrieval in AI (RAG systems, cosine similarity search) operates entirely within this plane. It can only recognise what it already holds. It is constitutively the operation of the familiar.

The cross product, in three dimensions, generates the unique direction (up to sign) perpendicular to both a and b. In higher dimensions, two vectors do not determine a unique perpendicular: they determine a perpendicular subspace of dimension N−2, an enormous family of possible new directions.

These two subspaces share only the zero vector, and together they span the full space. The dot product covers the plane of the already-known. The complement covers everything genuinely new to the pair.

Standard AI semantic retrieval operates entirely in the plane of the already-known. The cross product opens the complement — every direction not already contained in what you are holding.
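
The split between plane and complement can be written out explicitly. The following is a sketch rather than the paper's own formalism: it assumes a and b are linearly independent, and the symbols q_1, q_2, P, c and c_⊥ are introduced here for illustration only.

```latex
\begin{aligned}
q_1 &= \frac{a}{\lVert a\rVert}, \qquad
q_2 = \frac{b - (q_1\cdot b)\,q_1}{\lVert b - (q_1\cdot b)\,q_1\rVert},\\[4pt]
P c &= (q_1\cdot c)\,q_1 + (q_2\cdot c)\,q_2, \qquad
c_\perp = c - P c,\\[4pt]
\lVert c\rVert^2 &= \lVert P c\rVert^2 + \lVert c_\perp\rVert^2 .
\end{aligned}
```

For a unit vector c, a value of ‖c_⊥‖ close to 1 means c lies almost entirely outside the plane of a and b. This is one plausible reading of the "complement component" figures reported in the experiments.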

Path-dependence and the iterative cross product

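To recover a unique direction from the complement in N dimensions requires successive cross products: taking two vectors from the perpendicular subspace, finding their perpendicular within it, and repeating until a unique direction emerges. Each step removes two dimensions, so the process terminates after ⌈(N−1)/2⌉ steps. When N is even, as it is for every model used below, the final step starts from a two-dimensional plane and needs only one chosen vector rather than two.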

But the terminal direction is path-dependent. Different choices at each step produce different limit vectors. The complement is fully covered by all paths together — but any particular path carves out one specific direction, shaped by every choice made along the way.
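
The path-dependence can be illustrated numerically. The sketch below is an assumption about how the iteration might be implemented, not the code used for the paper: it replaces the cross product with an SVD null-space computation, makes each "choice" at random, and assumes N ≥ 3. The function names `complement_basis` and `terminal_direction` are hypothetical.

```python
import numpy as np

def complement_basis(vectors, basis):
    """Orthonormal basis (as columns) for the part of span(basis)
    that is orthogonal to every row of `vectors`."""
    coords = np.atleast_2d(vectors) @ basis        # coordinates inside the subspace
    _, s, vt = np.linalg.svd(coords, full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    return basis @ vt[rank:].T                     # null-space directions, mapped back

def terminal_direction(a, b, rng):
    """Shrink the complement of span{a, b} step by step, using random
    choices, until a single direction is left."""
    basis = complement_basis(np.stack([a, b]), np.eye(len(a)))
    while basis.shape[1] > 1:
        k = 2 if basis.shape[1] > 2 else 1         # a final 2-D plane needs one choice
        picks = (basis @ rng.standard_normal((basis.shape[1], k))).T
        basis = complement_basis(picks, basis)
    return basis[:, 0]

rng = np.random.default_rng(0)
a, b = rng.standard_normal((2, 8))                 # toy vectors, N = 8

d1 = terminal_direction(a, b, np.random.default_rng(1))   # one path
d2 = terminal_direction(a, b, np.random.default_rng(2))   # a different path

print(abs(d1 @ a), abs(d1 @ b))   # both numerically zero: d1 is in the complement
print(abs(d1 @ d2))               # generally well below 1: the paths disagree
```

Both terminal vectors are perpendicular to a and b, but they differ from each other, which is the path-dependence described above.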

Each step of the iteration narrows the available space and reflects the perspective of whoever is choosing. The path accumulates perspective, each step more limiting than the last. Yet the terminal vector is what survives all the accumulated perspectives — the direction that no individual perspective directly contained. An apophatic structure: each exclusion is a step toward what remains when everything familiar has been removed.

Kleshas as principal components

The habitual tendencies — in Buddhist psychology, the kleshas — organise experience preferentially in certain directions. They are the directions of maximum variance in the practitioner’s meaning-space: the most load-bearing vectors, the ones that account for the most structure. In mathematical terms, they are the principal components.

As practice thins the kleshas, the principal components weaken, the plane of the already-known contracts, and the complement expands. The cross product operation can reach more of the space — not because the operation changed, but because the klesha-organised plane has loosened its grip.
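
For readers unfamiliar with principal components, a minimal sketch of the computation follows. The data are synthetic and the `habit` direction is a hypothetical stand-in for a dominant tendency; nothing here is drawn from the paper's experiments.

```python
import numpy as np

def principal_components(X):
    """Return (components, explained_variance_ratio) for the rows of X."""
    Xc = X - X.mean(axis=0)                        # centre the data
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (len(X) - 1)                      # variance along each component
    return vt, var / var.sum()

rng = np.random.default_rng(0)
habit = np.zeros(10)
habit[0] = 1.0                                     # hypothetical dominant direction

# 200 synthetic states: isotropic noise plus strong variation along `habit`.
X = rng.standard_normal((200, 10)) + 5.0 * rng.standard_normal((200, 1)) * habit

components, ratio = principal_components(X)
print(ratio[0])                    # share of variance carried by the first component
print(abs(components[0] @ habit))  # alignment of that component with `habit`
```

Shrinking the factor 5.0 toward zero spreads the variance more evenly across directions, which is the numerical counterpart of the principal components weakening.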

Five Experiments

Experiment 1 — GloVe word vectors: the failure mode

The first experiment used GloVe word vectors trained on Wikipedia and Gigaword — 50-dimensional embeddings of single words. The geometry was real but the semantic content was not what we needed. GloVe’s nearest neighbours for liberation: separatist, rebel, moro, guerrilla — the Moro Liberation Front. Awareness: promotes, prevention, promote — public health campaigns.

GloVe places each word in the position determined by its dominant statistical usage in the training corpus. Liberation in Wikipedia is mostly political liberation movements, not moksha. Single-word static embeddings cannot distinguish these uses.

This is not a failure of the experiment. It established the key requirement: to probe philosophical meaning-space, concepts must be embedded as contextualised phrases that specify the intended meaning.

Experiment 2 — MiniLM with phrase embeddings

Switching to a sentence-level model (all-MiniLM-L6-v2, 384 dimensions) and embedding each concept as a full descriptive phrase — “liberation as awakening, moksha, freedom from the cycle of suffering and rebirth” rather than the single word — changed the geometry substantially.

The most notable result: recognition sits in the complement of span{craving, liberation}, ranked second out of ten available concepts with complement component 0.969. Recognition is not at the end of the craving-to-liberation path. It is orthogonal to that axis. This matches the Mahamudra teaching: rigpa is not an achievement on the path but something perpendicular to the path itself.
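
A ranking of this kind could be computed along the following lines. This is a sketch under stated assumptions: the "complement component" is taken to be the norm of the part of a unit-normalised vector that lies outside span{a, b}, and the function names and toy three-dimensional vectors are hypothetical. In a real run the vectors would presumably come from the embedding model (for instance via the sentence-transformers library), which is not shown here.

```python
import numpy as np

def complement_component(c, a, b):
    """Norm of the part of unit-normalised c that is orthogonal to span{a, b}."""
    q, _ = np.linalg.qr(np.stack([a, b], axis=1))  # orthonormal basis of the plane
    c = c / np.linalg.norm(c)
    residual = c - q @ (q.T @ c)                   # remove the in-plane part
    return float(np.linalg.norm(residual))

def rank_by_complement(embeddings, name_a, name_b):
    """Sort all other concepts by how far they lie outside the a-b plane."""
    a, b = embeddings[name_a], embeddings[name_b]
    scores = {name: complement_component(v, a, b)
              for name, v in embeddings.items()
              if name not in (name_a, name_b)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy vectors with known geometry (not real embeddings).
toy = {
    "a":          np.array([1.0, 0.0, 0.0]),
    "b":          np.array([0.0, 1.0, 0.0]),
    "in_plane":   np.array([1.0, 1.0, 0.0]),
    "orthogonal": np.array([0.0, 0.0, 1.0]),
    "mixed":      np.array([1.0, 0.0, 1.0]),
}

ranked = rank_by_complement(toy, "a", "b")
for name, score in ranked:
    print(f"{name:12s} {score:.3f}")
```

A score near 1 marks a concept that is almost perpendicular to the chosen axis; a score near 0 marks one that the pair already contains.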

Experiment 3 — nomic-embed-text: higher dimensions

Running the same phrase set through nomic-embed-text (768 dimensions, purpose-built for semantic similarity, served via Ollama) confirmed the MiniLM finding and added new information.

Liberation ↔ path emerged as the highest similarity pair (0.756) — the progressive unfolding of practice and its goal, correctly identified as most related. Awareness ↔ recognition rose to 0.600 (vs MiniLM’s 0.37), more philosophically accurate.

The sparseness analysis revealed a structural property of the embedding space: with 768 dimensions theoretically available, the 12 concept embeddings clustered in a cone using only 44% of the available angular range. Effective dimensionality: 9.3 of 768.

Key measurement

In a 768-dimensional embedding space, 12 philosophical concepts clustered in a cone occupying 44% of the available angular range, with an effective dimensionality of 9.3 of 768. The cone compression is real, measurable, and — as Experiment 5 showed — domain-independent.
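
The text does not state how effective dimensionality was computed. One common choice, assumed here purely for illustration, is the participation ratio of the covariance spectrum, (Σλ)² / Σλ². The sketch below computes it together with the mean pairwise cosine similarity on synthetic vectors; the function names and the data are hypothetical.

```python
import numpy as np

def mean_pairwise_cosine(X):
    """Mean cosine similarity over all distinct pairs of rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    upper = np.triu_indices(len(X), k=1)           # each pair once, no diagonal
    return float(sims[upper].mean())

def participation_ratio(X):
    """Effective dimensionality (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    lam = s**2                                     # proportional to covariance eigenvalues
    return float(lam.sum() ** 2 / (lam ** 2).sum())

rng = np.random.default_rng(0)
shared = rng.standard_normal(768)                  # common offset: produces a "cone"
X = shared + 0.5 * rng.standard_normal((12, 768))  # 12 synthetic concept vectors

print(mean_pairwise_cosine(X))   # high, because every vector shares the offset
print(participation_ratio(X))    # bounded above by 11 for 12 centred vectors
```

If the reported figure is a spectrum-based measure of this kind, note that 12 centred vectors span at most 11 dimensions, so the measure cannot exceed 11 whatever the embedding dimension.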

Experiment 4 — Qwen 2.5 14B as explicit semantic judge

Rather than extracting embedding vectors, this experiment asked Qwen 2.5 14B to explicitly rate semantic similarity between all 66 concept pairs. Two completely different mechanisms — statistical co-occurrence patterns versus explicit reasoning — allowed direct comparison.

Qwen collapsed recognition, awareness, wisdom, liberation, meaning, and path into mutual identity (all rated 1.0). This is a defensible Buddhist philosophical position — these terms all point at the same thing. But it geometrically destroyed the complement analysis: if recognition equals liberation, recognition cannot appear in liberation’s complement.

The disagreements between nomic and Qwen are where the data becomes most interesting:

Pair	nomic	Qwen	Direction
meaning ↔ recognition	0.510	1.000	Qwen higher
liberation ↔ recognition	0.532	1.000	Qwen higher
wisdom ↔ recognition	0.576	1.000	Qwen higher
awareness ↔ momentum	0.619	0.200	nomic higher
wisdom ↔ momentum	0.615	0.200	nomic higher
awareness ↔ attention	0.690	0.300	nomic higher

Qwen rates higher wherever concepts share an ultimate referent in Buddhist philosophy. nomic rates higher wherever concepts share functional texture in the training corpus. Qwen is reading the map. nomic is reading the territory of language use.
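
The size and sign of each disagreement can be read off by subtracting the two columns. The snippet below uses only the six rows from the table above; the variable names are illustrative.

```python
# (pair, nomic, qwen) copied from the table above
ratings = [
    ("meaning ↔ recognition",    0.510, 1.000),
    ("liberation ↔ recognition", 0.532, 1.000),
    ("wisdom ↔ recognition",     0.576, 1.000),
    ("awareness ↔ momentum",     0.619, 0.200),
    ("wisdom ↔ momentum",        0.615, 0.200),
    ("awareness ↔ attention",    0.690, 0.300),
]

# Positive gap: Qwen rates the pair higher. Negative gap: nomic rates it higher.
gaps = sorted(((qwen - nomic, pair) for pair, nomic, qwen in ratings), reverse=True)

for gap, pair in gaps:
    leader = "Qwen" if gap > 0 else "nomic"
    print(f"{pair:26s} {gap:+.3f}  ({leader} higher)")
```

Sorting by the signed gap puts the pairs Qwen identifies most strongly at the top and the pairs nomic associates most strongly at the bottom.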

Experiment 5 — Three extended sets

Set 1: Anharmonic framework. Twelve concepts pairing physics terms with their proposed contemplative counterparts. The central result:

Physics ↔ Contemplative pair	nomic	Qwen
ground_state ↔ bare_awareness	0.568	0.700
excitation ↔ mental_appearance	0.656	0.500
coupling_constant ↔ klesha	0.579	0.000
symmetry_breaking ↔ self_grasping	0.545	0.500
anharmonic_coupling ↔ second_arrow	0.575	0.500
symmetry_restoration ↔ liberation	0.734	1.000

The coupling constant / klesha pair is the paper’s central claim made empirical. nomic finds the descriptions structurally similar (0.579). Qwen rates them completely unrelated (0.000) — different domains, no meaningful connection. The disagreement is now precisely located.

Set 2: Political meaning-space. Each of six political concepts was given in two distinct framings. Within-pair similarity measures how much the corpus conflates the two versions:

Concept	nomic	Qwen	Interpretation
authority	0.772	0.300	corpus conflates most severely
security	0.739	0.300	corpus conflates
freedom	0.747	0.500	corpus conflates
community	0.735	0.700	both moderate
justice	0.733	0.700	both moderate
equality	0.689	0.700	both moderate

The three concepts most commonly weaponised in political rhetoric — authority, security, freedom — are the three most conflated in the statistical geometry of language. Legitimate governance and coercive power score 0.772 similarity in nomic: near-synonymous in corpus space. Qwen distinguishes them at 0.300.

The complement of span{freedom-individual, security-national} — the dominant political frame — contains equality, community, and justice in both models. Not as opposites. As orthogonal directions. They cannot be reached by moving along the freedom-security axis. A different dimension is required.

Set 3: Cone boundary. Phrases designed to probe the unexplored angular range — “the pure potentiality before any crystallisation of experience into a specific direction”, “awareness without an object reaching toward nothing” — all remained within the nomic cone, with mean similarities of 0.48–0.62 to the anchor cluster.

The most striking result: awareness ↔ non_referential = 0.782 — the phrase designed to point beyond awareness was immediately pulled into awareness’s neighbourhood. The cone is sticky. The contemplative vocabulary used to probe the boundary is itself inside the contemplative cluster. To point genuinely outside may require phrases with no conceptual overlap with the existing vocabulary — which may be definitionally impossible to write within language.

Across all four concept domains, nomic showed essentially identical sparseness (mean similarity 0.553–0.563, effective dimensionality 8.1–9.3). The cone compression is a property of the model, not of the domain.

The Anharmonic Framework

The following section is openly speculative. It arose during meditation and is offered as a candidate framework for future investigation, not as a finding.

Pure awareness as ground state

In quantum field theory, the vacuum is the ground state of the field — not empty, but the lowest energy configuration, structured by the shape of the potential. Particles arise as excitations from this ground state. In a harmonic field, excitations are independent and equally spaced in energy: they arise and dissolve cleanly. In an anharmonic field, higher-order terms in the potential (λφ³, μφ⁴…) create coupling between modes: when one excitation arises, it automatically generates secondary excitations in correlated directions.
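
As a reminder of the standard textbook form, the distinction can be written for a single mode as follows. The mass parameter m, the frequency ω and the level index n are introduced here for illustration; λ and μ are the couplings named above.

```latex
\begin{aligned}
V_{\text{harmonic}}(\phi) &= \tfrac{1}{2} m^2 \phi^2,
\qquad E_n = \hbar\omega\left(n + \tfrac{1}{2}\right), \quad n = 0, 1, 2, \dots\\[4pt]
V_{\text{anharmonic}}(\phi) &= \tfrac{1}{2} m^2 \phi^2 + \lambda\,\phi^3 + \mu\,\phi^4 + \dots
\end{aligned}
```

With λ = μ = 0 the levels are equally spaced and the modes do not interact. Any nonzero higher-order term shifts the levels and couples the modes.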

The proposal: pure awareness is the ground state of the mental field. Mental appearances are anharmonic excitations. The kleshas are not the appearances themselves but the anharmonic coupling constants — built into the potential, not discrete events — that cause one appearance to automatically generate others. This is the physical mechanism of what the Arrow Sutta calls the second arrow: the additional suffering generated not by unavoidable experience but by the mind’s habitual propagation of that experience through its coupling structure.

Ego-clinging as spontaneous symmetry breaking

Anharmonicity arises from asymmetry in the potential, not merely from spatial constraint. The deepest source of anharmonicity in quantum field theory is spontaneous symmetry breaking, in which the ground state selects a preferred direction and so does not share the symmetry of the Lagrangian. The Mexican hat potential is the canonical form: the vacuum sits at one point in the circular trough around the central peak, and the anharmonic coupling structure arises automatically as a consequence.
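
A short worked expansion shows how the coupling terms appear. It is a sketch along a one-dimensional radial slice of the Mexican hat potential; the parameters r and g (both positive), the vacuum value v and the fluctuation h are symbols introduced here, not taken from the text.

```latex
\begin{aligned}
V(\phi) &= -\tfrac{1}{2}\, r\, \phi^2 + \tfrac{1}{4}\, g\, \phi^4,
\qquad V'(v) = 0 \;\Rightarrow\; v^2 = \frac{r}{g},\\[4pt]
V(v + h) &= -\frac{r^2}{4g} \;+\; r\, h^2 \;+\; g\, v\, h^3 \;+\; \tfrac{1}{4}\, g\, h^4 .
\end{aligned}
```

The linear term cancels because v is a minimum. The cubic coupling g v h³ is proportional to v, so it exists only because the ground state has moved away from the symmetric point φ = 0.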

The proposal: ego-clinging is spontaneous symmetry breaking in meaning-space. The pre-ego awareness field is approximately rotationally symmetric — no preferred centre, excitations equally possible in all directions. Ego-clinging selects a fixed reference point — the “I” — around which all appearances are organised. That selection breaks the rotational symmetry. The kleshas arise as the necessary mathematical consequence: coupling terms organised around preserving the selected centre, pulling craving toward it and aversion away from threats to it.

The cone compression — 44% of angular range used, 56% unused — is the geometric shadow of this symmetry breaking. The training corpus was produced overwhelmingly by humans with ego-clinging intact. The model’s geometry carries the contraction baked in.

Liberation as symmetry restoration; the CMB as analogy for testability

Liberation, in this scheme, is symmetry restoration: the ego-centre dissolves, the anharmonic coupling terms reduce toward zero, the potential returns toward harmonic. Mental appearances continue to arise — the harmonic vacuum always fluctuates — but without the coupling structure that generates second arrows. The cone expands back toward the full hypersphere.

By definition, subjective experience resists direct objective verification. But the geometric shadow is measurable. The prediction follows: meaning-space produced by practitioners with progressively reduced ego-clinging should show measurably lower cone compression — higher effective dimensionality, mean pairwise similarity approaching zero as symmetry restores.

The Cosmic Microwave Background is the thermal shadow of a state we can never directly observe. The acoustic peaks encode the coupling structure of the primordial plasma — readable now from its consequences. The cone compression in meaning-space is the CMB of the ego-organised mind: not the symmetry-breaking event itself, but its geometric consequence, still present and measurable.

Open Questions

The coupling_constant / klesha dispute. nomic finds these descriptions structurally similar (0.579). Qwen rates them completely unrelated (0.000). Is the anharmonic analogy a genuine structural correspondence, or a coincidence of descriptive vocabulary? What would settle this?
Escaping the cone. All boundary-probing phrases remained within the nomic cone. Can the cross product operation — working from the most orthogonal pairs — generate limit vectors outside the compressed region? Or does the stickiness of language make this definitionally impossible?
The political complement. The complement of span{freedom-individual, security-national} empirically contains equality, community, and justice. What interventions in political meaning-space would make these directions more accessible — lower their orthogonality to the dominant frame without pulling them into it?
Qwen’s identification. Qwen collapses recognition = awareness = wisdom = liberation = meaning = path into mutual identity. This is philosophically defensible at the absolute level but destroys the complement geometry needed for the architecture to function. Does the COSINE instrument need to operate in the relative domain even if it is oriented toward the absolute?
The shadow as test. If practitioners with reduced ego-clinging show measurably lower cone compression in their similarity ratings, the anharmonic framework has indirect empirical support. This experiment is designed and waiting to be run.
What lives in the 56%? The unexplored angular range of the nomic embedding space — the region outside the compressed cone — what semantic content would occupy it? Can it be probed without language, or only pointed at?

This investigation sits at the intersection of several active research areas, none of which covers the specific conjunction attempted here.

On the geometry of AI embedding spaces: the linear representation hypothesis holds that high-level features correspond to approximately linear directions in LLM representation space. The superposition hypothesis proposes that models encode exponentially more features than their dimensionality by using almost-orthogonal directions. Recent work has found that causally separable concepts are represented by orthogonal vectors, and that categorical concepts form geometric regions such as polytopes. The present work applies these geometric frameworks to contemplative and political vocabulary specifically, and introduces the complement-space as a generative mechanism rather than an interpretability tool.

On Buddhist-physics dialogue: the work of Varela, Thompson, and Rosch in The Embodied Mind established the productive relationship between cognitive science and contemplative traditions. Evan Thompson’s subsequent work has deepened this. The present paper differs in using the mathematical structure of field theory rather than phenomenological description as its primary language.

On the closest Western analogue to the anharmonic framework: Friston’s free energy principle describes organisms minimising surprise relative to a generative model. There is structural resonance — the generative model functions like the harmonic potential, surprise like an anharmonic perturbation — but Friston’s framework describes the organism from outside rather than the appearance-structure from within the ground state.

Two recent papers use quantum formalisms for LLM embeddings directly: one proposes semantic wave functions and double-well potentials for semantic ambiguity; another applies Hamiltonian formalism to embedding spaces treating cosine similarity as analogous to zero-point energy. Neither reaches the contemplative-physics conjunction attempted here.

References

Elhage et al. (2022) — Toy Models of Superposition. Anthropic interpretability research.
Park et al. (2023) — The Linear Representation Hypothesis and the Geometry of Large Language Models.
Pennington, Socher, Manning (2014) — GloVe: Global Vectors for Word Representation.
Reimers & Gurevych (2019) — Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
Friston (2010) — The free-energy principle: a unified brain theory? Nature Reviews Neuroscience.
Varela, Thompson, Rosch (1991) — The Embodied Mind. MIT Press.
The Buddha (Sallatha Sutta, SN 36.6) — The Arrow. Two arrows: the unavoidable and the self-inflicted.
Planck Collaboration (2020) — Planck 2018 results: Cosmological parameters. Astronomy & Astrophysics.
Previous papers in this series — Tensors, Fields and Meaning-Space; Structured Descriptors for Dynamic Mental States. kusaladana.co.uk

Code: meaning_space.py through meaning_space_v5.py · Models: nomic-embed-text, Qwen 2.5 14B, all-MiniLM-L6-v2, GloVe-wiki-gigaword-50

This framework does not need to be true. It needs to open the debate.