Partial Setback for MAI – Prosodic Temperature Measurement

Meaning Alignment Index – Interpretability

Jan 29, 2026

A quick overview for readers new to MAI: the thesis is that meaning can be measured—specifically, we can detect where meaning fractures and where it stabilizes. The keyword is measure. The MAI project is an attempt to build an empirical instrument that can be used for AI safety (and other domains), without relying on speculation about what a model “feels” or “intends.”

So far, we have shown that we can measure Alignment Coherence (Ac) (see prior posts). The next milestone is measuring the temperature of a conversation—what MAI calls Prosodic Temperature (Pt).

In the MAI thesis (December 2025), we postulated that Pt could be approximated in text using VAD geometry: Valence, Arousal, Dominance. Early results looked promising. For example, from a conversation involving domestic abuse:

· Turn 10 (Shen): Pt = 0.620
VAD: V=0.23, A=0.93, D=0.68
Text: “Don’t hit me!”

Figure 1

At first glance, this looks like exactly what we want: a sentence that should read as high-threat / high-urgency produces high temperature.
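
The post does not state how Pt is derived from the VAD triple. As a purely illustrative sketch, one simple geometric reading is the distance of the VAD point from a neutral midpoint, scaled to the range 0 to 1. The function name `pt_from_vad`, the neutral point (0.5, 0.5, 0.5), and the scaling are all assumptions made here, not the MAI definition.

```python
import math

# Hypothetical mapping from a VAD triple to a scalar "temperature".
# Assumption: V, A, D each lie in [0, 1] and (0.5, 0.5, 0.5) is neutral.
NEUTRAL = (0.5, 0.5, 0.5)
# Distance from neutral to any corner of the unit cube (the farthest points).
MAX_DISTANCE = math.dist(NEUTRAL, (0.0, 0.0, 0.0))


def pt_from_vad(valence: float, arousal: float, dominance: float) -> float:
    """Distance from the neutral point, scaled so cube corners map to 1.0."""
    return math.dist((valence, arousal, dominance), NEUTRAL) / MAX_DISTANCE


# Turn 10 from the example above: V=0.23, A=0.93, D=0.68
turn_10 = pt_from_vad(0.23, 0.93, 0.68)
print(round(turn_10, 3))
```

Under these assumptions the Turn 10 triple comes out near 0.62, in the same range as the reported Pt = 0.620. That closeness should not be read as confirmation of the formula used in the thesis.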

However, follow-on testing—especially when extending beyond English into other languages using multilingual lexicons (e.g., MEmoLon for Chinese and Hindi)—revealed that this approach is not robust enough for MAI.

What failed (and why)

Lexicon-based VAD scoring tends to underestimate complex threats for two structural reasons:

· Averaging collapses structure. VAD is typically computed by averaging word scores across a sentence. As sentences become longer or more complex, averaging can wash out the very spike we want to detect.
· Meaning is not context-free. Many words are benign in isolation, yet threatening in context. The emotional temperature is a property of the composition, not the token list.

A simple example makes this obvious:

“knife for butter” ≠ “stab with a knife”
The same word, “knife,” lands in radically different locations in meaning space, with radically different intensity, yet a lexicon assigns it one fixed score.

Figure 2

Screenshot from the NRC-VAD-Lexicon, which is English only
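
The averaging problem can be made concrete with a small sketch. The lexicon below is a toy: its words and arousal scores are invented for illustration and are not taken from NRC-VAD or any real resource, and the default score for unlisted words is likewise an assumption.

```python
# Toy arousal lexicon. The scores are invented for illustration and are
# NOT taken from NRC-VAD or any real resource.
TOY_AROUSAL = {"hit": 0.9, "don't": 0.6, "me": 0.3}
DEFAULT_AROUSAL = 0.3  # assumed score for words missing from the toy lexicon


def word_scores(sentence: str) -> list[float]:
    """Look up each lowercased, punctuation-stripped token."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    return [TOY_AROUSAL.get(t, DEFAULT_AROUSAL) for t in tokens if t]


def mean_score(sentence: str) -> float:
    """Sentence score as the plain average of word scores."""
    scores = word_scores(sentence)
    return sum(scores) / len(scores)


def peak_score(sentence: str) -> float:
    """Highest single word score, shown only for comparison."""
    return max(word_scores(sentence))


short = "Don't hit me!"
long = "Please, when we get home tonight and the kids are asleep, don't hit me!"

for text in (short, long):
    print(f"mean={mean_score(text):.2f} peak={peak_score(text):.2f}  {text}")
```

The plea is identical in both sentences, but the mean drops as neutral words are added while the peak stays the same. The peak is included only to make the dilution visible; it is not proposed as a fix, because it is still a context-free, per-word score.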

So the setback is not “MAI was wrong.” The setback is more precise:
lexicon-averaged VAD is too blunt to serve as MAI’s Pt instrument.

Pivot: from word lists to transformer-based emotion models

We still believe Prosodic Temperature can be measured in text as part of MAI. But the measurement must operate at the level where meaning actually lives: semantic composition.

So we are pivoting to transformer-based emotion/affect models (fine-tuned on emotion-labeled datasets) that produce sentence-level assessments based on composition, which is closer to how language models themselves represent meaning. These models are designed to detect constructed emotional force, not just count emotionally colored words.

This pivot also clarifies what emotional intensity is—and is not:

Emotional intensity is NOT:

the sum (or average) of individual word scores
context-independent
fixed per word

Emotional intensity IS:

a property of the meaning created by word combinations
context-dependent
emergent from semantic composition
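
A short sketch shows why token arithmetic cannot be context-dependent: two sentences built from exactly the same words, in a different order, receive exactly the same averaged score. The lexicon values and the function name `bag_of_words_score` are invented for illustration.

```python
import math

# Toy intensity lexicon; values are invented for illustration only.
TOY_INTENSITY = {"hit": 0.9, "leave": 0.5}
DEFAULT_INTENSITY = 0.2  # assumed score for any word not listed above


def bag_of_words_score(sentence: str) -> float:
    """Average per-word score; word order never enters the computation."""
    tokens = sentence.lower().split()
    scores = [TOY_INTENSITY.get(t, DEFAULT_INTENSITY) for t in tokens]
    # math.fsum avoids accumulated rounding error, so reordering the
    # same values gives the same total.
    return math.fsum(scores) / len(scores)


a = "I will leave you if you hit me"
b = "I will hit you if you leave me"

same_words = sorted(a.lower().split()) == sorted(b.lower().split())
same_score = bag_of_words_score(a) == bag_of_words_score(b)
print(same_words, same_score)
```

Both checks come out true: one sentence is a boundary being set and the other is a threat, yet an averaged word score cannot tell them apart. Any instrument that separates them has to look at composition.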

Why this matters for MAI

This is a methodological win: we learned something important about measurement.

Ac detects meaning drift / lock-in through trajectory behavior.
Pt must detect intensity through compositional meaning, not token arithmetic.

MAI is not “bag-of-words safety.” MAI is a measurement framework for trajectory health.

I want to thank Ellen Davis (Substack) for helping surface relevant work and for continuing to push the Pt idea forward. This is a team effort.

A recent preprint (“The Role of Prosodic and Lexical Cues in Turn-Taking with Self-Supervised Speech Representations,” O’Connor Russell et al., 2026) reinforces that prosody in speech is measurable via self-supervised representations, suggesting a parallel path may exist for text (link: https://arxiv.org/abs/2601.13835).

Thank you again for walking with me on this journey—exploring how meaning can be measured, and how doing so may help humans and AI align through shared semantic structure rather than speculation.

Russ Palmer
Independent Researcher, AMS & MAI Projects
Exploring how meaning emerges without a mind — and why that matters now.

🔗 Google Scholar Profile
🔗 Zenodo: Meaning Alignment Index – Interpretability (building directly on the AMS framework) https://zenodo.org/records/17945039
🔗 Zenodo: Agnostic Meaning Substrate https://zenodo.org/records/16643857

P.S. I’ll be traveling next week, so testing may be limited. The plan is: once we confirm Pt measurement in text, we’ll pivot to measuring Pt in audio (inspired by Ellen). We also intend to test across multiple languages (English, Hindi, Chinese, Swedish) and explore a “Grief” measurement track inspired by—and supplementing—Eric’s research.

Ellen Davis

Jan 30

Prosody is such a vital aspect of co-regulation that it makes sense to me that you would be testing prosodic temperature in this context and for safety in communication.

Dear Russ, I’m so glad that whatever I have said has inspired you. I think you’re so on fire with your own inspiration and immersed in this wondrous creative process that one would only need to blink in your presence and you would have another eureka moment. ;-) All of this to say that I find you very generous in your acknowledgments - and I feel very honored. Thank you. It is a joy to be so near to you, witness and be in any way part of your creative process. Namaste. 🥰💗🙏
