mAiSight

Math-heavy fixture

Display equations and inline math interleave with prose. KaTeX or MathJax may render math after widget mount; the walker should treat math containers as no-mark zones (they don't contain natural-language prose).

Per plan/01_renderer.md, math blocks render as <div class="math-display" data-block-id="…"> and inline math is wrapped in <span class="math-inline">. Both should be skipped by the term walker.
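
The skip rule above can be sketched with the Python standard library. This is only an illustration, not the widget's actual walker: the class names `math-display` and `math-inline` come from the text above, while `ProseCollector` and `prose_outside_math` are hypothetical names, and a real walker would presumably traverse the live DOM rather than parse an HTML string.

```python
from html.parser import HTMLParser

# Class names taken from the fixture text; every other name is hypothetical.
SKIP_CLASSES = {"math-display", "math-inline"}
# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProseCollector(HTMLParser):
    """Collects text that lies outside math containers."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a math container
        self.chunks = []     # prose text that is eligible for term marking

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a math container, every nested element deepens the skip.
        if self.skip_depth > 0 or SKIP_CLASSES.intersection(classes):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def prose_outside_math(html):
    """Returns the text chunks a term walker would be allowed to mark."""
    parser = ProseCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Tracking a nesting depth rather than a boolean flag keeps the skip active for elements that KaTeX or MathJax nest inside the container.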

Gradients and Jacobians

The gradient of a scalar function $f: \mathbb{R}^n \to \mathbb{R}$ is the vector of partial derivatives:

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$$
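
As a rough numerical check of the definition, each partial derivative can be approximated with a central difference. The function names and the step size `h` below are illustrative assumptions, not part of the fixture.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximates the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        # (f(x + h e_i) - f(x - h e_i)) / 2h approximates the i-th partial
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def example_f(x):
    # f(x) = x_1^2 + 3 x_2, whose exact gradient is (2 x_1, 3)
    return x[0] ** 2 + 3 * x[1]
```

Calling `numerical_gradient(example_f, [2.0, 1.0])` should return values close to the exact gradient (4, 3).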

The Jacobian generalizes the gradient to vector-valued functions $f: \mathbb{R}^n \to \mathbb{R}^m$:

$$J_f = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}$$
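
The row and column layout of the matrix can be made concrete with a small finite-difference sketch. The helper names and the example map are assumptions chosen for illustration.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Approximates the m x n Jacobian of f at x; entry [i][j] is df_i/dx_j."""
    m = len(f(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        f_plus = f(forward)
        f_minus = f(backward)
        # Perturbing x_j fills in column j of the Jacobian.
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def example_map(x):
    # f(x) = (x_1 x_2, x_1 + x_2, x_2^2), a map from R^2 to R^3
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]
```

At the point (2, 3) the exact Jacobian of `example_map` has rows (3, 2), (1, 1), and (0, 6), so the numerical result should be close to those values.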

The Hessian is the matrix of second-order partials. In the LaTeX source above, the words "gradient", "Jacobian", and "Hessian" do not appear — they're only in the surrounding prose, which is where they should be marked.

Eigenvalues and SVD

An eigenvalue $\lambda$ and eigenvector $v$ satisfy $A v = \lambda v$. The SVD factors any matrix $A$ as:

$$A = U \Sigma V^\top$$

where $\Sigma$ holds the singular values. PCA is the special case where we use the SVD of a centered data matrix to find directions of maximum variance.
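
One way to see the eigenvalue equation in action is power iteration, sketched below under the assumption that the matrix has a single dominant eigenvalue and that the all-ones start vector is not orthogonal to its eigenvector. The second helper uses the fact that the singular values of a matrix are the square roots of the eigenvalues of its Gram matrix (the transpose times the matrix). All function names are hypothetical, and this is not how a production SVD is computed.

```python
import math


def mat_vec(a, v):
    """Multiplies a matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def power_iteration(a, iterations=200):
    """Estimates the dominant eigenvalue and unit eigenvector of a square matrix.

    Assumes a nonzero matrix with one dominant eigenvalue and a start vector
    that is not orthogonal to the dominant eigenvector.
    """
    v = [1.0] * len(a)
    for _ in range(iterations):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(w_i * w_i for w_i in w))
        v = [w_i / norm for w_i in w]
    # Rayleigh quotient v^T A v, valid because v has unit length.
    lam = sum(v_i * w_i for v_i, w_i in zip(v, mat_vec(a, v)))
    return lam, v


def largest_singular_value(a):
    """Square root of the dominant eigenvalue of A^T A."""
    n = len(a[0])
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]
    lam, _ = power_iteration(gram)
    return math.sqrt(lam)
```

For the symmetric matrix with rows (2, 1) and (1, 2), whose eigenvalues are 3 and 1, `power_iteration` should return a value close to 3 with an eigenvector proportional to (1, 1).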

Norms

The L2 norm of $v$ is $\|v\|_2 = \sqrt{\sum_i v_i^2}$. The L1 norm is $\|v\|_1 = \sum_i |v_i|$. The generic norm in prose refers to whichever is in scope.
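
Both definitions translate directly into a few lines of code. The function names are illustrative.

```python
import math


def l1_norm(v):
    """Sum of absolute values of the components."""
    return sum(abs(v_i) for v_i in v)


def l2_norm(v):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(v_i * v_i for v_i in v))
```

For the vector (3, -4) the L1 norm is 7 and the L2 norm is 5.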

Softmax and cross-entropy

The softmax of a vector $z$ is:

$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$

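Softmax is invertible only up to an additive constant, because adding the same value to every $z_i$ leaves the output unchanged; its inputs are conventionally called logits, and the logit function itself is the inverse of the two-class (sigmoid) case. Cross-entropy loss between a target distribution $p$ and a prediction $q$ is $H(p, q) = -\sum_i p_i \log q_i$. KL divergence is $D_{KL}(p \| q) = \sum_i p_i \log \frac{p_i}{q_i}$.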
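
The three formulas can be sketched in plain Python. Subtracting the maximum before exponentiating is a common stability trick and does not change the result, since softmax is unaffected by adding a constant to every input. The function names and the convention of skipping terms with $p_i = 0$ are assumptions of this sketch.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i; terms with p_i == 0 contribute 0."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)
```

A useful sanity check is the identity $H(p, q) = H(p, p) + D_{KL}(p \| q)$, where $H(p, p)$ is the entropy of $p$.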

Attention

In self-attention, queries $Q$, keys $K$, and values $V$ combine via scaled dot product:

$$\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V$$

The entries of the numerator $Q K^\top$ are cosine similarities when the rows of $Q$ and $K$ are unit-norm. The whole operation acts on tensors of shape (batch, heads, seq, d_k). einsum notation compresses the math: bhqd, bhkd -> bhqk.
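
For a single batch element and a single head, the formula reduces to the sketch below, where the score computation plays the role of the einsum pattern with the b and h axes dropped (qd, kd -> qk). Matrices are represented as lists of rows, and the function names are illustrative assumptions rather than any library's API.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for one head.

    q has shape (seq_q, d_k), k has shape (seq_k, d_k), v has shape (seq_k, d_v).
    """
    d_k = len(q[0])
    d_v = len(v[0])
    output = []
    for q_row in q:
        # One row of Q K^T / sqrt(d_k): a score for every key.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights = softmax(scores)
        # Weighted average of the value rows.
        output.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                       for c in range(d_v)])
    return output
```

A zero query scores every key equally, so its output row is the plain average of the value rows.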

Positional encoding adds a sinusoidal vector to each token embedding so the dot-product attention can distinguish positions:

$$PE_{(pos, 2i)} = \sin\left( pos / 10000^{2i/d} \right), \quad PE_{(pos, 2i+1)} = \cos\left( pos / 10000^{2i/d} \right)$$
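
The two formulas fill even and odd embedding dimensions in pairs. A minimal sketch, assuming an even embedding size `d` and a hypothetical function name:

```python
import math


def positional_encoding(seq_len, d):
    """Returns seq_len rows of length d (d is assumed to be even)."""
    pe = [[0.0] * d for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(d // 2):
            angle = pos / (10000 ** (2 * i / d))
            pe[pos][2 * i] = math.sin(angle)      # even dimension 2i
            pe[pos][2 * i + 1] = math.cos(angle)  # odd dimension 2i + 1
    return pe
```

Position 0 always encodes to alternating 0 and 1, because every angle is zero there.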

Inline math density

Inline math in dense paragraphs: when $f(x) = x^2$ and $g(x) = e^x$, the chain rule gives $\frac{d}{dx} g(f(x)) = g'(f(x)) f'(x) = e^{x^2} \cdot 2x$. The terms "gradient" and "attention" appear in this very paragraph and should be marked in the prose, but the math expressions $f$, $g$, $e^{x^2}$ should NOT be touched by the walker.
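
The chain-rule result can be checked numerically by comparing the closed form with a central difference of the composite function. The helper names and the step size `h` are assumptions of this sketch.

```python
import math


def f(x):
    return x ** 2


def g(x):
    return math.exp(x)


def composite(x):
    return g(f(x))


def chain_rule_derivative(x):
    # g'(f(x)) * f'(x) = e^{x^2} * 2x
    return math.exp(x ** 2) * 2 * x


def central_difference(func, x, h=1e-6):
    """Approximates func'(x) with a symmetric finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)
```

At any test point, for example x = 0.5, `central_difference(composite, 0.5)` and `chain_rule_derivative(0.5)` should agree to several decimal places.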