Appendix I

Preprocessing

Neural Signals and Calcium Imaging

Neurons communicate via brief electrical pulses called action potentials. When a neuron fires, calcium ions flood into the cell. We exploit this by using calcium imaging: neurons are genetically engineered to express GCaMP6s, a protein that fluoresces in the presence of calcium. A miniature microscope implanted on the mouse's head records this fluorescence, producing a brightness time series for each neuron at 30 frames per second.

GCaMP6s has a ~600 ms decay time, which acts as a low-pass filter on neural activity. Frequencies above ~7 Hz are not reliably resolved. This constrains our analysis to the 0.01–7 Hz range.

Our dataset contains 3,938 neurons across 18 sessions, totaling ~87 minutes of recording at 30 Hz.

Sampling and the Nyquist Limit

The microscope captures 30 snapshots per second, giving a sampling rate of \( f_s = 30 \) Hz. The Nyquist-Shannon theorem states that the maximum resolvable frequency is half the sampling rate:

$$ f_{\text{max}} = \frac{f_s}{2} = \frac{30}{2} = 15 \text{ Hz} $$

Frequencies above the Nyquist limit cannot be accurately represented and produce distortion called aliasing. In practice, the GCaMP indicator's slow dynamics further restrict the useful bandwidth to ~7 Hz.

Data Preprocessing

Linear detrending

Calcium signals exhibit slow downward drift from photobleaching (the fluorescent protein losing brightness over time). We remove this by fitting and subtracting a linear trend:

$$ x_{\text{detrended}}(t) = x(t) - (at + b) $$

where \( a \) is the slope and \( b \) the intercept of the best-fit line.
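As a minimal sketch (the drift rate and noise level below are synthetic, not from the dataset), the fit-and-subtract step can be written with `np.polyfit`; `scipy.signal.detrend` computes the same least-squares linear fit:

```python
import numpy as np
from scipy.signal import detrend

# Simulated fluorescence trace: slow linear bleaching drift plus noise
fs = 30.0                        # imaging rate, Hz
t = np.arange(0, 60, 1 / fs)     # 60 s of samples
rng = np.random.default_rng(0)
x = 1.0 - 0.005 * t + 0.1 * rng.standard_normal(t.size)

# Fit a first-degree polynomial (a*t + b) and subtract it
a, b = np.polyfit(t, x, deg=1)
x_detrended = x - (a * t + b)

# scipy's linear detrend produces the same residual
assert np.allclose(x_detrended, detrend(x, type="linear"))
```

Because the fit includes an intercept, the detrended trace has exactly zero mean, which is convenient for the spectral estimates that follow.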

Behavior label resampling

Behavior video (25 fps) and calcium imaging (30 fps) are recorded at different rates. We resample behavior labels to 30 fps using nearest-neighbor interpolation:

$$ \text{idx}[i] = \text{round}\!\left(\frac{i \cdot (n-1)}{m-1}\right), \quad i = 0, 1, \ldots, m-1 $$

where \( n \) is the number of behavior frames and \( m \) is the target number of calcium-matched frames.
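A sketch of the index formula in NumPy (the toy label sequence is illustrative, not real behavior data):

```python
import numpy as np

def nearest_resample_idx(n, m):
    """Indices mapping m output frames onto n input frames (nearest neighbor)."""
    i = np.arange(m)
    return np.round(i * (n - 1) / (m - 1)).astype(int)

# 5 s of behavior labels at 25 fps, resampled to match 30 fps imaging
labels_25 = np.repeat([0, 1, 0], [50, 40, 35])        # n = 125 frames
idx = nearest_resample_idx(n=labels_25.size, m=150)   # m = 150 frames
labels_30 = labels_25[idx]
```

Nearest-neighbor interpolation preserves the discrete label values exactly, which is why it is preferred over linear interpolation for categorical labels.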

Social labeling

Behavior video is processed through SLEAP (a deep-learning pose tracker, 15 body keypoints per animal). A frame is labeled "social" when the resident mouse's nose is within 10 pixels of the intruder's nearest body part.

Why 10 pixels? At the camera resolution used, 10 pixels corresponds to direct nose-to-body contact. Larger thresholds include non-interacting proximity; smaller thresholds miss interactions due to pose-tracker imprecision.

Social Modulation Index

The SMI quantifies each neuron's preference for social vs. solo states as a normalized difference in mean activity:

$$ \text{SMI} = \frac{\bar{x}_{\text{social}} - \bar{x}_{\text{solo}}}{\bar{x}_{\text{social}} + \bar{x}_{\text{solo}} + \varepsilon} $$

where \( \varepsilon = 10^{-12} \) prevents division by zero when both means are near zero.
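The index is a few lines of NumPy; the toy activity trace below is hypothetical and only illustrates the arithmetic:

```python
import numpy as np

def smi(x, social_mask, eps=1e-12):
    """Social Modulation Index for one neuron's activity trace."""
    mean_social = x[social_mask].mean()
    mean_solo = x[~social_mask].mean()
    return (mean_social - mean_solo) / (mean_social + mean_solo + eps)

# A neuron twice as active during social frames: SMI = (2 - 1) / (2 + 1) = 1/3
activity = np.array([2.0, 2.0, 2.0, 1.0, 1.0, 1.0])
is_social = np.array([True, True, True, False, False, False])
```

The normalization bounds the SMI in roughly [−1, +1] for non-negative activity, with positive values indicating social preference.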

Significance testing

To test whether each neuron's SMI is significantly different from zero, a Wilcoxon rank-sum test compares the neuron's social and solo activity distributions. With 3,938 neurons tested simultaneously, the resulting p-values are corrected using Benjamini-Hochberg FDR (false discovery rate) at α = 0.05. This controls the expected proportion of false positives among all neurons declared significant:

  1. Sort all \( m \) p-values: \( p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)} \)
  2. Find the largest \( k \) such that \( p_{(k)} \leq \frac{k}{m} \cdot \alpha \)
  3. Reject all hypotheses \( i = 1, \ldots, k \)
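The three-step procedure can be sketched directly in NumPy (a minimal illustration, not the project's code):

```python
import numpy as np

def bh_fdr(pvals, alpha=0.05):
    """Benjamini-Hochberg: boolean mask of hypotheses rejected at FDR alpha."""
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    passing = ranked <= (np.arange(1, m + 1) / m) * alpha  # p_(k) <= (k/m)*alpha
    reject = np.zeros(m, dtype=bool)
    if passing.any():
        k = np.nonzero(passing)[0].max()   # largest k meeting the criterion
        reject[order[: k + 1]] = True      # reject the k+1 smallest p-values
    return reject

# Only the two smallest p-values survive in this toy example
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.09, 0.6])
```

Note that hypotheses between the smallest passing p-value and \( p_{(k)} \) are rejected even if they individually fail their own threshold; this "step-up" behavior is what gives BH its extra power.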

87% of neurons pass BH-FDR correction, showing significant modulation — but in both directions (some excited, some suppressed). This heterogeneity means population averaging cancels out opposing responses.

Why BH-FDR instead of Bonferroni? Bonferroni correction for 3,938 tests would set the threshold at \( 0.05 / 3938 = 1.27 \times 10^{-5} \), which is overly conservative. BH-FDR provides more statistical power by controlling the false discovery rate rather than the family-wise error rate.

Appendix II

Neural Signal Processing

The Fourier Transform

Any signal can be decomposed into a sum of sine waves at different frequencies. The Discrete Fourier Transform (DFT) computes this decomposition for sampled data. Given \( N \) samples \( x[0], \ldots, x[N-1] \):

$$ X[k] = \sum_{n=0}^{N-1} x[n] \cdot e^{-j \, 2\pi \, kn / N} $$

The power at frequency bin \( k \) is the squared magnitude:

$$ P[k] = |X[k]|^2 = \text{Re}(X[k])^2 + \text{Im}(X[k])^2 $$

Bin \( k \) corresponds to real-world frequency \( f = k \cdot f_s / N \) Hz.
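A toy illustration with NumPy's FFT (the 3 Hz test tone is synthetic); `rfft` exploits the fact that real signals have symmetric spectra, returning only bins 0 through N/2:

```python
import numpy as np

fs = 30.0                              # sampling rate, Hz
N = 300                                # 10 s of samples
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 3.0 * t)        # pure 3 Hz tone

X = np.fft.rfft(x)                     # DFT for real input: bins 0 .. N/2
P = np.abs(X) ** 2                     # power per bin
freqs = np.fft.rfftfreq(N, d=1 / fs)   # bin k maps to k * fs / N

peak_hz = freqs[np.argmax(P)]          # strongest bin lands at 3 Hz
```

With N = 300 and fs = 30 Hz, the frequency resolution is fs / N = 0.1 Hz, and the last bin sits at the 15 Hz Nyquist limit.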

Role in this project: The DFT is the mathematical foundation underlying every spectral method we use. We never call it directly — instead it is invoked inside Welch’s method (A6), which applies the FFT to overlapping, windowed segments. Understanding the DFT is essential for interpreting frequency resolution, the Nyquist limit, and why windowing is necessary.

Power Spectral Density: Welch's Method

A single FFT of the entire signal produces a noisy spectral estimate. Welch's method (1967) reduces this variance by averaging spectra across overlapping, windowed segments:

  1. Segment the signal into overlapping blocks of length \( L \)
  2. Window each segment with a Hann function to suppress spectral leakage:
    $$ w[n] = 0.5 \left(1 - \cos\!\left(\frac{2\pi n}{L - 1}\right)\right) $$
  3. Compute the FFT of each windowed segment and take the squared magnitude
  4. Average across all \( K \) segments:
    $$ \hat{S}(f_k) = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{L \cdot U} \left| \sum_{n=0}^{L-1} w[n] \cdot x_i[n] \cdot e^{-j2\pi f_k n / f_s} \right|^2 $$
    where \( U = \frac{1}{L}\sum w[n]^2 \) normalizes for windowing energy loss.

Figure A1. Welch's method: the signal is split into overlapping segments, each windowed and FFT'd, then the resulting power spectra are averaged to reduce noise.
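Assuming a SciPy implementation, the four steps collapse into one call to `scipy.signal.welch`, whose default window is Hann; the 10 s segment length and 50% overlap below are illustrative choices, not parameters stated above:

```python
import numpy as np
from scipy.signal import welch

fs = 30.0
t = np.arange(0, 120, 1 / fs)          # 2 minutes at 30 Hz
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 5.0 * t) + rng.standard_normal(t.size)  # 5 Hz tone + noise

# 10 s Hann-windowed segments with 50% overlap, averaged into one PSD
f, Pxx = welch(x, fs=fs, nperseg=int(10 * fs), noverlap=int(5 * fs))
peak_hz = f[np.argmax(Pxx)]            # strongest frequency, ~5 Hz
```

Longer segments give finer frequency resolution (fs / nperseg); more segments give a smoother, lower-variance estimate. The choice of `nperseg` trades one against the other.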

Role in this project: Welch PSD is the workhorse of the entire quantitative pipeline. It is used in two places: (1) neuron spectral profiling — computing each neuron’s full-session PSD to derive fractional band powers for clustering (A17), and (2) per-window feature extraction — computing PSD within each 1-second epoch to produce the 6 classification features (A10). Every statistical test, effect size, and classification result in the paper traces back to Welch PSD estimates.

Frequency Bands and Band Power

Band       Range          Period         Captures
Infraslow  0.01–0.1 Hz    10–100 s       Very slow drifts; partly photobleaching artifact
Slow       0.1–1.0 Hz     1–10 s         Network-level synchronization
Delta      1.0–4.0 Hz     0.25–1 s       Slow rhythmic activity
Theta      4.0–7.0 Hz     0.14–0.25 s    Social behavior and attention
Why these boundaries? Band definitions follow Kuga et al. (2022). The 7 Hz theta ceiling reflects the GCaMP6s bandwidth limit rather than the traditional EEG theta ceiling of 8 Hz.

Band power integration

Total power in a band is computed by integrating the PSD using the trapezoidal rule:

$$ P_{\text{band}} = \int_{f_{\text{lo}}}^{f_{\text{hi}}} S(f) \, df \approx \sum_{i} \frac{(f_{i+1} - f_i)\big(S(f_i) + S(f_{i+1})\big)}{2} $$

Fractional band power

For cross-neuron comparison, band power is normalized by total power:

$$ P_{\text{frac}}^{(\text{band})} = \frac{P_{\text{band}}}{P_{\text{infraslow}} + P_{\text{slow}} + P_{\text{delta}} + P_{\text{theta}}} $$

This yields a value in [0, 1] representing what fraction of the neuron's power resides in each band. Fractional power is used for spectral clustering.
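A sketch of both computations, using `scipy.signal.welch` and `scipy.integrate.trapezoid`; the segment length and the synthetic theta-dominated trace are assumptions for illustration:

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

BANDS = {"infraslow": (0.01, 0.1), "slow": (0.1, 1.0),
         "delta": (1.0, 4.0), "theta": (4.0, 7.0)}

def band_powers(x, fs=30.0, nperseg=4096):
    """Raw and fractional band powers from a Welch PSD estimate."""
    f, S = welch(x, fs=fs, nperseg=min(x.size, nperseg))
    raw = {}
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f <= hi)
        raw[name] = trapezoid(S[mask], f[mask])   # trapezoidal integration
    total = sum(raw.values())
    frac = {name: p / total for name, p in raw.items()}
    return raw, frac

# Synthetic theta-dominated trace: a 5 Hz tone plus weak noise
fs = 30.0
t = np.arange(0, 300, 1 / fs)                     # 5 minutes
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 5.0 * t) + 0.1 * rng.standard_normal(t.size)
raw, frac = band_powers(x, fs)
```

The long segment length (4096 samples, ~0.007 Hz resolution) is needed so the infraslow band contains enough PSD bins to integrate over; for the 1-second classification windows, only the higher bands are resolvable.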

Role in this project: Band power integration converts the raw PSD curve into interpretable numbers — one per frequency band per neuron or window. Raw band powers are used as 4 of the 6 classification features (A10) and for the social-vs-solo statistical tests (Figure 7). Fractional band powers normalize out overall signal brightness, enabling fair comparison across neurons with different firing rates — these are the inputs to K-means spectral clustering (A17) that discover the two neuron subpopulations.

Bandpass Filtering: Butterworth

Purpose of bandpass filtering

In Section A7, we computed how much total power a neuron has in each frequency band. That tells us, on average, how strong each band is — but it collapses the entire recording into a single number. A bandpass filter does something different: it isolates a specific frequency band as a time-varying signal, so we can watch how that band's activity rises and falls moment to moment.

For example, applying a 4–7 Hz bandpass filter to a neuron's trace removes all activity outside the theta range. The output is a new waveform that oscillates purely at theta frequencies. This lets us overlay theta activity onto behavioral timestamps and visually inspect whether theta power increases during social contact (as shown in Figure 5 of the main paper).

How a bandpass filter works

Every filter has a frequency response \( H(f) \) that describes how much it amplifies or suppresses each frequency. For a bandpass filter, frequencies inside the passband (e.g., 4–7 Hz) pass through with gain ≈ 1 (unchanged), while frequencies outside the passband are multiplied by a gain close to 0 (suppressed). The transition between passband and stopband is not instantaneous — it follows a gradual rolloff curve.

The Butterworth design is defined by having the flattest possible response inside the passband. For a lowpass Butterworth filter of order \( N \) with cutoff \( f_c \):

$$ |H(f)|^2 = \frac{1}{1 + \left(\frac{f}{f_c}\right)^{2N}} $$

At the cutoff frequency \( f = f_c \), the gain is exactly \( 1/\sqrt{2} \approx 0.707 \) (i.e., power is halved, a –3 dB point). Frequencies well below \( f_c \) pass through nearly unchanged; frequencies well above \( f_c \) are attenuated. For a bandpass filter, two cutoffs are combined: a lower cutoff \( f_{\text{lo}} \) and an upper cutoff \( f_{\text{hi}} \).


Figure A2. Butterworth bandpass frequency response for the theta band (4–7 Hz). Order 4 (blue) produces a sharper rolloff than order 2 (gray dashed). The passband region is shaded.
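A minimal sketch assuming a SciPy implementation (`butter` plus `sosfiltfilt`); the zero-phase filtering choice and the two-tone test signal are assumptions for illustration. Filtering forward and backward cancels phase shift but doubles the effective attenuation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 30.0
# Order-4 Butterworth bandpass for the theta band, as second-order sections
sos = butter(N=4, Wn=[4.0, 7.0], btype="bandpass", fs=fs, output="sos")

# Two-tone test: 2 Hz (delta) + 5 Hz (theta); only the theta tone should survive
t = np.arange(0, 20, 1 / fs)
x = np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.sin(2 * np.pi * 5.0 * t)
y = sosfiltfilt(sos, x)   # forward-backward filtering: zero phase shift
```

The second-order-sections (`sos`) form is numerically more stable than transfer-function coefficients, which matters for the very low infraslow cutoffs mentioned above.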

Filter order

The order \( N \) controls how sharply the filter transitions from passband to stopband. Higher orders produce sharper rolloff, but they also introduce more ringing (oscillatory artifacts near the edges) and can become numerically unstable. We use order \( N = 4 \):

Why order 4? Standard choice in neuroscience. Order 2 produces too gentle a rolloff (neighboring bands bleed into each other). Order 6+ can cause numerical errors at very low frequencies like the infraslow band (0.01 Hz). Order 4 provides a sharp enough transition without instability.
Role in this project: Bandpass filtering is used exclusively as a visualization tool, not in the quantitative pipeline. Combined with the Hilbert envelope (A9), it produces the band-decomposed time series in Figure 5 — making it possible to see theta and delta activity rise and fall in lockstep with social contact. This visual evidence was critical for building intuition about which frequency bands carry behavioral information, guiding the quantitative analysis that follows. The classification and clustering pipelines use Welch PSD band power integration (A6–A7) instead, which summarizes power per window as a single number.

Hilbert Transform and Signal Envelope

The problem: oscillations hide the trend

After bandpass filtering, a neuron's signal oscillates up and down at the band's frequency. During a period of strong theta activity, the filtered signal swings between large positive and negative values. During weak theta activity, the swings are small. The overall "loudness" of the oscillation is changing over time, but the signal itself keeps crossing zero, making it hard to directly compare against behavioral states.

We need a way to extract a smooth, non-negative curve that tracks how strong the oscillation is at each moment — ignoring the up-and-down cycling. This curve is called the envelope, or equivalently the instantaneous amplitude.

Intuition: the analytic signal

Consider a sine wave: \( x(t) = A \cos(2\pi f t) \). Its amplitude is \( A \), but measuring \( A \) from the signal alone is not straightforward because the signal passes through zero twice per cycle. If we had a second copy of the signal shifted by exactly 90° (a quarter cycle) — namely \( A \sin(2\pi f t) \) — we could combine them:

$$ A = \sqrt{(A\cos\theta)^2 + (A\sin\theta)^2} = A $$

The squared cosine and squared sine always sum to 1, so the magnitude recovers the amplitude \( A \) regardless of where we are in the oscillation cycle. The Hilbert transform manufactures that companion signal — it shifts every frequency component by exactly −90° in phase without changing any amplitudes, so we can compute the amplitude at every time point.

The Hilbert Transform

The Hilbert Transform \( \mathcal{H}\{x(t)\} \) shifts every frequency component of \( x(t) \) by exactly −90° in phase, without changing any amplitudes. For a pure cosine, the Hilbert Transform produces a sine. For a complex, multi-frequency signal (like our bandpass-filtered neural trace), it shifts each constituent frequency independently by 90°.

Combining the original signal with its Hilbert Transform produces the analytic signal:

$$ z(t) = x(t) + j \cdot \mathcal{H}\{x(t)\} $$

Here \( j = \sqrt{-1} \). The analytic signal is complex-valued: the real part is the original signal, and the imaginary part is the 90°-shifted version.

Extracting the envelope

The envelope is the magnitude (absolute value) of the analytic signal at each time point:

$$ A(t) = |z(t)| = \sqrt{x(t)^2 + \mathcal{H}\{x(t)\}^2} $$

This produces a smooth, non-negative curve that traces the peaks of the oscillation. When the oscillation is large, the envelope is high. When the oscillation is small, the envelope is low. The rapid up-and-down cycling is removed, leaving only the slow amplitude modulation.


Figure A4. The Hilbert envelope (red) traces the instantaneous amplitude of a bandpass-filtered oscillation (gray). The envelope is always non-negative and smooth, making it easy to compare against behavioral states.
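Assuming SciPy's `hilbert` (which returns the analytic signal directly), envelope extraction is two lines; the amplitude-modulated test tone below is synthetic:

```python
import numpy as np
from scipy.signal import hilbert

fs = 30.0
t = np.arange(0, 20, 1 / fs)
amp = np.linspace(0.2, 1.0, t.size)        # slowly growing oscillation strength
x = amp * np.sin(2 * np.pi * 5.0 * t)      # 5 Hz carrier with varying amplitude

z = hilbert(x)            # analytic signal: x(t) + j*H{x(t)}
envelope = np.abs(z)      # instantaneous amplitude A(t)
```

Away from the edges (the FFT-based transform assumes periodicity, so the first and last seconds are less reliable), the recovered envelope closely tracks the true amplitude ramp.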

Why not just take the absolute value?

Taking \( |x(t)| \) directly would produce a bumpy, pulsating curve (flipping negative half-cycles upward) rather than a smooth envelope. The Hilbert approach yields a genuinely smooth curve because the 90° companion fills in the gaps between peaks, providing continuous amplitude information.

Role in this project: Like the bandpass filtering it follows (A8), the Hilbert envelope serves exclusively as a visualization tool, not as part of the quantitative pipeline. Applied to each bandpass-filtered trace, it produces the band-decomposed time series in Figure 5, making it possible to see theta and delta activity rise and fall in lockstep with social contact. That visual evidence guided the quantitative analysis; the classification and clustering pipelines instead use Welch PSD band power integration (A6–A7), which summarizes power per window as a single number.

Appendix III

Neural Population Analysis

Feature Extraction Pipeline

Purpose: Convert raw calcium traces into a compact set of numbers that a classifier can use — 6 spectral features per 1-second window.

Windowing and labeling

Each neuron's signal is divided into 1-second, non-overlapping windows (30 frames). Each window is labeled by behavioral purity: it receives the social or solo label only when its frames consistently reflect that behavioral state.

Why 1-second windows? This is the shortest window that still permits PSD estimation across all four bands (delta's lowest frequency is 1 Hz, requiring at least one full cycle). Shorter windows degrade frequency resolution; longer windows reduce the number of labeled windows and introduce label ambiguity. This was a practical tradeoff, not formally optimized.

Per-window processing

Each window is linearly detrended, then its Welch PSD is computed and integrated into band powers.

The 6 features

$$ \mathbf{x} = \begin{bmatrix} P_{\text{infraslow}} \\ P_{\text{slow}} \\ P_{\text{delta}} \\ P_{\text{theta}} \\ H_{\text{spectral}} \\ P_{\text{theta}} / P_{\text{delta}} \end{bmatrix} $$
Feature            Description
Infraslow power    Integrated PSD, 0.01–0.1 Hz
Slow power         Integrated PSD, 0.1–1 Hz
Delta power        Integrated PSD, 1–4 Hz
Theta power        Integrated PSD, 4–7 Hz
Spectral entropy   Shannon entropy of the normalized PSD (see below)
Theta/delta ratio  \( P_\theta / \max(P_\delta, \, \varepsilon) \), where \( \varepsilon = 10^{-12} \)

Spectral entropy

The PSD is normalized to a probability distribution, then Shannon entropy is computed:

$$ H = -\sum_{i} p_i \ln(p_i), \quad \text{where } p_i = \frac{S(f_i)}{\sum_k S(f_k)} $$

Low entropy = power concentrated at few frequencies. High entropy = power spread broadly.
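A minimal sketch of the computation, with two extreme toy spectra to anchor the intuition:

```python
import numpy as np

def spectral_entropy(S):
    """Shannon entropy of a PSD normalized to a probability distribution."""
    p = S / S.sum()
    p = p[p > 0]                         # convention: 0 * ln(0) = 0
    return float(-np.sum(p * np.log(p)))

peaked = np.array([0.0, 1.0, 0.0, 0.0])  # all power in one bin
flat = np.ones(4)                        # power spread evenly
```

`spectral_entropy(peaked)` is 0, while `spectral_entropy(flat)` reaches the maximum possible value ln(4) for a 4-bin spectrum.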

Theta/delta ratio

This ratio captures spectral shape (which band dominates) rather than raw power magnitude, making it robust to overall signal brightness. It emerged as the top discriminative feature for Cluster 1 classification.

K-Means Clustering

Purpose: Discover whether neurons form distinct spectral subpopulations — splitting the population before classification reveals a signal that the whole-population average hides.

Neurons are clustered by their 4-dimensional fractional band power vectors (infraslow, slow, delta, theta fractions summing to 1). K-Means iterates between two steps until convergence:

  1. Assign each neuron to its nearest centroid by Euclidean distance:
    $$ d(\mathbf{x}, \boldsymbol{\mu}_j) = \sqrt{\sum_{i=1}^{4} (x_i - \mu_{j,i})^2} $$
  2. Update each centroid to the mean of its assigned neurons:
    $$ \boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x} \in C_j} \mathbf{x} $$

Cluster selection

We evaluated k = 2 through 6 using the silhouette score:

$$ s(i) = \frac{b(i) - a(i)}{\max\!\big(a(i), b(i)\big)} $$

where \( a(i) \) = mean intra-cluster distance and \( b(i) \) = mean nearest-cluster distance. Range: [−1, +1]; higher is better. k = 2 consistently achieved the best score (~0.399) across all 18 sessions.
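An illustrative sketch of the k-selection loop, assuming a scikit-learn implementation; the Dirichlet-sampled toy profiles stand in for real fractional band-power vectors and are not from the dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Toy fractional band-power vectors (each sums to 1): a delta-dominated
# majority and a theta-enriched minority; Dirichlet parameters are illustrative
majority = rng.dirichlet([1, 2, 8, 2], size=120)
minority = rng.dirichlet([1, 2, 2, 8], size=40)
X = np.vstack([majority, minority])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
```

Because K-Means is sensitive to initialization, `n_init=10` restarts each fit from multiple random centroids and keeps the best solution.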

Classification

Purpose: Test whether spectral features can actually predict social vs. solo behavior — the classifier is the evaluation tool, not the contribution.

Standardization

Features are z-scored using training-set statistics only:

$$ x_{\text{scaled}} = \frac{x - \mu_{\text{train}}}{\sigma_{\text{train}}} $$

Classifiers

Three linear classifiers were tested.

Class balancing

Social and solo windows are often imbalanced. Balanced class weights increase the misclassification penalty for the minority class, preventing the classifier from defaulting to the majority label.

AUC-ROC

The primary metric is Area Under the ROC Curve. The ROC plots true positive rate vs. false positive rate across all classification thresholds. AUC = 0.5 indicates chance; AUC = 1.0 indicates perfect separation.

Spatial Analysis

Purpose: Check whether the two spectral clusters are spatially segregated or intermixed — ruling out the possibility that cluster identity is just a location artifact.

Centroid extraction

Each neuron's spatial footprint (from CNMF-E source extraction) is reduced to a centroid via center of mass:

$$ \bar{y} = \frac{\sum_{y,x} y \cdot A(y,x)}{\sum_{y,x} A(y,x)}, \quad \bar{x} = \frac{\sum_{y,x} x \cdot A(y,x)}{\sum_{y,x} A(y,x)} $$
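The center-of-mass formula translates directly into NumPy; the uniform square footprint below is a toy stand-in for a real CNMF-E footprint:

```python
import numpy as np

def footprint_centroid(A):
    """Center of mass (y, x) of a nonnegative spatial footprint A(y, x)."""
    ys, xs = np.indices(A.shape)
    total = A.sum()
    return (ys * A).sum() / total, (xs * A).sum() / total

# Uniform 3x3 footprint whose center of mass is at (2, 3)
A = np.zeros((5, 7))
A[1:4, 2:5] = 1.0
cy, cx = footprint_centroid(A)
```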

Nearest-neighbor distances

Euclidean distances between centroids are computed to test spatial organization.

If Cluster 1 neurons were spatially grouped, intra-cluster NN would be smaller than cross-cluster NN. The opposite is observed: Cluster 1 neurons are spatially intermixed with Cluster 0, indicating spectral identity is an intrinsic cellular property, not location-dependent.

Weighted intensity maps

Each neuron's spatial footprint is weighted by its band power and overlaid to produce maps of the spatial distribution of delta and theta activity across the field of view.

Appendix IV

Evaluation

Cross-Validation

We use GroupKFold (5 folds) with recording sessions as groups. In each fold, all windows from a set of sessions are held out for testing while the remaining sessions are used for training. No data from a test session appears during training.

Performance is reported as the mean AUC across folds.


Figure A5. GroupKFold cross-validation. In each fold, entire recording sessions are held out for testing (red). No windows from a test session appear in training (blue).

Why GroupKFold? Standard KFold could place windows from the same session in both train and test sets. Consecutive windows within a session are temporally correlated, so this would leak information and inflate performance. GroupKFold prevents this by keeping entire sessions together.
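Assuming a scikit-learn implementation, the grouping guarantee is easy to verify in code; the session and window counts below mirror the dataset's layout but the features and labels are random placeholders:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy layout: 18 sessions with 20 one-second windows each (sizes illustrative)
groups = np.repeat(np.arange(18), 20)
rng = np.random.default_rng(0)
X = rng.standard_normal((groups.size, 6))   # 6 spectral features per window
y = rng.integers(0, 2, size=groups.size)    # social / solo labels

folds = list(GroupKFold(n_splits=5).split(X, y, groups))
for train_idx, test_idx in folds:
    # Entire sessions stay on one side of the split
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```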

Permutation Testing

A permutation test determines whether classification performance exceeds what would be expected by chance. The procedure:

  1. Compute the real cross-validated AUC: \( \text{AUC}_{\text{real}} \)
  2. Repeat 100 times: shuffle behavior labels randomly, retrain and retest, record \( \text{AUC}_{\text{perm}}^{(i)} \)
  3. Compute the p-value:
    $$ p = \frac{\#\{\text{AUC}_{\text{perm}}^{(i)} \geq \text{AUC}_{\text{real}}\}}{n_{\text{perm}}} $$

If \( p < 0.05 \), the classifier performs significantly better than chance.

We run this test three times: once on the full population, and once on each spectral cluster separately. This is the key comparison — the whole-population classifier fails the permutation test (p = 0.196), but the Cluster 1 classifier passes (p = 0.020), revealing that the social signal is concentrated in the theta-enriched minority rather than spread across all neurons.


Figure A6. Permutation test. The gray histogram shows the distribution of AUC scores from 100 label-shuffled runs (null distribution). The red line marks the real classifier's AUC. The p-value is the fraction of shuffled AUCs that equal or exceed the real AUC.

Why 100 shuffles? This gives p-value resolution to 0.01, sufficient for significance testing at α = 0.05. More shuffles (e.g., 1,000) would give finer resolution at 10× the computational cost.
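The procedure can be sketched end to end with scikit-learn; here `LogisticRegression` stands in for whichever linear classifier is evaluated, and the data are synthetic, with labels deliberately tied to feature 0 so the real classifier beats its own null:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
# Labels driven by feature 0, so real performance should exceed chance
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(int)

def cv_auc(X, y):
    proba = cross_val_predict(LogisticRegression(), X, y, cv=5,
                              method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)

auc_real = cv_auc(X, y)
null = np.array([cv_auc(X, rng.permutation(y)) for _ in range(100)])
p = np.mean(null >= auc_real)   # fraction of shuffles matching or beating it
```

Shuffling the labels destroys any feature-label relationship while preserving class balance and feature statistics, so the null distribution reflects exactly what the classifier could achieve by chance on this data.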

Wilcoxon Rank-Sum Test

To test whether band power differs between social and solo windows, we use the Wilcoxon rank-sum test (also called the Mann-Whitney U test). This is a non-parametric test: it makes no assumptions about the shape of the data distribution, only that the two groups are independent.

The procedure:

  1. Pool all social and solo band-power values and rank them from smallest to largest.
  2. Sum the ranks belonging to one group (e.g., social): this is the test statistic \( W \).
  3. Compare \( W \) to its expected distribution under the null hypothesis (no difference between groups) to obtain a p-value.

A small p-value means the band-power values in one condition tend to be systematically larger than the other — i.e., the two rank distributions are shifted apart. This test generates the p-values that Bonferroni correction (A16) is then applied to.

Why non-parametric? Band power distributions are typically right-skewed (many low values, a few very high ones). The Wilcoxon test does not assume normality, making it robust to these skewed distributions.
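Assuming SciPy's implementation (`scipy.stats.ranksums`), the test is one call; the lognormal samples below mimic right-skewed band powers with the "social" condition shifted upward, and are synthetic:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
# Right-skewed "band power" samples; the social condition is shifted upward
solo = rng.lognormal(mean=0.0, sigma=1.0, size=300)
social = rng.lognormal(mean=0.4, sigma=1.0, size=300)

stat, p = ranksums(social, solo)   # positive statistic: social ranks higher
```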

Cohen's d Effect Size

A p-value indicates whether an effect exists; Cohen's d quantifies its magnitude. It measures the distance between two group means in units of pooled standard deviation:

$$ d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \quad s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} $$
|d|    Interpretation
~0.2   Small effect
~0.5   Medium effect
~0.8   Large effect

Theta-band social vs. solo: d = 0.235 (small but consistent across 14/18 sessions). Delta-band: d = 0.069.
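The pooled-standard-deviation formula translates directly to NumPy; the two toy samples below are chosen so the expected d is easy to verify by hand:

```python
import numpy as np

def cohens_d(x1, x2):
    """Mean difference in units of pooled standard deviation."""
    n1, n2 = x1.size, x2.size
    sp = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1))
                 / (n1 + n2 - 2))
    return float((x1.mean() - x2.mean()) / sp)

# Means differ by 1, each sample variance is 2.5, so d = 1 / sqrt(2.5)
group_a = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
group_b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
```

Note the sample variances use `ddof=1` (Bessel's correction), matching the \( n - 1 \) terms in the pooled formula.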

Multiple Comparison Correction

Testing 4 frequency bands simultaneously inflates the false-positive rate. The Bonferroni correction divides the significance threshold by the number of tests:

$$ \alpha_{\text{corrected}} = \frac{\alpha}{m} = \frac{0.05}{4} = 0.0125 $$

Delta and theta pass Bonferroni correction (both p < 0.001), with theta showing the strongest effect. Infraslow (p = 0.435) and slow (p = 0.088) do not reach significance.