Appendix I

Preprocessing

Neural Signals and Calcium Imaging

Neurons communicate via brief electrical pulses called action potentials. When a neuron fires, calcium ions flood into the cell. We exploit this by using calcium imaging: neurons are genetically engineered to express GCaMP6s, a protein that fluoresces in the presence of calcium. A miniature microscope implanted on the mouse's head records this fluorescence, producing a brightness time series for each neuron at 30 frames per second.

GCaMP6s has a ~600 ms decay time, which acts as a low-pass filter on neural activity. Frequencies above ~7 Hz are not reliably resolved. This constrains our analysis to the 0.01–7 Hz range.

Our dataset contains 3,938 neurons across 18 sessions, totaling ~87 minutes of recording at 30 Hz.

Sampling and the Nyquist Limit

The microscope captures 30 snapshots per second, giving a sampling rate of \( f_s = 30 \) Hz. The Nyquist-Shannon theorem states that the maximum resolvable frequency is half the sampling rate:

$$ f_{\text{max}} = \frac{f_s}{2} = \frac{30}{2} = 15 \text{ Hz} $$

Frequencies above the Nyquist limit cannot be accurately represented and produce distortion called aliasing. In practice, the GCaMP indicator's slow dynamics further restrict the useful bandwidth to ~7 Hz.

Data Preprocessing

Linear detrending

Calcium signals exhibit slow downward drift from photobleaching (the fluorescent protein losing brightness over time). We remove this by fitting and subtracting a linear trend:

$$ x_{\text{detrended}}(t) = x(t) - (at + b) $$

where \( a \) is the slope and \( b \) the intercept of the best-fit line.
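As a minimal sketch (the drift rate and noise level below are synthetic, not from the dataset), the fit-and-subtract step can be written with `np.polyfit`; `scipy.signal.detrend` computes the same least-squares linear fit:

```python
import numpy as np
from scipy.signal import detrend

# Simulated fluorescence trace: slow linear bleaching drift plus noise
fs = 30.0                        # imaging rate, Hz
t = np.arange(0, 60, 1 / fs)     # 60 s of samples
rng = np.random.default_rng(0)
x = 1.0 - 0.005 * t + 0.1 * rng.standard_normal(t.size)

# Fit a first-degree polynomial (a*t + b) and subtract it
a, b = np.polyfit(t, x, deg=1)
x_detrended = x - (a * t + b)

# scipy's linear detrend produces the same residual
assert np.allclose(x_detrended, detrend(x, type="linear"))
```

Because the fit includes an intercept, the detrended trace has exactly zero mean, which is convenient for the spectral estimates that follow.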

Behavior label resampling

Behavior video (25 fps) and calcium imaging (30 fps) are recorded at different rates. We resample behavior labels to 30 fps using nearest-neighbor interpolation:

$$ \text{idx}[i] = \text{round}\!\left(\frac{i \cdot (n-1)}{m-1}\right), \quad i = 0, 1, \ldots, m-1 $$

where \( n \) is the number of behavior frames and \( m \) is the target number of calcium-matched frames.
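A sketch of the index formula in NumPy (the toy label sequence is illustrative, not real behavior data):

```python
import numpy as np

def nearest_resample_idx(n, m):
    """Indices mapping m output frames onto n input frames (nearest neighbor)."""
    i = np.arange(m)
    return np.round(i * (n - 1) / (m - 1)).astype(int)

# 5 s of behavior labels at 25 fps, resampled to match 30 fps imaging
labels_25 = np.repeat([0, 1, 0], [50, 40, 35])        # n = 125 frames
idx = nearest_resample_idx(n=labels_25.size, m=150)   # m = 150 frames
labels_30 = labels_25[idx]
```

Nearest-neighbor interpolation preserves the discrete label values exactly, which is why it is preferred over linear interpolation for categorical labels.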

Social labeling

Behavior video is processed through SLEAP (a deep-learning pose tracker, 15 body keypoints per animal). A frame is labeled "social" when the resident mouse's nose is within 10 pixels of the intruder's nearest body part.

Why 10 pixels? At the camera resolution used, 10 pixels corresponds to direct nose-to-body contact. Larger thresholds include non-interacting proximity; smaller thresholds miss interactions due to pose-tracker imprecision.

Social Modulation Index

The SMI quantifies each neuron's preference for social vs. solo states as a normalized difference in mean activity:

$$ \text{SMI} = \frac{\bar{x}_{\text{social}} - \bar{x}_{\text{solo}}}{\bar{x}_{\text{social}} + \bar{x}_{\text{solo}} + \varepsilon} $$

where \( \varepsilon = 10^{-12} \) prevents division by zero when both means are near zero.
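The index is a few lines of NumPy; the toy activity trace below is hypothetical and only illustrates the arithmetic:

```python
import numpy as np

def smi(x, social_mask, eps=1e-12):
    """Social Modulation Index for one neuron's activity trace."""
    mean_social = x[social_mask].mean()
    mean_solo = x[~social_mask].mean()
    return (mean_social - mean_solo) / (mean_social + mean_solo + eps)

# A neuron twice as active during social frames: SMI = (2 - 1) / (2 + 1) = 1/3
activity = np.array([2.0, 2.0, 2.0, 1.0, 1.0, 1.0])
is_social = np.array([True, True, True, False, False, False])
```

The normalization bounds the SMI in roughly [−1, +1] for non-negative activity, with positive values indicating social preference.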

Significance testing

To test whether each neuron's SMI is significantly different from zero, a Wilcoxon rank-sum test compares the neuron's social and solo activity distributions. With 3,938 neurons tested simultaneously, the resulting p-values are corrected using Benjamini-Hochberg FDR (false discovery rate) at α = 0.05. This controls the expected proportion of false positives among all neurons declared significant:

  1. Sort all \( m \) p-values: \( p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)} \)
  2. Find the largest \( k \) such that \( p_{(k)} \leq \frac{k}{m} \cdot \alpha \)
  3. Reject all hypotheses \( i = 1, \ldots, k \)
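The three-step procedure can be sketched directly in NumPy (a minimal illustration, not the project's code):

```python
import numpy as np

def bh_fdr(pvals, alpha=0.05):
    """Benjamini-Hochberg: boolean mask of hypotheses rejected at FDR alpha."""
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    passing = ranked <= (np.arange(1, m + 1) / m) * alpha  # p_(k) <= (k/m)*alpha
    reject = np.zeros(m, dtype=bool)
    if passing.any():
        k = np.nonzero(passing)[0].max()   # largest k meeting the criterion
        reject[order[: k + 1]] = True      # reject the k+1 smallest p-values
    return reject

# Only the two smallest p-values survive in this toy example
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.09, 0.6])
```

Note that hypotheses between the smallest passing p-value and \( p_{(k)} \) are rejected even if they individually fail their own threshold; this "step-up" behavior is what gives BH its extra power.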

87% of neurons pass BH-FDR correction, showing significant modulation — but in both directions (some excited, some suppressed). This heterogeneity means population averaging cancels out opposing responses.

Why BH-FDR instead of Bonferroni? Bonferroni correction for 3,938 tests would set the threshold at \( 0.05 / 3938 = 1.27 \times 10^{-5} \), which is overly conservative. BH-FDR provides more statistical power by controlling the false discovery rate rather than the family-wise error rate.

Appendix II

Neural Signal Processing

The Fourier Transform

Any signal can be decomposed into a sum of sine waves at different frequencies. The Discrete Fourier Transform (DFT) computes this decomposition for sampled data. Given \( N \) samples \( x[0], \ldots, x[N-1] \):

$$ X[k] = \sum_{n=0}^{N-1} x[n] \cdot e^{-j \, 2\pi \, kn / N} $$

The power at frequency bin \( k \) is the squared magnitude:

$$ P[k] = |X[k]|^2 = \text{Re}(X[k])^2 + \text{Im}(X[k])^2 $$

Bin \( k \) corresponds to real-world frequency \( f = k \cdot f_s / N \) Hz.
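A toy illustration with NumPy's FFT (the 3 Hz test tone is synthetic); `rfft` exploits the fact that real signals have symmetric spectra, returning only bins 0 through N/2:

```python
import numpy as np

fs = 30.0                              # sampling rate, Hz
N = 300                                # 10 s of samples
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 3.0 * t)        # pure 3 Hz tone

X = np.fft.rfft(x)                     # DFT for real input: bins 0 .. N/2
P = np.abs(X) ** 2                     # power per bin
freqs = np.fft.rfftfreq(N, d=1 / fs)   # bin k maps to k * fs / N

peak_hz = freqs[np.argmax(P)]          # strongest bin lands at 3 Hz
```

With N = 300 and fs = 30 Hz, the frequency resolution is fs / N = 0.1 Hz, and the last bin sits at the 15 Hz Nyquist limit.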

Role in this project: The DFT is the mathematical foundation underlying every spectral method we use. We never call it directly — instead it is invoked inside Welch’s method (A6), which applies the FFT to overlapping, windowed segments. Understanding the DFT is essential for interpreting frequency resolution, the Nyquist limit, and why windowing is necessary.

Power Spectral Density: Welch's Method

A single FFT of the entire signal produces a noisy spectral estimate. Welch's method (1967) reduces this variance by averaging spectra across overlapping, windowed segments:

  1. Segment the signal into overlapping blocks of length \( L \)
  2. Window each segment with a Hann function to suppress spectral leakage:
    $$ w[n] = 0.5 \left(1 - \cos\!\left(\frac{2\pi n}{L - 1}\right)\right) $$
  3. Compute the FFT of each windowed segment and take the squared magnitude
  4. Average across all \( K \) segments:
    $$ \hat{S}(f_k) = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{L \cdot U} \left| \sum_{n=0}^{L-1} w[n] \cdot x_i[n] \cdot e^{-j2\pi f_k n / f_s} \right|^2 $$
    where \( U = \frac{1}{L}\sum w[n]^2 \) normalizes for windowing energy loss.

Figure A1. Welch's method: the signal is split into overlapping segments, each windowed and FFT'd, then the resulting power spectra are averaged to reduce noise.
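Assuming a SciPy implementation, the four steps collapse into one call to `scipy.signal.welch`, whose default window is Hann; the 10 s segment length and 50% overlap below are illustrative choices, not parameters stated above:

```python
import numpy as np
from scipy.signal import welch

fs = 30.0
t = np.arange(0, 120, 1 / fs)          # 2 minutes at 30 Hz
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 5.0 * t) + rng.standard_normal(t.size)  # 5 Hz tone + noise

# 10 s Hann-windowed segments with 50% overlap, averaged into one PSD
f, Pxx = welch(x, fs=fs, nperseg=int(10 * fs), noverlap=int(5 * fs))
peak_hz = f[np.argmax(Pxx)]            # strongest frequency, ~5 Hz
```

Longer segments give finer frequency resolution (fs / nperseg); more segments give a smoother, lower-variance estimate. The choice of `nperseg` trades one against the other.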

Role in this project: Welch PSD is the workhorse of the entire quantitative pipeline. It is used in two places: (1) neuron spectral profiling — computing each neuron’s full-session PSD to derive fractional band powers for clustering (A17), and (2) per-window feature extraction — computing PSD within each 1-second epoch to produce the 6 classification features (A10). Every statistical test, effect size, and classification result in the paper traces back to Welch PSD estimates.

Frequency Bands and Band Power

Band       Range          Period         Captures
Infraslow  0.01–0.1 Hz    10–100 s       Very slow drifts; partly photobleaching artifact
Slow       0.1–1.0 Hz     1–10 s         Network-level synchronization
Delta      1.0–4.0 Hz     0.25–1 s       Slow rhythmic activity
Theta      4.0–7.0 Hz     0.14–0.25 s    Social behavior and attention
Why these boundaries? Band definitions follow Kuga et al. (2022). The 7 Hz theta ceiling reflects the GCaMP6s bandwidth limit rather than the traditional EEG theta ceiling of 8 Hz.

Band power integration

Total power in a band is computed by integrating the PSD using the trapezoidal rule:

$$ P_{\text{band}} = \int_{f_{\text{lo}}}^{f_{\text{hi}}} S(f) \, df \approx \sum_{i} \frac{(f_{i+1} - f_i)\big(S(f_i) + S(f_{i+1})\big)}{2} $$

Fractional band power

For cross-neuron comparison, band power is normalized by total power:

$$ P_{\text{frac}}^{(\text{band})} = \frac{P_{\text{band}}}{P_{\text{infraslow}} + P_{\text{slow}} + P_{\text{delta}} + P_{\text{theta}}} $$

This yields a value in [0, 1] representing what fraction of the neuron's power resides in each band. Fractional power is used for spectral clustering.
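A sketch of both computations, using `scipy.signal.welch` and `scipy.integrate.trapezoid`; the segment length and the synthetic theta-dominated trace are assumptions for illustration:

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

BANDS = {"infraslow": (0.01, 0.1), "slow": (0.1, 1.0),
         "delta": (1.0, 4.0), "theta": (4.0, 7.0)}

def band_powers(x, fs=30.0, nperseg=4096):
    """Raw and fractional band powers from a Welch PSD estimate."""
    f, S = welch(x, fs=fs, nperseg=min(x.size, nperseg))
    raw = {}
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f <= hi)
        raw[name] = trapezoid(S[mask], f[mask])   # trapezoidal integration
    total = sum(raw.values())
    frac = {name: p / total for name, p in raw.items()}
    return raw, frac

# Synthetic theta-dominated trace: a 5 Hz tone plus weak noise
fs = 30.0
t = np.arange(0, 300, 1 / fs)                     # 5 minutes
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 5.0 * t) + 0.1 * rng.standard_normal(t.size)
raw, frac = band_powers(x, fs)
```

The long segment length (4096 samples, ~0.007 Hz resolution) is needed so the infraslow band contains enough PSD bins to integrate over; for the 1-second classification windows, only the higher bands are resolvable.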

Role in this project: Band power integration converts the raw PSD curve into interpretable numbers — one per frequency band per neuron or window. Raw band powers are used as 4 of the 6 classification features (A10) and for the social-vs-solo statistical tests (Figure 7). Fractional band powers normalize out overall signal brightness, enabling fair comparison across neurons with different firing rates — these are the inputs to K-means spectral clustering (A17) that discover the two neuron subpopulations.

Bandpass Filtering: Butterworth

Purpose of bandpass filtering

In Section A7, we computed how much total power a neuron has in each frequency band. That tells us, on average, how strong each band is — but it collapses the entire recording into a single number. A bandpass filter does something different: it isolates a specific frequency band as a time-varying signal, so we can watch how that band's activity rises and falls moment to moment.

For example, applying a 4–7 Hz bandpass filter to a neuron's trace removes all activity outside the theta range. The output is a new waveform that oscillates purely at theta frequencies. This lets us overlay theta activity onto behavioral timestamps and visually inspect whether theta power increases during social contact (as shown in Figure 5 of the main paper).

How a bandpass filter works

Every filter has a frequency response \( H(f) \) that describes how much it amplifies or suppresses each frequency. For a bandpass filter, frequencies inside the passband (e.g., 4–7 Hz) pass through with gain ≈ 1 (unchanged), while frequencies outside the passband are multiplied by a gain close to 0 (suppressed). The transition between passband and stopband is not instantaneous — it follows a gradual rolloff curve.

The Butterworth design is defined by having the flattest possible response inside the passband. For a lowpass Butterworth filter of order \( N \) with cutoff \( f_c \):

$$ |H(f)|^2 = \frac{1}{1 + \left(\frac{f}{f_c}\right)^{2N}} $$

At the cutoff frequency \( f = f_c \), the gain is exactly \( 1/\sqrt{2} \approx 0.707 \) (i.e., power is halved, a –3 dB point). Frequencies well below \( f_c \) pass through nearly unchanged; frequencies well above \( f_c \) are attenuated. For a bandpass filter, two cutoffs are combined: a lower cutoff \( f_{\text{lo}} \) and an upper cutoff \( f_{\text{hi}} \).


Figure A2. Butterworth bandpass frequency response for the theta band (4–7 Hz). Order 4 (blue) produces a sharper rolloff than order 2 (gray dashed). The passband region is shaded.
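A minimal sketch assuming a SciPy implementation (`butter` plus `sosfiltfilt`); the zero-phase filtering choice and the two-tone test signal are assumptions for illustration. Filtering forward and backward cancels phase shift but doubles the effective attenuation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 30.0
# Order-4 Butterworth bandpass for the theta band, as second-order sections
sos = butter(N=4, Wn=[4.0, 7.0], btype="bandpass", fs=fs, output="sos")

# Two-tone test: 2 Hz (delta) + 5 Hz (theta); only the theta tone should survive
t = np.arange(0, 20, 1 / fs)
x = np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.sin(2 * np.pi * 5.0 * t)
y = sosfiltfilt(sos, x)   # forward-backward filtering: zero phase shift
```

The second-order-sections (`sos`) form is numerically more stable than transfer-function coefficients, which matters for the very low infraslow cutoffs mentioned above.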

Filter order

The order \( N \) controls how sharply the filter transitions from passband to stopband. Higher orders produce sharper rolloff, but they also introduce more ringing (oscillatory artifacts near the edges) and can become numerically unstable. We use order \( N = 4 \):

Why order 4? Standard choice in neuroscience. Order 2 produces too gentle a rolloff (neighboring bands bleed into each other). Order 6+ can cause numerical errors at very low frequencies like the infraslow band (0.01 Hz). Order 4 provides a sharp enough transition without instability.
Role in this project: Bandpass filtering is used exclusively as a visualization tool, not in the quantitative pipeline. Combined with the Hilbert envelope (A9), it produces the band-decomposed time series in Figure 5 — making it possible to see theta and delta activity rise and fall in lockstep with social contact. This visual evidence was critical for building intuition about which frequency bands carry behavioral information, guiding the quantitative analysis that follows. The classification and clustering pipelines use Welch PSD band power integration (A6–A7) instead, which summarizes power per window as a single number.

Hilbert Transform and Signal Envelope

The problem: oscillations hide the trend

After bandpass filtering, a neuron's signal oscillates up and down at the band's frequency. During a period of strong theta activity, the filtered signal swings between large positive and negative values. During weak theta activity, the swings are small. The overall "loudness" of the oscillation is changing over time, but the signal itself keeps crossing zero, making it hard to directly compare against behavioral states.

We need a way to extract a smooth, non-negative curve that tracks how strong the oscillation is at each moment — ignoring the up-and-down cycling. This curve is called the envelope, or equivalently the instantaneous amplitude.

Intuition: the analytic signal

Consider a sine wave: \( x(t) = A \cos(2\pi f t) \). Its amplitude is \( A \), but measuring \( A \) from the signal alone is not straightforward because the signal passes through zero twice per cycle. If we had a second copy of the signal shifted by exactly 90° (a quarter cycle) — namely \( A \sin(2\pi f t) \) — we could combine them:

$$ A = \sqrt{(A\cos\theta)^2 + (A\sin\theta)^2} = A $$

The squared cosine and squared sine always sum to 1, so the magnitude recovers the amplitude \( A \) regardless of where we are in the oscillation cycle. The Hilbert transform manufactures that companion signal — it shifts every frequency component by exactly −90° in phase without changing any amplitudes, so we can compute the amplitude at every time point.

The Hilbert Transform

The Hilbert Transform \( \mathcal{H}\{x(t)\} \) shifts every frequency component of \( x(t) \) by exactly −90° in phase, without changing any amplitudes. For a pure cosine, the Hilbert Transform produces a sine. For a complex, multi-frequency signal (like our bandpass-filtered neural trace), it shifts each constituent frequency independently by 90°.

Combining the original signal with its Hilbert Transform produces the analytic signal:

$$ z(t) = x(t) + j \cdot \mathcal{H}\{x(t)\} $$

Here \( j = \sqrt{-1} \). The analytic signal is complex-valued: the real part is the original signal, and the imaginary part is the 90°-shifted version.

Extracting the envelope

The envelope is the magnitude (absolute value) of the analytic signal at each time point:

$$ A(t) = |z(t)| = \sqrt{x(t)^2 + \mathcal{H}\{x(t)\}^2} $$

This produces a smooth, non-negative curve that traces the peaks of the oscillation. When the oscillation is large, the envelope is high. When the oscillation is small, the envelope is low. The rapid up-and-down cycling is removed, leaving only the slow amplitude modulation.


Figure A4. The Hilbert envelope (red) traces the instantaneous amplitude of a bandpass-filtered oscillation (gray). The envelope is always non-negative and smooth, making it easy to compare against behavioral states.
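Assuming SciPy's `hilbert` (which returns the analytic signal directly), envelope extraction is two lines; the amplitude-modulated test tone below is synthetic:

```python
import numpy as np
from scipy.signal import hilbert

fs = 30.0
t = np.arange(0, 20, 1 / fs)
amp = np.linspace(0.2, 1.0, t.size)        # slowly growing oscillation strength
x = amp * np.sin(2 * np.pi * 5.0 * t)      # 5 Hz carrier with varying amplitude

z = hilbert(x)            # analytic signal: x(t) + j*H{x(t)}
envelope = np.abs(z)      # instantaneous amplitude A(t)
```

Away from the edges (the FFT-based transform assumes periodicity, so the first and last seconds are less reliable), the recovered envelope closely tracks the true amplitude ramp.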

Why not just take the absolute value?

Taking \( |x(t)| \) directly would produce a bumpy, pulsating curve (flipping negative half-cycles upward) rather than a smooth envelope. The Hilbert approach yields a genuinely smooth curve because the 90° companion fills in the gaps between peaks, providing continuous amplitude information.

Role in this project: Like the bandpass filtering it follows (A8), the Hilbert envelope serves exclusively as a visualization tool, not as part of the quantitative pipeline. Applied to each bandpass-filtered trace, it produces the band-decomposed time series in Figure 5, making it possible to see theta and delta activity rise and fall in lockstep with social contact. That visual evidence guided the quantitative analysis; the classification and clustering pipelines instead use Welch PSD band power integration (A6–A7), which summarizes power per window as a single number.

Appendix III

Neural Population Analysis

Feature Extraction Pipeline

Purpose: Convert raw calcium traces into a compact set of numbers that a classifier can use — 6 spectral features per 1-second window.

Windowing and labeling

Each neuron's signal is divided into 1-second, non-overlapping windows (30 frames). Each window is labeled by behavioral purity: it receives the social or solo label only when its frames consistently reflect that behavioral state.

Why 1-second windows? This is the shortest window that still permits PSD estimation across all four bands (delta's lowest frequency is 1 Hz, requiring at least one full cycle). Shorter windows degrade frequency resolution; longer windows reduce the number of labeled windows and introduce label ambiguity. This was a practical tradeoff, not formally optimized.

Per-window processing

Each window is linearly detrended, then its Welch PSD is computed and integrated into band powers.

The 6 features

$$ \mathbf{x} = \begin{bmatrix} P_{\text{infraslow}} \\ P_{\text{slow}} \\ P_{\text{delta}} \\ P_{\text{theta}} \\ H_{\text{spectral}} \\ P_{\text{theta}} / P_{\text{delta}} \end{bmatrix} $$
Feature            Description
Infraslow power    Integrated PSD, 0.01–0.1 Hz
Slow power         Integrated PSD, 0.1–1 Hz
Delta power        Integrated PSD, 1–4 Hz
Theta power        Integrated PSD, 4–7 Hz
Spectral entropy   Shannon entropy of the normalized PSD (see below)
Theta/delta ratio  \( P_\theta / \max(P_\delta, \, \varepsilon) \), where \( \varepsilon = 10^{-12} \)

Spectral entropy

The PSD is normalized to a probability distribution, then Shannon entropy is computed:

$$ H = -\sum_{i} p_i \ln(p_i), \quad \text{where } p_i = \frac{S(f_i)}{\sum_k S(f_k)} $$

Low entropy = power concentrated at few frequencies. High entropy = power spread broadly.
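A minimal sketch of the computation, with two extreme toy spectra to anchor the intuition:

```python
import numpy as np

def spectral_entropy(S):
    """Shannon entropy of a PSD normalized to a probability distribution."""
    p = S / S.sum()
    p = p[p > 0]                         # convention: 0 * ln(0) = 0
    return float(-np.sum(p * np.log(p)))

peaked = np.array([0.0, 1.0, 0.0, 0.0])  # all power in one bin
flat = np.ones(4)                        # power spread evenly
```

`spectral_entropy(peaked)` is 0, while `spectral_entropy(flat)` reaches the maximum possible value ln(4) for a 4-bin spectrum.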

Theta/delta ratio

This ratio captures spectral shape (which band dominates) rather than raw power magnitude, making it robust to overall signal brightness. It emerged as the top discriminative feature for Cluster 1 classification.

K-Means Clustering

Purpose: Discover whether neurons form distinct spectral subpopulations — splitting the population before classification reveals a signal that the whole-population average hides.

Neurons are clustered by their 4-dimensional fractional band power vectors (infraslow, slow, delta, theta fractions summing to 1). K-Means iterates between two steps until convergence:

  1. Assign each neuron to its nearest centroid by Euclidean distance:
    $$ d(\mathbf{x}, \boldsymbol{\mu}_j) = \sqrt{\sum_{i=1}^{4} (x_i - \mu_{j,i})^2} $$
  2. Update each centroid to the mean of its assigned neurons:
    $$ \boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x} \in C_j} \mathbf{x} $$

Cluster selection

We evaluated k = 2 through 6 using the silhouette score:

$$ s(i) = \frac{b(i) - a(i)}{\max\!\big(a(i), b(i)\big)} $$

where \( a(i) \) = mean intra-cluster distance and \( b(i) \) = mean nearest-cluster distance. Range: [−1, +1]; higher is better. k = 2 consistently achieved the best score (~0.399) across all 18 sessions.
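An illustrative sketch of the k-selection loop, assuming a scikit-learn implementation; the Dirichlet-sampled toy profiles stand in for real fractional band-power vectors and are not from the dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Toy fractional band-power vectors (each sums to 1): a delta-dominated
# majority and a theta-enriched minority; Dirichlet parameters are illustrative
majority = rng.dirichlet([1, 2, 8, 2], size=120)
minority = rng.dirichlet([1, 2, 2, 8], size=40)
X = np.vstack([majority, minority])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
```

Because K-Means is sensitive to initialization, `n_init=10` restarts each fit from multiple random centroids and keeps the best solution.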

Classification

Purpose: Test whether spectral features can actually predict social vs. solo behavior — the classifier is the evaluation tool, not the contribution.

Standardization

Features are z-scored using training-set statistics only:

$$ x_{\text{scaled}} = \frac{x - \mu_{\text{train}}}{\sigma_{\text{train}}} $$

Classifiers

Three linear classifiers were tested.

Class balancing

Social and solo windows are often imbalanced. Balanced class weights increase the misclassification penalty for the minority class, preventing the classifier from defaulting to the majority label.

AUC-ROC

The primary metric is Area Under the ROC Curve. The ROC plots true positive rate vs. false positive rate across all classification thresholds. AUC = 0.5 indicates chance; AUC = 1.0 indicates perfect separation.

Spatial Analysis

Purpose: Check whether the two spectral clusters are spatially segregated or intermixed — ruling out the possibility that cluster identity is just a location artifact.

Centroid extraction

Each neuron's spatial footprint (from CNMF-E source extraction) is reduced to a centroid via center of mass:

$$ \bar{y} = \frac{\sum_{y,x} y \cdot A(y,x)}{\sum_{y,x} A(y,x)}, \quad \bar{x} = \frac{\sum_{y,x} x \cdot A(y,x)}{\sum_{y,x} A(y,x)} $$
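The center-of-mass formula translates directly into NumPy; the uniform square footprint below is a toy stand-in for a real CNMF-E footprint:

```python
import numpy as np

def footprint_centroid(A):
    """Center of mass (y, x) of a nonnegative spatial footprint A(y, x)."""
    ys, xs = np.indices(A.shape)
    total = A.sum()
    return (ys * A).sum() / total, (xs * A).sum() / total

# Uniform 3x3 footprint whose center of mass is at (2, 3)
A = np.zeros((5, 7))
A[1:4, 2:5] = 1.0
cy, cx = footprint_centroid(A)
```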

Nearest-neighbor distances

Euclidean distances between centroids are computed to test spatial organization.

If Cluster 1 neurons were spatially grouped, intra-cluster NN would be smaller than cross-cluster NN. The opposite is observed: Cluster 1 neurons are spatially intermixed with Cluster 0, indicating spectral identity is an intrinsic cellular property, not location-dependent.

Weighted intensity maps

Each neuron's spatial footprint is weighted by its band power and overlaid to produce maps of the spatial distribution of delta and theta activity across the field of view.

Appendix IV

Evaluation

Cross-Validation

We use GroupKFold (5 folds) with recording sessions as groups. In each fold, all windows from a set of sessions are held out for testing while the remaining sessions are used for training. No data from a test session appears during training.

Performance is reported as the mean AUC across folds.


Figure A5. GroupKFold cross-validation. In each fold, entire recording sessions are held out for testing (red). No windows from a test session appear in training (blue).

Why GroupKFold? Standard KFold could place windows from the same session in both train and test sets. Consecutive windows within a session are temporally correlated, so this would leak information and inflate performance. GroupKFold prevents this by keeping entire sessions together.
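Assuming a scikit-learn implementation, the grouping guarantee is easy to verify in code; the session and window counts below mirror the dataset's layout but the features and labels are random placeholders:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy layout: 18 sessions with 20 one-second windows each (sizes illustrative)
groups = np.repeat(np.arange(18), 20)
rng = np.random.default_rng(0)
X = rng.standard_normal((groups.size, 6))   # 6 spectral features per window
y = rng.integers(0, 2, size=groups.size)    # social / solo labels

folds = list(GroupKFold(n_splits=5).split(X, y, groups))
for train_idx, test_idx in folds:
    # Entire sessions stay on one side of the split
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```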

Permutation Testing

A permutation test determines whether classification performance exceeds what would be expected by chance. The procedure:

  1. Compute the real cross-validated AUC: \( \text{AUC}_{\text{real}} \)
  2. Repeat 100 times: shuffle behavior labels randomly, retrain and retest, record \( \text{AUC}_{\text{perm}}^{(i)} \)
  3. Compute the p-value:
    $$ p = \frac{\#\{\text{AUC}_{\text{perm}}^{(i)} \geq \text{AUC}_{\text{real}}\}}{n_{\text{perm}}} $$

If \( p < 0.05 \), the classifier performs significantly better than chance.

We run this test three times: once on the full population, and once on each spectral cluster separately. This is the key comparison — the whole-population classifier fails the permutation test (p = 0.196), but the Cluster 1 classifier passes (p = 0.020), revealing that the social signal is concentrated in the theta-enriched minority rather than spread across all neurons.


Figure A6. Permutation test. The gray histogram shows the distribution of AUC scores from 100 label-shuffled runs (null distribution). The red line marks the real classifier's AUC. The p-value is the fraction of shuffled AUCs that equal or exceed the real AUC.

Why 100 shuffles? This gives p-value resolution to 0.01, sufficient for significance testing at α = 0.05. More shuffles (e.g., 1,000) would give finer resolution at 10× the computational cost.
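The procedure can be sketched end to end with scikit-learn; here `LogisticRegression` stands in for whichever linear classifier is evaluated, and the data are synthetic, with labels deliberately tied to feature 0 so the real classifier beats its own null:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
# Labels driven by feature 0, so real performance should exceed chance
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(int)

def cv_auc(X, y):
    proba = cross_val_predict(LogisticRegression(), X, y, cv=5,
                              method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)

auc_real = cv_auc(X, y)
null = np.array([cv_auc(X, rng.permutation(y)) for _ in range(100)])
p = np.mean(null >= auc_real)   # fraction of shuffles matching or beating it
```

Shuffling the labels destroys any feature-label relationship while preserving class balance and feature statistics, so the null distribution reflects exactly what the classifier could achieve by chance on this data.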

Wilcoxon Rank-Sum Test

To test whether band power differs between social and solo windows, we use the Wilcoxon rank-sum test (also called the Mann-Whitney U test). This is a non-parametric test: it makes no assumptions about the shape of the data distribution, only that the two groups are independent.

The procedure:

  1. Pool all social and solo band-power values and rank them from smallest to largest.
  2. Sum the ranks belonging to one group (e.g., social): this is the test statistic \( W \).
  3. Compare \( W \) to its expected distribution under the null hypothesis (no difference between groups) to obtain a p-value.

A small p-value means the band-power values in one condition tend to be systematically larger than the other — i.e., the two rank distributions are shifted apart. This test generates the p-values that Bonferroni correction (A16) is then applied to.

Why non-parametric? Band power distributions are typically right-skewed (many low values, a few very high ones). The Wilcoxon test does not assume normality, making it robust to these skewed distributions.
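Assuming SciPy's implementation (`scipy.stats.ranksums`), the test is one call; the lognormal samples below mimic right-skewed band powers with the "social" condition shifted upward, and are synthetic:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
# Right-skewed "band power" samples; the social condition is shifted upward
solo = rng.lognormal(mean=0.0, sigma=1.0, size=300)
social = rng.lognormal(mean=0.4, sigma=1.0, size=300)

stat, p = ranksums(social, solo)   # positive statistic: social ranks higher
```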

Cohen's d Effect Size

A p-value indicates whether an effect exists; Cohen's d quantifies its magnitude. It measures the distance between two group means in units of pooled standard deviation:

$$ d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \quad s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} $$
|d|    Interpretation
~0.2   Small effect
~0.5   Medium effect
~0.8   Large effect

Theta-band social vs. solo: d = 0.235 (small but consistent across 14/18 sessions). Delta-band: d = 0.069.
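The pooled-standard-deviation formula translates directly to NumPy; the two toy samples below are chosen so the expected d is easy to verify by hand:

```python
import numpy as np

def cohens_d(x1, x2):
    """Mean difference in units of pooled standard deviation."""
    n1, n2 = x1.size, x2.size
    sp = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1))
                 / (n1 + n2 - 2))
    return float((x1.mean() - x2.mean()) / sp)

# Means differ by 1, each sample variance is 2.5, so d = 1 / sqrt(2.5)
group_a = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
group_b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
```

Note the sample variances use `ddof=1` (Bessel's correction), matching the \( n - 1 \) terms in the pooled formula.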

Multiple Comparison Correction

Testing 4 frequency bands simultaneously inflates the false-positive rate. The Bonferroni correction divides the significance threshold by the number of tests:

$$ \alpha_{\text{corrected}} = \frac{\alpha}{m} = \frac{0.05}{4} = 0.0125 $$

Delta and theta pass Bonferroni correction (both p < 0.001), with theta showing the strongest effect. Infraslow (p = 0.435) and slow (p = 0.088) do not reach significance.