Design Matrices Construction

There are two main supported uses for Category in NeMoS: as a standalone predictor (main effect), and multiplied by a continuous basis to estimate category-specific tuning curves. Both are shown below.

Standalone Categorical Predictors

To add a category as a main effect, drop one column after calling compute_features. The dropped category becomes the reference level and all remaining coefficients are contrasts against it.

For example, consider an experiment in which a subject makes either a leftward or a rightward turn on each trial, and we want to include the turn side as a predictor.

import numpy as np
import nemos as nmo

# Simulate data: 4 samples, two turn-side labels
turn_side = np.array(["L", "L", "R", "R"])
counts = np.array([10, 5, 10, 0])

cat_basis = nmo.basis.Category(["L", "R"])
X_cat = cat_basis.compute_features(turn_side)
X_cat = X_cat[:, 1:]  # "L" is the reference; remaining column codes "R" vs "L"
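To see why the remaining coefficient reads as a contrast against the reference level, the sketch below works out the unregularized maximum-likelihood solution by hand. It assumes a Poisson GLM with a log link, which the text above does not state, and it makes no NeMoS calls. With an intercept and a single indicator column the model is saturated, so the fitted rates equal the per-category mean counts.

```python
import numpy as np

turn_side = np.array(["L", "L", "R", "R"])
counts = np.array([10, 5, 10, 0])

# Indicator column for "R"; "L" is the reference level.
is_right = (turn_side == "R").astype(float)

# Assumption: Poisson GLM with log link,
#   rate = exp(intercept + coef_right * is_right).
# With one binary predictor, the maximum-likelihood rates are the group means.
mean_left = counts[is_right == 0].mean()
mean_right = counts[is_right == 1].mean()

intercept = np.log(mean_left)                          # log rate of the reference level "L"
coef_right = np.log(mean_right) - np.log(mean_left)    # "R" vs "L" contrast on the log scale

print("intercept:", intercept, "coef_right:", coef_right)
```

Exponentiating the intercept recovers the mean count for "L", and exponentiating the sum of the two coefficients recovers the mean count for "R".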

Warning

NeMoS GLMs include an intercept. Including all columns of a Category basis as a standalone predictor introduces perfect collinearity: the one-hot columns sum to one in every row, which reproduces the intercept column. Always drop one column per categorical variable when using categories as main effects. For a detailed discussion of identifiability and the effect of regularization, see Technical Note: Redundancy in Categorical Designs.
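The collinearity can be checked numerically. The following is a minimal NumPy sketch that builds the one-hot columns by hand and adds an explicit column of ones to stand in for the implicit intercept; it illustrates the rank argument and does not reproduce NeMoS internals.

```python
import numpy as np

turn_side = np.array(["L", "L", "R", "R"])
levels = np.array(["L", "R"])

# Hand-built one-hot encoding: one column per level.
one_hot = (turn_side[:, None] == levels[None, :]).astype(float)

# Explicit column of ones standing in for the GLM intercept.
intercept = np.ones((len(turn_side), 1))

full = np.hstack([intercept, one_hot])            # intercept + all levels: 3 columns
reduced = np.hstack([intercept, one_hot[:, 1:]])  # intercept + "R" only: 2 columns

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

print("full:   ", full.shape[1], "columns, rank", rank_full)
print("reduced:", reduced.shape[1], "columns, rank", rank_reduced)
```

The full design has more columns than its rank, so its coefficients are not uniquely determined. The reduced design has full column rank.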

Splitting a Continuous Variable by Category

The Category basis in NeMoS also allows you to estimate category-specific tuning curves by multiplying it with a continuous basis.

Continuing the previous example, suppose we also recorded the animal's average speed on each trial and want to learn how the neuron's response to speed depends on the turn side. Multiplying the Category basis by a continuous basis produces the appropriate design matrix:

speed = np.array([10., 3., 2., 20.])

# Category * continuous basis: one set of basis functions per category
bas = nmo.basis.Category(["L", "R"]) * nmo.basis.RaisedCosineLinearEval(3)
X = bas.compute_features(turn_side, speed)
print("X.shape: ", X.shape)  # (4, 6): 3 basis functions × 2 categories

X.shape:  (4, 6)
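Conceptually, the product gives each category its own copy of the continuous features, set to zero on samples that belong to another category. The NumPy sketch below illustrates the idea with `toy_basis`, a hypothetical three-function stand-in for the raised-cosine basis. The category-major column ordering is also an assumption and may differ from the ordering NeMoS uses.

```python
import numpy as np

turn_side = np.array(["L", "L", "R", "R"])
speed = np.array([10., 3., 2., 20.])
levels = np.array(["L", "R"])


def toy_basis(x):
    """Hypothetical stand-in for a 3-function continuous basis (not a NeMoS basis)."""
    x = x / x.max()
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)  # (n_samples, 3)


one_hot = (turn_side[:, None] == levels[None, :]).astype(float)  # (4, 2)
B = toy_basis(speed)                                              # (4, 3)

# Row-wise product: for each sample, multiply every basis feature by every
# category indicator, then flatten to (n_samples, n_categories * n_basis).
X = (one_hot[:, :, None] * B[:, None, :]).reshape(len(speed), -1)

print("X.shape:", X.shape)
```

In this layout the first three columns carry the speed features for "L" trials and the last three carry them for "R" trials, so each category gets its own set of weights on the same basis functions.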

Complex Designs with `patsy` and `formulaic`

For designs involving multiple categorical variables, higher-order interactions, or non-default contrast coding (sum-to-zero, Helmert, etc.), use patsy or formulaic to construct the design matrix. Those libraries resolve redundancies automatically and support a wide range of coding schemes.
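As an illustration of a non-default scheme, here is a hand-rolled NumPy sketch of sum-to-zero coding. The helper `sum_to_zero` and the labels "A", "B", "C" are made up for this example; in practice the formula libraries build these columns for you. The convention assumed here omits the last level and codes it as -1 in every column.

```python
import numpy as np


def sum_to_zero(labels):
    """Hypothetical helper: sum-to-zero (deviation) coding with the last level omitted."""
    levels = np.unique(labels)  # sorted unique levels
    one_hot = (labels[:, None] == levels[None, :]).astype(float)
    # Keep k - 1 columns; samples of the last level get -1 in every column.
    return one_hot[:, :-1] - one_hot[:, [-1]]


labels = np.array(["A", "B", "C", "A"])  # made-up labels with three levels
X_sum = sum_to_zero(labels)
print(X_sum)
```

Paired with an intercept, coefficients under this coding are read as deviations from the average across levels rather than as contrasts against a single reference level.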

Both libraries accept the same formula and produce equivalent design matrices; pick whichever you prefer.

patsy

import pandas as pd
import patsy

data = pd.DataFrame({
    'stimulus': ['Tri', 'Sq', 'Tri', 'Sq'],
    'context':  ['C',   'C',   'S',  'S'],
    'counts': [10, 5, 2, 0],
})

formula = "stimulus + context + stimulus:context"
design_df = patsy.dmatrix(formula, data, return_type="dataframe")

# patsy adds an intercept;
# drop it since NeMoS GLMs include one implicitly
design_df = design_df.drop(columns=["Intercept"])

formulaic

import pandas as pd
import formulaic

data = pd.DataFrame({
    'stimulus': ['Tri', 'Sq', 'Tri', 'Sq'],
    'context':  ['C',   'C',   'S',  'S'],
    'counts': [10, 5, 2, 0],
})

formula = "stimulus + context + stimulus:context"
design_df = formulaic.model_matrix(formula, data)

# formulaic adds an intercept;
# drop it since NeMoS GLMs include one implicitly
design_df = design_df.drop(columns=["Intercept"])

print("Design matrix:\n\n", design_df)

Design matrix:

    stimulus[T.Tri]  context[T.S]  stimulus[T.Tri]:context[T.S]
0              1.0           0.0                           0.0
1              0.0           0.0                           0.0
2              1.0           1.0                           1.0
3              0.0           1.0                           0.0

Full one-hot encoding of each term in the formula (the two categorical variables and their interaction) would have produced 8 columns (2 + 2 + 4), 4 of which would be redundant. Both patsy and formulaic detect and drop the redundant columns automatically, which keeps the model coefficients identifiable.
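The column count can be verified with a short NumPy sketch that builds the full one-hot design by hand, without patsy or formulaic. The data are tiled three times so that the rank is not trivially capped by the number of rows; the tiling and the helper `one_hot` are illustrative choices.

```python
import numpy as np

# Same four trials as above, repeated three times.
stimulus = np.tile(np.array(["Tri", "Sq", "Tri", "Sq"]), 3)
context = np.tile(np.array(["C", "C", "S", "S"]), 3)


def one_hot(labels):
    """Hypothetical helper: one-hot encode labels, with levels in sorted order."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float)


S = one_hot(stimulus)  # columns: Sq, Tri
C = one_hot(context)   # columns: C, S
SC = (S[:, :, None] * C[:, None, :]).reshape(len(stimulus), -1)  # one column per cell

full = np.hstack([S, C, SC])  # 2 + 2 + 4 = 8 columns

# Reduced coding shown in the printed design matrix, plus an explicit intercept.
reduced = np.column_stack(
    [np.ones(len(stimulus)), S[:, 1], C[:, 1], S[:, 1] * C[:, 1]]
)

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
rank_joint = np.linalg.matrix_rank(np.hstack([full, reduced]))

print(full.shape[1], "columns, rank", rank_full)
print(reduced.shape[1], "columns, rank", rank_reduced)
print("joint rank:", rank_joint)
```

The joint rank equals both individual ranks, so the four reduced columns (intercept included) span the same space as the eight one-hot columns.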

# Fit on the counts stored alongside the predictors in `data`
model = nmo.glm.GLM().fit(design_df.to_numpy(), data["counts"].to_numpy())

The NeMoS Category basis provides a simple one-hot encoding of categorical variables, which is just one of the many encoding schemes that patsy provides.

For example, the encoding for one categorical predictor in NeMoS,

nmo.basis.Category(["Tri","Sq"]).compute_features(data["stimulus"])

Array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.]], dtype=float32)

is equivalent to patsy’s,

patsy.dmatrix("0 + stimulus", data, return_type="dataframe")

   stimulus[Sq]  stimulus[Tri]
0           0.0            1.0
1           1.0            0.0
2           0.0            1.0
3           1.0            0.0

Similarly, the encoding for the interaction of two categories,

interaction = nmo.basis.Category(["Tri","Sq"]) * nmo.basis.Category(["C","S"])
interaction.compute_features(data["stimulus"], data["context"])

Array([[0., 0., 1., 0.],
       [1., 0., 0., 0.],
       [0., 0., 0., 1.],
       [0., 1., 0., 0.]], dtype=float32)

is equivalent to patsy’s

patsy.dmatrix("0 + context:stimulus", data, return_type="dataframe")

   context[C]:stimulus[Sq]  context[S]:stimulus[Sq]  context[C]:stimulus[Tri]  context[S]:stimulus[Tri]
0                      0.0                      0.0                       1.0                       0.0
1                      1.0                      0.0                       0.0                       0.0
2                      0.0                      0.0                       0.0                       1.0
3                      0.0                      1.0                       0.0                       0.0
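Both encodings above are the row-wise product of the two one-hot matrices, with one column per (stimulus, context) cell. The NumPy sketch below rebuilds the same array by hand. The sorted level order and the stimulus-major column layout are inferred from the printed outputs, not taken from either library's documentation.

```python
import numpy as np

stimulus = np.array(["Tri", "Sq", "Tri", "Sq"])
context = np.array(["C", "C", "S", "S"])

# One-hot matrices with levels in sorted order (inferred from the outputs above).
stim_levels = np.array(["Sq", "Tri"])
ctx_levels = np.array(["C", "S"])
S = (stimulus[:, None] == stim_levels[None, :]).astype(float)
C = (context[:, None] == ctx_levels[None, :]).astype(float)

# Row-wise outer product, flattened stimulus-major:
# columns are (Sq, C), (Sq, S), (Tri, C), (Tri, S).
interaction = (S[:, :, None] * C[:, None, :]).reshape(len(stimulus), -1)
print(interaction)
```

Each row contains a single 1 marking the cell that the trial belongs to.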

NeMoS Category covers only basic encodings; for more complex design schemes, see patsy and formulaic.
