nemos.identifiability_constraints.apply_identifiability_constraints

nemos.identifiability_constraints.apply_identifiability_constraints(feature_matrix, add_intercept=True, warn_if_float32=True)

Apply identifiability constraints to a design matrix X.

Removes columns from X until it is full rank, which ensures that the GLM (Generalized Linear Model) maximum-likelihood solution is unique. This is particularly important for B-spline and cyclic B-spline bases: by construction, their basis functions sum to 1 at every sample, so the feature matrix becomes rank deficient when combined with an intercept.

For GLMs, this rank deficiency means that different sets of coefficients might yield identical predicted rates and log-likelihoods, complicating parameter learning, especially in the absence of regularization.
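The rank deficiency described above can be reproduced with NumPy alone. The following is a minimal sketch that uses a toy feature matrix whose rows sum to 1 as a stand-in for B-spline features (it is not a nemos basis, and all variable names are illustrative). It shows that adding an intercept column leaves the rank one short of the number of columns, and that two different coefficient vectors produce the same linear predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_basis = 200, 5

# Toy "partition of unity" features: non-negative rows that sum to 1.
# This stands in for B-spline features; it is not a nemos basis.
raw = rng.uniform(size=(n_samples, n_basis))
basis = raw / raw.sum(axis=1, keepdims=True)

# Design matrix with an explicit intercept column.
design = np.column_stack([np.ones(n_samples), basis])
rank = np.linalg.matrix_rank(design)  # one less than design.shape[1]

# Build a second coefficient vector: add c to the intercept and
# subtract c from every basis weight. Because the basis columns sum
# to 1, the two changes cancel in the linear predictor.
coef_a = rng.normal(size=n_basis + 1)
c = 0.7
coef_b = coef_a.copy()
coef_b[0] += c
coef_b[1:] -= c
same_prediction = np.allclose(design @ coef_a, design @ coef_b)
```

Since the linear predictors coincide, any GLM rate and log-likelihood computed from them coincide as well, which is the non-uniqueness the constraints are meant to remove.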

For very large feature matrices generated by a sum of low-dimensional basis components, consider apply_identifiability_constraints_by_basis_component.

Parameters:
  • feature_matrix (NDArray | JaxArray) – The design matrix before applying the identifiability constraints.

  • add_intercept (bool) – Set to True if your model will add an intercept term, False otherwise.

  • warn_if_float32 (bool) – Whether to issue a warning if the feature matrix dtype is float32.

Return type:

Tuple[NDArray, NDArray[int]]

Returns:

  • constrained_x – The adjusted design matrix with redundant columns dropped and columns mean-centered.

  • kept_columns – The columns that have been kept.

Examples

>>> import numpy as np
>>> from nemos.identifiability_constraints import apply_identifiability_constraints
>>> from nemos.basis import BSplineEval
>>> from nemos.glm import GLM
>>> import jax
>>> jax.config.update('jax_enable_x64', True)
>>> # define a feature matrix
>>> bas = BSplineEval(5) + BSplineEval(6)
>>> feature_matrix = bas.compute_features(np.random.randn(100), np.random.randn(100))
>>> # apply constraints
>>> constrained_x, kept_columns = apply_identifiability_constraints(feature_matrix)
>>> constrained_x.shape
(100, 9)
>>> kept_columns
array([ 1,  2,  3,  4,  6,  7,  8,  9, 10])
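To illustrate the idea of removing columns until the matrix is full rank, here is a hedged NumPy sketch. The helper `drop_redundant_columns` is hypothetical and is not the nemos implementation: it greedily keeps a column only if it increases the rank, it does not mean-center the result, and the columns it drops may differ from those dropped by apply_identifiability_constraints.

```python
import numpy as np


def drop_redundant_columns(x, add_intercept=True):
    """Greedily keep columns that increase the rank (illustrative only)."""
    x = np.asarray(x, dtype=np.float64)
    n_samples = x.shape[0]
    # Start from the intercept column if the model will include one.
    if add_intercept:
        current = np.ones((n_samples, 1))
        current_rank = 1
    else:
        current = np.empty((n_samples, 0))
        current_rank = 0
    kept = []
    for j in range(x.shape[1]):
        candidate = np.column_stack([current, x[:, j]])
        candidate_rank = np.linalg.matrix_rank(candidate)
        # Keep column j only if it is not a linear combination of
        # the intercept and the columns kept so far.
        if candidate_rank > current_rank:
            current, current_rank = candidate, candidate_rank
            kept.append(j)
    kept = np.asarray(kept, dtype=int)
    return x[:, kept], kept


# Toy stand-in for an additive basis with 5 + 6 features: each block
# has rows summing to 1 (these are not real B-spline features).
rng = np.random.default_rng(1)
block_a = rng.uniform(size=(100, 5))
block_a /= block_a.sum(axis=1, keepdims=True)
block_b = rng.uniform(size=(100, 6))
block_b /= block_b.sum(axis=1, keepdims=True)
toy_features = np.column_stack([block_a, block_b])

x_kept, kept = drop_redundant_columns(toy_features)
```

In this toy case one column per block is redundant with the intercept, so the sketch keeps 9 of the 11 columns, the same count as in the example above.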

Notes

Compilation is triggered at every loop iteration. For feature matrices with few samples and few columns, this can be slower than pure Python. In practice, design matrices usually have a large number of samples, and running the code on a GPU reduces the computation time significantly.