M-spline basis functions for modeling and data transformation.
M-splines [1] are a type of spline basis function used for smooth curve fitting
and data representation. They are non-negative and integrate to one, making them
suitable for probabilistic models and density estimation. The order of an
M-spline defines its smoothness, with higher orders resulting in smoother
splines.
This class provides functionality to create M-spline basis functions, allowing
for flexible and smooth modeling of data. It inherits from the SplineBasis
abstract class, providing specific implementations for M-splines.
Parameters:
n_basis_funcs (int) – The number of basis functions to generate. More basis functions allow for
more flexible data modeling but can lead to overfitting.
order (int) – The order of the splines used in basis functions. Must be between [1,
n_basis_funcs]. Default is 2. Higher order splines have more continuous
derivatives at each interior knot, resulting in smoother basis functions.
bounds (Optional[Tuple[float, float]]) – The bounds for the basis domain. The default bounds[0] and bounds[1] are the
minimum and the maximum of the samples provided when evaluating the basis.
If a sample is outside the bounds, the basis will return NaN.
label (Optional[str]) – The label of the basis, intended to be descriptive of the task variable being processed.
For example: velocity, position, spike_counts.
References
Notes
M-splines integrate to 1 over their domain (the area under each curve is 1). Therefore, if the domain
(x-axis) of an M-spline basis is expanded by a factor of \(\alpha\), the values on the co-domain
(y-axis) shrink by a factor of \(1/\alpha\).
For example, if a basis function peaks at 18 over the standard bounds (0, 1), setting the bounds
to (0, 2) lowers its peak to 9, i.e., 18 / 2.
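The \(1/\alpha\) rescaling follows from a change of variables: if \(M\) integrates to 1 on (0, 1), then \(x \mapsto M(x/\alpha)/\alpha\) integrates to 1 on (0, \(\alpha\)). The sketch below checks this numerically on a hypothetical order-2 M-spline with knots (0, 0.5, 1), a triangle of height 2 and area 1. The helper names are illustrative only and are not part of the library.

```python
def hat(x: float) -> float:
    """Hypothetical order-2 M-spline on knots (0, 0.5, 1): height 2, area 1."""
    if 0.0 <= x <= 0.5:
        return 4.0 * x
    if 0.5 < x <= 1.0:
        return 4.0 * (1.0 - x)
    return 0.0


def rescale(f, alpha: float):
    """Stretch f from (0, 1) to (0, alpha), dividing by alpha to keep unit area."""
    return lambda x: f(x / alpha) / alpha


def midpoint_integral(f, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


alpha = 2.0
stretched = rescale(hat, alpha)
grid = [j / 1000 for j in range(2001)]  # 0.0, 0.001, ..., 2.0

peak_original = max(hat(x) for x in grid)
peak_stretched = max(stretched(x) for x in grid)
area_original = midpoint_integral(hat, 0.0, 1.0)
area_stretched = midpoint_integral(stretched, 0.0, alpha)
print(peak_original, peak_stretched)  # 2.0 1.0
```

Both areas stay at 1 while the peak halves, which is the same trade-off the 18 to 9 example describes.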
This uses PEP-487 [1] to set the set_{method}_request methods. It
looks up the default request values, which are either set using
__metadata_request__* class attributes or inferred from method signatures.
The __metadata_request__* class attributes are used when a method
does not explicitly accept metadata through its arguments, or when the
developer wants to specify request values for that metadata that
differ from the default None.
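PEP 487 added the __init_subclass__ hook, which runs every time a class is subclassed. The toy below shows how such a hook can generate set_{method}_request methods from __metadata_request__* class attributes. It is a simplified sketch under that assumption, not scikit-learn's implementation: it skips signature inspection and metadata routing, and ToyRequester, _make_setter and _requests are hypothetical names. Python mangles double-underscore names written in a class body, which is why the hook matches on a substring.

```python
def _make_setter(method: str, defaults: dict):
    """Build a set_{method}_request method that updates per-instance requests."""

    def setter(self, **requests):
        unknown = set(requests) - set(defaults)
        if unknown:
            raise TypeError(f"unexpected metadata for {method}: {sorted(unknown)}")
        store = self.__dict__.setdefault("_requests", {})
        store.setdefault(method, dict(defaults)).update(requests)
        return self

    setter.__name__ = f"set_{method}_request"
    return setter


class ToyRequester:
    """Toy base class: generates set_{method}_request methods via PEP 487."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        marker = "__metadata_request__"
        for attr_name, defaults in list(vars(cls).items()):
            # A class-body name like __metadata_request__fit is stored as
            # _ClassName__metadata_request__fit, so look for the substring.
            if marker not in attr_name:
                continue
            method = attr_name.split(marker, 1)[1]  # e.g. "fit"
            setattr(cls, f"set_{method}_request", _make_setter(method, dict(defaults)))


class ToyEstimator(ToyRequester):
    __metadata_request__fit = {"sample_weight": None}  # default request value


est = ToyEstimator().set_fit_request(sample_weight=True)
print(est._requests)  # {'fit': {'sample_weight': True}}
```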
Clone the basis while preserving attributes related to input shapes.
This method ensures that input shape attributes (e.g., _input_shape_product,
_input_shape_) are preserved during cloning. Reinitializing the class,
as the regular sklearn clone does, would drop these attributes and render
cross-validation unusable.
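The difference between a plain re-initializing clone and one that carries input-shape state over can be sketched with a hypothetical ToyBasis. Only the attribute name _input_shape_ comes from the text above; the class and both clone helpers are illustrative, not the library's code.

```python
import copy


class ToyBasis:
    """Hypothetical basis whose input shape is only known after setup."""

    def __init__(self, n_basis_funcs: int):
        self.n_basis_funcs = n_basis_funcs
        self._input_shape_ = None  # input-dependent state, not a constructor argument

    def get_params(self) -> dict:
        return {"n_basis_funcs": self.n_basis_funcs}

    def set_input_shape(self, *shape: int) -> "ToyBasis":
        self._input_shape_ = shape
        return self


def naive_clone(obj):
    """Re-create obj from its constructor parameters (drops input-dependent state)."""
    return type(obj)(**obj.get_params())


def clone_preserving(obj, attrs=("_input_shape_",)):
    """Re-create obj, then copy over the listed input-dependent attributes."""
    new = naive_clone(obj)
    for name in attrs:
        setattr(new, name, copy.deepcopy(getattr(obj, name)))
    return new


basis = ToyBasis(5).set_input_shape(3)
print(naive_clone(basis)._input_shape_)       # None
print(clone_preserving(basis)._input_shape_)  # (3,)
```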
The basis is evaluated at the locations specified in the inputs. For example,
compute_features(np.array([0,.5])) would return the array:
b_1(0) ... b_n(0)
b_1(.5) ... b_n(.5)
where b_i is the i-th basis.
Parameters:
*xi (ArrayLike) – The input samples over which to apply the basis transformation. The samples can be passed
as multiple arguments, each representing a different dimension for multivariate inputs.
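As a concrete sketch of that layout (rows are samples, columns are basis functions), the hypothetical helpers below build the matrix for order-1 M-splines on n equal bins of fixed bounds (0, 1), an assumption made for simplicity. Each function equals n on its bin and 0 elsewhere, so it integrates to 1, and a sample outside the bounds yields a row of NaN, mirroring the bounds behavior described for this class. This illustrates the output shape only; it is not the library's evaluation code.

```python
import math


def order1_mspline_row(x: float, n_basis_funcs: int) -> list[float]:
    """Row [b_1(x), ..., b_n(x)] for order-1 M-splines on n equal bins of (0, 1)."""
    if not 0.0 <= x <= 1.0:
        return [math.nan] * n_basis_funcs  # sample outside the bounds
    active = min(int(x * n_basis_funcs), n_basis_funcs - 1)  # x == 1 joins the last bin
    return [float(n_basis_funcs) if i == active else 0.0 for i in range(n_basis_funcs)]


def design_matrix(samples: list[float], n_basis_funcs: int) -> list[list[float]]:
    """Stack one row per sample: rows are samples, columns are basis functions."""
    return [order1_mspline_row(x, n_basis_funcs) for x in samples]


print(design_matrix([0.0, 0.5], n_basis_funcs=2))  # [[2.0, 0.0], [0.0, 2.0]]
```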
Evaluate the M-spline basis functions at given sample points.
Parameters:
sample_pts (NDArray) – The sample points at which the M-spline is evaluated.
sample_pts is an n-dimensional (n >= 1) array whose first axis indexes the samples, i.e.,
sample_pts.shape[0] == n_samples.
Return type:
NDArray
Returns:
An array where each column corresponds to one M-spline basis function
evaluated at the input sample points. The shape of the array is
(len(sample_pts), n_basis_funcs).
Notes
The implementation uses a recursive definition of M-splines. Boundary
conditions are handled such that the basis functions are non-negative and
integrate to one over the domain defined by the sample points.
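The recursion itself is not spelled out here. A common choice, assumed in what follows, is the Curry–Schoenberg recursion for M-splines of order \(k\) on a non-decreasing knot sequence \(t_0 \le t_1 \le \dots\), where any term whose denominator is zero (repeated knots) is taken to be 0:

\[
M_i^{(1)}(x) = \begin{cases} \dfrac{1}{t_{i+1} - t_i}, & t_i \le x < t_{i+1},\\[4pt] 0, & \text{otherwise,} \end{cases}
\]

\[
M_i^{(k)}(x) = \frac{k\left[(x - t_i)\,M_i^{(k-1)}(x) + (t_{i+k} - x)\,M_{i+1}^{(k-1)}(x)\right]}{(k-1)\,(t_{i+k} - t_i)}, \qquad k \ge 2.
\]

Each \(M_i^{(k)}\) is non-negative, vanishes outside \([t_i, t_{i+k})\), and satisfies \(\int M_i^{(k)}(x)\,dx = 1\); equivalently, it is the normalized B-spline \(B_i^{(k)}\) rescaled by \(k/(t_{i+k} - t_i)\).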
Evaluate the M-spline basis functions on a uniformly spaced grid.
This method creates a uniformly spaced grid of sample points within the domain
[0, 1] and evaluates all the M-spline basis functions at these points. It is
particularly useful for visualizing the shape and distribution of the basis
functions across their domain.
Parameters:
n_samples (int) – The number of points in the uniformly spaced grid. A higher number of
samples will result in a more detailed visualization of the basis functions.
Returns:
X (NDArray) – A 1D array of uniformly spaced sample points within the domain [0, 1].
Shape: (n_samples,).
Y (NDArray) – A 2D array where each column contains one M-spline basis function
evaluated at the points in X. Shape: (n_samples, n_basis_funcs).
Examples
Evaluate and visualize 4 M-spline basis functions of order 3:
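A library-free sketch of what such a grid evaluation computes for 4 M-spline basis functions of order 3. It assumes the Curry–Schoenberg recursion and a knot layout with equally spaced interior knots on [0, 1] and each boundary knot repeated order times; the library's knot placement may differ. The helpers clamped_knots, mspline_value and evaluate_on_grid_sketch are hypothetical.

```python
def clamped_knots(n_basis_funcs: int, order: int) -> list[float]:
    """Equally spaced knots on [0, 1], boundary knots repeated `order` times."""
    n_points = n_basis_funcs - order + 2  # distinct knots, endpoints included
    inner = [j / (n_points - 1) for j in range(n_points)]
    return [0.0] * (order - 1) + inner + [1.0] * (order - 1)


def mspline_value(x: float, k: int, i: int, t: list[float]) -> float:
    """i-th M-spline of order k at x (Curry-Schoenberg recursion)."""
    if k == 1:
        width = t[i + 1] - t[i]
        if width <= 0:
            return 0.0
        # Half-open interval, except that x == t[-1] is kept in the last
        # non-empty interval so the right end of the grid is not dropped.
        inside = t[i] <= x < t[i + 1] or x == t[i + 1] == t[-1]
        return 1.0 / width if inside else 0.0
    span = t[i + k] - t[i]
    if span == 0:
        return 0.0  # repeated knots: the term is defined as 0
    left = (x - t[i]) * mspline_value(x, k - 1, i, t)
    right = (t[i + k] - x) * mspline_value(x, k - 1, i + 1, t)
    return k * (left + right) / ((k - 1) * span)


def evaluate_on_grid_sketch(n_samples: int, n_basis_funcs: int, order: int):
    """Return X (n_samples values) and Y (n_samples rows x n_basis_funcs columns)."""
    t = clamped_knots(n_basis_funcs, order)
    X = [j / (n_samples - 1) for j in range(n_samples)]
    Y = [[mspline_value(x, order, i, t) for i in range(n_basis_funcs)] for x in X]
    return X, Y


X, Y = evaluate_on_grid_sketch(n_samples=100, n_basis_funcs=4, order=3)
```

To visualize the result, plot X against each column of Y, for example with matplotlib.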
compute_features() always returns a real-valued design matrix. For
complex bases (e.g., FourierEval), the real and imaginary parts are
returned as separate columns.
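A minimal sketch of turning a complex-valued feature matrix into real-valued columns. It assumes a layout of all real parts followed by all imaginary parts; the column order actually used by complex bases such as FourierEval may differ. to_real_design is a hypothetical helper.

```python
def to_real_design(z: list[list[complex]]) -> list[list[float]]:
    """Split each complex row into [real parts..., imaginary parts...]."""
    return [[c.real for c in row] + [c.imag for c in row] for row in z]


z = [[1 + 2j, 3 - 1j],
     [0 + 1j, 2 + 0j]]
design = to_real_design(z)
print(design)  # [[1.0, 3.0, 2.0, -1.0], [0.0, 2.0, 1.0, 0.0]]
```

The number of columns doubles, which is one reason the output width depends on the basis type as well as on the input shape.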
The number of output features can be determined only when the number of inputs
provided to the basis is known. Therefore, before the first call to compute_features,
this property will return None. After that call, or after setting the input shape with
set_input_shape, n_output_features will be available.
Set the expected input shape for the basis object.
This method configures the shape of the input data that the basis object expects.
xi can be specified as an integer, a tuple of integers, or derived
from an array. The method also calculates the total number of input
features and output features based on the number of basis functions.
Parameters:
xi (int | tuple | NDArray) – The input shape specification:
- An integer: Represents the dimensionality of the input. A value of 1 is treated as scalar input.
- A tuple: Represents the exact input shape excluding the first axis (sample axis).
  All elements must be integers.
- An array: The shape is extracted, excluding the first axis (assumed to be the sample axis).
Raises:
ValueError – If a tuple is provided and it contains non-integer elements.
Returns:
Returns the instance itself to allow method chaining.
Return type:
self
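The shape rules above can be mimicked in a few lines. ToyShapeBasis is hypothetical and tracks only the input-shape bookkeeping: an integer of 1 maps to a scalar input, a tuple is taken as-is, and an array contributes its shape without the sample axis. It assumes n_output_features is the product of the per-sample input shape times n_basis_funcs, which matches the numbers in the examples for this method.

```python
import math
from types import SimpleNamespace


class ToyShapeBasis:
    """Hypothetical stand-in that mimics only the input-shape bookkeeping."""

    def __init__(self, n_basis_funcs: int):
        self.n_basis_funcs = n_basis_funcs
        self._input_shape_ = None  # unknown until set_input_shape is called

    def set_input_shape(self, xi):
        if isinstance(xi, int):
            shape = () if xi == 1 else (xi,)  # 1 means a scalar input per sample
        elif isinstance(xi, tuple):
            if not all(isinstance(d, int) for d in xi):
                raise ValueError("all elements of the shape tuple must be integers")
            shape = xi
        else:
            shape = tuple(xi.shape[1:])  # drop the sample axis
        self._input_shape_ = shape
        return self  # allow method chaining

    @property
    def n_output_features(self):
        if self._input_shape_ is None:
            return None
        return math.prod(self._input_shape_) * self.n_basis_funcs


basis = ToyShapeBasis(5)
print(basis.n_output_features)                          # None
print(basis.set_input_shape(3).n_output_features)       # 15
print(basis.set_input_shape((4, 5)).n_output_features)  # 100
array_like = SimpleNamespace(shape=(10, 4, 5))          # stands in for np.ones((10, 4, 5))
print(basis.set_input_shape(array_like).n_output_features)  # 100
```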
Notes
All state attributes that depend on the input must be set in this method for
the basis API to work correctly. In particular, this method is called by setup_basis,
which is equivalent to fit for a transformer. If any input-dependent state
is not set in this method, compute_features (equivalent to fit_transform) will break.
Examples
>>> import nemos as nmo
>>> import numpy as np
>>> basis = nmo.basis.MSplineEval(5)
>>> # Configure with an integer input:
>>> _ = basis.set_input_shape(3)
>>> basis.n_output_features
15
>>> # Configure with a tuple:
>>> _ = basis.set_input_shape((4, 5))
>>> basis.n_output_features
100
>>> # Configure with an array:
>>> x = np.ones((10, 4, 5))
>>> _ = basis.set_input_shape(x)
>>> basis.n_output_features
100
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
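The <component>__<parameter> convention amounts to splitting each key at the first double underscore and forwarding the remainder to the named component. The Node class below is a hypothetical toy that illustrates only that routing; it is not scikit-learn's BaseEstimator.set_params, which also validates parameter names. The parameter name regularizer_strength is made up for the example.

```python
class Node:
    """Toy estimator whose parameters may themselves be Node objects."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        nested: dict[str, dict] = {}
        for key, value in params.items():
            name, sep, rest = key.partition("__")
            if sep:  # "component__parameter": route to the component
                nested.setdefault(name, {})[rest] = value
            else:
                self.params[name] = value
        for name, sub_params in nested.items():
            self.params[name].set_params(**sub_params)
        return self


pipeline = Node(basis=Node(n_basis_funcs=5, order=2), regularizer_strength=1.0)
pipeline.set_params(basis__n_basis_funcs=8, regularizer_strength=0.1)
print(pipeline.params["basis"].params)  # {'n_basis_funcs': 8, 'order': 2}
```

Because the remainder after the first "__" is forwarded unchanged, deeper keys such as a__b__c recurse one level at a time.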
This method corresponds to the sklearn transformer fit method. Like fit, it must receive the input and
set all basis states, i.e., kernel_ and all states related to the input shape.
It differs from the transformer fit in the expected input structure: the transformer fit method
requires the inputs to be concatenated in a 2D array, while here each input is provided as a
separate time series, one for each basis element.
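The two input conventions can be contrasted with plain NumPy: separate per-element inputs, as described for this method, versus a single 2D array with the inputs concatenated along the feature axis, as a transformer's fit expects. The variable names are hypothetical, and the sketch shows only the data layout, not either method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100

# One array per basis element (the separate-inputs convention).
speed = rng.random(n_samples)          # shape (n_samples,)
position = rng.random((n_samples, 2))  # shape (n_samples, 2)
separate_inputs = (speed, position)

# A single 2D array with all inputs concatenated along axis 1 (the fit convention).
columns = [x.reshape(n_samples, -1) for x in separate_inputs]
widths = [c.shape[1] for c in columns]  # [1, 2]
concatenated = np.column_stack(columns)  # shape (n_samples, 3)

# Recover the per-element inputs by splitting at the cumulative column counts.
recovered = np.split(concatenated, np.cumsum(widths)[:-1], axis=1)
```

Going from the concatenated form back to separate inputs requires knowing each element's width, which is the input-shape information that setup and cloning must preserve.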
Decompose an array along a specified axis into sub-arrays based on the number of expected inputs.
This function takes an array (e.g., a design matrix or model coefficients) and splits it along
a designated axis.
How it works:
If the basis expects an input of shape (n_samples, n_inputs), then the feature axis length will
be total_n_features = n_inputs * n_basis_funcs. This axis is reshaped into dimensions
(n_inputs, n_basis_funcs).
If the basis expects an input of shape (n_samples,), then the feature axis length will
be total_n_features = n_basis_funcs. This axis is reshaped into (1, n_basis_funcs).
For example, if the input array x has shape (1, 2, total_n_features, 4, 5) and axis=2,
then after applying this method, it will be reshaped into (1, 2, n_inputs, n_basis_funcs, 4, 5).
The specified axis (axis) determines where the split occurs, and all other dimensions
remain unchanged. See the example section below for the most common use cases.
Parameters:
x (NDArray) –
The input array to be split, representing concatenated features, coefficients,
or other data. The shape of x along the specified axis must match the total
number of features generated by the basis, i.e., self.n_output_features.
Examples:
For a design matrix: (n_samples, total_n_features)
For model coefficients: (total_n_features,) or (total_n_features, n_neurons).
axis (int, optional) – The axis along which to split the features. Defaults to 1.
Use axis=1 for design matrices (features along columns) and axis=0 for
coefficient arrays (features along rows). All other dimensions are preserved.
Raises:
ValueError – If the shape of x along the specified axis does not match self.n_output_features.
Returns:
A dictionary where:
- Key: the label of the basis.
- Value: the array reshaped to (..., n_inputs, n_basis_funcs, ...).
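A minimal sketch of the reshape described above. It assumes features are laid out input-major along the split axis (all n_basis_funcs columns of input 0, then input 1, and so on), which is what a C-order reshape into (n_inputs, n_basis_funcs) implies. split_features is a hypothetical helper, not the library method, and it takes n_inputs and n_basis_funcs explicitly instead of reading them from a basis object.

```python
import numpy as np


def split_features(x: np.ndarray, n_inputs: int, n_basis_funcs: int,
                   label: str = "basis", axis: int = 1) -> dict:
    """Reshape the feature axis of x into (n_inputs, n_basis_funcs)."""
    axis = axis % x.ndim  # support negative axes
    total_n_features = n_inputs * n_basis_funcs
    if x.shape[axis] != total_n_features:
        raise ValueError(
            f"expected {total_n_features} features along axis {axis}, got {x.shape[axis]}"
        )
    new_shape = x.shape[:axis] + (n_inputs, n_basis_funcs) + x.shape[axis + 1:]
    return {label: x.reshape(new_shape)}


# Design matrix: 10 samples, 3 inputs x 5 basis functions = 15 columns.
design = np.arange(10 * 15).reshape(10, 15)
per_input = split_features(design, n_inputs=3, n_basis_funcs=5)["basis"]
print(per_input.shape)  # (10, 3, 5)

# Coefficients: features along axis 0, one column per neuron.
coef = np.zeros((15, 4))
print(split_features(coef, 3, 5, axis=0)["basis"].shape)  # (3, 5, 4)
```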