aurora.transfer_function.regression package¶
Submodules¶
aurora.transfer_function.regression.RME module¶
This module contains the regression M-estimator for non-remote reference regression.
Development Notes:
follows Gary’s TRME.m in iris_mt_scratch/egbert_codes-20210121T193218Z-001/egbert_codes/matlabPrototype_10-13-20/TF/classes
(Complex) regression M-estimate for the model Y = X*b. Allows multiple columns of Y, but estimates b for each column separately.
TODO: Consider making a QR-Estimator class between RegressionEstimator and RMEEstimator
Notes on the QR-decomposition: The QR-decomposition is employed on the matrix of input variables X: X = Q R, where Q is unitary/orthogonal and R is upper triangular. Since X is [n_data x n_channels_in], the full Q is [n_data x n_data].
Wikipedia has a nice description of the QR factorization: https://en.wikipedia.org/wiki/QR_decomposition The point of the QR decomposition is to transform the data into a domain where the inversion is done with a triangular matrix.
The symbol QH will denote the conjugate transpose of the matrix Q.
We employ here the “economical” form of the QR decomposition, so that Q is not square and hence not unitary (its inverse is not defined, as it isn’t square). Q does, however, obey: Q.H @ Q = I.
Really the full Q = [Q1 | Q2], where Q1 has as many columns as there are input variables; Q2 multiplies a block of zeros in R, so the economical form discards it and keeps only Q1.
QQH is the projection matrix, or hat matrix, equivalent to X (X^H X)^-1 X^H. https://math.stackexchange.com/questions/3360485/projection-matrix-expressed-as-qr-identity
The quantity QHY floating around is a convenience matrix that makes computing the predicted data less numerically expensive. The use of QHY is not so much physically meaningful as it is a trick to compute more efficiently the predicted data QQHY.
The predicted data Y_hat = QQHY (since QQH is the projection matrix), but when computing sums of squares of Y_hat, such as in the error variance calculation, we can compute QHY2 = np.linalg.norm(QHY, axis=0) ** 2 instead of QQHY2 = Y_hat2 = np.linalg.norm(Y_hat, axis=0) ** 2, since Y_hat.H @ Y_hat = QQHY.H @ QQHY = QHY.H @ Q.H @ Q @ QHY = QHY.H @ QHY.
The predicted data have to lie in the span of the columns of the design matrix X, i.e. they must be a linear combination of the columns of X. Q is an orthogonal basis for the columns of X. The norms of QQHY and QHY are the same.
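A minimal numpy check of the identity above (the array sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_data, n_channels_in, n_channels_out = 100, 2, 1
X = rng.normal(size=(n_data, n_channels_in)) + 1j * rng.normal(size=(n_data, n_channels_in))
Y = rng.normal(size=(n_data, n_channels_out)) + 1j * rng.normal(size=(n_data, n_channels_out))

Q, R = np.linalg.qr(X)  # numpy's default "reduced" (economical) form
QHY = Q.conj().T @ Y
Y_hat = Q @ QHY  # the predicted data QQHY

# The sums of squares agree, so the cheaper QHY can stand in for Y_hat
assert np.allclose(np.linalg.norm(QHY, axis=0) ** 2,
                   np.linalg.norm(Y_hat, axis=0) ** 2)
```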
MATLAB Documentation: https://www.mathworks.com/help/matlab/ref/qr.html
Matlab’s reference to the “economy” representation is what Trefethen and Bau call the “reduced QR factorization”. Golub & Van Loan (1996, §5.2) call Q1 R1 the thin QR factorization of A.
There are several discussions online about the differences in numpy, scipy, sklearn, skcuda, etc.: https://mail.python.org/pipermail/numpy-discussion/2012-November/064485.html We will default to using numpy for now. Note that numpy’s default is to use the “reduced” form of Q, R. R is upper triangular.
A nice discussion of the conjugate transpose operator .H in numpy: https://stackoverflow.com/questions/26932461/conjugate-transpose-operator-h-in-numpy
The Matlab mldivide flowchart can be found here: https://stackoverflow.com/questions/18553210/how-to-implement-matlabs-mldivide-a-k-a-the-backslash-operator And the matlab mldivide documentation here http://matlab.izmiran.ru/help/techdoc/ref/mldivide.html
- class aurora.transfer_function.regression.RME.RME(**kwargs)[source]¶
Bases:
MEstimator
- Attributes:
Q
Returns the Q matrix from the QR decomposition
QH
Returns the conjugate transpose of the Q matrix from the QR decomposition
QHY
Returns the matrix QH @ Y
QHYc
Returns the matrix QH @ Yc. This is Q.conj().T from the QR decomposition multiplied by the cleaned data.
R
Returns the R matrix from the QR decomposition
Y_hat
returns the most recent predicted data
correction_factor
Return the correction factor for the residual variance.
degrees_of_freedom
Gets the number of degrees of freedom in the dataset.
input_channel_names
returns the list of channel names associated with X
inverse_signal_covariance
Returns the inverse signal covariance matrix of the input channels as xarray
is_underdetermined
Check if the regression problem is under-determined
n_channels_in
Returns the number of input channels.
n_channels_out
number of output variables
n_data
Return the number of multivariate observations in the regression dataset.
noise_covariance
Returns the noise covariance matrix of the output channels as xarray
output_channel_names
returns the list of channel names associated with Y
r0
returns the Huber r0 value
residual_variance
returns the residual variance
u0
returns the u0 threshold for redescending regression step
Methods
apply_huber_regression()
This is the 'convergence loop' from RME, RME_RR
apply_redecending_influence_function()
Performs one or two iterations of cleaning with the re-descending influence curve
b_to_xarray()
Wraps the TF results as an xarray
compute_inverse_signal_covariance()
Computes the inverse signal covariance matrix of the input channels.
compute_noise_covariance()
Computes the noise covariance (covariance of the residuals)
compute_squared_coherence()
Updates the array self.R2
estimate()
Executes the regression
estimate_ols([mode])
Solve Y = Xb with ordinary least squares, not robust regression.
initial_estimate()
Make first estimate of TF (b), Y_hat, and residual_variance
qr_decomposition([X, sanity_check])
Performs QR decomposition on input matrix X.
residual_variance_method1()
Returns the residual variance of the output channels.
residual_variance_method2()
Returns the residual variance of the output channels (error variances).
solve_underdetermined()
Solves the regression problem if it is under-determined -- Not Stable
update_QHYc()
Updates the QHYc matrix with the most recent cleaned data
update_b()
Solve the regression problem using the latest cleaned data via QR-decomposition.
update_residual_variance([correction_factor])
Updates residual variance from most recent cleaned and predicted data
update_y_cleaned_via_huber_weights()
Updates the values of self.Yc and self.expectation_psi_prime
update_y_cleaned_via_redescend_weights()
Updates estimate for self.Yc as a match-filtered sum of Y and Y_hat.
Update the predicted data $\hat{Y}$
- compute_inverse_signal_covariance() None [source]¶
Computes the inverse signal covariance matrix of the input channels.
Development Notes: Note that because X = QR, we have X.H @ X = (QR).H @ (QR) = R.H @ Q.H @ Q @ R = R.H @ R, i.e. computing R.H @ R below is just computing the signal covariance matrix of X.
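A minimal numpy check of this identity (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + 1j * rng.normal(size=(100, 2))

Q, R = np.linalg.qr(X)  # reduced form: R is (2, 2)

# X.H @ X == R.H @ R, so the signal covariance comes from R alone
assert np.allclose(X.conj().T @ X, R.conj().T @ R)

# ... and its inverse is the inverse signal covariance
cov_ss_inv = np.linalg.inv(R.conj().T @ R)
```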
- update_b() None [source]¶
Solve the regression problem using the latest cleaned data via QR-decomposition. The matlab was: b = R\QTY;
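A minimal sketch of that one-liner in Python, assuming R and QHYc as defined by the properties above; scipy's solve_triangular exploits the triangular structure of R:

```python
import numpy as np
from scipy.linalg import solve_triangular

def update_b(R: np.ndarray, QHYc: np.ndarray) -> np.ndarray:
    # b solves the upper-triangular system R @ b = Q.H @ Yc
    return solve_triangular(R, QHYc)
```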
aurora.transfer_function.regression.RME_RR module¶
This module contains the regression M-estimator for remote reference regression.
Development Notes: - follows Gary’s TRME_RR.m in iris_mt_scratch/egbert_codes-20210121T193218Z-001/egbert_codes/matlabPrototype_10-13-20/TF/classes
- class aurora.transfer_function.regression.RME_RR.RME_RR(**kwargs)[source]¶
Bases:
MEstimator
- Attributes:
Q
Returns the Q matrix from the QR decomposition
QH
Returns the conjugate transpose of the Q matrix from the QR decomposition
QHX
Returns the matrix QH @ X
QHY
Returns the matrix QH @ Y
QHYc
Returns the matrix QH @ Yc. This is Q.conj().T from the QR decomposition multiplied by the cleaned data.
R
Returns the R matrix from the QR decomposition
Y_hat
returns the most recent predicted data
correction_factor
Return the correction factor for the residual variance.
degrees_of_freedom
Gets the number of degrees of freedom in the dataset.
input_channel_names
returns the list of channel names associated with X
inverse_signal_covariance
Returns the inverse signal covariance matrix of the input channels as xarray
is_underdetermined
Check if the regression problem is under-determined
n_channels_in
Returns the number of input channels.
n_channels_out
number of output variables
n_data
Return the number of multivariate observations in the regression dataset.
noise_covariance
Returns the noise covariance matrix of the output channels as xarray
output_channel_names
returns the list of channel names associated with Y
r0
returns the Huber r0 value
residual_variance
returns the residual variance
u0
returns the u0 threshold for redescending regression step
Methods
apply_huber_regression()
This is the 'convergence loop' from RME, RME_RR
apply_redecending_influence_function()
Performs one or two iterations of cleaning with the re-descending influence curve
b_to_xarray()
Wraps the TF results as an xarray
check_for_enough_data_for_rr_estimate()
Raises exception if not enough data for remote reference estimate
Raises exception if NaN in any of the regression data
Raises exception if data shape check fails.
compute_inverse_signal_covariance()
Computes the inverse signal covariance matrix of the input channels.
compute_noise_covariance()
Computes the noise covariance (covariance of the residuals)
compute_squared_coherence()
Updates the array self.R2
estimate()
Executes the regression
estimate_ols([mode])
Solve Y = Xb with ordinary least squares, not robust regression.
initial_estimate()
Make first estimate of TF (b), Y_hat, and residual_variance
qr_decomposition([X, sanity_check])
Performs QR decomposition on input matrix X.
residual_variance_method1()
Returns the residual variance of the output channels.
residual_variance_method2()
Returns the residual variance of the output channels (error variances).
solve_underdetermined()
Solves the regression problem if it is under-determined -- Not Stable
update_QHYc()
Updates the QHYc matrix with the most recent cleaned data
update_b()
Updates the TF estimate
update_residual_variance([correction_factor])
Updates residual variance from most recent cleaned and predicted data
update_y_cleaned_via_huber_weights()
Updates the values of self.Yc and self.expectation_psi_prime
update_y_cleaned_via_redescend_weights()
Updates estimate for self.Yc as a match-filtered sum of Y and Y_hat.
Updates the predicted data
- check_for_enough_data_for_rr_estimate() None [source]¶
Raises exception if not enough data for remote reference estimate
- compute_inverse_signal_covariance() DataArray [source]¶
Computes the inverse signal covariance matrix of the input channels.
Development Notes: The original Matlab code was basically this one-liner: Cov_SS = (self.ZH @ self.X) \ (self.XH @ self.X) / (self.XH @ self.Z). I broke the above line into B/A, where B = (self.ZH @ self.X) \ (self.XH @ self.X) and A = (self.XH @ self.Z). Then follow the matlab cookbook: B/A for matrices = (A'\B')'.
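A sketch of that cookbook identity in numpy, assuming A is square and nonsingular (the function name is illustrative):

```python
import numpy as np

def mrdivide(B: np.ndarray, A: np.ndarray) -> np.ndarray:
    # matlab's B / A computed as (A' \ B')', with no explicit inverse formed
    return np.linalg.solve(A.conj().T, B.conj().T).conj().T
```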
aurora.transfer_function.regression.base module¶
This module contains the base class for regression functions. It follows Gary Egbert’s EMTF Matlab code TRegression.m, which can be found in iris_mt_scratch/egbert_codes-20210121T193218Z-001/egbert_codes/matlabPrototype_10-13-20/TF/classes
This class originally used numpy arrays to make adapting the Matlab easier, but experimental support for xarray is now added (2024).
- class aurora.transfer_function.regression.base.RegressionEstimator(X: ~xarray.core.dataset.Dataset | ~xarray.core.dataarray.DataArray | ~numpy.ndarray, Y: ~xarray.core.dataset.Dataset | ~xarray.core.dataarray.DataArray | ~numpy.ndarray, iter_control: ~aurora.transfer_function.regression.iter_control.IterControl = <aurora.transfer_function.regression.iter_control.IterControl object>, input_channel_names: list | None = None, output_channel_names: list | None = None, **kwargs)[source]¶
Bases:
object
Abstract base class for solving Y = X*b + epsilon for b, complex-valued
Many of the robust transfer estimation methods we will use repeat the model of solving Y = X*b + epsilon for b. X is variously called the “input”, “predictor”, “explanatory”, “confounding”, “independent”, “exogenous” variable(s), or the “design matrix”, “model matrix”, or “regressor matrix”. Y is variously called the “output”, “predicted”, “outcome”, “response”, “endogenous”, “regressand”, or “dependent” variable. I will try to use input and output.
When we “regress Y on X”, we use the values of variable X to predict those of Y.
Typically operates on single “frequency_band”. Allows multiple columns of Y, but estimates b for each column separately.
Estimated signal and noise covariance matrices can be used together to compute the covariance of the matrix of regression coefficients b.
- _X¶
The underlying dataset, assumed to be of shape nCH x nObs (normally 2-dimensional). These are the input variables. In the matlab codes each observation corresponds to a row and each parameter (channel) is a column. These data are transposed before regression.
- Type:
Union[xr.Dataset, xr.DataArray, np.ndarray]
- X¶
This is a “pure array” representation of _X used to emulate Gary Egbert’s matlab codes. It may or may not be deprecated.
- Type:
numpy array (normally 2-dimensional)
- _Y¶
These are the output variables, arranged same as X above.
- Type:
xarray.Dataset
- Y¶
This is a “pure array” representation of _Y used to emulate Gary Egbert’s matlab codes. It may or may not be deprecated.
- Type:
numpy array (normally 2-dimensional)
- b¶
Matrix of regression coefficients, i.e. the solution to the regression problem. In our context they are the “Transfer Function”. The dimension of b, to match X and Y, is [n_input_ch, n_output_ch]. When we are solving “channel-by-channel”, b is usually [2, 1].
- Type:
numpy array (normally 2-dimensional)
- inverse_signal_covariance¶
This was Cov_SS in Gary’s matlab codes. It is basically inv(X.H @ X). Reference needed.
- Type:
numpy array (n_input_ch, n_input_ch)
- noise_covariance¶
This was Cov_NN in Gary’s matlab codes. Reference needed.
- Type:
numpy array (n_output_ch, n_output_ch)
- squared_coherence¶
This was R2 in Gary’s matlab codes. Formula? Reference? Squared coherence (top row is using raw data, bottom cleaned, with crude correction for amount of downweighted data).
- Type:
numpy array (n_output_ch)
- Yc¶
A “cleaned” version of Y the output variables.
- Type:
numpy array (normally 2-dimensional)
- iter_control¶
is a structure which controls the robust scheme iteration. On return also contains number of iterations.
- Type:
transfer_function.iter_control.IterControl()
- Kwargs¶
- Attributes:
Q
Returns the Q matrix from the QR decomposition
QH
Returns the conjugate transpose of the Q matrix from the QR decomposition
QHY
Returns the matrix QH @ Y
R
Returns the R matrix from the QR decomposition
degrees_of_freedom
Gets the number of degrees of freedom in the dataset.
input_channel_names
returns the list of channel names associated with X
inverse_signal_covariance
Returns the inverse signal covariance matrix of the input channels as xarray
is_underdetermined
Check if the regression problem is under-determined
n_channels_in
Returns the number of input channels.
n_channels_out
number of output variables
n_data
Return the number of multivariate observations in the regression dataset.
noise_covariance
Returns the noise covariance matrix of the output channels as xarray
output_channel_names
returns the list of channel names associated with Y
Methods
b_to_xarray()
Wraps the TF results as an xarray
estimate()
Executes the regression
estimate_ols([mode])
Solve Y = Xb with ordinary least squares, not robust regression.
qr_decomposition([X, sanity_check])
Performs QR decomposition on input matrix X.
solve_underdetermined()
Solves the regression problem if it is under-determined -- Not Stable
- property QHY: ndarray¶
Returns the matrix QH @ Y
QHY is a convenience matrix that makes computing the predicted data (QQHY) more efficient.
- property degrees_of_freedom: int¶
Gets the number of degrees of freedom in the dataset.
- Returns:
The total number of multivariate observations minus the number of input channels.
- Return type:
int
- estimate() ndarray [source]¶
Executes the regression
- Returns:
b – The regression solution to Y = Xb
- Return type:
np.ndarray
- estimate_ols(mode: str | None = 'solve') ndarray [source]¶
Solve Y = Xb with ordinary least squares, not robust regression.
Development Notes: Brute force tends to be less stable because we actually compute the inverse matrix. It is not recommended; it’s just here for completeness: X'Y = X'Xb, so b = (X'X)^-1 X'Y.
- Parameters:
mode (str) – must be one of [“qr”, “brute_force”, “solve”]
- Returns:
b – Normally the impedance tensor Z
- Return type:
numpy array
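A hedged sketch of the three modes; the implementations below are illustrative, not necessarily aurora's exact code:

```python
import numpy as np

def estimate_ols(X: np.ndarray, Y: np.ndarray, mode: str = "solve") -> np.ndarray:
    XH = X.conj().T
    if mode == "brute_force":
        # not recommended: explicitly forms the inverse, b = (X'X)^-1 X'Y
        return np.linalg.inv(XH @ X) @ XH @ Y
    if mode == "solve":
        # solve the normal equations (X'X) b = X'Y without forming an inverse
        return np.linalg.solve(XH @ X, XH @ Y)
    if mode == "qr":
        # reduced QR: solve the triangular system R b = Q.H @ Y
        Q, R = np.linalg.qr(X)
        return np.linalg.solve(R, Q.conj().T @ Y)
    raise ValueError(f"unknown mode {mode}")
```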
- property inverse_signal_covariance: ndarray¶
Returns the inverse signal covariance matrix of the input channels as xarray
- property is_underdetermined: bool¶
Check if the regression problem is under-determined.
- Returns:
True if the regression problem is under-determined, otherwise False
- Return type:
bool
- property n_channels_in: int¶
Returns the number of input channels.
- Returns:
The number of channels of input data (columns of X)
- Return type:
int
- property n_channels_out: int¶
Number of output variables.
- Returns:
Number of output variables (assumed to be the number of columns of a 2D array)
- Return type:
int
- property n_data: int¶
Return the number of multivariate observations in the regression dataset.
- Development Notes:
This may need to be modified if we use masked arrays
- Returns:
The number of rows in the input data vector
- Return type:
int
- property noise_covariance: DataArray¶
Returns the noise covariance matrix of the output channels as xarray
- qr_decomposition(X: ndarray | None = None, sanity_check: bool = False) tuple [source]¶
performs QR decomposition on input matrix X.
If X is not provided as an optional argument then check the value of self.qr_input
- Parameters:
X (numpy array) – In RME this is the input channels X; in RME_RR this is the remote reference channels Z
sanity_check (boolean) – check QR decomposition is working correctly. Set to True for debugging. Can probably be deprecated.
- Returns:
Q, R – The matrices Q and R from the QR-decomposition. X = Q R where Q is unitary/orthogonal and R upper triangular.
- Return type:
tuple
- solve_underdetermined() None [source]¶
Solves the regression problem if it is under-determined – Not Stable
Development Notes: 20210806: This method was originally in TRME.m, but it does not depend in general on the RME method, so it was put in the base class.
We basically never get here and when we do, we don’t trust the results https://docs.scipy.org/doc/numpy-1.9.2/reference/generated/numpy.linalg.svd.html https://www.mathworks.com/help/matlab/ref/double.svd.html
Note that the svd outputs are different in matlab and numpy (https://stackoverflow.com/questions/50930899/svd-command-in-python-v-s-matlab): “The SVD of a matrix can be written as
A = U S V^H
Where the ^H signifies the conjugate transpose. Matlab’s svd command returns U, S and V, while numpy.linalg.svd returns U, the diagonal of S, and V^H. Thus, to get the same S and V as in Matlab you need to reconstruct the S and also get the V:
ORIGINAL MATLAB
% Overdetermined problem ... use svd to invert, return
% NOTE: the solution IS non-unique ... and by itself RME is not setup
% to do anything sensible to resolve the non-uniqueness (no prior info
% is passed!). This is stop-gap, to prevent errors when using RME as
% part of some other estimation scheme!
[u, s, v] = svd(obj.X, 'econ');
sInv = 1./diag(s);
obj.b = v * diag(sInv) * u' * obj.Y;
if ITER.returnCovariance
    obj.Cov_NN = zeros(K, K);
    obj.Cov_SS = zeros(nParam, nParam);
end
result = obj.b
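For reference, a hedged numpy rendition of that snippet (the function name is illustrative), using the matlab/numpy svd differences noted above:

```python
import numpy as np

def solve_underdetermined_svd(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    # numpy's svd returns the diagonal of S as a 1D array and V.H (not V)
    u, s, vh = np.linalg.svd(X, full_matrices=False)  # matlab's 'econ' form
    return vh.conj().T @ np.diag(1.0 / s) @ u.conj().T @ Y
```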
aurora.transfer_function.regression.helper_functions module¶
This module contains methods for performing regression to solve Y = Xb.
- aurora.transfer_function.regression.helper_functions.direct_solve_tf(Y, X, R=None) ndarray [source]¶
Solve problem Y = Xb for b.
- This function can be used for testing. It is not as stable as simple_solve_tf, but it is instructive to have an example of regression using the crudest approach.
- Parameters:
Y (numpy.ndarray) – The “output” of regression problem. For testing this often has shape (N,). Y is normally a vector of Electric field FCs
X (numpy.ndarray) – The “input” of regression problem. For testing this often has shape (N,2). X is normally an array of Magnetic field FCs (two-component)
R (numpy.ndarray or None) – Remote reference channels (optional)
- Returns:
z – The TF estimate – Usually two complex numbers, (Zxx, Zxy), or (Zyx, Zyy)
- Return type:
numpy.ndarray
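A sketch of the "crudest approach" described above, explicitly forming the inverse; the remote-reference substitution of R for X on the left is an assumption here, not necessarily aurora's exact code:

```python
import numpy as np

def direct_solve_tf(Y, X, R=None):
    # brute force: explicitly invert the normal equations
    if R is None:
        R = X  # no remote reference: ordinary single-station case (assumption)
    RH = R.conj().T
    return np.linalg.inv(RH @ X) @ RH @ Y
```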
- aurora.transfer_function.regression.helper_functions.rme_beta(r0: float) float [source]¶
Returns a normalization factor to correct for bias in residual variance.
Details: This is an RME-specific property. It represents a bias in the calculation of residual_variance which we correct for in TRME and TRME_RR. The implemented formula is an approximation. The correction factor is approximately equal to 1/beta, where beta is defined by Equation A3 in Egbert & Booker (1986).
For r0=1.5 the correction factor is ~1.28 and beta ~0.78
Some notes from a discussion with Gary: In the regression estimate you down-weight things with large errors, but you need to define what is large. You estimate the standard deviation (sigma) of the errors from the residuals, BUT with this cleaned data approach (Yc) sigma is smaller than it should be. You need to compensate for this by using a correction_factor. It is basically the expectation, if the data really were Gaussian and you estimated sigma from the corrected data, of how much too small the estimate would be.
If you change the penalty functional you may need pencil, paper and some calculus. The relationship between the corrected-data-residuals and the Gaussian residuals could change if you change the penalty.
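A minimal sketch consistent with the values quoted above (beta ~ 0.78 and correction factor ~ 1.28 at r0 = 1.5), assuming the approximation beta(r0) = 1 - exp(-r0); the exact implemented formula should be checked against the source:

```python
import numpy as np

def rme_beta(r0: float) -> float:
    # approximate bias factor; the correction factor is then 1 / beta
    return 1.0 - np.exp(-r0)

assert abs(rme_beta(1.5) - 0.78) < 0.01        # beta ~ 0.78
assert abs(1.0 / rme_beta(1.5) - 1.28) < 0.01  # correction factor ~ 1.28
```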
- aurora.transfer_function.regression.helper_functions.simple_solve_tf(Y, X, R=None) ndarray [source]¶
Cast problem Y = Xb into scipy.linalg.solve form which solves: a @ x = b
Note that the “b” in the two equations is different.
- The EMTF regression problem (and many regression problems in general) uses Y = Xb: Y, X are known and b is the solution.
- scipy.linalg.solve solves a @ x = b: here a, b are known and x is the solution.
This function is used for testing the vectorized, direct solver.
- Parameters:
Y (numpy.ndarray) – The “output” of regression problem. For testing this often has shape (N,). Y is normally a vector of Electric field FCs
X (numpy.ndarray) – The “input” of regression problem. For testing this often has shape (N,2). X is normally an array of Magnetic field FCs (two-component)
R (numpy.ndarray or None) – Remote reference channels (optional)
- Returns:
z – The TF estimate – Usually two complex numbers, (Zxx, Zxy), or (Zyx, Zyy)
- Return type:
numpy.ndarray
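A hedged sketch of that casting; with a remote reference R the normal equations become (R.H @ X) b = R.H @ Y, which is an assumption here rather than aurora's exact code:

```python
import numpy as np
from scipy.linalg import solve

def simple_solve_tf(Y, X, R=None):
    # cast Y = Xb into scipy's a @ x = b form: (R.H @ X) b = R.H @ Y
    if R is None:
        R = X  # no remote reference: ordinary single-station case (assumption)
    RH = R.conj().T
    return solve(RH @ X, RH @ Y)
```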
aurora.transfer_function.regression.iter_control module¶
This module contains a class that holds parameters to control the iterations of robust regression including convergence criteria and thresholds.
Based on Gary’s IterControl.m in iris_mt_scratch/egbert_codes-20210121T193218Z-001/egbert_codes/matlabPrototype_10-13-20/TF/classes
- class aurora.transfer_function.regression.iter_control.IterControl(max_number_of_iterations: int = 10, max_number_of_redescending_iterations: int = 2, r0: float = 1.4, u0: float = 2.8, tolerance: float = 0.0005, verbosity: int = 0)[source]¶
Bases:
object
Notes
Here is a class to hold variables that are used to control the regression.
- Currently this is used for variations on RME (Robust M-Estimator).
- TODO: In the original matlab code there was a class property called epsilon, but it was unused; there was no documentation about its purpose, except possibly that the abstract base class solved Y = X*b + epsilon for b, complex-valued. Perhaps this was intended as an intrinsic tolerated noise level. The value of epsilon was set to 1000.
- TODO: The return covariance boolean just initializes arrays of zeros. It needs to be made functional or removed.
- Attributes:
- continue_redescending
correction_factor
Returns correction factor for residual variances.
max_iterations_reached
Return True if the number of iterations carried out is greater than or equal to the maximum allowed.
- number_of_iterations
- number_of_redescending_iterations
Methods
converged(b, b0)
Returns True if the regression has converged, by comparing the most recent estimate b with the previous estimate b0.
increment_iteration_number()
increment_redescending_iteration_number()
reset_number_of_iterations()
reset_number_of_redescending_iterations()
- property continue_redescending¶
- converged(b, b0)[source]¶
- Parameters:
b (complex-valued numpy array) – the most recent regression estimate
b0 (complex-valued numpy array) – The previous regression estimate
verbose (bool) – Set to True for debugging
- Returns:
converged (bool) – True if the regression has terminated, False otherwise
Notes
The variable maximum_change finds the maximum amplitude component of the vector 1 - b/b0. Looking at the formula, one might want to cast this instead as 1 - abs(b/b0); however, that would be insensitive to phase changes in b, which is complex-valued. The way it is coded, np.max(np.abs(1 - b / b0)), is correct as it stands.
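A minimal sketch of that check (the tolerance default follows the class signature above):

```python
import numpy as np

def converged(b: np.ndarray, b0: np.ndarray, tolerance: float = 0.0005) -> bool:
    # maximum amplitude component of 1 - b/b0; sensitive to phase changes
    maximum_change = np.max(np.abs(1 - b / b0))
    return maximum_change <= tolerance
```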
- property correction_factor¶
Returns correction factor for residual variances.
Detailed notes on usage in transfer_function.regression.helper_functions.rme_beta
- TODO: This is an RME specific property. Suggest move r0, u0 and this method
into an RME-config class.
- Returns:
correction_factor – correction factor used for scaling the residual error_variance
- Return type:
float
aurora.transfer_function.regression.m_estimator module¶
This module contains the MEstimator class - an extension of RegressionEstimator.
MEstimator has the class methods that are common to both RME and RME_RR. See Notes in RME, RME_RR for more details.
- class aurora.transfer_function.regression.m_estimator.MEstimator(**kwargs)[source]¶
Bases:
RegressionEstimator
- Attributes:
Q
Returns the Q matrix from the QR decomposition
QH
Returns the conjugate transpose of the Q matrix from the QR decomposition
QHY
Returns the matrix QH @ Y
QHYc
Returns the matrix QH @ Yc. This is Q.conj().T from the QR decomposition multiplied by the cleaned data.
R
Returns the R matrix from the QR decomposition
Y_hat
returns the most recent predicted data
correction_factor
Return the correction factor for the residual variance.
degrees_of_freedom
Gets the number of degrees of freedom in the dataset.
input_channel_names
returns the list of channel names associated with X
inverse_signal_covariance
Returns the inverse signal covariance matrix of the input channels as xarray
is_underdetermined
Check if the regression problem is under-determined
n_channels_in
Returns the number of input channels.
n_channels_out
number of output variables
n_data
Return the number of multivariate observations in the regression dataset.
noise_covariance
Returns the noise covariance matrix of the output channels as xarray
output_channel_names
returns the list of channel names associated with Y
r0
returns the Huber r0 value
residual_variance
returns the residual variance
u0
returns the u0 threshold for redescending regression step
Methods
apply_huber_regression()
This is the 'convergence loop' from RME, RME_RR
apply_redecending_influence_function()
Performs one or two iterations of cleaning with the re-descending influence curve
b_to_xarray()
Wraps the TF results as an xarray
compute_noise_covariance()
Computes the noise covariance (covariance of the residuals)
compute_squared_coherence()
Updates the array self.R2
estimate()
Executes the regression
estimate_ols([mode])
Solve Y = Xb with ordinary least squares, not robust regression.
initial_estimate()
Make first estimate of TF (b), Y_hat, and residual_variance
qr_decomposition([X, sanity_check])
Performs QR decomposition on input matrix X.
residual_variance_method1()
Returns the residual variance of the output channels.
residual_variance_method2()
Returns the residual variance of the output channels (error variances).
solve_underdetermined()
Solves the regression problem if it is under-determined -- Not Stable
update_QHYc()
Updates the QHYc matrix with the most recent cleaned data
update_residual_variance([correction_factor])
Updates residual variance from most recent cleaned and predicted data
update_y_cleaned_via_huber_weights()
Updates the values of self.Yc and self.expectation_psi_prime
update_y_cleaned_via_redescend_weights()
Updates estimate for self.Yc as a match-filtered sum of Y and Y_hat.
Updates the predicted data from the most recent cleaned data
- property QHYc: ndarray¶
Returns the matrix QH @ Yc. This is Q.conj().T from the QR decomposition multiplied by the cleaned data.
- Returns:
QHYc – A convenience matrix that makes computing the predicted data (QQHYc) more efficient.
- Return type:
np.ndarray
- apply_huber_regression() None [source]¶
This is the ‘convergence loop’ from RME, RME_RR
- TODO: Consider not setting iter_control.number_of_iterations; instead, initialize a new iter_control object.
- apply_redecending_influence_function() None [source]¶
Performs one or two iterations of cleaning with the re-descending influence curve.
- compute_noise_covariance() None [source]¶
Computes the noise covariance (covariance of the residuals)
res_clean: The cleaned data minus the predicted data (the residuals). SSR_clean: Sum of squares of the residuals; the diagonal is real.
- compute_squared_coherence() None [source]¶
Updates the array self.R2
This takes the ratio of the energy in the residuals to the energy in the cleaned data. This metric can be interpreted as how much of the signal (Y) is “explained” by the regression.
Development Notes: The matlab code (TRME_RR) claimed:
% R2 is squared coherence (top row is using raw data, bottom
% cleaned, with crude correction for amount of down-weighted data)
- TODO: There seem to be other valid metrics for this sort of quantity. In particular, we may want to
consider SSY (the sum of squares of the observed data) over SSR.
- TODO: consider renaming self.R2. That name invokes the idea of the squared residuals. That is not what
is being stored in self.R2. This is more like a CMRR.
res: Residuals, the original data minus the predicted data. SSR: Sum of squares of the residuals, per channel.
- property correction_factor: float¶
Return the correction factor for the residual variance.
- Returns:
correction_factor – See doc in iter_control.IterControl.correction_factor()
- Return type:
float
- estimate() None [source]¶
Executes the regression
Development Notes: Here is a comment from the matlab codes: “need to look at how we should compute adjusted residual cov to make consistent with tranmt”. Issue #69 in the aurora GitHub repo addresses this.
- residual_variance_method1() ndarray [source]¶
returns the residual variance of the output channels.
This is the method that was originally in RME_RR.m. It seems more correct than the one in RME, but also has more computational overhead.
- residual_variance_method2() ndarray [source]¶
Returns the residual variance of the output channels (error variances).
Computes the difference between the squared norms of the output channels and the squared norms of their projection onto the span of Q (i.e. QQH applied to the output channels).
This method used to take either QHY, Y or QHYc, Yc as arguments, but now that Yc initializes to Y, we can always use QHYc, Yc (note that QHYc is updated every time Yc is updated).
- Parameters:
QHY (numpy array) – QHY[i,j] = <Q[:,i], Y[:,j]>. So when we sum the columns of norm(QHY) we get in the zeroth position <Q[:,0], Y[:,0]> + <Q[:,1], Y[:,0]>, that is, the 0th channel of Y projected onto each of the Q-basis vectors.
Y_or_Yc (numpy array) – The output channels (self.Y) or the cleaned output channels self.Yc
correction_factor (float) – See doc in IterControl.correction_factor
- Returns:
residual_variance – One entry per output channel.
- Return type:
numpy array
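A heavily hedged sketch of method 2; the normalization by n_data is an assumption here, not necessarily aurora's exact code:

```python
import numpy as np

def residual_variance_method2(Yc, QHYc, n_data, correction_factor=1.0):
    # energy in the data minus the energy captured by the Q-projection,
    # per output channel; the division by n_data is an assumption
    Y2 = np.linalg.norm(Yc, axis=0) ** 2
    QHY2 = np.linalg.norm(QHYc, axis=0) ** 2
    return correction_factor * (Y2 - QHY2) / n_data
```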
- update_residual_variance(correction_factor=1)[source]¶
updates residual variance from most recent cleaned and predicted data
- update_y_cleaned_via_huber_weights() None [source]¶
Updates the values of self.Yc and self.expectation_psi_prime
- Parameters:
residual_variance (numpy array) – 1D array, the same length as the number of output channels; see the self.residual_variance() method for its calculation
Y_hat (numpy array) – The predicted data, usually from QQHY
Original matlab documentation: function [YC, E_psiPrime] = HuberWt(Y, YP, sig, r0): inputs are data (Y) and predicted (YP), estimated error variances (for each column) and Huber parameter r0. Allows for multiple columns of data.
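A hedged sketch of the Huber cleaning step described in the matlab documentation above; the names and the exact form of expectation_psi_prime are assumptions:

```python
import numpy as np

def huber_clean(Y, Y_hat, sigma, r0):
    # residuals in units of sigma (the per-channel error standard deviation)
    r = np.abs(Y - Y_hat) / sigma
    # Huber weights: w = 1 inside r0, shrinking toward 0 for large residuals
    w = np.minimum(1.0, r0 / np.maximum(r, 1e-30))
    Yc = w * Y + (1.0 - w) * Y_hat  # cleaned data
    # fraction of unmodified data per channel (assumed form of E[psi'])
    e_psi_prime = np.mean(r <= r0, axis=0)
    return Yc, e_psi_prime
```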
- update_y_cleaned_via_redescend_weights() None [source]¶
Updates estimate for self.Yc as a match-filtered sum of Y and Y_hat.
Note: It is not unheard of to observe RuntimeWarning: overflow encountered in exp in the calculation of t. This can happen when large residuals are present. In that case, t goes to -inf and w goes to zero -- the desired behaviour. When this happens an “invalid value” warning will also occur in the calculation of t, but this does not propagate into self.expectation_psi_prime.
Matlab documentation: function [YC, E_psiPrime] = RedescendWt(Y, YP, sig, u0): inputs are data (Y) and predicted (YP), estimated error variances (for each column) and Huber parameter u0. Allows for multiple columns of data.
- Parameters:
Y_hat (numpy array) – An estimate of the output data Y obtained by self.Q @ self.QHY or self.Q @ self.QHYc
residual_variance –
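A hedged sketch of a redescending weight consistent with the overflow note above (t goes to -inf and w to 0 for large residuals); the exact functional form is an assumption modeled on RedescendWt, not necessarily aurora's exact code:

```python
import numpy as np

def redescend_clean(Y, Y_hat, sigma, u0):
    r = np.abs(Y - Y_hat) / sigma     # scaled residual amplitude
    t = -np.exp(u0 * (r - u0))        # may overflow to -inf for large r
    w = np.exp(t)                     # w -> 0 where residuals are large
    return w * Y + (1.0 - w) * Y_hat  # match-filtered sum of Y and Y_hat
```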