aurora.transfer_function.weights package

Submodules

aurora.transfer_function.weights.edf_weights module

This module contains a class for computing so-called “Effective Degrees of Freedom” weights.

Development notes: The code here is based on the function Edfwts.m from egbert_codes-20210121T193218Z-001/egbert_codes/matlabPrototype_10-13-20/TF/functions/Edfwts.m

class aurora.transfer_function.weights.edf_weights.EffectiveDegreesOfFreedom(edf_l1: float | None = 20.0, alpha: float | None = 0.5, c1: float | None = 2.0, c2: float | None = 10.0, p3: float | None = 5.0, n_data: int | None = 0)[source]

Bases: object

Attributes:
p1

Threshold applied to edf.

p2

Threshold applied to edf.

Methods

compute_weights(X, use)

Compute the EDF Weights

compute_weights(X: ndarray, use: ndarray) → ndarray[source]

Compute the EDF Weights

y_hat = P * y, where P is the “hat” matrix and y_hat is the predicted value of y.

Development Notes: The data covariance matrix s and its inverse h are iteratively recomputed using fewer and fewer observations. The edf_weights are also computed at every iteration (but for all data points).

Discussing this with Gary: “… because you are down-weighting (omitting) more and more high-power events, the total signal is going down. The signal power goes down with every call to this method.” … “When the diagonals of the Hat matrix are really big, an individual data point is controlling its own prediction, and the estimate. If the problem was balanced (n data points contributing equally), each data point would contribute equally (1/n) to each parameter. When one data point is large and the others are tiny, it may be contributing a lot, say 1/2 rather than 1/n. edf is like the diagonal of the Hat matrix (in the single-station case): how much does the data point contribute to the prediction of itself?”
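For intuition, here is a minimal sketch (not the aurora implementation) of a hat matrix and its diagonal leverage values for an ordinary least-squares problem; the design matrix A and observations y are made-up illustrations:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 2))   # illustrative design matrix (n_obs x n_params)
    y = rng.standard_normal(100)        # illustrative observations

    # Hat matrix P maps the observations onto their predictions: y_hat = P @ y
    P = A @ np.linalg.inv(A.T @ A) @ A.T
    y_hat = P @ y

    # The diagonal of P is the leverage of each observation: a large diagonal
    # entry means that data point largely predicts itself and dominates the fit.
    leverage = np.diag(P)
    print(leverage.max(), leverage.mean())  # mean leverage = n_params / n_obs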

Note: H = inv(S) has H[0,1] = H[1,0] whenever S[0,1] = S[1,0]: a 2x2 matrix with equal off-diagonal terms has an inverse with the same property.

—AI Explanation of the code— The effective degrees of freedom (edf) weights are computed from the covariance matrix of the input data. The covariance matrix is calculated using the selected observations (indicated by the use boolean array), and its inverse is then used to compute the edf weights for each observation, as follows:

  • The covariance matrix S is computed as the outer product of the selected data.

  • The covariance matrix is normalized by the number of selected observations.

  • The inverse covariance matrix H is computed.

  • The edf weights are then calculated using the diagonal and off-diagonal elements of the inverse covariance matrix H and the selected data.
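A minimal sketch of the covariance and inverse-covariance steps listed above, with made-up two-channel data (illustrative only, not the exact implementation):

    import numpy as np

    rng = np.random.default_rng(0)
    n_obs = 200
    # Two complex channels, as in the single-station case discussed above.
    X = rng.standard_normal((2, n_obs)) + 1j * rng.standard_normal((2, n_obs))
    use = np.ones(n_obs, dtype=bool)             # boolean selection of observations

    n_use = use.sum()
    S = X[:, use] @ X[:, use].conj().T / n_use   # covariance: outer product / count
    H = np.linalg.inv(S)                         # inverse covariance matrix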

—Statistical Context— The inverse covariance matrix (H = S^{-1}) is used here because it “whitens” the data: it removes correlations and scales by variance, so each observation is measured in units of its statistical uncertainty. This is analogous to the Mahalanobis distance, which accounts for the covariance structure of the data. In the context of regression, the resulting EDF (effective degrees of freedom) value for each observation is proportional to its leverage, similar to the diagonal of the Hat matrix. Observations that are outliers in the whitened space (i.e., unusual given the covariance structure) receive higher EDF values, indicating greater influence on the fit. Thus, the inverse covariance matrix is essential for properly quantifying the statistical distinctiveness and leverage of each observation, enabling robust down-weighting of high-leverage (potentially outlier) points.

In summary: The inverse covariance matrix is used to weight each observation according to its contribution to the fit, accounting for both variance and correlation, ensuring that the EDF weights reflect true statistical leverage.
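Stated as a formula (an illustrative restatement of the paragraph above, assuming zero-mean data; not copied from the code), the per-observation quantity is the Mahalanobis-style quadratic form

    edf_i ∝ x_i^H H x_i = x_i^H S^{-1} x_i,  where x_i = X[:, i],

i.e. each observation’s edf scales with its squared Mahalanobis distance under the sample covariance S.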

A note on usage of real vs complex data: The terms ( X[0, :] * conj{X[0, :]} ) and ( X[1, :] * conj{X[1, :]} ) are always real and non-negative (they are squared magnitudes). The cross term ( conj{X[1, :]} * X[0, :] ) can be complex, but in the context of covariance and quadratic forms you want the real part, not the absolute value. Why? The quadratic form ( x^H H x ) (where ( x ) is a complex vector and ( H ) is Hermitian) is always real, and the real part is what contributes to the variance/leverage. Taking the absolute value would overstate the contribution of the cross term and would not be statistically correct for covariance-based leverage calculations.
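A sketch of the resulting per-observation quadratic form for a two-channel X, keeping only the real part of the cross term as described above (names and the exact normalization are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((2, 500)) + 1j * rng.standard_normal((2, 500))
    S = X @ X.conj().T / X.shape[1]     # sample covariance of the two channels
    H = np.linalg.inv(S)                # inverse covariance (Hermitian)

    # x^H H x per observation, expanded channel-wise: the squared-magnitude terms
    # are real and non-negative; only the real part of the cross term contributes.
    edf = (
        H[0, 0].real * np.abs(X[0, :]) ** 2
        + H[1, 1].real * np.abs(X[1, :]) ** 2
        + 2.0 * np.real(H[1, 0] * np.conj(X[1, :]) * X[0, :])
    )

    # Sanity check: the expansion equals the vectorized quadratic form x^H H x.
    check = np.real(np.einsum("ij,ik,kj->j", X.conj(), H, X))
    assert np.allclose(edf, check)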

Parameters:
  • X (np.ndarray) – The data for which to determine weights.

  • use (np.ndarray) – Boolean array indicating which observations to use.

Returns:

edf – The weight values.

Return type:

np.ndarray

property p1: float

Threshold applied to edf. All edf below this value are set to weight=0

property p2: float

Threshold applied to edf. All edf above this value are set to weight=0

aurora.transfer_function.weights.edf_weights.effective_degrees_of_freedom_weights(X: Dataset, R: Dataset | None = None, edf_obj: EffectiveDegreesOfFreedom | None = None) → ndarray[source]

Computes the effective degrees of freedom weights. Emulates edfwts (“effective dof”) from tranmt. Based on the Edfwts.m Matlab code from iris_mt_scratch/egbert_codes-20210121T193218Z-001/egbert_codes/matlabPrototype_10-13-20/TF/functions/

Flow (see the sketch below):

  0. Initialize a weights vector (of 1’s) the length of the “observation” axis.

  1. Remove any NaN in X, R.

  2. Compute the weights on the reduced (NaN-free) arrays of X and R.

  3. Overwrite the weights vector for the non-NaN entries.

  4. Return weights and broadcast-multiply against the data to apply.
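A minimal sketch of this flow (function and variable names are illustrative, not the actual aurora API; R would be masked and passed through in the same way as X):

    import numpy as np

    def edf_weights_flow(X: np.ndarray, compute_weights) -> np.ndarray:
        """Illustrative flow: ones vector, drop NaNs, compute, overwrite, return."""
        n_obs = X.shape[-1]
        weights = np.ones(n_obs)                         # 0. initialize weights to 1's
        finite = np.all(np.isfinite(X), axis=0)          # 1. locate the non-NaN observations
        weights[finite] = compute_weights(X[:, finite])  # 2./3. compute and overwrite
        return weights                                   # 4. caller broadcast-multiplies onto data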

Development Notes: A note about the while loop: the variable “use” never changes length, it only flips its bits. The while loop exits when n_valid_observations == sum(use), i.e. when the effective dof are all below threshold. First estimate the dof, then “use” only those points whose dof are smaller than the threshold, then recompute the dof. On each pass the covariance matrix diagonals get smaller, because there is less energy in the time series entering the S, H calculation.
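A sketch of the loop structure described here (variable names and the threshold are illustrative placeholders):

    import numpy as np

    def iterate_use_mask(X: np.ndarray, compute_weights, threshold: float) -> np.ndarray:
        """Flip bits of `use` until every in-use point has edf below the threshold."""
        use = np.ones(X.shape[-1], dtype=bool)
        n_valid = -1                        # force at least one pass through the loop
        while n_valid != use.sum():
            n_valid = use.sum()             # number of observations currently in use
            edf = compute_weights(X, use)   # edf recomputed for all points each pass
            use = edf < threshold           # keep only points below the threshold
        return use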

Parameters:
  • X (Dataset) – The data for which to compute weights.

  • R (Dataset, optional) – Optional second dataset (e.g. remote reference), handled together with X.

  • edf_obj (EffectiveDegreesOfFreedom, optional) – Object holding the EDF parameters used for the computation.
Returns:

weights – Weights for reducing leverage points.

Return type:

numpy.ndarray

Module contents