Pipelines¶
Process MTH5¶
Process an MTH5 using the metadata config object.
Note 1: process_mth5 assumes application of cascading decimation, and that the decimated data will be accessed from the previous decimation level. This should be revisited. It may make more sense to have a get_decimation_level() interface that provides an option of applying decimation or loading predecimated data. This will be addressed via creation of the FC layer inside mth5.
Note 2: We can encounter cases where some runs can be decimated and others can not. We need a way to handle this. For example, a short run may not yield any data from a later decimation level. An attempt to handle this has been made in TF Kernel by adding a is_valid_dataset column, associated with each run-decimation level pair.
Note 3: This point in the loop marks the interface between _generation_ of the FCs and their _usage_. In future, the code above this comment would be pushed into create_fourier_coefficients() and the code below it would access those FCs and execute compute_transfer_function().
- aurora.pipelines.process_mth5.get_spectrogams(tfk, i_dec_level, units='MT')[source]¶
Could be made a method of TFK.
- Parameters:
tfk (TransferFunctionKernel) –
i_dec_level (integer) –
units ("MT" or "SI", likely to be deprecated) –
- Return type:
dict of short-time Fourier transforms
- aurora.pipelines.process_mth5.load_stft_obj_from_mth5(i_dec_level, row, run_obj, channels=None)[source]¶
Load stft_obj from mth5 (instead of computing it).
Note #1: See note #1 in time_series.frequency_band_helpers.extract_band
- Parameters:
i_dec_level (int) – The decimation level where the data are stored within the Fourier Coefficient group
row (pandas.core.series.Series) – A row of the TFK.dataset_df
run_obj (mth5.groups.run.RunGroup) – The original time-domain run associated with the data to load
- aurora.pipelines.process_mth5.make_stft_objects(processing_config, i_dec_level, run_obj, run_xrds, units='MT')[source]¶
Operates on a “per-run” basis
This method could be modified in a multiple-station code so that it doesn't care whether the station is "local" or "remote", but rather uses scale factors keyed by station_id.
- Parameters:
processing_config (mt_metadata.transfer_functions.processing.aurora.Processing) – Metadata about the processing to be applied
i_dec_level (int) – The decimation level to process
run_obj (mth5.groups.master_station_run_channel.RunGroup) – The run to transform to stft
run_xrds (xarray.core.dataset.Dataset) – The data time series from the run to transform
units (str) – expects “MT”. May change so that this is the only accepted set of units
station_id (str) – To be deprecated, this information is contained in the run_obj as run_obj.station_group.metadata.id
- Returns:
stft_obj – Time series of calibrated Fourier coefficients per each channel in the run
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.process_mth5.process_mth5(config, tfk_dataset=None, units='MT', show_plot=False, z_file_path=None, return_collection=False)[source]¶
This is the main method used to transform a processing_config and a kernel_dataset into a transfer function estimate.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Processing or path to json) – All processing parameters
tfk_dataset (aurora.tf_kernel.dataset.Dataset or None) – Specifies what datasets to process according to config
units (string) – “MT” or “SI”. To be deprecated once data have units embedded
show_plot (boolean) – Only used for dev
z_file_path (string or pathlib.Path) – Target path for a z_file output if desired
return_collection (boolean) – If False, return an mt_metadata TF object; if True, return an aurora.transfer_function.transfer_function_collection.TransferFunctionCollection.
- Returns:
tf (TransferFunctionCollection or mt_metadata TF) – The transfer function object
tf_cls (mt_metadata.transfer_functions.TF) – TF object
- aurora.pipelines.process_mth5.process_tf_decimation_level(config, i_dec_level, local_stft_obj, remote_stft_obj, units='MT')[source]¶
Processing pipeline for a single decimation_level
TODO: Add a check that the processing config sample rates agree with the data sampling rates; otherwise raise an Exception.
TODO: Add units to local_stft_obj, remote_stft_obj.
This method can be single station or remote reference based on the processing config.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – Config for a single decimation level
i_dec_level (int) – The decimation level id. Question: could we pack this into the decimation level as an attr?
local_stft_obj (xarray.core.dataset.Dataset) – The time series of Fourier coefficients from the local station
remote_stft_obj (xarray.core.dataset.Dataset or None) – The time series of Fourier coefficients from the remote station
units (str) – one of [“MT”,”SI”]
- Returns:
transfer_function_obj – The transfer function values packed into an object
- Return type:
aurora.transfer_function.TTFZ.TTFZ
- aurora.pipelines.process_mth5.save_fourier_coefficients(dec_level_config, row, run_obj, stft_obj)[source]¶
Note #1: Logic for building FC layers: If the processing config decimation_level.save_fcs_type = "h5" and fc_levels_already_exist is False, then open in append mode; else open in read mode. We should support a flag force_rebuild_fcs, normally False. This flag is only needed when save_fcs_type == "h5". If True, then we open in append mode regardless of fc_levels_already_exist. The task of setting mode="a" vs. mode="r" could be handled by tfk (maybe in tfk.validate()).
- Parameters:
dec_level_config –
row –
run_obj –
stft_obj –
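The open-mode logic described in Note #1 can be sketched as a small helper. The function name choose_mode is hypothetical and not part of the aurora API; this is a sketch of the stated rules, not the actual implementation.

```python
def choose_mode(save_fcs_type: str, fc_levels_already_exist: bool,
                force_rebuild_fcs: bool = False) -> str:
    """Pick the h5 open mode per Note #1 (hypothetical helper, not in aurora)."""
    if save_fcs_type == "h5" and (not fc_levels_already_exist or force_rebuild_fcs):
        return "a"  # append: FCs will be (re)built and written
    return "r"  # read-only: FCs already exist, or are not being stored in h5
```

With force_rebuild_fcs=True the append mode is chosen even when FC levels already exist, matching the note.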
Time Series Helpers¶
- aurora.pipelines.time_series_helpers.apply_prewhitening(decimation_obj, run_xrds_input)[source]¶
Applies pre-whitening to time series to avoid spectral leakage when FFT is applied.
If "first difference", may want to consider clipping the first and last samples from the differentiated time series.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_input (xarray.core.dataset.Dataset) – Time series to be pre-whitened
- Returns:
run_xrds – pre-whitened time series
- Return type:
xarray.core.dataset.Dataset
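As a minimal numpy illustration of first-difference prewhitening (a standalone sketch, not the aurora implementation, which operates on xarray Datasets):

```python
import numpy as np

def first_difference(x: np.ndarray) -> np.ndarray:
    """First-difference prewhitening.

    The output is one sample shorter than the input, which is why
    clipping the end samples of the differentiated series may be
    worth considering.
    """
    return np.diff(x)

x = np.array([1.0, 3.0, 6.0, 10.0])
w = first_difference(x)  # array([2., 3., 4.])
```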
- aurora.pipelines.time_series_helpers.apply_recoloring(decimation_obj, stft_obj)[source]¶
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.fourier_coefficients.decimation.Decimation) – Information about how the decimation level is to be processed
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be recoloured
- Returns:
stft_obj – Recolored time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
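For intuition: recoloring a spectrum that was prewhitened by first differencing amounts to dividing by the difference filter's frequency response. This is a hedged numpy sketch of that idea, not aurora's code; it also shows why the DC harmonic can end up inf/nan after recoloring.

```python
import numpy as np

fs = 1.0   # sample rate (Hz)
n_fft = 8
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
# Frequency response of the first-difference filter y[k] = x[k] - x[k-1]
response = 1.0 - np.exp(-2j * np.pi * freqs / fs)
# Recoloring divides the whitened spectrum by this response; it vanishes
# at DC, so dividing there produces the inf/nan DC harmonic noted elsewhere.
```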
- aurora.pipelines.time_series_helpers.calibrate_stft_obj(stft_obj, run_obj, units='MT', channel_scale_factors=None)[source]¶
- Parameters:
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be calibrated
run_obj (mth5.groups.master_station_run_channel.RunGroup) – Provides information about filters for calibration
units (string) – usually “MT”, contemplating supporting “SI”
channel_scale_factors (dict or None) – Keyed by channel; supports a single scalar to apply to that channel's data. Useful for debugging. Should not be used in production and should throw a warning if it is not None.
- Returns:
stft_obj – Time series of calibrated Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
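Conceptually, calibration divides each channel's Fourier coefficients by the complex frequency response of the instrument and filter chain. A toy numpy sketch with a made-up response array (not aurora's implementation, which derives the response from the run's filter metadata):

```python
import numpy as np

stft = np.ones((4, 3), dtype=complex)       # (frequency, time) FCs for one channel
response = np.array([1.0, 2.0, 4.0, 8.0])   # hypothetical per-frequency filter response
calibrated = stft / response[:, np.newaxis] # FCs in physical units once response removed
```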
- aurora.pipelines.time_series_helpers.nan_to_mean(xrds)[source]¶
Set NaN values to the mean value of the time series.
- Parameters:
xrds (xarray.core.dataset.Dataset) – Time series data, possibly containing NaN values
- Returns:
The input time series with NaN values replaced by the mean
- Return type:
xarray.core.dataset.Dataset
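The idea in plain numpy (the actual function operates on an xarray Dataset, presumably per channel):

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
# replace NaN samples with the mean of the finite samples
filled = np.where(np.isnan(x), np.nanmean(x), x)  # array([1., 2., 3.])
```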
- aurora.pipelines.time_series_helpers.prototype_decimate(config, run_xrds)[source]¶
Consider: 1. Moving this function into time_series/decimate.py. 2. Replacing the downsampled_time_axis with a rolling mean, or something that takes the average value of the time, not the window start.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
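A standalone sketch of the decimation step using scipy; aurora's actual filter choices and time-axis handling may differ.

```python
import numpy as np
from scipy.signal import decimate

fs = 64.0
x = np.sin(2 * np.pi * 1.0 * np.arange(0, 4, 1 / fs))  # 256 samples of a 1 Hz tone
y = decimate(x, 4)  # anti-alias filter, then keep every 4th sample
# y now represents the same signal at an effective 16 Hz sample rate
```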
- aurora.pipelines.time_series_helpers.prototype_decimate_2(config, run_xrds)[source]¶
Uses the built-in xarray coarsen method. Not clear what AAF effects are. Method is fast. Might be non-linear. Seems to give similar performance to prototype_decimate for synthetic data.
N.B. config.factor must be integer valued
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
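xarray's coarsen(time=q).mean() is equivalent to block averaging; a plain-numpy sketch, assuming an integer factor q as the N.B. requires:

```python
import numpy as np

x = np.arange(12, dtype=float)
q = 4  # decimation factor; must be integer valued
trimmed = x[: (len(x) // q) * q]         # drop the ragged tail (boundary="trim")
y = trimmed.reshape(-1, q).mean(axis=1)  # block means: array([1.5, 5.5, 9.5])
```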
- aurora.pipelines.time_series_helpers.prototype_decimate_3(config, run_xrds)[source]¶
Uses the built-in xarray coarsen method. Not clear what AAF effects are. Method is fast. Might be non-linear. Seems to give similar performance to prototype_decimate for synthetic data.
N.B. config.factor must be integer valued
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
- aurora.pipelines.time_series_helpers.run_ts_to_stft(decimation_obj, run_xrds_orig)[source]¶
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – normally extracted from mth5.RunTS
- Returns:
stft_obj – Note that the STFT object may have inf/nan in the DC harmonic, introduced by recoloring. This really doesn’t matter since we don’t use the DC harmonic for anything.
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.run_ts_to_stft_scipy(decimation_obj, run_xrds_orig)[source]¶
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – Time series to be processed
- Returns:
stft_obj – Time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
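A self-contained scipy example of the underlying STFT computation. The window length, overlap, and mode used here are illustrative choices, not aurora's defaults:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 64.0
t = np.arange(0, 8, 1 / fs)
x = np.sin(2 * np.pi * 4.0 * t)  # 4 Hz tone
# complex STFT: rows are frequencies, columns are time windows
f, t_win, stft = spectrogram(x, fs=fs, nperseg=64, noverlap=32, mode="complex")
peak_freq = f[np.argmax(np.abs(stft).mean(axis=1))]  # expect the 4 Hz bin
```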
- aurora.pipelines.time_series_helpers.truncate_to_clock_zero(decimation_obj, run_xrds)[source]¶
Compute the time interval between the first data sample and the clock zero. Identify the first sample in the xarray time series that corresponds to a window start sample.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds (xarray.core.dataset.Dataset) – normally extracted from mth5.RunTS
- Returns:
run_xrds – same as the input time series, but possibly slightly shortened
- Return type:
xarray.core.dataset.Dataset
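The alignment arithmetic can be sketched as follows. Variable names are hypothetical, and this assumes clock zero falls on sample index 0 of an absolute sample grid; aurora's implementation works with timestamps rather than indices.

```python
window_len = 128          # samples per window (hypothetical)
first_sample_index = 300  # index of the run's first sample on the absolute grid
# number of leading samples to drop so the series starts on a window boundary
n_drop = (-first_sample_index) % window_len  # 84, since 300 + 84 = 3 * 128
```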
- aurora.pipelines.time_series_helpers.validate_sample_rate(run_ts, expected_sample_rate, tol=0.0001)[source]¶
- Parameters:
run_ts (mth5.timeseries.run_ts.RunTS) – Time series object
expected_sample_rate (float) – The sample rate the time series is expected to have. Normally taken from the processing config
tol (float) – Tolerance used when comparing the observed and expected sample rates
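The validation itself is a tolerance comparison; a sketch (this assumes an absolute tolerance, which is not specified here):

```python
def sample_rate_close(actual: float, expected: float, tol: float = 1e-4) -> bool:
    """True if the observed sample rate is within tol of the expected one
    (absolute tolerance assumed; sketch only, not aurora's function)."""
    return abs(actual - expected) < tol
```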
Transfer Function Helpers¶
Note #1: Repeatedly applying edf_weights seems to have no effect at all. Tested 2024-01-18 and found that test_compare in synthetic passed whether this was commented out or not. TODO: Confirm this is a one-and-done operation and add documentation about why this is so.
- aurora.pipelines.transfer_function_helpers.apply_weights(X, Y, RR, W, segment=False, dropna=False)[source]¶
- aurora.pipelines.transfer_function_helpers.drop_nans(X, Y, RR)[source]¶
Just a helper intended to enhance readability. TODO: document the implications of dropna on the index of the xarray for other weights.
- aurora.pipelines.transfer_function_helpers.get_estimator_class(estimation_engine)[source]¶
- Parameters:
estimation_engine (str) – One of the keys in the ESTIMATOR_LIBRARY, designates the method that will be used to estimate the transfer function
- Returns:
estimator_class – The class that will do the TF estimation
- Return type:
aurora.transfer_function.regression.base.RegressionEstimator
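The lookup pattern can be illustrated with a stand-in registry. The class names below are placeholders, not aurora's actual estimator classes; only the dictionary-dispatch shape is the point.

```python
class RegressionEstimator:                    # stand-in for regression.base.RegressionEstimator
    pass

class RobustEstimator(RegressionEstimator):   # placeholder robust variant
    pass

# hypothetical registry mirroring the role of ESTIMATOR_LIBRARY
ESTIMATOR_LIBRARY = {"OLS": RegressionEstimator, "RME": RobustEstimator}

def get_estimator_class(estimation_engine: str):
    """Return the estimator class registered under the given engine key."""
    return ESTIMATOR_LIBRARY[estimation_engine]
```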
- aurora.pipelines.transfer_function_helpers.process_transfer_functions(dec_level_config, local_stft_obj, remote_stft_obj, transfer_function_obj, segment_weights=[], channel_weights=None)[source]¶
This method is based on TTFestBand.m.
- Parameters:
dec_level_config –
local_stft_obj –
remote_stft_obj –
transfer_function_obj (aurora.transfer_function.TTFZ.TTFZ) – The transfer function container ready to receive values in this method.
segment_weights (numpy array or list of strings) – 1D array of the same length as the time axis of the STFT objects. If these weights are zero anywhere, we drop that segment across all channels. If it is a list of strings, each string corresponds to a weighting algorithm to be applied: ["jackknife_jj84", "multiple_coherence", "simple_coherence"].
channel_weights (numpy array or None) –
Note #1: Applying weights one at a time vs. all-at-once: we need to keep the all-at-once to get residual covariances (see issue #87).
Note #2: Consider placing the segment weight logic in its own module with the various functions in a dictionary. Possibly can combine (product) all segment weights, with pseudocode like the following:
W = ones
for wt_style in segment_weights:
    fcn = wt_functions[wt_style]
    w = fcn(X, Y, RR)
    W *= w
return W
TODO: Consider pushing the nan-handling into the band extraction as a kwarg.
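The weight-combination idea from Note #2 can be made runnable; the accumulator starts at ones so the running product accumulates correctly. The two weighting functions below are placeholders, not aurora's coherence implementations.

```python
import numpy as np

def simple_coherence(X, Y, RR):   # placeholder weighting function
    return np.array([1.0, 0.5, 1.0])

def jackknife_jj84(X, Y, RR):     # placeholder weighting function
    return np.array([1.0, 1.0, 0.0])

# hypothetical registry of weighting algorithms keyed by style name
wt_functions = {"simple_coherence": simple_coherence, "jackknife_jj84": jackknife_jj84}

def combine_segment_weights(styles, X=None, Y=None, RR=None):
    """Multiply per-segment weights from each requested style into one array."""
    W = np.ones(3)  # start at ones so the running product works
    for style in styles:
        W *= wt_functions[style](X, Y, RR)
    return W
```

A zero from any style zeroes that segment in the combined weights, matching the "drop that segment across all channels" behavior described above.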
- aurora.pipelines.transfer_function_helpers.select_channel(xrda, channel_label)[source]¶
Extra helper function to make process_transfer_functions more readable without black forcing multiline.
- Parameters:
xrda –
channel_label –
- aurora.pipelines.transfer_function_helpers.set_up_iter_control(config)[source]¶
TODO: Review: maybe better to just make this the __init__ method of the IterControl object, iter_control = IterControl(config)
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
- Returns:
iter_control – Object with parameters about iteration control in regression
- Return type:
aurora.transfer_function.regression.iter_control.IterControl
- aurora.pipelines.transfer_function_helpers.stack_fcs(X, Y, RR)[source]¶
Reshape 2D arrays of frequency and time to 1D
Context: When the data for a frequency band are extracted from the Spectrogram, each channel is a 2D array: one axis is time (the time of the window that was FFT-ed) and the other axis is frequency. However, if we make no distinction between the harmonics (bins) within a band in regression, then all the FCs for each channel can be put into a 1D array. This method performs that reshaping (ravelling) operation. It is not important how we unravel the FCs, but it is important that we use the same scheme for X and Y.
TODO: Make this take a list and return a list rather than X, Y, RR.
TODO: Decorate this with @dataset_or_dataarray, e.g. if isinstance(X, xr.Dataset): tmp = X.to_array("channel"); tmp = tmp.stack() or similar.
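The reshaping itself, in plain numpy (the aurora version works on xarray objects; the key point is that one consistent unravel scheme must be used for X, Y, and RR):

```python
import numpy as np

# one channel's FCs for a band: 2 time windows x 3 harmonics
X_chan = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
x_1d = X_chan.ravel()  # array([1., 2., 3., 4., 5., 6.])
```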