Pipelines¶
Process MTH5¶
Process an MTH5 using the metadata config object.
Note 1: process_mth5 assumes application of cascading decimation, and that the decimated data will be accessed from the previous decimation level. This should be revisited. It may make more sense to have a get_decimation_level() interface that provides an option of applying decimation or loading predecimated data. This will be addressed via creation of the FC layer inside mth5.
Note 2: We can encounter cases where some runs can be decimated and others can not. We need a way to handle this. For example, a short run may not yield any data from a later decimation level. An attempt to handle this has been made in TF Kernel by adding a is_valid_dataset column, associated with each run-decimation level pair.
Note 3: This point in the loop marks the interface between _generation_ of the FCs and their _usage_. In future, the code above this comment would be pushed into create_fourier_coefficients() and the code below it would access those FCs and execute compute_transfer_function().
- aurora.pipelines.process_mth5.get_spectrogams(tfk, i_dec_level, units='MT')[source]¶
Could be made a method of TFK.
- Parameters:
tfk (TransferFunctionKernel) –
i_dec_level (integer) –
units ("MT" or "SI", likely to be deprecated) –
- Return type:
dict of short-time Fourier transforms
- aurora.pipelines.process_mth5.load_stft_obj_from_mth5(i_dec_level, row, run_obj, channels=None)[source]¶
Load stft_obj from mth5 (instead of computing it).
Note #1: See note #1 in time_series.frequency_band_helpers.extract_band
- Parameters:
i_dec_level (int) – The decimation level where the data are stored within the Fourier Coefficient group
row (pandas.core.series.Series) – A row of the TFK.dataset_df
run_obj (mth5.groups.run.RunGroup) – The original time-domain run associated with the data to load
- aurora.pipelines.process_mth5.make_stft_objects(processing_config, i_dec_level, run_obj, run_xrds, units='MT')[source]¶
Operates on a “per-run” basis
This method could be modified in a multiple-station code so that it doesn't care whether the station is "local" or "remote", but rather uses scale factors keyed by station_id.
- Parameters:
processing_config (mt_metadata.transfer_functions.processing.aurora.Processing) – Metadata about the processing to be applied
i_dec_level (int) – The decimation level to process
run_obj (mth5.groups.master_station_run_channel.RunGroup) – The run to transform to stft
run_xrds (xarray.core.dataset.Dataset) – The data time series from the run to transform
units (str) – expects “MT”. May change so that this is the only accepted set of units
station_id (str) – To be deprecated, this information is contained in the run_obj as run_obj.station_group.metadata.id
- Returns:
stft_obj – Time series of calibrated Fourier coefficients per each channel in the run
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.process_mth5.process_mth5(config, tfk_dataset=None, units='MT', show_plot=False, z_file_path=None, return_collection=False)[source]¶
This is the main method used to transform a processing_config and a kernel_dataset into a transfer function estimate.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Processing or path to json) – All processing parameters
tfk_dataset (aurora.tf_kernel.dataset.Dataset or None) – Specifies what datasets to process according to config
units (string) – “MT” or “SI”. To be deprecated once data have units embedded
show_plot (boolean) – Only used for dev
z_file_path (string or pathlib.Path) – Target path for a z_file output if desired
return_collection (boolean) – If False, return an mt_metadata TF object; if True, return an aurora.transfer_function.transfer_function_collection.TransferFunctionCollection.
- Returns:
tf (TransferFunctionCollection or mt_metadata TF) – The transfer function object
tf_cls (mt_metadata.transfer_functions.TF) – TF object
- aurora.pipelines.process_mth5.process_tf_decimation_level(config, i_dec_level, local_stft_obj, remote_stft_obj, units='MT')[source]¶
Processing pipeline for a single decimation_level
TODO: Add a check that the processing config sample rates agree with the data sampling rates; otherwise raise an Exception.
TODO: Add units to local_stft_obj, remote_stft_obj.
This method can be single station or remote reference based on the processing config.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – Config for a single decimation level
i_dec_level (int) – The decimation level id. Question: could we pack this into the decimation level as an attr?
local_stft_obj (xarray.core.dataset.Dataset) – The time series of Fourier coefficients from the local station
remote_stft_obj (xarray.core.dataset.Dataset or None) – The time series of Fourier coefficients from the remote station
units (str) – one of [“MT”,”SI”]
- Returns:
transfer_function_obj – The transfer function values packed into an object
- Return type:
aurora.transfer_function.TTFZ.TTFZ
- aurora.pipelines.process_mth5.save_fourier_coefficients(dec_level_config, row, run_obj, stft_obj)[source]¶
Note #1: Logic for building FC layers: If the processing config decimation_level.save_fcs_type = "h5" and fc_levels_already_exist is False, then open in append mode; else open in read mode. We should support a flag force_rebuild_fcs, normally False. This flag is only needed when save_fcs_type == "h5". If True, then we open in append mode regardless of fc_levels_already_exist. The task of setting mode="a" vs. mode="r" could be handled by tfk (maybe in tfk.validate()).
- Parameters:
dec_level_config –
row –
run_obj –
stft_obj –
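The open-mode logic described in Note #1 can be sketched as a small helper. The function name choose_mode is hypothetical and not part of the aurora API; this is a sketch of the stated rules, not the actual implementation.

```python
def choose_mode(save_fcs_type: str, fc_levels_already_exist: bool,
                force_rebuild_fcs: bool = False) -> str:
    """Pick the h5 open mode per Note #1 (hypothetical helper, not in aurora)."""
    if save_fcs_type == "h5" and (not fc_levels_already_exist or force_rebuild_fcs):
        return "a"  # append: FCs will be (re)built and written
    return "r"  # read-only: FCs already exist, or are not being stored in h5
```

With force_rebuild_fcs=True the append mode is chosen even when FC levels already exist, matching the note.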
Time Series Helpers¶
- aurora.pipelines.time_series_helpers.apply_prewhitening(decimation_obj, run_xrds_input)[source]¶
Applies pre-whitening to time series to avoid spectral leakage when FFT is applied.
If "first difference", may want to consider clipping the first and last samples from the differentiated time series.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_input (xarray.core.dataset.Dataset) – Time series to be pre-whitened
- Returns:
run_xrds – pre-whitened time series
- Return type:
xarray.core.dataset.Dataset
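As a minimal numpy illustration of first-difference prewhitening (a standalone sketch, not the aurora implementation, which operates on xarray Datasets):

```python
import numpy as np

def first_difference(x: np.ndarray) -> np.ndarray:
    """First-difference prewhitening.

    The output is one sample shorter than the input, which is why
    clipping the end samples of the differentiated series may be
    worth considering.
    """
    return np.diff(x)

x = np.array([1.0, 3.0, 6.0, 10.0])
w = first_difference(x)  # array([2., 3., 4.])
```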
- aurora.pipelines.time_series_helpers.apply_recoloring(decimation_obj, stft_obj)[source]¶
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.fourier_coefficients.decimation.Decimation) – Information about how the decimation level is to be processed
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be recoloured
- Returns:
stft_obj – Recolored time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
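For intuition: recoloring a spectrum that was prewhitened by first differencing amounts to dividing by the difference filter's frequency response. This is a hedged numpy sketch of that idea, not aurora's code; it also shows why the DC harmonic can end up inf/nan after recoloring.

```python
import numpy as np

fs = 1.0   # sample rate (Hz)
n_fft = 8
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
# Frequency response of the first-difference filter y[k] = x[k] - x[k-1]
response = 1.0 - np.exp(-2j * np.pi * freqs / fs)
# Recoloring divides the whitened spectrum by this response; it vanishes
# at DC, so dividing there produces the inf/nan DC harmonic noted elsewhere.
```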
- aurora.pipelines.time_series_helpers.calibrate_stft_obj(stft_obj, run_obj, units='MT', channel_scale_factors=None)[source]¶
- Parameters:
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be calibrated
run_obj (mth5.groups.master_station_run_channel.RunGroup) – Provides information about filters for calibration
units (string) – usually “MT”, contemplating supporting “SI”
channel_scale_factors (dict or None) – Keyed by channel; supports a single scalar to apply to that channel's data. Useful for debugging. Should not be used in production and should throw a warning if it is not None.
- Returns:
stft_obj – Time series of calibrated Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
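Conceptually, calibration divides each channel's Fourier coefficients by the complex frequency response of the instrument and filter chain. A toy numpy sketch with a made-up response array (not aurora's implementation, which derives the response from the run's filter metadata):

```python
import numpy as np

stft = np.ones((4, 3), dtype=complex)       # (frequency, time) FCs for one channel
response = np.array([1.0, 2.0, 4.0, 8.0])   # hypothetical per-frequency filter response
calibrated = stft / response[:, np.newaxis] # FCs in physical units once response removed
```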
- aurora.pipelines.time_series_helpers.nan_to_mean(xrds)[source]¶
Set NaN values to the mean value of the time series.
- Parameters:
xrds (xarray.core.dataset.Dataset) – Time series data, possibly containing NaN values
- Returns:
The input time series with NaN values replaced by the mean
- Return type:
xarray.core.dataset.Dataset
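The idea in plain numpy (the actual function operates on an xarray Dataset, presumably per channel):

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
# replace NaN samples with the mean of the finite samples
filled = np.where(np.isnan(x), np.nanmean(x), x)  # array([1., 2., 3.])
```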
- aurora.pipelines.time_series_helpers.prototype_decimate(config, run_xrds)[source]¶
Consider: 1. Moving this function into time_series/decimate.py. 2. Replacing the downsampled_time_axis with a rolling mean, or something that takes the average value of the time, not the window start.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
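A standalone sketch of the decimation step using scipy; aurora's actual filter choices and time-axis handling may differ.

```python
import numpy as np
from scipy.signal import decimate

fs = 64.0
x = np.sin(2 * np.pi * 1.0 * np.arange(0, 4, 1 / fs))  # 256 samples of a 1 Hz tone
y = decimate(x, 4)  # anti-alias filter, then keep every 4th sample
# y now represents the same signal at an effective 16 Hz sample rate
```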
- aurora.pipelines.time_series_helpers.prototype_decimate_2(config, run_xrds)[source]¶
Uses the built-in xarray coarsen method. Not clear what AAF effects are. Method is fast. Might be non-linear. Seems to give similar performance to prototype_decimate for synthetic data.
N.B. config.factor must be integer valued
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
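xarray's coarsen(time=q).mean() is equivalent to block averaging; a plain-numpy sketch, assuming an integer factor q as the N.B. requires:

```python
import numpy as np

x = np.arange(12, dtype=float)
q = 4  # decimation factor; must be integer valued
trimmed = x[: (len(x) // q) * q]         # drop the ragged tail (boundary="trim")
y = trimmed.reshape(-1, q).mean(axis=1)  # block means: array([1.5, 5.5, 9.5])
```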
- aurora.pipelines.time_series_helpers.prototype_decimate_3(config, run_xrds)[source]¶
Uses the built-in xarray coarsen method. Not clear what AAF effects are. Method is fast. Might be non-linear. Seems to give similar performance to prototype_decimate for synthetic data.
N.B. config.factor must be integer valued
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
- aurora.pipelines.time_series_helpers.run_ts_to_stft(decimation_obj, run_xrds_orig)[source]¶
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – normally extracted from mth5.RunTS
- Returns:
stft_obj – Note that the STFT object may have inf/nan in the DC harmonic, introduced by recoloring. This really doesn’t matter since we don’t use the DC harmonic for anything.
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.run_ts_to_stft_scipy(decimation_obj, run_xrds_orig)[source]¶
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – Time series to be processed
- Returns:
stft_obj – Time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
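A self-contained scipy example of the underlying STFT computation. The window length, overlap, and mode used here are illustrative choices, not aurora's defaults:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 64.0
t = np.arange(0, 8, 1 / fs)
x = np.sin(2 * np.pi * 4.0 * t)  # 4 Hz tone
# complex STFT: rows are frequencies, columns are time windows
f, t_win, stft = spectrogram(x, fs=fs, nperseg=64, noverlap=32, mode="complex")
peak_freq = f[np.argmax(np.abs(stft).mean(axis=1))]  # expect the 4 Hz bin
```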
- aurora.pipelines.time_series_helpers.truncate_to_clock_zero(decimation_obj, run_xrds)[source]¶
Compute the time interval between the first data sample and the clock zero. Identify the first sample in the xarray time series that corresponds to a window start sample.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds (xarray.core.dataset.Dataset) – normally extracted from mth5.RunTS
- Returns:
run_xrds – same as the input time series, but possibly slightly shortened
- Return type:
xarray.core.dataset.Dataset
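The alignment arithmetic can be sketched as follows. Variable names are hypothetical, and this assumes clock zero falls on sample index 0 of an absolute sample grid; aurora's implementation works with timestamps rather than indices.

```python
window_len = 128          # samples per window (hypothetical)
first_sample_index = 300  # index of the run's first sample on the absolute grid
# number of leading samples to drop so the series starts on a window boundary
n_drop = (-first_sample_index) % window_len  # 84, since 300 + 84 = 3 * 128
```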
- aurora.pipelines.time_series_helpers.validate_sample_rate(run_ts, expected_sample_rate, tol=0.0001)[source]¶
- Parameters:
run_ts (mth5.timeseries.run_ts.RunTS) – Time series object
expected_sample_rate (float) – The sample rate the time series is expected to have. Normally taken from the processing config
tol (float) – Tolerance used when comparing the observed and expected sample rates
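The validation itself is a tolerance comparison; a sketch (this assumes an absolute tolerance, which is not specified here):

```python
def sample_rate_close(actual: float, expected: float, tol: float = 1e-4) -> bool:
    """True if the observed sample rate is within tol of the expected one
    (absolute tolerance assumed; sketch only, not aurora's function)."""
    return abs(actual - expected) < tol
```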
Transfer Function Helpers¶
Note #1: Repeatedly applying edf_weights seems to have no effect at all. Tested 2024-01-18 and found that test_compare in synthetic passed whether this was commented out or not. TODO: Confirm this is a one-and-done operation and add documentation about why this is so.
- aurora.pipelines.transfer_function_helpers.apply_weights(X, Y, RR, W, segment=False, dropna=False)[source]¶
- aurora.pipelines.transfer_function_helpers.drop_nans(X, Y, RR)[source]¶
Just a helper intended to enhance readability. TODO: document the implications of dropna on the index of the xarray for other weights.
- aurora.pipelines.transfer_function_helpers.get_estimator_class(estimation_engine)[source]¶
- Parameters:
estimation_engine (str) – One of the keys in the ESTIMATOR_LIBRARY, designates the method that will be used to estimate the transfer function
- Returns:
estimator_class – The class that will do the TF estimation
- Return type:
aurora.transfer_function.regression.base.RegressionEstimator
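The lookup pattern can be illustrated with a stand-in registry. The class names below are placeholders, not aurora's actual estimator classes; only the dictionary-dispatch shape is the point.

```python
class RegressionEstimator:                    # stand-in for regression.base.RegressionEstimator
    pass

class RobustEstimator(RegressionEstimator):   # placeholder robust variant
    pass

# hypothetical registry mirroring the role of ESTIMATOR_LIBRARY
ESTIMATOR_LIBRARY = {"OLS": RegressionEstimator, "RME": RobustEstimator}

def get_estimator_class(estimation_engine: str):
    """Return the estimator class registered under the given engine key."""
    return ESTIMATOR_LIBRARY[estimation_engine]
```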
- aurora.pipelines.transfer_function_helpers.process_transfer_functions(dec_level_config, local_stft_obj, remote_stft_obj, transfer_function_obj, segment_weights=[], channel_weights=None)[source]¶
This method is based on TTFestBand.m.
- Parameters:
dec_level_config –
local_stft_obj –
remote_stft_obj –
transfer_function_obj (aurora.transfer_function.TTFZ.TTFZ) – The transfer function container ready to receive values in this method.
segment_weights (numpy array or list of strings) – 1D array of the same length as the time axis of the STFT objects. If these weights are zero anywhere, we drop that segment across all channels. If it is a list of strings, each string corresponds to a weighting algorithm to be applied: ["jackknife_jj84", "multiple_coherence", "simple_coherence"].
channel_weights (numpy array or None) –
Note #1: Applying weights one at a time vs. all-at-once: we need to keep the all-at-once to get residual covariances (see issue #87).
Note #2: Consider placing the segment weight logic in its own module with the various functions in a dictionary. Possibly can combine (product) all segment weights, with pseudocode like the following:
W = ones
for wt_style in segment_weights:
    fcn = wt_functions[wt_style]
    w = fcn(X, Y, RR)
    W *= w
return W
TODO: Consider pushing the nan-handling into the band extraction as a kwarg.
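The weight-combination idea from Note #2 can be made runnable; the accumulator starts at ones so the running product accumulates correctly. The two weighting functions below are placeholders, not aurora's coherence implementations.

```python
import numpy as np

def simple_coherence(X, Y, RR):   # placeholder weighting function
    return np.array([1.0, 0.5, 1.0])

def jackknife_jj84(X, Y, RR):     # placeholder weighting function
    return np.array([1.0, 1.0, 0.0])

# hypothetical registry of weighting algorithms keyed by style name
wt_functions = {"simple_coherence": simple_coherence, "jackknife_jj84": jackknife_jj84}

def combine_segment_weights(styles, X=None, Y=None, RR=None):
    """Multiply per-segment weights from each requested style into one array."""
    W = np.ones(3)  # start at ones so the running product works
    for style in styles:
        W *= wt_functions[style](X, Y, RR)
    return W
```

A zero from any style zeroes that segment in the combined weights, matching the "drop that segment across all channels" behavior described above.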
- aurora.pipelines.transfer_function_helpers.select_channel(xrda, channel_label)[source]¶
Extra helper function to make process_transfer_functions more readable without black forcing multiline.
- Parameters:
xrda –
channel_label –
- aurora.pipelines.transfer_function_helpers.set_up_iter_control(config)[source]¶
TODO: Review: maybe better to just make this the __init__ method of the IterControl object, iter_control = IterControl(config)
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
- Returns:
iter_control – Object with parameters about iteration control in regression
- Return type:
aurora.transfer_function.regression.iter_control.IterControl
- aurora.pipelines.transfer_function_helpers.stack_fcs(X, Y, RR)[source]¶
Reshape 2D arrays of frequency and time to 1D
Context: When the data for a frequency band are extracted from the Spectrogram, each channel is a 2D array: one axis is time (the time of the window that was FFT-ed) and the other axis is frequency. However, if we make no distinction between the harmonics (bins) within a band in regression, then all the FCs for each channel can be put into a 1D array. This method performs that reshaping (ravelling) operation. It is not important how we unravel the FCs, but it is important that we use the same scheme for X and Y.
TODO: Make this take a list and return a list rather than X, Y, RR.
TODO: Decorate this with @dataset_or_dataarray, e.g. if isinstance(X, xr.Dataset): tmp = X.to_array("channel"); tmp = tmp.stack() or similar.
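The reshaping itself, in plain numpy (the aurora version works on xarray objects; the key point is that one consistent unravel scheme must be used for X, Y, and RR):

```python
import numpy as np

# one channel's FCs for a band: 2 time windows x 3 harmonics
X_chan = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
x_1d = X_chan.ravel()  # array([1., 2., 3., 4., 5., 6.])
```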