aurora.pipelines package¶
Submodules¶
aurora.pipelines.fourier_coefficients module¶
Supporting code for building the FC level of the MTH5.
Here are the parameters that are defined via the mt_metadata fourier coefficients structures:
- "anti_alias_filter": "default"
- "bands"
- "decimation.factor": 4.0
- "decimation.level": 2
- "decimation.method": "default"
- "decimation.sample_rate": 0.0625
- "extra_pre_fft_detrend_type": "linear"
- "prewhitening_type": "first difference"
- "window.clock_zero_type": "ignore"
- "window.num_samples": 128
- "window.overlap": 32
- "window.type": "boxcar"
Creating the decimations config requires a decision about decimation factors and the number of levels. We have been getting this from the EMTF band setup file by default. It is desirable to continue supporting this; note, however, that the EMTF band setup is really about processing, not about making STFTs.
For the record, here is the legacy decimation config from EMTF, a.k.a. decset.cfg:
`
4 0 # of decimation level, & decimation offset
128 32. 1 0 0 7 4 32 1
1.0
128 32. 4 0 0 7 4 32 4
.2154 .1911 .1307 .0705
128 32. 4 0 0 7 4 32 4
.2154 .1911 .1307 .0705
128 32. 4 0 0 7 4 32 4
.2154 .1911 .1307 .0705
`
This essentially corresponds to a "Decimations Group", which is a list of decimations. Related to the generation of FCs is the ARMA prewhitening (Issue #60), which was controlled in EMTF with pwset.cfg:
`
4 5 # of decimation level, # of channels
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
`
Note 1: Assumes application of cascading decimation, and that the decimated data will be accessed from the previous decimation level.
Note 2: We can encounter cases where some runs can be decimated and others cannot. We need a way to handle this. For example, a short run may not yield any data from a later decimation level. An attempt to handle this has been made in the TF Kernel by adding an is_valid_dataset column, associated with each run-decimation level pair.
- Note 3: This point in the loop marks the interface between _generation_ of the FCs and their _usage_. In future, the code above this comment would be pushed into create_fourier_coefficients() and the code below it would access those FCs and execute compute_transfer_function().
- aurora.pipelines.fourier_coefficients.add_fcs_to_mth5(m: MTH5, fc_decimations: list | None = None) None [source]¶
Add Fourier coefficient levels to an existing MTH5.
Notes:
This module computes the FCs differently than the legacy aurora pipeline. It uses scipy.signal.spectrogram. There is a test in Aurora to confirm that the results are equivalent when fancy pre-whitening is not used.
Nomenclature: “usssr_grouper” is the output of a group-by on unique {survey, station, sample_rate} tuples.
- Parameters:
m (MTH5 object) – The mth5 file, open in append mode.
fc_decimations (Union[str, None, List]) – Specifies the scheme for decimating the time series when building the FC layer.
None: use the default (something like four decimation levels, each decimating by a factor of 4, say).
String: controlled vocabulary (values are a work in progress) that will allow custom definition of the fc_decimations for some common cases. For example, if already-decimated time series are stored, you may want only the zeroth decimation for each run, because the decimated time series live under another run container, which will get its own FCs. This is experimental.
List: (UNTESTED) the user has thought about the decimations they want to create and passes them explicitly. This will probably need to be a dictionary instead, since it would get redefined at each sample rate.
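For illustration, a minimal usage sketch (the file name "example.h5" is a placeholder; passing fc_decimations=None selects the defaults described above):
`
from mth5.mth5 import MTH5

from aurora.pipelines.fourier_coefficients import add_fcs_to_mth5

# Open the archive in append mode so that FC groups can be written
m = MTH5()
m.open_mth5("example.h5", mode="a")

# Build FCs for each {survey, station, sample_rate} group using defaults
add_fcs_to_mth5(m, fc_decimations=None)
m.close_mth5()
`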
- aurora.pipelines.fourier_coefficients.fc_decimations_creator(initial_sample_rate: float, decimation_factors: list | None = None, max_levels: int | None = 6, time_period: TimePeriod | None = None) list [source]¶
Creates mt_metadata FCDecimation objects that parameterize Fourier coefficient decimation levels.
Note 1: This does not yet work through the assignment of which bands to keep. Refer to mt_metadata.transfer_functions.processing.Processing.assign_bands() to see how this was done in the past
- Parameters:
initial_sample_rate (float) – Sample rate of the “level0” data – usually the sample rate during field acquisition.
decimation_factors (list (or other iterable)) – The decimation factors that will be applied at each FC decimation level
max_levels (int) – The maximum number of decimation levels to allow
time_period –
- Returns:
fc_decimations – Each element of the list is an object of type mt_metadata.transfer_functions.processing.fourier_coefficients.Decimation, (a.k.a. FCDecimation).
The order of the list corresponds to the order of the cascading decimation; no decimation levels are omitted. This could be changed in future by using a dict instead of a list, e.g. decimation_factors = dict(zip(np.arange(max_levels), decimation_factors)).
- Return type:
list
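A sketch of creating a custom decimation scheme (the decimation factors here are illustrative; attribute access follows the FCDecimation structure listed at the top of this module):
`
from aurora.pipelines.fourier_coefficients import fc_decimations_creator

# First level is undecimated (factor 1), then decimate by 4 per level
fc_decimations = fc_decimations_creator(
    initial_sample_rate=1.0,
    decimation_factors=[1, 4, 4, 4],
)
for fc_dec in fc_decimations:
    print(fc_dec.decimation.sample_rate)  # 1.0, 0.25, 0.0625, 0.015625
`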
- aurora.pipelines.fourier_coefficients.get_degenerate_fc_decimation(sample_rate: float) list [source]¶
Makes a default fc_decimation list. WIP. This "degenerate" config will only operate on the first decimation level. This is useful for testing, but could be used in future if an MTH5 already stored time series at decimation levels as separate runs.
- aurora.pipelines.fourier_coefficients.read_back_fcs(m: MTH5 | Path | str, mode='r')[source]¶
This is mostly a helper function for tests. It was used as a sanity check while debugging the FC files, and also is a good example for how to access the data at each level for each channel.
The time axis of the FC array will change from level to level, but the frequency axis will stay the same shape (for now, since all FCs are stored by default).
- Parameters:
m (MTH5, pathlib.Path, or str) – The path to an h5 file from which to scan the FCs, or an open MTH5 object.
aurora.pipelines.helpers module¶
Helper functions for processing pipelines. These may be reorganized into other modules later.
- aurora.pipelines.helpers.initialize_config(processing_config: Processing | str | Path) Processing [source]¶
Helper function to return an initialized processing config.
- Parameters:
processing_config (Union[Processing, str, pathlib.Path]) – Either an instance of the Processing class or a path to a JSON file in which a Processing object is stored.
- Returns:
config – Object that contains the processing parameters
- Return type:
mt_metadata.transfer_functions.processing.aurora.Processing
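Both accepted input types, sketched (the JSON path is a placeholder):
`
from aurora.pipelines.helpers import initialize_config

# From a path to a stored Processing object ...
config = initialize_config("processing_config.json")

# ... or pass an existing Processing instance straight through
config = initialize_config(config)
`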
aurora.pipelines.process_mth5 module¶
This module contains the main methods used in processing mth5 objects to transfer functions.
The main function is called process_mth5. This function was recently changed to process_mth5_legacy, so that process_mth5 can be repurposed for other TF estimation schemes. The "legacy" version corresponds to aurora's default processing.
Notes on process_mth5_legacy: Note 1: process_mth5 assumes application of cascading decimation, and that the decimated data will be accessed from the previous decimation level. This should be revisited. It may make more sense to have a get_decimation_level() interface that provides an option of applying decimation or loading pre-decimated data. This will be addressed via creation of the FC layer inside mth5.
Note 2: We can encounter cases where some runs can be decimated and others cannot. We need a way to handle this. For example, a short run may not yield any data from a later decimation level. An attempt to handle this has been made in the TF Kernel by adding an is_valid_dataset column, associated with each run-decimation level pair.
- Note 3: This point in the loop marks the interface between _generation_ of the FCs and their _usage_. In future, the code above this comment would be pushed into create_fourier_coefficients() and the code below it would access those FCs and execute compute_transfer_function(). This would also be an appropriate place for a feature-extraction layer, and for computing weights for the FCs.
- aurora.pipelines.process_mth5.append_chunk_to_stfts(stfts: dict, chunk: Dataset, remote: bool) dict [source]¶
Aggregate one STFT into a larger dictionary that tracks all the STFTs
- aurora.pipelines.process_mth5.get_spectrogams(tfk, i_dec_level, units='MT')[source]¶
Given a decimation level id, loads a dictionary of all spectrograms from information in tfk. TODO: Make this a method of TFK.
- Parameters:
tfk (TransferFunctionKernel) –
i_dec_level (integer) – The decimation level of the spectrograms.
units (str) – “MT” or “SI”, likely to be deprecated
- Returns:
stfts – The short time fourier transforms for the decimation level as a dictionary. Keys are “local” and “remote”. Values are lists, one (element) xr.Dataset per run
- Return type:
dict
- aurora.pipelines.process_mth5.load_stft_obj_from_mth5(i_dec_level: int, row: Series, run_obj: RunGroup, channels: list | None = None) Dataset [source]¶
Load stft_obj from mth5 (instead of compute)
Note #1: See note #1 in time_series.frequency_band_helpers.extract_band
- Parameters:
i_dec_level (int) – The decimation level where the data are stored within the Fourier Coefficient group
row (pandas.core.series.Series) – A row of the TFK.dataset_df
run_obj (mth5.groups.run.RunGroup) – The original time-domain run associated with the data to load
channels (list or None) – Optional list of channel names to load
- Returns:
stft_chunk – An STFT from mth5.
- Return type:
xr.Dataset
- aurora.pipelines.process_mth5.make_stft_objects(processing_config, i_dec_level, run_obj, run_xrds, units='MT')[source]¶
Operates on a “per-run” basis. Applies STFT to all time series in the input run.
This method could be modified in a multiple station code so that it doesn’t care if the station is “local” or “remote” but rather uses scale factors keyed by station_id (WIP - issue #329)
- Parameters:
processing_config (mt_metadata.transfer_functions.processing.aurora.Processing) – Metadata about the processing to be applied
i_dec_level (int) – The decimation level to process
run_obj (mth5.groups.master_station_run_channel.RunGroup) – The run to transform to stft
run_xrds (xarray.core.dataset.Dataset) – The data time series from the run to transform
units (str) – expects “MT”. May change so that this is the only accepted set of units
station_id (str) – To be deprecated, this information is contained in the run_obj as run_obj.station_group.metadata.id
- Returns:
stft_obj – Time series of calibrated Fourier coefficients per each channel in the run
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.process_mth5.merge_stfts(stfts: dict, tfk: TransferFunctionKernel)[source]¶
Applies concatenation along the time axis to multiple arrays of STFTs from different runs. At the TF estimation level we treat all the FCs in one array. This builds the array for both the local and the remote STFTs.
- Parameters:
stfts (dict) – The dict is keyed by “local” and “remote”. Each value is a list of STFTs (one list for local and one for remote)
tfk (TransferFunctionKernel) – Just here to let us know if there is a remote reference to merge or not.
- Returns:
local_merged_stft_obj, remote_merged_stft_obj – Both are xr.Datasets
- Return type:
Tuple
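The concatenation performed here amounts to the following xarray pattern (a toy sketch; real STFTs come from get_spectrogams, and toy_stft is a hypothetical stand-in):
`
import numpy as np
import xarray as xr

def toy_stft(t0: float) -> xr.Dataset:
    # Stand-in for one run's STFT: 3 time windows x 4 harmonics
    return xr.Dataset(
        {"ex": (("time", "frequency"), np.random.randn(3, 4))},
        coords={"time": t0 + np.arange(3.0), "frequency": np.arange(4.0)},
    )

# One STFT per run; concatenating along time mimics merge_stfts
stfts = {"local": [toy_stft(0.0), toy_stft(3.0)]}
local_merged_stft_obj = xr.concat(stfts["local"], dim="time")
print(local_merged_stft_obj.ex.shape)  # (6, 4)
`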
- aurora.pipelines.process_mth5.process_mth5(config, tfk_dataset=None, units='MT', show_plot=False, z_file_path=None, return_collection=False, processing_type='legacy')[source]¶
This is a pass-through method that routes the config and tfk_dataset to MT data processing. It currently only supports legacy aurora processing.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Processing or path to json) – All processing parameters
tfk_dataset (aurora.tf_kernel.dataset.Dataset or None) – Specifies what datasets to process according to config
units (string) – “MT” or “SI”. To be deprecated once data have units embedded
show_plot (boolean) – Only used for dev
z_file_path (string or pathlib.Path) – Target path for a z_file output if desired
return_collection (boolean) – If False, return an mt_metadata TF object; if True, return an aurora.transfer_function.transfer_function_collection.TransferFunctionCollection.
processing_type (string) – Controlled vocabulary, must be one of [“legacy”,] This is not really supported now, but the idea is that in future, the config and tfk_dataset can be passed to another processing method if desired.
- Returns:
tf_obj – The transfer function object
- Return type:
TransferFunctionCollection or mt_metadata.transfer_functions.TF
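A typical call, sketched (config and tfk_dataset are assumed to have been built beforehand via the processing config and KernelDataset machinery; output file names are placeholders, and the final write call assumes the mt_metadata TF API):
`
from aurora.pipelines.process_mth5 import process_mth5

tf_obj = process_mth5(
    config,                    # Processing object or path to its JSON
    tfk_dataset=tfk_dataset,   # KernelDataset describing what to process
    units="MT",
    z_file_path="station.zss", # optional z-file output
    return_collection=False,   # return an mt_metadata TF object
)
tf_obj.write(fn="station.xml", file_type="emtfxml")
`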
- aurora.pipelines.process_mth5.process_mth5_legacy(config, tfk_dataset=None, units='MT', show_plot=False, z_file_path=None, return_collection=False)[source]¶
This is the main method used to transform a processing_config and a kernel_dataset into a transfer function estimate.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Processing or path to json) – All processing parameters
tfk_dataset (aurora.tf_kernel.dataset.Dataset or None) – Specifies what datasets to process according to config
units (string) – “MT” or “SI”. To be deprecated once data have units embedded
show_plot (boolean) – Only used for dev
z_file_path (string or pathlib.Path) – Target path for a z_file output if desired
return_collection (boolean) – If False, return an mt_metadata TF object; if True, return an aurora.transfer_function.transfer_function_collection.TransferFunctionCollection.
- Returns:
tf_collection (TransferFunctionCollection or mt_metadata TF) – The transfer function object
tf_cls (mt_metadata.transfer_functions.TF) – TF object
- aurora.pipelines.process_mth5.process_tf_decimation_level(config: Processing, i_dec_level: int, local_stft_obj: Dataset, remote_stft_obj: Dataset | None, units='MT')[source]¶
Processing pipeline for a single decimation_level
TODO: Add a check that the processing config sample rates agree with the data sampling rates, otherwise raise an Exception.
TODO: Add units to local_stft_obj, remote_stft_obj.
This method can be single station or remote based on the processing config.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – Config for a single decimation level
i_dec_level (int) – decimation level_id ?could we pack this into the decimation level as an attr?
local_stft_obj (xarray.core.dataset.Dataset) – The time series of Fourier coefficients from the local station
remote_stft_obj (xarray.core.dataset.Dataset or None) – The time series of Fourier coefficients from the remote station
units (str) – one of [“MT”,”SI”]
- Returns:
transfer_function_obj – The transfer function values packed into an object
- Return type:
aurora.transfer_function.TTFZ.TTFZ
- aurora.pipelines.process_mth5.save_fourier_coefficients(dec_level_config, row, run_obj, stft_obj) None [source]¶
Optionally saves the stft object into the MTH5. Note that dec_level_config must have its save_fcs attr set to True for the data actually to be saved. WIP.
Note #1: Logic for building FC layers: If the processing config decimation_level.save_fcs_type = "h5" and fc_levels_already_exist is False, then open in append mode; otherwise open in read mode. We should support a flag force_rebuild_fcs, normally False. This flag is only needed when save_fcs_type == "h5"; if True, we open in append mode regardless of fc_levels_already_exist. The task of setting mode="a" vs. mode="r" can be handled by tfk (maybe in tfk.validate()).
- Parameters:
dec_level_config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – The information about decimation level associated with row, run, stft_obj
row (pd.Series) – A row of the TFK.dataset_df
run_obj (mth5.groups.run.RunGroup) – The run object associated with the STFTs.
stft_obj (xr.Dataset) – The data to pack into the mth5.
- aurora.pipelines.process_mth5.triage_issue_289(local_stfts: list, remote_stfts: list)[source]¶
Takes STFT objects in and returns them after shape-checking and making sure they are the same. WIP: Timing-error workaround; see Aurora Issue #289. Seems associated with getting one fewer sample than expected from the edge of a run.
- Returns:
(local_stfts, remote_stfts) – The original input arguments, shape-matched.
- Return type:
Tuple
aurora.pipelines.run_summary module¶
aurora.pipelines.time_series_helpers module¶
Collection of functions used in the processing pipeline that operate on time series.
- aurora.pipelines.time_series_helpers.apply_prewhitening(decimation_obj, run_xrds_input)[source]¶
Applies pre-whitening to the time series to avoid spectral leakage when the FFT is applied.
If the prewhitening type is "first difference", consider clipping the first and last samples from the differentiated time series.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_input (xarray.core.dataset.Dataset) – Time series to be pre-whitened
- Returns:
run_xrds – pre-whitened time series
- Return type:
xarray.core.dataset.Dataset
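A minimal sketch of the "first difference" branch (a hypothetical re-implementation with xarray; the real function selects its behavior from the decimation config's prewhitening_type):
`
import numpy as np
import xarray as xr

def first_difference_prewhiten(run_xrds: xr.Dataset) -> xr.Dataset:
    # Differentiate each channel along time to flatten a red spectrum
    return run_xrds.diff("time")

# Toy run: a random-walk (red noise) channel
ds = xr.Dataset(
    {"hx": ("time", np.cumsum(np.random.randn(100)))},
    coords={"time": np.arange(100.0)},
)
whitened = first_difference_prewhiten(ds)  # one sample shorter in time
`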
- aurora.pipelines.time_series_helpers.apply_recoloring(decimation_obj, stft_obj)[source]¶
Inverts the pre-whitening operation in frequency domain.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.fourier_coefficients.decimation.Decimation) – Information about how the decimation level is to be processed
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be recolored
- Returns:
stft_obj – Recolored time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.calibrate_stft_obj(stft_obj, run_obj, units='MT', channel_scale_factors=None)[source]¶
Calibrates frequency domain data into MT units.
- Development Notes:
The calibration often raises a runtime warning because the DC term in the calibration response is 0. TODO: It would be nice to suppress this, maybe by calibrating only the non-DC terms and directly assigning np.nan to the DC component when the DC response is zero.
- Parameters:
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be calibrated
run_obj (mth5.groups.master_station_run_channel.RunGroup) – Provides information about filters for calibration
units (string) – usually “MT”, contemplating supporting “SI”
channel_scale_factors (dict or None) – Keyed by channel; supports a single scalar to apply to that channel's data. Useful for debugging. Should not be used in production and should raise a warning if it is not None.
- Returns:
stft_obj – Time series of calibrated Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.nan_to_mean(xrds: Dataset) Dataset [source]¶
Set NaN values in an xr.Dataset to the per-channel mean value.
- Parameters:
xrds (xr.Dataset) – Time series data
- Returns:
run_xrds – The same as the input time series but NaN values are replaced by the mean of the time series (per channel).
- Return type:
xr.Dataset
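The fill operation amounts to something like this (a sketch, not the exact implementation; nan_to_mean_sketch is a hypothetical name):
`
import numpy as np
import xarray as xr

def nan_to_mean_sketch(xrds: xr.Dataset) -> xr.Dataset:
    # Replace NaNs with each channel's mean over the time axis
    return xrds.fillna(xrds.mean(dim="time", skipna=True))

ds = xr.Dataset({"ex": ("time", np.array([1.0, np.nan, 3.0]))})
print(nan_to_mean_sketch(ds).ex.values)  # [1. 2. 3.]
`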
- aurora.pipelines.time_series_helpers.prototype_decimate(config: Decimation, run_xrds: Dataset) Dataset [source]¶
- Basically a wrapper for scipy.signal.decimate. Takes an input time series (as an xarray Dataset) and a Decimation config object and returns a decimated version of the input time series.
TODO: Consider moving this function into time_series/decimate.py.
TODO: Consider replacing the downsampled_time_axis with a rolling mean, or something that takes the average value of the time, not the window start.
TODO: Compare outputs with scipy.signal.resample_poly, which also has an FIR AAF and appears faster.
TODO: Add handling for the case that could occur when the sliced time axis has a different length than the decimated data – see mth5 issue #217 https://github.com/kujaku11/mth5/issues/217
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
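A sketch of the wrapped operation (a decimation factor of 4 is assumed here; the real function reads the factor from the Decimation config):
`
import numpy as np
import scipy.signal as ssig
import xarray as xr

factor = 4
ds = xr.Dataset(
    {"ex": ("time", np.random.randn(4096))},
    coords={"time": np.arange(4096.0)},  # 1 Hz sample rate
)

# Anti-alias filter + downsample each channel, then slice the time axis
decimated = {ch: ssig.decimate(ds[ch].values, factor) for ch in ds.data_vars}
ds_dec = xr.Dataset(
    {ch: ("time", data) for ch, data in decimated.items()},
    coords={"time": ds.time.values[::factor]},
)
`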
- aurora.pipelines.time_series_helpers.run_ts_to_stft(decimation_obj, run_xrds_orig)[source]¶
Converts a runts object into a time series of Fourier coefficients. Similar to run_ts_to_stft_scipy, but this implementation allows operations on individual windows (for example, pre-whitening per time window via ARMA filtering).
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – Normally extracted from mth5.RunTS
- Returns:
stft_obj – Note that the STFT object may have inf/nan in the DC harmonic, introduced by recoloring. This really doesn’t matter since we don’t use the DC harmonic for anything.
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.run_ts_to_stft_scipy(decimation_obj, run_xrds_orig)[source]¶
Converts a runts object into a time series of Fourier coefficients. This method uses scipy.signal.spectrogram.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – Time series to be processed
- Returns:
stft_obj – Time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
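The underlying call looks roughly like this (window parameters taken from the example FC metadata at the top of this page; mode="complex" keeps the Fourier coefficients rather than power):
`
import numpy as np
import scipy.signal as ssig

x = np.random.randn(4096)  # one channel of a run
f, t, Zxx = ssig.spectrogram(
    x,
    fs=1.0,
    window="boxcar",
    nperseg=128,
    noverlap=32,
    detrend="linear",
    mode="complex",
)
# Zxx has shape (n_frequencies, n_time_windows)
`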
- aurora.pipelines.time_series_helpers.truncate_to_clock_zero(decimation_obj, run_xrds)[source]¶
Computes the time interval between the first data sample and clock zero, then identifies the first sample in the xarray time series that corresponds to a window start sample.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds (xarray.core.dataset.Dataset) – normally extracted from mth5.RunTS
- Returns:
run_xrds – same as the input time series, but possibly slightly shortened
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.validate_sample_rate(run_ts, expected_sample_rate, tol=0.0001)[source]¶
Check that the sample rate of a run_ts is the expected value, and warn if not.
- Parameters:
run_ts (mth5.timeseries.run_ts.RunTS) – Time series object with data and metadata.
expected_sample_rate (float) – The sample rate the time series is expected to have. Normally taken from the processing config
aurora.pipelines.transfer_function_helpers module¶
This module contains helper methods that are used during transfer function processing.
Development Notes: Note #1: Repeatedly applying edf_weights seems to have no effect at all. Tested 20240118: test_compare in synthetic passed whether this was commented out or not. TODO: Confirm this is a one-and-done; add documentation about why this is so.
- aurora.pipelines.transfer_function_helpers.apply_weights(X, Y, RR, W, segment=False, dropna=False)[source]¶
Applies data weights (W) to each of X, Y, RR. If a weight is zero, we set the value to NaN and optionally drop it.
- Parameters:
X (xarray.core.dataset.Dataset) –
Y (xarray.core.dataset.Dataset) –
RR (xarray.core.dataset.Dataset or None) –
W (numpy array) – The Weights to apply to the data
segment (bool) – If True the weights may need to be reshaped.
dropna (bool) – Whether or not to drop zero-weighted data. If true, we drop the nans.
- Returns:
X, Y, RR – Same as input but with weights applied and (optionally) nan dropped.
- Return type:
tuple
- aurora.pipelines.transfer_function_helpers.drop_nans(X: Dataset, Y: Dataset, RR: Dataset | None) tuple [source]¶
Drops NaN from the input xarrays. Just a helper intended to enhance readability.
- Development Notes:
TODO: document the implications of dropna on index of xarray for other weights
- Parameters:
X (xr.Dataset) – The input data for regression
Y (xr.Dataset) – The output data for regression
RR (Union[xr.Dataset, None]) – The remote reference data for regression
- Returns:
X, Y, RR – Returns the input arguments with NaN dropped from the xarrays.
- Return type:
tuple
- aurora.pipelines.transfer_function_helpers.get_estimator_class(estimation_engine: str) RegressionEstimator [source]¶
- Parameters:
estimation_engine (str) – One of the keys in the ESTIMATOR_LIBRARY, designates the method that will be used to estimate the transfer function
- Returns:
estimator_class – The class that will do the TF estimation
- Return type:
aurora.transfer_function.regression.base.RegressionEstimator
- aurora.pipelines.transfer_function_helpers.process_transfer_functions(dec_level_config, local_stft_obj, remote_stft_obj, transfer_function_obj, segment_weights=[], channel_weights=None)[source]¶
This is the main tf_processing method. It is based on TTFestBand.m
Note #1: Although it is advantageous to execute the regression channel-by-channel vs. all-at-once, we need to keep the all-at-once to get residual covariances (see issue #87)
Note #2: Consider placing the segment weight logic in its own module, with the various functions in a dictionary. Possibly all segment weights can be combined (as a product), as in the following pseudocode:
`
W = ones
for wt_style in segment_weights:
    fcn = wt_functions[wt_style]
    w = fcn(X, Y, RR)
    W *= w
return W
`
TODO: Consider pushing the nan-handling into the band extraction as a kwarg.
- Parameters:
dec_level_config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
local_stft_obj (xarray.core.dataset.Dataset) –
remote_stft_obj (xarray.core.dataset.Dataset or None) –
transfer_function_obj (aurora.transfer_function.TTFZ.TTFZ) – The transfer function container ready to receive values in this method.
segment_weights (numpy array or list of strings) – A 1D array that should be the same length as the time axis of the STFT objects. If these weights are zero anywhere, that whole segment is dropped across all channels. If a list of strings, each string corresponds to a weighting algorithm to be applied: ["jackknife_jj84", "multiple_coherence", "simple_coherence"].
channel_weights (numpy array or None) –
- Returns:
transfer_function_obj
- Return type:
aurora.transfer_function.TTFZ.TTFZ
- aurora.pipelines.transfer_function_helpers.set_up_iter_control(config)[source]¶
Initializes an IterControl object based on values in the processing config.
Development Notes: TODO: Review; it may be better to just make this the __init__ method of the IterControl object, i.e. iter_control = IterControl(config).
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
- Returns:
iter_control – Object with parameters about iteration control in regression
- Return type:
aurora.transfer_function.regression.iter_control.IterControl
- aurora.pipelines.transfer_function_helpers.stack_fcs(X, Y, RR)[source]¶
Reshape 2D arrays of frequency and time to 1D.
Notes: When the data for a frequency band are extracted from the Spectrogram, each channel is a 2D array, one axis is time (the time of the window that was FFT-ed) and the other axis is frequency. However if we make no distinction between the harmonics (bins) within a band in regression, then all the FCs for each channel can be put into a 1D array. This method performs that reshaping (ravelling) operation. It is not important how we unravel the FCs but it is important that the same indexing scheme is used for X, Y and RR.
TODO: Consider having this take a list and return a list, rather than X, Y, RR.
TODO: Consider decorating this with @dataset_or_dataarray.
- Parameters:
X (xarray.core.dataset.Dataset) –
Y (xarray.core.dataset.Dataset) –
RR (xarray.core.dataset.Dataset or None) –
- Returns:
X, Y, RR
- Return type:
Same as input but with stacked time and frequency dimensions
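The reshaping described above can be sketched with xarray's stack (the dimension name "observation" is illustrative):
`
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"hx": (("time", "frequency"), np.random.randn(10, 4))},
    coords={"time": np.arange(10), "frequency": np.arange(4)},
)

# Collapse (time, frequency) into a single 1D observation axis;
# applying the same stack to X, Y, RR keeps their indexing aligned
stacked = ds.stack(observation=("time", "frequency"))
print(stacked.hx.shape)  # (40,)
`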
aurora.pipelines.transfer_function_kernel module¶
This module contains the TransferFunctionKernel class, which is the main object linking the KernelDataset to the Processing configuration.
- class aurora.pipelines.transfer_function_kernel.TransferFunctionKernel(dataset=None, config=None)[source]¶
Bases:
object
- Attributes:
all_fcs_already_exist – Return True if all FCs needed to process the data already exist in the mth5s.
config – Returns the processing config object.
dataset – Returns the KernelDataset object.
dataset_df – Returns the KernelDataset dataframe.
kernel_dataset – Returns the KernelDataset object.
mth5_objs – Returns self._mth5_objs.
processing_config – Returns the processing config object.
processing_summary – Returns the processing summary object; creates it if it doesn't yet exist.
processing_type – A description of the processing; will get passed to the TF object.
Methods
apply_clock_zero(dec_level_config) – Get clock-zero from data if needed.
check_if_fcs_already_exist() – Fills out the "fc" column of the dataset dataframe with True/False.
export_tf_collection(tf_collection) – Assign transfer_function, residual_covariance, inverse_signal_power, station, survey.
Check the mode of an open mth5 (read, write, append).
initialize_mth5s() – Returns a dict of open mth5 objects, keyed by station_id.
is_valid_dataset(row, i_dec) – Given a row from the RunSummary, answer the question: "Will this decimation level yield a valid dataset?"
make_processing_summary() – Create the processing summary table.
memory_check() – Checks if a RAM issue should be anticipated.
show_processing_summary([omit_columns]) – Prints the processing summary table via logger.
update_dataset_df(i_dec_level) – This function has two different modes.
valid_decimations() – Get the decimation levels that are valid.
validate() – Apply all validators.
validate_decimation_scheme_and_dataset_compatability() – Checks that the decimation_scheme and dataset are compatible.
validate_processing() – Do some validation checks.
Update save_fc values in the config to be appropriate.
- property all_fcs_already_exist: bool¶
Return True if all FCs needed to process the data already exist in the mth5s.
- apply_clock_zero(dec_level_config)[source]¶
get clock-zero from data if needed
- Parameters:
dec_level_config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
- Returns:
dec_level_config – The modified DecimationLevel with clock-zero information set.
- Return type:
mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel
- check_if_fcs_already_exist()[source]¶
Fills out the “fc” column of dataset dataframe with True/False.
If all FC Levels for a given station-run are already built, mark True otherwise False.
Iterates over the processing summary_df, grouping by unique “Survey-Station-Run”s. (Could also iterate over kernel_dataset.dataframe, to get the groupby).
Note 1: Because decimation is a cascading operation, we avoid the case where some (valid) decimation levels exist in the mth5 FC archive and others do not. The maximum granularity is the "station-run" level. For a given run, either all relevant FCs are in the h5 or we treat it as if none of them are. To support variations at the decimation level, an appropriate way to address this would be to store decimated time series in the archive as well (they would simply be runs with different sample rates, and some extra filters).
Note #2: run_sub_df may have multiple rows, even though the run id is unique. This could happen for example when you have a long run at the local station, but multiple (say two) shorter runs at the reference station. In that case, the processing summary will have a separate row for the intersection of the long run with each of the remote runs. We ignore this for now, selecting only the first element of the run_sub_df, under the assumption that FCs have been created for the entire run, or not at all. This assumption can be relaxed in future by using the time_period attribute of the FC layer. For now, we proceed with the all-or-none logic. That is, if a [‘survey’, ‘station’, ‘run’,] has FCs, assume that the FCs are present for the entire run. We assign the “fc” column of dataset_df to have the same boolean value for all rows of same [‘survey’, ‘station’, ‘run’] .
- Returns: None
Modifies self.dataset_df in place, assigning bools to the "fc" column.
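The all-or-none assignment described in Note #2 amounts to a pandas pattern like the following (run_has_fcs is a hypothetical stand-in for the real per-run check against the mth5):
`
import pandas as pd

def run_has_fcs(survey: str, station: str, run: str) -> bool:
    # Hypothetical stand-in for checking the mth5 FC archive
    return False

dataset_df = pd.DataFrame(
    {"survey": ["s1", "s1"], "station": ["mt01", "mt01"], "run": ["001", "002"]}
)
for (survey, station, run), sub_df in dataset_df.groupby(["survey", "station", "run"]):
    # Same boolean for all rows of the same survey-station-run
    dataset_df.loc[sub_df.index, "fc"] = run_has_fcs(survey, station, run)
`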
- property config¶
Returns the processing config object
- property dataset¶
returns the KernelDataset object
- property dataset_df: DataFrame¶
returns the KernelDataset dataframe
- export_tf_collection(tf_collection)[source]¶
Assign transfer_function, residual_covariance, inverse_signal_power, station, survey
- Parameters:
tf_collection (aurora.transfer_function.transfer_function_collection.TransferFunctionCollection) – Contains TF estimates, covariance, and signal power values
- Returns:
tf_cls – Transfer function container
- Return type:
mt_metadata.transfer_functions.core.TF
- initialize_mth5s()[source]¶
returns a dict of open mth5 objects, keyed by station_id
A future version of this for multiple-station processing may need a nested dict with [survey_id][station].
- Returns:
mth5_objs – Keyed by station id; both the local station id and the remote station id map to mth5.mth5.MTH5 objects.
- Return type:
dict
- is_valid_dataset(row, i_dec) bool [source]¶
Given a row from the RunSummary, answer the question: "Will this decimation level yield a valid dataset?"
- Parameters:
row (pandas.core.series.Series) – Row of the self._dataset_df (corresponding to a run that will be processed)
i_dec (integer) – refers to decimation level
- Returns:
is_valid – Whether the (run, decimation_level) pair associated with this row yields a valid dataset
- Return type:
bool
- property kernel_dataset¶
returns the KernelDataset object
- make_processing_summary()[source]¶
Create the processing summary table.
- Melt the decimation levels over the run summary.
- Add columns to estimate the number of FFT windows for each row.
- Returns:
processing_summary – One row per run-decimation pair
- Return type:
pd.DataFrame
- memory_check() None [source]¶
Checks if a RAM issue should be anticipated.
Notes: Requires an estimate of available RAM and an estimate of the dataset size. Available RAM is taken from psutil; dataset size is the number of samples times the number of bytes per sample. Bits per sample is estimated to be 64 (8 bytes) by default.
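The estimate sketched in code (8 bytes per sample, as stated above; the threshold logic is illustrative):
`
import psutil

num_samples = 10_000_000  # total samples across the dataset
bytes_per_sample = 8      # 64 bits per sample by default
dataset_bytes = num_samples * bytes_per_sample

available = psutil.virtual_memory().available
if dataset_bytes > available:
    print("Warning: dataset may not fit in available RAM")
`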
- property processing_config¶
Returns the processing config object
- property processing_summary¶
Returns the processing summary object – creates if it doesn’t yet exist.
- property processing_type¶
A description of the processing; will get passed to the TF object and can be used for the Z-file.
Could add a version or a hashtag to this. Could also check dataset_df: if remote.all == False, append "Single Station".
- show_processing_summary(omit_columns=('mth5_path', 'channel_scale_factors', 'start', 'end', 'input_channels', 'output_channels', 'num_samples_overlap', 'num_samples_advance', 'run_dataarray'))[source]¶
Prints the processing summary table via logger.
- Parameters:
omit_columns (tuple) – List of columns to omit when showing channel summary (used to keep table small).
- update_dataset_df(i_dec_level)[source]¶
This function has two different modes. The first mode initializes values in the array, and could be placed into TFKDataset.initialize_time_series_data(). The second mode decimates. The function is kept in pipelines because it calls time series operations.
Notes: 1. When assigning xarrays to dataframe cells, df dislikes xr.Dataset, so we convert to DataArray before assignment.
- Parameters:
i_dec_level (int) – decimation level id, indexed from zero
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – decimation level config
- Returns:
dataset_df – Same df that was input to the function but now has columns:
- Return type:
pd.DataFrame
- valid_decimations()[source]¶
Get the decimation levels that are valid. This is used when iterating over decimation levels in the processing, we do not want to have invalid levels get processed (they will fail).
- Returns:
dec_levels – Decimations from the config that are valid.
- Return type:
list
- validate_decimation_scheme_and_dataset_compatability(min_num_stft_windows=None)[source]¶
Checks that the decimation_scheme and dataset are compatible. Marks as invalid any rows that will fail to process based on incompatibility.
Refers to issue #182 (and #103, and possibly #196 and #233). Determine if there exist (one or more) runs that will yield decimated datasets that have too few samples to be passed to the STFT algorithm.
Strategy for handling this: mark as invalid any rows of the processing summary that do not yield time series long enough to window. This way, all other rows, with decimations up to the invalid cases, will still process.
WCGW: If the runs are “chunked” we could encounter a situation where the chunk fails to yield a deep decimation level, yet the level could be produced if the chunk size were larger. In general, handling this seems a bit complicated. We will ignore this for now and assume that the user will select a chunk size that is appropriate to the decimation scheme, i.e. use large chunks for deep decimations.
A general solution: return a log that tells the user, for each run and decimation level, how many STFT windows it yielded. This conjures the notion of (run, decimation_level) pairs.
- validate_processing()[source]¶
Do some validation checks. WIP.
Things that are validated:
1. The default estimation engine from the JSON file is "RME_RR", which is fine (we expect in general to do more RR processing than SS), but if there is only one station (no remote) then RME_RR should be replaced by default with "RME". Also, if there is only one station, set the reference channels to [].
2. Make sure the local station id is defined (correctly, from the kernel dataset).
- aurora.pipelines.transfer_function_kernel.mth5_has_fcs(m, survey_id, station_id, run_id, remote, processing_config, **kwargs)[source]¶
Checks if all FC levels needed for a survey-station-run are present, as specified by the processing_config.
- Note #1: At this point in the logic, it is established that there are FCs associated with run_id and that there are at least as many FC decimation levels as the processing config requires. The next step is to assert whether the existing FCs conform to the recipe in the processing config.
kwargs are here as a pass-through to the decorator; we pass mode="r", "a", or "w".
- Parameters:
m –
survey_id –
station_id –
run_id –
remote –
processing_config –