aurora.pipelines package¶
Submodules¶
aurora.pipelines.fourier_coefficients module¶
Supporting code for building the FC level of the MTH5.
Here are the parameters that are defined via the mt_metadata fourier coefficients structures:
- "anti_alias_filter": "default"
- "bands"
- "decimation.factor": 4.0
- "decimation.level": 2
- "decimation.method": "default"
- "decimation.sample_rate": 0.0625
- "extra_pre_fft_detrend_type": "linear"
- "prewhitening_type": "first difference"
- "window.clock_zero_type": "ignore"
- "window.num_samples": 128
- "window.overlap": 32
- "window.type": "boxcar"
Creating the decimations config requires a decision about decimation factors and the number of levels. We have been getting this from the EMTF band setup file by default. It is desirable to continue supporting this; note, however, that the EMTF band setup is really about processing, not about making STFTs.
For the record, here is the legacy decimation config from EMTF, a.k.a. decset.cfg:
`
4 0 # of decimation level, & decimation offset
128 32. 1 0 0 7 4 32 1
1.0
128 32. 4 0 0 7 4 32 4
.2154 .1911 .1307 .0705
128 32. 4 0 0 7 4 32 4
.2154 .1911 .1307 .0705
128 32. 4 0 0 7 4 32 4
.2154 .1911 .1307 .0705
`
This essentially corresponds to a "Decimations Group", which is a list of decimations. Related to the generation of FCs is the ARMA prewhitening (Issue #60), which was controlled in EMTF with pwset.cfg:
`
4 5 # of decimation level, # of channels
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
`
Note 1: Assumes application of cascading decimation, and that the decimated data will be accessed from the previous decimation level.
Note 2: We can encounter cases where some runs can be decimated and others cannot. We need a way to handle this. For example, a short run may not yield any data from a later decimation level. An attempt to handle this has been made in the TF Kernel by adding an is_valid_dataset column, associated with each run-decimation level pair.
- Note 3: This point in the loop marks the interface between _generation_ of the FCs and their _usage_. In future, the code above this comment would be pushed into create_fourier_coefficients() and the code below it would access those FCs and execute compute_transfer_function().
- aurora.pipelines.fourier_coefficients.add_fcs_to_mth5(m: MTH5, fc_decimations: list | None = None) None [source]¶
Add Fourier coefficient levels to an existing MTH5.
Notes:
This module computes the FCs differently than the legacy aurora pipeline. It uses scipy.signal.spectrogram. There is a test in Aurora to confirm that the results are equivalent when fancy pre-whitening is not used.
Nomenclature: “usssr_grouper” is the output of a group-by on unique {survey, station, sample_rate} tuples.
- Parameters:
m (MTH5 object) – The mth5 file, open in append mode.
fc_decimations (Union[str, None, List]) – Specifies the scheme for decimating the time series when building the FC layer.
None: use the default (something like four decimation levels, each decimating by a factor of 4, say).
String: controlled vocabulary (values are a work in progress) that will allow custom definition of the fc_decimations for some common cases. For example, if already-decimated time series are stored, you may want only the zeroth decimation for each run, because the decimated time series live under another run container, which will get its own FCs. This is experimental.
List: (UNTESTED) the user has thought about the decimations they want to create and passes them explicitly. This will probably need to be a dictionary instead, since it would get redefined at each sample rate.
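For illustration, a minimal usage sketch (the file name "example.h5" is a placeholder; passing fc_decimations=None selects the defaults described above):
`
from mth5.mth5 import MTH5

from aurora.pipelines.fourier_coefficients import add_fcs_to_mth5

# Open the archive in append mode so that FC groups can be written
m = MTH5()
m.open_mth5("example.h5", mode="a")

# Build FCs for each {survey, station, sample_rate} group using defaults
add_fcs_to_mth5(m, fc_decimations=None)
m.close_mth5()
`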
- aurora.pipelines.fourier_coefficients.fc_decimations_creator(initial_sample_rate: float, decimation_factors: list | None = None, max_levels: int | None = 6, time_period: TimePeriod | None = None) list [source]¶
Creates mt_metadata FCDecimation objects that parameterize Fourier coefficient decimation levels.
Note 1: This does not yet work through the assignment of which bands to keep. Refer to mt_metadata.transfer_functions.processing.Processing.assign_bands() to see how this was done in the past
- Parameters:
initial_sample_rate (float) – Sample rate of the “level0” data – usually the sample rate during field acquisition.
decimation_factors (list (or other iterable)) – The decimation factors that will be applied at each FC decimation level
max_levels (int) – The maximum number of decimation levels to allow
time_period –
- Returns:
fc_decimations – Each element of the list is an object of type mt_metadata.transfer_functions.processing.fourier_coefficients.Decimation, (a.k.a. FCDecimation).
The order of the list corresponds to the order of the cascading decimation; no decimation levels are omitted. This could be changed in future by using a dict instead of a list, e.g. decimation_factors = dict(zip(np.arange(max_levels), decimation_factors)).
- Return type:
list
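A sketch of creating a custom decimation scheme (the decimation factors here are illustrative; attribute access follows the FCDecimation structure listed at the top of this module):
`
from aurora.pipelines.fourier_coefficients import fc_decimations_creator

# First level is undecimated (factor 1), then decimate by 4 per level
fc_decimations = fc_decimations_creator(
    initial_sample_rate=1.0,
    decimation_factors=[1, 4, 4, 4],
)
for fc_dec in fc_decimations:
    print(fc_dec.decimation.sample_rate)  # 1.0, 0.25, 0.0625, 0.015625
`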
- aurora.pipelines.fourier_coefficients.get_degenerate_fc_decimation(sample_rate: float) list [source]¶
Makes a default fc_decimation list. WIP. This "degenerate" config will only operate on the first decimation level. This is useful for testing, but could be used in future if an MTH5 already stored time series at decimation levels as separate runs.
- aurora.pipelines.fourier_coefficients.read_back_fcs(m: MTH5 | Path | str, mode='r')[source]¶
This is mostly a helper function for tests. It was used as a sanity check while debugging the FC files, and also is a good example for how to access the data at each level for each channel.
The time axis of the FC array will change from level to level, but the frequency axis will stay the same shape (for now, since all FCs are stored by default).
- Parameters:
m (MTH5, pathlib.Path, or str) – The path to an h5 file from which to scan the FCs, or an open MTH5 object.
aurora.pipelines.helpers module¶
Helper functions for processing pipelines. These may be reorganized into other modules later.
- aurora.pipelines.helpers.initialize_config(processing_config: Processing | str | Path) Processing [source]¶
Helper function to return an initialized processing config.
- Parameters:
processing_config (Union[Processing, str, pathlib.Path]) – Either an instance of the Processing class or a path to a JSON file in which a Processing object is stored.
- Returns:
config – Object that contains the processing parameters
- Return type:
mt_metadata.transfer_functions.processing.aurora.Processing
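Both accepted input types, sketched (the JSON path is a placeholder):
`
from aurora.pipelines.helpers import initialize_config

# From a path to a stored Processing object ...
config = initialize_config("processing_config.json")

# ... or pass an existing Processing instance straight through
config = initialize_config(config)
`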
aurora.pipelines.process_mth5 module¶
This module contains the main methods used in processing mth5 objects to transfer functions.
The main function is called process_mth5. This function was recently changed to process_mth5_legacy, so that process_mth5 can be repurposed for other TF estimation schemes. The "legacy" version corresponds to aurora's default processing.
Notes on process_mth5_legacy: Note 1: process_mth5 assumes application of cascading decimation, and that the decimated data will be accessed from the previous decimation level. This should be revisited. It may make more sense to have a get_decimation_level() interface that provides an option of applying decimation or loading pre-decimated data. This will be addressed via creation of the FC layer inside mth5.
Note 2: We can encounter cases where some runs can be decimated and others cannot. We need a way to handle this. For example, a short run may not yield any data from a later decimation level. An attempt to handle this has been made in the TF Kernel by adding an is_valid_dataset column, associated with each run-decimation level pair.
- Note 3: This point in the loop marks the interface between _generation_ of the FCs and their _usage_. In future, the code above this comment would be pushed into create_fourier_coefficients() and the code below it would access those FCs and execute compute_transfer_function(). This would also be an appropriate place for a feature-extraction layer, and for computing weights for the FCs.
- aurora.pipelines.process_mth5.append_chunk_to_stfts(stfts: dict, chunk: Dataset, remote: bool) dict [source]¶
Aggregate one STFT into a larger dictionary that tracks all the STFTs
- aurora.pipelines.process_mth5.get_spectrogams(tfk, i_dec_level, units='MT')[source]¶
Given a decimation level id, loads a dictionary of all spectrograms from information in tfk. TODO: Make this a method of TFK.
- Parameters:
tfk (TransferFunctionKernel) –
i_dec_level (integer) – The decimation level of the spectrograms.
units (str) – “MT” or “SI”, likely to be deprecated
- Returns:
stfts – The short time fourier transforms for the decimation level as a dictionary. Keys are “local” and “remote”. Values are lists, one (element) xr.Dataset per run
- Return type:
dict
- aurora.pipelines.process_mth5.load_stft_obj_from_mth5(i_dec_level: int, row: Series, run_obj: RunGroup, channels: list | None = None) Dataset [source]¶
Load stft_obj from mth5 (instead of compute)
Note #1: See note #1 in time_series.frequency_band_helpers.extract_band
- Parameters:
i_dec_level (int) – The decimation level where the data are stored within the Fourier Coefficient group
row (pandas.core.series.Series) – A row of the TFK.dataset_df
run_obj (mth5.groups.run.RunGroup) – The original time-domain run associated with the data to load
channels (list or None) – Optional list of channel names to load
- Returns:
stft_chunk – An STFT from mth5.
- Return type:
xr.Dataset
- aurora.pipelines.process_mth5.make_stft_objects(processing_config, i_dec_level, run_obj, run_xrds, units='MT')[source]¶
Operates on a “per-run” basis. Applies STFT to all time series in the input run.
This method could be modified in a multiple station code so that it doesn’t care if the station is “local” or “remote” but rather uses scale factors keyed by station_id (WIP - issue #329)
- Parameters:
processing_config (mt_metadata.transfer_functions.processing.aurora.Processing) – Metadata about the processing to be applied
i_dec_level (int) – The decimation level to process
run_obj (mth5.groups.master_station_run_channel.RunGroup) – The run to transform to stft
run_xrds (xarray.core.dataset.Dataset) – The data time series from the run to transform
units (str) – expects “MT”. May change so that this is the only accepted set of units
station_id (str) – To be deprecated, this information is contained in the run_obj as run_obj.station_group.metadata.id
- Returns:
stft_obj – Time series of calibrated Fourier coefficients per each channel in the run
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.process_mth5.merge_stfts(stfts: dict, tfk: TransferFunctionKernel)[source]¶
Applies concatenation along the time axis to multiple arrays of STFTs from different runs. At the TF estimation level we treat all the FCs in one array. This builds the array for both the local and the remote STFTs.
- Parameters:
stfts (dict) – The dict is keyed by “local” and “remote”. Each value is a list of STFTs (one list for local and one for remote)
tfk (TransferFunctionKernel) – Just here to let us know if there is a remote reference to merge or not.
- Returns:
local_merged_stft_obj, remote_merged_stft_obj – Both are xr.Datasets
- Return type:
Tuple
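The concatenation performed here amounts to the following xarray pattern (a toy sketch; real STFTs come from get_spectrogams, and toy_stft is a hypothetical stand-in):
`
import numpy as np
import xarray as xr

def toy_stft(t0: float) -> xr.Dataset:
    # Stand-in for one run's STFT: 3 time windows x 4 harmonics
    return xr.Dataset(
        {"ex": (("time", "frequency"), np.random.randn(3, 4))},
        coords={"time": t0 + np.arange(3.0), "frequency": np.arange(4.0)},
    )

# One STFT per run; concatenating along time mimics merge_stfts
stfts = {"local": [toy_stft(0.0), toy_stft(3.0)]}
local_merged_stft_obj = xr.concat(stfts["local"], dim="time")
print(local_merged_stft_obj.ex.shape)  # (6, 4)
`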
- aurora.pipelines.process_mth5.process_mth5(config, tfk_dataset=None, units='MT', show_plot=False, z_file_path=None, return_collection=False, processing_type='legacy')[source]¶
This is a pass-through method that routes the config and tfk_dataset to MT data processing. It currently only supports legacy aurora processing.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Processing or path to json) – All processing parameters
tfk_dataset (aurora.tf_kernel.dataset.Dataset or None) – Specifies what datasets to process according to config
units (string) – “MT” or “SI”. To be deprecated once data have units embedded
show_plot (boolean) – Only used for dev
z_file_path (string or pathlib.Path) – Target path for a z_file output if desired
return_collection (boolean) – If False, return an mt_metadata TF object; if True, return an aurora.transfer_function.transfer_function_collection.TransferFunctionCollection.
processing_type (string) – Controlled vocabulary, must be one of [“legacy”,] This is not really supported now, but the idea is that in future, the config and tfk_dataset can be passed to another processing method if desired.
- Returns:
tf_obj – The transfer function object
- Return type:
TransferFunctionCollection or mt_metadata.transfer_functions.TF
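A typical call, sketched (config and tfk_dataset are assumed to have been built beforehand via the processing config and KernelDataset machinery; output file names are placeholders, and the final write call assumes the mt_metadata TF API):
`
from aurora.pipelines.process_mth5 import process_mth5

tf_obj = process_mth5(
    config,                    # Processing object or path to its JSON
    tfk_dataset=tfk_dataset,   # KernelDataset describing what to process
    units="MT",
    z_file_path="station.zss", # optional z-file output
    return_collection=False,   # return an mt_metadata TF object
)
tf_obj.write(fn="station.xml", file_type="emtfxml")
`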
- aurora.pipelines.process_mth5.process_mth5_legacy(config, tfk_dataset=None, units='MT', show_plot=False, z_file_path=None, return_collection=False)[source]¶
This is the main method used to transform a processing_config and a kernel_dataset into a transfer function estimate.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Processing or path to json) – All processing parameters
tfk_dataset (aurora.tf_kernel.dataset.Dataset or None) – Specifies what datasets to process according to config
units (string) – “MT” or “SI”. To be deprecated once data have units embedded
show_plot (boolean) – Only used for dev
z_file_path (string or pathlib.Path) – Target path for a z_file output if desired
return_collection (boolean) – If False, return an mt_metadata TF object; if True, return an aurora.transfer_function.transfer_function_collection.TransferFunctionCollection.
- Returns:
tf_collection (TransferFunctionCollection or mt_metadata TF) – The transfer function object
tf_cls (mt_metadata.transfer_functions.TF) – TF object
- aurora.pipelines.process_mth5.process_tf_decimation_level(config: Processing, i_dec_level: int, local_stft_obj: Dataset, remote_stft_obj: Dataset | None, units='MT')[source]¶
Processing pipeline for a single decimation_level
TODO: Add a check that the processing config sample rates agree with the data sampling rates, otherwise raise an Exception.
TODO: Add units to local_stft_obj, remote_stft_obj.
This method can be single station or remote based on the processing config.
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – Config for a single decimation level
i_dec_level (int) – decimation level_id ?could we pack this into the decimation level as an attr?
local_stft_obj (xarray.core.dataset.Dataset) – The time series of Fourier coefficients from the local station
remote_stft_obj (xarray.core.dataset.Dataset or None) – The time series of Fourier coefficients from the remote station
units (str) – one of [“MT”,”SI”]
- Returns:
transfer_function_obj – The transfer function values packed into an object
- Return type:
aurora.transfer_function.TTFZ.TTFZ
- aurora.pipelines.process_mth5.save_fourier_coefficients(dec_level_config, row, run_obj, stft_obj) None [source]¶
Optionally saves the stft object into the MTH5. Note that dec_level_config must have its save_fcs attr set to True for the data actually to be saved. WIP.
Note #1: Logic for building FC layers: If the processing config decimation_level.save_fcs_type = "h5" and fc_levels_already_exist is False, then open in append mode; otherwise open in read mode. We should support a flag force_rebuild_fcs, normally False. This flag is only needed when save_fcs_type == "h5"; if True, we open in append mode regardless of fc_levels_already_exist. The task of setting mode="a" vs. mode="r" can be handled by tfk (maybe in tfk.validate()).
- Parameters:
dec_level_config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – The information about decimation level associated with row, run, stft_obj
row (pd.Series) – A row of the TFK.dataset_df
run_obj (mth5.groups.run.RunGroup) – The run object associated with the STFTs.
stft_obj (xr.Dataset) – The data to pack into the mth5.
- aurora.pipelines.process_mth5.triage_issue_289(local_stfts: list, remote_stfts: list)[source]¶
Takes STFT objects in and returns them after shape-checking and making sure they are the same. WIP: Timing-error workaround; see Aurora Issue #289. Seems associated with getting one fewer sample than expected from the edge of a run.
- Returns:
(local_stfts, remote_stfts) – The original input arguments, shape-matched.
- Return type:
Tuple
aurora.pipelines.run_summary module¶
aurora.pipelines.time_series_helpers module¶
Collection of functions used in the processing pipeline that operate on time series.
- aurora.pipelines.time_series_helpers.apply_prewhitening(decimation_obj, run_xrds_input)[source]¶
Applies pre-whitening to the time series to avoid spectral leakage when the FFT is applied.
If the prewhitening type is "first difference", consider clipping the first and last samples from the differentiated time series.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_input (xarray.core.dataset.Dataset) – Time series to be pre-whitened
- Returns:
run_xrds – pre-whitened time series
- Return type:
xarray.core.dataset.Dataset
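A minimal sketch of the "first difference" branch (a hypothetical re-implementation with xarray; the real function selects its behavior from the decimation config's prewhitening_type):
`
import numpy as np
import xarray as xr

def first_difference_prewhiten(run_xrds: xr.Dataset) -> xr.Dataset:
    # Differentiate each channel along time to flatten a red spectrum
    return run_xrds.diff("time")

# Toy run: a random-walk (red noise) channel
ds = xr.Dataset(
    {"hx": ("time", np.cumsum(np.random.randn(100)))},
    coords={"time": np.arange(100.0)},
)
whitened = first_difference_prewhiten(ds)  # one sample shorter in time
`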
- aurora.pipelines.time_series_helpers.apply_recoloring(decimation_obj, stft_obj)[source]¶
Inverts the pre-whitening operation in frequency domain.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.fourier_coefficients.decimation.Decimation) – Information about how the decimation level is to be processed
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be recolored
- Returns:
stft_obj – Recolored time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.calibrate_stft_obj(stft_obj, run_obj, units='MT', channel_scale_factors=None)[source]¶
Calibrates frequency domain data into MT units.
- Development Notes:
The calibration often raises a runtime warning because the DC term in the calibration response is 0. TODO: It would be nice to suppress this, maybe by calibrating only the non-DC terms and directly assigning np.nan to the DC component when the DC response is zero.
- Parameters:
stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be calibrated
run_obj (mth5.groups.master_station_run_channel.RunGroup) – Provides information about filters for calibration
units (string) – usually “MT”, contemplating supporting “SI”
channel_scale_factors (dict or None) – Keyed by channel; supports a single scalar to apply to that channel's data. Useful for debugging. Should not be used in production and should raise a warning if it is not None.
- Returns:
stft_obj – Time series of calibrated Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.nan_to_mean(xrds: Dataset) Dataset [source]¶
Set NaN values in an xr.Dataset to the per-channel mean value.
- Parameters:
xrds (xr.Dataset) – Time series data
- Returns:
run_xrds – The same as the input time series but NaN values are replaced by the mean of the time series (per channel).
- Return type:
xr.Dataset
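The fill operation amounts to something like this (a sketch, not the exact implementation; nan_to_mean_sketch is a hypothetical name):
`
import numpy as np
import xarray as xr

def nan_to_mean_sketch(xrds: xr.Dataset) -> xr.Dataset:
    # Replace NaNs with each channel's mean over the time axis
    return xrds.fillna(xrds.mean(dim="time", skipna=True))

ds = xr.Dataset({"ex": ("time", np.array([1.0, np.nan, 3.0]))})
print(nan_to_mean_sketch(ds).ex.values)  # [1. 2. 3.]
`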
- aurora.pipelines.time_series_helpers.prototype_decimate(config: Decimation, run_xrds: Dataset) Dataset [source]¶
- Basically a wrapper for scipy.signal.decimate. Takes an input time series (as an xarray Dataset) and a Decimation config object and returns a decimated version of the input time series.
TODO: Consider moving this function into time_series/decimate.py.
TODO: Consider replacing the downsampled_time_axis with a rolling mean, or something that takes the average value of the time, not the window start.
TODO: Compare outputs with scipy.signal.resample_poly, which also has an FIR AAF and appears faster.
TODO: Add handling for the case that could occur when the sliced time axis has a different length than the decimated data – see mth5 issue #217 https://github.com/kujaku11/mth5/issues/217
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.Decimation) –
run_xrds (xr.Dataset) – Originally from mth5.timeseries.run_ts.RunTS.dataset, but possibly decimated multiple times
- Returns:
xr_ds – Decimated version of the input run_xrds
- Return type:
xr.Dataset
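A sketch of the wrapped operation (a decimation factor of 4 is assumed here; the real function reads the factor from the Decimation config):
`
import numpy as np
import scipy.signal as ssig
import xarray as xr

factor = 4
ds = xr.Dataset(
    {"ex": ("time", np.random.randn(4096))},
    coords={"time": np.arange(4096.0)},  # 1 Hz sample rate
)

# Anti-alias filter + downsample each channel, then slice the time axis
decimated = {ch: ssig.decimate(ds[ch].values, factor) for ch in ds.data_vars}
ds_dec = xr.Dataset(
    {ch: ("time", data) for ch, data in decimated.items()},
    coords={"time": ds.time.values[::factor]},
)
`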
- aurora.pipelines.time_series_helpers.run_ts_to_stft(decimation_obj, run_xrds_orig)[source]¶
Converts a runts object into a time series of Fourier coefficients. Similar to run_ts_to_stft_scipy, but this implementation allows operations on individual windows (for example, pre-whitening per time window via ARMA filtering).
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – Normally extracted from mth5.RunTS
- Returns:
stft_obj – Note that the STFT object may have inf/nan in the DC harmonic, introduced by recoloring. This really doesn’t matter since we don’t use the DC harmonic for anything.
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.run_ts_to_stft_scipy(decimation_obj, run_xrds_orig)[source]¶
Converts a runts object into a time series of Fourier coefficients. This method uses scipy.signal.spectrogram.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds_orig (xarray.core.dataset.Dataset) – Time series to be processed
- Returns:
stft_obj – Time series of Fourier coefficients
- Return type:
xarray.core.dataset.Dataset
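The underlying call looks roughly like this (window parameters taken from the example FC metadata at the top of this page; mode="complex" keeps the Fourier coefficients rather than power):
`
import numpy as np
import scipy.signal as ssig

x = np.random.randn(4096)  # one channel of a run
f, t, Zxx = ssig.spectrogram(
    x,
    fs=1.0,
    window="boxcar",
    nperseg=128,
    noverlap=32,
    detrend="linear",
    mode="complex",
)
# Zxx has shape (n_frequencies, n_time_windows)
`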
- aurora.pipelines.time_series_helpers.truncate_to_clock_zero(decimation_obj, run_xrds)[source]¶
Computes the time interval between the first data sample and clock zero, then identifies the first sample in the xarray time series that corresponds to a window start sample.
- Parameters:
decimation_obj (mt_metadata.transfer_functions.processing.aurora.DecimationLevel) – Information about how the decimation level is to be processed
run_xrds (xarray.core.dataset.Dataset) – normally extracted from mth5.RunTS
- Returns:
run_xrds – same as the input time series, but possibly slightly shortened
- Return type:
xarray.core.dataset.Dataset
- aurora.pipelines.time_series_helpers.validate_sample_rate(run_ts, expected_sample_rate, tol=0.0001)[source]¶
Check that the sample rate of a run_ts is the expected value, and warn if not.
- Parameters:
run_ts (mth5.timeseries.run_ts.RunTS) – Time series object with data and metadata.
expected_sample_rate (float) – The sample rate the time series is expected to have. Normally taken from the processing config
aurora.pipelines.transfer_function_helpers module¶
This module contains helper methods that are used during transfer function processing.
Development Notes: Note #1: Repeatedly applying edf_weights seems to have no effect at all. Tested 20240118: test_compare in synthetic passed whether this was commented out or not. TODO: Confirm this is a one-and-done; add documentation about why this is so.
- aurora.pipelines.transfer_function_helpers.apply_weights(X, Y, RR, W, segment=False, dropna=False)[source]¶
Applies data weights (W) to each of X, Y, RR. If a weight is zero, we set the value to NaN and optionally drop it.
- Parameters:
X (xarray.core.dataset.Dataset) –
Y (xarray.core.dataset.Dataset) –
RR (xarray.core.dataset.Dataset or None) –
W (numpy array) – The Weights to apply to the data
segment (bool) – If True the weights may need to be reshaped.
dropna (bool) – Whether or not to drop zero-weighted data. If true, we drop the nans.
- Returns:
X, Y, RR – Same as input but with weights applied and (optionally) nan dropped.
- Return type:
tuple
- aurora.pipelines.transfer_function_helpers.drop_nans(X: Dataset, Y: Dataset, RR: Dataset | None) tuple [source]¶
Drops NaN from the input xarrays. Just a helper intended to enhance readability.
- Development Notes:
TODO: document the implications of dropna on index of xarray for other weights
- Parameters:
X (xr.Dataset) – The input data for regression
Y (xr.Dataset) – The output data for regression
RR (Union[xr.Dataset, None]) – The remote reference data for regression
- Returns:
X, Y, RR – Returns the input arguments with NaN dropped from the xarrays.
- Return type:
tuple
- aurora.pipelines.transfer_function_helpers.get_estimator_class(estimation_engine: str) RegressionEstimator [source]¶
- Parameters:
estimation_engine (str) – One of the keys in the ESTIMATOR_LIBRARY, designates the method that will be used to estimate the transfer function
- Returns:
estimator_class – The class that will do the TF estimation
- Return type:
aurora.transfer_function.regression.base.RegressionEstimator
- aurora.pipelines.transfer_function_helpers.process_transfer_functions(dec_level_config, local_stft_obj, remote_stft_obj, transfer_function_obj, segment_weights=[], channel_weights=None)[source]¶
This is the main tf_processing method. It is based on TTFestBand.m
Note #1: Although it is advantageous to execute the regression channel-by-channel vs. all-at-once, we need to keep the all-at-once to get residual covariances (see issue #87)
Note #2: Consider placing the segment weight logic in its own module, with the various functions in a dictionary. Possibly all segment weights can be combined (as a product), as in the following pseudocode:
`
W = ones
for wt_style in segment_weights:
    fcn = wt_functions[wt_style]
    w = fcn(X, Y, RR)
    W *= w
return W
`
TODO: Consider pushing the nan-handling into the band extraction as a kwarg.
- Parameters:
dec_level_config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
local_stft_obj (xarray.core.dataset.Dataset) –
remote_stft_obj (xarray.core.dataset.Dataset or None) –
transfer_function_obj (aurora.transfer_function.TTFZ.TTFZ) – The transfer function container ready to receive values in this method.
segment_weights (numpy array or list of strings) – A 1D array that should be the same length as the time axis of the STFT objects. If these weights are zero anywhere, that whole segment is dropped across all channels. If a list of strings, each string corresponds to a weighting algorithm to be applied: ["jackknife_jj84", "multiple_coherence", "simple_coherence"].
channel_weights (numpy array or None) –
- Returns:
transfer_function_obj
- Return type:
aurora.transfer_function.TTFZ.TTFZ
- aurora.pipelines.transfer_function_helpers.set_up_iter_control(config)[source]¶
Initializes an IterControl object based on values in the processing config.
Development Notes: TODO: Review; it may be better to just make this the __init__ method of the IterControl object, i.e. iter_control = IterControl(config).
- Parameters:
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
- Returns:
iter_control – Object with parameters about iteration control in regression
- Return type:
aurora.transfer_function.regression.iter_control.IterControl
- aurora.pipelines.transfer_function_helpers.stack_fcs(X, Y, RR)[source]¶
Reshape 2D arrays of frequency and time to 1D.
Notes: When the data for a frequency band are extracted from the Spectrogram, each channel is a 2D array, one axis is time (the time of the window that was FFT-ed) and the other axis is frequency. However if we make no distinction between the harmonics (bins) within a band in regression, then all the FCs for each channel can be put into a 1D array. This method performs that reshaping (ravelling) operation. It is not important how we unravel the FCs but it is important that the same indexing scheme is used for X, Y and RR.
TODO: Consider having this take a list and return a list, rather than X, Y, RR.
TODO: Consider decorating this with @dataset_or_dataarray.
- Parameters:
X (xarray.core.dataset.Dataset) –
Y (xarray.core.dataset.Dataset) –
RR (xarray.core.dataset.Dataset or None) –
- Returns:
X, Y, RR
- Return type:
Same as input but with stacked time and frequency dimensions
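The reshaping described above can be sketched with xarray's stack (the dimension name "observation" is illustrative):
`
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"hx": (("time", "frequency"), np.random.randn(10, 4))},
    coords={"time": np.arange(10), "frequency": np.arange(4)},
)

# Collapse (time, frequency) into a single 1D observation axis;
# applying the same stack to X, Y, RR keeps their indexing aligned
stacked = ds.stack(observation=("time", "frequency"))
print(stacked.hx.shape)  # (40,)
`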
aurora.pipelines.transfer_function_kernel module¶
This module contains the TransferFunctionKernel class, which is the main object linking the KernelDataset to the Processing configuration.
- class aurora.pipelines.transfer_function_kernel.TransferFunctionKernel(dataset=None, config=None)[source]¶
Bases:
object
- Attributes:
all_fcs_already_exist – Return True if all FCs needed to process the data already exist in the mth5s.
config – Returns the processing config object.
dataset – Returns the KernelDataset object.
dataset_df – Returns the KernelDataset dataframe.
kernel_dataset – Returns the KernelDataset object.
mth5_objs – Returns self._mth5_objs.
processing_config – Returns the processing config object.
processing_summary – Returns the processing summary object; creates it if it doesn't yet exist.
processing_type – A description of the processing; will get passed to the TF object.
Methods
apply_clock_zero(dec_level_config) – Get clock-zero from data if needed.
check_if_fcs_already_exist() – Fills out the "fc" column of the dataset dataframe with True/False.
export_tf_collection(tf_collection) – Assign transfer_function, residual_covariance, inverse_signal_power, station, survey.
Check the mode of an open mth5 (read, write, append).
initialize_mth5s() – Returns a dict of open mth5 objects, keyed by station_id.
is_valid_dataset(row, i_dec) – Given a row from the RunSummary, answer the question: "Will this decimation level yield a valid dataset?"
make_processing_summary() – Create the processing summary table.
memory_check() – Checks if a RAM issue should be anticipated.
show_processing_summary([omit_columns]) – Prints the processing summary table via logger.
update_dataset_df(i_dec_level) – This function has two different modes.
valid_decimations() – Get the decimation levels that are valid.
validate() – Apply all validators.
validate_decimation_scheme_and_dataset_compatability() – Checks that the decimation_scheme and dataset are compatible.
validate_processing() – Do some validation checks.
Update save_fc values in the config to be appropriate.
- property all_fcs_already_exist: bool¶
Return True if all FCs needed to process the data already exist in the mth5s.
- apply_clock_zero(dec_level_config)[source]¶
get clock-zero from data if needed
- Parameters:
dec_level_config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) –
- Returns:
dec_level_config – The modified DecimationLevel with clock-zero information set.
- Return type:
mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel
- check_if_fcs_already_exist()[source]¶
Fills out the “fc” column of dataset dataframe with True/False.
If all FC Levels for a given station-run are already built, mark True otherwise False.
Iterates over the processing summary_df, grouping by unique “Survey-Station-Run”s. (Could also iterate over kernel_dataset.dataframe, to get the groupby).
Note 1: Because decimation is a cascading operation, we avoid the case where some (valid) decimation levels exist in the mth5 FC archive and others do not. The maximum granularity is the "station-run" level. For a given run, either all relevant FCs are in the h5 or we treat it as if none of them are. To support variations at the decimation level, an appropriate way to address this would be to store decimated time series in the archive as well (they would simply be runs with different sample rates, and some extra filters).
Note #2: run_sub_df may have multiple rows, even though the run id is unique. This could happen for example when you have a long run at the local station, but multiple (say two) shorter runs at the reference station. In that case, the processing summary will have a separate row for the intersection of the long run with each of the remote runs. We ignore this for now, selecting only the first element of the run_sub_df, under the assumption that FCs have been created for the entire run, or not at all. This assumption can be relaxed in future by using the time_period attribute of the FC layer. For now, we proceed with the all-or-none logic. That is, if a [‘survey’, ‘station’, ‘run’,] has FCs, assume that the FCs are present for the entire run. We assign the “fc” column of dataset_df to have the same boolean value for all rows of same [‘survey’, ‘station’, ‘run’] .
- Returns: None
Modifies self.dataset_df in place, assigning bools to the "fc" column.
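The all-or-none assignment described in Note #2 amounts to a pandas pattern like the following (run_has_fcs is a hypothetical stand-in for the real per-run check against the mth5):
`
import pandas as pd

def run_has_fcs(survey: str, station: str, run: str) -> bool:
    # Hypothetical stand-in for checking the mth5 FC archive
    return False

dataset_df = pd.DataFrame(
    {"survey": ["s1", "s1"], "station": ["mt01", "mt01"], "run": ["001", "002"]}
)
for (survey, station, run), sub_df in dataset_df.groupby(["survey", "station", "run"]):
    # Same boolean for all rows of the same survey-station-run
    dataset_df.loc[sub_df.index, "fc"] = run_has_fcs(survey, station, run)
`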
- property config¶
Returns the processing config object
- property dataset¶
returns the KernelDataset object
- property dataset_df: DataFrame¶
returns the KernelDataset dataframe
- export_tf_collection(tf_collection)[source]¶
Assign transfer_function, residual_covariance, inverse_signal_power, station, survey
- Parameters:
tf_collection (aurora.transfer_function.transfer_function_collection.TransferFunctionCollection) – Contains TF estimates, covariance, and signal power values
- Returns:
tf_cls – Transfer function container
- Return type:
mt_metadata.transfer_functions.core.TF
- initialize_mth5s()[source]¶
returns a dict of open mth5 objects, keyed by station_id
A future version of this for multiple-station processing may need a nested dict with [survey_id][station].
- Returns:
mth5_objs – Keyed by station id; both the local station id and the remote station id map to mth5.mth5.MTH5 objects.
- Return type:
dict
- is_valid_dataset(row, i_dec) bool [source]¶
Given a row from the RunSummary, answer the question: "Will this decimation level yield a valid dataset?"
- Parameters:
row (pandas.core.series.Series) – Row of the self._dataset_df (corresponding to a run that will be processed)
i_dec (integer) – refers to decimation level
- Returns:
is_valid – Whether the (run, decimation_level) pair associated with this row yields a valid dataset
- Return type:
bool
- property kernel_dataset¶
returns the KernelDataset object
- make_processing_summary()[source]¶
Create the processing summary table.
- Melt the decimation levels over the run summary.
- Add columns to estimate the number of FFT windows for each row.
- Returns:
processing_summary – One row per run-decimation pair
- Return type:
pd.DataFrame
- memory_check() None [source]¶
Checks if a RAM issue should be anticipated.
Notes: Requires an estimate of available RAM and an estimate of the dataset size. Available RAM is taken from psutil; dataset size is the number of samples times the number of bytes per sample. Bits per sample is estimated to be 64 (8 bytes) by default.
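The estimate sketched in code (8 bytes per sample, as stated above; the threshold logic is illustrative):
`
import psutil

num_samples = 10_000_000  # total samples across the dataset
bytes_per_sample = 8      # 64 bits per sample by default
dataset_bytes = num_samples * bytes_per_sample

available = psutil.virtual_memory().available
if dataset_bytes > available:
    print("Warning: dataset may not fit in available RAM")
`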
- property processing_config¶
Returns the processing config object
- property processing_summary¶
Returns the processing summary object – creates if it doesn’t yet exist.
- property processing_type¶
A description of the processing; will get passed to the TF object and can be used for the Z-file.
Could add a version or a hashtag to this. Could also check dataset_df: if remote.all == False, append "Single Station".
- show_processing_summary(omit_columns=('mth5_path', 'channel_scale_factors', 'start', 'end', 'input_channels', 'output_channels', 'num_samples_overlap', 'num_samples_advance', 'run_dataarray'))[source]¶
Prints the processing summary table via logger.
- Parameters:
omit_columns (tuple) – List of columns to omit when showing channel summary (used to keep table small).
- update_dataset_df(i_dec_level)[source]¶
This function has two different modes. The first mode initializes values in the array, and could be placed into TFKDataset.initialize_time_series_data(). The second mode decimates. The function is kept in pipelines because it calls time series operations.
Notes: 1. When assigning xarrays to dataframe cells, df dislikes xr.Dataset, so we convert to DataArray before assignment.
- Parameters:
i_dec_level (int) – decimation level id, indexed from zero
config (mt_metadata.transfer_functions.processing.aurora.decimation_level.DecimationLevel) – decimation level config
- Returns:
dataset_df – Same df that was input to the function but now has columns:
- Return type:
pd.DataFrame
- valid_decimations()[source]¶
Get the decimation levels that are valid. This is used when iterating over decimation levels in the processing, we do not want to have invalid levels get processed (they will fail).
- Returns:
dec_levels – Decimations from the config that are valid.
- Return type:
list
- validate_decimation_scheme_and_dataset_compatability(min_num_stft_windows=None)[source]¶
Checks that the decimation_scheme and dataset are compatible. Marks as invalid any rows that will fail to process based on incompatibility.
Refers to issue #182 (and #103, and possibly #196 and #233). Determine if there exist (one or more) runs that will yield decimated datasets that have too few samples to be passed to the STFT algorithm.
Strategy for handling this: mark as invalid any rows of the processing summary that do not yield time series long enough to window. This way, all other rows, with decimations up to the invalid cases, will still process.
WCGW: If the runs are “chunked” we could encounter a situation where the chunk fails to yield a deep decimation level, yet the level could be produced if the chunk size were larger. In general, handling this seems a bit complicated. We will ignore this for now and assume that the user will select a chunk size that is appropriate to the decimation scheme, i.e. use large chunks for deep decimations.
A general solution: return a log that tells the user, for each run and decimation level, how many STFT windows it yielded. This conjures the notion of (run, decimation_level) pairs.
- validate_processing()[source]¶
Do some validation checks. WIP.
Things that are validated:
1. The default estimation engine from the JSON file is "RME_RR", which is fine (we expect in general to do more RR processing than SS), but if there is only one station (no remote) then RME_RR should be replaced by default with "RME". Also, if there is only one station, set the reference channels to [].
2. Make sure the local station id is defined (correctly, from the kernel dataset).
- aurora.pipelines.transfer_function_kernel.mth5_has_fcs(m, survey_id, station_id, run_id, remote, processing_config, **kwargs)[source]¶
Checks if all FC levels needed for a survey-station-run are present, as specified by the processing_config.
- Note #1: At this point in the logic, it is established that there are FCs associated with run_id and that there are at least as many FC decimation levels as the processing config requires. The next step is to assert whether the existing FCs conform to the recipe in the processing config.
kwargs are here as a pass-through to the decorator; we pass mode="r", "a", or "w".
- Parameters:
m –
survey_id –
station_id –
run_id –
remote –
processing_config –