Synthetic Data Tutorial¶
This notebook shows how to process MTH5 data from a synthetic dataset.
It also shows how to modify the processing so that Fourier coefficients (FCs) are saved in the mth5. These FCs can be used to perform transfer function (TF) estimation with different regression settings. The same FCs will be used for feature extraction in the future (that section is a work in progress).
Contents:¶
Process Synthetic Data with Aurora¶
Here is a minimal example of running aurora processing on an mth5 populated with synthetic time series.
Steps:
1. Create the synthetic mth5
2. Get a Run Summary from the mth5
3. Select the station to process and, optionally, the remote reference station
4. Create a processing config
5. Generate TFs
6. Archive the TFs (as emtf_xml or z-file)
Here are the modules we will need to import¶
[1]:
import pathlib
import warnings
from aurora.config.config_creator import ConfigCreator
from aurora.pipelines.process_mth5 import process_mth5
from mth5.data.make_mth5_from_asc import create_test12rr_h5
from mth5.data.paths import SyntheticTestPaths
from mtpy.processing import RunSummary, KernelDataset
warnings.filterwarnings('ignore')
/home/kkappler/software/irismt/mtpy-v2/mtpy/modeling/simpeg/recipes/inversion_2d.py:39: UserWarning: Pardiso not installed see https://github.com/simpeg/pydiso/blob/main/README.md.
warnings.warn(
Define target folder and mth5 path¶
By default, the synthetic mth5 file is used for testing in aurora/tests/synthetic/
and probably already exists on your system if you have run the tests. In the code below, we check if the file exists already, and if not we make it.
NOTE: If you are using a read-only HPC installation, you may not be able to write to the directory where aurora is installed. In that case, define your target path somewhere you have write permission by uncommenting the READ ONLY INSTALLATION block below.
[2]:
synthetic_test_paths = SyntheticTestPaths()
target_folder = synthetic_test_paths.mth5_path
## READ ONLY INSTALLATION
# home = pathlib.Path.home()
# target_folder = home.joinpath("aurora_test_folder")
# target_folder.mkdir(parents=True, exist_ok=True)
mth5_path = target_folder.joinpath("test12rr.h5")
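The choice between the default folder and a user-writable fallback can also be made programmatically. The following is a minimal sketch using only the standard library; `pick_writable_folder` is a hypothetical helper (not part of aurora or mth5), and the temporary directories stand in for the real installation path.

```python
import os
import tempfile
from pathlib import Path


def pick_writable_folder(preferred: Path, fallback: Path) -> Path:
    """Return `preferred` if it is an existing, writable directory.

    Otherwise create `fallback` (including parents) and return it.
    """
    if preferred.is_dir() and os.access(preferred, os.W_OK):
        return preferred
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback


# Demonstration with temporary directories (stand-ins for the real paths).
with tempfile.TemporaryDirectory() as tmp:
    writable = Path(tmp)
    missing = writable / "does_not_exist"
    fallback = writable / "aurora_test_folder"

    chosen_a = pick_writable_folder(writable, fallback)  # preferred is usable
    chosen_b = pick_writable_folder(missing, fallback)   # falls back
    fallback_was_created = fallback.is_dir()
```

In the notebook, `preferred` would be `synthetic_test_paths.mth5_path` and `fallback` something like `pathlib.Path.home().joinpath("aurora_test_folder")`.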
If the mth5 doesn’t already exist, or you want to re-make it, call create_test12rr_h5()
[3]:
# Uncomment this to start with a fresh mth5 file
# mth5_path.unlink(missing_ok=True)  # missing_ok avoids an error if the file is absent (Python >= 3.8)
[4]:
if not mth5_path.exists():
    create_test12rr_h5(target_folder=target_folder)
24:10:01T07:35:10 | INFO | line:679 |mth5.mth5 | _initialize_file | Initialized MTH5 0.1.0 file /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5 in mode w
24:10:01T07:35:12 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
Get a Run Summary¶
Note that we do not need to open the mth5 explicitly to get a run summary; we can pass its path instead. RunSummary takes a list of mth5 paths as its input argument.
[5]:
mth5_run_summary = RunSummary()
mth5_run_summary.from_mth5s([mth5_path,])
run_summary = mth5_run_summary.clone()
run_summary.mini_summary
24:10:01T07:35:12 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
[5]:
|   | survey | station | run | start | end | duration |
|---|---|---|---|---|---|---|
| 0 | EMTF Synthetic | test1 | 001 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 39999.0 |
| 1 | EMTF Synthetic | test2 | 001 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 39999.0 |
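The start, end, and duration columns are mutually consistent. As a quick check, here is a sketch that assumes the duration is measured from the first to the last sample, using the 40000 samples per run and 1 Hz sample rate reported in the processing summary that `process_mth5` prints later in this notebook:

```python
from datetime import datetime, timedelta, timezone

n_samples = 40000   # samples per run (from the processing summary)
sample_rate = 1.0   # Hz

start = datetime(1980, 1, 1, tzinfo=timezone.utc)

# Assumption: duration spans first sample to last sample, i.e. (N - 1) intervals.
duration_s = (n_samples - 1) / sample_rate
end = start + timedelta(seconds=duration_s)

print(duration_s)       # 39999.0
print(end.isoformat())  # 1980-01-01T11:06:39+00:00
```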
Define a Kernel Dataset¶
[6]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "test1", "test2")
kernel_dataset.mini_summary
24:10:01T07:35:12 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'pandas._libs.missing.NAType'>.
24:10:01T07:35:12 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:10:01T07:35:12 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:12 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:12 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'NoneType'>.
[6]:
|   | survey | station | run | start | end | duration |
|---|---|---|---|---|---|---|
| 0 | EMTF Synthetic | test1 | 001 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 39999.0 |
| 1 | EMTF Synthetic | test2 | 001 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 39999.0 |
Now define the processing Configuration¶
To generate a default processing configuration, the only things we need to provide are the band processing scheme and the data sample rate.
The config gets its information about the specific stations to process from the kernel dataset.
NOTE: When doing single-station processing only, you need to specify RME processing (rather than remote reference processing, which expects extra time series from another station).
[7]:
cc = ConfigCreator()
config = cc.create_from_kernel_dataset(kernel_dataset)
# you can export the config to a json by uncommenting the following line
# cfg_json = config.to_json()
24:10:01T07:35:12 | INFO | line:108 |aurora.config.config_creator | determine_band_specification_style | Bands not defined; setting to EMTF BANDS_DEFAULT_FILE
Take a look at the processing configuration¶
[8]:
config
[8]:
{
"processing": {
"band_setup_file": "/home/kkappler/software/irismt/aurora/aurora/config/emtf_band_setup/bs_test.cfg",
"band_specification_style": "EMTF",
"channel_nomenclature.ex": "ex",
"channel_nomenclature.ey": "ey",
"channel_nomenclature.hx": "hx",
"channel_nomenclature.hy": "hy",
"channel_nomenclature.hz": "hz",
"decimations": [
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.23828125,
"frequency_min": 0.19140625,
"index_max": 30,
"index_min": 25
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.19140625,
"frequency_min": 0.15234375,
"index_max": 24,
"index_min": 20
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.15234375,
"frequency_min": 0.12109375,
"index_max": 19,
"index_min": 16
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.12109375,
"frequency_min": 0.09765625,
"index_max": 15,
"index_min": 13
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.09765625,
"frequency_min": 0.07421875,
"index_max": 12,
"index_min": 10
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.07421875,
"frequency_min": 0.05859375,
"index_max": 9,
"index_min": 8
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.05859375,
"frequency_min": 0.04296875,
"index_max": 7,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.04296875,
"frequency_min": 0.03515625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 1.0,
"decimation.level": 0,
"decimation.method": "default",
"decimation.sample_rate": 1.0,
"estimator.engine": "RME_RR",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0341796875,
"frequency_min": 0.0263671875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0263671875,
"frequency_min": 0.0205078125,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0205078125,
"frequency_min": 0.0166015625,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0166015625,
"frequency_min": 0.0126953125,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0126953125,
"frequency_min": 0.0107421875,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0107421875,
"frequency_min": 0.0087890625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 1,
"decimation.method": "default",
"decimation.sample_rate": 0.25,
"estimator.engine": "RME_RR",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.008544921875,
"frequency_min": 0.006591796875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.006591796875,
"frequency_min": 0.005126953125,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.005126953125,
"frequency_min": 0.004150390625,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.004150390625,
"frequency_min": 0.003173828125,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.003173828125,
"frequency_min": 0.002685546875,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.002685546875,
"frequency_min": 0.002197265625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 2,
"decimation.method": "default",
"decimation.sample_rate": 0.0625,
"estimator.engine": "RME_RR",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00274658203125,
"frequency_min": 0.00213623046875,
"index_max": 22,
"index_min": 18
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00213623046875,
"frequency_min": 0.00164794921875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00164794921875,
"frequency_min": 0.00115966796875,
"index_max": 13,
"index_min": 10
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00115966796875,
"frequency_min": 0.00079345703125,
"index_max": 9,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00079345703125,
"frequency_min": 0.00054931640625,
"index_max": 6,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 3,
"decimation.method": "default",
"decimation.sample_rate": 0.015625,
"estimator.engine": "RME_RR",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
}
],
"id": "test1-rr_test2_sr1",
"stations.local.id": "test1",
"stations.local.mth5_path": "/home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5",
"stations.local.remote": false,
"stations.local.runs": [
{
"run": {
"id": "001",
"input_channels": [
{
"channel": {
"id": "hx",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hy",
"scale_factor": 1.0
}
}
],
"output_channels": [
{
"channel": {
"id": "ex",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ey",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hz",
"scale_factor": 1.0
}
}
],
"sample_rate": 1.0,
"time_periods": [
{
"time_period": {
"end": "1980-01-01T11:06:39+00:00",
"start": "1980-01-01T00:00:00+00:00"
}
}
]
}
}
],
"stations.remote": [
{
"station": {
"id": "test2",
"mth5_path": "/home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5",
"remote": true,
"runs": [
{
"run": {
"id": "001",
"input_channels": [
{
"channel": {
"id": "hx",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hy",
"scale_factor": 1.0
}
}
],
"output_channels": [
{
"channel": {
"id": "ex",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ey",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hz",
"scale_factor": 1.0
}
}
],
"sample_rate": 1.0,
"time_periods": [
{
"time_period": {
"end": "1980-01-01T11:06:39+00:00",
"start": "1980-01-01T00:00:00+00:00"
}
}
]
}
}
]
}
}
]
}
}
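The band entries in the config above pair FFT harmonic indices with frequency limits. The printed values are consistent with band edges lying half a frequency bin outside `index_min` and `index_max`, and with a geometric-mean band center (as suggested by `center_averaging_type`). The sketch below reproduces a few of the printed numbers under that assumption; `band_edges` and `geometric_center` are illustrative helpers, not aurora functions.

```python
import math


def band_edges(index_min, index_max, sample_rate, num_samples):
    """Band limits in Hz, assuming edges sit half a bin outside the indices."""
    df = sample_rate / num_samples  # frequency resolution of the FFT window
    return (index_min - 0.5) * df, (index_max + 0.5) * df


def geometric_center(f_min, f_max):
    """Geometric mean of the band limits."""
    return math.sqrt(f_min * f_max)


# Decimation level 0: sample_rate 1.0 Hz, 128-sample windows, indices 25..30
level0_top = band_edges(25, 30, sample_rate=1.0, num_samples=128)

# Decimation level 1: sample_rate 0.25 Hz, 128-sample windows, indices 14..17
level1_top = band_edges(14, 17, sample_rate=0.25, num_samples=128)

# Lowest band of level 0 (index 5 only) and its center frequency / period
low_min, low_max = band_edges(5, 5, sample_rate=1.0, num_samples=128)
center_hz = geometric_center(low_min, low_max)
center_period_s = 1.0 / center_hz

print(level0_top)  # (0.19140625, 0.23828125)
print(level1_top)  # (0.0263671875, 0.0341796875)
print(round(center_hz, 6), round(center_period_s, 4))
```

The last line gives roughly 0.038867 Hz (about 25.729 s), which corresponds to one of the "Processing band" entries that appear in the processing log.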
Call process_mth5¶
[9]:
show_plot = True
tf_cls = process_mth5(
    config,
    kernel_dataset,
    units="MT",
    show_plot=show_plot,
    z_file_path=None,
)
24:10:01T07:35:12 | INFO | line:277 |aurora.pipelines.transfer_function_kernel | show_processing_summary | Processing Summary Dataframe:
24:10:01T07:35:12 | INFO | line:278 |aurora.pipelines.transfer_function_kernel | show_processing_summary |
duration has_data n_samples run station survey run_hdf5_reference station_hdf5_reference fc remote stft mth5_obj dec_level dec_factor sample_rate window_duration num_samples_window num_samples num_stft_windows
0 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 0 1.0 1.000000 128.0 128 39999.0 416.0
1 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 1 4.0 0.250000 512.0 128 9999.0 103.0
2 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 2 4.0 0.062500 2048.0 128 2499.0 25.0
3 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 3 4.0 0.015625 8192.0 128 624.0 6.0
4 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> True None None 0 1.0 1.000000 128.0 128 39999.0 416.0
5 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> True None None 1 4.0 0.250000 512.0 128 9999.0 103.0
6 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> True None None 2 4.0 0.062500 2048.0 128 2499.0 25.0
7 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> True None None 3 4.0 0.015625 8192.0 128 624.0 6.0
24:10:01T07:35:12 | INFO | line:654 |aurora.pipelines.transfer_function_kernel | memory_check | Total memory: 62.74 GB
24:10:01T07:35:12 | INFO | line:658 |aurora.pipelines.transfer_function_kernel | memory_check | Total Bytes of Raw Data: 0.001 GB
24:10:01T07:35:12 | INFO | line:661 |aurora.pipelines.transfer_function_kernel | memory_check | Raw Data will use: 0.001 % of memory
24:10:01T07:35:12 | INFO | line:707 |aurora.pipelines.transfer_function_kernel | mth5_has_fcs | Fourier coefficients not detected for survey: EMTF Synthetic, station: test1, run: 001-- Fourier coefficients will be computed
24:10:01T07:35:12 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:12 | INFO | line:707 |aurora.pipelines.transfer_function_kernel | mth5_has_fcs | Fourier coefficients not detected for survey: EMTF Synthetic, station: test2, run: 001-- Fourier coefficients will be computed
24:10:01T07:35:12 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:12 | INFO | line:248 |aurora.pipelines.transfer_function_kernel | check_if_fcs_already_exist | FC levels not present
24:10:01T07:35:12 | INFO | line:517 |aurora.pipelines.process_mth5 | process_mth5_legacy | Processing config indicates 4 decimation levels
24:10:01T07:35:12 | INFO | line:445 |aurora.pipelines.transfer_function_kernel | valid_decimations | After validation there are 4 valid decimation levels
24:10:01T07:35:13 | INFO | line:899 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | Dataset dataframe initialized successfully
24:10:01T07:35:13 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 0 Successfully
24:10:01T07:35:13 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:13 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:13 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 25.728968s (0.038867Hz)
24:10:01T07:35:13 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 19.929573s (0.050177Hz)
24:10:01T07:35:13 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 15.164131s (0.065945Hz)
24:10:01T07:35:13 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 11.746086s (0.085135Hz)
24:10:01T07:35:13 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 9.195791s (0.108745Hz)
24:10:01T07:35:14 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 7.362526s (0.135823Hz)
24:10:01T07:35:14 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 5.856115s (0.170762Hz)
24:10:01T07:35:14 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 4.682492s (0.213562Hz)
24:10:01T07:35:14 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 1
24:10:01T07:35:14 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 1 Successfully
24:10:01T07:35:14 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:15 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:15 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 102.915872s (0.009717Hz)
24:10:01T07:35:15 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 85.631182s (0.011678Hz)
24:10:01T07:35:15 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 68.881694s (0.014518Hz)
24:10:01T07:35:15 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 54.195827s (0.018452Hz)
24:10:01T07:35:15 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 43.003958s (0.023254Hz)
24:10:01T07:35:15 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 33.310722s (0.030020Hz)
24:10:01T07:35:15 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 2
24:10:01T07:35:15 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 2 Successfully
24:10:01T07:35:16 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:16 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 411.663489s (0.002429Hz)
24:10:01T07:35:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 342.524727s (0.002919Hz)
24:10:01T07:35:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 275.526776s (0.003629Hz)
24:10:01T07:35:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 216.783308s (0.004613Hz)
24:10:01T07:35:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 172.015831s (0.005813Hz)
24:10:01T07:35:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 133.242890s (0.007505Hz)
24:10:01T07:35:16 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 3
24:10:01T07:35:16 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 3 Successfully
24:10:01T07:35:17 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:17 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:10:01T07:35:17 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1514.701336s (0.000660Hz)
24:10:01T07:35:17 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1042.488956s (0.000959Hz)
24:10:01T07:35:17 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 723.371271s (0.001382Hz)
24:10:01T07:35:17 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 532.971560s (0.001876Hz)
24:10:01T07:35:17 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 412.837995s (0.002422Hz)
24:10:01T07:35:17 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:17 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
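The per-level columns of the processing summary above (sample rate, window duration, number of STFT windows) can be reproduced from the config values `window.num_samples = 128`, `window.overlap = 32`, and the decimation factors 1, 4, 4, 4. This is a sketch under two assumptions that are not confirmed by the source: decimation divides the sample count by the factor, and only complete windows are counted. Aurora's own bookkeeping may differ in detail.

```python
WINDOW_LEN = 128  # window.num_samples in the config
OVERLAP = 32      # window.overlap in the config


def num_stft_windows(n_samples, window_len, overlap):
    """Number of complete sliding windows that fit in n_samples."""
    if n_samples < window_len:
        return 0
    advance = window_len - overlap
    return (n_samples - window_len) // advance + 1


rows = []
sample_rate = 1.0   # Hz, before decimation
n_samples = 40000   # samples per run, before decimation
for level, factor in enumerate([1, 4, 4, 4]):
    sample_rate = sample_rate / factor
    n_samples = n_samples // factor
    window_duration = WINDOW_LEN / sample_rate  # seconds
    n_windows = num_stft_windows(n_samples, WINDOW_LEN, OVERLAP)
    rows.append((level, sample_rate, window_duration, n_windows))

for row in rows:
    print(row)
# (0, 1.0, 128.0, 416)
# (1, 0.25, 512.0, 103)
# (2, 0.0625, 2048.0, 25)
# (3, 0.015625, 8192.0, 6)
```

Only six windows remain at decimation level 3, which is worth keeping in mind when judging the longest-period estimates.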
Export TF to a file¶
[10]:
xml_file_base = "synthetic_test1.xml"
tf_cls.write(fn=xml_file_base, file_type="emtfxml")
[10]:
EMTFXML(station='test1', latitude=0.00, longitude=0.00, elevation=0.00)
[11]:
edi_file_base = "synthetic_test1.edi"
tf_cls.write(fn=edi_file_base, file_type="edi")
[11]:
Station: test1
--------------------------------------------------
Survey: EMTF Synthetic
Project: None
Acquired by: None
Acquired date: 1980-01-01
Latitude: 0.000
Longitude: 0.000
Elevation: 0.000
Impedance: True
Tipper: True
Number of periods: 25
Period Range: 4.68249E+00 -- 1.51470E+03 s
Frequency Range 6.60196E-04 -- 2.13561E-01 s
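Two quick sanity checks on this summary, as a sketch: the frequency range is the reciprocal of the period range (so its values are in Hz, even though the printed line carries an `s` label), and the 25 periods match the number of bands per decimation level in the config (8, 6, 6, and 5).

```python
import math

# Values copied from the summary above
period_min_s, period_max_s = 4.68249, 1514.70
freq_min_printed, freq_max_printed = 6.60196e-04, 2.13561e-01

# Frequency is the reciprocal of period, so the extremes swap roles.
freq_max_hz = 1.0 / period_min_s
freq_min_hz = 1.0 / period_max_s

# Bands per decimation level, counted from the processing config
bands_per_level = [8, 6, 6, 5]
n_periods = sum(bands_per_level)

print(math.isclose(freq_max_hz, freq_max_printed, rel_tol=1e-4))
print(math.isclose(freq_min_hz, freq_min_printed, rel_tol=1e-4))
print(n_periods)
```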
NOTE: The Fourier coefficient section below is a work in progress.
Fourier coefficient storage in MTH5¶
The capability to store Fourier coefficients (FCs) in MTH5 is now available. This enables different approaches to processing and data quality control (QC). The data QC tools are a work in progress. The following examples show how to add FC levels to an MTH5, providing a starting point for processing or feature extraction from these data.
There are currently two main ways to add FCs to the MTH5:
1. Store them on the fly while processing with Aurora.
2. Use a dedicated method to make FCs only.
Storing while processing with Aurora¶
We can store FCs while processing by changing the processing config's save_fcs attribute to True.
If we do that with the current processing config, however, we encounter a warning:
Saving FCs for remote reference processing is not supported
- To save FCs, process as single station, then you can use the FCs for RR processing
- See issue #319 for details
- forcing processing config save_fcs = False
There are two workarounds for this:
1. Process each station as a single station; the saved FCs can then be used for RR processing (TODO: add an example of this).
2. Explicitly call the add_fcs function.
[12]:
file_size_before_adding_fcs = mth5_path.stat().st_size
print(f"file_size_before_adding_fcs: {file_size_before_adding_fcs}")
file_size_before_adding_fcs: 1007455
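Recording the size before adding FCs makes it possible to report how much the file grows afterwards. The following is a minimal sketch of that comparison; `size_change` is a hypothetical helper, and a temporary file stands in for the MTH5 so the example runs on its own.

```python
import tempfile
from pathlib import Path


def size_change(size_before: int, path: Path):
    """Return (size_after, growth_in_bytes) for the file at `path`."""
    size_after = path.stat().st_size
    return size_after, size_after - size_before


# Stand-in for the MTH5 file: write some bytes, record the size, append more.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.bin"
    demo_path.write_bytes(b"\x00" * 1000)
    size_before = demo_path.stat().st_size

    with demo_path.open("ab") as f:  # simulates FCs being added to the file
        f.write(b"\x00" * 500)

    size_after, growth = size_change(size_before, demo_path)

print(f"before: {size_before}, after: {size_after}, growth: {growth} bytes")
```

In the notebook, the same call would be `size_change(file_size_before_adding_fcs, mth5_path)` once processing with `save_fcs = True` has finished.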
Using the “single-station processing method” of FC generation¶
[13]:
# set kernel_dataset to work on one station only
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "test1", None)
# Update the processing config
cc = ConfigCreator()
config = cc.create_from_kernel_dataset(kernel_dataset)
# tell the config to save the FCs
for dec in config.decimations:
    dec.save_fcs = True
    dec.save_fcs_type = "h5"
# you can export the config to a json by uncommenting the following line
# cfg_json = config.to_json()
kernel_dataset.mini_summary
24:10:01T07:35:18 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'pandas._libs.missing.NAType'>.
24:10:01T07:35:18 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:10:01T07:35:18 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:18 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:18 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:18 | INFO | line:108 |aurora.config.config_creator | determine_band_specification_style | Bands not defined; setting to EMTF BANDS_DEFAULT_FILE
[13]:
|   | survey | station | run | start | end | duration |
|---|---|---|---|---|---|---|
| 0 | EMTF Synthetic | test1 | 001 | 1980-01-01 00:00:00+00:00 | 1980-01-01 11:06:39+00:00 | 39999.0 |
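The `save_fcs` flag set in the cell above lives on each decimation level, which matches the `"save_fcs": false` entries in the config printed earlier. For illustration only, the sketch below flips that flag on a plain dictionary shaped like a fragment of that printout. The dictionary is hand-written and heavily abbreviated, and whether aurora can reload an edited JSON in this form is an assumption not shown in this notebook.

```python
import json

# Abbreviated, hand-written fragment mirroring the layout of the printed config.
exported = json.loads(
    """
    {"processing": {"decimations": [
        {"decimation_level": {"decimation.level": 0, "save_fcs": false}},
        {"decimation_level": {"decimation.level": 1, "save_fcs": false}}
    ]}}
    """
)

# Turn on FC saving for every decimation level.
for entry in exported["processing"]["decimations"]:
    entry["decimation_level"]["save_fcs"] = True

flags = [
    entry["decimation_level"]["save_fcs"]
    for entry in exported["processing"]["decimations"]
]
print(flags)  # [True, True]
```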
[14]:
show_plot = False
tf_cls = process_mth5(
    config,
    kernel_dataset,
    units="MT",
    show_plot=show_plot,
    z_file_path=None,
)
24:10:01T07:35:18 | INFO | line:277 |aurora.pipelines.transfer_function_kernel | show_processing_summary | Processing Summary Dataframe:
24:10:01T07:35:18 | INFO | line:278 |aurora.pipelines.transfer_function_kernel | show_processing_summary |
duration has_data n_samples run station survey run_hdf5_reference station_hdf5_reference fc remote stft mth5_obj dec_level dec_factor sample_rate window_duration num_samples_window num_samples num_stft_windows
0 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 0 1.0 1.000000 128.0 128 39999.0 416.0
1 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 1 4.0 0.250000 512.0 128 9999.0 103.0
2 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 2 4.0 0.062500 2048.0 128 2499.0 25.0
3 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 3 4.0 0.015625 8192.0 128 624.0 6.0
24:10:01T07:35:18 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:18 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:18 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:18 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:18 | INFO | line:654 |aurora.pipelines.transfer_function_kernel | memory_check | Total memory: 62.74 GB
24:10:01T07:35:18 | INFO | line:658 |aurora.pipelines.transfer_function_kernel | memory_check | Total Bytes of Raw Data: 0.000 GB
24:10:01T07:35:18 | INFO | line:661 |aurora.pipelines.transfer_function_kernel | memory_check | Raw Data will use: 0.000 % of memory
24:10:01T07:35:18 | INFO | line:707 |aurora.pipelines.transfer_function_kernel | mth5_has_fcs | Fourier coefficients not detected for survey: EMTF Synthetic, station: test1, run: 001-- Fourier coefficients will be computed
24:10:01T07:35:18 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:18 | INFO | line:248 |aurora.pipelines.transfer_function_kernel | check_if_fcs_already_exist | FC levels not present
24:10:01T07:35:18 | INFO | line:517 |aurora.pipelines.process_mth5 | process_mth5_legacy | Processing config indicates 4 decimation levels
24:10:01T07:35:18 | INFO | line:445 |aurora.pipelines.transfer_function_kernel | valid_decimations | After validation there are 4 valid decimation levels
24:10:01T07:35:18 | INFO | line:899 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | Dataset dataframe initialized successfully
24:10:01T07:35:18 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 0 Successfully
24:10:01T07:35:18 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 25.728968s (0.038867Hz)
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 19.929573s (0.050177Hz)
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 15.164131s (0.065945Hz)
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 11.746086s (0.085135Hz)
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 9.195791s (0.108745Hz)
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 7.362526s (0.135823Hz)
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 5.856115s (0.170762Hz)
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 4.682492s (0.213562Hz)
24:10:01T07:35:19 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 1
24:10:01T07:35:19 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 1 Successfully
24:10:01T07:35:19 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 102.915872s (0.009717Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 85.631182s (0.011678Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 68.881694s (0.014518Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 54.195827s (0.018452Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 43.003958s (0.023254Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 33.310722s (0.030020Hz)
24:10:01T07:35:20 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 2
24:10:01T07:35:20 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 2 Successfully
24:10:01T07:35:20 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 411.663489s (0.002429Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 342.524727s (0.002919Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 275.526776s (0.003629Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 216.783308s (0.004613Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 172.015831s (0.005813Hz)
24:10:01T07:35:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 133.242890s (0.007505Hz)
24:10:01T07:35:20 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 3
24:10:01T07:35:20 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 3 Successfully
24:10:01T07:35:21 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:21 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1514.701336s (0.000660Hz)
24:10:01T07:35:21 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1042.488956s (0.000959Hz)
24:10:01T07:35:21 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 723.371271s (0.001382Hz)
24:10:01T07:35:21 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 532.971560s (0.001876Hz)
24:10:01T07:35:21 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 412.837995s (0.002422Hz)
24:10:01T07:35:21 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
[15]:
file_size_after_adding_fcs_station_1 = mth5_path.stat().st_size
print(f"file_size_after_adding_fcs_station_1: {file_size_after_adding_fcs_station_1}")
file_size_after_adding_fcs_station_1: 3716514
[16]:
# Now the other station
# set kernel_dataset to work on one station only
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "test2", None)
# Update the processing config
cc = ConfigCreator()
config = cc.create_from_kernel_dataset(kernel_dataset)
# tell the config to save the FCs
for dec in config.decimations:
    dec.save_fcs = True
    dec.save_fcs_type = "h5"
24:10:01T07:35:21 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'pandas._libs.missing.NAType'>.
24:10:01T07:35:21 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:10:01T07:35:21 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:21 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:21 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:21 | INFO | line:108 |aurora.config.config_creator | determine_band_specification_style | Bands not defined; setting to EMTF BANDS_DEFAULT_FILE
[17]:
show_plot = False
tf_cls = process_mth5(
    config,
    kernel_dataset,
    units="MT",
    show_plot=show_plot,
    z_file_path=None,
)
24:10:01T07:35:21 | INFO | line:277 |aurora.pipelines.transfer_function_kernel | show_processing_summary | Processing Summary Dataframe:
24:10:01T07:35:21 | INFO | line:278 |aurora.pipelines.transfer_function_kernel | show_processing_summary |
duration has_data n_samples run station survey run_hdf5_reference station_hdf5_reference fc remote stft mth5_obj dec_level dec_factor sample_rate window_duration num_samples_window num_samples num_stft_windows
0 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 0 1.0 1.000000 128.0 128 39999.0 416.0
1 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 1 4.0 0.250000 512.0 128 9999.0 103.0
2 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 2 4.0 0.062500 2048.0 128 2499.0 25.0
3 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> <NA> False None None 3 4.0 0.015625 8192.0 128 624.0 6.0
24:10:01T07:35:21 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:21 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:21 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:21 | INFO | line:411 |aurora.pipelines.transfer_function_kernel | validate_processing | No RR station specified, switching RME_RR to RME
24:10:01T07:35:21 | INFO | line:654 |aurora.pipelines.transfer_function_kernel | memory_check | Total memory: 62.74 GB
24:10:01T07:35:21 | INFO | line:658 |aurora.pipelines.transfer_function_kernel | memory_check | Total Bytes of Raw Data: 0.000 GB
24:10:01T07:35:21 | INFO | line:661 |aurora.pipelines.transfer_function_kernel | memory_check | Raw Data will use: 0.000 % of memory
24:10:01T07:35:21 | INFO | line:707 |aurora.pipelines.transfer_function_kernel | mth5_has_fcs | Fourier coefficients not detected for survey: EMTF Synthetic, station: test2, run: 001-- Fourier coefficients will be computed
24:10:01T07:35:21 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:21 | INFO | line:248 |aurora.pipelines.transfer_function_kernel | check_if_fcs_already_exist | FC levels not present
24:10:01T07:35:21 | INFO | line:517 |aurora.pipelines.process_mth5 | process_mth5_legacy | Processing config indicates 4 decimation levels
24:10:01T07:35:21 | INFO | line:445 |aurora.pipelines.transfer_function_kernel | valid_decimations | After validation there are 4 valid decimation levels
24:10:01T07:35:22 | INFO | line:899 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | Dataset dataframe initialized successfully
24:10:01T07:35:22 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 0 Successfully
24:10:01T07:35:22 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 25.728968s (0.038867Hz)
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 19.929573s (0.050177Hz)
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 15.164131s (0.065945Hz)
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 11.746086s (0.085135Hz)
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 9.195791s (0.108745Hz)
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 7.362526s (0.135823Hz)
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 5.856115s (0.170762Hz)
24:10:01T07:35:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 4.682492s (0.213562Hz)
24:10:01T07:35:23 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 1
24:10:01T07:35:23 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 1 Successfully
24:10:01T07:35:23 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:23 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 102.915872s (0.009717Hz)
24:10:01T07:35:23 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 85.631182s (0.011678Hz)
24:10:01T07:35:23 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 68.881694s (0.014518Hz)
24:10:01T07:35:23 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 54.195827s (0.018452Hz)
24:10:01T07:35:23 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 43.003958s (0.023254Hz)
24:10:01T07:35:23 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 33.310722s (0.030020Hz)
24:10:01T07:35:23 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 2
24:10:01T07:35:23 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 2 Successfully
24:10:01T07:35:23 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 411.663489s (0.002429Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 342.524727s (0.002919Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 275.526776s (0.003629Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 216.783308s (0.004613Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 172.015831s (0.005813Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 133.242890s (0.007505Hz)
24:10:01T07:35:24 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 3
24:10:01T07:35:24 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 3 Successfully
24:10:01T07:35:24 | INFO | line:364 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Saving FC level
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1514.701336s (0.000660Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1042.488956s (0.000959Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 723.371271s (0.001382Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 532.971560s (0.001876Hz)
24:10:01T07:35:24 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 412.837995s (0.002422Hz)
24:10:01T07:35:25 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
Now the FCs are stored in the mth5 file (for this specific processing configuration).
If you re-process the data with the same configuration, the stored FCs are detected and the time series processing can be skipped.
[18]:
file_size_after_adding_fcs_station_2 = mth5_path.stat().st_size
print(f"file_size_after_adding_fcs_station_2: {file_size_after_adding_fcs_station_2}")
file_size_after_adding_fcs_station_2: 6429530
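To get a feel for how much space the stored FCs take, the two file sizes printed in cells [15] and [18] can be differenced. The following is a minimal sketch that uses only those two byte counts; the helper name `describe_growth` is hypothetical and is not part of aurora or mth5.

```python
def describe_growth(size_before: int, size_after: int) -> str:
    """Describe how much a file grew, in bytes and in MiB."""
    delta = size_after - size_before
    return f"{delta} bytes ({delta / 2**20:.2f} MiB)"


# Byte counts copied from the outputs of cells [15] and [18]
size_station_1 = 3716514  # after adding FCs for test1
size_station_2 = 6429530  # after adding FCs for test2

growth = size_station_2 - size_station_1
print(f"Adding FCs for test2 grew the file by {describe_growth(size_station_1, size_station_2)}")
```

In a live session the same helper could be fed `mth5_path.stat().st_size` before and after a processing call instead of hard-coded numbers.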
[19]:
# set kernel_dataset to work on both stations again
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "test1", "test2")
# Update the processing config
cc = ConfigCreator()
config = cc.create_from_kernel_dataset(kernel_dataset)
24:10:01T07:35:25 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'pandas._libs.missing.NAType'>.
24:10:01T07:35:25 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:10:01T07:35:25 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:25 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:25 | INFO | line:262 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'NoneType'>.
24:10:01T07:35:25 | INFO | line:108 |aurora.config.config_creator | determine_band_specification_style | Bands not defined; setting to EMTF BANDS_DEFAULT_FILE
[20]:
from aurora.pipelines.transfer_function_kernel import TransferFunctionKernel
tfk = TransferFunctionKernel(dataset=kernel_dataset, config=config)
tfk.make_processing_summary()
tfk.check_if_fcs_already_exist()
tfk.dataset_df
24:10:01T07:35:25 | INFO | line:710 |aurora.pipelines.transfer_function_kernel | mth5_has_fcs | FCs detected -- checking against processing requirements.
24:10:01T07:35:25 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:25 | INFO | line:710 |aurora.pipelines.transfer_function_kernel | mth5_has_fcs | FCs detected -- checking against processing requirements.
24:10:01T07:35:25 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:25 | INFO | line:248 |aurora.pipelines.transfer_function_kernel | check_if_fcs_already_exist | All fc_levels already existSkip time series processing is OK
[20]:
channel_scale_factors | duration | end | has_data | input_channels | mth5_path | n_samples | output_channels | run | sample_rate | start | station | survey | run_hdf5_reference | station_hdf5_reference | fc | remote | run_dataarray | stft | mth5_obj | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 39999.0 | 1980-01-01 11:06:39+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/mth5/mth5/data/... | 40000 | [ex, ey, hz] | 001 | 1.0 | 1980-01-01 00:00:00+00:00 | test1 | EMTF Synthetic | <HDF5 object reference> | <HDF5 object reference> | True | False | None | None | None |
1 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 39999.0 | 1980-01-01 11:06:39+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/mth5/mth5/data/... | 40000 | [ex, ey, hz] | 001 | 1.0 | 1980-01-01 00:00:00+00:00 | test2 | EMTF Synthetic | <HDF5 object reference> | <HDF5 object reference> | True | True | None | None | None |
[21]:
show_plot = False
tf_cls = process_mth5(
    config,
    kernel_dataset,
    units="MT",
    show_plot=show_plot,
    z_file_path=None,
)
24:10:01T07:35:25 | INFO | line:277 |aurora.pipelines.transfer_function_kernel | show_processing_summary | Processing Summary Dataframe:
24:10:01T07:35:25 | INFO | line:278 |aurora.pipelines.transfer_function_kernel | show_processing_summary |
duration has_data n_samples run station survey run_hdf5_reference station_hdf5_reference fc remote stft mth5_obj dec_level dec_factor sample_rate window_duration num_samples_window num_samples num_stft_windows
0 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None None 0 1.0 1.000000 128.0 128 39999.0 416.0
1 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None None 1 4.0 0.250000 512.0 128 9999.0 103.0
2 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None None 2 4.0 0.062500 2048.0 128 2499.0 25.0
3 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None None 3 4.0 0.015625 8192.0 128 624.0 6.0
4 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None None 0 1.0 1.000000 128.0 128 39999.0 416.0
5 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None None 1 4.0 0.250000 512.0 128 9999.0 103.0
6 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None None 2 4.0 0.062500 2048.0 128 2499.0 25.0
7 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None None 3 4.0 0.015625 8192.0 128 624.0 6.0
24:10:01T07:35:25 | INFO | line:654 |aurora.pipelines.transfer_function_kernel | memory_check | Total memory: 62.74 GB
24:10:01T07:35:25 | INFO | line:658 |aurora.pipelines.transfer_function_kernel | memory_check | Total Bytes of Raw Data: 0.001 GB
24:10:01T07:35:25 | INFO | line:661 |aurora.pipelines.transfer_function_kernel | memory_check | Raw Data will use: 0.001 % of memory
24:10:01T07:35:25 | INFO | line:456 |aurora.pipelines.transfer_function_kernel | validate_save_fc_settings | FC Layer already exists -- forcing processing config save_fcs=False
24:10:01T07:35:25 | INFO | line:517 |aurora.pipelines.process_mth5 | process_mth5_legacy | Processing config indicates 4 decimation levels
24:10:01T07:35:25 | INFO | line:445 |aurora.pipelines.transfer_function_kernel | valid_decimations | After validation there are 4 valid decimation levels
24:10:01T07:35:25 | INFO | line:890 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | row channel_scale_factors {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '...
duration 39999.0
end 1980-01-01 11:06:39+00:00
has_data True
input_channels [hx, hy]
mth5_path /home/kkappler/software/irismt/mth5/mth5/data/...
n_samples 40000
output_channels [ex, ey, hz]
run 001
sample_rate 1.0
start 1980-01-01 00:00:00+00:00
station test1
survey EMTF Synthetic
run_hdf5_reference <HDF5 object reference>
station_hdf5_reference <HDF5 object reference>
fc True
remote False
run_dataarray None
stft None
mth5_obj /:\n====================\n |- Group: Survey...
Name: 0, dtype: object already has fcs prescribed by processing config-- skipping time series initialisation
24:10:01T07:35:25 | INFO | line:890 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | row channel_scale_factors {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '...
duration 39999.0
end 1980-01-01 11:06:39+00:00
has_data True
input_channels [hx, hy]
mth5_path /home/kkappler/software/irismt/mth5/mth5/data/...
n_samples 40000
output_channels [ex, ey, hz]
run 001
sample_rate 1.0
start 1980-01-01 00:00:00+00:00
station test2
survey EMTF Synthetic
run_hdf5_reference <HDF5 object reference>
station_hdf5_reference <HDF5 object reference>
fc True
remote True
run_dataarray None
stft None
mth5_obj /:\n====================\n |- Group: Survey...
Name: 1, dtype: object already has fcs prescribed by processing config-- skipping time series initialisation
24:10:01T07:35:26 | INFO | line:899 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | Dataset dataframe initialized successfully
24:10:01T07:35:26 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 0 Successfully
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 25.728968s (0.038867Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 19.929573s (0.050177Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 15.164131s (0.065945Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 11.746086s (0.085135Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 9.195791s (0.108745Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 7.362526s (0.135823Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 5.856115s (0.170762Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 4.682492s (0.213562Hz)
24:10:01T07:35:26 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 1
24:10:01T07:35:26 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test1, run: 001 -- skipping decimation
24:10:01T07:35:26 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test2, run: 001 -- skipping decimation
24:10:01T07:35:26 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 1 Successfully
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 102.915872s (0.009717Hz)
24:10:01T07:35:26 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 85.631182s (0.011678Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 68.881694s (0.014518Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 54.195827s (0.018452Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 43.003958s (0.023254Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 33.310722s (0.030020Hz)
24:10:01T07:35:27 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 2
24:10:01T07:35:27 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test1, run: 001 -- skipping decimation
24:10:01T07:35:27 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test2, run: 001 -- skipping decimation
24:10:01T07:35:27 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 2 Successfully
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 411.663489s (0.002429Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 342.524727s (0.002919Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 275.526776s (0.003629Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 216.783308s (0.004613Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 172.015831s (0.005813Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 133.242890s (0.007505Hz)
24:10:01T07:35:27 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 3
24:10:01T07:35:27 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test1, run: 001 -- skipping decimation
24:10:01T07:35:27 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test2, run: 001 -- skipping decimation
24:10:01T07:35:27 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 3 Successfully
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1514.701336s (0.000660Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1042.488956s (0.000959Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 723.371271s (0.001382Hz)
24:10:01T07:35:27 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 532.971560s (0.001876Hz)
24:10:01T07:35:28 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 412.837995s (0.002422Hz)
24:10:01T07:35:28 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:28 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
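The processing summary tables above list window_duration and num_stft_windows for each decimation level. As a consistency check, both columns can be reproduced from the sample_rate and num_samples columns with a simple sliding-window count. This is a sketch under stated assumptions, not aurora's implementation: the window length of 128 samples comes from the num_samples_window column, while the 32-sample overlap is an assumption here (the summary table does not show it; the same value appears in the fc_decimations listing later in this notebook). `count_windows` is a hypothetical helper.

```python
WINDOW_NUM_SAMPLES = 128  # num_samples_window column in the summary table
WINDOW_OVERLAP = 32       # assumption: overlap in samples, not shown in the table


def count_windows(num_samples, window=WINDOW_NUM_SAMPLES, overlap=WINDOW_OVERLAP):
    """Number of complete sliding windows that fit in num_samples."""
    if num_samples < window:
        return 0
    advance = window - overlap  # samples between successive window starts
    return (num_samples - window) // advance + 1


# (sample_rate, num_samples) per decimation level, copied from the summary table
levels = [(1.0, 39999), (0.25, 9999), (0.0625, 2499), (0.015625, 624)]

summary = []
for dec_level, (sample_rate, num_samples) in enumerate(levels):
    window_duration = WINDOW_NUM_SAMPLES / sample_rate  # seconds
    summary.append((dec_level, window_duration, count_windows(num_samples)))
    print(summary[-1])
```

Under these assumptions the sketch gives window durations of 128, 512, 2048 and 8192 s and window counts of 416, 103, 25 and 6, which agree with the window_duration and num_stft_windows columns.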
Using Dedicated FC Generation (add_fcs function)¶
[22]:
from aurora.pipelines.fourier_coefficients import add_fcs_to_mth5
from aurora.pipelines.fourier_coefficients import fc_decimations_creator
from aurora.pipelines.fourier_coefficients import read_back_fcs
Build a fresh copy of the synthetic data file
[23]:
mth5_path.unlink()
create_test12rr_h5(target_folder=target_folder)
24:10:01T07:35:28 | INFO | line:679 |mth5.mth5 | _initialize_file | Initialized MTH5 0.1.0 file /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5 in mode w
24:10:01T07:35:30 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
[23]:
PosixPath('/home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5')
In this case we need to prescribe the decimation configuration.
If we don't want to decimate, we can just pass fc_decimations="degenerate" to add_fcs_to_mth5.
Here is how the decimations can be created:
[24]:
sample_rate = 1.0
fc_decimations = fc_decimations_creator(initial_sample_rate=sample_rate, time_period=None)
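To make the decimation cascade concrete, the sketch below reproduces the sample rate at each level in plain Python. It assumes, as in the processing summaries earlier in this notebook, that level 0 is undecimated and every later level decimates by a factor of 4; four levels are used to match those summaries. The function `cascade_sample_rates` and its parameters are hypothetical names for illustration only and do not describe how fc_decimations_creator is implemented.

```python
def cascade_sample_rates(initial_sample_rate, decimation_factor=4, num_levels=4):
    """Sample rate (Hz) at each decimation level.

    Level 0 keeps the initial rate; every later level divides the
    previous level's rate by `decimation_factor`.
    """
    rates = [float(initial_sample_rate)]
    for _ in range(1, num_levels):
        rates.append(rates[-1] / decimation_factor)
    return rates


rates = cascade_sample_rates(1.0)
for level, rate in enumerate(rates):
    print(f"level {level}: {rate} Hz")
```

For an initial sample rate of 1.0 Hz this yields 1.0, 0.25, 0.0625 and 0.015625 Hz, the values in the sample_rate column of the processing summaries.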
And here is what they look like: a list in which each element specifies the information needed for decimation and FFT at one decimation level.
[25]:
fc_decimations
[25]:
[{
"decimation": {
"anti_alias_filter": "default",
"channels_estimated": [],
"decimation_factor": 1,
"decimation_level": 0,
"harmonic_indices": [
-1
],
"hdf5_reference": null,
"id": "0",
"method": "fft",
"min_num_stft_windows": 2,
"mth5_type": null,
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"sample_rate_decimation": 1.0,
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00",
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation": {
"anti_alias_filter": "default",
"channels_estimated": [],
"decimation_factor": 4,
"decimation_level": 1,
"harmonic_indices": [
-1
],
"hdf5_reference": null,
"id": "1",
"method": "fft",
"min_num_stft_windows": 2,
"mth5_type": null,
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"sample_rate_decimation": 0.25,
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00",
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation": {
"anti_alias_filter": "default",
"channels_estimated": [],
"decimation_factor": 4,
"decimation_level": 2,
"harmonic_indices": [
-1
],
"hdf5_reference": null,
"id": "2",
"method": "fft",
"min_num_stft_windows": 2,
"mth5_type": null,
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"sample_rate_decimation": 0.0625,
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00",
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation": {
"anti_alias_filter": "default",
"channels_estimated": [],
"decimation_factor": 4,
"decimation_level": 3,
"harmonic_indices": [
-1
],
"hdf5_reference": null,
"id": "3",
"method": "fft",
"min_num_stft_windows": 2,
"mth5_type": null,
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"sample_rate_decimation": 0.015625,
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00",
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation": {
"anti_alias_filter": "default",
"channels_estimated": [],
"decimation_factor": 4,
"decimation_level": 4,
"harmonic_indices": [
-1
],
"hdf5_reference": null,
"id": "4",
"method": "fft",
"min_num_stft_windows": 2,
"mth5_type": null,
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"sample_rate_decimation": 0.00390625,
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00",
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation": {
"anti_alias_filter": "default",
"channels_estimated": [],
"decimation_factor": 4,
"decimation_level": 5,
"harmonic_indices": [
-1
],
"hdf5_reference": null,
"id": "5",
"method": "fft",
"min_num_stft_windows": 2,
"mth5_type": null,
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"sample_rate_decimation": 0.0009765625,
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00",
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
}]
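The sample_rate_decimation values printed above follow from applying each level's decimation_factor to the sample rate of the previous level. A minimal sketch of that arithmetic in plain Python, independent of aurora (the helper name cascade_sample_rates is hypothetical and not part of any library):

```python
def cascade_sample_rates(initial_sample_rate, decimation_factors):
    """Return the sample rate at each decimation level.

    Each level's rate is the previous level's rate divided by that
    level's decimation factor (a factor of 1 means no decimation).
    """
    rates = []
    rate = initial_sample_rate
    for factor in decimation_factors:
        rate = rate / factor
        rates.append(rate)
    return rates


# Factors copied from the "decimation_factor" entries printed above.
factors = [1, 4, 4, 4, 4, 4]
rates = cascade_sample_rates(1.0, factors)
for level, rate in enumerate(rates):
    print(f"decimation level {level}: sample rate {rate}")
```

With an initial sample rate of 1.0 this reproduces the six sample_rate_decimation values in the listing.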
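Now add the FCs for these decimation levels to the mth5. In the log that follows, decimation levels 4 and 5 are reported as invalid because the decimated time series (157 and 40 samples) are too short to hold two STFT windows.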
[26]:
add_fcs_to_mth5(mth5_path, fc_decimations=fc_decimations)
24:10:01T07:35:30 | INFO | line:190 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 |
survey: EMTF Synthetic, station: test1, sample_rate 1.0
24:10:01T07:35:30 | INFO | line:207 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | survey: EMTF Synthetic, station: test1, sample_rate 1.0, i_run_row 0
24:10:01T07:35:31 | WARNING | line:292 |mt_metadata.transfer_functions.processing.fourier_coefficients.decimation | is_valid_for_time_series_length | 157 not enough samples for minimum of 2 stft windows of length 128 and overlap 32
24:10:01T07:35:31 | INFO | line:241 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | Decimation Level 4 invalid, TS of 157 samples too short
24:10:01T07:35:31 | WARNING | line:292 |mt_metadata.transfer_functions.processing.fourier_coefficients.decimation | is_valid_for_time_series_length | 40 not enough samples for minimum of 2 stft windows of length 128 and overlap 32
24:10:01T07:35:31 | INFO | line:241 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | Decimation Level 5 invalid, TS of 40 samples too short
24:10:01T07:35:31 | INFO | line:190 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 |
survey: EMTF Synthetic, station: test2, sample_rate 1.0
24:10:01T07:35:31 | INFO | line:207 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | survey: EMTF Synthetic, station: test2, sample_rate 1.0, i_run_row 0
24:10:01T07:35:32 | WARNING | line:292 |mt_metadata.transfer_functions.processing.fourier_coefficients.decimation | is_valid_for_time_series_length | 157 not enough samples for minimum of 2 stft windows of length 128 and overlap 32
24:10:01T07:35:32 | INFO | line:241 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | Decimation Level 4 invalid, TS of 157 samples too short
24:10:01T07:35:32 | WARNING | line:292 |mt_metadata.transfer_functions.processing.fourier_coefficients.decimation | is_valid_for_time_series_length | 40 not enough samples for minimum of 2 stft windows of length 128 and overlap 32
24:10:01T07:35:32 | INFO | line:241 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | Decimation Level 5 invalid, TS of 40 samples too short
24:10:01T07:35:32 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
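The log above rejects decimation levels 4 and 5 because 157 and 40 samples cannot hold the minimum of 2 STFT windows of length 128 with overlap 32. The sketch below shows one way to count sliding windows. It assumes each window advances by (window length minus overlap) samples; this formula is an assumption for illustration, not code taken from mt_metadata, and the function name count_stft_windows is hypothetical.

```python
def count_stft_windows(n_samples, window_length, overlap):
    """Number of complete sliding windows that fit in n_samples.

    Assumes consecutive windows advance by (window_length - overlap) samples.
    """
    if n_samples < window_length:
        return 0
    advance = window_length - overlap
    return (n_samples - overlap) // advance


min_num_stft_windows = 2
# Sample counts taken from this notebook's log output.
for n_samples in (624, 157, 40):
    n_windows = count_stft_windows(n_samples, window_length=128, overlap=32)
    is_valid = n_windows >= min_num_stft_windows
    print(f"{n_samples} samples -> {n_windows} windows, valid={is_valid}")
```

Under this assumption, 157 samples yield a single window and 40 samples yield none, so both levels fall below min_num_stft_windows.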
Now take a look at the mth5
[27]:
import mth5
m = mth5.mth5.MTH5(mth5_path)
m.open_mth5()
[27]:
/:
====================
|- Group: Survey
----------------
|- Group: Filters
-----------------
|- Group: coefficient
---------------------
|- Group: 0.1
-------------
|- Group: 1
-----------
|- Group: 10
------------
|- Group: fap
-------------
|- Group: fir
-------------
|- Group: time_delay
--------------------
|- Group: zpk
-------------
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Stations
------------------
|- Group: test1
---------------
|- Group: 001
-------------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: Fourier_Coefficients
------------------------------
|- Group: 001
-------------
|- Group: 0
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: 1
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: 2
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: 3
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: Transfer_Functions
----------------------------
|- Group: test2
---------------
|- Group: 001
-------------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: Fourier_Coefficients
------------------------------
|- Group: 001
-------------
|- Group: 0
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: 1
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: 2
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: 3
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: Transfer_Functions
----------------------------
--> Dataset: channel_summary
..............................
--> Dataset: fc_summary
.........................
--> Dataset: tf_summary
.........................
[28]:
m.close_mth5()
24:10:01T07:35:32 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
Alternatively, we could have translated the Aurora processing config to mt_metadata FC decimations and generated the FCs with the same function.
[29]:
mth5_path.unlink()
create_test12rr_h5(target_folder=target_folder)
24:10:01T07:35:33 | INFO | line:679 |mth5.mth5 | _initialize_file | Initialized MTH5 0.1.0 file /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5 in mode w
24:10:01T07:35:34 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
[29]:
PosixPath('/home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5')
[30]:
# Extract FC decimations from processing config and build the layer
fc_decimations = [
x.to_fc_decimation() for x in config.decimations
]
add_fcs_to_mth5(mth5_path, fc_decimations=fc_decimations)
24:10:01T07:35:34 | INFO | line:190 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 |
survey: EMTF Synthetic, station: test1, sample_rate 1.0
24:10:01T07:35:34 | INFO | line:207 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | survey: EMTF Synthetic, station: test1, sample_rate 1.0, i_run_row 0
24:10:01T07:35:36 | INFO | line:190 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 |
survey: EMTF Synthetic, station: test2, sample_rate 1.0
24:10:01T07:35:36 | INFO | line:207 |aurora.pipelines.fourier_coefficients | add_fcs_to_mth5 | survey: EMTF Synthetic, station: test2, sample_rate 1.0, i_run_row 0
24:10:01T07:35:37 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
[31]:
read_back_fcs(mth5_path)
24:10:01T07:35:37 | INFO | line:308 |aurora.pipelines.fourier_coefficients | read_back_fcs | survey: EMTF Synthetic, station: test1, sample_rate 1.0
24:10:01T07:35:37 | INFO | line:311 |aurora.pipelines.fourier_coefficients | read_back_fcs | FC Groups: ['001']
24:10:01T07:35:37 | INFO | line:308 |aurora.pipelines.fourier_coefficients | read_back_fcs | survey: EMTF Synthetic, station: test2, sample_rate 1.0
24:10:01T07:35:37 | INFO | line:311 |aurora.pipelines.fourier_coefficients | read_back_fcs | FC Groups: ['001']
24:10:01T07:35:37 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
And we can see that, when processing, aurora detects that the FC layer is already there and skips creating it.
[32]:
show_plot = False
tf_cls = process_mth5(config,
kernel_dataset,
units="MT",
show_plot=show_plot,
z_file_path=None,
)
24:10:01T07:35:37 | INFO | line:277 |aurora.pipelines.transfer_function_kernel | show_processing_summary | Processing Summary Dataframe:
24:10:01T07:35:37 | INFO | line:278 |aurora.pipelines.transfer_function_kernel | show_processing_summary |
duration has_data n_samples run station survey run_hdf5_reference station_hdf5_reference fc remote stft mth5_obj dec_level dec_factor sample_rate window_duration num_samples_window num_samples num_stft_windows
0 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None HDF5 file is closed and cannot be accessed. 0 1.0 1.000000 128.0 128 39999.0 416.0
1 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None HDF5 file is closed and cannot be accessed. 1 4.0 0.250000 512.0 128 9999.0 103.0
2 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None HDF5 file is closed and cannot be accessed. 2 4.0 0.062500 2048.0 128 2499.0 25.0
3 39999.0 True 40000 001 test1 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True False None HDF5 file is closed and cannot be accessed. 3 4.0 0.015625 8192.0 128 624.0 6.0
4 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None HDF5 file is closed and cannot be accessed. 0 1.0 1.000000 128.0 128 39999.0 416.0
5 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None HDF5 file is closed and cannot be accessed. 1 4.0 0.250000 512.0 128 9999.0 103.0
6 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None HDF5 file is closed and cannot be accessed. 2 4.0 0.062500 2048.0 128 2499.0 25.0
7 39999.0 True 40000 001 test2 EMTF Synthetic <HDF5 object reference> <HDF5 object reference> True True None HDF5 file is closed and cannot be accessed. 3 4.0 0.015625 8192.0 128 624.0 6.0
24:10:01T07:35:37 | INFO | line:654 |aurora.pipelines.transfer_function_kernel | memory_check | Total memory: 62.74 GB
24:10:01T07:35:37 | INFO | line:658 |aurora.pipelines.transfer_function_kernel | memory_check | Total Bytes of Raw Data: 0.001 GB
24:10:01T07:35:37 | INFO | line:661 |aurora.pipelines.transfer_function_kernel | memory_check | Raw Data will use: 0.001 % of memory
24:10:01T07:35:37 | INFO | line:456 |aurora.pipelines.transfer_function_kernel | validate_save_fc_settings | FC Layer already exists -- forcing processing config save_fcs=False
24:10:01T07:35:37 | INFO | line:517 |aurora.pipelines.process_mth5 | process_mth5_legacy | Processing config indicates 4 decimation levels
24:10:01T07:35:37 | INFO | line:445 |aurora.pipelines.transfer_function_kernel | valid_decimations | After validation there are 4 valid decimation levels
24:10:01T07:35:37 | INFO | line:890 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | row channel_scale_factors {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '...
duration 39999.0
end 1980-01-01 11:06:39+00:00
has_data True
input_channels [hx, hy]
mth5_path /home/kkappler/software/irismt/mth5/mth5/data/...
n_samples 40000
output_channels [ex, ey, hz]
run 001
sample_rate 1.0
start 1980-01-01 00:00:00+00:00
station test1
survey EMTF Synthetic
run_hdf5_reference <HDF5 object reference>
station_hdf5_reference <HDF5 object reference>
fc True
remote False
run_dataarray [[<xarray.DataArray ()> Size: 8B\narray(345)\n...
stft None
mth5_obj /:\n====================\n |- Group: Survey...
Name: 0, dtype: object already has fcs prescribed by processing config-- skipping time series initialisation
24:10:01T07:35:38 | INFO | line:890 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | row channel_scale_factors {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '...
duration 39999.0
end 1980-01-01 11:06:39+00:00
has_data True
input_channels [hx, hy]
mth5_path /home/kkappler/software/irismt/mth5/mth5/data/...
n_samples 40000
output_channels [ex, ey, hz]
run 001
sample_rate 1.0
start 1980-01-01 00:00:00+00:00
station test2
survey EMTF Synthetic
run_hdf5_reference <HDF5 object reference>
station_hdf5_reference <HDF5 object reference>
fc True
remote True
run_dataarray [[<xarray.DataArray ()> Size: 8B\narray(520)\n...
stft None
mth5_obj /:\n====================\n |- Group: Survey...
Name: 1, dtype: object already has fcs prescribed by processing config-- skipping time series initialisation
24:10:01T07:35:38 | INFO | line:899 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | Dataset dataframe initialized successfully
24:10:01T07:35:38 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 0 Successfully
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 25.728968s (0.038867Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 19.929573s (0.050177Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 15.164131s (0.065945Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 11.746086s (0.085135Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 9.195791s (0.108745Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 7.362526s (0.135823Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 5.856115s (0.170762Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 4.682492s (0.213562Hz)
24:10:01T07:35:39 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 1
24:10:01T07:35:39 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test1, run: 001 -- skipping decimation
24:10:01T07:35:39 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test2, run: 001 -- skipping decimation
24:10:01T07:35:39 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 1 Successfully
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 102.915872s (0.009717Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 85.631182s (0.011678Hz)
24:10:01T07:35:39 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 68.881694s (0.014518Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 54.195827s (0.018452Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 43.003958s (0.023254Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 33.310722s (0.030020Hz)
24:10:01T07:35:40 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 2
24:10:01T07:35:40 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test1, run: 001 -- skipping decimation
24:10:01T07:35:40 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test2, run: 001 -- skipping decimation
24:10:01T07:35:40 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 2 Successfully
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 411.663489s (0.002429Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 342.524727s (0.002919Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 275.526776s (0.003629Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 216.783308s (0.004613Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 172.015831s (0.005813Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 133.242890s (0.007505Hz)
24:10:01T07:35:40 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 3
24:10:01T07:35:40 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test1, run: 001 -- skipping decimation
24:10:01T07:35:40 | INFO | line:134 |aurora.pipelines.transfer_function_kernel | update_dataset_df | FC already exists for survey: EMTF Synthetic, station: test2, run: 001 -- skipping decimation
24:10:01T07:35:40 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 3 Successfully
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1514.701336s (0.000660Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1042.488956s (0.000959Hz)
24:10:01T07:35:40 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 723.371271s (0.001382Hz)
24:10:01T07:35:41 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 532.971560s (0.001876Hz)
24:10:01T07:35:41 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 412.837995s (0.002422Hz)
24:10:01T07:35:41 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
24:10:01T07:35:41 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/mth5/mth5/data/mth5/test12rr.h5
NOTE: The feature storage section below is a work in progress.
Feature Storage Experimental work in progress¶
Any spectrogram (FC array at some decimation level) can be passed to a feature extraction method. These features can, in turn, be stored in the mth5.
This will have the following steps:
1. Select the source FC decimation level (we will use “0” to start, but also test with “1”).
2. Select the frequency bands (probably wider than normal, so that we have plenty of FCs per time window).
   Note that these are feature_extraction_bands, which may differ from the TF estimation bands.
   To do this we will want to look at the frequencies associated with each FCGroup and each decimation level:
   - Group runs by sample rate
   - Tabulate decimation levels
   - Group decimation levels by id (and validate that the decimation_level.metadata.sample_rate_decimation values agree)
   - For each decimation level, get the associated frequencies. These can be deduced from the shape of the dataset and the min/max frequencies in the metadata, but for now we can just unpack the xarray and take the frequency axis.
3. Loop over the bands. For each band:
   3a. Extract the band from the FC layer in the mth5 (shape will be nch x ntime x nharmonics)
4. Loop over the time windows. For each time:
   4a. Extract the feature. As a placeholder, we will compute the channel cross powers and/or channel energy.
   4b. Place the feature into an xarray
5. Save the xarray as “feature_cross_power”
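The band and time-window loops described above can be sketched on toy data. This is only an illustration: the nested-dict layout (channel, then time window, then harmonic), the function names extract_band and cross_powers, and the choice to average over harmonics are all assumptions, and a real implementation would read an xarray from the FC layer rather than use Python lists.

```python
from itertools import combinations_with_replacement


def extract_band(fcs, frequencies, f_low, f_high):
    """Keep only the harmonics whose frequency lies in [f_low, f_high).

    fcs maps channel name -> list of time windows -> list of complex FCs.
    """
    keep = [i for i, f in enumerate(frequencies) if f_low <= f < f_high]
    if not keep:
        raise ValueError("no harmonics fall inside the requested band")
    return {
        ch: [[window[i] for i in keep] for window in windows]
        for ch, windows in fcs.items()
    }


def cross_powers(band_fcs):
    """Band-averaged cross power for every channel pair, per time window."""
    features = {}
    for ch_a, ch_b in combinations_with_replacement(sorted(band_fcs), 2):
        series = []
        for win_a, win_b in zip(band_fcs[ch_a], band_fcs[ch_b]):
            total = sum(a * b.conjugate() for a, b in zip(win_a, win_b))
            series.append(total / len(win_a))
        features[(ch_a, ch_b)] = series
    return features


# Toy spectrogram: 2 channels, 2 time windows, 4 harmonics (made-up values).
frequencies = [0.1, 0.2, 0.3, 0.4]
fcs = {
    "ex": [[1 + 1j, 2 + 0j, 0 + 1j, 1 + 0j], [0j, 1 + 0j, 1 + 0j, 0j]],
    "hy": [[1 + 0j, 1 + 0j, 0 + 2j, 3 + 0j], [0j, 0 + 1j, 0 - 1j, 0j]],
}
band = extract_band(fcs, frequencies, f_low=0.15, f_high=0.35)
features = cross_powers(band)
for pair, series in features.items():
    print(pair, series)
```

The diagonal entries such as ("ex", "ex") are the channel energies; the off-diagonal entries are the cross powers between channels.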
[33]:
# Imports
from mth5.mth5 import MTH5
import numpy as np
import pandas as pd
[34]:
# # Selection of frequency bands
# active_decimation_level = "0"
# m = MTH5(mth5_path)
# m.open_mth5()
# run_summary_df = m.run_summary # This fails on current main but works on add_aurora_tools branch.
# run_summary_df
It would be nice if we could get a summary of the FCs available in the mth5. Here is a prototype for how that could be done.
[35]:
# # Prototype for fc_summary
# fc_summary = None
# for i, row in run_summary_df.iterrows():
#     print(row.station, row.run)
#     station_obj = m.get_station(row.station)
#     run_fc_group = station_obj.fourier_coefficients_group.get_fc_group(row.run)
#     if fc_summary is None:
#         fc_summary = run_fc_group.decimation_level_summary
#         fc_summary["station"] = row.station
#         fc_summary["run"] = row.run
#         fc_summary["sample_rate_time_series"] = 0.0
#         fc_summary["window_num_samples"] = 0
#         fc_summary["harmonic_indices"] = None
#     else:
#         tmp = run_fc_group.decimation_level_summary
#         tmp["station"] = row.station
#         tmp["run"] = row.run
#         tmp["sample_rate_time_series"] = 0.0
#         tmp["window_num_samples"] = 0
#         tmp["harmonic_indices"] = None
#         fc_summary = pd.concat([fc_summary, tmp], ignore_index=True)
#     #print(run_fc_group)
#     for dec_level_id in run_fc_group.groups_list:
#         dec_level = run_fc_group.get_decimation_level(dec_level_id)
#         cond1 = fc_summary.station==row.station
#         cond2 = fc_summary.run==row.run
#         cond3 = fc_summary.component==dec_level_id
#         fc_summary.loc[cond1 & cond2 & cond3] # this should be a unique row
#         ndx = fc_summary.loc[cond1 & cond2 & cond3].index
#         assert(len(ndx))==1
#         #fc_summary["sample_rate_time_series"].at[ndx] = dec_level.metadata.i
#         fc_summary["window_num_samples"].at[ndx[0]] = dec_level.metadata.window.num_samples
#         fc_summary["sample_rate_time_series"].at[ndx[0]] = dec_level.metadata.sample_rate
#         fc_summary["harmonic_indices"].at[ndx[0]] = dec_level.metadata.harmonic_indices
#         #print(dec_level_id, dec_level)
#         xr_dec_level = dec_level.to_xarray()
#         print("station", row.station, "run", row.run, "decimation level", dec_level_id, "SAMPLE rate", dec_level.metadata.sample_rate) # xr_dec_level.frequency)
[36]:
# fc_summary["delta_f"] = fc_summary["sample_rate_time_series"] / fc_summary["window_num_samples"]
# fc_summary
The above table tells us what we need to know to get the frequency axes. In fact, we could make _frequencies() a method of FC_dec_level.
Note that when harmonic_indices is [-1,], all of the (one-sided) FCs are kept. This can be sanity checked by asserting that the dataset size is window_num_samples//2.
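As a sketch of how a frequency axis follows from the sample rate and window length, the helper below lists the non-negative harmonics of a real-input FFT. The function name one_sided_frequencies is hypothetical, and the result is intended to match numpy.fft.rfftfreq(window_num_samples, d=1.0 / sample_rate). Note that this full one-sided axis has window_num_samples//2 + 1 entries, including the zero-frequency term; a stored dataset of size window_num_samples//2 would be consistent with one term being dropped, which is an assumption to verify against the data.

```python
def one_sided_frequencies(sample_rate, window_num_samples):
    """Frequencies of the non-negative harmonics of a real-input FFT.

    Harmonic k sits at k * delta_f, with
    delta_f = sample_rate / window_num_samples.
    """
    delta_f = sample_rate / window_num_samples
    n_harmonics = window_num_samples // 2 + 1  # includes DC and Nyquist
    return [k * delta_f for k in range(n_harmonics)]


# Decimation levels 0 and 1 of the synthetic example (window of 128 samples).
for sample_rate in (1.0, 0.25):
    freqs = one_sided_frequencies(sample_rate, 128)
    print(f"sample_rate={sample_rate}: delta_f={freqs[1]}, nyquist={freqs[-1]}")
```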
So now pick a decimation level. In this case, the component column tells us the decimation level. It would also seem sensible to make sure that the sample rates (and delta_f values) are the same as well.
[37]:
# cond0 = fc_summary["component"] == active_decimation_level
# fc_runs_to_featureize_df = fc_summary[cond0]
# fc_runs_to_featureize_df
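The selection above can be combined with the consistency check suggested earlier (same sample rate and window length, hence the same delta_f, for every selected row). The sketch below uses plain dicts in place of DataFrame rows so that it runs without an mth5 file. The column names follow the prototype fc_summary; the function name select_decimation_level and the toy rows are assumptions.

```python
def select_decimation_level(summary_rows, level_id):
    """Return the rows for one decimation level, checking they share a frequency axis."""
    rows = [row for row in summary_rows if row["component"] == level_id]
    if not rows:
        raise ValueError(f"no FC rows found for decimation level {level_id!r}")
    axes = {
        (row["sample_rate_time_series"], row["window_num_samples"]) for row in rows
    }
    if len(axes) != 1:
        raise ValueError(f"inconsistent sample rate / window length: {sorted(axes)}")
    return rows


# Toy stand-in for fc_summary: two stations, decimation levels "0" and "1".
toy_summary = [
    {"station": "test1", "run": "001", "component": "0",
     "sample_rate_time_series": 1.0, "window_num_samples": 128},
    {"station": "test1", "run": "001", "component": "1",
     "sample_rate_time_series": 0.25, "window_num_samples": 128},
    {"station": "test2", "run": "001", "component": "0",
     "sample_rate_time_series": 1.0, "window_num_samples": 128},
    {"station": "test2", "run": "001", "component": "1",
     "sample_rate_time_series": 0.25, "window_num_samples": 128},
]
level_zero = select_decimation_level(toy_summary, "0")
print([row["station"] for row in level_zero])
```

Raising an error on a mismatch is one possible design choice; logging a warning and dropping the inconsistent rows would be another.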
[38]:
# frqs = np.fft.fftfreq?
Now pick a scheme for defining the feature frequency bands:
- We prefer not to use the lowest few harmonics. This is like setting a minimum number of cycles per window, say 10.
- We also prefer not to take harmonics very close to the Nyquist frequency, as these may be attenuated by the anti-alias filter (AAF). Say we go up to 80% of Nyquist.
- For TF estimation, bands may be only a few FCs wide, but for feature extraction we make them much wider.
- Let's try bands around half an octave wide, i.e. 2 bands per octave.
[39]:
# min_cycles = 10
# nyquist_fraction = 0.8
# n_feature_bands_per_octave = 2
[40]:
# for i, row in fc_runs_to_featureize_df.iterrows():
#     # rfftfreq expects the sample spacing (1 / sample rate) as its second argument
#     frequencies = np.fft.rfftfreq(row.window_num_samples, d=1.0 / row.sample_rate_time_series)
#     lower_index = int(min_cycles)
#     upper_index = int(nyquist_fraction*(row.window_num_samples//2))
#     print(f" lower_index={lower_index}, upper_index={upper_index}")
#     freq_min = frequencies[lower_index]
#     freq_max = frequencies[upper_index]
#     print(f"Available band: {freq_min:.3f} to {freq_max:.3f}")
#     frequency_ratio = freq_max/freq_min
#     available_decades = np.log10(frequency_ratio)
#     available_octaves = np.log2(frequency_ratio)
#     print(f"Available decades: {available_decades:.3f}")
#     print(f"Available octaves: {available_octaves:.3f}")
#     num_feature_bands = available_octaves * n_feature_bands_per_octave
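The prototype above stops at a (fractional) number of feature bands. One possible way to turn that into concrete band edges is sketched below in plain Python. The function name feature_band_edges, the rounding down to whole bands, and the geometric spacing of the edges are assumptions for illustration, not the scheme aurora uses.

```python
import math


def feature_band_edges(sample_rate, window_num_samples,
                       min_cycles=10, nyquist_fraction=0.8, bands_per_octave=2):
    """Geometrically spaced band edges between the usable harmonic limits."""
    delta_f = sample_rate / window_num_samples
    lower_index = int(min_cycles)  # harmonic k has k cycles per window
    upper_index = int(nyquist_fraction * (window_num_samples // 2))
    if upper_index <= lower_index:
        raise ValueError("window too short for the requested limits")
    freq_min = lower_index * delta_f
    freq_max = upper_index * delta_f
    available_octaves = math.log2(freq_max / freq_min)
    num_bands = int(available_octaves * bands_per_octave)  # whole bands only
    return [freq_min * 2 ** (i / bands_per_octave) for i in range(num_bands + 1)]


edges = feature_band_edges(sample_rate=1.0, window_num_samples=128)
for f_low, f_high in zip(edges[:-1], edges[1:]):
    print(f"band: {f_low:.4f} Hz to {f_high:.4f} Hz")
```

Because the number of bands is rounded down, the highest edge can fall below the 80% Nyquist limit; the leftover harmonics are simply not assigned to a band in this sketch.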
[41]:
# dec_level.metadata
# dec_level.to_xarray()
[42]:
# cond1 = fc_summary.station=="test1"
# cond2 = fc_summary.run=="001"
# cond3 = fc_summary.component=="0"
# fc_summary.loc[cond1 & cond2 & cond3].index
[43]:
# fc_summary["sample_rate_time_series"].at[0]
[44]:
# m.close_mth5()
[ ]: