Build an MTH5 and Operate the Aurora Pipeline¶
This notebook pulls MT miniSEED data from the IRIS Dataselect web service to produce an MTH5 file, and then processes the time series to create transfer function outputs.
It outlines the process of making an MTH5 file, generating a processing configuration object, and running the Aurora processor.
Aurora must be installed to run the notebook. Installation instructions are here. This will also install mth5, mt_metadata, and mtpy-v2.
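A quick way to confirm the stack is installed is to import the packages and print their versions (a sketch; it assumes each package exposes __version__, which is standard for these libraries):
[ ]:
# Verify the processing stack is importable
import aurora
import mth5
import mt_metadata
print(aurora.__version__, mth5.__version__, mt_metadata.__version__)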
Flow of this notebook¶
Section 1: Construct a table of the data that will be accessed (in the future this process may be automated via EarthScope’s data availability tools). The table is then used to build an MTH5 archive.
Section 2: The metadata and the data are accessed and used to process the data and generate transfer functions.
[1]:
# # Uncomment while developing
# %load_ext autoreload
# %autoreload 2
[2]:
# Required imports for the program.
from pathlib import Path
import pandas as pd
import warnings
from mth5 import mth5, timeseries
from mth5.clients.fdsn import FDSN
from mth5.clients.make_mth5 import MakeMTH5
from mth5.utils.helpers import initialize_mth5
from mt_metadata.utils.mttime import get_now_utc, MTime
from aurora.config import BANDS_DEFAULT_FILE
from aurora.config.config_creator import ConfigCreator
from aurora.pipelines.process_mth5 import process_mth5
from mtpy.processing import RunSummary, KernelDataset
warnings.filterwarnings('ignore')
/home/kkappler/anaconda3/envs/aurora-test/lib/python3.9/site-packages/mtpy/modeling/simpeg/recipes/inversion_2d.py:39: UserWarning: Pardiso not installed see https://github.com/simpeg/pydiso/blob/main/README.md.
warnings.warn(
1. Build an MTH5 file from Earthscope archives¶
If you have already built an MTH5, you can skip this section.
Set the path so the MTH5 file is built in the current working directory.
[3]:
default_path = Path().cwd()
default_path
[3]:
PosixPath('/home/kkappler/software/irismt/aurora/docs/examples')
Select mth5 file version
[4]:
# mth5_version = '0.1.0'
mth5_version = '0.2.0'
[5]:
# Initialize an FDSN object to access column names for the request df
fdsn_obj = FDSN()
1A: Specify the data to access from IRIS¶
Note that here we explicitly prescribe the data, but this dataframe could be built programmatically from IRIS data availability tools.
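As a sketch of what that automation might look like (an assumption: obspy is installed and IRIS is reachable; the channel codes mirror the explicit request below):
[ ]:
# Hypothetical sketch: build the request rows from FDSN station metadata
from obspy.clients.fdsn import Client

client = Client("IRIS")
inv = client.get_stations(network="8P", station="CAS04", channel="LQ?,LF?", level="channel")
rows = []
for net in inv:
    for sta in net:
        for cha in sta:
            # cha.end_date can be None for channels that are still open
            rows.append([net.code, sta.code, cha.location_code, cha.code,
                         str(cha.start_date), str(cha.end_date)])
# request_df = pd.DataFrame(rows, columns=fdsn_obj.request_columns)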
[6]:
# Generate a dataframe of FDSN Network, Station, Location, Channel, Starttime, Endtime codes of interest
station_id = "CAS04"
CAS04LQE = ['8P', station_id, '', 'LQE', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
CAS04LQN = ['8P', station_id, '', 'LQN', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
CAS04LFE = ['8P', station_id, '', 'LFE', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
CAS04LFN = ['8P', station_id, '', 'LFN', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
CAS04LFZ = ['8P', station_id, '', 'LFZ', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
request_list = [CAS04LQE, CAS04LQN, CAS04LFE, CAS04LFN, CAS04LFZ]
# Turn list into dataframe
request_df = pd.DataFrame(request_list, columns=fdsn_obj.request_columns)
# Note the name of the file that will be built
h5_basename = f"8P_{station_id}.h5"
print(f"The MTH5 file will be named (automatically) based on the network and station: {h5_basename}")
The MTH5 file will be named (automatically) based on the network and station: 8P_CAS04.h5
[7]:
# Inspect the dataframe
request_df
[7]:
|   | network | station | location | channel | start | end |
|---|---|---|---|---|---|---|
| 0 | 8P | CAS04 |  | LQE | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 1 | 8P | CAS04 |  | LQN | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 2 | 8P | CAS04 |  | LFE | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 3 | 8P | CAS04 |  | LFN | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 4 | 8P | CAS04 |  | LFZ | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
[8]:
# Request the inventory information from IRIS
inventory = fdsn_obj.get_inventory_from_df(request_df, data=False)
[9]:
# Inspect the inventory
inventory
[9]:
(Inventory created at 2024-09-06T08:31:28.958121Z
Created by: ObsPy 1.4.1
https://www.obspy.org
Sending institution: MTH5
Contains:
Networks (1):
8P
Stations (1):
8P.CAS04 (Corral Hollow, CA, USA)
Channels (8):
8P.CAS04..LFZ, 8P.CAS04..LFN, 8P.CAS04..LFE, 8P.CAS04..LQN (2x),
8P.CAS04..LQE (3x),
0 Trace(s) in Stream:
)
Build an MTH5 file from the user-defined request dataframe.
With the request dataframe defined, we are ready to actually request the data from the FDSN client (IRIS) and save it to an MTH5 file. This process builds an MTH5 file and can take some time depending on how much data is requested.
Note: interact=True keeps the MTH5 open after it is done building.
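If you do not need the file to remain open, the same call can be made without interact (a sketch; with interact=False the archive is closed after building and its path is returned):
[ ]:
# Sketch: build the archive, close it, and keep only the path to it
# mth5_path = MakeMTH5.from_fdsn_client(request_df, interact=False)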
[10]:
mth5_object = MakeMTH5.from_fdsn_client(request_df, interact=True)
24:09:06T01:31:32 | WARNING | line:611 |mth5.mth5 | open_mth5 | 8P_CAS04.h5 will be overwritten in 'w' mode
24:09:06T01:31:32 | INFO | line:679 |mth5.mth5 | _initialize_file | Initialized MTH5 0.2.0 file /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5 in mode w
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:09:06T01:36:37 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:09:06T01:36:38 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup a already exists, returning existing group.
24:09:06T01:36:38 | WARNING | line:645 |mth5.timeseries.run_ts | validate_metadata | start time of dataset 2020-06-02T19:00:00+00:00 does not match metadata start 2020-06-02T18:41:43+00:00 updating metatdata value to 2020-06-02T19:00:00+00:00
24:09:06T01:36:39 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:09:06T01:36:39 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:09:06T01:36:39 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:09:06T01:36:39 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:09:06T01:36:39 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:09:06T01:36:40 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup b already exists, returning existing group.
24:09:06T01:36:40 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:09:06T01:36:40 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:09:06T01:36:41 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:09:06T01:36:41 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:09:06T01:36:41 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:09:06T01:36:41 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup c already exists, returning existing group.
24:09:06T01:36:42 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:09:06T01:36:43 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:09:06T01:36:43 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:09:06T01:36:43 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:09:06T01:36:43 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:09:06T01:36:43 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup d already exists, returning existing group.
24:09:06T01:36:44 | WARNING | line:658 |mth5.timeseries.run_ts | validate_metadata | end time of dataset 2020-07-13T19:00:00+00:00 does not match metadata end 2020-07-13T21:46:12+00:00 updating metatdata value to 2020-07-13T19:00:00+00:00
24:09:06T01:36:44 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:09:06T01:36:44 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:09:06T01:36:45 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:09:06T01:36:45 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:09:06T01:36:45 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:09:06T01:36:45 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
24:09:06T01:36:45 | WARNING | line:330 |mth5.mth5 | filename | MTH5 file is not open or has not been created yet. Returning default name
1B: Examine and Update the MTH5 object¶
With the open MTH5 object, we can start to examine what is in it. For example, we can retrieve the filename and file_version. You can additionally do things such as getting the station information and editing it by setting a new value, in this case the declination model.
[11]:
mth5_object
[11]:
/:
====================
|- Group: Experiment
--------------------
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Surveys
-----------------
|- Group: CONUS_South
---------------------
|- Group: Filters
-----------------
|- Group: coefficient
---------------------
|- Group: electric_analog_to_digital
------------------------------------
|- Group: electric_dipole_92.000
--------------------------------
|- Group: electric_si_units
---------------------------
|- Group: magnetic_analog_to_digital
------------------------------------
|- Group: fap
-------------
|- Group: fir
-------------
|- Group: time_delay
--------------------
|- Group: electric_time_offset
------------------------------
|- Group: hx_time_offset
------------------------
|- Group: hy_time_offset
------------------------
|- Group: hz_time_offset
------------------------
|- Group: zpk
-------------
|- Group: electric_butterworth_high_pass_30000
----------------------------------------------
--> Dataset: poles
....................
--> Dataset: zeros
....................
|- Group: electric_butterworth_low_pass
---------------------------------------
--> Dataset: poles
....................
--> Dataset: zeros
....................
|- Group: magnetic_butterworth_low_pass
---------------------------------------
--> Dataset: poles
....................
--> Dataset: zeros
....................
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Stations
------------------
|- Group: CAS04
---------------
|- Group: Fourier_Coefficients
------------------------------
|- Group: Transfer_Functions
----------------------------
|- Group: a
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: b
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: c
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: d
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
--> Dataset: channel_summary
..............................
--> Dataset: fc_summary
.........................
--> Dataset: tf_summary
.........................
[12]:
mth5_path = mth5_object.filename
[13]:
mth5_object.file_version
[13]:
'0.2.0'
[14]:
mth5_object.close_mth5()
24:09:06T01:36:45 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
[15]:
mth5_object = initialize_mth5(mth5_path)
1C: Optionally Update Metadata¶
[16]:
# Edit and update the MTH5 metadata
s = mth5_object.get_station(station_id, survey="CONUS_South")
print(s.metadata.location.declination.model)
s.metadata.location.declination.model = 'IGRF'
print(s.metadata.location.declination.model)
s.write_metadata() # writes to file mth5_filename
IGRF-13
IGRF
[17]:
# Print some info about the mth5
mth5_filename = mth5_object.filename
version = mth5_object.file_version
print(f" Filename: {mth5_filename} \n Version: {version}")
Filename: /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
Version: 0.2.0
[18]:
# Get the available stations and runs from the MTH5 object
mth5_object.channel_summary.summarize()
ch_summary = mth5_object.channel_summary.to_dataframe()
2: Process Data¶
If the MTH5 file already exists, you can start here and skip the previous section rather than downloading the data again.
[19]:
# If the MTH5 is still open from Section 1 (interact=True), reuse it;
# otherwise re-open the archive from disk.
interact = False
if interact:
    pass
else:
    h5_basename = f"8P_{station_id}.h5"
    h5_path = default_path.joinpath(h5_basename)
    mth5_object = initialize_mth5(h5_path, mode="a", file_version=mth5_version)
    ch_summary = mth5_object.channel_summary.to_dataframe()
Generate an Aurora Configuration file using MTH5 as an input¶
Up to this point, we have used mth5 and mt_metadata, but haven’t yet used aurora. Now we will use the MTH5 that we just created (and examined and updated) as input into Aurora.
Channel Summary¶
This is a very useful data structure inside the MTH5. It acts basically like an index of the available data at the channel-run level, i.e. there is one row for every contiguous chunk of time series recorded by an electric dipole or magnetometer.
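Since the channel summary is a pandas DataFrame, it can be filtered like any other; for example, to see only the electric channels (a minimal sketch):
[ ]:
# Filter the channel summary down to the electric channels
ch_summary[ch_summary.measurement_type == "electric"]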
[20]:
ch_summary
[20]:
|   | survey | station | run | latitude | longitude | elevation | component | start | end | n_samples | sample_rate | measurement_type | azimuth | tilt | units | has_data | hdf5_reference | run_hdf5_reference | station_hdf5_reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | ex | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
1 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | ey | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
2 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | hx | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
3 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | hy | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
4 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | hz | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
5 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | ex | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
6 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | ey | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
7 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | hx | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
8 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | hy | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
9 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | hz | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
10 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | ex | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
11 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | ey | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
12 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | hx | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
13 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | hy | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
14 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | hz | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
15 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | ex | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
16 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | ey | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
17 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | hx | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
18 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | hy | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
19 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | hz | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
The channel summary has many uses. Below we use it to check whether the data have mixed sample rates, and to get a list of the available stations.
[21]:
available_runs = ch_summary.run.unique()
sr = ch_summary.sample_rate.unique()
if len(sr) != 1:
    print(f"Warning: more than one sample rate found: {sr}")
available_stations = ch_summary.station.unique()
print(f"Available stations: {available_stations}")
Available stations: ['CAS04']
Run Summary¶
A cousin of the channel summary is the run summary. This is a condensed version of the channel summary, with one row per continuous acquisition run at a station.
The run summary can be accessed from an open mth5 object, or from an iterable of h5 paths, as in the example below.
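If you already have an open mth5 object, recent versions of mth5 also expose a run summary on the object itself (an assumption about your installed version):
[ ]:
# Sketch: pull the run summary dataframe straight from an open MTH5 object
# run_summary_df = mth5_object.run_summary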
[22]:
mth5_run_summary = RunSummary()
h5_path = default_path.joinpath(h5_basename)
mth5_run_summary.from_mth5s([h5_path,])
run_summary = mth5_run_summary.clone()
run_summary.df
24:09:06T01:36:46 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
[22]:
|   | channel_scale_factors | duration | end | has_data | input_channels | mth5_path | n_samples | output_channels | run | sample_rate | start | station | survey | run_hdf5_reference | station_hdf5_reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 11266.0 | 2020-06-02 22:07:46+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 11267 | [ex, ey, hz] | a | 1.0 | 2020-06-02 19:00:00+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
1 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 847648.0 | 2020-06-12 17:52:23+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 847649 | [ex, ey, hz] | b | 1.0 | 2020-06-02 22:24:55+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
2 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 1638042.0 | 2020-07-01 17:32:59+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 1638043 | [ex, ey, hz] | c | 1.0 | 2020-06-12 18:32:17+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
3 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 1034585.0 | 2020-07-13 19:00:00+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 1034586 | [ex, ey, hz] | d | 1.0 | 2020-07-01 19:36:55+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
Now we have a dataframe of the available runs to process from the MTH5.
Sometimes we just want to look at the survey, station, run, and time intervals; for that we can call mini_summary.
[23]:
run_summary.mini_summary
[23]:
|   | survey | station | run | start | end | duration |
|---|---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11266.0 |
1 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847648.0 |
2 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638042.0 |
3 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034585.0 |
Here are all of the columns in the run summary:
[24]:
run_summary.df.columns
[24]:
Index(['channel_scale_factors', 'duration', 'end', 'has_data',
'input_channels', 'mth5_path', 'n_samples', 'output_channels', 'run',
'sample_rate', 'start', 'station', 'survey', 'run_hdf5_reference',
'station_hdf5_reference'],
dtype='object')
Make your own mini summary by choosing columns:
[25]:
coverage_short_list_columns = ["survey", 'station', 'run', 'start', 'end', ]
run_summary.df[coverage_short_list_columns]
[25]:
|   | survey | station | run | start | end |
|---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 |
1 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
2 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 |
3 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
Kernel Dataset¶
This is like a run summary, but for a single station or a pair of stations. It is used to specify the inputs to aurora processing.
It takes a run_summary and a station name, and optionally a remote reference station name.
It is made based on the available data in the MTH5 archive.
Syntax: kernel_dataset.from_run_summary(run_summary, local_station_id, reference_station_id)
By default, all runs will be processed.
To restrict processing to a single run, or a list of runs, we can tell KernelDataset to keep or drop runs via a station_runs dictionary.
[26]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, station_id)
kernel_dataset.mini_summary
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[26]:
|   | survey | station | run | start | end | duration |
|---|---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11266.0 |
1 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847648.0 |
2 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638042.0 |
3 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034585.0 |
Here is one way to select a single run:¶
[27]:
station_runs_dict = {}
station_runs_dict[station_id] = ["a", ]
keep_or_drop = "keep"
kernel_dataset.select_station_runs(station_runs_dict, keep_or_drop)
print(kernel_dataset.df[coverage_short_list_columns])
survey station run start end
0 CONUS South CAS04 a 2020-06-02 19:00:00+00:00 2020-06-02 22:07:46+00:00
To discard runs shorter than a minimum duration¶
[28]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, station_id)
cutoff_duration_in_seconds = 15000
kernel_dataset.drop_runs_shorter_than(cutoff_duration_in_seconds)
kernel_dataset.df[coverage_short_list_columns]
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[28]:
|   | survey | station | run | start | end |
|---|---|---|---|---|---|
0 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
1 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 |
2 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
Select only runs “b” & “d”¶
[29]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "CAS04")
station_runs_dict = {}
station_runs_dict[station_id] = ["b","d"]
keep_or_drop = "keep"
kernel_dataset.select_station_runs(station_runs_dict, keep_or_drop)
kernel_dataset.df[coverage_short_list_columns]
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[29]:
|   | survey | station | run | start | end |
|---|---|---|---|---|---|
0 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
1 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
The same result can be obtained by excluding runs a & c¶
[30]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, station_id)
station_runs_dict = {}
station_runs_dict[station_id] = ["a","c"]
keep_or_drop = "drop"
kernel_dataset.select_station_runs(station_runs_dict, keep_or_drop)
kernel_dataset.df[coverage_short_list_columns]
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:09:06T01:36:46 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[30]:
|   | survey | station | run | start | end |
|---|---|---|---|---|---|
0 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
1 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
To process only a segment of data¶
Say that you have weeks of data available, but you want to restrict the data processed to a subset. If it is one contiguous block, you can just modify the start/end times in the kernel_dataset dataframe, as in the commented example below. You should also update the duration column by calling kernel_dataset._add_duration_column() afterwards:
[31]:
# kernel_dataset.df["start"].iloc[0] += pd.Timedelta(days=5)
# kernel_dataset.df["start"].iloc[1] += pd.Timedelta(days=7)
# kernel_dataset._add_duration_column()
Exercise:¶
Print the kernel_dataset dataframe by calling kernel_dataset.df.
Modify the start and end times, call _add_duration_column(), and print it again.
You can also process the data with and without this change.
Are the TFs different?
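One possible solution sketch for the start-time part of the exercise (the two-day offset is arbitrary, and the pattern mirrors the commented example above):
[ ]:
# Trim two days off the start of the first remaining run, then refresh durations
kernel_dataset.df["start"].iloc[0] += pd.Timedelta(days=2)
kernel_dataset._add_duration_column()
print(kernel_dataset.df[coverage_short_list_columns])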
Make an Aurora configuration object (which can then be saved as a JSON file; see the sketch below).
[32]:
cc = ConfigCreator()
config = cc.create_from_kernel_dataset(kernel_dataset,
emtf_band_file=BANDS_DEFAULT_FILE,)
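The config is an mt_metadata object, so it can be written out for later reuse; a minimal sketch of saving it as JSON (this assumes the standard mt_metadata to_json serializer; the filename is arbitrary):
[ ]:
from pathlib import Path

# Save the processing configuration to disk (hypothetical filename)
Path("CAS04_processing_config.json").write_text(config.to_json(nested=True))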
[33]:
for decimation in config.decimations:
# decimation.output_channels = ["ex", "ey"]
decimation.estimator.engine = "RME"
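RME is the single-station robust estimator; when the kernel dataset includes a remote reference station, the remote-reference variant would normally be chosen instead (naming per aurora's estimator options):
[ ]:
# For remote-reference processing, select the RR variant of the estimator
# decimation.estimator.engine = "RME_RR"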
Take a look at the config:
[34]:
config
[34]:
{
"processing": {
"band_setup_file": "/home/kkappler/anaconda3/envs/aurora-test/lib/python3.9/site-packages/aurora/config/emtf_band_setup/bs_test.cfg",
"band_specification_style": "EMTF",
"channel_nomenclature.ex": "ex",
"channel_nomenclature.ey": "ey",
"channel_nomenclature.hx": "hx",
"channel_nomenclature.hy": "hy",
"channel_nomenclature.hz": "hz",
"decimations": [
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.23828125,
"frequency_min": 0.19140625,
"index_max": 30,
"index_min": 25
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.19140625,
"frequency_min": 0.15234375,
"index_max": 24,
"index_min": 20
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.15234375,
"frequency_min": 0.12109375,
"index_max": 19,
"index_min": 16
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.12109375,
"frequency_min": 0.09765625,
"index_max": 15,
"index_min": 13
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.09765625,
"frequency_min": 0.07421875,
"index_max": 12,
"index_min": 10
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.07421875,
"frequency_min": 0.05859375,
"index_max": 9,
"index_min": 8
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.05859375,
"frequency_min": 0.04296875,
"index_max": 7,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.04296875,
"frequency_min": 0.03515625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 1.0,
"decimation.level": 0,
"decimation.method": "default",
"decimation.sample_rate": 1.0,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0341796875,
"frequency_min": 0.0263671875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0263671875,
"frequency_min": 0.0205078125,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0205078125,
"frequency_min": 0.0166015625,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0166015625,
"frequency_min": 0.0126953125,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0126953125,
"frequency_min": 0.0107421875,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0107421875,
"frequency_min": 0.0087890625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 1,
"decimation.method": "default",
"decimation.sample_rate": 0.25,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.008544921875,
"frequency_min": 0.006591796875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.006591796875,
"frequency_min": 0.005126953125,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.005126953125,
"frequency_min": 0.004150390625,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.004150390625,
"frequency_min": 0.003173828125,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.003173828125,
"frequency_min": 0.002685546875,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.002685546875,
"frequency_min": 0.002197265625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 2,
"decimation.method": "default",
"decimation.sample_rate": 0.0625,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00274658203125,
"frequency_min": 0.00213623046875,
"index_max": 22,
"index_min": 18
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00213623046875,
"frequency_min": 0.00164794921875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00164794921875,
"frequency_min": 0.00115966796875,
"index_max": 13,
"index_min": 10
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00115966796875,
"frequency_min": 0.00079345703125,
"index_max": 9,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00079345703125,
"frequency_min": 0.00054931640625,
"index_max": 6,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 3,
"decimation.method": "default",
"decimation.sample_rate": 0.015625,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
}
],
"id": "CAS04_sr1",
"stations.local.id": "CAS04",
"stations.local.mth5_path": "/home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5",
"stations.local.remote": false,
"stations.local.runs": [
{
"run": {
"id": "b",
"input_channels": [
{
"channel": {
"id": "hx",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hy",
"scale_factor": 1.0
}
}
],
"output_channels": [
{
"channel": {
"id": "ex",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ey",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hz",
"scale_factor": 1.0
}
}
],
"sample_rate": 1.0,
"time_periods": [
{
"time_period": {
"end": "2020-06-12T17:52:23+00:00",
"start": "2020-06-02T22:24:55+00:00"
}
}
]
}
},
{
"run": {
"id": "d",
"input_channels": [
{
"channel": {
"id": "hx",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hy",
"scale_factor": 1.0
}
}
],
"output_channels": [
{
"channel": {
"id": "ex",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ey",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hz",
"scale_factor": 1.0
}
}
],
"sample_rate": 1.0,
"time_periods": [
{
"time_period": {
"end": "2020-07-13T19:00:00+00:00",
"start": "2020-07-01T19:36:55+00:00"
}
}
]
}
}
],
"stations.remote": []
}
}
What if I have unconventional channel names?¶
Aurora uses “ex”, “ey”, “hx”, “hy”, “hz” as default channel names, but not all MTH5 files will use this nomenclature. For example, files generated from some Phoenix systems call the channels “e1”, “e2”, “h1”, “h2”, “h3”.
A complete list of supported channel mappings is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
Here is an example of how to update the config in this case:
[35]:
# config.channel_nomenclature.keyword = "phoenix123"
# config.set_default_input_output_channels()
# config.set_default_reference_channels()
Exercise:¶
Print the processing config by calling config
Modify the nomenclature using the above code, and print it again.
Confirm that the two configs are different. Can you spot the differences?
Run the Aurora Pipeline using the input MTH5 and Configuration File
[36]:
show_plot = True
tf_cls = process_mth5(config,
kernel_dataset,
units="MT",
show_plot=show_plot,
z_file_path=None,
)
24:09:06T01:36:46 | INFO | line:277 |aurora.pipelines.transfer_function_kernel | show_processing_summary | Processing Summary Dataframe:
24:09:06T01:36:46 | INFO | line:278 |aurora.pipelines.transfer_function_kernel | show_processing_summary |
duration has_data n_samples run station survey run_hdf5_reference station_hdf5_reference fc remote stft mth5_obj dec_level dec_factor sample_rate window_duration num_samples_window num_samples num_stft_windows
0 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 0 1.0 1.000000 128.0 128 847648.0 8829.0
1 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 1 4.0 0.250000 512.0 128 211912.0 2207.0
2 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 2 4.0 0.062500 2048.0 128 52978.0 551.0
3 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 3 4.0 0.015625 8192.0 128 13244.0 137.0
4 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 0 1.0 1.000000 128.0 128 1034585.0 10776.0
5 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 1 4.0 0.250000 512.0 128 258646.0 2693.0
6 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 2 4.0 0.062500 2048.0 128 64661.0 673.0
7 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 3 4.0 0.015625 8192.0 128 16165.0 168.0
24:09:06T01:36:46 | INFO | line:654 |aurora.pipelines.transfer_function_kernel | memory_check | Total memory: 62.56 GB
24:09:06T01:36:46 | INFO | line:658 |aurora.pipelines.transfer_function_kernel | memory_check | Total Bytes of Raw Data: 0.014 GB
24:09:06T01:36:46 | INFO | line:661 |aurora.pipelines.transfer_function_kernel | memory_check | Raw Data will use: 0.022 % of memory
24:09:06T01:36:46 | INFO | line:517 |aurora.pipelines.process_mth5 | process_mth5_legacy | Processing config indicates 4 decimation levels
24:09:06T01:36:46 | INFO | line:445 |aurora.pipelines.transfer_function_kernel | valid_decimations | After validation there are 4 valid decimation levels
24:09:06T01:36:49 | INFO | line:889 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | Dataset dataframe initialized successfully
24:09:06T01:36:49 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 0 Successfully
24:09:06T01:36:50 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:36:52 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:36:52 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 25.728968s (0.038867Hz)
24:09:06T01:36:52 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 19.929573s (0.050177Hz)
24:09:06T01:36:52 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 15.164131s (0.065945Hz)
24:09:06T01:36:52 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 11.746086s (0.085135Hz)
24:09:06T01:36:53 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 9.195791s (0.108745Hz)
24:09:06T01:36:53 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 7.362526s (0.135823Hz)
24:09:06T01:36:53 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 5.856115s (0.170762Hz)
24:09:06T01:36:54 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 4.682492s (0.213562Hz)
24:09:06T01:36:55 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 1
24:09:06T01:36:55 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 1 Successfully
24:09:06T01:36:56 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:36:56 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:36:56 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 102.915872s (0.009717Hz)
24:09:06T01:36:57 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 85.631182s (0.011678Hz)
24:09:06T01:36:57 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 68.881694s (0.014518Hz)
24:09:06T01:36:57 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 54.195827s (0.018452Hz)
24:09:06T01:36:57 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 43.003958s (0.023254Hz)
24:09:06T01:36:57 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 33.310722s (0.030020Hz)
24:09:06T01:36:58 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 2
24:09:06T01:36:58 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 2 Successfully
24:09:06T01:36:58 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:36:59 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:36:59 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 411.663489s (0.002429Hz)
24:09:06T01:36:59 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 342.524727s (0.002919Hz)
24:09:06T01:36:59 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 275.526776s (0.003629Hz)
24:09:06T01:36:59 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 216.783308s (0.004613Hz)
24:09:06T01:36:59 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 172.015831s (0.005813Hz)
24:09:06T01:36:59 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 133.242890s (0.007505Hz)
24:09:06T01:37:00 | INFO | line:124 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 3
24:09:06T01:37:00 | INFO | line:143 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 3 Successfully
24:09:06T01:37:00 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:37:00 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:09:06T01:37:00 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1514.701336s (0.000660Hz)
24:09:06T01:37:00 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1042.488956s (0.000959Hz)
24:09:06T01:37:00 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 723.371271s (0.001382Hz)
24:09:06T01:37:00 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 532.971560s (0.001876Hz)
24:09:06T01:37:01 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 412.837995s (0.002422Hz)
24:09:06T01:37:01 | INFO | line:771 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
[37]:
type(tf_cls)
[37]:
mt_metadata.transfer_functions.core.TF
Write the transfer functions generated by the Aurora pipeline:
[38]:
tf_cls.write(fn="emtfxml_test.xml", file_type="emtfxml")
[38]:
EMTFXML(station='CAS04', latitude=37.63, longitude=-121.47, elevation=335.26)
[39]:
tf_cls.write(fn="emtfxml_test.edi", file_type="edi")
[39]:
Station: CAS04
--------------------------------------------------
Survey: CONUS South
Project: USMTArray
Acquired by: None
Acquired date: 2020-06-02
Latitude: 37.633
Longitude: -121.468
Elevation: 335.262
Impedance: True
Tipper: True
Number of periods: 25
Period Range: 4.68249E+00 -- 1.51470E+03 s
Frequency Range 6.60196E-04 -- 2.13561E-01 s
[40]:
tf_cls.write(fn="emtfxml_test.zss", file_type="zmm")
[40]:
MT( station='CAS04', latitude=37.63, longitude=-121.47, elevation=335.26 )
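To check a written file, it can be read back into a TF object; a minimal sketch (assumes mt_metadata's TF.read dispatches on the file type):
[ ]:
from mt_metadata.transfer_functions.core import TF

# Read back the EDI written above and confirm the station
tf_check = TF()
tf_check.read("emtfxml_test.edi")
print(tf_check.station)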
[ ]: