Build an MTH5 and Operate the Aurora Pipeline¶
This notebook pulls MT miniSEED data from the IRIS Dataselect web service and builds an MTH5 file from it. It outlines the process of making an MTH5 file, generating a processing config, and running the Aurora processor.
It assumes that aurora, mth5, and mt_metadata have all been installed.
In this “new” version, the workflow has changed somewhat.
The process_mth5 call works with a dataset dataframe, rather than a single run_id
The config object is now based on the mt_metadata.base Base class
Remote reference processing is supported (at least in theory)
0. Flow of this notebook¶
Section 1: Here we do imports and construct a table of the data that we will access to build the mth5. Note that there is no explanation here as to the table source – a future update can show how to create such a table from IRIS data_availability tools
Section 2: The metadata and the data are accessed, and the MTH5 is created and stored.
Section 3: Aurora is used to process the data
[1]:
# # Uncomment while developing
# %load_ext autoreload
# %autoreload 2
[2]:
# Required imports for the program.
from pathlib import Path
import pandas as pd
import warnings
from mth5 import mth5, timeseries
from mth5.clients.fdsn import FDSN
from mth5.clients.make_mth5 import MakeMTH5
from mth5.utils.helpers import initialize_mth5
from mt_metadata.utils.mttime import get_now_utc, MTime
from aurora.config import BANDS_DEFAULT_FILE
from aurora.config.config_creator import ConfigCreator
from aurora.pipelines.process_mth5 import process_mth5
from mtpy.processing import RunSummary, KernelDataset
warnings.filterwarnings('ignore')
1. Build an MTH5 file from information extracted by IRIS¶
If you have already built an MTH5 you can skip this section
Set path so MTH5 file builds to current working directory.
[3]:
default_path = Path().cwd()
default_path
[3]:
PosixPath('/home/kkappler/software/irismt/aurora/docs/examples')
Select mth5 file version
[4]:
# mth5_version = '0.1.0'
mth5_version = '0.2.0'
[5]:
# Initialize the Make MTH5 code.
maker = MakeMTH5(mth5_version=mth5_version)
maker.client = "IRIS"
maker.interact = True
# Initialize an FDSN object to access column names for request df
fdsn_obj = FDSN()
1A: Specify the data to access from IRIS¶
Note that here we explicitly prescribe the data, but this dataframe could be built programmatically from IRIS data availability tools.
[6]:
# Generate data frame of FDSN Network, Station, Location, Channel, Starttime, Endtime codes of interest
CAS04LQE = ['8P', 'CAS04', '', 'LQE', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
CAS04LQN = ['8P', 'CAS04', '', 'LQN', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
# Variable names match the channel codes they request (LFE, LFN, LFZ)
CAS04LFE = ['8P', 'CAS04', '', 'LFE', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
CAS04LFN = ['8P', 'CAS04', '', 'LFN', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
CAS04LFZ = ['8P', 'CAS04', '', 'LFZ', '2020-06-02T19:00:00', '2020-07-13T19:00:00']
request_list = [CAS04LQE, CAS04LQN, CAS04LFE, CAS04LFN, CAS04LFZ]
# Turn list into dataframe
request_df = pd.DataFrame(request_list, columns=fdsn_obj.request_columns)
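The request rows above differ only in their channel code, so they can also be generated rather than typed out. The following is a minimal sketch, assuming the column order shown in the request dataframe (network, station, location, channel, start, end). The name build_request_rows is a hypothetical helper and is not part of mth5.

```python
def build_request_rows(network, station, channels, start, end, location=""):
    """Return one request row per channel (hypothetical helper, not an mth5 API)."""
    return [[network, station, location, channel, start, end] for channel in channels]


# Same station, time window, and channels as the hand-written rows
request_list = build_request_rows(
    network="8P",
    station="CAS04",
    channels=["LQE", "LQN", "LFE", "LFN", "LFZ"],
    start="2020-06-02T19:00:00",
    end="2020-07-13T19:00:00",
)

for row in request_list:
    print(row)
```

The resulting list can be handed to pd.DataFrame(request_list, columns=fdsn_obj.request_columns) in the same way as the hand-written list.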
[7]:
# Inspect the dataframe
request_df
[7]:
| | network | station | location | channel | start | end |
|---|---|---|---|---|---|---|
| 0 | 8P | CAS04 | | LQE | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 1 | 8P | CAS04 | | LQN | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 2 | 8P | CAS04 | | LFE | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 3 | 8P | CAS04 | | LFN | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
| 4 | 8P | CAS04 | | LFZ | 2020-06-02T19:00:00 | 2020-07-13T19:00:00 |
[8]:
# Request the inventory information from IRIS
inventory = fdsn_obj.get_inventory_from_df(request_df, data=False)
[9]:
# Inspect the inventory
inventory
[9]:
(Inventory created at 2024-08-28T22:58:43.177868Z
Created by: ObsPy 1.4.0
https://www.obspy.org
Sending institution: MTH5
Contains:
Networks (1):
8P
Stations (1):
8P.CAS04 (Corral Hollow, CA, USA)
Channels (8):
8P.CAS04..LFZ, 8P.CAS04..LFN, 8P.CAS04..LFE, 8P.CAS04..LQN (2x),
8P.CAS04..LQE (3x),
0 Trace(s) in Stream:
)
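The inventory lists SEED channel codes (LQE, LQN, LFE, LFN, LFZ), while the MTH5 file stores channels under MT component names (ex, ey, hx, hy, hz). The sketch below illustrates one plausible correspondence. It assumes the SEED instrument codes Q (electric potential) and F (magnetometer) and the common MT convention that north maps to x and east to y. The dictionaries and the function seed_to_mt_component are illustrative and are not the mt_metadata implementation.

```python
# Assumed mapping tables, for illustration only
INSTRUMENT_TO_FIELD = {"Q": "e", "F": "h"}  # Q: electric potential, F: magnetometer
ORIENTATION_TO_AXIS = {"N": "x", "E": "y", "Z": "z"}


def seed_to_mt_component(channel_code):
    """Translate a 3-character SEED channel code into an MT component name."""
    _band, instrument, orientation = channel_code  # band code (e.g. L) is not needed here
    return INSTRUMENT_TO_FIELD[instrument] + ORIENTATION_TO_AXIS[orientation]


for code in ["LQN", "LQE", "LFN", "LFE", "LFZ"]:
    print(code, "->", seed_to_mt_component(code))
```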
Build an MTH5 file from the user-defined request dataframe.
With the maker object configured, we are ready to request the data from the FDSN client (IRIS) and save it to an MTH5 file. This process builds an MTH5 file and can take some time, depending on how much data is requested.
Note: setting maker.interact = True keeps the MTH5 file open after it has been built.
[10]:
mth5_object = maker.from_fdsn_client(request_df)
24:08:28T15:58:43 | WARNING | line:611 |mth5.mth5 | open_mth5 | 8P_CAS04.h5 will be overwritten in 'w' mode
24:08:28T15:58:44 | INFO | line:679 |mth5.mth5 | _initialize_file | Initialized MTH5 0.2.0 file /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5 in mode w
24:08:28T15:58:51 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:08:28T15:58:51 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:08:28T15:58:51 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:08:28T15:58:51 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:08:28T15:58:51 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:08:28T15:58:51 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:08:28T15:58:52 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:08:28T15:58:52 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:08:28T15:58:52 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_si_units to a CoefficientFilter.
24:08:28T15:58:52 | INFO | line:133 |mt_metadata.timeseries.filters.obspy_stages | create_filter_from_stage | Converting PoleZerosResponseStage electric_dipole_92.000 to a CoefficientFilter.
24:08:28T15:58:53 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup a already exists, returning existing group.
24:08:28T15:58:54 | WARNING | line:645 |mth5.timeseries.run_ts | validate_metadata | start time of dataset 2020-06-02T19:00:00+00:00 does not match metadata start 2020-06-02T18:41:43+00:00 updating metatdata value to 2020-06-02T19:00:00+00:00
24:08:28T15:58:54 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:08:28T15:58:54 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:08:28T15:58:54 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:08:28T15:58:55 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:08:28T15:58:55 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id a. Setting to ch.run_metadata.id to a
24:08:28T15:58:55 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup b already exists, returning existing group.
24:08:28T15:58:56 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:08:28T15:58:56 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:08:28T15:58:56 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:08:28T15:58:57 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:08:28T15:58:57 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id b. Setting to ch.run_metadata.id to b
24:08:28T15:58:57 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup c already exists, returning existing group.
24:08:28T15:58:58 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:08:28T15:58:58 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:08:28T15:58:59 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:08:28T15:58:59 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:08:28T15:59:00 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id c. Setting to ch.run_metadata.id to c
24:08:28T15:59:00 | INFO | line:331 |mth5.groups.base | _add_group | RunGroup d already exists, returning existing group.
24:08:28T15:59:00 | WARNING | line:658 |mth5.timeseries.run_ts | validate_metadata | end time of dataset 2020-07-13T19:00:00+00:00 does not match metadata end 2020-07-13T21:46:12+00:00 updating metatdata value to 2020-07-13T19:00:00+00:00
24:08:28T15:59:01 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:08:28T15:59:01 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:08:28T15:59:01 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:08:28T15:59:01 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:08:28T15:59:02 | WARNING | line:677 |mth5.groups.run | from_runts | Channel run.id sr1_001 != group run.id d. Setting to ch.run_metadata.id to d
24:08:28T15:59:02 | INFO | line:761 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
24:08:28T15:59:02 | WARNING | line:328 |mth5.mth5 | filename | MTH5 file is not open or has not been created yet. Returning default name
1B: Examine and Update the MTH5 object¶
With the MTH5 object open, we can start to examine its contents, for example by retrieving the filename and file_version. We can also get the station information and edit it by setting a new value, in this case the declination model.
[11]:
mth5_object
[11]:
/:
====================
|- Group: Experiment
--------------------
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Surveys
-----------------
|- Group: CONUS_South
---------------------
|- Group: Filters
-----------------
|- Group: coefficient
---------------------
|- Group: electric_analog_to_digital
------------------------------------
|- Group: electric_dipole_92.000
--------------------------------
|- Group: electric_si_units
---------------------------
|- Group: magnetic_analog_to_digital
------------------------------------
|- Group: fap
-------------
|- Group: fir
-------------
|- Group: time_delay
--------------------
|- Group: electric_time_offset
------------------------------
|- Group: hx_time_offset
------------------------
|- Group: hy_time_offset
------------------------
|- Group: hz_time_offset
------------------------
|- Group: zpk
-------------
|- Group: electric_butterworth_high_pass_30000
----------------------------------------------
--> Dataset: poles
....................
--> Dataset: zeros
....................
|- Group: electric_butterworth_low_pass
---------------------------------------
--> Dataset: poles
....................
--> Dataset: zeros
....................
|- Group: magnetic_butterworth_low_pass
---------------------------------------
--> Dataset: poles
....................
--> Dataset: zeros
....................
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Stations
------------------
|- Group: CAS04
---------------
|- Group: Fourier_Coefficients
------------------------------
|- Group: Transfer_Functions
----------------------------
|- Group: a
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: b
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: c
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
|- Group: d
-----------
--> Dataset: ex
.................
--> Dataset: ey
.................
--> Dataset: hx
.................
--> Dataset: hy
.................
--> Dataset: hz
.................
--> Dataset: channel_summary
..............................
--> Dataset: tf_summary
.........................
[12]:
mth5_path = mth5_object.filename
[13]:
mth5_object.file_version
[13]:
'0.2.0'
[14]:
mth5_object.close_mth5()
24:08:28T15:59:02 | INFO | line:761 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
[15]:
mth5_object = initialize_mth5(mth5_path)
1C: Optionally Update Metadata:¶
[16]:
# Edit and update the MTH5 metadata
s = mth5_object.get_station("CAS04", survey="CONUS_South")
print(s.metadata.location.declination.model)
s.metadata.location.declination.model = 'IGRF'
print(s.metadata.location.declination.model)
s.write_metadata() # writes to file mth5_filename
IGRF-13
IGRF
[17]:
# Print some info about the mth5
mth5_filename = mth5_object.filename
version = mth5_object.file_version
print(f" Filename: {mth5_filename} \n Version: {version}")
Filename: /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
Version: 0.2.0
[18]:
# Get the available stations and runs from the MTH5 object
mth5_object.channel_summary.summarize()
ch_summary = mth5_object.channel_summary.to_dataframe()
2: Process Data¶
If the MTH5 file already exists, you can start here and skip the previous code that fetches the data.
[19]:
interact = False
if interact:
    pass
else:
    h5_path = default_path.joinpath("8P_CAS04.h5")
    mth5_object = initialize_mth5(h5_path, mode="a", file_version=mth5_version)
    ch_summary = mth5_object.channel_summary.to_dataframe()
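Starting from this section only works if the file from Section 1 is actually on disk. A small guard can make that failure explicit before the file is opened. This is a sketch using only pathlib; find_existing_h5 is a hypothetical helper, and the temporary directory stands in for the notebook's working directory.

```python
from pathlib import Path
import tempfile


def find_existing_h5(directory, filename):
    """Return the path to an existing file or raise a descriptive error (hypothetical helper)."""
    h5_path = Path(directory).joinpath(filename)
    if not h5_path.is_file():
        raise FileNotFoundError(f"{h5_path} not found; run Section 1 to build it first")
    return h5_path


# Demonstration in a temporary directory with an empty placeholder file
with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "8P_CAS04.h5").touch()
    found_name = find_existing_h5(tmp_dir, "8P_CAS04.h5").name
    try:
        find_existing_h5(tmp_dir, "missing.h5")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(found_name, missing_raised)
```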
Generate an Aurora Configuration file using MTH5 as an input¶
Up to this point, we have used mth5 and mt_metadata, but haven’t yet used aurora. So we will use the MTH5 that we just created (and examined and updated) as input into Aurora.
Channel Summary¶
This is a very useful data structure inside the MTH5. It acts like an index of available data at the channel-run level, i.e. there is one row for every contiguous chunk of time series recorded by an electric dipole or magnetometer.
[20]:
ch_summary
[20]:
survey | station | run | latitude | longitude | elevation | component | start | end | n_samples | sample_rate | measurement_type | azimuth | tilt | units | has_data | hdf5_reference | run_hdf5_reference | station_hdf5_reference | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | ex | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
1 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | ey | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
2 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | hx | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
3 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | hy | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
4 | CONUS South | CAS04 | a | 37.633351 | -121.468382 | 335.261765 | hz | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11267 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
5 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | ex | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
6 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | ey | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
7 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | hx | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
8 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | hy | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
9 | CONUS South | CAS04 | b | 37.633351 | -121.468382 | 335.261765 | hz | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847649 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
10 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | ex | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
11 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | ey | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
12 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | hx | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
13 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | hy | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
14 | CONUS South | CAS04 | c | 37.633351 | -121.468382 | 335.261765 | hz | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638043 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
15 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | ex | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | electric | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
16 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | ey | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | electric | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
17 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | hx | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | magnetic | 13.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
18 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | hy | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | magnetic | 103.2 | 0.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
19 | CONUS South | CAS04 | d | 37.633351 | -121.468382 | 335.261765 | hz | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034586 | 1.0 | magnetic | 0.0 | 90.0 | digital counts | True | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
The channel summary has many uses. Below we use it to check whether the data have mixed sample rates and to get a list of available stations.
[21]:
available_runs = ch_summary.run.unique()
sr = ch_summary.sample_rate.unique()
if len(sr) != 1:
    print('Only one sample rate per run is available')
available_stations = ch_summary.station.unique()
print(f"Available stations: {available_stations}")
Available stations: ['CAS04']
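The check above looks at the sample rates of the whole file at once. A per-run check can be sketched by grouping rows on (station, run) and collecting the sample rates seen in each group. The example below uses plain dictionaries so that it runs without an MTH5 file; with pandas, similar records could be obtained from ch_summary.to_dict("records"). The function name sample_rates_by_run and the abbreviated rows are illustrative.

```python
from collections import defaultdict

# A few channel-summary rows reduced to the columns needed here
rows = [
    {"station": "CAS04", "run": "a", "component": "ex", "sample_rate": 1.0},
    {"station": "CAS04", "run": "a", "component": "hx", "sample_rate": 1.0},
    {"station": "CAS04", "run": "b", "component": "ex", "sample_rate": 1.0},
    {"station": "CAS04", "run": "b", "component": "hx", "sample_rate": 1.0},
]


def sample_rates_by_run(records):
    """Collect the set of sample rates found in each (station, run) group."""
    rates = defaultdict(set)
    for record in records:
        rates[(record["station"], record["run"])].add(record["sample_rate"])
    return dict(rates)


rates = sample_rates_by_run(rows)
# Any run whose channels disagree on sample rate ends up in this dictionary
mixed = {key: value for key, value in rates.items() if len(value) > 1}
print(rates)
print("runs with mixed sample rates:", mixed)
```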
Run Summary¶
A cousin of the channel summary is the run summary. This is a condensed version of the channel summary, with one row per continuous acquisition run at a station.
The run summary can be accessed from an open mth5 object, or from an iterable of h5 paths as in the example below
[22]:
mth5_run_summary = RunSummary()
h5_path = default_path.joinpath("8P_CAS04.h5")
mth5_run_summary.from_mth5s([h5_path,])
run_summary = mth5_run_summary.clone()
run_summary.df
24:08:28T15:59:03 | INFO | line:761 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
[22]:
channel_scale_factors | duration | end | has_data | input_channels | mth5_path | n_samples | output_channels | run | sample_rate | start | station | survey | run_hdf5_reference | station_hdf5_reference | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 11266.0 | 2020-06-02 22:07:46+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 11267 | [ex, ey, hz] | a | 1.0 | 2020-06-02 19:00:00+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
1 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 847648.0 | 2020-06-12 17:52:23+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 847649 | [ex, ey, hz] | b | 1.0 | 2020-06-02 22:24:55+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
2 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 1638042.0 | 2020-07-01 17:32:59+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 1638043 | [ex, ey, hz] | c | 1.0 | 2020-06-12 18:32:17+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
3 | {'ex': 1.0, 'ey': 1.0, 'hx': 1.0, 'hy': 1.0, '... | 1034585.0 | 2020-07-13 19:00:00+00:00 | True | [hx, hy] | /home/kkappler/software/irismt/aurora/docs/exa... | 1034586 | [ex, ey, hz] | d | 1.0 | 2020-07-01 19:36:55+00:00 | CAS04 | CONUS South | <HDF5 object reference> | <HDF5 object reference> |
Now we have a dataframe of the available runs to process from the MTH5
Sometimes we just want to look at the survey, station, run, and time intervals. For that we can call mini_summary.
[23]:
run_summary.mini_summary
[23]:
survey | station | run | start | end | duration | |
---|---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11266.0 |
1 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847648.0 |
2 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638042.0 |
3 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034585.0 |
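The duration column in these tables can be cross-checked against the start and end times and the sample count. The sketch below assumes that duration is the end time minus the start time in seconds and that both endpoints are sampled, so that n_samples equals duration times sample_rate plus one. The function names are illustrative; the numbers are copied from runs a and b above.

```python
from datetime import datetime


def run_duration_seconds(start, end):
    """Seconds between two ISO-formatted timestamps."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()


def expected_n_samples(duration_seconds, sample_rate):
    """Sample count when both the first and last instants are sampled."""
    return int(round(duration_seconds * sample_rate)) + 1


duration_a = run_duration_seconds("2020-06-02 19:00:00+00:00", "2020-06-02 22:07:46+00:00")
duration_b = run_duration_seconds("2020-06-02 22:24:55+00:00", "2020-06-12 17:52:23+00:00")

print(duration_a, expected_n_samples(duration_a, 1.0))
print(duration_b, expected_n_samples(duration_b, 1.0))
```

Both results agree with the duration and n_samples values reported in the run summary for these runs.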
Here are all the columns in the run summary:
[24]:
run_summary.df.columns
[24]:
Index(['channel_scale_factors', 'duration', 'end', 'has_data',
'input_channels', 'mth5_path', 'n_samples', 'output_channels', 'run',
'sample_rate', 'start', 'station', 'survey', 'run_hdf5_reference',
'station_hdf5_reference'],
dtype='object')
Make your own mini summary by choosing columns
[25]:
coverage_short_list_columns = ["survey", 'station', 'run', 'start', 'end', ]
run_summary.df[coverage_short_list_columns]
[25]:
survey | station | run | start | end | |
---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 |
1 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
2 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 |
3 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
Kernel Dataset¶
This is like a run summary, but for a single station or a pair of stations. It is used to specify the inputs to aurora processing.
It takes a run_summary and a station name, and optionally, a remote reference station name
It is made based on the available data in the MTH5 archive.
Syntax: kernel_dataset.from_run_summary(run_summary, local_station_id, reference_station_id)
By default, all runs will be processed.
To restrict to processing a single run, or a list of runs, we can either tell KernelDataset to keep or drop a station_run dictionary.
[26]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "CAS04")
kernel_dataset.mini_summary
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[26]:
survey | station | run | start | end | duration | |
---|---|---|---|---|---|---|
0 | CONUS South | CAS04 | a | 2020-06-02 19:00:00+00:00 | 2020-06-02 22:07:46+00:00 | 11266.0 |
1 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 | 847648.0 |
2 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 | 1638042.0 |
3 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 | 1034585.0 |
Here is one way to select a single run:¶
[27]:
station_runs_dict = {}
station_runs_dict["CAS04"] = ["a", ]
keep_or_drop = "keep"
kernel_dataset.select_station_runs(station_runs_dict, keep_or_drop)
print(kernel_dataset.df[coverage_short_list_columns])
survey station run start end
0 CONUS South CAS04 a 2020-06-02 19:00:00+00:00 2020-06-02 22:07:46+00:00
To discard runs that are not very long¶
[28]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "CAS04")
cutoff_duration_in_seconds = 15000
kernel_dataset.drop_runs_shorter_than(cutoff_duration_in_seconds)
kernel_dataset.df[coverage_short_list_columns]
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:08:28T15:59:03 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[28]:
survey | station | run | start | end | |
---|---|---|---|---|---|
0 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
1 | CONUS South | CAS04 | c | 2020-06-12 18:32:17+00:00 | 2020-07-01 17:32:59+00:00 |
2 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
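The effect of the duration cutoff can be reproduced by hand from the durations in the mini summary. This is a sketch of the filtering idea only and is not the KernelDataset implementation; in particular, keeping runs whose duration equals the cutoff is an assumption.

```python
# Durations in seconds, copied from the mini summary
durations = {"a": 11266.0, "b": 847648.0, "c": 1638042.0, "d": 1034585.0}


def runs_at_least(durations_by_run, cutoff_seconds):
    """Keep runs whose duration is at least the cutoff (boundary handling is assumed)."""
    return [run for run, duration in durations_by_run.items() if duration >= cutoff_seconds]


kept = runs_at_least(durations, 15000)
print(kept)  # ['b', 'c', 'd']
```

Only run a, at 11266 seconds, falls below the 15000 second cutoff, which matches the table above.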
Select only runs “b” & “d”¶
[29]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "CAS04")
station_runs_dict = {}
station_runs_dict["CAS04"] = ["b","d"]
keep_or_drop = "keep"
kernel_dataset.select_station_runs(station_runs_dict, keep_or_drop)
kernel_dataset.df[coverage_short_list_columns]
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[29]:
survey | station | run | start | end | |
---|---|---|---|---|---|
0 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
1 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
The same result can be obtained by excluding runs a & c¶
[30]:
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "CAS04")
station_runs_dict = {}
station_runs_dict["CAS04"] = ["a","c"]
keep_or_drop = "drop"
kernel_dataset.select_station_runs(station_runs_dict, keep_or_drop)
kernel_dataset.df[coverage_short_list_columns]
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column fc, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column remote, adding and setting dtype to <class 'bool'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column run_dataarray, adding and setting dtype to <class 'object'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column stft, adding and setting dtype to <class 'object'>.
24:08:28T15:59:04 | INFO | line:250 |mtpy.processing.kernel_dataset | _add_columns | KernelDataset DataFrame needs column mth5_obj, adding and setting dtype to <class 'object'>.
[30]:
| | survey | station | run | start | end |
|---|---|---|---|---|---|
| 0 | CONUS South | CAS04 | b | 2020-06-02 22:24:55+00:00 | 2020-06-12 17:52:23+00:00 |
| 1 | CONUS South | CAS04 | d | 2020-07-01 19:36:55+00:00 | 2020-07-13 19:00:00+00:00 |
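The equivalence of the two selections is just set arithmetic on the run labels. The sketch below uses plain Python rather than the KernelDataset API, and it assumes the station has exactly four runs labelled a through d, which is inferred from the text rather than shown in this output. select_runs is a hypothetical helper.

```python
# Hypothetical stand-in for the keep/drop logic; not the KernelDataset API.
# Assumption: station CAS04 has exactly the runs a, b, c, d.
available_runs = ["a", "b", "c", "d"]


def select_runs(runs, chosen, keep_or_drop):
    """Return runs after keeping or dropping the chosen labels, order preserved."""
    if keep_or_drop == "keep":
        return [r for r in runs if r in chosen]
    if keep_or_drop == "drop":
        return [r for r in runs if r not in chosen]
    raise ValueError(f"keep_or_drop must be 'keep' or 'drop', got {keep_or_drop!r}")


kept = select_runs(available_runs, ["b", "d"], "keep")
dropped = select_runs(available_runs, ["a", "c"], "drop")
print(kept, dropped)
```

Both calls leave runs b and d, matching the two tables above.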
To process only a segment of data¶
Suppose you have weeks of data available but want to process only a subset. If the subset is one contiguous block per run, you can edit the start and end columns of the kernel_dataset dataframe (which was built from the run summary) directly, as below. Afterwards, update the duration column by calling kernel_dataset._add_duration_column():
[31]:
# Use .loc so the change is written to the dataframe itself, not to a temporary copy
# kernel_dataset.df.loc[0, "start"] += pd.Timedelta(days=5)
# kernel_dataset.df.loc[1, "start"] += pd.Timedelta(days=7)
# kernel_dataset._add_duration_column()
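To see why the duration column has to be refreshed, it helps to do the arithmetic for one run by hand. The sketch below uses only the standard datetime module and the start and end times of run b from the table above. It assumes the duration is simply end minus start in seconds, which agrees with the value of 847648.0 reported for run b in the processing summary later in this notebook.

```python
from datetime import datetime, timedelta, timezone

# Start and end of run b, copied from the table above.
start = datetime(2020, 6, 2, 22, 24, 55, tzinfo=timezone.utc)
end = datetime(2020, 6, 12, 17, 52, 23, tzinfo=timezone.utc)

duration_before = (end - start).total_seconds()

# Skip the first five days of the run, as in the commented-out cell.
trimmed_start = start + timedelta(days=5)
duration_after = (end - trimmed_start).total_seconds()

print(duration_before, duration_after)
```

Moving the start forward by five days removes 432000 seconds, so a duration column that is not recomputed would overstate the amount of data to be processed.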
Exercise:¶
Print the kernel_dataset dataframe by calling
kernel_dataset.df
Modify the start and end times, call _add_duration_column(), and print it again.
Then process the data with and without this change.
Are the resulting transfer functions (TFs) different?
Make an Aurora processing configuration (which can then be saved as a JSON file)
[32]:
cc = ConfigCreator()
config = cc.create_from_kernel_dataset(kernel_dataset,
emtf_band_file=BANDS_DEFAULT_FILE,)
[33]:
for decimation in config.decimations:
    # decimation.output_channels = ["ex", "ey"]
    decimation.estimator.engine = "RME"
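The heading above mentions saving the configuration as JSON, but the notebook does not show that step. As a rough sketch of the idea only, the snippet below round-trips a small hand-written dictionary through the standard json module. The dictionary is a hypothetical stand-in that borrows a few keys from the config printed in this notebook. It is not the aurora config object, whose own serialization methods should be preferred in practice.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a processing config; keys borrowed from the
# notebook's printed config, values trimmed for brevity.
config_like = {
    "processing": {
        "id": "CAS04_sr1",
        "band_specification_style": "EMTF",
        "stations.local.id": "CAS04",
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "CAS04_sr1_config.json"  # hypothetical file name
    json_path.write_text(json.dumps(config_like, indent=4, sort_keys=True))
    reloaded = json.loads(json_path.read_text())

print(reloaded == config_like)
```

Keeping a JSON copy next to the results makes it possible to see later which settings, such as the estimator engine, produced a given transfer function.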
Take a look at the config:
[34]:
config
[34]:
{
"processing": {
"band_setup_file": "/home/kkappler/software/irismt/aurora/aurora/config/emtf_band_setup/bs_test.cfg",
"band_specification_style": "EMTF",
"channel_nomenclature.ex": "ex",
"channel_nomenclature.ey": "ey",
"channel_nomenclature.hx": "hx",
"channel_nomenclature.hy": "hy",
"channel_nomenclature.hz": "hz",
"decimations": [
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.23828125,
"frequency_min": 0.19140625,
"index_max": 30,
"index_min": 25
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.19140625,
"frequency_min": 0.15234375,
"index_max": 24,
"index_min": 20
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.15234375,
"frequency_min": 0.12109375,
"index_max": 19,
"index_min": 16
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.12109375,
"frequency_min": 0.09765625,
"index_max": 15,
"index_min": 13
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.09765625,
"frequency_min": 0.07421875,
"index_max": 12,
"index_min": 10
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.07421875,
"frequency_min": 0.05859375,
"index_max": 9,
"index_min": 8
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.05859375,
"frequency_min": 0.04296875,
"index_max": 7,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 0,
"frequency_max": 0.04296875,
"frequency_min": 0.03515625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 1.0,
"decimation.level": 0,
"decimation.method": "default",
"decimation.sample_rate": 1.0,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0341796875,
"frequency_min": 0.0263671875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0263671875,
"frequency_min": 0.0205078125,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0205078125,
"frequency_min": 0.0166015625,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0166015625,
"frequency_min": 0.0126953125,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0126953125,
"frequency_min": 0.0107421875,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 1,
"frequency_max": 0.0107421875,
"frequency_min": 0.0087890625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 1,
"decimation.method": "default",
"decimation.sample_rate": 0.25,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.008544921875,
"frequency_min": 0.006591796875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.006591796875,
"frequency_min": 0.005126953125,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.005126953125,
"frequency_min": 0.004150390625,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.004150390625,
"frequency_min": 0.003173828125,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.003173828125,
"frequency_min": 0.002685546875,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 2,
"frequency_max": 0.002685546875,
"frequency_min": 0.002197265625,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 2,
"decimation.method": "default",
"decimation.sample_rate": 0.0625,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
{
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00274658203125,
"frequency_min": 0.00213623046875,
"index_max": 22,
"index_min": 18
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00213623046875,
"frequency_min": 0.00164794921875,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00164794921875,
"frequency_min": 0.00115966796875,
"index_max": 13,
"index_min": 10
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00115966796875,
"frequency_min": 0.00079345703125,
"index_max": 9,
"index_min": 7
}
},
{
"band": {
"center_averaging_type": "geometric",
"closed": "left",
"decimation_level": 3,
"frequency_max": 0.00079345703125,
"frequency_min": 0.00054931640625,
"index_max": 6,
"index_min": 5
}
}
],
"decimation.factor": 4.0,
"decimation.level": 3,
"decimation.method": "default",
"decimation.sample_rate": 0.015625,
"estimator.engine": "RME",
"estimator.estimate_per_channel": true,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"method": "fft",
"min_num_stft_windows": 2,
"output_channels": [
"ex",
"ey",
"hz"
],
"pre_fft_detrend_type": "linear",
"prewhitening_type": "first difference",
"recoloring": true,
"reference_channels": [
"hx",
"hy"
],
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 2,
"regression.minimum_cycles": 10,
"regression.r0": 1.5,
"regression.tolerance": 0.005,
"regression.u0": 2.8,
"regression.verbosity": 0,
"save_fcs": false,
"window.clock_zero_type": "ignore",
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
}
],
"id": "CAS04_sr1",
"stations.local.id": "CAS04",
"stations.local.mth5_path": "/home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5",
"stations.local.remote": false,
"stations.local.runs": [
{
"run": {
"id": "b",
"input_channels": [
{
"channel": {
"id": "hx",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hy",
"scale_factor": 1.0
}
}
],
"output_channels": [
{
"channel": {
"id": "ex",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ey",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hz",
"scale_factor": 1.0
}
}
],
"sample_rate": 1.0,
"time_periods": [
{
"time_period": {
"end": "2020-06-12T17:52:23+00:00",
"start": "2020-06-02T22:24:55+00:00"
}
}
]
}
},
{
"run": {
"id": "d",
"input_channels": [
{
"channel": {
"id": "hx",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hy",
"scale_factor": 1.0
}
}
],
"output_channels": [
{
"channel": {
"id": "ex",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ey",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hz",
"scale_factor": 1.0
}
}
],
"sample_rate": 1.0,
"time_periods": [
{
"time_period": {
"end": "2020-07-13T19:00:00+00:00",
"start": "2020-07-01T19:36:55+00:00"
}
}
]
}
}
],
"stations.remote": []
}
}
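The band edges in the printout can be reproduced from the harmonic indices. The numbers are consistent with a bin spacing of sample_rate / window.num_samples and with edges placed half a bin outside index_min and index_max. This is an observation from the values shown here, not a statement about how aurora computes them internally, and band_edges is a hypothetical helper.

```python
def band_edges(index_min, index_max, sample_rate, num_samples):
    """Return (frequency_min, frequency_max) half a bin outside the index range."""
    bin_width = sample_rate / num_samples  # spacing between FFT harmonics, in Hz
    return (index_min - 0.5) * bin_width, (index_max + 0.5) * bin_width


# Sample rate after each decimation level: divide by each factor in turn.
factors = [1.0, 4.0, 4.0, 4.0]
sample_rates = []
rate = 1.0
for factor in factors:
    rate /= factor
    sample_rates.append(rate)

# First band of decimation level 0 (indices 25 to 30, 128-sample window).
level0 = band_edges(25, 30, sample_rates[0], 128)
# First band of decimation level 3 (indices 18 to 22).
level3 = band_edges(18, 22, sample_rates[3], 128)
print(sample_rates, level0, level3)
```

The results agree with the decimation.sample_rate, frequency_min and frequency_max entries listed for those bands in the config.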
What if I have unconventional channel names?¶
Aurora uses “ex”, “ey”, “hx”, “hy”, “hz” as default channel names, but not all MTH5 files use this nomenclature. For example, files generated from some Phoenix systems name the channels “e1”, “e2”, “h1”, “h2”, “h3”.
A complete list of supported channel mappings is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
Here is an example of how to update the config in this case:
[35]:
# config.channel_nomenclature.keyword = "phoenix123"
# config.set_default_input_output_channels()
# config.set_default_reference_channels()
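To make the effect of a nomenclature change concrete, the sketch below renames the default channel lists with a plain dictionary. The pairing of ex with e1, ey with e2 and so on is an assumption based on the order in which the names are listed above; the authoritative mapping is the one in channel_nomenclatures.json. The rename helper is hypothetical and does not touch the aurora config.

```python
# Assumed pairing of default names to Phoenix-style names (see lead-in).
default_to_phoenix = {"ex": "e1", "ey": "e2", "hx": "h1", "hy": "h2", "hz": "h3"}

# Channel roles as they appear in the config printed above.
input_channels = ["hx", "hy"]
output_channels = ["ex", "ey", "hz"]
reference_channels = ["hx", "hy"]


def rename(channels, mapping):
    """Translate a list of channel names, failing loudly on unknown names."""
    return [mapping[name] for name in channels]


print(rename(input_channels, default_to_phoenix))
print(rename(output_channels, default_to_phoenix))
print(rename(reference_channels, default_to_phoenix))
```

Under this assumed pairing, the input, output and reference channel lists are the entries one would expect to change when the config is printed again.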
Exercise:¶
Print the processing config by calling config
Modify the nomenclature using the above code, and print it again.
Confirm that the two configs are different. Can you spot the differences?
Run the Aurora pipeline using the input MTH5 and the configuration
[36]:
show_plot = True
tf_cls = process_mth5(config,
kernel_dataset,
units="MT",
show_plot=show_plot,
z_file_path=None,
)
24:08:28T15:59:04 | INFO | line:276 |aurora.pipelines.transfer_function_kernel | show_processing_summary | Processing Summary Dataframe:
24:08:28T15:59:04 | INFO | line:277 |aurora.pipelines.transfer_function_kernel | show_processing_summary |
duration has_data n_samples run station survey run_hdf5_reference station_hdf5_reference fc remote stft mth5_obj dec_level dec_factor sample_rate window_duration num_samples_window num_samples num_stft_windows
0 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 0 1.0 1.000000 128.0 128 847648.0 8829.0
1 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 1 4.0 0.250000 512.0 128 211912.0 2207.0
2 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 2 4.0 0.062500 2048.0 128 52978.0 551.0
3 847648.0 True 847649 b CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 3 4.0 0.015625 8192.0 128 13244.0 137.0
4 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 0 1.0 1.000000 128.0 128 1034585.0 10776.0
5 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 1 4.0 0.250000 512.0 128 258646.0 2693.0
6 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 2 4.0 0.062500 2048.0 128 64661.0 673.0
7 1034585.0 True 1034586 d CAS04 CONUS South <HDF5 object reference> <HDF5 object reference> False False None None 3 4.0 0.015625 8192.0 128 16165.0 168.0
24:08:28T15:59:04 | INFO | line:674 |aurora.pipelines.transfer_function_kernel | memory_warning | Total memory: 62.74 GB
24:08:28T15:59:04 | INFO | line:678 |aurora.pipelines.transfer_function_kernel | memory_warning | Total Bytes of Raw Data: 0.014 GB
24:08:28T15:59:04 | INFO | line:683 |aurora.pipelines.transfer_function_kernel | memory_warning | Raw Data will use: 0.022 % of memory
24:08:28T15:59:04 | INFO | line:517 |aurora.pipelines.process_mth5 | process_mth5_legacy | Processing config indicates 4 decimation levels
24:08:28T15:59:04 | INFO | line:456 |aurora.pipelines.transfer_function_kernel | valid_decimations | After validation there are 4 valid decimation levels
24:08:28T15:59:05 | INFO | line:889 |mtpy.processing.kernel_dataset | initialize_dataframe_for_processing | Dataset dataframe initialized successfully
24:08:28T15:59:05 | INFO | line:140 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 0 Successfully
24:08:28T15:59:07 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:08 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:08 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 25.728968s (0.038867Hz)
24:08:28T15:59:08 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 19.929573s (0.050177Hz)
24:08:28T15:59:09 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 15.164131s (0.065945Hz)
24:08:28T15:59:09 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 11.746086s (0.085135Hz)
24:08:28T15:59:10 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 9.195791s (0.108745Hz)
24:08:28T15:59:11 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 7.362526s (0.135823Hz)
24:08:28T15:59:11 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 5.856115s (0.170762Hz)
24:08:28T15:59:12 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 4.682492s (0.213562Hz)
24:08:28T15:59:14 | INFO | line:123 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 1
24:08:28T15:59:14 | INFO | line:140 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 1 Successfully
24:08:28T15:59:15 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:16 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 102.915872s (0.009717Hz)
24:08:28T15:59:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 85.631182s (0.011678Hz)
24:08:28T15:59:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 68.881694s (0.014518Hz)
24:08:28T15:59:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 54.195827s (0.018452Hz)
24:08:28T15:59:16 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 43.003958s (0.023254Hz)
24:08:28T15:59:17 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 33.310722s (0.030020Hz)
24:08:28T15:59:18 | INFO | line:123 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 2
24:08:28T15:59:18 | INFO | line:140 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 2 Successfully
24:08:28T15:59:19 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:19 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 411.663489s (0.002429Hz)
24:08:28T15:59:19 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 342.524727s (0.002919Hz)
24:08:28T15:59:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 275.526776s (0.003629Hz)
24:08:28T15:59:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 216.783308s (0.004613Hz)
24:08:28T15:59:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 172.015831s (0.005813Hz)
24:08:28T15:59:20 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 133.242890s (0.007505Hz)
24:08:28T15:59:21 | INFO | line:123 |aurora.pipelines.transfer_function_kernel | update_dataset_df | DECIMATION LEVEL 3
24:08:28T15:59:21 | INFO | line:140 |aurora.pipelines.transfer_function_kernel | update_dataset_df | Dataset Dataframe Updated for decimation level 3 Successfully
24:08:28T15:59:22 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:22 | INFO | line:354 |aurora.pipelines.process_mth5 | save_fourier_coefficients | Skip saving FCs. dec_level_config.save_fc = False
24:08:28T15:59:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1514.701336s (0.000660Hz)
24:08:28T15:59:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 1042.488956s (0.000959Hz)
24:08:28T15:59:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 723.371271s (0.001382Hz)
24:08:28T15:59:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 532.971560s (0.001876Hz)
24:08:28T15:59:22 | INFO | line:35 |aurora.time_series.frequency_band_helpers | get_band_for_tf_estimate | Processing band 412.837995s (0.002422Hz)
24:08:28T15:59:23 | INFO | line:761 |mth5.mth5 | close_mth5 | Flushing and closing /home/kkappler/software/irismt/aurora/docs/examples/8P_CAS04.h5
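The num_stft_windows column of the processing summary can be checked by hand. With 128-sample windows and an overlap of 32 samples, each window advances 96 samples, and the counts in the table match floor((N - 128) / 96) + 1, where N is the num_samples value for that decimation level. The per-level sample counts likewise match repeated floor division by the decimation factor of 4. Both formulas are inferred from the numbers in the table, not taken from the aurora source, and count_windows is a hypothetical helper.

```python
def count_windows(num_samples, window_length=128, overlap=32):
    """Number of full sliding windows that fit in num_samples."""
    advance = window_length - overlap
    if num_samples < window_length:
        return 0
    return (num_samples - window_length) // advance + 1


# Run b: 847648 samples at level 0, decimated by 4 at each later level.
samples_per_level = [847648]
for _ in range(3):
    samples_per_level.append(samples_per_level[-1] // 4)

windows_per_level = [count_windows(n) for n in samples_per_level]
print(samples_per_level)
print(windows_per_level)
```

The same helper gives 10776 windows for the 1034585 samples of run d at level 0, as in the table. Each decimation level therefore has roughly a quarter of the windows of the level before it, which is why the longest periods are estimated from far fewer windows.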
[37]:
type(tf_cls)
[37]:
mt_metadata.transfer_functions.core.TF
Write the transfer functions generated by the Aurora pipeline
[38]:
tf_cls.write(fn="emtfxml_test.xml", file_type="emtfxml")
[38]:
EMTFXML(station='CAS04', latitude=37.63, longitude=-121.47, elevation=335.26)
[39]:
tf_cls.write(fn="emtfxml_test.edi", file_type="edi")
[39]:
Station: CAS04
--------------------------------------------------
Survey: CONUS South
Project: USMTArray
Acquired by: None
Acquired date: 2020-06-02
Latitude: 37.633
Longitude: -121.468
Elevation: 335.262
Impedance: True
Tipper: True
Number of periods: 25
Period Range: 4.68249E+00 -- 1.51470E+03 s
Frequency Range 6.60196E-04 -- 2.13561E-01 s
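The two ranges in the summary are reciprocals of each other, which indicates that the Frequency Range values are in Hz even though the printed line ends in s. A quick check, using only the rounded values shown above:

```python
import math

# Period range in seconds, as printed in the EDI summary.
period_min, period_max = 4.68249, 1514.70
# Frequency range as printed (taken here to be in Hz).
freq_min, freq_max = 6.60196e-04, 2.13561e-01

# The shortest period corresponds to the highest frequency and vice versa.
print(1.0 / period_min, 1.0 / period_max)
print(math.isclose(1.0 / period_min, freq_max, rel_tol=1e-5))
print(math.isclose(1.0 / period_max, freq_min, rel_tol=1e-5))
```

The count of 25 periods is also the sum of the bands per decimation level in the config (8 + 6 + 6 + 5).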
[40]:
tf_cls.write(fn="emtfxml_test.zss", file_type="zmm")
[40]:
MT( station='CAS04', latitude=37.63, longitude=-121.47, elevation=335.26 )