Dataset Definition DataFrame¶
The purpose is to show:

- how a custom dataframe can be used to define a dataset for processing
- that a processing config can be generated based on that dataframe; specifically, the stations level of the processing config, which contains information about which stations and runs are available
- that we can re-use a run with different start and end times (provided that these data are available); i.e., if there is a long run at some station, we can use an early chunk and a later chunk from the same run, and omit some intermediate time interval from processing, simply by creating one row of the dataset dataframe per run-chunk to process (see the sketch below)
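For example, here is a minimal sketch of that last point (the station and run labels below are hypothetical): one long run is split into two chunks, and the interval between them is never listed, so it is simply not processed.

[ ]:

import pandas as pd

# Hypothetical: run "003" at station "mt01" spans all of March, but a noisy
# stretch in the middle should be skipped.  One dataframe row per chunk:
chunks = [
    {"station": "mt01", "run": "003",
     "start": "2020-03-01T00:00:00", "end": "2020-03-10T00:00:00"},  # early chunk
    {"station": "mt01", "run": "003",
     "start": "2020-03-20T00:00:00", "end": "2020-03-31T00:00:00"},  # later chunk
]
chunk_df = pd.DataFrame(chunks)
# The interval 2020-03-10 .. 2020-03-20 never appears, so it is omitted.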
There are examples of using ConfigCreator to generate an entire processing config in operate_aurora.ipynb.
A user can pass to the processing config a dataframe with information about which runs to process.
Here is a simple example of how to do that.
[1]:
import pandas as pd
from mt_metadata.transfer_functions.processing.aurora import Processing
Below is an example of creating a dataframe from scratch with the required columns to pass to Processing.
Here we consider three stations, mt01, rr01, rr02, each having three runs, labelled 000, 001, 002. Note that we do not need to specify the actual start and end times of the runs that were acquired in the field; the start and end times specified here correspond to the time intervals to process. The actual start and end times of the field data acquisition are stored elsewhere, in an MTH5 archive.
[2]:
starts = ["2020-01-01T00:00:00", "2020-02-02T00:00:00"]
ends = ["2020-01-31T12:00:00", "2020-02-28T12:00:00"]
data_list = []
for i_run in range(3):
    run_id = f"{i_run}".zfill(3)  # note that the run_id could be different for the different stations
    for start, end in zip(starts, ends):
        entry = {
            "station": "mt01",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": r"/home/mth5_path.h5",
            "sample_rate": 10,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": False,
        }
        data_list.append(entry)
        rr_entry_01 = {
            "station": "rr01",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": r"/home/mth5_path.h5",
            "sample_rate": 10,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": True,
        }
        data_list.append(rr_entry_01)
        rr_entry_02 = {
            "station": "rr02",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": r"/home/mth5_path.h5",
            "sample_rate": 10,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": True,
        }
        data_list.append(rr_entry_02)
dataset_df = pd.DataFrame(data_list)
dataset_df.start = pd.to_datetime(dataset_df.start, utc=True)
dataset_df.end = pd.to_datetime(dataset_df.end, utc=True)
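For reference, the three dictionary literals above differ only in their station and remote values; an equivalent, more compact construction (a sketch, reusing starts and ends from the cell above and producing the same 18 rows in the same order) loops over (station, remote) pairs:

[ ]:

# Sketch of a more compact construction of the same dataset dataframe.
station_configs = [("mt01", False), ("rr01", True), ("rr02", True)]

data_list = []
for i_run in range(3):
    run_id = f"{i_run:03}"  # "000", "001", "002"
    for start, end in zip(starts, ends):
        for station, remote in station_configs:
            data_list.append(
                {
                    "station": station,
                    "run": run_id,
                    "start": start,
                    "end": end,
                    "mth5_path": r"/home/mth5_path.h5",
                    "sample_rate": 10,
                    "input_channels": ["hx", "hy"],
                    "output_channels": ["hz", "ex", "ey"],
                    "remote": remote,
                }
            )

dataset_df = pd.DataFrame(data_list)
dataset_df.start = pd.to_datetime(dataset_df.start, utc=True)
dataset_df.end = pd.to_datetime(dataset_df.end, utc=True)

Either form works; the explicit version above makes it obvious that each row is independent, which matters when chunks differ from station to station.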
Here is the dataset_df. If this is passed to the Processing object p, then p will know how to create station metadata. Note that we can specify more than one remote; the ability to work with multiple remotes may be implemented in future, but for now only the first remote station will be used in the processing.
[3]:
dataset_df
[3]:
 | station | run | start | end | mth5_path | sample_rate | input_channels | output_channels | remote |
---|---|---|---|---|---|---|---|---|---|
0 | mt01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
1 | rr01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
2 | rr02 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
3 | mt01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
4 | rr01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
5 | rr02 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
6 | mt01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
7 | rr01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
8 | rr02 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
9 | mt01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
10 | rr01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
11 | rr02 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
12 | mt01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
13 | rr01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
14 | rr02 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
15 | mt01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
16 | rr01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
17 | rr02 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
Initialize an empty Processing object.
[4]:
p = Processing()
p
[4]:
{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [],
        "id": null,
        "stations.local.id": null,
        "stations.local.mth5_path": null,
        "stations.local.remote": false,
        "stations.local.runs": [],
        "stations.remote": []
    }
}
Create the Stations container¶
[5]:
p.stations.from_dataset_dataframe(dataset_df)
Now p has all the station and run information.
[6]:
p
[6]:
{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [],
        "id": null,
        "stations.local.id": "mt01",
        "stations.local.mth5_path": "/home/mth5_path.h5",
        "stations.local.remote": false,
        "stations.local.runs": [
            {
                "run": {
                    "id": "000",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            },
            {
                "run": {
                    "id": "001",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            },
            {
                "run": {
                    "id": "002",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            }
        ],
        "stations.remote": [
            {
                "station": {
                    "id": "rr01",
                    "mth5_path": "/home/mth5_path.h5",
                    "remote": true,
                    "runs": [
                        {
                            "run": {
                                "id": "000",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "001",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "002",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            },
            {
                "station": {
                    "id": "rr02",
                    "mth5_path": "/home/mth5_path.h5",
                    "remote": true,
                    "runs": [
                        {
                            "run": {
                                "id": "000",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "001",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "002",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    }
}
We can recover the dataframe from p by asking it for a dataset dataframe:
[7]:
df2 = p.stations.to_dataset_dataframe()
The new dataframe df2 contains the same information as the original, but is not sorted in exactly the same order, and it carries an additional channel_scale_factors column.
[8]:
df2
[8]:
 | station | run | start | end | mth5_path | sample_rate | input_channels | output_channels | remote | channel_scale_factors |
---|---|---|---|---|---|---|---|---|---|---|
0 | mt01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
1 | mt01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
2 | mt01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
3 | mt01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
4 | mt01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
5 | mt01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
6 | rr01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
7 | rr01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
8 | rr01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
9 | rr01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
10 | rr01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
11 | rr01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
12 | rr02 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
13 | rr02 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
14 | rr02 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
15 | rr02 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
16 | rr02 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
17 | rr02 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
To recover the original row order (in this specific example), sort by run, start, and station:
[9]:
df2.sort_values(by=["run", "start", "station"], inplace=True)
[10]:
df2.reset_index(drop=True, inplace=True)
[11]:
df2
[11]:
 | station | run | start | end | mth5_path | sample_rate | input_channels | output_channels | remote | channel_scale_factors |
---|---|---|---|---|---|---|---|---|---|---|
0 | mt01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
1 | rr01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
2 | rr02 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
3 | mt01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
4 | rr01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
5 | rr02 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
6 | mt01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
7 | rr01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
8 | rr02 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
9 | mt01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
10 | rr01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
11 | rr02 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
12 | mt01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
13 | rr01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
14 | rr02 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
15 | mt01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
16 | rr01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
17 | rr02 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
Selecting the original columns from df2 (to drop the extra channel_scale_factors column), we can see that the dataframes are equal:
[12]:
(df2[dataset_df.columns]==dataset_df).all().all()
[12]:
True
Exercise¶
Below is an example of getting a channel_summary dataframe from an mth5, which can be used to inform the choice of a dataset dataframe.
Run the cells below and then create a dataset definition from the channel summary that will process two chunks of data: 3 days at the start of the second run, and 4 days at the end of the last run.
[13]:
from mth5.mth5 import MTH5
from mt_metadata import MT_EXPERIMENT_MULTIPLE_RUNS
from mt_metadata.timeseries import Experiment
[14]:
MT_EXPERIMENT_MULTIPLE_RUNS
[14]:
PosixPath('/home/kkappler/software/irismt/mt_metadata/mt_metadata/data/mt_xml/multi_run_experiment.xml')
[15]:
experiment = Experiment()
experiment.from_xml(MT_EXPERIMENT_MULTIPLE_RUNS)
[16]:
m = MTH5()
m.open_mth5("test_dataset_definition.h5", "w")
2024-08-28T15:52:24.361188-0700 | WARNING | mth5.mth5 | open_mth5 | test_dataset_definition.h5 will be overwritten in 'w' mode
2024-08-28T15:52:24.913025-0700 | INFO | mth5.mth5 | _initialize_file | Initialized MTH5 0.2.0 file test_dataset_definition.h5 in mode w
[16]:
/:
====================
|- Group: Experiment
--------------------
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Surveys
-----------------
--> Dataset: channel_summary
..............................
--> Dataset: tf_summary
.........................
[17]:
m.from_experiment(experiment)
[18]:
m.channel_summary.clear_table()
m.channel_summary.summarize()
channel_df = m.channel_summary.to_dataframe()
channel_df
[18]:
 | survey | station | run | latitude | longitude | elevation | component | start | end | n_samples | sample_rate | measurement_type | azimuth | tilt | units | has_data | hdf5_reference | run_hdf5_reference | station_hdf5_reference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | ex | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | electric | 11.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
1 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | ey | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | electric | 101.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
2 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | hx | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | magnetic | 11.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
3 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | hy | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | magnetic | 101.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
4 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | hz | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | magnetic | 0.000000 | 90.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
5 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | ex | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | electric | 11.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
6 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | ey | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | electric | 101.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
7 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | hx | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | magnetic | 11.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
8 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | hy | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | magnetic | 101.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
9 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | hz | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | magnetic | 0.000000 | 90.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
10 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | ex | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | electric | 11.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
11 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | ey | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | electric | 101.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
12 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | hx | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | magnetic | 11.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
13 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | hy | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | magnetic | 101.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
14 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | hz | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | magnetic | 0.000000 | 90.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
[19]:
m.close_mth5()
2024-08-28T15:52:26.355757-0700 | INFO | mth5.mth5 | close_mth5 | Flushing and closing test_dataset_definition.h5
From the above channel summary we can see that there are three runs at station UTS14: the first less than an hour long, the second about two weeks, and the third about a week.
Insert code here for creating the custom dataset definition; one possible sketch follows.
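One possible solution sketch (an illustration under stated assumptions, not a canonical answer): derive run-level start and end times from channel_df, then build one dataset row per chunk. The channel lists and the single-station, no-remote setup mirror the earlier example, and the mth5_path points at the test_dataset_definition.h5 file created above.

[ ]:

from datetime import timedelta

import pandas as pd

# Run-level start/end times from the channel summary above
run_df = channel_df.groupby("run").agg(start=("start", "min"), end=("end", "max"))

# Two chunks: 3 days at the start of the second run ("b"),
# and 4 days at the end of the last run ("c")
chunks = [
    ("b", run_df.loc["b", "start"], run_df.loc["b", "start"] + timedelta(days=3)),
    ("c", run_df.loc["c", "end"] - timedelta(days=4), run_df.loc["c", "end"]),
]

data_list = []
for run_id, start, end in chunks:
    data_list.append(
        {
            "station": "UTS14",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": "test_dataset_definition.h5",
            "sample_rate": 1.0,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": False,
        }
    )

exercise_df = pd.DataFrame(data_list)
exercise_df

As in the earlier cells, exercise_df could then be handed to a Processing object via p.stations.from_dataset_dataframe(exercise_df).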