Dataset Definition DataFrame¶
The purpose is to show:

- how a custom dataframe can be used to define a dataset for processing
- that a processing config can be generated based on that dataframe; specifically, the stations level of the processing config, which contains information about which stations and runs are available
- that we can re-use a run with different start and end times (provided that these data are available); i.e., if there is a long run at some station, we can use an early chunk and a later chunk from the same run, and omit some intermediate time interval from processing, simply by creating one row of the dataset dataframe per run-chunk to process (see the sketch below)
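For example, here is a minimal sketch of that last point (the station and run labels below are hypothetical): one long run is split into two chunks, and the interval between them is never listed, so it is simply not processed.

[ ]:

import pandas as pd

# Hypothetical: run "003" at station "mt01" spans all of March, but a noisy
# stretch in the middle should be skipped.  One dataframe row per chunk:
chunks = [
    {"station": "mt01", "run": "003",
     "start": "2020-03-01T00:00:00", "end": "2020-03-10T00:00:00"},  # early chunk
    {"station": "mt01", "run": "003",
     "start": "2020-03-20T00:00:00", "end": "2020-03-31T00:00:00"},  # later chunk
]
chunk_df = pd.DataFrame(chunks)
# The interval 2020-03-10 .. 2020-03-20 never appears, so it is omitted.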
There are examples of using ConfigCreator to generate an entire processing config in operate_aurora.ipynb.
A user can pass to the processing config a dataframe with information about which runs to process.
Here is a simple example of how to do that.
[1]:
import pandas as pd
from mt_metadata.transfer_functions.processing.aurora import Processing
Below is an example of creating a dataframe from scratch with the required columns to pass to Processing.
Here we consider three stations, mt01, rr01, rr02, each having three runs, labelled 000, 001, 002. Note that we do not need to specify the actual start and end times of the runs that were acquired in the field; the start and end times specified here correspond to the time intervals to process. The actual start and end times of the field data acquisition are stored elsewhere, in an MTH5 archive.
[2]:
starts = ["2020-01-01T00:00:00", "2020-02-02T00:00:00"]
ends = ["2020-01-31T12:00:00", "2020-02-28T12:00:00"]
data_list = []
for i_run in range(3):
    run_id = f"{i_run}".zfill(3)  # note that the run_id could be different for the different stations
    for start, end in zip(starts, ends):
        entry = {
            "station": "mt01",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": r"/home/mth5_path.h5",
            "sample_rate": 10,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": False,
        }
        data_list.append(entry)
        rr_entry_01 = {
            "station": "rr01",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": r"/home/mth5_path.h5",
            "sample_rate": 10,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": True,
        }
        data_list.append(rr_entry_01)
        rr_entry_02 = {
            "station": "rr02",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": r"/home/mth5_path.h5",
            "sample_rate": 10,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": True,
        }
        data_list.append(rr_entry_02)
dataset_df = pd.DataFrame(data_list)
dataset_df.start = pd.to_datetime(dataset_df.start, utc=True)
dataset_df.end = pd.to_datetime(dataset_df.end, utc=True)
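For reference, the three dictionary literals above differ only in their station and remote values; an equivalent, more compact construction (a sketch, reusing starts and ends from the cell above and producing the same 18 rows in the same order) loops over (station, remote) pairs:

[ ]:

# Sketch of a more compact construction of the same dataset dataframe.
station_configs = [("mt01", False), ("rr01", True), ("rr02", True)]

data_list = []
for i_run in range(3):
    run_id = f"{i_run:03}"  # "000", "001", "002"
    for start, end in zip(starts, ends):
        for station, remote in station_configs:
            data_list.append(
                {
                    "station": station,
                    "run": run_id,
                    "start": start,
                    "end": end,
                    "mth5_path": r"/home/mth5_path.h5",
                    "sample_rate": 10,
                    "input_channels": ["hx", "hy"],
                    "output_channels": ["hz", "ex", "ey"],
                    "remote": remote,
                }
            )

dataset_df = pd.DataFrame(data_list)
dataset_df.start = pd.to_datetime(dataset_df.start, utc=True)
dataset_df.end = pd.to_datetime(dataset_df.end, utc=True)

Either form works; the explicit version above makes it obvious that each row is independent, which matters when chunks differ from station to station.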
Here is the dataset_df. If this is passed to the Processing object p, then p will know how to create station metadata. Note that we can specify more than one remote; the ability to work with multiple remotes may be implemented in future, but for now only the first remote station will be used in the processing.
[3]:
dataset_df
[3]:
 | station | run | start | end | mth5_path | sample_rate | input_channels | output_channels | remote |
---|---|---|---|---|---|---|---|---|---|
0 | mt01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
1 | rr01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
2 | rr02 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
3 | mt01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
4 | rr01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
5 | rr02 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
6 | mt01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
7 | rr01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
8 | rr02 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
9 | mt01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
10 | rr01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
11 | rr02 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
12 | mt01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
13 | rr01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
14 | rr02 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
15 | mt01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | False |
16 | rr01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
17 | rr02 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10 | [hx, hy] | [hz, ex, ey] | True |
Initialize an empty Processing object.
[4]:
p = Processing()
p
[4]:
{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [],
        "id": null,
        "stations.local.id": null,
        "stations.local.mth5_path": null,
        "stations.local.remote": false,
        "stations.local.runs": [],
        "stations.remote": []
    }
}
Create the Stations container¶
[5]:
p.stations.from_dataset_dataframe(dataset_df)
Now p has all the station and run information.
[6]:
p
[6]:
{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [],
        "id": null,
        "stations.local.id": "mt01",
        "stations.local.mth5_path": "/home/mth5_path.h5",
        "stations.local.remote": false,
        "stations.local.runs": [
            {
                "run": {
                    "id": "000",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            },
            {
                "run": {
                    "id": "001",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            },
            {
                "run": {
                    "id": "002",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            }
        ],
        "stations.remote": [
            {
                "station": {
                    "id": "rr01",
                    "mth5_path": "/home/mth5_path.h5",
                    "remote": true,
                    "runs": [
                        {
                            "run": {
                                "id": "000",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "001",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "002",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            },
            {
                "station": {
                    "id": "rr02",
                    "mth5_path": "/home/mth5_path.h5",
                    "remote": true,
                    "runs": [
                        {
                            "run": {
                                "id": "000",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "001",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "002",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    }
}
We can recover the dataframe from p by asking it for a dataset dataframe:
[7]:
df2 = p.stations.to_dataset_dataframe()
The new dataframe df2 contains the same information as the original, but is not sorted in exactly the same order, and it carries an additional channel_scale_factors column.
[8]:
df2
[8]:
 | station | run | start | end | mth5_path | sample_rate | input_channels | output_channels | remote | channel_scale_factors |
---|---|---|---|---|---|---|---|---|---|---|
0 | mt01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
1 | mt01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
2 | mt01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
3 | mt01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
4 | mt01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
5 | mt01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
6 | rr01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
7 | rr01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
8 | rr01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
9 | rr01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
10 | rr01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
11 | rr01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
12 | rr02 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
13 | rr02 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
14 | rr02 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
15 | rr02 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
16 | rr02 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
17 | rr02 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
To recover the original row order (in this specific example), sort by run, start, and station:
[9]:
df2.sort_values(by=["run", "start", "station"], inplace=True)
[10]:
df2.reset_index(drop=True, inplace=True)
[11]:
df2
[11]:
 | station | run | start | end | mth5_path | sample_rate | input_channels | output_channels | remote | channel_scale_factors |
---|---|---|---|---|---|---|---|---|---|---|
0 | mt01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
1 | rr01 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
2 | rr02 | 000 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
3 | mt01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
4 | rr01 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
5 | rr02 | 000 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
6 | mt01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
7 | rr01 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
8 | rr02 | 001 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
9 | mt01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
10 | rr01 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
11 | rr02 | 001 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
12 | mt01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
13 | rr01 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
14 | rr02 | 002 | 2020-01-01 00:00:00+00:00 | 2020-01-31 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
15 | mt01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | False | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
16 | rr01 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
17 | rr02 | 002 | 2020-02-02 00:00:00+00:00 | 2020-02-28 12:00:00+00:00 | /home/mth5_path.h5 | 10.0 | [hx, hy] | [hz, ex, ey] | True | {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '... |
Selecting the original columns from df2 (to drop the extra channel_scale_factors column), we can see that the dataframes are equal:
[12]:
(df2[dataset_df.columns]==dataset_df).all().all()
[12]:
True
Exercise¶
Below is an example of getting a channel_summary dataframe from an mth5, which can be used to inform the choice of a dataset dataframe.
Run the cells below and then create a dataset definition from the channel summary that will process two chunks of data: 3 days at the start of the second run, and 4 days at the end of the last run.
[13]:
from mth5.mth5 import MTH5
from mt_metadata import MT_EXPERIMENT_MULTIPLE_RUNS
from mt_metadata.timeseries import Experiment
[14]:
MT_EXPERIMENT_MULTIPLE_RUNS
[14]:
PosixPath('/home/kkappler/software/irismt/mt_metadata/mt_metadata/data/mt_xml/multi_run_experiment.xml')
[15]:
experiment = Experiment()
experiment.from_xml(MT_EXPERIMENT_MULTIPLE_RUNS)
[16]:
m = MTH5()
m.open_mth5("test_dataset_definition.h5", "w")
2024-08-28T15:52:24.361188-0700 | WARNING | mth5.mth5 | open_mth5 | test_dataset_definition.h5 will be overwritten in 'w' mode
2024-08-28T15:52:24.913025-0700 | INFO | mth5.mth5 | _initialize_file | Initialized MTH5 0.2.0 file test_dataset_definition.h5 in mode w
[16]:
/:
====================
|- Group: Experiment
--------------------
|- Group: Reports
-----------------
|- Group: Standards
-------------------
--> Dataset: summary
......................
|- Group: Surveys
-----------------
--> Dataset: channel_summary
..............................
--> Dataset: tf_summary
.........................
[17]:
m.from_experiment(experiment)
[18]:
m.channel_summary.clear_table()
m.channel_summary.summarize()
channel_df = m.channel_summary.to_dataframe()
channel_df
[18]:
 | survey | station | run | latitude | longitude | elevation | component | start | end | n_samples | sample_rate | measurement_type | azimuth | tilt | units | has_data | hdf5_reference | run_hdf5_reference | station_hdf5_reference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | ex | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | electric | 11.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
1 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | ey | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | electric | 101.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
2 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | hx | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | magnetic | 11.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
3 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | hy | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | magnetic | 101.193362 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
4 | CONUS South | UTS14 | a | 37.563198 | -113.301663 | 2490.775 | hz | 2020-07-05 23:19:41+00:00 | 2020-07-06 00:11:55+00:00 | 3134 | 1.0 | magnetic | 0.000000 | 90.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
5 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | ex | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | electric | 11.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
6 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | ey | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | electric | 101.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
7 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | hx | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | magnetic | 11.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
8 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | hy | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | magnetic | 101.193368 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
9 | CONUS South | UTS14 | b | 37.563198 | -113.301663 | 2490.775 | hz | 2020-07-06 00:32:41+00:00 | 2020-07-20 17:43:45+00:00 | 1271464 | 1.0 | magnetic | 0.000000 | 90.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
10 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | ex | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | electric | 11.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
11 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | ey | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | electric | 101.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
12 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | hx | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | magnetic | 11.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
13 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | hy | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | magnetic | 101.193367 | 0.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
14 | CONUS South | UTS14 | c | 37.563198 | -113.301663 | 2490.775 | hz | 2020-07-20 18:54:26+00:00 | 2020-07-28 16:38:25+00:00 | 683039 | 1.0 | magnetic | 0.000000 | 90.0 | counts | False | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
[19]:
m.close_mth5()
2024-08-28T15:52:26.355757-0700 | INFO | mth5.mth5 | close_mth5 | Flushing and closing test_dataset_definition.h5
From the above channel summary we can see that there are three runs at station UTS14: the first less than an hour long, the second about two weeks, and the third about a week.
Insert code here for creating the custom dataset definition; one possible sketch follows.
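One possible solution sketch (an illustration under stated assumptions, not a canonical answer): derive run-level start and end times from channel_df, then build one dataset row per chunk. The channel lists and the single-station, no-remote setup mirror the earlier example, and the mth5_path points at the test_dataset_definition.h5 file created above.

[ ]:

from datetime import timedelta

import pandas as pd

# Run-level start/end times from the channel summary above
run_df = channel_df.groupby("run").agg(start=("start", "min"), end=("end", "max"))

# Two chunks: 3 days at the start of the second run ("b"),
# and 4 days at the end of the last run ("c")
chunks = [
    ("b", run_df.loc["b", "start"], run_df.loc["b", "start"] + timedelta(days=3)),
    ("c", run_df.loc["c", "end"] - timedelta(days=4), run_df.loc["c", "end"]),
]

data_list = []
for run_id, start, end in chunks:
    data_list.append(
        {
            "station": "UTS14",
            "run": run_id,
            "start": start,
            "end": end,
            "mth5_path": "test_dataset_definition.h5",
            "sample_rate": 1.0,
            "input_channels": ["hx", "hy"],
            "output_channels": ["hz", "ex", "ey"],
            "remote": False,
        }
    )

exercise_df = pd.DataFrame(data_list)
exercise_df

As in the earlier cells, exercise_df could then be handed to a Processing object via p.stations.from_dataset_dataframe(exercise_df).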