Dataset Definition DataFrame

The purpose of this notebook is to show:
- how a custom dataframe can be used to define a dataset for processing;
- that a processing config can be generated from that dataframe, specifically the stations level of the processing config, which contains information about which stations and runs are available;
- that a run can be re-used with different start and end times (provided that those data are available). For example, given a long run at some station, we can process an early chunk and a later chunk of the same run and omit an intermediate time interval, simply by creating one row of the dataset dataframe per run-chunk to process.
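As a minimal sketch of the run-chunk idea, assuming a hypothetical long run "000" at station "mt01" that covers all of January 2020 (the dates are illustrative, not taken from any real archive), two rows with the same station and run but different start and end times are enough to skip the middle of the run. Only a few of the dataframe's columns are shown; the full set appears in the example further down.

```python
import pandas as pd

# Illustrative values only: assume station "mt01" has one long run "000"
# covering all of January 2020. We keep an early and a late chunk of it.
chunks = [
    ("2020-01-01T00:00:00", "2020-01-10T00:00:00"),  # early chunk
    ("2020-01-20T00:00:00", "2020-01-31T00:00:00"),  # late chunk
]

# One row per run-chunk: same station and run id, different start/end.
df = pd.DataFrame(
    [{"station": "mt01", "run": "000", "start": s, "end": e} for s, e in chunks]
)
df["start"] = pd.to_datetime(df["start"], utc=True)
df["end"] = pd.to_datetime(df["end"], utc=True)

# The interval between the two chunks is never listed, so it is not processed.
omitted = df.loc[1, "start"] - df.loc[0, "end"]
print(df)
print("omitted interval length:", omitted)
```

Nothing special is needed to omit the gap: any time not covered by a row is simply left out.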

There are examples of using ConfigCreator to generate an entire processing config in operate_aurora.ipynb.

A user can pass to the processing config a dataframe with information about which runs to process.

Here is a simple example of how to do that.

[1]:
import pandas as pd
from mt_metadata.transfer_functions.processing.aurora import Processing

An example of creating a dataframe from scratch with the required columns to pass to Processing.

Here we consider three stations (mt01, rr01, rr02), each having three runs labelled 000, 001 and 002, and each run is processed in two chunks (one in January, one in February). Note that we do not need to specify the actual start and end times of the runs as acquired in the field; the start and end times specified here are the time intervals to process. The actual start and end times of the field data acquisition are stored elsewhere, in an MTH5 archive.
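Because the processing intervals are only useful if the data exist, it can help to check them against the acquisition window before building the config. The sketch below is a hypothetical check: the acquisition window is hard-coded and made up (in practice it would come from the MTH5 archive), and the third requested chunk is invented to show a chunk that runs past the end of the data.

```python
import pandas as pd

# Assumed acquisition window for one run. In a real workflow this comes from
# the MTH5 archive; these values are made up for illustration.
acquired_start = pd.Timestamp("2020-01-01T00:00:00", tz="UTC")
acquired_end = pd.Timestamp("2020-03-01T00:00:00", tz="UTC")

# Requested processing intervals: the two chunks used in this notebook,
# plus a hypothetical third chunk that runs past the end of the data.
requested = pd.DataFrame(
    {
        "start": pd.to_datetime(
            ["2020-01-01T00:00:00", "2020-02-02T00:00:00", "2020-02-20T00:00:00"],
            utc=True,
        ),
        "end": pd.to_datetime(
            ["2020-01-31T12:00:00", "2020-02-28T12:00:00", "2020-03-15T00:00:00"],
            utc=True,
        ),
    }
)

# A chunk can only be processed if it lies entirely inside the acquired data.
requested["available"] = (requested["start"] >= acquired_start) & (
    requested["end"] <= acquired_end
)
print(requested)
```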

[2]:
starts = ["2020-01-01T00:00:00", "2020-02-02T00:00:00"]
ends = ["2020-01-31T12:00:00", "2020-02-28T12:00:00"]

# station id -> whether it is a remote reference station
stations = {"mt01": False, "rr01": True, "rr02": True}

data_list = []

for i_run in range(3):
    run_id = f"{i_run}".zfill(3)  # note that the run_id could be different for the different stations
    for start, end in zip(starts, ends):
        # one row per (station, run, time chunk) to process
        for station_id, is_remote in stations.items():
            entry = {
                "station": station_id,
                "run": run_id,
                "start": start,
                "end": end,
                "mth5_path": "/home/mth5_path.h5",
                "sample_rate": 10,
                "input_channels": ["hx", "hy"],
                "output_channels": ["hz", "ex", "ey"],
                "remote": is_remote,
            }
            data_list.append(entry)

dataset_df = pd.DataFrame(data_list)
dataset_df["start"] = pd.to_datetime(dataset_df["start"], utc=True)
dataset_df["end"] = pd.to_datetime(dataset_df["end"], utc=True)

Here is the dataset dataframe. When it is passed to a Processing object p, p can build the station metadata from it. Note that more than one remote station can be specified. Working with multiple remotes may be implemented in the future; for now, only the first remote station will be used in the processing.
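To make "only the first remote station will be used" concrete in terms of rows, here is a hypothetical helper, `keep_first_remote`, that drops every remote except the first one listed. It is an illustration of the behaviour described above, not code from aurora or mt_metadata, and it assumes "first" means first in row order.

```python
import pandas as pd

# Small stand-in for dataset_df: one local station and two remotes.
df = pd.DataFrame(
    {
        "station": ["mt01", "rr01", "rr02"],
        "run": ["000", "000", "000"],
        "remote": [False, True, True],
    }
)


def keep_first_remote(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep local rows and rows of the first remote station only."""
    remote_ids = df.loc[df["remote"], "station"].unique()  # order of appearance
    if len(remote_ids) <= 1:
        return df
    keep = ~df["remote"] | (df["station"] == remote_ids[0])
    return df[keep].reset_index(drop=True)


print(keep_first_remote(df))
```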

[3]:
dataset_df
[3]:
    station  run  start                      end                        mth5_path           sample_rate  input_channels  output_channels  remote
0   mt01     000  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     False
1   rr01     000  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
2   rr02     000  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
3   mt01     000  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     False
4   rr01     000  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
5   rr02     000  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
6   mt01     001  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     False
7   rr01     001  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
8   rr02     001  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
9   mt01     001  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     False
10  rr01     001  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
11  rr02     001  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
12  mt01     002  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     False
13  rr01     002  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
14  rr02     002  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
15  mt01     002  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     False
16  rr01     002  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
17  rr02     002  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10           [hx, hy]        [hz, ex, ey]     True
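Before handing a hand-built dataframe to Processing, a quick column check can catch typos early. The sketch below uses a hypothetical `missing_columns` helper and treats the columns used in this notebook as the expected set; whether mt_metadata requires exactly these columns is an assumption.

```python
import pandas as pd

# Columns used in this notebook's dataset dataframe. Whether mt_metadata
# requires exactly this set is an assumption made for this sketch.
EXPECTED_COLUMNS = [
    "station", "run", "start", "end", "mth5_path",
    "sample_rate", "input_channels", "output_channels", "remote",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Hypothetical check: return the expected columns that df lacks, in order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# A deliberately incomplete frame to show the check in action.
partial = pd.DataFrame({"station": ["mt01"], "run": ["000"], "remote": [False]})
print(missing_columns(partial))
```

Applied to `dataset_df` from the cell above, the returned list should be empty.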

Initialize an empty Processing object.

[4]:
p = Processing()
p
[4]:
{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [],
        "id": null,
        "stations.local.id": null,
        "stations.local.mth5_path": null,
        "stations.local.remote": false,
        "stations.local.runs": [],
        "stations.remote": []
    }
}

Populate the stations container from the dataset dataframe.

[5]:
p.stations.from_dataset_dataframe(dataset_df)

Now p has all the station and run information.

[6]:
p
[6]:
{
    "processing": {
        "channel_nomenclature.ex": "ex",
        "channel_nomenclature.ey": "ey",
        "channel_nomenclature.hx": "hx",
        "channel_nomenclature.hy": "hy",
        "channel_nomenclature.hz": "hz",
        "decimations": [],
        "id": null,
        "stations.local.id": "mt01",
        "stations.local.mth5_path": "/home/mth5_path.h5",
        "stations.local.remote": false,
        "stations.local.runs": [
            {
                "run": {
                    "id": "000",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            },
            {
                "run": {
                    "id": "001",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            },
            {
                "run": {
                    "id": "002",
                    "input_channels": [
                        {
                            "channel": {
                                "id": "hx",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "hy",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "output_channels": [
                        {
                            "channel": {
                                "id": "hz",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ex",
                                "scale_factor": 1.0
                            }
                        },
                        {
                            "channel": {
                                "id": "ey",
                                "scale_factor": 1.0
                            }
                        }
                    ],
                    "sample_rate": 10.0,
                    "time_periods": [
                        {
                            "time_period": {
                                "end": "2020-01-31T12:00:00+00:00",
                                "start": "2020-01-01T00:00:00+00:00"
                            }
                        },
                        {
                            "time_period": {
                                "end": "2020-02-28T12:00:00+00:00",
                                "start": "2020-02-02T00:00:00+00:00"
                            }
                        }
                    ]
                }
            }
        ],
        "stations.remote": [
            {
                "station": {
                    "id": "rr01",
                    "mth5_path": "/home/mth5_path.h5",
                    "remote": true,
                    "runs": [
                        {
                            "run": {
                                "id": "000",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "001",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "002",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            },
            {
                "station": {
                    "id": "rr02",
                    "mth5_path": "/home/mth5_path.h5",
                    "remote": true,
                    "runs": [
                        {
                            "run": {
                                "id": "000",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "001",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "run": {
                                "id": "002",
                                "input_channels": [
                                    {
                                        "channel": {
                                            "id": "hx",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "hy",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "output_channels": [
                                    {
                                        "channel": {
                                            "id": "hz",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ex",
                                            "scale_factor": 1.0
                                        }
                                    },
                                    {
                                        "channel": {
                                            "id": "ey",
                                            "scale_factor": 1.0
                                        }
                                    }
                                ],
                                "sample_rate": 10.0,
                                "time_periods": [
                                    {
                                        "time_period": {
                                            "end": "2020-01-31T12:00:00+00:00",
                                            "start": "2020-01-01T00:00:00+00:00"
                                        }
                                    },
                                    {
                                        "time_period": {
                                            "end": "2020-02-28T12:00:00+00:00",
                                            "start": "2020-02-02T00:00:00+00:00"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    }
}
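The printed config nests the flat dataframe rows as station → run → list of time periods. The pandas sketch below only illustrates that grouping on a small stand-in dataframe; it is not how `from_dataset_dataframe` is implemented in mt_metadata.

```python
import pandas as pd

# Flat rows shaped like dataset_df: two stations, two runs, two chunks per run.
rows = []
for station, remote in [("mt01", False), ("rr01", True)]:
    for run in ["000", "001"]:
        for start, end in [("2020-01-01", "2020-01-31"), ("2020-02-02", "2020-02-28")]:
            rows.append(
                {"station": station, "run": run, "remote": remote, "start": start, "end": end}
            )
df = pd.DataFrame(rows)

# Group rows into station -> run -> list of (start, end) time periods,
# mirroring the nesting visible in the printed Processing object.
nested = {}
for (station, run), group in df.groupby(["station", "run"], sort=False):
    periods = list(zip(group["start"], group["end"]))
    nested.setdefault(station, {})[run] = periods

print(nested["mt01"]["000"])
```

Each (station, run) pair becomes one run entry, and every dataframe row for that pair contributes one time period, which matches the two `time_period` entries per run seen above.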

We can recover a dataframe from p by calling to_dataset_dataframe() on its stations container.

[7]:
df2 = p.stations.to_dataset_dataframe()

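The new dataframe df2 contains the same information as the original, but its rows are ordered differently (grouped by station, then run). It also has an extra channel_scale_factors column, and sample_rate is now a float (10.0).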

[8]:
df2
[8]:
    station  run  start                      end                        mth5_path           sample_rate  input_channels  output_channels  remote  channel_scale_factors
0   mt01     000  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     False   {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
1   mt01     000  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     False   {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
2   mt01     001  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     False   {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
3   mt01     001  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     False   {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
4   mt01     002  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     False   {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
5   mt01     002  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     False   {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
6   rr01     000  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
7   rr01     000  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
8   rr01     001  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
9   rr01     001  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
10  rr01     002  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
11  rr01     002  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
12  rr02     000  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
13  rr02     000  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
14  rr02     001  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
15  rr02     001  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
16  rr02     002  2020-01-01 00:00:00+00:00  2020-01-31 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
17  rr02     002  2020-02-02 00:00:00+00:00  2020-02-28 12:00:00+00:00  /home/mth5_path.h5  10.0         [hx, hy]        [hz, ex, ey]     True    {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...

To recover the original dataframe (in this specific example), sort by run, start, and station, and then reset the index:

[9]:
df2.sort_values(by=["run", "start", "station"], inplace=True)
[10]:
df2.reset_index(drop=True, inplace=True)
[11]:
df2
[11]:
station run start end mth5_path sample_rate input_channels output_channels remote channel_scale_factors
0 mt01 000 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] False {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
1 rr01 000 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
2 rr02 000 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
3 mt01 000 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] False {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
4 rr01 000 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
5 rr02 000 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
6 mt01 001 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] False {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
7 rr01 001 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
8 rr02 001 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
9 mt01 001 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] False {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
10 rr01 001 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
11 rr02 001 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
12 mt01 002 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] False {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
13 rr01 002 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
14 rr02 002 2020-01-01 00:00:00+00:00 2020-01-31 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
15 mt01 002 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] False {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
16 rr01 002 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...
17 rr02 002 2020-02-02 00:00:00+00:00 2020-02-28 12:00:00+00:00 /home/mth5_path.h5 10.0 [hx, hy] [hz, ex, ey] True {'hx': 1.0, 'hy': 1.0, 'hz': 1.0, 'ex': 1.0, '...

We can see that the dataframes are equal:

[12]:
(df2[dataset_df.columns]==dataset_df).all().all()
[12]:
True
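The element-wise `==` comparison above gives one boolean per cell. pandas also provides `pd.testing.assert_frame_equal`, which is stricter: it also checks the index and dtypes, and it raises an `AssertionError` that describes the first difference. The sketch below shows the same sort-and-reset round trip on a small toy frame. The values are made up for illustration and are not the dataset above.

```python
import pandas as pd

# Toy dataset dataframe (illustrative values only), ordered by run, then start, then station
original = pd.DataFrame({
    "station": ["mt01", "rr01", "mt01", "rr01"],
    "run": ["000", "000", "000", "000"],
    "start": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-02-02", "2020-02-02"], utc=True
    ),
})

# Reorder the rows station by station, e.g. as happens when rows are built per station
shuffled = original.sort_values(by=["station", "start"]).reset_index(drop=True)

# Restore the original ordering and a fresh 0..n-1 index
restored = shuffled.sort_values(by=["run", "start", "station"]).reset_index(drop=True)

# Passes silently if index, columns, dtypes and values all match; raises otherwise
pd.testing.assert_frame_equal(restored, original)
```

The `reset_index(drop=True)` step matters here. Without it, the restored frame keeps the shuffled row labels, and `assert_frame_equal` would report an index mismatch even though the rows are in the right order.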

Exercise:

Below is an example of getting a channel_summary dataframe from an MTH5 file. The channel summary can be used to decide which rows to put in a dataset dataframe.

Run the cells below. Then create a dataset definition from the channel summary that processes two chunks of data: 3 days at the start of the second run, and 4 days at the end of the last run.

[13]:
from mth5.mth5 import MTH5
from mt_metadata import MT_EXPERIMENT_MULTIPLE_RUNS
from mt_metadata.timeseries import Experiment
[14]:
MT_EXPERIMENT_MULTIPLE_RUNS
[14]:
PosixPath('/home/kkappler/software/irismt/mt_metadata/mt_metadata/data/mt_xml/multi_run_experiment.xml')
[15]:
experiment = Experiment()
experiment.from_xml(MT_EXPERIMENT_MULTIPLE_RUNS)
[16]:
m = MTH5()
m.open_mth5("test_dataset_definition.h5", "w")
2024-08-28T15:52:24.361188-0700 | WARNING | mth5.mth5 | open_mth5 | test_dataset_definition.h5 will be overwritten in 'w' mode
2024-08-28T15:52:24.913025-0700 | INFO | mth5.mth5 | _initialize_file | Initialized MTH5 0.2.0 file test_dataset_definition.h5 in mode w
[16]:
/:
====================
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
        --> Dataset: channel_summary
        ..............................
        --> Dataset: tf_summary
        .........................
[17]:
m.from_experiment(experiment)
[18]:
m.channel_summary.clear_table()
m.channel_summary.summarize()
channel_df = m.channel_summary.to_dataframe()
channel_df
[18]:
survey station run latitude longitude elevation component start end n_samples sample_rate measurement_type azimuth tilt units has_data hdf5_reference run_hdf5_reference station_hdf5_reference
0 CONUS South UTS14 a 37.563198 -113.301663 2490.775 ex 2020-07-05 23:19:41+00:00 2020-07-06 00:11:55+00:00 3134 1.0 electric 11.193362 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
1 CONUS South UTS14 a 37.563198 -113.301663 2490.775 ey 2020-07-05 23:19:41+00:00 2020-07-06 00:11:55+00:00 3134 1.0 electric 101.193362 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
2 CONUS South UTS14 a 37.563198 -113.301663 2490.775 hx 2020-07-05 23:19:41+00:00 2020-07-06 00:11:55+00:00 3134 1.0 magnetic 11.193362 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
3 CONUS South UTS14 a 37.563198 -113.301663 2490.775 hy 2020-07-05 23:19:41+00:00 2020-07-06 00:11:55+00:00 3134 1.0 magnetic 101.193362 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
4 CONUS South UTS14 a 37.563198 -113.301663 2490.775 hz 2020-07-05 23:19:41+00:00 2020-07-06 00:11:55+00:00 3134 1.0 magnetic 0.000000 90.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
5 CONUS South UTS14 b 37.563198 -113.301663 2490.775 ex 2020-07-06 00:32:41+00:00 2020-07-20 17:43:45+00:00 1271464 1.0 electric 11.193368 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
6 CONUS South UTS14 b 37.563198 -113.301663 2490.775 ey 2020-07-06 00:32:41+00:00 2020-07-20 17:43:45+00:00 1271464 1.0 electric 101.193368 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
7 CONUS South UTS14 b 37.563198 -113.301663 2490.775 hx 2020-07-06 00:32:41+00:00 2020-07-20 17:43:45+00:00 1271464 1.0 magnetic 11.193368 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
8 CONUS South UTS14 b 37.563198 -113.301663 2490.775 hy 2020-07-06 00:32:41+00:00 2020-07-20 17:43:45+00:00 1271464 1.0 magnetic 101.193368 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
9 CONUS South UTS14 b 37.563198 -113.301663 2490.775 hz 2020-07-06 00:32:41+00:00 2020-07-20 17:43:45+00:00 1271464 1.0 magnetic 0.000000 90.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
10 CONUS South UTS14 c 37.563198 -113.301663 2490.775 ex 2020-07-20 18:54:26+00:00 2020-07-28 16:38:25+00:00 683039 1.0 electric 11.193367 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
11 CONUS South UTS14 c 37.563198 -113.301663 2490.775 ey 2020-07-20 18:54:26+00:00 2020-07-28 16:38:25+00:00 683039 1.0 electric 101.193367 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
12 CONUS South UTS14 c 37.563198 -113.301663 2490.775 hx 2020-07-20 18:54:26+00:00 2020-07-28 16:38:25+00:00 683039 1.0 magnetic 11.193367 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
13 CONUS South UTS14 c 37.563198 -113.301663 2490.775 hy 2020-07-20 18:54:26+00:00 2020-07-28 16:38:25+00:00 683039 1.0 magnetic 101.193367 0.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
14 CONUS South UTS14 c 37.563198 -113.301663 2490.775 hz 2020-07-20 18:54:26+00:00 2020-07-28 16:38:25+00:00 683039 1.0 magnetic 0.000000 90.0 counts False <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
[19]:
m.close_mth5()
2024-08-28T15:52:26.355757-0700 | INFO | mth5.mth5 | close_mth5 | Flushing and closing test_dataset_definition.h5

The channel summary above shows that station UTS14 has three runs:

- run a is under an hour long (about 52 minutes)
- run b lasts about two weeks (roughly 14.7 days)
- run c lasts just under 8 days
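The run durations can be computed directly instead of read off by eye. In the sketch below, the start and end times of the three runs are re-typed from the channel summary above into a small stand-in frame, so the cell runs without the MTH5 file. With the real file, the same `groupby` applies to `channel_df`, where every channel of a run repeats the run's start and end.

```python
import pandas as pd

# Stand-in for channel_df: start/end times copied from the channel summary above,
# one row per run (the real channel_df has one row per channel)
channel_df = pd.DataFrame({
    "station": ["UTS14"] * 3,
    "run": ["a", "b", "c"],
    "start": pd.to_datetime(
        ["2020-07-05T23:19:41", "2020-07-06T00:32:41", "2020-07-20T18:54:26"], utc=True
    ),
    "end": pd.to_datetime(
        ["2020-07-06T00:11:55", "2020-07-20T17:43:45", "2020-07-28T16:38:25"], utc=True
    ),
})

# Collapse to one row per (station, run): earliest start and latest end over its channels
run_summary = channel_df.groupby(["station", "run"], as_index=False).agg(
    start=("start", "min"), end=("end", "max")
)
run_summary["duration"] = run_summary["end"] - run_summary["start"]
print(run_summary[["run", "duration"]])
```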

Insert code here for creating the custom dataset.
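One possible solution is sketched below. It makes some assumptions:

- The times are re-typed from the channel summary above into a stand-in for `channel_df`. With the real file, use the `channel_df` returned by `to_dataframe()` instead.
- The column set (`station`, `run`, `start`, `end`, `mth5_path`, `sample_rate`, `input_channels`, `output_channels`, `remote`) mirrors the dataset dataframe displayed earlier in this notebook. The channel lists are copied from that example.
- `remote` is set to `False` because this file contains a single station.

Whether `Processing` requires every one of these columns is not confirmed here.

```python
import pandas as pd

# Stand-in for channel_df (start/end copied from the channel summary above);
# with the real MTH5 file, use the channel_df from m.channel_summary.to_dataframe()
channel_df = pd.DataFrame({
    "station": ["UTS14"] * 3,
    "run": ["a", "b", "c"],
    "start": pd.to_datetime(
        ["2020-07-05T23:19:41", "2020-07-06T00:32:41", "2020-07-20T18:54:26"], utc=True
    ),
    "end": pd.to_datetime(
        ["2020-07-06T00:11:55", "2020-07-20T17:43:45", "2020-07-28T16:38:25"], utc=True
    ),
    "sample_rate": [1.0] * 3,
})

# One row per run, ordered in time, so "second" and "last" are well defined
runs = (
    channel_df.groupby(["station", "run"], as_index=False)
    .agg(start=("start", "min"), end=("end", "max"), sample_rate=("sample_rate", "first"))
    .sort_values("start")
    .reset_index(drop=True)
)
second_run = runs.iloc[1]
last_run = runs.iloc[-1]

# Each chunk: (run row, chunk start, chunk end)
chunks = [
    # First 3 days of the second run
    (second_run, second_run["start"], second_run["start"] + pd.Timedelta(days=3)),
    # Last 4 days of the last run
    (last_run, last_run["end"] - pd.Timedelta(days=4), last_run["end"]),
]

# One dataset-dataframe row per run-chunk to process
custom_df = pd.DataFrame([
    {
        "station": run["station"],
        "run": run["run"],
        "start": start,
        "end": end,
        "mth5_path": "test_dataset_definition.h5",  # file written earlier in this notebook
        "sample_rate": run["sample_rate"],
        "input_channels": ["hx", "hy"],
        "output_channels": ["hz", "ex", "ey"],
        "remote": False,  # single-station file, so no remote reference rows
    }
    for run, start, end in chunks
])
print(custom_df[["station", "run", "start", "end"]])
```

The resulting `custom_df` can then be passed to `Processing` in the same way as `dataset_df` earlier in this notebook. Each requested chunk stays inside the run it is drawn from: 2020-07-06 to 2020-07-09 for run b, and 2020-07-24 to 2020-07-28 for run c.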