SVRIMG - SeVere Reflectivity IMaGe Dataset

Training and Testing Machine Learning Models using SVRIMG


Machine Learning

A major hurdle for automated approaches is the training and validation of machine learning models. These models use massive amounts of data to find patterns that can be used to make predictions or classifications. Image classification models, such as convolutional neural networks, need even more data than "traditional" machine learning approaches. Datasets such as MNIST and CIFAR, which are commonly used to train these models, contain tens of thousands of images. At the moment, there is no comparable dataset for storm morphology. Initially, we are going to provide a limited dataset of labels on which to test your own machine learning models. We hope that by crowd-sourcing the classification process, we can greatly expand the size of the dataset.

Initial Data

The initial sample data are available below. We use a train/validation/test split of ~70% / ~10% / ~20%, respectively, and the data are organized as follows:

You can browse image thumbnails for each class at the following links:
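As an illustration of the ~70% / ~10% / ~20% train/validation/test split mentioned above, the following is a minimal sketch using NumPy. The function name `split_indices`, the random shuffling, and the seed are assumptions made for this example. This page does not state how the SVRIMG samples were actually assigned to each subset, so use the provided files if you need to reproduce the official split.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.7, 0.1, 0.2), seed=0):
    """Return shuffled (train, validation, test) index arrays.

    Hypothetical helper: a random split is an assumption, not
    necessarily how the SVRIMG subsets were built.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)  # shuffled indices 0..n_samples-1
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # the remainder (~20%)
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed keeps the split repeatable, which matters when comparing benchmark results across models.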

Initial Benchmarking Results

These will be posted shortly, along with examples.

Data Description

Radar images are centered on SPC severe weather reports and extracted from the closest hourly data in GridRad, which can be downloaded from the Research Data Archive. The original ~2 x 2 km 3D data are converted to 2D by calculating the column maximum reflectivity. These values are then converted to 8-bit integers and interpolated to a 3.75 km Lambert conformal conic grid using nearest neighbor interpolation. The resulting 136 x 136 pixel images each cover a region of approximately 512 x 512 km.
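The column maximum and 8-bit conversion steps described above can be sketched roughly as follows. This is not the code used to build SVRIMG: the function names, the (levels, y, x) array layout, the treatment of missing values as NaN, and the choice to clip and round dBZ values into 0 to 255 are all assumptions for illustration. The regridding step is omitted.

```python
import numpy as np

def column_max_to_uint8(volume_dbz):
    """Collapse a (levels, y, x) reflectivity volume to a 2D uint8 image.

    Assumptions: missing data are NaN, and dBZ values are simply
    clipped to [0, 255] and rounded (the real scaling may differ).
    """
    # Replace NaN with -inf so missing levels never win the maximum.
    filled = np.where(np.isnan(volume_dbz), -np.inf, volume_dbz)
    colmax = filled.max(axis=0)  # maximum over the vertical axis
    # Columns with no valid data are -inf here and clip to 0.
    return np.rint(np.clip(colmax, 0, 255)).astype(np.uint8)

def grid_extent_km(n_pixels=136, spacing_km=3.75):
    """Side length of the image footprint in km."""
    return n_pixels * spacing_km

volume = np.array([
    [[10.0, np.nan], [np.nan, np.nan]],
    [[35.4, 20.0], [np.nan, 5.0]],
])
image = column_max_to_uint8(volume)
extent = grid_extent_km()  # 136 * 3.75 = 510 km per side
```

With 136 pixels at 3.75 km spacing, the footprint works out to 510 km per side, consistent with the approximate figure quoted above.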

Each image is assigned to one of six classes. These classes and their descriptions are as follows:

Class Name    Class Description
Cellular      Circular areas of red and orange near the center of the image.
QLCS          Continuous red and orange lines that intersect the center of the image.
Tropical      Green and yellow lines that appear to circle around the bottom or left edge of the image.
Other         Morphologies that do not obviously fit into one of the previous three classes.
Noise         Low intensity rings, spikes, or pixelation that does not look natural.
Missing       The entire image or the majority of the image is blue (i.e., missing intensity).
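When training a classifier on these six classes, the class names usually need to be encoded as integers or one-hot vectors. A minimal sketch is shown below. The index order simply follows the order in which the classes are listed above and is an assumption, as are the names `CLASS_NAMES`, `CLASS_TO_INDEX`, and `one_hot`. Check the data files for the integer codes actually used.

```python
import numpy as np

# Assumed ordering: the order in which the classes are listed above.
CLASS_NAMES = ["Cellular", "QLCS", "Tropical", "Other", "Noise", "Missing"]
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}

def one_hot(labels):
    """Convert a sequence of class names to a (n, 6) one-hot array."""
    idx = np.array([CLASS_TO_INDEX[name] for name in labels])
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(len(CLASS_NAMES), dtype=np.float32)[idx]

encoded = one_hot(["QLCS", "Missing"])
```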

Disclaimers and Caveats

The data are provided at no cost, as-is, with no warranty of any kind. No modification of either the SPC reports or the GridRad data (beyond interpolation) is done before these data are hosted on the website. The process is completely repeatable from start to finish, assuming you have patience or access to a supercomputer cluster. Please examine the GridRad and the SPC severe weather reports pages to read about the caveats and issues with those data before using these data. See the data page for more information.

Data Citations

We are generating these data solely because we think they would be of interest to the meteorology and climatology community. That being said, we would like to get some credit if you find them useful!

If using these data in a paper or project, please cite the methods paper:

Haberlie, A. M., W. S. Ashley, and M. Karpinski, 2020: Mean storms: Composites of radar reflectivity images during two decades of severe thunderstorm events. International Journal of Climatology, In Press.

Please cite the GridRad dataset as well:

Bowman, K. P., and C. R. Homeyer. 2017. GridRad - Three-Dimensional Gridded NEXRAD WSR-88D Radar Data. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory.