End-to-End Climate Science Workflow Management Service: Automation of Climate-Science-Centric Workflows
We are designing and running thousands of simulations under the CASCADE SFA project. These simulations range from short-term initialized simulations to multi-decadal ensembles. The runs produce large, complex, and diverse datasets that must be validated and quality controlled. We run TECA (Toolkit for Extreme Climate Analysis) on these data to detect and quantify events, and then use detailed analyses to answer specific science questions. Critically, the dataset that remains after this initial event-detection pass is orders of magnitude smaller than the raw output, which eases subsequent analyses and uncertainty quantification efforts.
Since it is impractical to manage these complex processes by hand, we use a lightweight workflow system to automate most of the tasks in the pipeline so that climate scientists on the team can focus on interpretation. Our workflow system is implemented in Python and is capable of ingesting configurations, launching jobs on HPC systems, validating job output, and archiving data. The workflow system executes a configurable set of user-specified feature detection and statistical analysis tasks. This greatly facilitates intercomparisons of model runs and analyses of ensembles.
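The stages named above (ingest a configuration, launch jobs, validate output, archive data) can be illustrated with a minimal Python sketch. This is not the project's actual code: the names Task, load_config, validate, run_pipeline, and demo, the JSON configuration schema, and the file names are all hypothetical, and the launch callable stands in for whatever mechanism submits a job to an HPC batch system.

```python
import json
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Task:
    """One user-specified analysis task (hypothetical schema)."""
    name: str
    output: str  # file name the task is expected to produce


def load_config(text: str) -> List[Task]:
    """Ingest a JSON configuration that lists the tasks to run."""
    return [Task(**entry) for entry in json.loads(text)["tasks"]]


def validate(path: Path) -> bool:
    """Minimal output check: the file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


def run_pipeline(
    tasks: List[Task],
    launch: Callable[[Task, Path], None],
    work_dir: Path,
    archive_dir: Path,
) -> Dict[str, str]:
    """Launch each task, validate its output, and archive what passes."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    status: Dict[str, str] = {}
    for task in tasks:
        out = work_dir / task.output
        launch(task, out)  # assumption: returns once the job has finished
        if validate(out):
            shutil.copy2(out, archive_dir / out.name)
            status[task.name] = "archived"
        else:
            status[task.name] = "failed validation"
    return status


def demo() -> Dict[str, str]:
    """Run the sketch with a fake launcher instead of a real scheduler."""
    config = json.dumps({
        "tasks": [
            {"name": "detect_events", "output": "events.csv"},
            {"name": "broken_task", "output": "missing.csv"},
        ]
    })

    def fake_launch(task: Task, out: Path) -> None:
        # Simulate one job that writes output and one that silently fails.
        if task.name != "broken_task":
            out.write_text("event_id,time\n")

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        return run_pipeline(load_config(config), fake_launch, root, root / "archive")
```

Calling demo() runs both hypothetical tasks in a temporary directory. Keeping the launcher as a plain callable is a design choice for the sketch: the same validation and archiving logic can then be exercised locally without access to a batch scheduler.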