JWCRP Project to Improve the Computational Efficiency of UKCA in the UKMO UM
A 33-month project between UKMO, the University of Leeds and NCAS has been funded by the Joint Weather and Climate Research Programme (JWCRP). The aim is to analyse components of UKCA and implement revisions that improve its computational efficiency. The work divides naturally into two areas: aerosols and chemistry. As of March 2015, changes to the interface to the aerosol sub-system are being developed so that it can process columns of atmosphere, which allows better use of cache. These changes also make it possible to extend OpenMP into UKCA. Later work will address the chemical solver, building on experience already gained from investigating the backward Euler method as a replacement for the Newton-Raphson technique.
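The backward Euler versus Newton-Raphson comparison can be illustrated with a toy problem. This is a hedged sketch, not UKCA or ASAD code: it assumes a single hypothetical species obeying dy/dt = P - k*y**2 and solves the same implicit backward Euler step in two ways. Newton-Raphson uses the derivative of the residual (for a full mechanism this becomes a Jacobian matrix that must be formed and solved), while the fixed-point form rewrites the step as y = (y_old + dt*P) / (1 + dt*L(y)) with loss frequency L(y) = k*y, so each iteration is cheap but more iterations are usually needed. All function names and parameter values are hypothetical.

```python
def residual(y: float, y_old: float, dt: float, P: float, k: float) -> float:
    """Backward Euler residual F(y) = y - y_old - dt*(P - k*y**2) for the toy species."""
    return y - y_old - dt * (P - k * y * y)


def newton_raphson_step(y_old, dt, P, k, tol=1e-12, max_iter=50):
    """Solve F(y) = 0 with Newton-Raphson; returns (y_new, iterations)."""
    y = y_old  # start from the old concentration
    for it in range(1, max_iter + 1):
        dFdy = 1.0 + 2.0 * dt * k * y  # derivative of the residual (a Jacobian for many species)
        y_next = y - residual(y, y_old, dt, P, k) / dFdy
        if abs(y_next - y) <= tol * max(abs(y_next), 1.0):
            return y_next, it
        y = y_next
    raise RuntimeError("Newton-Raphson did not converge")


def backward_euler_fixed_point(y_old, dt, P, k, tol=1e-12, max_iter=500):
    """Solve the same step as y = (y_old + dt*P) / (1 + dt*L(y)), with L(y) = k*y."""
    y = y_old
    for it in range(1, max_iter + 1):
        y_next = (y_old + dt * P) / (1.0 + dt * k * y)  # no derivative needed
        if abs(y_next - y) <= tol * max(abs(y_next), 1.0):
            return y_next, it
        y = y_next
    raise RuntimeError("fixed-point iteration did not converge")


y_nr, n_nr = newton_raphson_step(1.0, 1.0, 0.5, 0.1)
y_fp, n_fp = backward_euler_fixed_point(1.0, 1.0, 0.5, 0.1)
print(f"Newton-Raphson: y = {y_nr:.6f} after {n_nr} iterations")
print(f"fixed point:    y = {y_fp:.6f} after {n_fp} iterations")
```

With these made-up values both methods reach the same root (about 1.3246), and Newton-Raphson takes fewer iterations. Which approach is cheaper overall for a real mechanism depends on the cost per iteration, which is the kind of trade-off a solver investigation would need to measure.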
Since February 2016 there has been a working branch with an OpenMP implementation at vn8.6 (ref?).
The vn10.3 branch, containing only the "chunking" changes, was not ready in time for vn10.4. As of 15 March 2016, work is under way in a vn10.4 branch to merge the vn10.3 changes and add any new code needed to cope with the differences between versions. A paper (with presentation) has been accepted for the Cray User Group (CUG) meeting in May 2016.
The work with vn8.6 was presented at CUG2016 and was well received. The vn10.5 target was missed, so the new target is vn10.6. The development branch is still at vn10.4, but it now has a segmenting method that allows any number of columns from 1 to row_length × rows to be selected for a segment. Testing has been delayed by problems running on Monsoon.
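A minimal sketch of how a horizontal domain of row_length × rows columns might be split into segments of a chosen size. It assumes columns are numbered with the index along a row varying fastest (zero-based here), and the function names are hypothetical; this is not the UM's implementation, only an illustration that any segment size from 1 to row_length × rows gives a complete, non-overlapping covering of the columns.

```python
def segment_bounds(row_length: int, rows: int, seg_size: int) -> list[tuple[int, int]]:
    """Half-open [start, end) column ranges of seg_size columns each.

    The last segment takes whatever columns remain, so seg_size need not
    divide row_length * rows exactly.
    """
    n_cols = row_length * rows
    if not 1 <= seg_size <= n_cols:
        raise ValueError(f"seg_size must be in 1..{n_cols}, got {seg_size}")
    return [(start, min(start + seg_size, n_cols))
            for start in range(0, n_cols, seg_size)]


def column_to_ij(col: int, row_length: int) -> tuple[int, int]:
    """Map a flat column index to (i, j), assuming i (along a row) varies fastest."""
    return col % row_length, col // row_length


# Hypothetical 4 x 3 horizontal domain split into segments of 5 columns.
segs = segment_bounds(row_length=4, rows=3, seg_size=5)
print(segs)  # [(0, 5), (5, 10), (10, 12)]
print([column_to_ij(c, 4) for c in range(*segs[1])])  # [(1, 1), (2, 1), (3, 1), (0, 2), (1, 2)]
```

The two ends of the allowed range correspond to seg_size = 1 (one column per segment) and seg_size = row_length × rows (a single segment covering the whole domain). Because the segments do not overlap, they could in principle be processed independently, for example one per OpenMP thread; the sketch does not show how the UM actually distributes them.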
The segment method has now been committed to the trunk, and work on activating OpenMP has started. There are difficulties because the code was taken from the head of trunk (HoT), so the main development Rose suite can no longer be used with it.
The likely way forward is to maintain two development branches: (i) a branch from the last committed segment code, which can continue to use the Rose suite; and (ii) a HoT branch for use with rose stem, to pick up any variations or side effects.
Below is earlier work, carried out before 2014.
The cost of the Stratosphere-Troposphere chemistry scheme in UKCA using the Newton-Raphson solver with the on-line photolysis scheme Fast-jX relative to the climate model HadGEM3-A is as follows:
||Model||PEs||OpenMP threads||Elapsed time (s)||
||HadGEM3-A + StratTrop(N-R) + Fast-jX||8x16||1||15602||
||HadGEM3-A + StratTrop(N-R) + Fast-jX||8x16||2||11730||
On an 8x16 PE configuration with 1 OpenMP thread on the Monsoon facility or the Met Office's Power6 IBM (hpc1e/1f), UKCA and Fast-jX together add 310% to the cost of HadGEM3-A. With 2 OpenMP threads (now standard in HadGEM3-A runs), the relative cost of UKCA is even higher, adding approximately 400% to the cost of HadGEM3-A: since UKCA contains no OpenMP compiler directives, it does not benefit from the second thread as HadGEM3-A does. Adding aerosol chemistry and UKCA-MODE aerosols will increase the cost further. There is therefore a clear need for optimisation.
As part of the HadGEM3-ES development project, currently led by F. O'Connor, there are plans to carry out optimisation work on UKCA. In particular, a more complete assessment of the model cost will be made, and the potential speedup from simple code rewriting, load balancing, OpenMP, dedicated I/O servers and maths libraries will be explored. Scientific optimisation, such as removing unneeded reactions (or species) or adjusting the chemistry to improve convergence, should also be considered. An alternative solver, such as a Rosenbrock solver, may also be investigated.
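The overhead percentages above are relative to the cost of HadGEM3-A, and the rise from 310% to approximately 400% follows from only part of the model being threaded. The sketch below illustrates that reasoning with a deliberately simple, hypothetical model: only the base model gains from extra threads (with an assumed efficiency), while the UKCA component keeps a fixed cost. The function names and the 100/310 split are made up; only the two elapsed times come from the table.

```python
def relative_overhead(total: float, baseline: float) -> float:
    """Percentage added on top of the baseline cost by extra components."""
    return 100.0 * (total - baseline) / baseline


def overhead_with_threads(base_time: float, extra_time: float, threads: int,
                          base_efficiency: float = 1.0) -> float:
    """Hypothetical model: only the base model is threaded.

    The extra component (UKCA here) is assumed to have no OpenMP directives,
    so its cost does not change with the thread count. base_efficiency is
    the assumed fraction of ideal speedup each extra thread gives the base
    model (1.0 = perfect scaling).
    """
    base = base_time / (1.0 + (threads - 1) * base_efficiency)
    return relative_overhead(base + extra_time, base)


# Measured whole-model elapsed times from the table (8x16 PEs).
t_1thread, t_2threads = 15602.0, 11730.0
print(f"speedup from 2 OpenMP threads: {t_1thread / t_2threads:.2f}")  # 1.33

# Illustrative, made-up split: base model = 100 units, UKCA + Fast-jX = 310 units.
print(overhead_with_threads(100.0, 310.0, threads=1))  # 310.0
print(overhead_with_threads(100.0, 310.0, threads=2))  # 620.0 with perfect base scaling
```

In this idealised model the overhead doubles with 2 threads. The measured figure of approximately 400% is smaller, because the real value depends on how well each component actually scales, which the single base_efficiency parameter only crudely represents; the sketch shows the direction of the effect, not its size.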
Updates (8 June 2012)
- The High Performance Computing (HPC) team at the Met Office have now been provided with 3 jobs: HadGEM3-A, HadGEM3-A+StratTrop+Fast-jX, and HadGEM3-A+StratTrop+Achem+Fast-jX+MODE, which run on the Met Office Power7 IBM (hpc2e).
- Calls to Dr Hook in UKCA do not work; Andy Malcolm to investigate and fix.
- Based on crude timings, chemistry scales super-linearly with the number of processors.
- Based on crude timings, load balancing appears to be worse for the chemistry than for Fast-jX.
- Further profiling and Dr Hook timing to do.
- Current advice for hpc2e is to run on a PE configuration of 16x16 with 1 OpenMP thread.
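The scaling and load-balance observations in the list above can be quantified with two standard measures: relative parallel efficiency (speedup divided by the increase in processor count, where values above 1 indicate super-linear scaling) and a max/mean ratio of per-PE times as a load-imbalance measure. A minimal sketch follows; the function names are hypothetical and the timings are invented for illustration, not measured UKCA or Fast-jX figures.

```python
def speedup(t_ref: float, t: float) -> float:
    """How many times faster the run is than the reference run."""
    return t_ref / t


def relative_efficiency(t_ref: float, p_ref: int, t: float, p: int) -> float:
    """Speedup divided by the increase in processor count; > 1 means super-linear."""
    return speedup(t_ref, t) / (p / p_ref)


def load_imbalance(per_pe_times: list[float]) -> float:
    """Max over mean of per-PE times; 1.0 is perfect balance, larger is worse."""
    mean = sum(per_pe_times) / len(per_pe_times)
    return max(per_pe_times) / mean


# Hypothetical numbers, for illustration only.
print(relative_efficiency(t_ref=100.0, p_ref=64, t=40.0, p=128))  # 1.25 -> super-linear
print(load_imbalance([10.0, 12.0, 8.0, 10.0]))  # 1.2
```

Super-linear scaling of this kind is often attributed to each PE's share of the data fitting better in cache as the domain is divided further, although the crude timings above do not establish the cause.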