Moving jobs to Monsoon Phase II

From UKCA

Phase II of the MONSooN system, which uses IBM Power-7 processors, has replaced Phase I, which will be switched off on Monday 17th September.

The new system has roughly three times as many nodes as the Phase I MONSooN system.

Full details of the new system and user instructions can be found at http://collab.metoffice.gov.uk/twiki/bin/view/Support/MONSooNPhase2, but some important points and UKCA-specific changes are summarised below:


  • The system can be accessed as 'ibm02' from lander and as 'hpc2c' from ibm00 or postproc01 (the current ibm00 is aliased as 'hpc1c').
  • The /home, /projects and /nerc filesystems were copied over in the last week of July. Changes made on the current system after that date will not be available on the new one, so it is now the user's responsibility to ensure that all the data they need is on the new system.
  • Normal 'scp' or 'rsync' commands can be used to transfer files between the systems, e.g.:
 rsync -av --bwlimit=10000 $HOME/mydir hpc2c:$HOME/
  • To modify your UMUI jobs for the new system:

1. Change the Target Machine to 'ibm02'

2a. For vn7.3 jobs:

    Compilation and Modifications --> UM user overrides --> Include the following machine overrides:
    /home/umui/overrides/monsoon/2c_netcdf_7.3-7.4.ovr  Y

2b. For vn8.x jobs (the override at 2a is not required):

    Compilation and Modifications --> UM user overrides --> Include the following machine overrides:
    /home/ros/umui_jobs/overrides/monsoon/remove_grib_api.ovr  Y

3. The following Environment Settings have helped overcome runtime memory limit problems for UKCA on the Power7 systems:

    Input/ Output Control and Resources --> Scripts Insertions and Modifications --> Define General Environmental settings:
    MP_SHARED_MEMORY=NO   (Default was YES)
    MP_SHMCC_EXCLUDE_LIST=Gatherv

4. If the compilation job runs out of memory:

    Compilation and Modifications --> Compilation options of UM --> Specify max no. of compilation processes = 3
    Include the handedit: /home/mdalvi/umui_jobs/hand_edits/incr_compile_mem.ed

5. Modify any paths or file locations that have changed since the folders were last copied.
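
For reference, the environment settings in step 3 amount to the following in a job script. This is only a sketch, assuming the UMUI panel passes them to the job as exported shell variables. The variable names and values are those given in step 3; the final line is just a convenience check.

```shell
# Sketch only: assumes the settings reach the job as exported variables.
# Names and values are taken from step 3.
export MP_SHARED_MEMORY=NO            # default was YES
export MP_SHMCC_EXCLUDE_LIST=Gatherv

# Show the MP_ settings currently in effect
env | grep '^MP_' | sort
```

Running the check from within the job script is one way to confirm that the overrides have actually been picked up.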

  • So far, we have found that results from UKCA runs on the two systems are identical. However, any change to the number of PEs or to the NRUN/CRUN length does lead to differences (for the Newton-Raphson solver at least).
  • Un-initialised variables in the code were found to be more likely to pick up random values on the new system, changing the results. For example, an un-initialised logical in the reconfiguration (see the fix at r2946 of the PUMA trunk) now persistently acquires the value TRUE, which modifies the start dump, so this fix should be added to your branches.
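
Because the filesystems were copied at a fixed point in time, it can help to list what has changed on the current system since then before transferring anything. The following is a minimal sketch: 'changed_since' is a hypothetical helper, not a tool provided on either system, and the cut-off timestamp is a placeholder to be replaced with the actual copy date. It assumes 'mktemp' is available.

```shell
# Hypothetical helper: list regular files under DIR modified after STAMP.
# STAMP uses the POSIX 'touch -t' format: [[CC]YY]MMDDhhmm
changed_since() {
    dir=$1
    stamp=$2
    ref=$(mktemp) || return 1        # temporary reference file
    touch -t "$stamp" "$ref"         # give it the cut-off modification time
    find "$dir" -type f -newer "$ref" -print
    rm -f "$ref"
}

# Example (replace the placeholder with the real copy date):
#   changed_since "$HOME/mydir" CCYYMMDDhhmm
#
# Preview what rsync would transfer without copying anything (-n = dry run):
#   rsync -avn --bwlimit=10000 $HOME/mydir hpc2c:$HOME/
```

The dry run uses the same rsync options as the transfer example above, with '-n' added, so dropping '-n' performs the real transfer.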