ARCHER porting

From UKCA
Revision as of 17:52, 14 January 2014 by Nla27

Several of the UKCA Release Jobs have been ported from HECToR to ARCHER; their job IDs are listed on the Release Jobs page.

Timings

Scaling and efficiency profiling for N48L60 UKCA CheS+ on ARCHER

The plot shows how UKCA scales on ARCHER, which has 24 cores per node. The most efficient configuration is 3 nodes (8x9 processors, East-West x North-South), although the model runs fastest on 6 nodes (e.g. 16x9 or 12x12). Dropping from 6 nodes to 3 increases the run time by 50%. Scaling is linear up to 4 nodes (12x8).
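The relationship between a decomposition and the number of nodes it occupies can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and not part of the UM or UMUI, and the only inputs taken from this page are the 24 cores per ARCHER node, the rule that the East-West count must be 1 or an even number, and the 50% run-time figure quoted above.

```python
import math

CORES_PER_NODE = 24  # ARCHER node size, as stated on this page


def nodes_needed(ew, ns, cores_per_node=CORES_PER_NODE):
    """Whole nodes occupied by an ew x ns (East-West x North-South) decomposition."""
    return math.ceil(ew * ns / cores_per_node)


def decompositions(nodes, cores_per_node=CORES_PER_NODE):
    """All (EW, NS) pairs that exactly fill `nodes` nodes.

    EW must be 1 or an even number, following the UMUI entry-box note.
    """
    cores = nodes * cores_per_node
    return [
        (ew, cores // ew)
        for ew in range(1, cores + 1)
        if cores % ew == 0 and (ew == 1 or ew % 2 == 0)
    ]


def relative_cost(nodes_a, time_a, nodes_b, time_b):
    """Node-time cost of configuration A relative to configuration B."""
    return (nodes_a * time_a) / (nodes_b * time_b)


# Decompositions mentioned in the text above
for ew, ns in [(8, 9), (12, 8), (16, 9), (12, 12)]:
    print(f"{ew}x{ns}: {ew * ns} cores, {nodes_needed(ew, ns)} nodes")

# 3 nodes take 1.5x the run time of 6 nodes (the 50% increase quoted above)
print("3-node cost relative to 6 nodes:", relative_cost(3, 1.5, 6, 1.0))
```

Under these assumptions the 3-node run takes longer but uses three quarters of the node-time of the 6-node run, which is consistent with 3 nodes being described as the most efficient choice.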

Code Changes

Cray cce Fortran Compiler

Corrections required for the Cray cce compiler on ARCHER can be found on the bugfixes page.

Running Jobs

To get started on ARCHER you will first need to complete the setting-up step, as was done for HECToR. To do this, please see the CMS FCM Setting Up page, replacing login.hector.ac.uk with login.archer.ac.uk. Some steps will already have been done, and you can re-use the ssh key that you use for HECToR.

Differences

The key differences between the two machines can be found by diffing two of the UKCA standard jobs, e.g.

Job xfvfc Title UKCA-TropIsop HECToR Phase3 N48L60 QESM-A
Job xfvfd Title UKCA-TropIsop ARCHER cce N48L60 QESM-A
Difference in window subindep_Target_Machine
 -> Model Selection
   -> User Information and Target Machine
     -> Target Machine
Entry box: Machine name
 Job xfvfc: Entry is set to 'phase3.hector.ac.uk'
 Job xfvfd: Entry is set to 'login.archer.ac.uk'
Entry box: Define the number of processors East-West. (Must be 1 or an even number).
 Job xfvfc: Entry is set to '16'
 Job xfvfd: Entry is set to '8'
Entry box: Define the number of processors North-South
 Job xfvfc: Entry is set to '8'
 Job xfvfd: Entry is set to '9'

Difference in window subindep_FileDir
 -> Model Selection
   -> Input/Output Control and Resources
     -> Time Convention and SCRIPT Environment Variables.
Differences in Table Defined Environment Variables for Directories
 1,6c1,7
<  HGPKG2 /work/n02/n02/ros/HadGEM3-A/vn7.3/HGPKG2
<  STARTDUMPS /work/n02/n02/annette/hadgem3/initial/atmos/N48L60
<  UKCA_L60_DIR /work/n02/n02/ukca/ANCILS/L60
<  ANCIL_ATMOS /work/n02/n02/annette/hadgem3/ancil/atmos_n48
<  ANCIL_LAND /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48
<  QESM_DIR /work/n02/n02/ukca/ANCILS/QESM
---
>  UKCA_WORKDIR /work/n02/n02/ukca
>  HGPKG2 $UKCA_WORKDIR/HadGEM3-A/vn7.3/HGPKG2
>  STARTDUMPS $UKCA_WORKDIR/hadgem3/initial/atmos/N48L60
>  UKCA_L60_DIR $UKCA_WORKDIR/ANCILS/L60
>  ANCIL_ATMOS $UKCA_WORKDIR/hadgem3/ancil/atmos_n48
>  ANCIL_LAND $UKCA_WORKDIR/hadgem3/ancil/land/ORCA2_N48
>  QESM_DIR $UKCA_WORKDIR/ANCILS/QESM

Difference in window subindep_JobRes2
 -> Model Selection
   -> Input/Output Control and Resources
     -> Job submission, resources and re-submission pattern
       -> Follow on window
Entry box: Job time limit, for QSUB 
 Job xfvfc: Entry is set to '43200'
 Job xfvfd: Entry is set to '21600'
Entry box: Months 
 Job xfvfc: Entry is set to '2'
 Job xfvfd: Entry is set to '0'

Difference in window subindep_HandEdit
 -> Model Selection
   -> Input/Output Control and Resources
     -> User hand edit files
Differences in Table Hand edits
 23a24
>  ~ros/umui_jobs/hand-edits/archer/cce-7.3.ed Y

Difference in window subindep_Compile
 -> Model Selection
   -> Compilation and Modifications
     -> Compile options for the UM model
Check box: Change the system default for the max no of compilation processes?
 Job xfvfc: Entry is set to 'ON'
 Job xfvfd: Entry is set to 'OFF'

Difference in window subindep_Compile_User
 -> Model Selection
   -> Compilation and Modifications
     -> UM User Override Files
Differences in Table User machine overrides
 1a2
>  /home/ros/umui_jobs/overrides/archer_cce_7.3_machine Y


Difference in window subindep_Recon_Gen
 -> Model Selection
   -> Reconfiguration
     -> General Reconfiguration Options
Entry box: Define the number of processors East-West
 Job xfvfc: Entry is set to '12'
 Job xfvfd: Entry is set to '4'
Entry box: Define the number of processors North-South
 Job xfvfc: Entry is set to '8'
 Job xfvfd: Entry is set to '6'

Difference in window atmos_STASH_UserProgs
 -> Model Selection
   -> Atmosphere
     -> STASH
       -> Initialisation of User Prognostics
Differences in Table Specify Initialisation Option
 1,2c1,2
<  Blank Blank 7  /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
<  Blank Blank 7  /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48/topog_index_stdev
---
>  Blank Blank 7  /work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
>  Blank Blank 7  /work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_stdev

In summary, the domain decomposition (see above) has to change because the nodes differ: ARCHER has 24 cores per node, whereas HECToR has 32. The job time limit is also reduced (from 43200 to 21600 seconds), reflecting that ARCHER is faster. The N48 MetUM ancillaries have been moved to the /work/n02/n02/ukca directory. A hand-edit and a compiler override are also required for the cce Fortran compiler; these are needed for the UM and are not UKCA-specific.
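As a rough cross-check of the differences listed above, the sketch below recomputes the core count, node count and time limit in hours for the two jobs, and expands the $UKCA_WORKDIR entries from the ARCHER directory table. The dictionary layout and the summarise function are hypothetical and exist only for this illustration; the numbers and paths are copied from the job diff.

```python
from string import Template

# Values taken from the xfvfc / xfvfd diff above; the layout is illustrative only
jobs = {
    "xfvfc (HECToR)": {"ew": 16, "ns": 8, "cores_per_node": 32, "time_limit_s": 43200},
    "xfvfd (ARCHER)": {"ew": 8, "ns": 9, "cores_per_node": 24, "time_limit_s": 21600},
}


def summarise(job):
    """Return (cores, whole nodes, time limit in hours) for one job."""
    cores = job["ew"] * job["ns"]
    nodes = -(-cores // job["cores_per_node"])  # ceiling division
    hours = job["time_limit_s"] / 3600
    return cores, nodes, hours


for name, job in jobs.items():
    cores, nodes, hours = summarise(job)
    print(f"{name}: {cores} cores on {nodes} nodes, limit {hours:g} h")

# Directory table from the ARCHER job, expressed relative to UKCA_WORKDIR
UKCA_WORKDIR = "/work/n02/n02/ukca"
table = {
    "HGPKG2": "$UKCA_WORKDIR/HadGEM3-A/vn7.3/HGPKG2",
    "STARTDUMPS": "$UKCA_WORKDIR/hadgem3/initial/atmos/N48L60",
    "UKCA_L60_DIR": "$UKCA_WORKDIR/ANCILS/L60",
    "ANCIL_ATMOS": "$UKCA_WORKDIR/hadgem3/ancil/atmos_n48",
    "ANCIL_LAND": "$UKCA_WORKDIR/hadgem3/ancil/land/ORCA2_N48",
    "QESM_DIR": "$UKCA_WORKDIR/ANCILS/QESM",
}

# Substitute the variable to see the full paths the job will use
resolved = {
    name: Template(path).substitute(UKCA_WORKDIR=UKCA_WORKDIR)
    for name, path in table.items()
}
for name, path in resolved.items():
    print(f"{name:13s} {path}")
```

Defining every directory relative to a single UKCA_WORKDIR entry, as the ARCHER job does, means that only one line needs to change if the ancillaries are moved again.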