ARCHER porting

Revision as of 12:55, 16 January 2014

Several of the UKCA Release Jobs have been ported from HECToR to ARCHER, and their job IDs are listed on that page.

Requesting an account on ARCHER

Please see the NCAS-CMS ARCHER page for how to register for access, along with other useful information.

To get started on ARCHER you first need to complete the same setting-up step as was done for HECToR. To do this, follow the CMS FCM Setting Up page, replacing login.hector.ac.uk with login.archer.ac.uk. Some steps will already have been done, and you can re-use the ssh key that you use for HECToR.

You should also put the following lines in your .profile

module use /home/n02/n02/ros/modules
module load um-cce

to load the cce compiler.

Copying Files

The /nerc/n02/n02 disk is visible from ARCHER, although the lms is not. On HECToR/lms you should rsync your data from /work/n02/n02 and /home/n02/n02 to /nerc/n02/n02, and then on ARCHER copy this data back to the correct place. ARCHER also has /work/n02/n02 and /home/n02/n02, which serve the same function as on HECToR.

It is advisable that you copy files across from HECToR to the equivalent location on ARCHER. You should continue to extract the source code to /home and not /work.
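The two-step copy described above could be sketched as follows. This is only an illustration: the per-user sub-directory, the jbloggs fallback username, and the mydata directory are hypothetical and are not specified on this page. The helper only prints the rsync commands so that they can be checked before anything is run.

```shell
# Print (do not run) the rsync command for one direction of the copy.
# -a preserves permissions and timestamps, -v lists files as they are copied.
# The trailing slashes copy the directory contents rather than nesting it.
# Note: paths containing spaces would need extra quoting.
copy_cmd() {
    printf 'rsync -av %s/ %s/\n' "${1%/}" "${2%/}"
}

user="${USER:-jbloggs}"   # hypothetical username
data="mydata"             # hypothetical directory name

# Step 1, run on HECToR/lms: stage the data onto /nerc
copy_cmd "/work/n02/n02/$user/$data" "/nerc/n02/n02/$user/$data"

# Step 2, run on ARCHER: copy it back to the equivalent location
copy_cmd "/nerc/n02/n02/$user/$data" "/work/n02/n02/$user/$data"
```

Once the printed commands look right they can be pasted into the shell on the relevant machine. The same pattern applies to directories under /home/n02/n02.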

Timings

Scaling and efficiency profiling for N48L60 UKCA CheS+ on ARCHER

The plot shows how UKCA scales when ported to ARCHER. The most efficient number of nodes to use is 3 (8x9 or 12x6), although the model will run fastest on 6 nodes (e.g. 16x9 or 12x12). Moving to 3 nodes from 6 will increase the run-time by 50%. Linear scaling is possible up to 4 nodes (12x8).

Core-for-core, ARCHER is around 2.3 times faster than HECToR. However, there are 24 cores per node on ARCHER compared to 32 for HECToR, so the domain decomposition of the model will be slightly different on ARCHER.
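As a rough check on the node counts quoted above, the number of nodes a decomposition occupies is the East-West by North-South processor count divided by the cores per node, rounded up. The helper names below are hypothetical and are not part of the UM or UMUI; the sketch assumes one MPI task per core.

```shell
# Nodes needed for an EW x NS decomposition (ceiling division).
# $1 = processors East-West, $2 = processors North-South,
# $3 = cores per node (defaults to 24, as on ARCHER).
nodes_needed() {
    echo $(( ($1 * $2 + ${3:-24} - 1) / ${3:-24} ))
}

# Cores left idle on the last node for the same decomposition.
idle_cores() {
    n=$(nodes_needed "$1" "$2" "${3:-24}")
    echo $(( n * ${3:-24} - $1 * $2 ))
}

nodes_needed 8 9       # ARCHER, 8x9
nodes_needed 16 9      # ARCHER, 16x9
nodes_needed 16 8 32   # HECToR, 16x8 with 32 cores per node
idle_cores 16 8        # the HECToR 16x8 layout placed on ARCHER nodes
```

Under these assumptions 8x9 fills 3 ARCHER nodes exactly and 16x9 fills 6, whereas the 16x8 layout that fills 4 HECToR nodes would occupy 6 ARCHER nodes and leave 16 cores idle, which is why the decomposition is changed.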

Code Changes

Cray cce Fortran Compiler

Corrections required for the Cray cce compiler on ARCHER can be found on the bugfixes page for the Release Jobs.

Spectral Files

Due to updates to the compiler, the old format of the UM spectral files will cause the code to crash. These files need to be re-formatted using a Python script, which can be found at

/work/n02/n02/ukca/bin/lex.py

which was provided by Cray to NCAS-CMS. This script should be run as

/work/n02/n02/ukca/bin/lex.py file.in > file.out

and it is this file.out which should be used by the UM.

The changes to the standard spectral files should have been made for you, but if you are using your own spectral files you will need to make these changes.
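If you have several of your own spectral files to convert, a small wrapper around the command above may save some typing. This is a minimal sketch: the .archer output suffix and the convert_spectral name are hypothetical choices, and the LEX variable is only there so the script location can be overridden.

```shell
# Location of the re-formatting script (path as given on this page).
LEX="${LEX:-/work/n02/n02/ukca/bin/lex.py}"

# Convert one spectral file, writing the result alongside the original
# with a ".archer" suffix (a hypothetical naming choice).
# A partial output file is removed if the conversion fails.
convert_spectral() {
    "$LEX" "$1" > "$1.archer" || { rm -f "$1.archer"; return 1; }
}

# Convert every file named on the command line.
for f in "$@"; do
    convert_spectral "$f" || echo "conversion failed: $f" >&2
done
```

The original files are left untouched, so the UM job then needs to be pointed at the converted copies.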

Running Jobs

Differences

The key differences between the two machines can be found by diffing two of the UKCA standard jobs, e.g.

Job xfvfc Title UKCA-TropIsop HECToR Phase3 N48L60 QESM-A
Job xfvfd Title UKCA-TropIsop ARCHER cce N48L60 QESM-A
Difference in window subindep_Target_Machine
 -> Model Selection
   -> User Information and Target Machine
     -> Target Machine
Entry box: Machine name
 Job xfvfc: Entry is set to 'phase3.hector.ac.uk'
 Job xfvfd: Entry is set to 'login.archer.ac.uk'
Entry box: Define the number of processors East-West. (Must be 1 or an even number).
 Job xfvfc: Entry is set to '16'
 Job xfvfd: Entry is set to '8'
Entry box: Define the number of processors North-South
 Job xfvfc: Entry is set to '8'
 Job xfvfd: Entry is set to '9'

Difference in window subindep_FileDir
 -> Model Selection
   -> Input/Output Control and Resources
     -> Time Convention and SCRIPT Environment Variables.
Differences in Table Defined Environment Variables for Directories
 1,6c1,7
<  HGPKG2 /work/n02/n02/ros/HadGEM3-A/vn7.3/HGPKG2
<  STARTDUMPS /work/n02/n02/annette/hadgem3/initial/atmos/N48L60
<  UKCA_L60_DIR /work/n02/n02/ukca/ANCILS/L60
<  ANCIL_ATMOS /work/n02/n02/annette/hadgem3/ancil/atmos_n48
<  ANCIL_LAND /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48
<  QESM_DIR /work/n02/n02/ukca/ANCILS/QESM
---
>  UKCA_WORKDIR /work/n02/n02/ukca
>  HGPKG2 $UKCA_WORKDIR/HadGEM3-A/vn7.3/HGPKG2
>  STARTDUMPS $UKCA_WORKDIR/hadgem3/initial/atmos/N48L60
>  UKCA_L60_DIR $UKCA_WORKDIR/ANCILS/L60
>  ANCIL_ATMOS $UKCA_WORKDIR/hadgem3/ancil/atmos_n48
>  ANCIL_LAND $UKCA_WORKDIR/hadgem3/ancil/land/ORCA2_N48
>  QESM_DIR $UKCA_WORKDIR/ANCILS/QESM

Difference in window subindep_JobRes2
 -> Model Selection
   -> Input/Output Control and Resources
     -> Job submission, resources and re-submission pattern
       -> Follow on window
Entry box: Job time limit, for QSUB 
 Job xfvfc: Entry is set to '43200'
 Job xfvfd: Entry is set to '21600'
Entry box: Months 
 Job xfvfc: Entry is set to '2'
 Job xfvfd: Entry is set to '0'

Difference in window subindep_HandEdit
 -> Model Selection
   -> Input/Output Control and Resources
     -> User hand edit files
Differences in Table Hand edits
 23a24
>  ~ros/umui_jobs/hand-edits/archer/cce-7.3.ed Y

Difference in window subindep_Compile
 -> Model Selection
   -> Compilation and Modifications
     -> Compile options for the UM model
Check box: Change the system default for the max no of compilation processes?
 Job xfvfc: Entry is set to 'ON'
 Job xfvfd: Entry is set to 'OFF'

Difference in window subindep_Compile_User
 -> Model Selection
   -> Compilation and Modifications
     -> UM User Override Files
Differences in Table User machine overrides
 1a2
>  /home/ros/umui_jobs/overrides/archer_cce_7.3_machine Y


Difference in window subindep_Recon_Gen
 -> Model Selection
   -> Reconfiguration
     -> General Reconfiguration Options
Entry box: Define the number of processors East-West
 Job xfvfc: Entry is set to '12'
 Job xfvfd: Entry is set to '4'
Entry box: Define the number of processors North-South
 Job xfvfc: Entry is set to '8'
 Job xfvfd: Entry is set to '6'

Difference in window atmos_STASH_UserProgs
 -> Model Selection
   -> Atmosphere
     -> STASH
       -> Initialisation of User Prognostics
Differences in Table Specify Initialisation Option
 1,2c1,2
<  Blank Blank 7  /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
<  Blank Blank 7  /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48/topog_index_stdev
---
>  Blank Blank 7  /work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
>  Blank Blank 7  /work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_stdev

Essentially, the domain decomposition (see above) needs to change because the nodes are different (24 cores per node, rather than 32 as on HECToR). The run-time limit also needs to change to reflect that ARCHER is faster. The N48 MetUM ancillaries have been moved to the /work/n02/n02/ukca directory. A hand-edit and a compiler override are also required for the cce Fortran compiler; these are needed for the UM and are not UKCA-specific.