ARCHER porting

Several of the UKCA Release Jobs have been ported from HECToR to ARCHER; their job IDs are listed on the UKCA Release Jobs page.

Requesting an account on ARCHER

Please see the NCAS-CMS ARCHER page for how to register for access, along with other useful information.

To get started on ARCHER you will first need to complete the same setting-up step as was done for HECToR. To do this, please see the CMS FCM Setting Up page, replacing login.hector.ac.uk with login.archer.ac.uk. Some steps will already have been done, and you can re-use the ssh-key you use for HECToR.

You should be able to use the same .profile and .bashrc as you do on HECToR.

Copying Files

The /nerc/n02/n02 disk is visible from ARCHER, although the lms is not. On HECToR/lms you should rsync your data from /work/n02/n02 and /home/n02/n02 to /nerc/n02/n02, and then on ARCHER copy this data back to the correct place. ARCHER also has /work/n02/n02 and /home/n02/n02, which serve the same function as on HECToR.

It is advisable that you copy files across from HECToR to the equivalent location on ARCHER as this will give the minimum number of UMUI changes. You should continue to extract the source code to /home and not /work.
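
The two-step copy described above could be sketched as follows. This is only an illustration: the function name stage_cmds, the per-user layout under /work, /home and /nerc, and the archer_transfer staging directory are all assumptions, not part of the original instructions. The function prints the rsync commands rather than running them, so they can be reviewed and adjusted first.

```shell
# Sketch only: the per-user directories and the "archer_transfer" staging
# directory are hypothetical; adjust them to your own layout.
stage_cmds() {
    local user="$1"
    local stage="/nerc/n02/n02/${user}/archer_transfer"   # assumed staging area
    # Step 1: commands to run on HECToR/lms
    echo "rsync -av /work/n02/n02/${user}/ ${stage}/work/"
    echo "rsync -av /home/n02/n02/${user}/ ${stage}/home/"
    # Step 2: commands to run on ARCHER
    echo "rsync -av ${stage}/work/ /work/n02/n02/${user}/"
    echo "rsync -av ${stage}/home/ /home/n02/n02/${user}/"
}
```

Calling stage_cmds with your username prints four commands: the first two are for HECToR/lms and the last two for ARCHER. The trailing slashes make rsync copy the contents of each directory rather than the directory itself.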

Timings

Scaling and efficiency profiling for N48L60 UKCA CheS+ on ARCHER

The plot shows how UKCA scales when ported to ARCHER. The most efficient number of nodes to use is 3 (8x9 or 12x6), although the model will run fastest on 6 nodes (e.g. 16x9 or 12x12). Moving to 3 nodes from 6 will increase the run-time by 50%. Linear scaling is possible up to 4 nodes (12x8).

Core-for-core, ARCHER is around 2.3 times faster than HECToR. However, there are 24 cores per node on ARCHER compared to 32 for HECToR, so the domain decomposition of the model will be slightly different on ARCHER.
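
The node counts quoted above follow from the cores per node: an EW x NS decomposition uses EW multiplied by NS cores, and on ARCHER each node provides 24 of them. A minimal shell sketch of this arithmetic (the function name nodes_needed is made up for this illustration) is:

```shell
# nodes_needed EW NS [CORES_PER_NODE]
# Whole nodes required for an EW x NS domain decomposition.
nodes_needed() {
    local ew="$1" ns="$2" per_node="${3:-24}"   # default: 24 cores per node (ARCHER)
    local cores=$(( ew * ns ))
    # Integer ceiling division: round up to a whole number of nodes
    echo $(( (cores + per_node - 1) / per_node ))
}

nodes_needed 8 9       # 72 cores on ARCHER  -> 3
nodes_needed 12 8      # 96 cores on ARCHER  -> 4
nodes_needed 16 9      # 144 cores on ARCHER -> 6
nodes_needed 16 8 32   # 128 cores on HECToR (32 per node) -> 4
```

Decompositions whose core count is an exact multiple of 24 (such as 8x9 or 12x6) fill every node completely; anything else leaves cores idle on the last node.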

Code Changes

Cray cce Fortran Compiler

Corrections required for the Cray cce compiler on ARCHER can be found on the bugfixes page for the Release Jobs.

Spectral Files

Due to updates to the compiler, the old format of the UM spectral files will cause the code to crash. These files need to be re-formatted using a Python script, which can be found at

/work/n02/n02/ukca/bin/lex.py

which was provided by Cray to NCAS-CMS. This script should be run as

/work/n02/n02/ukca/bin/lex.py file.in > file.out

and it is this file.out which should be used by the UM.

The changes to the standard spectral files should have been made for you, but if you are using your own spectral files you will need to make these changes.
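
If you have several of your own spectral files to convert, the command above can be wrapped in a loop. The following is a sketch under stated assumptions: the function name reformat_spectral and the ".archer" output suffix are arbitrary choices for this example, and the LEX variable exists only so that the script location can be overridden.

```shell
# Location of the reformatting script given above (override LEX if needed).
LEX="${LEX:-/work/n02/n02/ukca/bin/lex.py}"

# reformat_spectral FILE...
# Writes a reformatted copy of each FILE as FILE.archer, leaving the
# original untouched. The ".archer" suffix is a hypothetical naming choice.
reformat_spectral() {
    local f
    for f in "$@"; do
        "$LEX" "$f" > "${f}.archer" || { echo "failed: $f" >&2; return 1; }
    done
}
```

After running reformat_spectral on your files, point the UM at the new ".archer" copies rather than the originals.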

Hard-coded paths

Unfortunately UKCA makes use of hard-coded paths to certain text files. For the standard release jobs these have been changed to point to the /work/n02/n02/ukca directory, but for pre-release jobs these files may be searched for in directories that do not exist on ARCHER.

The files that may be affected are:

  • ukca_phot2d.F90: This is the location of the 2D photolysis files. The value of file2 is set in two places (READ2D_ORIG and READ2D_OPT), and in both it should be
 file2='/work/n02/n02/ukca/photol/r1.0/'
  • ukca_read_aerosol.F90: The location of the file Sulfate_SAD_SPARC_Background.asc is required. The variable pathname should be set to
 '/work/n02/n02/ukca/ANCILS/QESM/'
  • ukca_stratf.F90: For the TropIsop configuration, top-boundary conditions are required for O3, CH4, and the NOy species, if that option is required. These should be opened as
 OPEN(70,FILE='/work/n02/n02/ukca/topbound/ch4_topbound.dat')
 OPEN(71,FILE='/work/n02/n02/ukca/topbound/o3_topbound.dat')
 OPEN(72,FILE='/work/n02/n02/ukca/topbound/noy_topbound.dat')

You can check if there are any other files with hard-coded paths on the HECToR/ARCHER filesystem by cd-ing into the UKCA directory and doing

grep '/work/n02/n02' *

This should list every file that contains a path of this form.
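
Because paths under /work/n02/n02/ukca are expected to exist on ARCHER, it can help to hide those matches and show only the remainder. A minimal sketch (the function name list_nonstandard_paths is hypothetical) is:

```shell
# list_nonstandard_paths DIR
# Show lines in DIR (searched recursively) that mention /work/n02/n02
# but do not point into the shared /work/n02/n02/ukca/ area.
list_nonstandard_paths() {
    ( cd "$1" && grep -rn '/work/n02/n02' . | grep -v '/work/n02/n02/ukca/' )
}
```

Note that this filter works line by line, so a line containing both a ukca path and another /work/n02/n02 path would be hidden; the plain grep above remains the complete list.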

UMUI Changes

Machine Name

The Machine name in the Model Selection → User Information and Target Machine → Target Machine panel should be set to login.archer.ac.uk

Domain Decomposition

From the scaling tests above, for N48L60 jobs we recommend a 3-node domain decomposition for efficiency reasons (either as 8x9 or 12x6), although jobs can be run on up to 6 nodes (16x9 or 12x12) if throughput is more important. For jobs of this resolution there is no gain in running on more than 6 nodes.

Hand-edits

Some hand-edits (which are located on PUMA) also point to paths which only exist on HECToR. The UKCA Release Jobs only use hand-edits which point to the /work/n02/n02/ukca directory, although some pre-release jobs which use particular functionality will be affected. You can check if you have any non-existent paths by saving and processing your job, then cd-ing into the jobid directory located in the umui_jobs/ folder and running

grep '/work/n02/n02' *

For example, pre-release jobs that use FastJX with the FJX_spec_Nov11.dat file (held in the variable jvspec_file) will find it located in /work/n02/n02/ukca/photol/CCMI/fastjx/. To use this file you should include the

/home/ukca/hand_edits/VN7.3/CCMI/FastJX_HECToR_Nov11.ed 

hand-edit.

If other paths appear when you grep, you may need to look through the hand-edit panel to see which files need to be changed. There is a standard set of the most commonly used hand-edits in /home/ukca/hand_edits/VN7.3/r1.0 on PUMA.
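
To turn the grep output into a short checklist, the distinct paths can be extracted from the processed job and then tested for existence on ARCHER. The sketch below is an assumption-laden illustration: the function names unique_paths and report_missing are made up, and it assumes you transfer the resulting list of paths to ARCHER yourself.

```shell
# unique_paths JOBDIR
# Print each distinct /work/n02/n02... path found in the files of JOBDIR.
unique_paths() {
    ( cd "$1" && grep -ho "/work/n02/n02[^'\" ]*" * 2>/dev/null | sort -u )
}

# report_missing
# Read one path per line on standard input and report those that do not
# exist on the machine where this is run.
report_missing() {
    local p
    while IFS= read -r p; do
        [ -e "$p" ] || echo "missing: $p"
    done
}
```

For example, you might save the output of unique_paths for your jobid directory to a file, copy that file to ARCHER, and feed it to report_missing there; any path reported as missing needs either a different hand-edit or a copy of the file.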

User multi- and single-level ancillary files

A standard set of user multi- and single-level files is located in the /work/n02/n02/ukca/ANCILS/QESM directory, and you may find that the file you are using has a copy there. If not, you should copy the file(s) you need over to your /work/n02/n02 directory on ARCHER and change the path in the UMUI accordingly.

UM Ancillary (and other) files

As well as the UKCA ancillary files, the standard UM ancillary files will also have moved. For the N48L60 jobs, the locations of the HGPKG2, STARTDUMPS, ANCIL_ATMOS, and ANCIL_LAND directories are defined in Model Selection → Input/Output Control and Resources → Time Convention and SCRIPT Environment Variables. These should now be set to

 HGPKG2      /work/n02/n02/ukca/HadGEM3-A/vn7.3/HGPKG2
 STARTDUMPS  /work/n02/n02/ukca/hadgem3/initial/atmos/N48L60
 ANCIL_ATMOS /work/n02/n02/ukca/hadgem3/ancil/atmos_n48
 ANCIL_LAND  /work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48

The UKCA Release Jobs make use of the UKCA_WORKDIR environment variable, which is set to /work/n02/n02/ukca.
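
In the UMUI these are table entries rather than a shell script, but the way the directories build on UKCA_WORKDIR can be illustrated in shell. The sketch below simply mirrors the values given above and, when run on ARCHER, reports any directory that is not present; the existence check is an addition for illustration only.

```shell
# Values as listed above, expressed relative to UKCA_WORKDIR.
UKCA_WORKDIR=/work/n02/n02/ukca
HGPKG2=$UKCA_WORKDIR/HadGEM3-A/vn7.3/HGPKG2
STARTDUMPS=$UKCA_WORKDIR/hadgem3/initial/atmos/N48L60
ANCIL_ATMOS=$UKCA_WORKDIR/hadgem3/ancil/atmos_n48
ANCIL_LAND=$UKCA_WORKDIR/hadgem3/ancil/land/ORCA2_N48

# On ARCHER, report any of these directories that cannot be found.
for d in "$HGPKG2" "$STARTDUMPS" "$ANCIL_ATMOS" "$ANCIL_LAND"; do
    [ -d "$d" ] || echo "not found: $d"
done
```

Defining everything relative to UKCA_WORKDIR means that only one entry needs to change if the shared directory is ever relocated.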

Setting these variables should cover most of the files you are using, but you may want to check through the INITHIS file, which is found in the jobid directory located in your umui_jobs/ folder. This gives a listing of all the input files used, and you should be able to spot any directories that are out of place.

Tracer Initial conditions

The initial conditions for the UKCA species (set in Model Selection → Atmosphere → STASH → Initialisation of User Prognostics) may be set to a file which is located in the /work/n02/n02/ukca/ANCILS/QESM directory. If not, please copy the file(s) across from HECToR to your own /work/n02/n02 directory on ARCHER and change the path in the UMUI. You should also change the paths for the topog_index_mean and topog_index_stdev files to

/work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
/work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_stdev

at the top of this panel.

Start dump

The location of the standard UM start dump (initial condition file, which is found in Model Selection → Atmosphere → Ancillary and input data files → Start dump) has now changed to the /work/n02/n02/ukca/hadgem3/initial/atmos/N48L60 directory. If your job uses a different file, you will need to copy it across from HECToR to your own /work/n02/n02 directory on ARCHER and change the path in the UMUI.

Running Jobs

Differences

The key differences between the two machines can be found by diffing two of the UKCA standard jobs, e.g.

Job xfvfc Title UKCA-TropIsop HECToR Phase3 N48L60 QESM-A
Job xfvfd Title UKCA-TropIsop ARCHER cce N48L60 QESM-A
Difference in window subindep_Target_Machine
 -> Model Selection
   -> User Information and Target Machine
     -> Target Machine
Entry box: Machine name
 Job xfvfc: Entry is set to 'phase3.hector.ac.uk'
 Job xfvfd: Entry is set to 'login.archer.ac.uk'
Entry box: Define the number of processors East-West. (Must be 1 or an even number).
 Job xfvfc: Entry is set to '16'
 Job xfvfd: Entry is set to '8'
Entry box: Define the number of processors North-South
 Job xfvfc: Entry is set to '8'
 Job xfvfd: Entry is set to '9'

Difference in window subindep_FileDir
 -> Model Selection
   -> Input/Output Control and Resources
     -> Time Convention and SCRIPT Environment Variables.
Differences in Table Defined Environment Variables for Directories
 1,6c1,7
<  HGPKG2 /work/n02/n02/ros/HadGEM3-A/vn7.3/HGPKG2
<  STARTDUMPS /work/n02/n02/annette/hadgem3/initial/atmos/N48L60
<  UKCA_L60_DIR /work/n02/n02/ukca/ANCILS/L60
<  ANCIL_ATMOS /work/n02/n02/annette/hadgem3/ancil/atmos_n48
<  ANCIL_LAND /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48
<  QESM_DIR /work/n02/n02/ukca/ANCILS/QESM
---
>  UKCA_WORKDIR /work/n02/n02/ukca
>  HGPKG2 $UKCA_WORKDIR/HadGEM3-A/vn7.3/HGPKG2
>  STARTDUMPS $UKCA_WORKDIR/hadgem3/initial/atmos/N48L60
>  UKCA_L60_DIR $UKCA_WORKDIR/ANCILS/L60
>  ANCIL_ATMOS $UKCA_WORKDIR/hadgem3/ancil/atmos_n48
>  ANCIL_LAND $UKCA_WORKDIR/hadgem3/ancil/land/ORCA2_N48
>  QESM_DIR $UKCA_WORKDIR/ANCILS/QESM

Difference in window subindep_JobRes2
 -> Model Selection
   -> Input/Output Control and Resources
     -> Job submission, resources and re-submission pattern
       -> Follow on window
Entry box: Job time limit, for QSUB 
 Job xfvfc: Entry is set to '43200'
 Job xfvfd: Entry is set to '21600'
Entry box: Months 
 Job xfvfc: Entry is set to '2'
 Job xfvfd: Entry is set to '0'

Difference in window subindep_HandEdit
 -> Model Selection
   -> Input/Output Control and Resources
     -> User hand edit files
Differences in Table Hand edits
 23a24
>  ~ros/umui_jobs/hand-edits/archer/cce-7.3.ed Y

Difference in window subindep_Compile
 -> Model Selection
   -> Compilation and Modifications
     -> Compile options for the UM model
Check box: Change the system default for the max no of compilation processes?
 Job xfvfc: Entry is set to 'ON'
 Job xfvfd: Entry is set to 'OFF'

Difference in window subindep_Compile_User
 -> Model Selection
   -> Compilation and Modifications
     -> UM User Override Files
Differences in Table User machine overrides
 1a2
>  /home/ros/umui_jobs/overrides/archer_cce_7.3_machine Y


Difference in window subindep_Recon_Gen
 -> Model Selection
   -> Reconfiguration
     -> General Reconfiguration Options
Entry box: Define the number of processors East-West
 Job xfvfc: Entry is set to '12'
 Job xfvfd: Entry is set to '4'
Entry box: Define the number of processors North-South
 Job xfvfc: Entry is set to '8'
 Job xfvfd: Entry is set to '6'

Difference in window atmos_STASH_UserProgs
 -> Model Selection
   -> Atmosphere
     -> STASH
       -> Initialisation of User Prognostics
Differences in Table Specify Initialisation Option
 1,2c1,2
<  Blank Blank 7  /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
<  Blank Blank 7  /work/n02/n02/annette/hadgem3/ancil/land/ORCA2_N48/topog_index_stdev
---
>  Blank Blank 7  /work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_mean
>  Blank Blank 7  /work/n02/n02/ukca/hadgem3/ancil/land/ORCA2_N48/topog_index_stdev

Essentially, the domain decomposition (see above) needs to change as the nodes are different (24 cores per node, rather than 32 as on HECToR). The run-time also needs to change to reflect that ARCHER is faster: in the example above the job time limit is halved from 43200 to 21600 seconds. The N48 MetUM ancillaries have been moved to the /work/n02/n02/ukca directory. There is also a hand-edit and a compiler override for the cce Fortran compiler; these are needed for the UM and are not UKCA specific.