Difference between revisions of "Minutes of the code management group meeting 2015-03-04"

From UKCA

Revision as of 09:55, 4 March 2015

Present

Teleconference Numbers

  • UK Freefone: 0800 917 1956
  • Participant passcode: 51615813 then #

Agenda

  1. Minutes of the last meeting
  2. vn10.1
  3. GA7.0 update
  4. Rose on MONSooN
  5. Bugfixes
  6. Targets for vn10.2
  7. LFRic
  8. New supercomputer: Cray XC-40
    • vn7.3 on MONSooN XC-40
  9. A.O.B./D.N.M.
  10. Date of next meeting

Minutes

Minutes of the last meeting

vn10.1

GA7.0 update

Rose on MONSooN

Bugfixes

Targets for vn10.2

LFRic

New supercomputer: Cray XC-40

  • Paul Cresswell has sent LA the following about Rose stem jobs on XC-40:
All the vn10.1 tests on the IBM are also available on the Cray, including Mohit's new tests, although many of them 
have issues: some segfault, some cannot write their dumps in a reasonable time (this can sometimes be fixed by 
setting the right environment variables to tweak the MPI behaviour), and some run to completion but are not 
bit-comparable between runs. Mohit's new tests are in the first category (segfaulting). The hadgem_nd_ukca test is 
the only one which hits the particular error fixed by #346, so everything else must either not be executing this 
particular piece of code or be crashing out sooner.

The aqum_nd test is one of the few that seems to behave and be reproducible. Much of the UM 10.2 dev cycle will 
be spent fixing the others! The aqum_nd_comp_check (compiler checking) job also works, producing lots of warning 
messages that will be helpful in the future.

Please note that no XC40 tests are available in the standard collective groups (e.g. developer, ukca), as not everyone 
has access to the machine yet - it would not do to have rose-stem suites failing for users who do not have accounts 
set up. The Cray tests are all available individually by replacing "metohpc" with "meto_xc40" in the group name.
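
As a rough illustration of the renaming rule described above, the sketch below maps an IBM group name to its Cray equivalent by substituting "metohpc" with "meto_xc40". The function name `cray_group` and the example group name `metohpc_aqum_nd` are hypothetical and chosen only for illustration; the actual group names should be checked in the rose-stem suite itself.

```python
def cray_group(ibm_group: str) -> str:
    """Return the XC40 group name for an IBM rose-stem group name.

    Assumes the IBM name contains the substring "metohpc", as
    described in the note above.
    """
    if "metohpc" not in ibm_group:
        raise ValueError(f"not an IBM (metohpc) group name: {ibm_group!r}")
    return ibm_group.replace("metohpc", "meto_xc40")


if __name__ == "__main__":
    # Hypothetical group name, used only to show the substitution.
    print(cray_group("metohpc_aqum_nd"))
```

The resulting name would then be passed to rose stem in place of the IBM group name, assuming the account running it has access to the XC40.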

[The porting of Rose stem tests is] a joint effort - the UM System and HPC Optimisation teams are both working on it 
to varying degrees, but there's a lot that needs doing, so if a developer/code owner/test owner wants to prioritise 
fixing their own job and get the fixes to us, we'll gladly accept the help. Some things will be down to the technical 
teams though - for example, if lots of jobs are not bit-comparing then that's probably an issue with the compiler 
settings/optimisation environment variables and not something we'd expect a developer to want to focus on.

vn7.3 on MONSooN Cray XC-40

Any other business/do not minute

Date of next meeting