Met Office Virtual Machine
This page describes how to get UKCA at GA7 vn10.6+ working on the Met Office Virtual Machine.

Download and set-up the Virtual Machine

Prerequisites

This makes use of VirtualBox and Vagrant. You should use the most recent versions, if possible, and install them before proceeding.

You will also need an account on the Met Office Science Repository Service, known as the SRS, or MOSRS. You will need to make sure that you have access to the following projects:

  • um
  • JULES
  • SOCRATES
  • roses-u

at least. You can request access by emailing cms-support@ncas.ac.uk.

You should only do this on a local machine that has enough memory and is running Ubuntu 16.04 (ideally 16.04.2). Currently the machines that can be used are:

  • celsius

Set-up VirtualBox and Vagrant to use /scratch

You MUST change the VirtualBox & Vagrant settings to ensure that the guest machine disk is not placed in your /home directory.

The names of the example directories given below can be changed, but they must be placed somewhere in /scratch/[YOUR CRSID].

Note: because you will be putting the virtual disk images on /scratch, you will only be able to use the VM from one machine, e.g. celsius, brewer etc. You will never actually run from these directories - they are just used to hold the VM disk image. You will be running from the metomi-vms directory that you will learn about in the next section.
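For example, you could create the target directory first (replace [YOUR CRSID] with your own CRSID; the directory name is only a suggestion):

mkdir -p /scratch/[YOUR CRSID]/VirtualBox_VMs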

Change the VirtualBox preferences

To do this for VirtualBox, open it from the command line by typing

virtualbox

then click File -> Preferences and, under the General settings, change the Default Machine Folder to e.g. /scratch/[YOUR CRSID]/VirtualBox_VMs.

You should not use VirtualBox itself to create the VM; this will be done using Vagrant.

Change the Vagrant home directory

To ensure that the vagrant boxes are on scratch you need to set the $VAGRANT_HOME environment variable. To do this, put the following at the end of your .profile:

export VAGRANT_HOME=/scratch/[YOUR CRSID]/VirtualBox_VMs/.vagrant.d

This will then put it in the same directory as you have set for VirtualBox.
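To check that this has taken effect, you can re-source your .profile and print the variable (a quick sanity check, assuming a Bourne-style shell):

source ~/.profile
echo $VAGRANT_HOME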

Set-up the VM

The Rose/Cylc VM that this is based on can be found on GitHub at https://github.com/metomi/metomi-vms

You should download it, e.g. using git:

git clone https://github.com/metomi/metomi-vms.git

The current (as of 2017-01-19) recommended OS is Ubuntu 16.04, which is the default, although the VM also works with Ubuntu 15.10. Ubuntu 16.04 is recommended as it is the long-term support release.

For running UKCA you need a minimum of 6GB of RAM, with a recommended amount of 8GB (to be able to compile and run with rigorous compiler settings). You may also want to set up a shared directory with the host filesystem.

For macOS or GNU/Linux hosts, you don't need to run the full desktop environment, so you can delete desktop from the config.vm.provision line, and also comment out the v.gui = true line.

If you wish to mount a directory, you need to add a block, like the following, at the end of your Vagrantfile:

Vagrant.configure("2") do |config|
  # other config here

  config.vm.synced_folder "/path/on/host/machine", "/path/on/VM"
end

The /path/on/VM will be created. I suggest something like /mnt/Shared.
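Once the VM is up (see the next section), you can check from inside the VM that the share has been mounted, e.g. (using the suggested mount point; adjust to whatever path you chose):

ls /mnt/Shared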

Start the VM

When you have your Vagrantfile how you like it, on macOS or Linux, go to the metomi-vms directory and type

vagrant up

You should then get something similar to the following:

Bringing machine 'metomi-vm-ubuntu-1604' up with 'virtualbox' provider...
==> metomi-vm-ubuntu-1604: Box 'bento/ubuntu-16.04' could not be found. Attempting to find and install...
    metomi-vm-ubuntu-1604: Box Provider: virtualbox
    metomi-vm-ubuntu-1604: Box Version: >= 0
==> metomi-vm-ubuntu-1604: Loading metadata for box 'bento/ubuntu-16.04'
    metomi-vm-ubuntu-1604: URL: https://atlas.hashicorp.com/bento/ubuntu-16.04
==> metomi-vm-ubuntu-1604: Adding box 'bento/ubuntu-16.04' (v2.3.1) for provider: virtualbox
    metomi-vm-ubuntu-1604: Downloading: https://atlas.hashicorp.com/bento/boxes/ubuntu-16.04/versions/2.3.1/providers/virtualbox.box

The OS will then be downloaded and installed. This may take some time, possibly around 45 minutes depending on the speed of your internet connection.

When it is finished you will see the message:

==> metomi-vm-ubuntu-1604: Finished provisioning at Thu Jan 19 11:09:03 UTC 2017 (started at Thu Jan 19 10:53:43 UTC 2017)
==> metomi-vm-ubuntu-1604: 
==> metomi-vm-ubuntu-1604: Please run vagrant ssh to connect.

You should then use

vagrant ssh

to connect. The first time you do, you will be prompted for your MOSRS password and username, e.g.:

Met Office Science Repository Service password: 
Met Office Science Repository Service username: lukeabraham
Subversion password cached
Rosie password cached

Stopping the VM

To stop the VM you must first log out, e.g. by using Ctrl-d, then type vagrant halt. You should then see

==> metomi-vm-ubuntu-1604: Attempting graceful shutdown of VM...
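If you want to confirm that the machine has stopped, vagrant status will report its state (an optional check):

vagrant status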

Set-up the UM

File:Vm install.png
Figure 1: Gcylc window from the rose stem --group=vm_install -S CENTRAL_INSTALL=true -S UKCA=true --group=install_source command.

Start the VM using vagrant up in the metomi-vms directory. Now you can install and set up the UM. Further details are given in the metomi-vms documentation on GitHub, but you can just follow the instructions below to get started.

Currently only UM vn10.4 and above can be used on the VM.

2017-03-02: There is an issue with Cylc v7, so to use the UM properly you need to run the following command to install Cylc 6:

  • sudo install-cylc-6

This issue should be resolved in a few weeks.

Essentially, you should run the following commands, in this order:

  1. sudo install-um-extras
  2. um-setup
  3. install-um-data
  4. install-rose-meta

It can take several minutes to complete each of the above steps.
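If you prefer, the same steps can be chained into one line (a sketch; the commands are exactly those listed above, and each may still take several minutes):

sudo install-um-extras && um-setup && install-um-data && install-rose-meta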

You should then make a ticket on MOSRS with a milestone of Not for Builds, and use it to make a branch to put on your VM for running the required rose stem groups and also for making prebuilds. For example, I made a vn10.6 branch called vn10.6_prebuilds, as this gives a nice naming convention once the prebuilds are made. However, you should not use my branch, but instead make your own, e.g.

fcm branch-create --type=DEV --ticket=N prebuilds fcm:um.x_tr@vn10.6

where N is your ticket number and prebuilds is the name of the branch you will be making (it will end up being called vn10.6_prebuilds), and then check it out, e.g.

fcm checkout https://code.metoffice.gov.uk/svn/um/main/branches/dev/lukeabraham/vn10.6_prebuilds

Note: you should NOT use this branch; please make your own. You should not make any changes to it, so that you know the code within it is identical to the trunk at the UM version you are using (e.g. vn10.6).

You should then run the following command from inside the top-level directory of your new branch:

rose stem --group=vm_install -S CENTRAL_INSTALL=true -S UKCA=true --group=install_source

Note: if you are working at vn10.7 or higher you should instead use this command to install the mule utilities that have replaced cumf, pumf etc.:

rose stem --group=vm_install_mule,vm_install_ctldata -S CENTRAL_INSTALL=true -S UKCA=true --group=install_source

It might take about 10-15 minutes to do this step.

Upgrading to a new version

When a new version of the UM comes out, you will need to repeat the following steps to be able to run again:

  • um-setup
  • install-rose-meta
  • rose stem --group=vm_install -S CENTRAL_INSTALL=true -S UKCA=true --group=install_source on a new branch that you have made (you may have problems running this command on a direct checkout of the trunk).

Prebuilds

File:Gcylc prebuilds.png
Figure 2: Gcylc window from the rose stem --group=fcm_make --name=vn10.6_prebuilds -S MAKE_PREBUILDS=true command.

To speed up compilation of your code, you can use pre-compiled builds (prebuilds), where the UM source code has already been compiled and only the files that you have changed are compiled from scratch. This can greatly reduce compile time. To do this you should run the following command from inside the top-level directory of your new branch (which has no changes in it, so is effectively the same source code as the vn10.x trunk):

rose stem --group=fcm_make --name=vn10.6_prebuilds -S MAKE_PREBUILDS=true

Further information on pre-builds can be found in UMDPX04. It can take about an hour or so to make the prebuilds.

Testing the UM

File:Vm n48 eg omp noios.png
Figure 3: Gcylc running the rose stem -O offline --group=vm_n48_eg_omp_noios -S INTEGRATION_TESTING=true command using Graph View.

I would advise using a different branch from the one that you used to make the prebuilds, to prevent anything being accidentally overwritten, e.g.

fcm branch-create --type=DEV --ticket=N vm_testing fcm:um.x_tr@vn10.6
fcm checkout https://code.metoffice.gov.uk/svn/um/main/branches/dev/lukeabraham/vn10.6_vm_testing

You can then run the rose stem test suggested on the MOSRS page, e.g. cd into the top-level directory of your new branch and run the following:

rose stem -O offline --group=vm_n48_eg_omp_noios -S INTEGRATION_TESTING=true
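rose stem will normally open the gcylc GUI for the running suite; if you need to re-attach to it later you can do so with cylc gui (the suite name below is illustrative - rose stem names the suite after your working copy):

cylc gui vn10.6_vm_testing &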

Setting-up UKCA

Required input files

There are a number of missing ancillary and netCDF emissions files that need to be copied into the VM. These can be installed by running the install-ukca-data script provided as part of the VM install scripts. It will extract the required files from JASMIN and put them into the appropriate directories.
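For example (the script name is as given above; it is installed alongside the other VM helper scripts):

install-ukca-data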

vn10.6 example suite: u-ai906

When you open rosie go, add u as a data source and then search for u-ai906. This suite is a 3-hour run with some standard STASH included. It will take ~12 minutes for a fresh compile (~50s for a recompile), ~20s for reconfiguration, and ~12 minutes for the atmosphere step. You should right-click on the suite and click copy (and not checkout).
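If you prefer the command line, a copy can also be made with the rosie copy sub-command (assuming it is available on the VM; the GUI route above is the one described on this page):

rosie copy u-ai906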

The STASH output included in the suite is just representative, to cover various bits of UKCA code. Please feel free to add your own diagnostics to your own suite as needed.

Output is in

$HOME/cylc-run/[SUITE-ID]/work/1/atmos/atmosa.pa19810901_00


Known Issues

MOSRS ticket #2442 identified and fixed an issue with the gnu compiler (used on the VM) with the UM as a whole and with UKCA specifically. Ticket #2536 fixed an issue with the pressure-level output of the UKCA tracers.

If you are using vn10.6 you should include the following branch to fix the compiler problems:

branches/dev/lukeabraham/vn10.6_UKCA_gnu@31624

and the following branch to fix the pressure level issue:

branches/dev/lukeabraham/vn10.6_ukca_plev_tr_bug@30736

At vn10.4, vn10.5, and vn10.6 you should add the following compiler flag

-fdefault-double-8

to the fcflags_overrides variable in the Advanced compilation panel in Rose, which will correct the compiler problem.

Additional Set-up

Xconv

You will need to download Xconv (xconv1.93) from the NCAS CMS website.

You can download it with

wget http://cms.ncas.ac.uk/documents/xconv/_downloads/xconv1.93_linux_x86_64.tar.gz

Download it to $HOME/bin on the VM, cd into this directory, extract the tar-ball with tar -zxvf, and then create the following links:

ln -s xconv1.93 convsh1.93
ln -s xconv1.93 xconv
ln -s xconv1.93 convsh
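If $HOME/bin is not already on your PATH, you can add it, e.g. by putting the following in your .bashrc (a suggestion rather than a required step; recent Ubuntu login shells often add ~/bin automatically if it exists):

export PATH=$HOME/bin:$PATH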

Iris

There is an install-iris script provided, but you will need to set up modules yourself to be able to use it properly. The Anaconda install breaks Rose if it is put in your PATH; however, there is now an alias

conda

which will open a new terminal with all the Anaconda Python packages in its $PATH. This will allow you to use Rose in one terminal and Iris in another.
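In the terminal opened by the conda alias, a quick way to confirm that Iris is usable is to import it and print its version (a simple check, nothing more):

python -c "import iris; print(iris.__version__)"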

A handy way to use Python is to run ipython with the following arguments

ipython --pylab --logfile=ipython-`date +"%Y%m%d-%H%M%S"`.py

I have aliased this to

pylab

in my .bashrc. The

--pylab 

sets-up a MatLab-type environment (numpy, scipy, matplotlib all loaded using standard shortcuts), and

--logfile=ipython-`date +"%Y%m%d-%H%M%S"`.py

means that all commands are saved to a file of the format

ipython-YYYYMMDD-HHMMSS.py
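A minimal sketch of how this could be set up as the pylab alias in your .bashrc (using $(...) rather than backticks to avoid the quoting issues; the alias name matches the one above):

alias pylab='ipython --pylab --logfile=ipython-$(date +"%Y%m%d-%H%M%S").py'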

Debugging

Some handy commands for debugging UM jobs on the VM are below.

Get memory usage relative to print statements

You can diagnose the memory usage of the running UM job by doing the following:

  1. Change the job to run 1x1 rather than 1x2 (change the values of UM_ATM_NPROCX and/or UM_ATM_NPROCY in the suite.rc file) - this prevents any confusion in the following output from top
  2. Turn on flushing of the print buffer
  3. Change the run command to be /usr/bin/time -v --output=/home/vagrant/um-atmos.time um-atmos instead of um-atmos
  4. In an xterm window run the following command and pipe it to a file: top -b -d1 -u vagrant | grep --line-buffered um-atmos
  5. In another xterm window use the following command: tail -n 0 -qF atmos.fort6.pe0 and pipe this to another file
  6. You can combine these two files using tail -n 0 -qF to give the memory usage of the um-atmos executable at the same time as the print statements from the model (see the sketch after this list)
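A sketch of steps 4-6, assuming hypothetical log-file names (the top and tail commands are exactly those given above; run the first two in the background or in separate xterms):

top -b -d1 -u vagrant | grep --line-buffered um-atmos > top-memory.log &
tail -n 0 -qF atmos.fort6.pe0 > model-prints.log &
tail -n 0 -qF top-memory.log model-prints.log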

This can give you output like:

ATMOS_PHYSICS1:calling cosp_init
12154 vagrant   22   2 3642600 1.306g  15088 R  96.6 16.7   0:03.64 um-atmos.exe
12154 vagrant   22   2 4167928 1.657g  15088 R  95.9 21.3   0:03.74 um-atmos.exe
12154 vagrant   22   2 7713796 2.059g  15088 R 100.0 26.4   0:03.85 um-atmos.exe
12154 vagrant   22   2 7713796 2.461g  15088 R  95.4 31.6   0:03.95 um-atmos.exe
12154 vagrant   22   2 7713796 2.863g  15088 R  95.6 36.7   0:04.05 um-atmos.exe
12154 vagrant   22   2 7713796 3.260g  15088 R 100.0 41.8   0:04.16 um-atmos.exe
12154 vagrant   22   2 7713796 3.663g  15088 R 100.0 47.0   0:04.27 um-atmos.exe
12154 vagrant   22   2 7713796 4.061g  15088 R  96.5 52.1   0:04.37 um-atmos.exe
12154 vagrant   22   2 7713796 4.470g  15088 R 100.0 57.3   0:04.48 um-atmos.exe
12154 vagrant   22   2 7713796 4.872g  15088 R  96.3 62.5   0:04.58 um-atmos.exe 
ATMOS_PHYSICS1:left cosp_init

Using the time command will give the following output

Command being timed: "um-atmos"
User time (seconds): 518.43
System time (seconds): 84.94
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:04.59
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4260268
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 684122
Voluntary context switches: 272816
Involuntary context switches: 266901
Swaps: 0
File system inputs: 0
File system outputs: 1435448
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Where the line

Maximum resident set size (kbytes): 4260268

above gives the maximum memory used by the job. It is also helpful to run 1x1 here, because if you run 1x2 this value will only be ~50% of the true maximum.


Written by Luke Abraham 13th November 2017