Monsoon users MASS data

Revision as of 16:40, 2 August 2019 by Mdalvi

Management of Monsoon users' MASS data

Background

The UKCA project on the Monsoon HPC system has been active for almost 10 years and, at the last count (Jan 2018), had archived nearly 1.5 petabytes of data to the MASS tape system. A significant part of this data becomes redundant as time progresses. In some instances, the archived data is from test simulations and should have been deleted as soon as its purpose was fulfilled. In other cases, the data is no longer relevant because the configuration or settings used are now outdated and/or any value that could be derived from it (for PhD theses, publications, projects) has already been obtained; such data should be safe to delete.
A related issue is that until late 2017 the MASS system was configured to store all data in duplex form (two copies) by default, possibly to reduce the chance of data loss from physical damage to the tapes or drives. This means that any (now redundant) data generated before that time occupies double the amount of space.
One limitation for Monsoon has been that any data, once archived to MASS, is owned by the Principal Investigator (PI) on the Met Office side, and the data creators themselves have no permission to delete it. This is not expected to change any time soon, so a central method of data management becomes necessary.

Current status

As of now (Aug 2019), all data generated by UKCA (and UKCA subprojects) is owned by Colin Johnson as the original PI for the project. Following a general activity to review data held on MASS and optimise the demand for tapes, an attempt was made in January 2018 to attribute at least the datasets resulting from UMUI runs to their original creators (i.e. the owners of these jobs) and to ask them to review the need to hold the data. This was done by taking the datasets on MASS that are owned by Colin and have an x prefix (the recommended job-id group for Monsoon runs) and cross-referencing them with the UMUI database on the PUMA system. An e-mail was subsequently sent to the users concerned, listing the job and data details and asking them to review the datasets and report back to Colin those that could be deleted. It is not clear whether any data deletion was requested as a result of the e-mail.
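The cross-referencing step described above can be illustrated with a minimal Python sketch. It is not the script that was actually used: the function name, the dataset names and the owner mapping are all hypothetical, and it assumes that the MASS dataset list and the UMUI job-to-owner records have already been exported to a plain list and a dictionary respectively.

```python
from collections import defaultdict


def attribute_datasets(mass_datasets, umui_owners, prefix="x"):
    """Group MASS dataset names by UMUI job owner (hypothetical helper).

    mass_datasets: iterable of dataset names, assumed to equal the job ids.
    umui_owners:   dict mapping job id -> owner user name, assumed to have
                   been exported from the UMUI database beforehand.
    prefix:        job-id prefix to consider ("x" for Monsoon runs).
    """
    by_owner = defaultdict(list)
    unmatched = []
    for name in mass_datasets:
        if not name.startswith(prefix):
            continue  # not in the x job-id group, so out of scope here
        owner = umui_owners.get(name)
        if owner is None:
            unmatched.append(name)  # no UMUI record found for this job id
        else:
            by_owner[owner].append(name)
    return dict(by_owner), unmatched


# Made-up example data, for illustration only.
mass_datasets = ["xabcd", "xabce", "xzzzz", "u-ab123"]
umui_owners = {"xabcd": "user1", "xabce": "user2"}

by_owner, unmatched = attribute_datasets(mass_datasets, umui_owners)
```

Here `by_owner` gives the per-user list that could go into each review e-mail, while `unmatched` collects x-prefixed datasets with no corresponding UMUI record, which would need to be followed up separately.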
Following Colin's retirement from his Met Office role in 2018, the issue of transferring the ownership of this data has arisen. After discussion with interested parties at the Met Office, it was proposed that the Monsoon-UKCA data ownership on MASS be shared between Fiona O'Connor (Manager, Atmospheric Composition and Climate team) and Adrian Hill (Manager, Aerosol group