Norwegian Service Centre for Climate Modelling

/noserc User Guide

Introduction

This guide is written for those who need to:

  • access data stored on the /noserc filesystem,
  • know how to use the Data Migration Facility on /noserc,
  • find information about the /noserc disk,
  • get the highest performance out of the /noserc disk.

Tools

The tools described in this document can be found in the directory /noserc/felles/bin on gridur.ntnu.no.
Manpages for the tools developed by NoSerC are also stored in this directory, while the DMF manpages are stored in /usr/dmf/dmbase/dmbase/man.
To use the tools and view the manpages, set the following environment variables (shown here for the bash shell):
export PATH=$PATH:/usr/dmf/dmbase/dmbase/bin:/noserc/felles/bin
export MANPATH=$MANPATH:/usr/dmf/dmbase/dmbase/man:/noserc/felles/bin
Note that /usr/dmf/dmbase/dmbase/bin and /usr/dmf/dmbase/dmbase/man may already be set for you.
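Because these directories may already be set, and ~/.bashrc may be read more than once, a small helper that appends a directory only when it is missing keeps PATH and MANPATH free of duplicates. This is a sketch for bash or any POSIX sh; append_path is a hypothetical helper name, not a NoSerC tool. Like the export lines above, an empty variable ends up as ":dir".

```shell
# Append a directory to a colon-separated search-path variable (PATH, MANPATH)
# unless it is already listed, so re-reading ~/.bashrc does not add duplicates.
append_path() {
    # $1: name of the variable, $2: directory to append
    eval "_current=\${$1-}"
    case ":$_current:" in
        *":$2:"*) ;;                      # already present: leave unchanged
        *) eval "$1=\$_current:\$2" ;;    # same result as VAR=$VAR:dir
    esac
    eval "export $1"
}

append_path PATH /usr/dmf/dmbase/dmbase/bin
append_path PATH /noserc/felles/bin
append_path MANPATH /usr/dmf/dmbase/dmbase/man
append_path MANPATH /noserc/felles/bin
```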

What is DMF?

DMF (Data Migration Facility) manages online disk resources by detecting when the free space on a filesystem drops below its configured thresholds and then moving data from disk to tape. Which files are migrated depends on their size and age: large files and files that have not been accessed for some time are moved from disk to tape.

To the user, files appear to be on disk at all times. When a file is accessed, DMF looks it up in its database to check whether the data is already on disk or has to be retrieved from tape; in the latter case, retrieving the file may take up to a minute. Note that when several files are migrated, they may be written to different tapes. Nor can you assume that files from the same directory are on the same tape.
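DMF's actual selection rules and thresholds are configured by the system administrators and are not described here. As a rough heuristic only, the sketch below lists regular files under a directory that have not been accessed for more than a given number of days, the kind of files DMF tends to migrate. stale_files is a hypothetical helper; it uses only standard find options, and since find only reads file attributes (as ls -l does), running it should not recall migrated files.

```shell
# Rough heuristic, not DMF's real policy: print regular files under DIR whose
# last access time is more than DAYS whole days ago.
stale_files() {
    dir=$1 days=$2
    find "$dir" -type f -atime +"$days" -print
}

# Example (hypothetical path): files untouched for more than 30 days
# stale_files /noserc/noclim/myrun 30
```

To restrict the list to large files as well, a standard -size +N test (N in 512-byte blocks) can be added to the find command.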

To determine the state of a file, use the dmls -l command as shown below:

gridur 34% dmls -l
total 256
-rw-r--r--    1 noserc   noserc      2583 Oct  5 11:59 (OFL) dmdu.1
-rw-r--r--    1 noserc   noserc      2102 Oct 18 13:08 (OFL) dmtouch.1
-rw-r--r--    1 noserc   noserc      2194 Oct  5 10:33 (DUL) gribdump.1
-rw-r--r--    1 noserc   noserc      2032 Oct  8 12:10 (OFL) gribtocdl.1
-rw-r--r--    1 noserc   noserc      5629 Oct  5 10:33 (OFL) gribtonc.1
-rw-r--r--    1 noserc   noserc      8305 Oct  5 13:10 (DUL) ncdump.1
-rw-r--r--    1 noserc   noserc     14371 Oct  5 13:10 (OFL) ncgen.1
-rw-r--r--    1 noserc   noserc      1888 Oct  5 13:17 (OFL) udunits.1
The file status is listed before the filename. The status may be:

(REG) File not managed by DMF, file is only on disk
(MIG) Migrating, file is being copied to tape
(DUL) Dual-state, file is both on disk and tape
(OFL) Offline, file is now on tape only
(UNM) Unmigrating, file is being copied from tape
(NMG) Nonmigratable file, file cannot be migrated by DMF
(INV) DMF cannot determine the file's state
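When scripting, the state column of dmls -l can be filtered with standard awk. A minimal sketch, assuming the column layout in the listing above (state in the 9th whitespace-separated field, file name in the 10th); files_in_state is a hypothetical helper name, and file names containing blanks are not handled.

```shell
# Read "dmls -l" output on stdin and print the names of the files whose
# DMF state matches $1 (for example OFL or DUL).
files_in_state() {
    awk -v want="($1)" '$9 == want { print $10 }'
}
```

For example, dmls -l | files_in_state OFL lists the files in the current directory that are on tape only.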

All normal UNIX commands work on files managed by DMF, but some may give unexpected results. For instance, ls -l reports the full size of a file even if its data is stored only on tape, while du in the same directory sums only the space actually occupied on disk. Thus ls -l may show several hundred MB of files while du reports that almost no disk space is used. To view both disk and tape usage in a directory, use the command /noserc/felles/bin/dmdu (see the dmdu (1) man-page).

The example below shows a directory structure where most of the files have been migrated to tape. Note that du counts disk usage in 512-byte blocks (the option -k converts this to KB), while dmdu sums the actual file sizes in bytes without rounding up to whole blocks (with -k, as here, the result is reported in KB). In this example about 20 GB of disk space is used, while all 172 GB of files also exist on tape.

gridur % du -k .
64      ./eulmet1997
64      ./eulmet2000
1857408 ./eulmet1995
64      ./eulmet1996
64      ./eulmet1990
64      ./eulmet1999
64      ./eulmet1998
18030400        ./eulmetold96
19888192        .
gridur % dmdu -s -k .

 Total     On disk   On tape      Directory
      513       512         1 KB .
 21494584        64  21494520 KB ./eulmet1990
 21494584   1857344  21494520 KB ./eulmet1995
 21553314        64  21553250 KB ./eulmet1996
 21494584        64  21494520 KB ./eulmet1997
 21494584        64  21494520 KB ./eulmet1998
 21494584        64  21494520 KB ./eulmet1999
 21553314        64  21553250 KB ./eulmet2000
 21590014  18029614  21589950 KB ./eulmetold96

172170075  19887854 172169051 KB .
       97        11        97  # files and 8 directories
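dmdu is the proper tool for this, since it recurses into subdirectories. For a quick tally of a single directory, the sizes in dmls -l output can also be summed per DMF state. A minimal sketch, assuming the column layout of the dmls -l listing shown earlier (size in the 5th field, state in the 9th); bytes_by_state is a hypothetical helper name.

```shell
# Read "dmls -l" output on stdin and print the total size in bytes of the
# files in each DMF state, one "STATE bytes" line per state, sorted by state.
bytes_by_state() {
    awk '$9 ~ /^\(...\)$/ { s = substr($9, 2, 3); sum[s] += $5 }
         END { for (s in sum) printf "%s %.0f\n", s, sum[s] }' | sort
}
```

Run as dmls -l | bytes_by_state; the OFL line then shows how many bytes in that directory are on tape only.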

Some UNIX commands that modify file attributes may have problems with files managed by DMF. For example, touch may be used on these files (it resets the "last accessed/modified" time to "now"), but the options -a and -m are not allowed and return an error message. Note also that touch does not retrieve migrated files. NoSerC has therefore written the command dmtouch, which retrieves files from tape and also updates their "last access time" so that they are not migrated again as quickly.

A number of commands are available to check file status, migrate files to and from tape, and so on. See the Documentation section or the manpages on the computer.
Hint: the command man dmdu dmtouch dmattr dmcopy dmfind dmget dmls dmput | a2ps -m -o ~/dm.ps generates a PostScript file ~/dm.ps containing all the named man-pages, which you can print out for reference.

Disk/tape utilization

Continuously updated system utilization is available through Notur-Palantir (click on the Palantir picture, select the "gridur" computer and click on the /noserc icon). This graphical interface also shows the disk usage history for the last few days, so you can easily see when the disk filled up and data migration was started; see the example below.
[Figure: a chart showing /noserc disk usage over several days]
The Palantir interface does not show how much data is actually stored on tape. That information is available through the script /noserc/felles/bin/dmdu, but only for the directories and files that you have read access to. See the example under What is DMF? and the dmdu manpage for further information.

User support and FAQ

Questions concerning the computer system in general should be sent to support@notur.org.
Before you send your questions, please look through the rest of this document and the User Support document at http://notur.ntnu.no/. The User Support document also has some useful FAQ lists.
Questions regarding the Data Migration Facility can also be sent to noserc@met.no.

Using the /noserc disk

Assuming that you have already read "What is DMF?", you may wonder how to get the most out of this disk.

First you must consider what type of operation is to be performed on the files.

If each file is only read once (typically when copying files or transferring them to other computers), no preparation is needed before starting. The files are copied or transferred one after another, with a long pause whenever a file first has to be retrieved from tape storage. As new files are retrieved, files that are no longer needed may be migrated back to tape. This happens without any user intervention.

If multiple files are to be used at the same time (typically as program input or data files, which is not recommended; see below), you should first make sure that the files are available on disk and that they will not be migrated during the program run. Use the command dmls -l to check file status; (REG) or (DUL) means that the file is on disk. Then use the command dmget to force a file back to disk. If you are concerned that the file may be migrated again, use the dmtouch command to update its last access time, so that it will not be "first in line" during the next automatic migration. Use this method with care, however: if many users do this, you will all be competing for the same disk space and eventually your files will be migrated again.
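The check-then-recall step above can be scripted. A minimal sketch, assuming the dmls -l column layout shown earlier and that dmget accepts a list of file names; stage_files is a hypothetical helper, and DMLS and DMGET are hypothetical override variables (for example DMGET=echo for a dry run). Following the rule above, every file whose state is not (REG) or (DUL) is passed to dmget.

```shell
# Recall (dmget) those of the named files that are not already on disk.
# File names containing blanks are not handled.
stage_files() {
    # Field 9 of "dmls -l" output is the state, e.g. "(OFL)"; field 10 is the name.
    need=$(${DMLS:-dmls} -l "$@" |
        awk '$9 ~ /^\(/ && $9 != "(REG)" && $9 != "(DUL)" { print $10 }')
    [ -n "$need" ] || return 0      # everything is already on disk
    # Pass all names to a single dmget call; word splitting is intended here.
    ${DMGET:-dmget} $need
}
```

For example, stage_files input*.nc before a run, optionally followed by dmtouch on the same files.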

The preferred solution when /noserc files are to be processed by programs is to copy them to a work area (/work), where they reside only for the duration of the program run. In most cases this will also speed up your program, since the normal work disks have higher access speeds than the /noserc disk. Also, if you run a parallelized batch program that tries to access migrated files, the program halts until the files are restored from tape, possibly keeping its allocated processors idle, and unavailable to other users, during the wait. This is a waste of resources.
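A job script following this advice might copy its inputs to /work, run there and clean up afterwards. A minimal sketch under stated assumptions: run_in_work is a hypothetical helper, WORKROOT is a hypothetical override for the work-area root (default /work), and the program itself must copy its results out of the work directory, since the directory is deleted when it finishes. To avoid idle processors, the inputs should ideally be recalled with dmget before the parallel job starts.

```shell
# Copy the contents of SRC_DIR (on /noserc) into a private directory under
# /work, run COMMAND there, then delete the work copy.
run_in_work() {
    src_dir=$1; shift
    workdir="${WORKROOT:-/work}/${USER:-job}.$$"
    mkdir "$workdir" || return 1
    # Migrated files are recalled by DMF as cp reads them.
    if ! cp -R "$src_dir"/* "$workdir"/; then
        rm -rf "$workdir"
        return 1
    fi
    ( cd "$workdir" && "$@" )       # run the program on the work copy
    rc=$?
    rm -rf "$workdir"               # the work area is only needed during the run
    return "$rc"
}
```

Usage example (hypothetical paths): run_in_work /noserc/regclim/run42 ./model input.nml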

Avoid storing small files, or large collections of files, as separate files. To get the best response from the file system, always pack such files into archives using the tar command. Retrieving a single tar file containing 100 files from tape is typically much quicker than retrieving 10 separate small files, because the time-consuming step is always the tape mount/unmount; file size has much less influence on the time spent retrieving data from tape. Consider the following example, where 12 migrated files of about 810 KB each are retrieved one by one:

gridur 50% dmls -l *89*
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:12 (OFL) apr89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:12 (OFL) aug89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:12 (OFL) dec89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) feb89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) jan89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) jul89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) jun89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) mar89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) may89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) nov89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) oct89.txt
-rw-r--r--    1 noserc   noserc    810679 Sep 10 11:13 (OFL) sep89.txt
gridur 51% time cp *89* /noserc/felles/tst/.
0.001u 0.151s 5:44.94 0.0% 0+0k 0+2io 0pf+0w

That took a total of 5 min 45 s to retrieve about 9.7 MB of data from tape, an average of almost 30 s per file. Now compare this with extracting the same files from a single tar file:

gridur 84% dmls -l noaa89.tar
-rw-r--r--    1 arildbu  noserc   9748480 Jan  3 12:22 (OFL) noaa89.tar
gridur 85% time tar -xvf noaa89.tar
x apr89.txt, 810679 bytes, 1584 blocks
x aug89.txt, 810679 bytes, 1584 blocks
x dec89.txt, 810679 bytes, 1584 blocks
x feb89.txt, 810679 bytes, 1584 blocks
x jan89.txt, 810679 bytes, 1584 blocks
x jul89.txt, 810679 bytes, 1584 blocks
x jun89.txt, 810679 bytes, 1584 blocks
x mar89.txt, 810679 bytes, 1584 blocks
x may89.txt, 810679 bytes, 1584 blocks
x nov89.txt, 810679 bytes, 1584 blocks
x oct89.txt, 810679 bytes, 1584 blocks
x sep89.txt, 810679 bytes, 1584 blocks
0.034u 0.578s 0:40.59 1.4% 0+0k 3+12io 0pf+0w

This time the whole operation, including the retrieval of the single tar file from tape, took about 40 s. Packing and unpacking tar files takes little time compared with the tape handling, and will give you the best performance on /noserc.
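Using the elapsed times reported by time in the two runs above (5:44.94 for the separate files, 0:40.59 for the tar file including extraction), the per-file cost works out as follows:

```latex
\begin{aligned}
\text{separate files:}\quad & \frac{5\,\text{min}\ 44.94\,\text{s}}{12\ \text{files}} = \frac{344.94\,\text{s}}{12} \approx 28.7\,\text{s per file} \\
\text{one tar file:}\quad & \frac{40.59\,\text{s}}{12\ \text{files}} \approx 3.4\,\text{s per file} \\
\text{speed-up:}\quad & \frac{344.94\,\text{s}}{40.59\,\text{s}} \approx 8.5
\end{aligned}
```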

Note that tar files should be packed and unpacked on the /work disk. If you keep unpacked files on the /noserc disk, you may interfere with the migration of files while you work with them. Consider the following problem: while you are collecting hundreds of files into a tar file, DMF starts migrating them to tape, causing a 30 to 60 second delay for each file, since it has to be recalled before it can be added to the tar file.
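The pack-on-/work, then copy-one-archive workflow can be sketched as below, assuming the files to be archived are already on /work. archive_to_noserc is a hypothetical helper; WORKROOT (default /work) and CP_CMD are hypothetical override variables. The default copy command cp -D follows the advice given below for large files and is an IRIX cp option, so CP_CMD must be set to plain cp on other systems.

```shell
# Pack SRC_DIR into a tar file on the work area, copy that single archive
# to DEST_TAR (on /noserc), then remove the temporary archive.
archive_to_noserc() {
    src_dir=$1 dest_tar=$2
    tmp_tar="${WORKROOT:-/work}/$(basename "$dest_tar").$$"
    # cd first so that the archive holds relative paths.
    if ! ( cd "$src_dir" && tar -cf "$tmp_tar" . ); then
        rm -f "$tmp_tar"
        return 1
    fi
    if ! ${CP_CMD:-cp -D} "$tmp_tar" "$dest_tar"; then
        rm -f "$tmp_tar"
        return 1
    fi
    rm -f "$tmp_tar"
}
```

Usage example (hypothetical paths): archive_to_noserc /work/$USER/noaa89 /noserc/felles/tst/noaa89.tar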

Copying very large files (over 100 MB) to the /noserc disk has been found to cause problems because the data is buffered in memory during the copy. There have been situations where the computer ran out of memory during a file copy and crashed completely. To avoid this, use the cp option -D when copying files to /noserc; it makes sure that the file is copied without memory buffering. The same problem may also occur when moving files with mv, but mv has no option to skip memory buffering, so in this case first copy the files with cp -D and then remove the originals with rm.
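The "cp -D, then rm" replacement for mv can be wrapped so that the original is removed only if the copy succeeded. A minimal sketch: move_to_noserc is a hypothetical helper that handles one source at a time, and CP_CMD is a hypothetical override variable whose default, cp -D, is the IRIX option described above (not available in GNU or BSD cp).

```shell
# Move SOURCE to DEST via "cp -D" + "rm", as a replacement for mv.
move_to_noserc() {
    if [ $# -ne 2 ]; then
        echo "usage: move_to_noserc SOURCE DEST" >&2
        return 2
    fi
    # Remove the source only if the copy reported success.
    ${CP_CMD:-cp -D} "$1" "$2" && rm -f "$1"
}
```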

Access control

To support access control for the files stored on /noserc, we have set up a directory structure that matches the research projects registered so far:

        /noserc/chemclim/
        /noserc/noclim/
        /noserc/regclim/
        /noserc/emep/
        /noserc/felles/

The directory names reflect the major climate projects known to need large amounts of disk space. /noserc/felles contains files that are available to all users (unrestricted access). For future projects, new directories /noserc/project will be created as the need arises.

Each directory can only be accessed by users who belong to the corresponding unix group (the directory name and the unix group name are identical). Users cannot create files at a higher directory level than their project directory. If sub-projects want to restrict access further, this can be arranged by creating sub-directories and new unix groups. The group name will then be a combination of the project name and the sub-project name. As an example, the task "Oppgave 4" in NoClim has its own restricted area in /noserc/noclim/oppgave4/ for users with unix group noclimo4.

It is up to each user to set file attributes so that access within a directory (or to individual files) is as desired. See the manpages for the unix commands chmod and chgrp for more information.
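For example, to let the members of your project group read a directory tree while keeping everyone else out, chgrp and chmod can be combined as below. This is a sketch only: share_with_group is a hypothetical helper, the group must be one you belong to (for example noclim or noclimo4), and the mode shown gives the group read access but not write access.

```shell
# Give GROUP read access to PATH (recursively) and remove all access for others.
# The capital X adds execute/search permission only for directories
# (and for files that are already executable).
share_with_group() {
    group=$1 path=$2
    chgrp -R "$group" "$path" &&
    chmod -R u+rwX,g+rX,g-w,o-rwx "$path"
}
```

Usage example (hypothetical path): share_with_group noclimo4 /noserc/noclim/oppgave4/results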

Requests for new projects or sub-projects, or for changes in user access, should be addressed as described under Requests.

Requests

Each user who wants to access the /noserc disk must request access from NoSerC. The request should be addressed to noserc@met.no. This may also be done as a collected request for a new project. For each user we need:

  • Username on gridur.ntnu.no (if already a registered user)
  • Full name and address
  • Project name and, if necessary, sub-project(s)
  • E-mail address of the user

If you do not yet have a user ID on NOTUR, see the NOTUR Computer Grants Page and the information and request procedures found there.

We want to keep a list of the names and e-mail addresses of all NoSerC users, so that we can efficiently send news to everyone involved. Please contact NoSerC if you are not already on our e-mail list.

Documentation

Norwegian High Performance Computing Consortium
The NTNU HPC project.

dmdu (1) man-page.
dmtouch (1) man-page.

Man-pages for the Data Migration Facility from http://techpubs.sgi.com:

dmattr (1) - Acquires DMF attributes of a file
dmcopy (1) - Copies all or part of the data from a migrated file to an online file
dmfind (1) - Searches for files in a directory hierarchy
dmget (1) - Recalls previously migrated files
dmls (1) - Lists contents of directories
dmput (1) - Migrates online files to offline media

For further reading about DMF, see the manual DMF Administrator's guide for IRIX® Systems, document no. 007-3681-005, available from http://techpubs.sgi.com.