2. General Information

2.1. How to log into Svante

To log in to the Svante cluster, from a terminal window on your local computer, type:

ssh -Y «username»@svante-login.mit.edu

A password will be requested; use your athena password (your svante «username» above is the same as your athena «username»). After you log in, you should receive a terminal prompt with your working directory set to /home/«username»; you will be logged into the Svante cluster login (or “head”) node, svante-login. See Section 6 for discussion of proper uses of the head node, file server nodes, and compute nodes.
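
If you connect often, an entry in the ~/.ssh/config file on your local computer lets a short alias do the same thing. This is an optional sketch (the alias name svante is arbitrary, and the two ForwardX11 lines are the config-file equivalent of the -Y flag):

Host svante
    HostName svante-login.mit.edu
    User «username»
    ForwardX11 yes
    ForwardX11Trusted yes

With this in place, typing ssh svante is equivalent to the full command above.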

2.2. /home spaces

/home/«username»/: we have about 100 terabytes (TB) of total ‘home space’ for general-purpose usage: source code, plots and figures, model builds, etc. Every svante user is given disk space here; the quota on home space is 500 gigabytes (GB) per user. /home is backed up daily (offsite) and protected from disk failure via a RAID array. Svante /home space is mounted on all nodes in the cluster. Home space is not intended as a repository for large data files (or large numbers of them) for analyses, nor as disk space for large model runs.
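
To check how much of your 500 GB quota you are using, one simple option (a sketch using standard Linux tools; Svante may also provide a dedicated quota-reporting command) is:

du -sh /home/«username»

which prints the total size of everything under your home directory.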

2.3. File servers

Svante file servers are named fsxx (see Table 2.1), currently fs01-fs11, with a total capacity of roughly 3 PB at present. To get onto a file server node, for example, typing

ssh fs02

(from svante-login or any other node in the Svante cluster) will give you a shell on file server fs02. Once you ssh to a file server, local disks should be accessed as /d0, /d1, /d2, and /d3; note that not all partitions are present on every file server. From all other nodes, this space can be reached through ‘remote’ mounts; for example, /net/fs02/d0 accesses the /d0 partition on fs02. Storage on these file servers is for runs, experiments, downloaded data, etc. that will be accessed and kept over longer-term periods, and these spaces are backed up (weekly offsite backup, although in periods of heavy use, backups may take longer to complete). These machines can also be accessed externally (i.e. from outside svante): fs02 is svante2.mit.edu, fs03 is svante3.mit.edu, etc. (see Table 2.1). User directories of various sizes, organized by research group, will be created and allocated by Jeff on a project-based, as-needed basis. These disk spaces are reserved for research-related work; they are not intended as “personal” storage spaces, and any such use will not be tolerated.
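
As a quick sketch of the two access modes (the path component «group_dir» below is a hypothetical placeholder for whatever directory has been created for your group):

# logged into fs02: use the local path
ls /d0/«group_dir»

# from svante-login or any other node: use the remote mount
ls /net/fs02/d0/«group_dir»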

2.4. Compute nodes

Svante includes a large pool of compute nodes, which comprise the main computational engine of the cluster and are interconnected through a fast InfiniBand network. Users cannot ssh directly into compute nodes; instead they must go through the SLURM scheduler, see Section 4 (the exception is that if a user has a job running on a compute node, ssh to that specific node is permitted).
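
For example, one common way to get a shell on a compute node through the scheduler is an interactive SLURM job (a minimal sketch; see Section 4 for the recommended options, partitions, and time limits, which are omitted here):

srun --pty -N 1 -n 1 bash

When the allocation is granted, your prompt moves to the assigned compute node; typing exit releases the node.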

Users can also access local disk space (/scratch) on any given compute node, as limited by the size of its disk (see Table 2.1). Every user has unlimited storage in these spaces, the caveat being that there is no safeguard against you or another user filling up a node’s local disk. These spaces are not backed up, and any files left longer than six months will be deleted without notice. Local scratch spaces can also be accessed (say, from svante-login or a file server node) through remote mounts /net/«cxxx»/scratch, where «cxxx» denotes the compute node name (see Table 2.1), although not all remote scratch disks may be available to other compute nodes (i.e. on a compute node, use its local scratch disk rather than another compute node’s scratch space via a /net mount).
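
For example, to see how much space your files occupy on a particular node’s scratch disk from the login node (assuming, hypothetically, that you keep your files in a directory named after your username):

du -sh /net/c065/scratch/«username»

whereas a job running on c065 itself should read and write the same files via the local path /scratch/«username».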

2.5. Archive space

Server fs01 contains two archive spaces. /data holds downloaded external data, e.g. reanalysis data, CMIP data, etc. Let us know if you require external data sets to be stored on svante; if we think such data might benefit general users, we would be happy to store them here. /archive holds the file server spaces of users who are no longer actively working at MIT; these spaces may be compressed or uncompressed, depending on the data format and the likelihood that access will be required. Our general policy is to maintain users’ data in perpetuity. Current svante users do not have individual spaces on fs01, nor can users ssh to fs01 to use this server for computational analysis purposes.

2.6. Svante node specs

Table 2.1 Svante Node Specifications
Node(s)       | External Name | # Cores | RAM (GB) | CPU Arch.                  | CPU Speed (GHz) | IB  | Partition | Size (TB) | Usage
--------------|---------------|---------|----------|----------------------------|-----------------|-----|-----------|-----------|------------------
svante-login  | svante-login  | 16      | 64       | broadwell E5-2609 v4       | 1.70            | EDR |           |           | login node
c041 - c060   |               | 16      | 64       | sandy bridge E5-2670       | 2.60            | FDR | /scratch  | 2         | compute nodes
stooges       |               | 24      | 128      | haswell E5-2680 v3         | 2.50            | FDR | /scratch  | 4         | compute nodes
c061 - c096   |               | 32      | 128      | broadwell E5-2697A v4      | 2.60            | EDR | /scratch  | 4         | compute nodes
fs01          |               | 20      | 512      | xeon scalable Silver 4210  | 2.20            | EDR | /data     | 570       | external datasets
              |               |         |          |                            |                 |     | /archive  | 570       | old user archive
fs02          | svante2       | 16      | 96       | sandy bridge E5-2660       | 2.20            | FDR | /d0       | 60        | Solomon
              |               |         |          |                            |                 |     | /d1       | 40        | Heald
              |               |         |          |                            |                 |     | /d2       | 40        | CGCS + JP
              |               |         |          |                            |                 |     | /d3       | 40        | Solomon
fs03          | svante3       | 24      | 512      | xeon scalable Silver 4214R | 2.40            | EDR | /d0       | 250       | Selin
              |               |         |          |                            |                 |     | /d1       | 120       | Selin
fs04          | svante4       | 16      | 128      | broadwell E5-2620 v4       | 2.10            | EDR | /d0       | 100       | Land Group
              |               |         |          |                            |                 |     | /d1       | 100       | Land Group
              |               |         |          |                            |                 |     | /d2       | 255       | Land Group + JP
fs05          | svante5       | 4       | 24       | nehalem W3530              | 2.80            | FDR | /d0       | 40        | Land Group
              |               |         |          |                            |                 |     | /d1       | 120       | Land Group
fs06          | svante6       | OFFLINE |          |                            |                 |     |           |           |
fs07          | svante7       | 16      | 96       | sandy bridge E5-2660       | 2.20            | FDR | /d0       | 80        | Land Group
              |               |         |          |                            |                 |     | /d1       | 70        | Land Group
fs08          | svante8       | 16      | 128      | ivy bridge E5-2640 v2      | 2.00            | FDR | /d0       | 150       | Marshall
fs09          | svante9       | 12      | 128      | ivy bridge E5-2620 v2      | 2.10            | FDR | /d0       | 50        | Marshall
              |               |         |          |                            |                 |     | /d1       | 65        | Marshall
              |               |         |          |                            |                 |     | /d2       | 70        | Marshall
fs10          | svante10      | 16      | 128      | ivy bridge E5-2650 v2      | 2.60            | FDR | /d0       | 110       | Marshall
              |               |         |          |                            |                 |     | /d1       | 110       | JP
              |               |         |          |                            |                 |     | /d2       | 110       | JP
fs11          | svante11      | 24      | 512      | broadwell E5-2650 v4       | 2.20            | EDR | /d0       | 150       | Selin
              |               |         |          |                            |                 |     | /d1       | 150       | Selin
geoschem      | geoschem      | 12      | 32       | ivy bridge E5-2620 v2      | 2.10            | FDR | /data     | 166       | geoschem data

Notes:

  • At present, all svante nodes’ CPUs use Intel architecture.

  • Svante’s operating system is Linux, and the default login shell is Bash, although we maintain limited legacy support for C shell.

  • Nodes with external names can be accessed from outside the Svante cluster, e.g. ssh -Y «username»@svante2.mit.edu would log you into fs02.

  • From inside Svante, it is possible to ssh to compute nodes as well as file servers (as discussed above), but for compute nodes only if the user has a SLURM job running concurrently on that specific node.

  • The stooge nodes – curly, shemp, moe, and larry – do not follow the cxxx compute node naming convention, but are in the same partition as the other FDR nodes c041 - c060. Until this is changed, requesting stooge nodes to the exclusion of other FDR nodes is a bit awkward; one can, however, always request a specific node by name using the SLURM -w option (see Section 4.2); a brief example follows these notes.

  • Although clock speeds are fairly similar across all nodes, in terms of general speed: nehalem < sandy bridge < ivy bridge < haswell < broadwell < xeon scalable. Your job might run 40% faster on c070 than on c045, for example; CPU efficiency and RAM speed have improved over the years, which typically translates into faster calculations and better overall job performance.

  • InfiniBand (IB) is a high-speed, low-latency network connection, running in parallel with the standard Ethernet connection. Note, however, that although the FDR and EDR switches are interconnected, MPI jobs cannot span FDR and EDR compute nodes.
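
As an example of the -w (node list) option mentioned in the notes above (a sketch; «my_job.sh» is a placeholder for your own batch script, and whatever other sbatch options you normally use still apply):

# submit a batch job that must run on the stooge node curly
sbatch -w curly «my_job.sh»

# or start an interactive shell on a specific compute node
srun -w c045 --pty bash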