{"id":14887,"date":"2023-05-02T12:30:06","date_gmt":"2023-05-02T12:30:06","guid":{"rendered":"https:\/\/science.f4studio.de\/?p=14887"},"modified":"2023-08-23T14:35:56","modified_gmt":"2023-08-23T14:35:56","slug":"system-grete-in-goettingen-is-online","status":"publish","type":"post","link":"https:\/\/science.f4studio.de\/en\/system-grete-in-goettingen-is-online\/","title":{"rendered":"System Grete in G\u00f6ttingen is online"},"content":{"rendered":"<h2>System Grete in G\u00f6ttingen<\/h2>\n<p class=\"part\" data-startline=\"22\" data-endline=\"22\">We\u2019re happy to announce the beginning of regular user operation for our new GPU cluster, \u201cGrete\u201d in G\u00f6ttingen.<\/p>\n<p class=\"part\" data-startline=\"24\" data-endline=\"24\">The main part of the cluster is available via the new partition <code>grete<\/code>, consisting of 33 nodes equipped with 4 NVIDIA Tesla A100 40 GB GPUs, 2 AMD Epyc CPUs, and an Infiniband HDR interconnect. The <code>grete:shared<\/code> partition contains additionally two nodes with 8 A100 80 GB nodes each. All nodes have 16 CPU cores and 128GB memory per GPU. \u201cGrete\u201d has a dedicated new login node, <code>glogin9<\/code>, also available via its DNS alias <code>glogin-gpu.hlrn.de<\/code>.<\/p>\n<p class=\"part\" data-startline=\"26\" data-endline=\"26\">Another 3 GPU nodes are available in the partition <code>grete:interactive<\/code> for interactive usage (limited to 2 jobs per user). The <code>grete:preemptible<\/code> partition is available for backfilling these nodes. On these nodes, the GPUs are split via Multi-Instance GPU (MIG) into slices with 2 or 3 compute units each and 10 or 20 GB of GPU memory each, respectively. These slices can be requested like GPUs in Slurm. For example, <code>-G 2g.10gb:1<\/code> will allocate one slice with 2 compute units and 10 GB of memory. 
Preemptible jobs do not cost core hours, but a compute project account has to be used, as for the <code>preempt<\/code> QoS in the CPU partitions.<\/p>\n<p class=\"part\" data-startline=\"28\" data-endline=\"28\">The default walltime limit on all <code>grete<\/code> partitions is 2 days.<\/p>\n<p class=\"part\" data-startline=\"30\" data-endline=\"30\">Part of \u201cGrete\u201d is a new dedicated flash-based WORK storage system mounted at <code>\/scratch<\/code> on the new GPU nodes and <code>glogin9<\/code>. Each user and each compute project has a soft (hard) block quota of 3 TB (6 TB) and 1M (2M) inodes. The system is intended for fast access to the active data set required by the currently running jobs. The existing \u201cEmmy\u201d WORK file system is still reachable from the new cluster under <code>\/scratch-emmy<\/code> via a long-distance connection. The HOME and PERM filesystems are shared between \u201cEmmy\u201d and \u201cGrete\u201d.<\/p>\n<p class=\"part\" data-startline=\"32\" data-endline=\"33\">The default CUDA version is 12.0, and the NVIDIA HPC SDK 23.3 is available via the <code>nvhpc\/23.3<\/code>, <code>nvhpc-byo-compiler\/23.3<\/code>, <code>nvhpc-hpcx\/23.3<\/code>, and <code>nvhpc-nompi\/23.3<\/code> modules.<br \/>\nCUDA-enabled OpenMPI is available in the form of the HPC-X Toolkit (<code>nvhpc-hpcx\/23.3<\/code>) and the NVIDIA\/Mellanox OFED stack (<code>openmpi-mofed\/4.1.5a1<\/code>). Note that older OpenMPI versions do not provide CUDA support in combination with InfiniBand.<\/p>\n<p class=\"part\" data-startline=\"35\" data-endline=\"35\">More information about using the new GPU system can be found in [1], and the accounting information has been extended to include the GPUs and MIG slices [2]. 
For example, in accordance with the recent round of compute time proposals, one full GPU node counts for the equivalent of 600 CPU cores.<\/p>\n<p class=\"part\" data-startline=\"37\" data-endline=\"37\">Please do not hesitate to contact us if you have questions or need support migrating suitable applications to the GPU system.<\/p>\n<p class=\"part\" data-startline=\"39\" data-endline=\"39\">The existing GPU nodes ggpu[01-03] with NVIDIA V100 32 GB GPUs will be migrated to the same site (\u201cRZG\u00f6\u201d) as \u201cGrete\u201d in mid-May. They will resume operation with the same \u201cRocky Linux 8\u201d based OS image as the new GPU nodes and an InfiniBand interconnect, as part of the <code>grete:shared<\/code>, <code>grete:preemptible<\/code>, and <code>grete:interactive<\/code> partitions.<\/p>\n<p class=\"part\" data-startline=\"41\" data-endline=\"42\">[1] <a href=\"https:\/\/www.hlrn.de\/doc\/display\/PUB\/GPU+Usage\" target=\"_blank\" rel=\"noopener\">https:\/\/www.hlrn.de\/doc\/display\/PUB\/GPU+Usage<\/a><br \/>\n[2] <a href=\"https:\/\/www.hlrn.de\/doc\/display\/PUB\/Accounting+in+Core+Hours\" target=\"_blank\" rel=\"noopener\">https:\/\/www.hlrn.de\/doc\/display\/PUB\/Accounting+in+Core+Hours<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We\u2019re happy to announce the beginning of regular user operation for our new GPU cluster, \u201cGrete\u201d in 
G\u00f6ttingen.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[47],"tags":[],"class_list":["post-14887","post","type-post","status-publish","format-standard","hentry","category-latest-news"],"_links":{"self":[{"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/posts\/14887","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/comments?post=14887"}],"version-history":[{"count":2,"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/posts\/14887\/revisions"}],"predecessor-version":[{"id":14909,"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/posts\/14887\/revisions\/14909"}],"wp:attachment":[{"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/media?parent=14887"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/categories?post=14887"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/science.f4studio.de\/en\/wp-json\/wp\/v2\/tags?post=14887"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}