
Hello! Parallel Computing Service!

porcaro33
September 17, 2024


Transcript

  1. Hello! Parallel Computing Service!

    JAWS-HPC #20, 2024/9/12. Hiroshi Kobayashi, HPC Solutions Architect. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.
  2. AWS HPC portfolio

    • Automation and orchestration: AWS PCS, AWS ParallelCluster, AWS Batch, Amazon SageMaker
    • Access and visualization: RES on AWS, NICE DCV, Amazon AppStream 2.0, Amazon WorkSpaces Family
    • AWS HPC-optimized instances
      • HPC-optimized: Hpc7a, Hpc7g, Hpc6id
      • Accelerators: Trn1, P5, G5, DL1, F1, Inf2, VT1
      • Compute, memory, and networking: M7i, C7gn, C7g, C7i, R7i, X2iezn, C5n, M5zn
  3. AWS PCS

    • Orchestration: Access cloud resources at scale
    • Job management: Use common job schedulers (using Slurm)
    • Easy migration: Migrate without any code or script changes for HPC workloads
  4. Key capabilities of AWS PCS

    • Unified compute and remote visualization management
    • Dynamic resource provisioning and scaling
    • Ability to bring your own applications
    • Managed updates and in-depth telemetry
  5. Which workloads are best suited for AWS PCS?

    • HTC and loosely coupled workloads
    • Building scientific models
    • Tightly coupled workloads
    • Accelerated computing
  6. Overview

    • Cluster: Assembly of compute nodes, file systems, and job queues, along with login nodes and workstations, hosting a scheduler.
    • Compute node group: Collection of Amazon EC2 instances with a distinct configuration of instance types, networking, storage, software, and security.
    • Queue: Virtual location where jobs are stored until the scheduler executes them on instances in one or more compute node groups.
    • Login node group: Collection of Amazon EC2 instances where users can submit jobs or manage and visualize data.
    • External resources: Customer-provided networked resources that support a cluster, such as shared storage, a directory, an accounting database…

    [Diagram: a cluster containing queues with jobs, compute node groups, and a login node group, alongside storage, accounting database*, LDAP directory, metrics, logs, Cost Explorer, and Budgets.]

    * Not in GA
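
    The relationships between these terms can be sketched as a small data model. This is only an illustration of the definitions above, not the PCS API: the class names, fields, and the example names ("demo", "main", "cpu-nodes", "gpu-nodes") are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeGroup:
    """A set of EC2 instances that share one configuration."""
    name: str
    instance_types: List[str]


@dataclass
class Queue:
    """Holds jobs until the scheduler runs them on its node groups."""
    name: str
    node_groups: List[NodeGroup]
    jobs: List[str] = field(default_factory=list)


@dataclass
class Cluster:
    """Groups queues, compute node groups, and login node groups."""
    name: str
    queues: List[Queue] = field(default_factory=list)
    login_node_groups: List[NodeGroup] = field(default_factory=list)

    def groups_for(self, queue_name: str) -> List[str]:
        """Return the names of the node groups that serve a queue."""
        for queue in self.queues:
            if queue.name == queue_name:
                return [group.name for group in queue.node_groups]
        raise KeyError(queue_name)


# Hypothetical example: one queue backed by two compute node groups.
cpu = NodeGroup("cpu-nodes", ["c5.large"])
gpu = NodeGroup("gpu-nodes", ["g5.xlarge"])
cluster = Cluster("demo", queues=[Queue("main", [cpu, gpu])])
print(cluster.groups_for("main"))
```

    The point of the sketch is that a queue does not own instances itself; it refers to one or more compute node groups, which is why the diagram shows several groups behind the queues.
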
  7. Service architecture

    [Architecture diagram. Recoverable labels:]
    • End users (team 1, team 2), on-premises, directory services, VPN or Direct Connect
    • Flow labels: (1) SSH, (2) Submit jobs, Jobs queued, (4) Compute nodes allocated
    • AWS account, customer VPC, private subnet: BYO login nodes; Login Node Group 1 (min = 1, max = 1, C5); Compute Node Group 1 (min = 0, max = 20, C5)
    • Node group configurations 1 and 2: Amazon Machine Image (AMI), AWS IAM role, Amazon EC2 launch template
    • PCS-managed service VPC: PCS cluster, Slurm controller, Queue 1, PCS controller, replicas, etc., ENI
    • Slurm accounting DB*
    • AWS services/resources: S3 storage, license servers, databases, etc.

    * Not in GA
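
    The min/max values on the node groups bound how many instances a group can run. The slide does not describe how PCS decides the actual count, so the function below is only a hedged illustration of what such bounds mean; `target_node_count` is a hypothetical helper, not part of PCS or Slurm.

```python
def target_node_count(nodes_requested: int, min_nodes: int, max_nodes: int) -> int:
    """Clamp the requested node count to the node group's [min, max] range."""
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    return max(min_nodes, min(nodes_requested, max_nodes))


# Bounds taken from the diagram: compute group 0..20, login group 1..1.
idle_compute = target_node_count(0, 0, 20)    # no demand on the compute group
busy_compute = target_node_count(35, 0, 20)   # demand above the group's cap
login = target_node_count(0, 1, 1)            # login group stays at one node
print(idle_compute, busy_compute, login)
```

    With min = 0 the compute group can shrink to nothing when no jobs are queued, while min = max = 1 keeps exactly one login node running.
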
  8. Cost – Controller (head node)

    PCS Pricing (Tokyo Region): https://aws.amazon.com/pcs/pricing/

    Running a Small controller for one month in the Tokyo Region costs about 80,000 yen. Medium comes to 450,000 yen, and Large to 900,000 yen...
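
    To compare the controller sizes, the monthly figures quoted on this slide can be converted to rough hourly and yearly equivalents. This is back-of-the-envelope arithmetic only: the 730 hours per month is an assumption, and the results are not official AWS rates.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

# Approximate monthly controller costs quoted on the slide (JPY, Tokyo Region).
monthly_jpy = {"Small": 80_000, "Medium": 450_000, "Large": 900_000}


def hourly_equivalent(monthly_cost: float, hours: int = HOURS_PER_MONTH) -> float:
    """Convert a monthly cost into an hourly equivalent."""
    return monthly_cost / hours


for size, cost in monthly_jpy.items():
    print(f"{size}: about {hourly_equivalent(cost):.0f} JPY/hour, "
          f"{cost * 12:,} JPY/year")
```
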
  9. Cost – Node Management Fee (Compute and Login/Viz)

    PCS Pricing (Tokyo Region): https://aws.amazon.com/pcs/pricing/

    Instance families such as HPC, C, M, and R are classified as Standard; P and Trn are classified as Advanced. The prices above are charged on top of the EC2 cost.
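
    Because the node management fee is added to the EC2 charge, the cost of a node group can be estimated as below. This is a sketch that assumes both charges are metered per node-hour; the rates in the example are made-up placeholders, not AWS prices, so take real values from the pricing page.

```python
def node_group_cost(node_count: int, hours: float,
                    ec2_rate: float, pcs_fee_rate: float) -> dict:
    """Estimate cost when each node-hour pays EC2 plus a PCS management fee."""
    ec2_cost = node_count * hours * ec2_rate
    pcs_fee = node_count * hours * pcs_fee_rate
    return {"ec2": ec2_cost, "pcs_fee": pcs_fee, "total": ec2_cost + pcs_fee}


# Hypothetical input: 10 nodes running 100 hours each, placeholder rates.
cost = node_group_cost(node_count=10, hours=100, ec2_rate=0.5, pcs_fee_rate=0.25)
print(cost)
```

    The Standard and Advanced classes would simply map to different `pcs_fee_rate` values in this model.
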
  10. OS & AMI

    Supported OS: Amazon Linux 2 (AL2), RHEL 9, Rocky Linux 9, Ubuntu 22.04
    https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ami_installers.html#working-with_ami_installers_os

    Sample AMI with Amazon Linux 2
    https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ami_samples.html

    Custom AMI
    1. Pick a supported OS
    2. Install PCS agent and Slurm packages
    3. Install additional apps/libs/drivers
    4. Create AMI (and use that AMI on PCS)
    Doc: https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ami_custom.html
    YouTube: https://youtu.be/3ysMkZrDlGI?si=WTEnx0fB5jdbECPT
  11. Launch template

    Using Amazon EC2 launch template with AWS PCS
    https://docs.aws.amazon.com/pcs/latest/userguide/working-with_launch-templates.html

    User Data
    https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ec2-user-data.html
    1. Install software packages
    2. Run scripts from S3 bucket
    3. Set global ENV VAR
    4. Mount network storage (EFS, FSx)
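
    A user data payload covering some of the items above might be assembled like this. It is a minimal sketch using only the Python standard library. It assumes the launch template expects MIME multi-part user data (check the User Data page linked above), and the file system ID, mount point, package name, and environment variable are all hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical values for illustration only.
EFS_ID = "fs-0123456789abcdef0"
MOUNT_POINT = "/shared"

script = f"""#!/bin/bash
set -euo pipefail
# Install software packages (example package, Amazon Linux 2 style)
yum install -y amazon-efs-utils
# Set a global environment variable for login shells
echo 'export APP_HOME={MOUNT_POINT}/apps' > /etc/profile.d/app_home.sh
# Mount network storage (EFS)
mkdir -p {MOUNT_POINT}
mount -t efs {EFS_ID}:/ {MOUNT_POINT}
"""


def build_user_data(shell_script: str) -> str:
    """Wrap a shell script in a multipart/mixed MIME message."""
    message = MIMEMultipart("mixed")
    message.attach(MIMEText(shell_script, "x-shellscript"))
    return message.as_string()


user_data = build_user_data(script)
print(user_data)
```

    The resulting text would go into the launch template's user data field; EC2 APIs generally expect it base64-encoded, which is left out here.
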
  12. Demo Environment

    1. Create VPC & subnets
    2. Create cluster security group
    3. Create a PCS cluster
    4. Create shared storage
    5. Create instance profile
    6. Create launch templates
    7. Create login node group
    8. Create compute node group
    9. Create queue
    10. Log in to your cluster
    11. Run jobs
    https://docs.aws.amazon.com/pcs/latest/userguide/getting-started.html
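
    For the last step, running a job means handing Slurm a batch script from a login node. The helper below is a hedged sketch that only generates such a script as text. It assumes the PCS queue name is what you pass as the Slurm partition, and the job name, queue name, and command are hypothetical.

```python
def sbatch_script(job_name: str, queue: str, nodes: int, command: str) -> str:
    """Build a minimal Slurm batch script as a string."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={queue}",   # assumption: queue name == partition
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --output=%x-%j.out",     # %x = job name, %j = job ID
        "",
        f"srun {command}",
        "",
    ]
    return "\n".join(lines)


# Hypothetical job: print the hostname of two nodes in a queue named "demo".
script = sbatch_script("hello", "demo", 2, "hostname")
print(script)
```

    Saved as `hello.sh` on a login node, the script would be submitted with `sbatch hello.sh` and monitored with `squeue`.
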
  13. Takeaways

    • PCS manages the cluster controller, which minimizes cluster operation work.
    • PCS offers a unified set of APIs to help build and operate clusters supporting a range of HPC, scientific, and engineering modeling workloads.
    • PCS charges for the controller and adds a node management fee for compute and login nodes, on top of EC2 costs.
    • You need to make security groups, launch templates, IAM roles, networking, and so on work together.
  14. PCS Resources

    • Blog
      • AWS HPC Blog https://aws.amazon.com/blogs/hpc/
    • YouTube
      • PCS series https://youtube.com/playlist?list=PL6tstO5J3TRGPTfz6C4XY3gT6Fg70nAPN&si=9_QlXr9z96wwraJJ
    • Doc
      • User guide https://docs.aws.amazon.com/pcs/latest/userguide/what-is-service.html
      • API reference https://docs.aws.amazon.com/pcs/latest/APIReference/Welcome.html
    • GitHub
      • HPC recipe https://github.com/aws-samples/aws-hpc-recipes/tree/main/recipes/pcs
  15. Thank you!

    Hiroshi Kobayashi [email protected]