PCS
• Orchestration: Access cloud resources at scale
• Job management: Use common job schedulers (using Slurm)
• Easy migration: Migrate without any code or script changes for HPC workloads
capabilities of AWS PCS
• Unified compute and remote visualization management
• Dynamic resource provisioning and scaling
• Ability to bring your own applications
• Managed updates and in-depth telemetry
• Cluster: Assembly of compute nodes, file systems, and job queues, along with login nodes and workstations, hosting a scheduler.
• Compute node group: Collection of Amazon EC2 instances with a distinct configuration of instance types, networking, storage, software, and security.
• Queue: Virtual location where jobs are stored until the scheduler executes them on instances in one or more compute node groups.
• Login node group: Collection of Amazon EC2 instances where users can submit jobs or manage and visualize data.
• External resources: Customer-provided networked resources that support a cluster, like shared storage, directory, accounting database…
[Diagram labels: Cluster, Queue, Jobs, Compute node group, Login node group, Storage, Accounting database*, LDAP directory, Metrics, Logs, Cost Explorer, Budgets]
* Not in GA
architecture
[Architecture diagram]
• On-premises: End users (team 1), End users (team 2), Directory services; VPN or Direct Connect
• AWS Account, Customer VPC, Private subnet: BYO Login nodes; Login Node Group 1 (C5, min = 1, max = 1); Compute Node Group 1 (C5, min = 0, max = 20); ENI
• Node Group configuration 1 and Node Group configuration 2: Amazon machine image (AMI), AWS IAM role, Amazon EC2 launch template
• PCS-Managed Service VPC: PCS Cluster, Slurm controller, Queue 1, Slurm accounting DB*, PCS controller, replicas, etc.
• AWS services/resources: S3 storage, license servers, databases, etc.
• Flow labels: SSH (1), Submit jobs (2), Jobs queued, Compute nodes allocated
* Not in GA
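The relationship between the cluster, its node groups, and the queue in the diagram can be sketched as a small data model. This is only an illustrative model in plain Python, not the PCS API: the class names, field names, and the `max_capacity` helper are hypothetical, and only the group names, the C5 instance family, and the min/max values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    """A set of EC2 instances sharing one configuration (hypothetical model)."""
    name: str
    instance_family: str
    min_instances: int
    max_instances: int

    def __post_init__(self):
        # Scaling bounds must be ordered: 0 <= min <= max.
        if not 0 <= self.min_instances <= self.max_instances:
            raise ValueError(f"{self.name}: need 0 <= min <= max")


@dataclass
class Queue:
    """Holds jobs until the scheduler runs them on the mapped node groups."""
    name: str
    compute_node_groups: list = field(default_factory=list)

    def max_capacity(self) -> int:
        # Upper bound on instances this queue can scale out to.
        return sum(g.max_instances for g in self.compute_node_groups)


@dataclass
class Cluster:
    """Top-level container for login node groups and queues."""
    name: str
    login_node_groups: list = field(default_factory=list)
    queues: list = field(default_factory=list)


# Values taken from the architecture slide.
login = NodeGroup("Login Node Group 1", "C5", min_instances=1, max_instances=1)
compute = NodeGroup("Compute Node Group 1", "C5", min_instances=0, max_instances=20)
queue = Queue("Queue 1", [compute])
cluster = Cluster("PCS Cluster", [login], [queue])

print(queue.max_capacity())
```

A minimum of 0 on the compute node group means no compute instances run while the queue is empty, whereas the login node group keeps exactly one instance available for SSH access.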
& AMI
Supported OS: Amazon Linux 2 (AL2), RHEL 9, Rocky Linux 9, Ubuntu 22.04
https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ami_installers.html#working-with_ami_installers_os
Sample AMI with Amazon Linux 2
https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ami_samples.html
Custom AMI
1. Pick a supported OS
2. Install PCS agent and Slurm packages
3. Install additional apps/libs/drivers
4. Create AMI (and use that AMI on PCS)
Doc: https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ami_custom.html
YouTube: https://youtu.be/3ysMkZrDlGI?si=WTEnx0fB5jdbECPT
template
Using Amazon EC2 launch template with AWS PCS
https://docs.aws.amazon.com/pcs/latest/userguide/working-with_launch-templates.html
User Data
https://docs.aws.amazon.com/pcs/latest/userguide/working-with_ec2-user-data.html
1. Install software packages
2. Run scripts from S3 bucket
3. Set global ENV VAR
4. Mount network storage (EFS, FSx)
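The four user data tasks above could be packaged roughly as follows. This is a minimal sketch that assumes the user data is supplied as a MIME multi-part document containing one shell script part (check the User Data page linked above for the exact format PCS expects). The package name, bucket name, variable name, mount point, and file system ID are all placeholders, and `build_user_data` is a hypothetical helper, not part of any AWS SDK.

```python
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical bootstrap script; every name below is a placeholder.
BOOTSTRAP = """#!/bin/bash
set -euo pipefail
# 1. Install software packages
yum install -y amazon-efs-utils
# 2. Run a script from an S3 bucket
aws s3 cp s3://EXAMPLE-BUCKET/setup.sh /tmp/setup.sh && bash /tmp/setup.sh
# 3. Set a global environment variable
echo 'export EXAMPLE_VAR=value' > /etc/profile.d/example.sh
# 4. Mount network storage (EFS)
mkdir -p /shared
mount -t efs fs-EXAMPLE:/ /shared
"""


def build_user_data(script: str) -> str:
    """Wrap a shell script in a MIME multi-part document."""
    message = MIMEMultipart("mixed")
    # text/x-shellscript is the content type cloud-init runs as a script.
    message.attach(MIMEText(script, "x-shellscript", "us-ascii"))
    return message.as_string()


user_data = build_user_data(BOOTSTRAP)

# The EC2 launch template API takes user data as a base64-encoded string.
encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
print(user_data)
```

The `encoded` string is what would go into the launch template's user data field when the template is created through the API; the console accepts the plain text form.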
aways
• PCS manages the cluster controller, which minimizes the cluster operations workload.
• PCS offers a unified set of APIs to help build and operate clusters supporting a range of HPC, scientific, and engineering modeling workloads.
• PCS charges a node management fee for both the controller node and the compute nodes.
• Security groups, launch templates, IAM roles, and networking must be configured to work together.