Best Practices for Securing your HPC Cloud – Part II

For anyone exploring HPC in the cloud, security is an important topic. In an earlier article, we addressed perimeter security, network access controls, and configuring cloud VPCs, VPNs, and firewalls. In this article, we’ll continue walking through our multi-layer security model pictured below and cover authentication & authorization (A&A), both at the level of the cloud provider and at the level of deployed instances and applications.

At a high level, authentication is about validating an actor’s identity. Authorization, on the other hand, occurs after a user or process is authenticated and involves determining whether that actor is permitted to access particular resources or services. As with other security-related topics, authentication and authorization are multi-faceted and apply across multiple cloud and application services. For our purposes, we’ll narrow the discussion to topics relevant to HPC cluster deployments.

While we cover authentication and authorization in the context of Amazon Web Services (AWS) here, the same concepts apply to other cloud providers. We’ll start by discussing how to secure access to cloud-level accounts, and then move on to instance and application-level A&A.

Securing your cloud account(s)

Before you think about protecting access to machine instances, applications and data, it’s a good idea to think about how to protect the accounts used to administer the cloud service. In most HPC environments there are multiple users and potentially multiple administrators. Even if you’re the only administrator, you’ll eventually want to delegate access to others so that they can perform specific tasks. When multiple users are involved in managing the cloud service, managing permissions and auditing user activities become very important.

The convenience of the cloud is a double-edged sword

In on-premise data centers, a rogue administrator can usually do only limited damage. They might have root access to some servers, but have no idea where routers, firewalls, or backups are located, let alone how to access them.

In cloud environments where services are software-configured, a malicious actor with access to your top-level credentials can do an enormous amount of damage quickly. Sometimes cloud administrators spend so much time hardening against external threats that they forget about equally serious internal threats.

Identity and Access Management

While terminology varies across cloud providers, Identity and Access Management (IAM) helps organizations control how users access various cloud services. Google offers Google Cloud IAM, Amazon offers AWS IAM, and Microsoft offers various implementations of Azure Active Directory.

When there are multiple people managing cloud services, the worst thing you can do is share top-level login credentials. A better approach is to define individual IAM users, groups, and roles, and grant permissions based on the principle of least privilege so that users only have access to the facilities they need. Properly configuring IAM helps ensure that actions are auditable, and reduces the chances of the environment being accidentally or deliberately compromised.

IAM Users and Groups

Cloud administrators should avoid using their organization’s root-level credentials whenever they can. Instead, administrators should create IAM users for every person who needs access to the management console and for every application that needs to call the AWS API. Alternatively, they could engage a managed service provider such as Logicata to help with their AWS cloud services.

Users can optionally have login access to the AWS management console, or they can be granted an access key ID and a secret access key (referred to collectively as an access key) needed to access the AWS API or CLI. Access keys are used by applications such as Altair’s Navops Launch to perform operations on behalf of an AWS user. An example showing how IAM users can be configured is provided below.

In this example, five people have different levels of access to the same root-level cloud account. Each user is associated with one or more IAM Groups (covered below) that control what actions can be performed on various cloud services.

Bill is the top-level administrator, but rather than log in to AWS using the root account, he’s created a group for himself called super-admin that has the permissions he needs to do his job day to day. If Bill goes on vacation, he can temporarily designate someone else as a super-admin without handing over the top-level credentials.

Fritz is both a cluster administrator and a developer. As a developer, he needs access to AWS services that cluster administrators don’t need (like API gateway management and container management privileges). Gord and Gary work on the cluster in different roles, but they don’t need login access to the AWS console (note that they have no password). Instead, they’ve been given an access key so they can run site-specific scripts for focused tasks. Rob is the lead cluster administrator, so he has console access as well as access keys. Rob also manages the Navops Launch environment used to provision various clusters, so he (like Bill) needs elevated permissions to manage EC2, EBS, S3, VPC, and various AWS networking features.

Administrators can attach policies to each IAM Group. The figure below shows policies attached to the launch-admin group for Navops Launch administrators. AWS provides a set of default policy templates (AmazonS3FullAccess and AmazonAPIGatewayAdministrator in the example below). Administrators can create custom policies that specify what actions can be performed at a more granular level.

Administrators might start by attaching an AWS-supplied policy such as AmazonEC2FullAccess to a group. An Access Advisor feature in the IAM Group interface shows the privileges that are used on a day-to-day basis. Armed with this knowledge, administrators can create a custom policy (NavopsLaunchPolicy) that provides only the minimum set of permissions required for members of the launch-admin group.
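As a rough illustration of what such a custom policy might contain, the sketch below builds an IAM policy document in Python and prints it as JSON. The specific list of EC2 actions is a hypothetical example chosen for illustration; it is not the actual content of NavopsLaunchPolicy, and the real set should come from what Access Advisor reports for your environment.

```python
import json


def build_policy(actions, resources=("*",)):
    """Return an IAM identity-based policy document as a plain dict."""
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": list(resources),
            }
        ],
    }


# Hypothetical subset of EC2 actions a provisioning group might need.
# This is an assumption for illustration, not the real NavopsLaunchPolicy.
LAUNCH_ACTIONS = {
    "ec2:DescribeInstances",
    "ec2:RunInstances",
    "ec2:TerminateInstances",
    "ec2:CreateTags",
}

navops_launch_policy = build_policy(LAUNCH_ACTIONS)
print(json.dumps(navops_launch_policy, indent=2))
```

The resulting JSON could be pasted into the IAM policy editor. In practice you would likely also narrow "Resource" from "*" to specific ARNs where the service supports it.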

Multi-factor Authentication

Because cloud administrator accounts are so sensitive, AWS offers multi-factor authentication (MFA) that can be selectively enabled for each IAM user. If MFA is enabled, in addition to their login credentials, users will need to enter a six-digit numeric code based on a time-synchronized one-time password algorithm. AWS supports a variety of virtual MFA devices (running on iOS or Android phones), Universal 2nd Factor (U2F) security keys that plug into a USB port on a computer, and stand-alone hardware MFA devices. AWS previously supported MFA via SMS text messages but has discontinued this service, opting for more secure methods of MFA instead. At a minimum, administrators should configure MFA to protect their root account. It is a good practice to enable MFA on any account with elevated privileges.
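To make the "time-synchronized one-time password" idea concrete, here is a minimal sketch of the standard TOTP algorithm (RFC 6238, built on HOTP from RFC 4226) using only the Python standard library. It illustrates how a virtual MFA device can derive a six-digit code from a shared secret and the current time. It is not AWS's implementation, and the shared secret shown is the RFC test value rather than anything you would use in practice.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    msg = struct.pack(">Q", counter)  # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, for_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over 30-second steps."""
    if for_time is None:
        for_time = time.time()
    return hotp(secret, int(for_time) // step, digits)


# Shared secret from the RFC test vectors, used here only for illustration.
demo_secret = b"12345678901234567890"
print(totp(demo_secret))
```

Because both sides compute the code from the same secret and roughly the same clock, the server can verify the code without the secret ever crossing the network at login time.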

IAM Roles

IAM roles describe permissions for entities that are neither individuals nor groups. For example, to facilitate a cloud-bursting solution you might need to give cloud-aware provisioning software such as Navops Launch permission to create cloud instances, networks, and storage on your behalf. By using an IAM role as opposed to specifying individual user credentials, the environment is made more secure since access keys don’t need to be requested and stored by a program needing access to various cloud services.

For example, if a regular AWS user invokes Navops Launch to provision or scale a Grid Engine cluster, Navops Launch will ask for the user’s AWS access key ID and secret access key to make sure they are authorized to provision resources. Having any software program handle and potentially store access keys is a security risk. With AWS roles, the role can be attached to the machine instance itself. This allows a Navops Launch instance to grow and shrink clusters without needing to know or handle access keys belonging to individual administrators. A video demonstration explaining how IAM roles can be used with Navops Launch is provided here.

Large organizations may have multiple top-level AWS accounts. IAM Roles can dramatically simplify the administration of these multi-account environments by providing cross-account access. We won’t cover this in detail here, but AWS provides a tutorial on delegating access across multiple accounts with IAM Roles. Once we’ve secured access at the cloud administrator level, we can move on to authentication and authorization at the application and machine instance level.
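A role that is attached to a machine instance needs a trust policy stating who may assume it. The sketch below builds the usual trust policy that lets the EC2 service assume a role on behalf of an instance. The function name is an illustrative assumption, and the permissions the role would then grant are defined in a separate policy not shown here.

```python
import json


def ec2_trust_policy():
    """Trust policy allowing EC2 instances to assume the role it is attached to."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The EC2 service principal assumes the role for the instance.
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = ec2_trust_policy()
print(json.dumps(trust_policy, indent=2))
```

With a role like this attached through an instance profile, software on the instance obtains short-lived credentials automatically, so no long-lived access key has to be stored on disk.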

AWS Key Pairs and public key cryptography

If you’ve been running an on-premise HPC cluster behind a firewall, you may have implemented a relatively permissive security model using /etc/passwd files, NIS, or LDAP to manage login credentials. While this basic level of security is fine in some environments, it probably doesn’t cut it in the cloud, unfortunately. A better practice is to use key pairs for authentication (and encryption). Battle-hardened Linux admins will be familiar with key pairs and public key cryptography, but for others, these may be new concepts.

Keys are essentially long strings (Amazon EC2 uses 2048-bit SSH keys by default, for example), and unlike some passwords, they are computer generated and almost impossible to guess. Keys come in pairs: a public key available to everyone, and a private key held only by the owner. Anyone with a copy of a public key can encrypt a message such that only a recipient with the corresponding private key can read it. Conversely, private keys are carefully guarded and serve as proof of their owner’s identity: a user can sign a message with their private key, and anyone holding the matching public key can verify the signature. Public key cryptography frees users from remembering long, complicated passwords or, worse still, writing them down.

Most cloud services enforce login using key pairs by default. For example, when instances are created in Amazon EC2, users first need to provide a key pair. AWS lets users create a key pair and download the associated private key when the pair is created; AWS does not keep a copy of the private key, so it cannot be downloaded again later. AWS stores the public key in the cloud, and users store their private key in a local file. The contents of the private key file look something like the key below.
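The relationship between the two keys can be seen in a toy example. The sketch below uses textbook RSA with deliberately tiny primes so the arithmetic is easy to follow. It is for illustration only: real keys are thousands of bits long and real systems add padding and hashing, so nothing here should be used to protect anything.

```python
# Toy textbook RSA with tiny primes. Insecure by design; illustration only.
P, Q = 61, 53
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent (requires Python 3.8+)


def encrypt(message):
    """Anyone with the public key (N, E) can encrypt."""
    return pow(message, E, N)


def decrypt(ciphertext):
    """Only the private key holder (D) can decrypt."""
    return pow(ciphertext, D, N)


def sign(message):
    """The private key holder produces a signature."""
    return pow(message, D, N)


def verify(message, signature):
    """Anyone with the public key can check the signature."""
    return pow(signature, E, N) == message


secret_number = 65
assert decrypt(encrypt(secret_number)) == secret_number
assert verify(secret_number, sign(secret_number))
print("round trip and signature check succeeded")
```

Encryption and signing use the same arithmetic with the roles of the two exponents swapped, which is why one key pair can serve both purposes.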

When Amazon EC2 deploys a machine instance, the instance is configured so that it can only be accessed by the holder of the private key.

The cloud provider will typically provide a default administrative login (like ec2-user, centos or ubuntu) depending on the Linux distribution selected for the machine instance. To log in to a host, a user needs the IP address or DNS name of the host they provisioned in AWS as well as the local copy of the private key. The example below shows how to connect to an AWS EC2 host via secure shell (ssh) from the public internet using the private key stored in the file tensorflow.pem.
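As a sketch of that connection step, the Python snippet below assembles the ssh command line and checks that a private key file is not readable by other users, which ssh requires before it will use the key. The IP address is a placeholder from the documentation range, and the helper names are assumptions made for this illustration.

```python
import os
import stat


def ssh_command(host, user="ubuntu", key_path="tensorflow.pem"):
    """Build the argument list for: ssh -i <key> <user>@<host>."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


def key_is_private(key_path):
    """True if the key file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    return mode & 0o077 == 0


# 203.0.113.10 is a placeholder address; substitute your instance's
# public IP address or DNS name.
command = ssh_command("203.0.113.10")
print(" ".join(command))
```

The list returned by ssh_command can be passed to subprocess.run to open the session. If key_is_private returns False, running chmod 600 on the key file typically resolves ssh's "unprotected private key file" error.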

Using the private keys to manage the login process is very secure. There is no way that someone can guess your password because there is no password associated with the account.

When you provision a host on AWS using your key pair, the public key associated with the key pair will be stored in the file ~/.ssh/authorized_keys belonging to your login account (ubuntu in our example).

When cluster administrators deploy clusters, they typically deploy multiple instances with the same key pair for convenience so that they can log in to any host. With the basics set up by AWS, knowledgeable administrators can create additional users and use operating system level facilities to generate new keys (ssh-keygen) to enable secure and passwordless login access between cluster hosts in addition to traditional usernames and passwords. More information on securing Linux instances with private keys is provided in the article working with passwordless ssh logins using private keys.

In the next article in this series, we’ll cover additional security-related topics like trusted instances, encrypting storage, and other tools helpful to HPC administrators operating in the cloud.
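Once a new key pair has been generated with ssh-keygen, granting login access amounts to adding the public key to the target account's authorized_keys file. The sketch below shows one way to do that idempotently with the permissions ssh expects. The function name, the user, and the key string are placeholders for illustration, and the demo writes to a temporary directory rather than a real ~/.ssh.

```python
import os
import tempfile


def authorize_key(public_key_line, ssh_dir):
    """Add a public key to ssh_dir/authorized_keys unless already present.

    Returns True if the key was added, False if it was already there.
    """
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")
    existing = []
    if os.path.exists(path):
        with open(path) as fh:
            existing = [line.strip() for line in fh if line.strip()]
    key = public_key_line.strip()
    if key in existing:
        return False
    existing.append(key)
    with open(path, "w") as fh:
        fh.write("\n".join(existing) + "\n")
    os.chmod(path, 0o600)  # sshd may ignore files with looser permissions
    return True


# Placeholder public key, not a real one.
demo_key = "ssh-rsa PLACEHOLDER-PUBLIC-KEY gord@cluster"
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, ".ssh")
    print(authorize_key(demo_key, demo_dir))  # key added
    print(authorize_key(demo_key, demo_dir))  # already present
```

On a real host, ssh_dir would be the .ssh directory in the target user's home, and the code would need to run as that user or as root.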