As the world slowly emerges from a two-year (and still evolving) pandemic, high-performance computing (HPC) probably isn’t the most important thing on most people’s minds. Perhaps it would be if people understood how vital the results it has enabled are, and how HPC-powered insight benefits everyone on the planet.
The world’s top scientists and medical experts have not only used HPC to develop COVID-19 vaccines and treatments that combat the disease’s spread and severity; their fight has also been supported by some of the world’s most powerful computing equipment, orchestrated by solutions that run critical workloads at blinding speed.
Johnson & Johnson subsidiary Janssen Pharmaceuticals knows how important it is to work fast — after all, people’s lives depend on it. Martin Dellwo, advanced computing manager at Janssen, joined Altair senior vice president of HPC solutions Bill Bryce and Phil Eschallier, CTO at RCH Solutions, for a recent webinar where they shed important light on HPC and cloud computing for life sciences research and development (R&D).
The Importance of Speed
What if vaccine development had been delayed in the pandemic’s earlier stages? Without powerful computing resources to support the minds behind the research, R&D would have been a much slower process and delays could’ve resulted in more hospitalizations and deaths.
Healthcare foundation The Commonwealth Fund estimates that without a vaccination program, there would have been around 1.1 million additional COVID-19 deaths and over 10.3 million additional hospitalizations in the U.S. alone by November 2021. That’s over 3x more deaths and 4.9x more hospitalizations than the actual toll. Extend those numbers to the worldwide population and the loss of life would be staggering.
The importance of HPC goes well beyond vaccine development all the way through the wide world of drug discovery and healthcare innovation. Few undertakings are more vital for the world’s population.
Janssen Accelerates R&D in the Cloud
To expand their HPC resources, Janssen has taken computing to the Amazon Web Services (AWS) cloud. “The ability to marshal vast resources can accelerate research, but traditionally this has been constrained by the limits of in-house datacenters, budgets, and technology limitations,” Dellwo said. “The cloud-based model is a unique opportunity that provides massive scale with greatly increased flexibility, agility, and less overhead.”
The Janssen R&D Advanced Computing team is frequently called on to deliver solutions that don’t fit within typical corporate computing. “We find that we’re often on the bleeding edge,” Dellwo said. The team uses top-tier technology to design high-performing, scalable platforms that deliver efficient computing power to their scientists and researchers. “We do experiments that we never considered before even with traditional HPC, and we can do this in tremendously compressed time frames, enabling scientists to make critical decisions much faster,” Dellwo said.
Janssen’s cloud-first strategy has opened new opportunities to experiment with richer, more complex models that drive more precise results, to run orders of magnitude more simulations than traditional methods allow, and to save significant time.
Delivering on Fast Compute
The Janssen Biotherapeutics team (JBIO), which works on antibody design and does a lot of structural calculations, approached Dellwo’s team about getting access to HPC resources to augment their deep learning training. The solution was a new HPC cluster complete with Altair® Grid Engine® for workload management and Altair® NavOps® for cloud enablement — and it took less than two weeks to implement.
The JBIO deep learning project involved 225,000 single-CPU jobs, each running up to three hours on a configured limit of 300 concurrent nodes, and generated 500 GB of final compressed data from much larger uncompressed text files. This computing power enabled the JBIO team to condense roughly 38.5 CPU-years of compute time (assuming an average of 1.5 hours per job) into just seven calendar days of running time; even on a modern 36-CPU workstation, the same workload would have taken more than a calendar year.
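The figures above follow directly from the stated assumptions; a quick back-of-the-envelope check (using the article's own numbers, with 1.5 CPU-hours as the assumed average per job) confirms them:

```python
# Back-of-the-envelope check of the JBIO figures, assuming an average
# of 1.5 CPU-hours per job as stated in the article.
JOBS = 225_000
AVG_HOURS_PER_JOB = 1.5
HOURS_PER_YEAR = 24 * 365  # 8,760

total_cpu_hours = JOBS * AVG_HOURS_PER_JOB    # 337,500 CPU-hours
cpu_years = total_cpu_hours / HOURS_PER_YEAR  # about 38.5 CPU-years

# Same workload on a single 36-CPU workstation running flat out:
workstation_days = total_cpu_hours / 36 / 24  # about 391 days, over a calendar year

print(f"{cpu_years:.1f} CPU-years, {workstation_days:.0f} days on one workstation")
```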
Shortly after the JBIO team implemented the solution, they reported success. “It is unbelievable the impact this has had,” the JBIO team said. “We are able to deliver on our team’s goal because of [the team’s] execution of scalable HPC. Even though this has only been online for a couple of weeks, we have had major impact on multiple projects. This enables us to do big data science work!”
The project cost approximately $20,000 — nearly 40% less than on-demand rates thanks to AWS Spot-market pricing.
Handling Millions of Jobs
NavOps orchestrates the autoscaling rules that allow the Janssen team to cycle resources in and out of service in the cloud, so they’re providing requested resources as quickly as possible and then spinning them down just as quickly when no longer needed. “This is how we achieve tremendous scale at reasonable cost,” Dellwo said. They regularly reuse code and automate as much as possible.
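The pattern Dellwo describes (launch nodes while jobs are waiting, terminate them when idle) can be sketched generically. To be clear, the names and logic below are an illustrative model of that scale-up/scale-down loop, not actual NavOps rule syntax:

```python
# Illustrative autoscaling reconciliation in the spirit of the rules Dellwo
# describes. The types, function names, and policy here are hypothetical,
# not the NavOps API.
from dataclasses import dataclass

@dataclass
class Cluster:
    max_nodes: int       # concurrency cap, e.g. the 300-node limit on the JBIO project
    active_nodes: int    # nodes currently running

def reconcile(cluster: Cluster, pending_jobs: int, idle_node_ids: set) -> tuple[int, set]:
    """Return (number of nodes to launch, ids of nodes to terminate) for one pass."""
    # Scale up: one node per pending job, never exceeding the configured cap.
    headroom = cluster.max_nodes - cluster.active_nodes
    to_launch = min(pending_jobs, headroom)
    # Scale down: once no work is pending, release idle nodes so you only
    # pay for capacity while it is actually in use.
    to_terminate = idle_node_ids if pending_jobs == 0 else set()
    return to_launch, to_terminate
```

For example, a cluster at 290 of 300 nodes with 50 jobs pending launches only 10 more nodes (respecting the cap), while a cluster with no pending work terminates all of its idle nodes.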
In 2021 Janssen’s HPC environment included 15 production clusters defined in the AWS environment. Since the scaling is dynamic, at any given time they may be running hundreds of servers utilizing thousands to tens of thousands of CPU and GPU cores. Over 11 million jobs were run during the year, totaling over six million CPU hours (689 CPU years) and 750,000 GPU hours. Janssen’s shared computing resources also support areas like statistical modeling, molecular dynamics, free energy perturbation, machine learning, medical devices modeling, and more — and demand is only growing.
The Future of Healthcare R&D
As COVID-19 continues to evolve and as regional caseloads fluctuate, staying ahead of the curve with vaccines and drug treatments is essential. We now know new virus threats are always a possibility — and companies like Janssen Pharmaceuticals and the Johnson & Johnson organization will continue to respond quickly to novel and changing threats. Thanks to world-class HPC resources, their scientists and engineers can keep the world prepared for new situations and on the cutting edge of existing treatment options.
“In the old on-premises universe, it used to be that you set up one huge environment with all the resources you could muster on the floor. It had to serve all users and be able to juggle all sorts of workloads amongst multiple queues,” Dellwo said. “In a cloud space it’s very different. Each type of workload can scale independently and is not constrained at all beyond any limits that we build into the rules ourselves.” The numbers tell the story. “With six million CPU hours these clusters have been a huge benefit to our users.”