Eventually, supercomputing grew out of its monastic phase and became a powerful shared resource, which also enabled sharing of expenses. This has transformed supercomputing into a sort of computational utility that can be turned on and off like a water faucet. Today’s supercomputers are critical national resources for scientific and industrial research. Many problems in engineering, physics, chemistry, biology and environmental science are computationally intractable without supercomputers delivering the calculating power of tens of thousands of processors – and beyond.
Despite their relative ubiquity, supercomputers are still highly treasured resources. There are now hundreds of designs, all hotly competing for rank on the Top500 list of the most powerful supercomputing sites, the A-list of supercomputer celebrity. This is the story of how the University of Nevada, Las Vegas (UNLV), became the recipient of a scientifically coveted asset: an actual Top500 supercomputer, Intel’s Cherry Creek system, to call its own.
UNLV uses the now fully operational Cherry Creek machine to augment its own research in biology and medicine, as well as other sciences and engineering, via access over high-speed fiber-optic cables to the computer’s home at Switch’s nearby SUPERNAP data center. The university also makes Cherry Creek available to other research institutions, and to industry on a paid basis.
But submitting the winning proposal was just the beginning of the story. UNLV was only able to operate the system through the generous contributions of Intel, Switch, Cisco and Altair, all of which donated material resources and talent.
Two Men with a Dream
Our story starts with the birth of a new supercomputer. In 2013 Intel unveiled its brand new Cherry Creek supercomputer at the annual Supercomputing Conference, with impressive specs:
9,936 compute cores
131.5 TeraFLOPS (TFLOPS) performance
Only 74kW power consumption
The system ranked number 400 on the 2013 Top500 Supercomputer Sites list and number 41 on the 2013 Green500 ranking of the most energy-efficient supercomputers. These are amazing accomplishments for a system designed and built in only a year.
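The Green500 ranks systems by performance per watt, so the two headline figures above can be combined into a rough efficiency number. The following is a minimal sketch that uses only the performance and power values quoted in this article; the function name is illustrative, and the official Green500 entry is measured under that list's own methodology and may differ.

```python
# Figures quoted in this article for the original Cherry Creek system.
PERFORMANCE_TFLOPS = 131.5
POWER_KW = 74.0


def gflops_per_watt(tflops: float, kilowatts: float) -> float:
    """Convert a TFLOPS rating and a kW power draw to GFLOPS per watt."""
    gflops = tflops * 1_000.0   # 1 TFLOPS = 1,000 GFLOPS
    watts = kilowatts * 1_000.0  # 1 kW = 1,000 W
    return gflops / watts


efficiency = gflops_per_watt(PERFORMANCE_TFLOPS, POWER_KW)
print(f"{efficiency:.2f} GFLOPS/W")  # prints "1.78 GFLOPS/W"
```

By this back-of-the-envelope estimate, the system delivered a little under 2 GFLOPS for every watt it drew.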
Intel developed the Cherry Creek system using off-the-shelf rackmount servers, commodity processors, memory and storage, and an innovative water-based cooling system. The final specifications made it highly desirable as a research tool rather than just a demonstration system, so in 2014 Intel announced it would entertain applications from university research institutions to become the host operator of the system.
The first step for Intel was to find a suitable data center that could handle not only the original Cherry Creek system but also the upgrade that was sure to follow. Rob Roy, CEO and founder of Switch, wasted no time in contacting Intel after discovering that the company was looking for a world-class data center to house Cherry Creek. Intel had numerous choices of data center locations, but it immediately selected Switch as the home of Cherry Creek.
After learning that Intel had selected Switch, Roy immediately called a colleague, Joe Lombardo, Executive Director of UNLV’s National Supercomputing Center. Roy informed Lombardo that an opportunity to acquire Cherry Creek as a research resource via a competitive proposal process was underway. Roy and Lombardo had been colleagues and partners on other projects, but this was the most exciting yet in terms of what it could add to the university’s research infrastructure.
Intel’s announcement then landed on Lombardo’s desk. No stranger to supercomputers – he had been affiliated with UNLV in a technologist role since 1991 – Lombardo dared to dream, along with Roy, that UNLV could be a contender in the Cherry Creek competition. He submitted an application, citing UNLV’s experience with hosting and managing other supercomputers.
Nor was this Lombardo’s first supercomputer competition. A few years earlier, UNLV had proposed building its own system from scratch. “We are driven by grants, and in this case we started getting interested in green computing in 2009,” recalls Lombardo. “Intel and Switch teamed with us and Pacific Northwest National Laboratory (PNNL) to build a green supercomputer.” Although the proposal got excellent reviews, the effort did not succeed, but the interest in green high-performance computing endured.
This missed opportunity didn’t dampen Lombardo’s enthusiasm for lining up a world-class supercomputer for the university, so he was on the lookout for another possibility when Intel announced the Cherry Creek competition. According to Lombardo, Intel’s Cherry Creek was a “waterfall project” – not intended for the market. But once the system had secured its number 400 spot on the Top500, Intel decided it should be more than a trophy. “They then wanted to give that system to somebody,” says Lombardo, “and we wrote our proposal and won against two other universities. Intel really liked what we proposed and wanted the machine to be used for real research.”
Getting a supercomputer as a gift is kind of like being given an oceangoing yacht. Even though it is free, you still need to dock and fuel your windfall, and pay for a crew to operate it. Similarly, operating a supercomputer entails considerable ongoing expense for housing, power, environmental controls and support talent. Lombardo, well acquainted with the ways of academic funding, knew that hosting a new supercomputer would require money, which would require academic fundraising. So even before UNLV was awarded the system, he began laying the groundwork for hosting it.
The award of Cherry Creek came in 2014, but it would be another six months before Lombardo heard the machine boot up for the first time. Part of the delay was circumstance, but part was due to the pure complexity of supercomputers like Cherry Creek.
Classes of Supercomputers
Traditional capability-oriented supercomputers typically take years to develop, cost millions of dollars and work only with certain specific research problems. An alternative approach, focusing on capacity rather than capability, enables supercomputers such as Intel’s Cherry Creek system to be designed in less than a year using off-the-shelf components.
Both supercomputing approaches achieve speed through massive parallelism. But a supercomputer built with off-the-shelf components differs markedly from a traditional supercomputer, which sports an array of one processor type, all working in lockstep on a single calculation using shared memory. Supercomputers built on off-the-shelf components consist of thousands of independent computational “nodes,” each having one or more computational cores with their own dedicated memory and OS. The nodes are interconnected via an external communications mesh rather than internal processor-to-processor links.
Supercomputer systems built on off-the-shelf components scale more readily than traditional supercomputers, as nodes are easy to add. But they tend to be bulkier and have more interconnections than traditional systems.
Cherry Creek’s initial release had several unique architectural features:
Each node was a heterogeneous combination of an Intel 12-core x86 Xeon E5-2697v2 processor, which ran the Linux OS, and three 61-core Xeon Phi™ 7120P coprocessors, totaling 195 cores.
Each 2U half-rack-width node was housed in a SuperMicro FatTwin™ chassis containing 128GB DDR3 memory, solid-state disk and a network fabric controller.
Two racks, containing 24 nodes each, provided 9,360 total cores.
The system used liquid cooling custom designed by CoolIT, which efficiently removed heat that would otherwise have destroyed the system.
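The core counts in this list can be cross-checked with simple arithmetic. The sketch below uses only the per-node and per-rack figures given above. It also shows that the 9,936-core figure quoted in the headline specifications differs from the list total by exactly 12 cores per node. That would be consistent with a second 12-core Xeon in each node, but this reading is an assumption and is not stated in the article.

```python
# Per-node and per-rack figures as listed in the article.
XEON_CORES = 12       # one Xeon E5-2697v2 per node, as described
PHI_CORES = 61        # cores per Xeon Phi 7120P coprocessor
PHIS_PER_NODE = 3
NODES_PER_RACK = 24
RACKS = 2

cores_per_node = XEON_CORES + PHIS_PER_NODE * PHI_CORES  # 12 + 183 = 195
total_nodes = NODES_PER_RACK * RACKS                     # 48
total_cores = cores_per_node * total_nodes               # 9,360

# Hypothetical alternative (an assumption, not from the article):
# two 12-core Xeons per node instead of one.
alt_cores_per_node = 2 * XEON_CORES + PHIS_PER_NODE * PHI_CORES  # 207
alt_total_cores = alt_cores_per_node * total_nodes               # 9,936

print(f"As listed: {cores_per_node} cores/node x {total_nodes} nodes = {total_cores:,}")
print(f"With two Xeons per node (assumed): {alt_total_cores:,}")
```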
But technology had advanced considerably in the intervening two years, and Intel didn’t want to showcase outdated gear. Intel determined that, for not much more money, the existing nodes could be upgraded to realize a nearly three-fold increase in capacity and performance, resulting in a second-generation “Cherry Creek II” system sporting 26,000 cores. That is exactly what Intel did, upgrading the system in place while UNLV researchers – and Lombardo – keenly awaited its grand opening.
UNLV’s National Supercomputing Center for Energy and the Environment is a full-service supercomputing facility with on-site and off-site user training, national network accessibility and a mission for excellence in education and research in supercomputing and its applications.
Location, Location, Location
As noted previously, owning a supercomputer is only half the burden of running one. You must also house it, cool it and maintain it. Building a data center capable of housing Cherry Creek II would have been an expensive proposition for UNLV.
Fortunately, a previous relationship with Switch founder Roy blossomed into an offer to donate a 60-month, no-cost pilot program to house Cherry Creek at Switch’s world-class SUPERNAP facility. Roy’s claim to fame is developing the world’s most efficient, high-density data centers. His offer was made before UNLV even won the Cherry Creek donation from Intel, and helped position UNLV attractively compared to larger schools with grander facilities.
In 2015 UNLV, in collaboration with Switch and Cisco, established a Dedicated Research Network providing dual redundant 100Gb/s connectivity from the Cherry Creek racks back to UNLV. Cisco donated approximately $1 million in networking gear and cloud software to achieve a combined data transmission rate of 200Gb/s. This super high-speed bandwidth provides “being there” real-time interactive performance for researchers using the supercomputer.
High-Performance Computers (HPCs) such as Cherry Creek II don’t manage themselves; you need specialized software to do that. Management entails scheduling access to a pool of cores for a specific time period, ensuring that simultaneous users don’t interfere with each other, and monitoring system instrumentation to measure utilization and power efficiency.
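To make these management tasks concrete, here is a toy sketch of a first-come, first-served core allocator. It is not how PBS Works or any production scheduler operates internally, and the `Job` and `CorePool` names are hypothetical. It only illustrates the three ideas above: granting cores for a requested time period, never letting simultaneous jobs exceed the pool, and recording utilization.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    cores: int
    hours: float  # requested run time


class CorePool:
    """Toy first-come, first-served allocator over a fixed pool of cores."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.running = []     # jobs currently holding cores
        self.queue = []       # jobs waiting for cores, in arrival order
        self.core_hours = {}  # user -> core-hours granted so far

    @property
    def free_cores(self):
        return self.total_cores - sum(job.cores for job in self.running)

    def utilization(self):
        """Fraction of the pool currently allocated."""
        return 1 - self.free_cores / self.total_cores

    def submit(self, job):
        """Start the job if it fits and nobody is waiting; otherwise queue it."""
        if job.cores > self.total_cores:
            raise ValueError("job requests more cores than the pool has")
        if not self.queue and job.cores <= self.free_cores:
            self._start(job)
            return True
        self.queue.append(job)
        return False

    def finish(self, job):
        """Release a job's cores, then start queued jobs in order while they fit."""
        self.running.remove(job)
        while self.queue and self.queue[0].cores <= self.free_cores:
            self._start(self.queue.pop(0))

    def _start(self, job):
        self.running.append(job)
        granted = job.cores * job.hours
        self.core_hours[job.user] = self.core_hours.get(job.user, 0.0) + granted


# Usage with made-up jobs on a 26,000-core pool.
pool = CorePool(26_000)
first = Job("alice", 20_000, 2.0)
second = Job("bob", 10_000, 1.0)
pool.submit(first)   # fits, starts immediately
pool.submit(second)  # only 6,000 cores free, so it waits
print(f"utilization: {pool.utilization():.0%}")
pool.finish(first)   # frees cores; the queued job now starts
print(pool.core_hours)
```

Real workload managers add priorities, backfilling, reservations and fault handling on top of this basic bookkeeping.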
In UNLV’s situation, supercomputing time is also made available to industry, so any management solution has to be able to perform cost analysis and tracking. UNLV chose Altair’s PBS Works as a management platform. “PBS Works makes life much easier, provides fine-grained control. We’ve used other cluster management tools, such as on the Cray. We went to a demo with Altair and switched to PBS right away,” says Lombardo.
Getting the “big picture” view of Cherry Creek II’s performance is critical, too. UNLV has several large-screen real-time displays showing utilization and projects in operation in the cluster. “Ron does our ‘single-pane-of-glass’ consoles,” explains Lombardo. “Screens in the machine room show pegged vs. idle [time], etc. I love 92% utilization.”
Because research efforts must be documented, computing resource usage must be tracked over time. This is particularly important for paid HPC cloud access available through Switch’s supercomputing-as-a-service. “PBS provides reliable reporting, which we route back to Switch for its HPC cloud services planning,” explains Lombardo. With nearly 30,000 cores, Cherry Creek II requires customized management tools that Altair developed as part of its contribution. Lombardo points out, “PBS software is the primary vehicle for keeping Cherry Creek in flight. The 26,000 cores would be chaos to manage without it.”
Cherry Creek supports a range of activities within UNLV and other research communities, from oil fracking simulations, to chemistry modeling, to bioinformatics and proteomics (the science of analyzing the structure, function and interactions of proteins produced by genes). Intel itself sponsors research through its informal “Intel Fellows” program, in which participating scientists craft their research for Cherry Creek II. From the outset, Intel wanted more than just a demonstration system on the Top500 list.
Beyond UNLV researchers, other universities will work with Cherry Creek, such as the University of Colorado Anschutz Medical School’s Department of Pharmaceutical Sciences, which is investigating sequenced rat brain transcriptomes.
As for UNLV, the UNLV School of Medicine will open in August 2017. Some domains of medical research will focus on Big Data and data analytics, which will place even more demands on UNLV supercomputing resources.
The Future of Cherry Creek
With Cherry Creek II formally christened in June 2015, Lombardo sees a bright future for the system. There is already plenty of demand for the existing 26,000 cores, so Intel’s upgrade decision was a good one.
According to Lombardo, the system could scale to 300,000 cores as processor technology advances. As Internet bandwidth increases globally, more researchers will find remote connection to Cherry Creek II feasible, adding to the nascent HPC Cloud Computing marketplace.
HPC as a national cloud resource is trending today, as well. The Texas Advanced Computing Center (TACC) provides a la carte access to a variety of supercomputing systems, and the National Science Foundation-supported Extreme Science and Engineering Discovery Environment (XSEDE) is a virtual supercomputer that shares donated research compute cycles, data and expertise.
Lombardo predicts that HPC Cloud “…can lift supercomputing to the level of a day-to-day resource like the Internet, creating ubiquitous HPC you don’t need to own.” Regarding UNLV’s own migration to the cloud, “We want to move more and more from our campus to Switch,” says Lombardo. “We started in a world of centralized computing, and we’re going full circle. Supercomputing will be a routine cloud resource in three to five years.”