Thursday 23 August 2012

The NVIDIA Blog

NVIDIA Employees Get Hands-On To Support Local Schools

Posted: 23 Aug 2012 11:27 AM PDT

Modeling DNA with pipe cleaners and beads. Learning about Newton's laws of motion using plastic bottles and straws. Creating electrical circuits with aluminum foil and cardboard.

These are activities thousands of Silicon Valley students will engage in this school year thanks to the efforts of more than 500 NVIDIANs from our Santa Clara, Calif., headquarters and Resource Area for Teaching (RAFT), a local nonprofit.

NVIDIANs assembling learning kit components at last week’s Pack to School event.

During the second annual "Pack to School" volunteer event last week, employees volunteered more than 14,500 hours assembling hands-on learning kits for donation to local NVIDIA-partner schools.

In total, nearly 4,500 kits were created, and more than 44,000 Bay Area students will be reached through this effort.

Giving Math and Science Education a Boost

Kids in San Jose elementary schools average only seven minutes of science instruction a week. Many teachers lack the time, training or expertise to adopt hands-on learning approaches on their own. And, with school budgets tighter than ever, teachers in California are spending more of their own money to buy materials for their classes.

This is where RAFT comes in. With a mission to help educators transform a child's learning experience through hands-on education, RAFT uses surplus materials donated by local companies—from zip ties and foam to CDs and paper clips—to create inexpensive lessons for teachers to use in their classrooms.

Each lesson is designed to equip teachers with a fun, hands-on way to teach kids about math and science, helping turn abstract and complex ideas into activities students can comprehend.

Interested in inspiring a student? Visit RAFT's site to see how you can help.

Unleash Legacy MPI Codes With Kepler’s Hyper-Q

Posted: 23 Aug 2012 12:20 AM PDT

When we unveiled the Kepler-based NVIDIA Tesla K20 GPU at the GPU Technology Conference in May, we said it would be the highest-performance processor the HPC industry has ever seen.

But recent performance tests on real-world scientific applications show that the forthcoming GPU easily surpasses even our highest expectations. This GPU simply rocks!

I'm particularly thrilled about Kepler's new Hyper-Q feature, which helps increase performance for thousands of legacy MPI applications without requiring a major code rewrite.

To illustrate the power of Hyper-Q, I picked CP2K, a popular MPI-based molecular simulation code that has traditionally been difficult to run well on GPUs. Hyper-Q maximizes GPU utilization for CP2K, more than doubling performance compared to running the same code without it.

How Hyper-Q Works

A GPU consists of multiple CUDA cores grouped into streaming multiprocessors operating in parallel. A hardware unit called the CUDA Work Distributor (CWD) is responsible for assigning work to the individual multiprocessors.

In the current Fermi architecture, the CWD has a single connection to the host CPU, and work from different MPI processes is merged into this single queue. This serialization can easily create false dependencies among work from different MPI processes, limiting the amount of work that can execute concurrently on the GPU. The result is often an underutilized GPU.

Hyper-Q removes this limitation. As shown in the graphic, the new Kepler-based Tesla K20 GPU provides 32 work queues between the host and the GPU, enabling multiple MPI processes to run concurrently on the GPU. Each MPI process can be assigned to a different hardware work queue, maximizing GPU utilization and increasing overall performance.

By enabling more MPI processes on the GPU, Hyper-Q maximizes GPU utilization, increasing overall performance.
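To make the single-queue bottleneck concrete, here's a deliberately simplified toy model in C. It is not how the CUDA Work Distributor actually schedules work: it assumes the worst case, where everything sharing a queue runs back to back, and it assumes separate queues never compete for multiprocessors. The function name modeled_finish_time and the abstract time units are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_QUEUES = 32 };  /* work-queue count the post quotes for Tesla K20 */

/* Toy model only (an assumption, not the hardware's algorithm).
 * Work item i lasts duration[i] time units and is placed on queue
 * i % num_queues. Items on the same queue run one after another;
 * different queues are assumed to progress fully in parallel.
 * Returns the time at which the busiest queue finishes. */
long modeled_finish_time(const long *duration, size_t num_items,
                         size_t num_queues)
{
    long busy_until[MAX_QUEUES] = {0};
    long finish = 0;

    assert(duration != NULL);
    assert(num_queues >= 1 && num_queues <= MAX_QUEUES);

    /* Accumulate the serialized time on each queue. */
    for (size_t i = 0; i < num_items; ++i)
        busy_until[i % num_queues] += duration[i];

    /* The overall finish time is set by the slowest queue. */
    for (size_t q = 0; q < num_queues; ++q)
        if (busy_until[q] > finish)
            finish = busy_until[q];

    return finish;
}
```

With four work items lasting 3, 1, 4 and 1 units, this model gives 9 units on a single queue, 7 on two queues and 4 on four queues.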

Reduced Development Effort for Legacy MPI Codes

While MPI developers will be thrilled with the added performance, they'll be equally enamored with how Hyper-Q makes porting legacy MPI codes to the GPU significantly easier.

Legacy MPI-based codes were often created to run on multicore CPU systems, with the amount of work assigned to each MPI process scaled accordingly. As a result, individual MPI processes often didn't generate enough work to fully occupy the GPU. To make the code launch enough work to fully utilize the GPU, developers were frequently required to modify their codes significantly.

Hyper-Q reduces the recoding effort considerably because developers can now throw many MPI processes with small- and medium-size workloads at a shared GPU. Developers no longer need to modify their codes to put enough work into a single MPI process. Rather, they can send up to 32 MPI processes with variable workloads to the GPU and just let the GPU do all the heavy lifting to maximize performance.

Case In Point: CP2K

CP2K is a widely used atomic and molecular simulation code that runs at many of the world's supercomputing sites. CP2K is parallelized using MPI and OpenMP, and CUDA is used in some models where GPUs are targeted.

With Fermi-based GPUs, developers actually experienced reduced performance gains when MPI processes were limited to small amounts of work, particularly in strong scaling simulations. While the CPU was highly utilized, the GPU stayed completely inactive in substantial portions of the simulation.

The benchmark below shows the impact of Hyper-Q.

Hyper-Q benchmark

This small data set of 864 water molecules is usually problematic for GPUs. Without Hyper-Q, only one MPI process runs on each node with GPUs, and the performance curve from 1 to 16 nodes is not much better than with CPU-only simulations.

With Hyper-Q, it is now possible to use the same number of MPI processes per node as in the CPU-only case, which means 16 MPI processes per GPU in this instance. This unlocks the full benefit of the GPU, leading to a speedup of 2.5x with Hyper-Q enabled.
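A quick sanity check on rank counts can be sketched in C as below. It assumes ranks are spread evenly across the GPUs in a node and takes the 32 work queues quoted earlier as the limit. The function names are hypothetical and are not part of CUDA or MPI.

```c
#include <assert.h>

enum { HYPERQ_WORK_QUEUES = 32 };  /* figure quoted in the post for Tesla K20 */

/* Largest number of MPI ranks that land on any one GPU when the ranks on a
 * node are spread evenly over its GPUs (ceiling division). */
int max_ranks_per_gpu(int ranks_per_node, int gpus_per_node)
{
    assert(ranks_per_node >= 0 && gpus_per_node > 0);
    return (ranks_per_node + gpus_per_node - 1) / gpus_per_node;
}

/* Returns 1 if every rank sharing a GPU can have a work queue of its own
 * under the assumptions above, 0 otherwise. */
int each_rank_gets_own_queue(int ranks_per_node, int gpus_per_node)
{
    return max_ranks_per_gpu(ranks_per_node, gpus_per_node)
           <= HYPERQ_WORK_QUEUES;
}
```

For the CP2K run described here, max_ranks_per_gpu(16, 1) is 16, which fits comfortably within the 32 queues.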

And the best part? No extra coding effort is necessary to enable Hyper-Q. All it takes is a Tesla K20 GPU, a CUDA 5 installation and an environment variable set to let multiple MPI ranks share the GPU. Hyper-Q is then ready to use.

Be Prepared, Start Today

The Tesla K20 will be the first GPU to feature Hyper-Q. It's scheduled to be available by the end of the year, but you can start preparing today.

Begin by accelerating your code using OpenACC. With OpenACC directives, developers simply insert compiler hints into the code, and the compiler automatically maps compute-intensive portions of the code to the GPU. By using directives within MPI processes, you don't need to worry about how much work the OpenACC compiler generates per process, because Hyper-Q keeps the GPU as occupied as possible.
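As a rough sketch of what a directive looks like, here's a small C loop annotated with OpenACC. The saxpy routine is an illustrative example, not code from CP2K, and the data clauses shown are one reasonable choice rather than the only one. A compiler without OpenACC support simply ignores the pragma and runs the loop on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The directive asks an OpenACC compiler to offload the loop:
 *   copyin(x[0:n]) copies x to the device before the loop,
 *   copy(y[0:n])   copies y to the device and back afterwards.
 * Without OpenACC support the pragma is ignored and the loop runs serially. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0);
    assert(n == 0 || (x != NULL && y != NULL));

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Inside an MPI code, each rank would call a routine like this on its own slice of the data, so the per-rank loop can stay small.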

You can get a free, 30-day trial license for an OpenACC compiler on the NVIDIA website. If you currently work with MPI codes or if you have any questions about Hyper-Q, I'd love to hear from you.
