Friday, 16 July 2010



More Questions Answered: Ian Buck Live Chat Follow-Up

Posted: 16 Jul 2010 08:44 AM PDT

Tuesday's live chat was a real success; thanks to all of you who joined us and participated. If you missed the chat, you can watch the replay here. We were pleasantly surprised to receive so many great questions. While we couldn't answer all of them during the chat itself, we've taken the chance to review and answer many of them for you here. Take a read and let us know what you think or if you have additional questions for me.

Reminder: The next chat takes place Thursday July 22 at 11am PDT with David Kirk. Stay tuned to the blog for more details.

Question from blackpawn:
I'm curious if there are any future plans or thoughts on exposing the hardware rasterizer to CUDA so we can say goodbye to OpenGL and DirectX. :)

Right now we don't have any plans for that. Graphics APIs are designed and tuned to be great at graphics. We have focused on good interoperability between CUDA C/C++ and the graphics APIs so you can easily switch back and forth without copying data, using the best tool for the job.

Question from Steve W:
Fermi supports __syncthreads_count(), letting you do a very simple kind of parallel sum without having to code the reduction itself. Is this implemented by microcode, library code, or is it an actual hardware op? Reduction (and prefix sum) are so useful and common; could future GPUs be extended to do block-wide reductions or prefix-sum computations as a single op (maybe over a few clocks), or is the classic manual way pretty much as efficient as can be expected?

The atomic operations to shared memory should help you out in these cases. Check out the CUDA Programming Guide for Fermi.
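To make the two approaches mentioned in this exchange concrete, here is a minimal sketch, not taken from the chat itself. It assumes a Fermi-class (compute capability 2.0) or later GPU and a non-empty input. The kernel names, the `Totals` struct, and the `reduce_on_gpu` host helper are hypothetical names chosen for illustration, and error checking is omitted for brevity. The first kernel uses `__syncthreads_count()` to count how many threads in a block satisfy a predicate; the second uses `atomicAdd()` on a shared-memory variable to form a per-block sum.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Per-block count of positive elements using __syncthreads_count().
// Every thread in the block must reach the call, so there is no early return.
__global__ void count_positive_sync(const int *in, int n, int *block_counts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int pred = (i < n) && (in[i] > 0);
    // Acts as a barrier and returns how many threads passed a non-zero predicate.
    int count = __syncthreads_count(pred);
    if (threadIdx.x == 0) block_counts[blockIdx.x] = count;
}

// Per-block sum of arbitrary ints using atomic adds to shared memory.
__global__ void block_sum_atomic(const int *in, int n, int *block_sums)
{
    __shared__ int sum;
    if (threadIdx.x == 0) sum = 0;
    __syncthreads();                      // make the zeroed value visible
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&sum, in[i]);    // shared-memory atomic
    __syncthreads();                      // wait for all adds to finish
    if (threadIdx.x == 0) block_sums[blockIdx.x] = sum;
}

// Hypothetical result type and host helper (illustration only).
struct Totals { int positives; int sum; };

Totals reduce_on_gpu(const std::vector<int> &h_in)
{
    const int threads = 256;
    const int n = static_cast<int>(h_in.size());   // assumed > 0
    const int blocks = (n + threads - 1) / threads;

    int *d_in = 0, *d_counts = 0, *d_sums = 0;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_counts, blocks * sizeof(int));
    cudaMalloc(&d_sums, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    count_positive_sync<<<blocks, threads>>>(d_in, n, d_counts);
    block_sum_atomic<<<blocks, threads>>>(d_in, n, d_sums);

    // These blocking copies also wait for the kernels to complete.
    std::vector<int> h_counts(blocks), h_sums(blocks);
    cudaMemcpy(h_counts.data(), d_counts, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_sums.data(), d_sums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_counts); cudaFree(d_sums);

    // Combine the per-block partial results on the host.
    Totals t = {0, 0};
    for (int b = 0; b < blocks; ++b) { t.positives += h_counts[b]; t.sum += h_sums[b]; }
    return t;
}
```

Calling `reduce_on_gpu` on a small host vector returns the number of positive elements and the overall sum. The shared-memory atomic version is the simpler one to write, but all threads in a block contend on one address, so a classic tree reduction may still be faster for large blocks.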

Question from VizWorld:
Does NVIDIA believe it can sustain its amazing performance boosts simply by adding more cores to the die? We've yet to see the 512-core Fermis; is the next 'step' a 1024-core?

Certainly for HPC the problem sizes are very large. While increasing the core count can limit you on small problems, we haven't gotten close to typical problem sizes.

Question from Ahmed Helal:
How costly (in terms of cycles) is context switching in CUDA (switching from a thread to another)?

Pretty quick: microseconds. The GPU will complete all the active work before switching to the next context.

Question from kiun:
Can we expect a way to debug kernels without the need of two computers any time?

On Linux this is already supported via the cuda-gdb, Allinea DDT, and TotalView debuggers. On Windows, it's a bit trickier: the window manager uses the GPU for graphics compositing, so hitting a breakpoint in your kernel means the window manager stops updating.

Question from Guest:
Is there any chance of getting cuda-gdb on the Mac?

We're looking into that.

Question from devarde:
Are there any plans to make a tool like Nexus that works on an OS other than Windows, such as Linux or Mac OS X?

On Windows, where the vast majority of application developers are using Visual Studio, it makes sense for us to invest the significant engineering effort to develop a solution that integrates with the IDE. On Linux, where there are so many different IDE-type solutions (and different versions of each), we have a different strategy. Instead of picking one Linux IDE that only a small subset of Linux developers use, we are defining low-level debugging and performance analysis APIs that tools ISVs can use to incorporate CUDA debugging and performance analysis into their existing solutions. Some examples include debuggers like Allinea DDT and TotalView, and performance analysis tools like TAU and Vampir.

Also, where there are simple things we can do, like ensuring cuda-gdb works well with Emacs and DDD, the support is already there.

Win the Ultimate PC Gaming Machine From Maingear

Posted: 15 Jul 2010 02:48 PM PDT

You've all heard by now that the scaling on our new 400 series graphics cards in SLI mode is pretty impressive. But you may not have had a chance to experience the kind of performance these cards can deliver firsthand. So, we're offering you the opportunity to do just that – in 3D - with a contest for a system of your very own. We've partnered with Maingear to create the ultimate system for PC gaming; it's a liquid cooled rig with 3 GTX 480s in SLI, with a 3D-ready monitor and the full NVIDIA 3D Vision kit. It just doesn't get any better than that.

The contest is open to anyone around the world. All you have to do to enter is create a short video showing how you use GeForce graphics cards to "CRANK THAT S#!T UP"! The contest is being hosted by Firing Squad, so check it out at www.firingsquad/crankitup and post your work of art soon for a shot at this monster of a system. Being a video pro isn't a necessity; creativity alone will take you far, and points will be awarded based on the community's opinion of your clip and a judges' panel. Take a look at our video, above, for inspiration. Good luck!
