
March 16, 2011

Lessons learned building a 4096-core Cloud HPC Supercomputer for $418/hr

Comments

thanks

Have you tried the same experiment using the cluster compute nodes at AWS? They cost more, but it would be very interesting to see whether large clusters of them can be fired up on demand. The CPUs are faster, but the more HPC-style interconnect would also be vital for many HPC applications.

Cycle

Yes, we have. We had an earlier blog post where we talked about a Cluster Compute cluster that used the GPU nodes:
http://blog.cyclecomputing.com/2011/02/practical-gpu-on-ec2-experience.html

The workload that ran on this was a molecular dynamics application that took advantage of the GPUs, the fast CPUs, and the 10 Gbps inter-server/filer bandwidth. Give it a read and let us know what you think.

Rob

I'm curious why you would be interested in spreading the instances out across different availability zones. Given that HPC jobs are generally network-sensitive, having all of the nodes in a single zone (or packed as tightly as possible) would appear to be a strength rather than a weakness.

Mark Hahn

It's not an HPC cluster without a fast, low-latency network. What performance (latency, bandwidth, and message rate) did you observe in this experiment, especially considering it included nodes at multiple sites?

Even if you believe that throughput on non-communicating serial jobs is HPC, the same basic question applies to I/O: what kind of performance did you achieve to your filer? (Also, how reasonable is it to have a single 1 TB server for 4k cores?)

No insult intended, but to me this looks like a very cool way to get a high rank on foo@home...

Cycle

@Rob, HPC is generally composed of two different types of applications: those that are latency-sensitive and require fast interconnects to function properly, and those that are pleasantly parallel and more concerned with throughput. An increasing number of applications and newer architectures focus less on low-latency interconnects and more on throughput.

In this case, the application was pleasantly parallel and had a very high compute-to-data ratio (hours or days of compute for comparatively small amounts of data), so we had the option of spanning availability zones.
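
For readers curious what spanning zones looks like mechanically, here is a minimal sketch using boto3, the AWS SDK for Python. It is not the tooling used for this run, and the AMI ID, zone list, instance type, and counts are placeholders rather than the production configuration.

# Hypothetical sketch: request pleasantly-parallel worker nodes across
# several availability zones; the jobs never talk to each other, so each
# zone's workers can be launched independently.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
zones = ["us-east-1a", "us-east-1b", "us-east-1d"]   # placeholder zone list
workers_per_zone = 32                                # placeholder count

for zone in zones:
    ec2.run_instances(
        ImageId="ami-00000000",       # placeholder worker AMI
        InstanceType="c1.xlarge",     # placeholder instance type
        MinCount=workers_per_zone,
        MaxCount=workers_per_zone,
        Placement={"AvailabilityZone": zone},
    )

A batch scheduler can then pull jobs onto whichever workers come up, regardless of zone, which is what makes the zone spread workable for throughput-oriented work.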

Cycle

@MarkHahn, let's talk about your points:

> it's not an HPC cluster without a fast, low-latency network.
#40 on the Top 500 supercomputing list uses Gigabit Ethernet, as do a number of the other Top 500 systems. The data needs here were moderate, and we do monitor network I/O; the reason it wasn't in the blog post is that network I/O wasn't a bottleneck for these workloads (which is probably also the case for the other Gigabit Ethernet Top 500 systems). A rough sketch of that kind of per-node measurement follows this comment.

>what kind of performance did you achieve to your filer
This cluster ran a production workflow for a Fortune 500 client with a high compute-to-data ratio and intermediate data stored on the executing node, so there really isn't much to say about the filer in this case. With regard to filers, while single-node filers are common in EC2, they aren't the only option. We will talk about this more in future posts.
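
As referenced above, here is a rough sketch of sampling a node's network throughput from /proc/net/dev. The interface name and sampling interval are assumptions for illustration, not the monitoring actually used on this cluster.

# Hypothetical sketch: measure one node's network throughput by sampling
# /proc/net/dev twice and differencing the byte counters.
import time

def rx_tx_bytes(iface="eth0"):          # interface name is an assumption
    """Return (bytes received, bytes transmitted) for one interface."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                return int(fields[0]), int(fields[8])   # rx bytes, tx bytes
    raise ValueError("interface not found: %s" % iface)

interval = 10.0                          # seconds between samples (assumed)
rx0, tx0 = rx_tx_bytes()
time.sleep(interval)
rx1, tx1 = rx_tx_bytes()
print("rx %.2f MB/s, tx %.2f MB/s"
      % ((rx1 - rx0) / interval / 1e6, (tx1 - tx0) / interval / 1e6))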
