High-Performance Computing
Winter Semester 2013

Kick-off meeting: October 22nd, 09:00—10:00, LC 140
Weekly lecture: Tuesdays, 14:00—15:00, LF 035
Instructors: Thomas Fogal, Jens Krüger
Office: LE 305
Phone: 203 379 1314
E-Mail: thomas.fogal at uni-due dot de
Office hours: Tuesdays, 15:00—16:00 (directly after class), or by appointment (send an email).

News (newest first)

  1. [20.03.2014] Congratulations to Dominik, who took the gold for the fastest/most scalable program in the 'scalability' assignment!
  2. [23.01.2014] As already posted on moodle, the final projects will be due March 21st. Proposals are still due on January 30th.
  3. [06.01.2014, later] The final project assignment has been posted. We are still working out the exact timing information, but I wanted you to start thinking about it now.
  4. [06.01.2014] Assignment 4, scalability, is now posted. This assignment is split into two parts, with the first due January 14th and the second due January 28th.
  5. [04.01.2014] Happy new year! I posted an annotated example of how you can use MPI_Allgatherv, if you've been having trouble with it. Stay tuned for the next assignments to appear here soon.
  6. [20.12.2013] Having trouble debugging your MPI-based simulation? Try learning a bit about the standard debugger on Linux and using it to identify where your segfault is happening.
  7. [17.12.2013] I posted a program that generates input files with an arbitrary number of particles for your simulations.
  8. [16.12.2013] I updated the tarball with in situ code to add new features, such as changing the camera location (try --help).
  9. [13.12.2013] I posted some example OpenMP programs, which I also went over in class.
  10. [11.12.2013] Assignment 3 is now up.
  11. [11.12.2013] Subtle bug in the previous 'constant T' run. Updated now; please re-download if you have already downloaded it.
  12. [10.12.2013] A tarball with output from a constant 'T' run is now available. The input file is in the tarball as "input.txt"; note there is also a sim binary which you can use to compare your simulation.
  13. [09.12.2013] The assignment submission is now created in moodle. Sorry it was not there earlier!
  14. [05.12.2013] I uploaded the broadcast and reduce examples that we went over in class.
  15. [25.11.2013] A tarball with code to produce visualizations in situ is now available. See moodle for a description of how to use it.
  16. [25.11.2013] Assignment 2 is now up.
  17. [18.11.2013] A bug was found in the code you were to compare against for Assignment 1. I also created example runs that output every timestep (for the first few), so you can more reasonably compare against them.
  18. [12.11.2013] A couple of small updates were made to Assignment 1 to correct typos and fix the bug in the output file format.
  19. [12.11.2013] We've updated the sample runs so that they can now be input verbatim into ParaView.
  20. [05.11.2013] Assignment 1's due date has been extended to 19.11.2013, as we have not gone over files yet in class.
  21. [05.11.2013] I have posted the absurd pointer example we went over in class today, along with my hand-drawn notes that show what's going on under the hood.
  22. [02.11.2013] Rainer has provided us with some sample inputs and outputs that you can use to debug your implementations of Assignment 1.
  23. [30.10.2013] Assignment 1 is now available.
  24. [30.10.2013] The C programs I am going over in class are now available from here.
  25. [22.10.2013] First lecture slides up, along with some notes. The notes should be helpful for assignment 1.
  26. [22.10.2013] Assignment 0 is now available.

Lecture Slides

  1. Introduction, N-Body Sim


Assignments

  1. Login and run a cluster job.
  2. N-Body Simulation in Serial.
  3. MPI Parallelized N-Body Simulation.
  4. Scalability.
  5. Final projects.

Course Details

NOTE: This course will be given in English!

Course Description:

Implementation-heavy course focused on performance-critical code. Parallelism models: message-passing and shared-memory architectures. Modern technologies for parallelization: OpenMP, MPI, CUDA. I/O performance issues: parallel and distributed filesystems. Network technologies in clustered environments. Deep storage hierarchies and the memory wall.

This is an implementation-intensive course.

There are no explicit prerequisites, but we recommend a basic background in operating systems. You may find Scientific Visualization useful, but we will cover any needed topics from that course as they come up. Please talk to the instructor if you have any doubts about your readiness.

This course has the following objectives:

  • Review the theory behind shared-memory and message-passing models of parallelism.
  • Understand the implementation of modern parallel filesystems.
  • Provide a basis for understanding parallel program performance.
  • Improve software engineering skills through the completion of a significant software project.

If you plan to take the course, please register on the course's moodle page. The registration code is just 'hpc' (all lowercase).

Assignments and Grading

The implementation part of this course will be evaluated through multiple large programming projects. You will implement a simple 'N-Body' simulation as well as analysis tools to understand the data output by your simulation. Later, you will apply the knowledge you have gained in a custom program based on your interests. Your solutions should be written in C, C++, or Fortran 90+. (Exceptions to using these languages may be allowed, but you will need to negotiate them with me.)

We may include or additionally offer a crash course in C, if there is enough interest.

An N-Body simulation is a simulation of the movement of bodies that interact with one another. One application is (for example) computing planetary motion according to Newtonian gravity. The figures below show two examples of such a system.

[Figure: image from the Millennium Run astrophysics simulation.] [Figure: sample visualization from GADGET output.]
Two example visualizations from data output by the GADGET astrophysics simulation software.

Class time will be used to help guide you in the implementation of your N-Body simulation. The course will start out simple and progressively build more and more complex parallelism (and thus higher performance!) into your simulation. In the end, you will have created your own high-performance N-Body simulation that utilizes multi-scale parallelism.
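
As a small taste of what adding shared-memory parallelism can look like, the sketch below uses an OpenMP reduction to sum the kinetic energy of all bodies. The function name and the flat array layout are assumptions for illustration, not part of any assignment. If the compiler is not invoked with OpenMP enabled (for example -fopenmp with GCC), the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Total kinetic energy: the sum over all bodies of 0.5 * m * |v|^2.
 * mass holds n doubles; vel holds 3 * n doubles (x, y, z per body). */
double kinetic_energy(const double *mass, const double *vel, size_t n)
{
    assert(n == 0 || (mass != NULL && vel != NULL));
    double ke = 0.0;
    long i; /* signed index, which older OpenMP versions require */

    /* Each thread accumulates a private partial sum; OpenMP combines
     * the partial sums into ke when the loop ends. */
    #pragma omp parallel for reduction(+:ke)
    for (i = 0; i < (long)n; ++i) {
        const double *v = &vel[3 * i];
        ke += 0.5 * mass[i] * (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    }
    return ke;
}
```

Because floating-point addition is not associative, the result may differ in its last few bits between runs with different thread counts. Keep that in mind when comparing parallel output against a serial run.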

The project is cumulative. You must live with your code for the whole semester. Solutions for phases will not be given out. Regression testing is critical.
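
One simple way to approach regression testing is to keep the output of a known-good run and compare every later version against it within a tolerance. The helper below is a hypothetical sketch of such a comparison; the function name and the combined absolute/relative tolerance are assumptions, and you will need to pick tolerances that suit your own simulation.

```c
#include <assert.h>
#include <stddef.h>

/* Compare n values from a new run (out) against a reference run (ref).
 * Returns the index of the first value that differs by more than
 * abs_tol + rel_tol * |ref[i]|, or n if every value matches. */
size_t first_mismatch(const double *ref, const double *out, size_t n,
                      double abs_tol, double rel_tol)
{
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);
    for (size_t i = 0; i < n; ++i) {
        double diff = out[i] - ref[i];
        if (diff < 0.0)
            diff = -diff;
        double mag = ref[i] < 0.0 ? -ref[i] : ref[i];
        /* Written as a negated <= so that a NaN in either input
         * is reported as a mismatch. */
        if (!(diff <= abs_tol + rel_tol * mag))
            return i;
    }
    return n;
}
```

Running a check like this after every change, on the positions from a short fixed input, makes it much easier to notice the moment a new optimization breaks an earlier phase.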

Each project will be worth a major part of your grade. Exact point values will be discussed during the kick-off meeting. Each assignment is due at midnight on its due date.

WARNING: All groups are expected to do their own work on the programming assignments (and exams for that matter). No cross-group collaboration is allowed. A general rule to follow is that you may discuss the programs with other groups at the concept level but never at the coding level. If you are at all unclear about this general rule, do not discuss the programs with other students at all.

Reading Materials

There are no required textbooks for this course. However, you may find Viktor Eijkhout's HPC book useful for more depth into the concepts we cover. Once you understand the basics of C, you may find Axel-Tobias Schreiner's treatise on how to do object-oriented programming in C to be enlightening.

Other resources you may find useful are the documentation for:

Computer Accounts

The CCSS Cray XT6m HPC cluster (duecray.uni-due.de) will be the primary computing resource for this course. Account creation/registration will be worked out on the first day of class. Your program will be graded on the Cray, so you must test your program in that environment! However, we recommend you set up a local machine or VM to do most of your development on, as the Cray is a shared resource.

Imprint/Impressum Copyright 2014 by HPC Group - Building LE, Lotharstr. 65, 47057 Duisburg, Germany