Wednesday, June 11, 2014

RoseACC, open-source OpenACC compiler

It has been almost a decade since the introduction of CUDA, which marked the beginning of the era of accelerated computing. However, the entry cost of taking advantage of GPUs (and other accelerators) is high: programmers have to learn a new language and adopt a new programming paradigm.
Things changed again with the introduction of OpenACC. Now, small modifications to existing code can give access to the computational power of accelerators.

What is OpenACC?

OpenACC is a set of compiler directives that enable offloading regions of a C, C++, or Fortran application to an accelerator. The advantages of this approach have already been demonstrated by OpenMP: existing applications can be ported without extensive rewriting and, because the directives operate at a high level of abstraction, the resulting code is highly portable.
However, compilers need to implement support for these directives. The transformation from a high-level language to one closer to the hardware is a complex process in which the compiler has to be smart.
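
To make the directive approach concrete, here is a minimal sketch in C using standard OpenACC syntax. The function name saxpy and its parameters are chosen for illustration only and do not come from RoseACC. A compiler without OpenACC support simply ignores the pragma (possibly with a warning), so the same source still builds and runs serially.

```c
/* Illustrative example (not taken from RoseACC): y[i] = a * x[i] + y[i]. */
void saxpy(int n, float a, const float *x, float *y)
{
    /* Ask the compiler to offload the loop to an accelerator.
     * copyin: x is only read on the device.
     * copy:   y is sent to the device and copied back afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The only change relative to plain C is the single pragma line; the loop body itself is untouched.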

What is RoseACC?

One of the issues with OpenACC is the absence of a free implementation (CAPS, PGI, and Cray offer commercial OpenACC compilers). RoseACC is free and open-source. It is based on the ROSE Compiler, an open-source, source-to-source compiler.
RoseACC is an academic project, the result of a collaboration between the University of Delaware and Lawrence Livermore National Laboratory. While creating an open-source implementation of OpenACC is rewarding in itself, it does not make for a great academic project. So RoseACC goes further than simply implementing OpenACC: it is a tool for doing research...

What kind of research is done with RoseACC?

  • RoseACC is part of a larger endeavor of the ROSE Compiler: designing reusable components to create DSLs (domain specific languages) and compiler directives. One of the goals of the ROSE Compiler is to be the tool of choice for researchers who want to create DSLs, new compiler directives, or experiment with existing compiler directives (such as OpenMP). RoseACC has been designed so that a large part of its code can be used for such purposes.
  • OpenACC is a recent initiative, and RoseACC lets us explore different extensions of the language. For example, most accelerators provide multiple dimensions for each level of parallelism (think of CUDA's 1D/2D/3D grids). RoseACC reflects this by providing multiple dimensions for both the gang (coarse-grained) and worker (fine-grained) levels of parallelism. Another extension will address the fact that OpenACC cannot distribute one compute region across multiple devices.
  • While the current focus of RoseACC is on explicit parallelization (using the parallel directive), OpenACC also provides the kernels directive for automated parallelization. The process of transforming a kernels region into a parallel region offers many research opportunities.
  • Finally, when lowering OpenACC to OpenCL (or any other lower-level language), the compiler is presented with many choices. RoseACC can generate a variety of implementations of the original OpenACC application. Choosing among them is a tuning problem, which we intend to solve using machine learning techniques.
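
The gang and worker levels and the parallel/kernels distinction mentioned in the list above can be sketched in standard OpenACC as follows. This is an illustration only: the function names are hypothetical, and the code uses the one-dimensional gang and worker clauses of the standard, not RoseACC's multi-dimensional extension, whose syntax is not shown in this post.

```c
/* Explicit parallelization: the programmer maps each loop to a level.
 * Scales every element of a row-major rows x cols matrix by s. */
void scale_parallel(int rows, int cols, float s, float *m)
{
    #pragma acc parallel copy(m[0:rows*cols])
    {
        /* Coarse-grained level: rows are distributed across gangs. */
        #pragma acc loop gang
        for (int i = 0; i < rows; ++i) {
            /* Fine-grained level: columns are shared among workers. */
            #pragma acc loop worker
            for (int j = 0; j < cols; ++j) {
                m[i * cols + j] *= s;
            }
        }
    }
}

/* Automated parallelization: the compiler analyzes the region and
 * decides how (and whether) to parallelize the loops. */
void scale_kernels(int rows, int cols, float s, float *m)
{
    #pragma acc kernels copy(m[0:rows*cols])
    {
        for (int i = 0; i < rows; ++i) {
            for (int j = 0; j < cols; ++j) {
                m[i * cols + j] *= s;
            }
        }
    }
}
```

Both functions compute the same result; the difference lies in who decides how the loops map onto the accelerator.
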
RoseACC is now close to its first release. In the meantime, a large portion of the code, including the runtime, is already available on GitHub.