CSAP Software

Parallel Runtimes

Several efficient parallel runtimes are available in source-code form, along with models that predict execution time or system utilization. At the moment, we provide two runtimes: one for NUMA architectures and one for integrated CPU/GPU processors. The packages are provided as artifacts.

Dopia: Online Parallelism Management for Integrated CPU/GPU Architectures

Achieving maximal performance on modern architectures where CPU and GPU cores are co-located on the same processor die turns out to be a surprisingly difficult task. Depending on the characteristics of the parallel application and the underlying hardware, disabling a number of CPU and GPU cores can lead to significantly higher performance than employing all available cores.
The Dopia framework provides an environment to automatically determine the best degree of parallelism (DoP) and execute OpenCL parallel applications on integrated architectures. The framework statically analyzes the OpenCL kernel code, extracts performance-relevant features, and rewrites the kernel to make it malleable. A plug-in runtime then feeds the extracted performance features into a pre-trained machine-learned model to determine the best DoP and executes the OpenCL kernel on the CPU and GPU cores at the given DoP.
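The selection step described above can be pictured with a minimal Python sketch. It is not Dopia's implementation: the names `candidate_dops`, `select_dop`, and `toy_predict`, the feature keys, and the cost formula are all hypothetical stand-ins for the extracted kernel features and the pre-trained model. The sketch only shows the idea of querying a predictor for each candidate DoP and picking the one with the lowest predicted execution time.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical DoP encoding: (active CPU cores, active GPU compute units).
DoP = Tuple[int, int]
Features = Dict[str, float]


def candidate_dops(max_cpu: int, max_gpu: int) -> Iterator[DoP]:
    """Enumerate every (cpu, gpu) setting with at least one active core."""
    for cpu, gpu in product(range(max_cpu + 1), range(max_gpu + 1)):
        if cpu + gpu > 0:
            yield cpu, gpu


def select_dop(features: Features,
               predict: Callable[[Features, DoP], float],
               max_cpu: int, max_gpu: int) -> DoP:
    """Return the candidate DoP with the lowest predicted execution time."""
    return min(candidate_dops(max_cpu, max_gpu),
               key=lambda dop: predict(features, dop))


def toy_predict(features: Features, dop: DoP) -> float:
    """Toy stand-in for a trained model (assumption, not Dopia's model).

    Compute time shrinks with more cores, while a contention term grows
    with the total number of active cores.
    """
    cpu, gpu = dop
    compute = features["work"] / (cpu * 1.0 + gpu * 2.0)
    contention = features["mem_intensity"] * (cpu + gpu) ** 2
    return compute + contention


if __name__ == "__main__":
    feats = {"work": 100.0, "mem_intensity": 0.5}
    best = select_dop(feats, toy_predict, max_cpu=4, max_gpu=4)
    print("selected DoP (cpu, gpu):", best)
```

With this toy cost model, a memory-intensive kernel is predicted to run faster with some cores disabled than with all cores active, which mirrors the observation that motivates Dopia.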
Click the [software] link below to download Dopia. The framework includes training data and pre-trained models for Intel Skylake and AMD Kaveri integrated architectures.
This work was presented at PPoPP 2022:

  • Younghyun Cho, Jiyeon Park, Florian Negele, Changyeon Jo, Thomas R. Gross, and Bernhard Egger. "Dopia: Online Parallelism Management for Integrated CPU/GPU Architectures." In 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’22), April 2–6, 2022, Seoul, Republic of Korea.
    [pdf][bibtex][doi][software]

Maximizing system utilization for co-located applications

By modeling the underlying NUMA architecture with a sophisticated queuing model, this framework improves the turnaround time and throughput of co-located parallel applications by maximizing the overall system throughput. Click on the [artifact] link to access the software.

  • Younghyun Cho, Camilo A.C. Guzman, and Bernhard Egger. "Maximizing System Utilization via Parallelism Management for Co-Located Parallel Applications." To appear in Proceedings of the 2018 International Conference on Parallel Architectures and Compilation (PACT'18), Limassol, Cyprus, November 2018.
    [pdf][bibtex][doi][artifact]
  • Younghyun Cho, Surim Oh, and Bernhard Egger. "Online Scalability Characterization of Data-Parallel Programs on Many Cores." In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT'16), Haifa, Israel, September 2016.
    [pdf][bibtex][doi][artifact]

Maximizing throughput of OpenCL applications on integrated CPU/GPU architectures

Integrated CPU/GPU architectures (APUs) enable fast and efficient workload balancing on the cores of the CPU and the GPU. Our approach is completely online and does not require any offline performance characterization or prior application profiling.
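As a rough illustration of what an online approach without offline profiling can look like, the following Python sketch rebalances the CPU/GPU split after every round based on the throughput observed in that round. This is a generic proportional-rebalancing scheme written for this page, not the algorithm from the paper. The function `partition_online` and the callbacks `run_cpu` and `run_gpu` are hypothetical; in a real system the callbacks would launch work on the devices and return the measured time.

```python
from typing import Callable, List, Tuple


def partition_online(total_items: int, round_size: int,
                     run_cpu: Callable[[int], float],
                     run_gpu: Callable[[int], float],
                     gpu_share: float = 0.5) -> Tuple[float, List[float]]:
    """Process work in rounds and adapt the GPU share to observed throughput.

    run_cpu(n) and run_gpu(n) process n items and return the elapsed time in
    seconds. Returns the final GPU share and the share after each round.
    """
    history: List[float] = []
    done = 0
    while done < total_items:
        batch = min(round_size, total_items - done)
        gpu_items = round(batch * gpu_share)
        cpu_items = batch - gpu_items
        cpu_time = run_cpu(cpu_items) if cpu_items else 0.0
        gpu_time = run_gpu(gpu_items) if gpu_items else 0.0
        # Update the split only when both devices produced a measurement.
        if cpu_time > 0 and gpu_time > 0:
            cpu_rate = cpu_items / cpu_time
            gpu_rate = gpu_items / gpu_time
            gpu_share = gpu_rate / (cpu_rate + gpu_rate)
        history.append(gpu_share)
        done += batch
    return gpu_share, history


if __name__ == "__main__":
    # Simulated devices (assumption): CPU handles 100 items/s, GPU 300 items/s.
    share, trace = partition_online(
        total_items=4000, round_size=1000,
        run_cpu=lambda n: n / 100.0,
        run_gpu=lambda n: n / 300.0)
    print("final GPU share:", share)
```

Starting from an even split, the share moves toward the ratio of the observed device throughputs, so both devices finish their part of a round at about the same time.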

  • Younghyun Cho, Florian Negele, Seohong Park, Bernhard Egger, and Thomas R. Gross. "On-The-Fly Workload Partitioning for Integrated CPU/GPU Architectures." To appear in Proceedings of the 2018 International Conference on Parallel Architectures and Compilation (PACT'18), Limassol, Cyprus, November 2018.
    [pdf][bibtex][doi][artifact]

SnuMAP: profiling parallel architectures

SnuMAP is an open-source application and system profiler for parallel architectures. It provides detailed execution trace information and easy visualization for one or more concurrent parallel applications running on a multi/many-core platform.

 

Virtualization

Efficient checkpointing

A patch for Xen to enable space-efficient VM checkpointing is available here. The method is described in detail in the following research papers:

  • Bernhard Egger, Eunbyung Park, Younghyun Cho, Changyeon Jo, and Jaejin Lee. "Efficient Checkpointing of Live Virtual Machines." In IEEE Transactions on Computers (TC), Volume 65, Issue 10, pp. 3041-3054, January 2016.
    [pdf][bibtex][doi]
  • Eunbyung Park, Bernhard Egger, and Jaejin Lee. "Fast and space efficient virtual machine checkpointing." In Proceedings of the ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'11), Newport Beach, USA, March 2011.
    [pdf][bibtex][doi]

Live migration modeling

We have developed an automatic, machine-learned model to predict several key metrics of live migration. To train the model, we gathered around 50,000 data points from live migrations running different benchmarks. The data set and the machine learning model are available here. The details of our method are described in:

  • Changyeon Jo, Youngsu Cho, and Bernhard Egger. "A Machine Learning Approach to Live Migration Modeling." In Proceedings of the 2017 ACM Symposium on Cloud Computing (SoCC'17), Santa Clara, USA, September 2017.
    [pdf][bibtex][doi]
  • Changyeon Jo, Changmin Ahn, and Bernhard Egger. "A Machine Learning-based Approach to Live Migration Modeling." Presented at the 4th International Workshop on Efficient Data Center Systems (EDCS'16), Seoul, Korea, June 2016.
    [pdf][bibtex]