Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

Frog: Asynchronous Graph Processing on GPU
with Hybrid Coloring Model

Introduction

GPUs have been widely used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or the atomic operations, leading to significant penalties/overheads when implemented on GPUs. To this end, coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices.

Frog is a light-weight asynchronous processing framework with a hybrid coloring model. It includes three parts: a hybrid graph coloring algorithm, an asynchronous execution model, and a streaming execution engine on GPUs. The first two parts are most critical in our design. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of real graph coloring cases.

Hybird Coloring Model

We find that majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution will separate the processing of the vertices based on the distribution of colors. We propose an efficient pre-partition method that can solve the problem of vertex classification using a moderate number of colors. This method does not strict all adjacent vertices to be assigned with different colors, which is particularly different from other graph coloring algorithms. Instead, we only ensure that there are no adjacent vertices assigned together into the small set of colors. For the vertices with the rest of colors, we combine them together into one color and process them in a super-step.

Asynchronous Execution Engine

After partitioning the graph, our execution engine scans the input graph color by color, updating all the vertices with the same color in parallel (one color as per kernel execution). Since the concurrently accessed vertices must be non-adjacent, there will be no need to lock the adjacent edges when updating a vertex. In our asynchronous approach, we use color-step to describe the process of updating all the vertices in a single partition. As we divide the graph into n partitions, n color-steps should be processed. For each color-step, we conduct data transfers and GPU kernel function executions at the same time under the support of CUDA streaming technology, to reduce the overhead of PCI-E data transfers and accelerate the whole task.

People

Hai Jin, Director
Xuanhua Shi, Manager
Sheng Di, Bingsheng He, Jianlong Zhong, Collaborator
Junling Liang, Peng Zhao, Xuan Luo, Lu Lu, Zhixiang Wang, Developer

Open Source

The Frog source codes are available at github.
If you are interested in Frog or have any questions, please contact the authors. You are also welcomed to commit your modification to the project's page in github.

Technical Report

Xuanhua Shi, Xuan Luo, Junling Liang, Peng Zhao, Sheng Di, Bingsheng He, and Hai Jin, "Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model", Technical Report, HUST-CGCL-TR-402, 2015

Publications

Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu, Zhixiang Wang, Xuan Luo, and Jianlong Zhong, "Optimization of Asynchronous Graph Processing on GPU with Hybrid Coloring Model", ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'15), poster, Bay Area, California, USA, February, 2015.

Acknowleagement

We want to thank Xiaosong Ma of the NCSU, Zuoning Yin of the GraphSQL, Bing Bing Zhou of the USYD, and Pingpeng Yuan of the HUST for their valuable comments.

Related Projects

Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code.
CuSha: CUDA-based vertex-centric graph processing framework.
Totem: Accelerating Graph Processing on Hybrid CPU+GPU Systems.

This page has been visited

times since 12/30/2014.