|
Introduction
GPUs have been widely used to accelerate graph processing for complicated computational problems regarding graph theory.
Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence.
Unfortunately, the consistent asynchronous computing requires locking or
the atomic operations, leading to significant penalties/overheads when implemented on GPUs. To this end,
coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the
consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from
low parallelism because of a large number of colors generally required for processing a large-scale graph
with billions of vertices.
Frog is a light-weight asynchronous processing framework with a hybrid coloring model. It includes three parts: a hybrid
graph coloring algorithm, an asynchronous execution model, and a streaming execution engine on GPUs. The first two
parts are most critical in our design. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring
algorithms as we observed through masses of real graph coloring cases.
We find that majority of vertices (about 80%) are colored with only a few colors, such that they can be read
and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly,
our solution will separate the processing of the vertices based on the distribution of colors. We propose an efficient
pre-partition method that can solve the problem of vertex classification using a moderate number of colors. This
method does not strict all adjacent vertices to be assigned with different colors, which is particularly different
from other graph coloring algorithms. Instead, we only ensure that there are no adjacent vertices assigned together
into the small set of colors. For the vertices with the rest of colors, we combine them together into one color and
process them in a super-step.
-
Asynchronous Execution Engine
After partitioning the
graph, our execution engine scans the input graph color by color, updating all the
vertices with the same color in parallel (one color as per kernel execution). Since
the concurrently accessed vertices must be non-adjacent, there will be no need to
lock the adjacent edges when updating a vertex. In our asynchronous approach, we use
color-step to describe the process of updating all the vertices in a single partition. As
we divide the graph into n partitions, n color-steps should be processed. For each color-step,
we conduct data transfers and GPU kernel function executions at the same time
under the support of CUDA streaming technology, to reduce the overhead of PCI-E
data transfers and accelerate the whole task.
People
- Hai Jin, Director
- Xuanhua Shi, Manager
- Sheng Di, Bingsheng He, Jianlong Zhong, Collaborator
- Junling Liang, Peng Zhao, Xuan Luo, Lu Lu, Zhixiang Wang, Developer
Open Source
The Frog source codes are available at github.
If you are interested in Frog or have any questions, please contact
the authors. You are also welcomed to commit your modification to the project's page in github.
Technical Report
Xuanhua Shi, Xuan Luo, Junling Liang, Peng Zhao, Sheng Di, Bingsheng He, and Hai Jin, "Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model", Technical Report, HUST-CGCL-TR-402, 2015
Publications
-
Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu,
Zhixiang Wang, Xuan Luo, and Jianlong Zhong, "Optimization of Asynchronous
Graph Processing on GPU with Hybrid Coloring Model", ACM
SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'15), poster, Bay Area, California, USA, February, 2015.
Acknowleagement
We want to thank Xiaosong Ma of the NCSU, Zuoning Yin of the GraphSQL,
Bing Bing Zhou of the USYD, and Pingpeng Yuan of the
HUST for their valuable comments.
Related Projects
Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code.
CuSha: CUDA-based vertex-centric graph processing framework.
Totem: Accelerating Graph Processing on Hybrid CPU+GPU Systems.
This page has been visited
times since 12/30/2014.