frog logo

Introduction
GPUs have been widely used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or the atomic operations, leading to significant penalties/overheads when implemented on GPUs. To this end, coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices.
Frog is a light-weight asynchronous processing framework with a hybrid coloring model. It includes three parts: a hybrid graph coloring algorithm, an asynchronous execution model, and a streaming execution engine on GPUs. The first two parts are most critical in our design. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of real graph coloring cases.
frog system
We find that majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution will separate the processing of the vertices based on the distribution of colors. We propose an efficient pre-partition method that can solve the problem of vertex classification using a moderate number of colors. This method does not strict all adjacent vertices to be assigned with different colors, which is particularly different from other graph coloring algorithms. Instead, we only ensure that there are no adjacent vertices assigned together into the small set of colors. For the vertices with the rest of colors, we combine them together into one color and process them in a super-step.
After partitioning the graph, our execution engine scans the input graph color by color, updating all the vertices with the same color in parallel (one color as per kernel execution). Since the concurrently accessed vertices must be non-adjacent, there will be no need to lock the adjacent edges when updating a vertex. In our asynchronous approach, we use color-step to describe the process of updating all the vertices in a single partition. As we divide the graph into n partitions, n color-steps should be processed. For each color-step, we conduct data transfers and GPU kernel function executions at the same time under the support of CUDA streaming technology, to reduce the overhead of PCI-E data transfers and accelerate the whole task.

People

Open Source
The Frog source codes are available at github.
If you are interested in Frog or have any questions, please contact the authors. You are also welcomed to commit your modification to the project's page in github.

Technical Report

  • Xuanhua Shi, Xuan Luo, Junling Liang, Peng Zhao, Sheng Di, Bingsheng He, and Hai Jin, "Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model", Technical Report, HUST-CGCL-TR-402, 2015


  • Publications

    Acknowleagement
    We want to thank Xiaosong Ma of the NCSU, Zuoning Yin of the GraphSQL, Bing Bing Zhou of the USYD, and Pingpeng Yuan of the HUST for their valuable comments.

    Related Projects
    Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code.
    CuSha: CUDA-based vertex-centric graph processing framework.
    Totem: Accelerating Graph Processing on Hybrid CPU+GPU Systems.

    This page has been visited Hit Counter times since 12/30/2014.