Introduction

Measuring pairwise association is an important operation when searching a dataset for meaningful insights, because it exposes potentially interesting relationships between the dataset's variables. In bioinformatics, one typical application is mining gene co-expression relationships from gene expression data, which can be realized by query-based gene expression database search or gene co-expression network analysis. Pearson's product-moment correlation coefficient, Spearman's rank correlation coefficient, Kendall's rank correlation coefficient, distance correlation and mutual information are widely used correlation/dependence measures. However, all-pairs pairwise correlation computation (PCC) is computationally demanding for a large number of variables, especially when coupled with permutation tests for statistical inference. This motivates our acceleration of its execution using high-performance computing.
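To make the cost concrete, the following is a minimal sequential sketch of all-pairs Pearson correlation in C++. It is not LightPCC code: the function names pearson and all_pairs_pearson are hypothetical, and the sketch ignores the vectorization, threading and distribution that the library provides. For m variables with n samples each, it evaluates m(m-1)/2 pairs at O(n) work per pair, and a permutation test would repeat that work once per permutation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson's r between two equal-length samples.
// Hypothetical helper for illustration; not part of the LightPCC API.
// Returns NaN if either sample has zero variance.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const std::size_t n = x.size();

    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}

// Correlations for every unordered pair (i, j) with i < j, stored in
// row-major order of the strict upper triangle: (0,1), (0,2), ..., (1,2), ...
std::vector<double> all_pairs_pearson(const std::vector<std::vector<double>>& vars) {
    const std::size_t m = vars.size();
    std::vector<double> result;
    result.reserve(m * (m - 1) / 2);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = i + 1; j < m; ++j) {
            result.push_back(pearson(vars[i], vars[j]));
        }
    }
    return result;
}
```

Because the correlation matrix is symmetric, only the upper triangle is computed; the lower triangle can be filled in by mirroring.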

LightPCC is the first parallel and distributed library for pairwise correlation/dependence computation on Intel Xeon Phi clusters. The library is written as C++ template classes and achieves high speed by exploiting SIMD-instruction-level and thread-level parallelism within each Xeon Phi, as well as accelerator-level parallelism across multiple Xeon Phis. To facilitate balanced workload distribution, we have proposed, for the first time, a generic framework for symmetric all-pairs computation that builds provable bijective functions between job identifiers and the coordinate space. As of today, LightPCC implements the following widely used correlation/dependence measures: Pearson's product-moment correlation coefficient, Spearman's rank correlation coefficient, Kendall's tau correlation coefficient, distance correlation and mutual information. We will continue to update the library actively.
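The idea of a bijection between job identifiers and coordinates can be illustrated with a small C++ sketch. It assumes the jobs are the cells (row, col) with row <= col of an n-by-n upper triangle (diagonal included), numbered in row-major order. The names Coord, row_offset, coord_to_job and job_to_coord are hypothetical, and the mapping actually used inside LightPCC may differ, for instance by operating on tiles rather than on single cells.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// A cell of the upper triangle (row <= col). Hypothetical type for illustration.
struct Coord {
    std::uint64_t row;
    std::uint64_t col;
};

// Number of jobs that precede row i: rows 0..i-1 hold n, n-1, ..., n-i+1 cells.
std::uint64_t row_offset(std::uint64_t i, std::uint64_t n) {
    return i * (2 * n - i + 1) / 2;
}

// Coordinate -> job identifier in [0, n(n+1)/2).
std::uint64_t coord_to_job(Coord c, std::uint64_t n) {
    assert(c.row <= c.col && c.col < n);
    return row_offset(c.row, n) + (c.col - c.row);
}

// Job identifier -> coordinate (inverse of coord_to_job).
Coord job_to_coord(std::uint64_t id, std::uint64_t n) {
    assert(id < n * (n + 1) / 2);
    // Closed-form guess from solving row_offset(i) <= id for i.
    const double b = 2.0 * static_cast<double>(n) + 1.0;
    const double disc = b * b - 8.0 * static_cast<double>(id);
    std::uint64_t i = static_cast<std::uint64_t>((b - std::sqrt(disc)) / 2.0);
    // Integer correction guards against floating-point rounding in the guess.
    while (i > 0 && row_offset(i, n) > id) --i;
    while (i + 1 < n && row_offset(i + 1, n) <= id) ++i;
    return Coord{i, i + (id - row_offset(i, n))};
}
```

With such a mapping, each worker can be handed a contiguous range of job identifiers of equal length and recover its coordinates locally, without storing the list of pairs. This is one way to balance a triangular workload across threads or devices.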

Downloads


Citation

Other related papers


Parameters

Currently, LightPCC implements Pearson's correlation coefficient, Spearman's rank correlation coefficient, Kendall's tau correlation coefficient, distance correlation and mutual information. These correlation/dependence measures are implemented as C++ template classes. The library has a non-MPI-based version, LightPCC, and an MPI-based one, mpiLightPCC. For version 1.0.14 and higher, the input file format is the same as the one used by ARACNE.

For benchmarking purposes, we have also implemented a subprogram for each correlation measure based on the corresponding templated class, which shares the same set of parameters as shown in the following table.

LightPCC

Usage: LightPCC cmd [options] -m exe_mode

mpiLightPCC

Usage: mpiLightPCC cmd [options] -m exe_mode


Installation and Usage

Prerequisites

  1. Intel C/C++ compiler or any other C/C++ compiler that supports Xeon Phi coprocessors.
  2. A C/C++ MPI library (e.g. OpenMPI, MPICH, Intel MPI) that is compiled by the aforementioned C/C++ compiler.

Input File Format

From version 1.0.14 onward, the input file format is the same as the one used by ARACNE.

Download and Compile

Before compiling, please modify the corresponding Makefile to point to the correct compilers and libraries.

  1. If the subdirectory "apps" exists, enter the subdirectory "apps/lightpcc" and compile there.
  2. Otherwise, run the command "make" to compile both the non-MPI-based version (named LightPCC) and the MPI-based one (named mpiLightPCC).

Typical Usage


Important Notices


Change Log


Contact

If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.