Tools

I serve as the developer of two packages (iCAMP and NST) and a Galaxy-based pipeline, as well as a maintainer for three web-based pipelines.


iCAMP

R package iCAMP | GitHub site | iCAMP function in web-based pipeline

iCAMP implements a general framework to quantitatively infer Community Assembly Mechanisms by Phylogenetic-bin-based null model analysis, abbreviated as ‘iCAMP’ (Ning et al 2020 Nature Communications). It quantifies the relative importance of different community assembly processes, such as selection, dispersal, and drift, for both whole communities and each phylogenetic group (‘bin’). Each bin usually consists of different taxa from a family or an order. The package also implements some other published methods, including the neutral taxa percentage (Burns et al 2016 ISME J) based on the neutral theory model (Sloan et al 2006 EM) and the quantification of assembly processes based on entire-community null models (Stegen et al 2013 ISME J). It also includes some handy functions, particularly for big datasets, such as phylogenetic and taxonomic null model analysis at both community and bin levels, between-taxa niche difference and phylogenetic distance calculation, phylogenetic signal tests within phylogenetic groups, and midpoint rooting of big trees.
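As a rough illustration of the entire-community null model idea mentioned above, the sketch below assigns each pairwise community comparison to a process from two null-model indexes and then reports the share of comparisons per process. It is written in Python for illustration only and is not the iCAMP package's R API. The function names, the input layout, and the cutoffs (|betaNTI| > 2 and |RC| > 0.95, the values commonly cited for this kind of framework) are assumptions here.

```python
from collections import Counter


def classify_turnover(beta_nti, rc_bray, nti_cut=2.0, rc_cut=0.95):
    """Assign one pairwise comparison to an assembly process.

    beta_nti and rc_bray are assumed to be precomputed null-model indexes;
    the default cutoffs are assumptions, not values taken from this page.
    """
    # Phylogenetic turnover far from the null expectation points to selection.
    if beta_nti < -nti_cut:
        return "homogeneous selection"
    if beta_nti > nti_cut:
        return "variable selection"
    # Otherwise, taxonomic turnover separates the dispersal processes.
    if rc_bray > rc_cut:
        return "dispersal limitation"
    if rc_bray < -rc_cut:
        return "homogenizing dispersal"
    # Neither index deviates from the null: no single process dominates.
    return "drift and others"


def process_fractions(pairs):
    """Return the fraction of pairwise comparisons assigned to each process."""
    counts = Counter(classify_turnover(b, r) for b, r in pairs)
    total = sum(counts.values())
    return {process: n / total for process, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical (betaNTI, RC) values for five pairwise comparisons.
    example = [(-3.1, 0.2), (2.6, -0.1), (0.4, 0.99), (-0.8, -0.97), (0.1, 0.3)]
    for process, fraction in process_fractions(example).items():
        print(f"{process}: {fraction:.2f}")
```

The bin-based approach applies the same kind of decision within each phylogenetic bin and then aggregates across bins; that aggregation is not shown here.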

The GitHub site includes the latest version of the package, example data and code, as well as all code used in the iCAMP paper, which can be helpful for new users. The IEG Statistical Analysis Pipeline, based on the Galaxy platform, also has a function to run iCAMP, with a tutorial, which can be easier for people not familiar with the R language.

The package was downloaded 2053 times from Sep 2020 to Dec 2020. I am the developer of this package under Prof. Jizhong Zhou’s supervision.


NST

R package NST | GitHub site | NST functions in a web-based pipeline

NST estimates ecological stochasticity in community assembly. Understanding the community assembly mechanisms controlling biodiversity patterns is a central issue in ecology. Although it is generally accepted that both deterministic and stochastic processes play important roles in community assembly, quantifying their relative importance is challenging. The new index, normalized stochasticity ratio (NST), estimates ecological stochasticity, i.e. the relative importance of stochastic processes, in community assembly. With the functions in this package, NST can be calculated based on different similarity metrics and/or different null model algorithms, along with some earlier indexes, e.g. the previous Stochasticity Ratio (ST), Standard Effect Size (SES), and modified Raup-Crick metrics (RC). Functions for permutational tests and bootstrapping analysis are also included. The earlier ST was published by Zhou et al (2014 PNAS). NST is modified from ST by considering two alternative situations and normalizing the index to range from 0 to 1 (Ning et al 2019 PNAS). A modified version, MST, is a special case of NST and is used in some recent or upcoming publications, e.g. Liang et al (2019). SES is calculated as described in Kraft et al (2011, Science). RC is calculated as reported by Chase et al (2011, Ecosphere) and Stegen et al (2013, ISME J). Version 3 added NST based on phylogenetic beta diversity, used by Ning et al (2020, Nat. Commun.).
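To make two of the earlier indexes named above concrete, here is a minimal Python sketch of how SES and a modified Raup-Crick value are typically derived from an observed dissimilarity and a set of null-model values. It is an illustration under stated assumptions, not the NST package's R functions. The function names and example numbers are hypothetical, and the RC form shown (fraction of null values below the observed value, with ties counted as one half, rescaled to the range -1 to 1) is the commonly used dissimilarity-based convention.

```python
from statistics import mean, stdev


def standard_effect_size(observed, null_values):
    """SES: how many null standard deviations the observed value lies from the null mean."""
    return (observed - mean(null_values)) / stdev(null_values)


def modified_raup_crick(observed, null_values):
    """Modified Raup-Crick value for a dissimilarity, rescaled to [-1, 1].

    Assumes `observed` and `null_values` are dissimilarities, so positive
    values mean "more dissimilar than expected by chance".
    """
    n = len(null_values)
    lower = sum(1 for v in null_values if v < observed)
    ties = sum(1 for v in null_values if v == observed)
    return ((lower + 0.5 * ties) / n - 0.5) * 2


if __name__ == "__main__":
    # Hypothetical null dissimilarities from five randomizations.
    null_dissimilarities = [0.42, 0.47, 0.51, 0.55, 0.60]
    observed_dissimilarity = 0.71
    print(standard_effect_size(observed_dissimilarity, null_dissimilarities))
    print(modified_raup_crick(observed_dissimilarity, null_dissimilarities))
```

In practice the null set contains hundreds or thousands of randomizations; five values are used here only to keep the example short.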

The GitHub site of NST includes the latest version of the package, along with example data and code, which can be helpful for new users. The IEG Statistical Analysis Pipeline, based on the Galaxy platform, also has a function to run taxonomic and phylogenetic NST, with a tutorial, which can be easier for people not familiar with the R language. The package was downloaded 7608 times from Jun 2019 to Dec 2020. I am the developer of this package under Prof. Jizhong Zhou’s supervision.


IEG Statistical Analysis Pipeline

This pipeline implements various statistical analyses of microbiome data, e.g. taxonomic/phylogenetic diversity metrics, dimension reduction, dissimilarity tests, dispersion tests, null model tests, stochasticity estimation (e.g. normalized stochasticity ratio, NST), and quantification of community assembly processes (e.g. the phylogenetic bin-based null model approach, iCAMP). A total of 98 users ran 4804 jobs from Sep 2019 to Dec 2020. I am the developer of this pipeline under Prof. Jizhong Zhou’s supervision.


Molecular Ecological Network Analysis pipeline (MENA)

This novel mathematical and bioinformatics framework was developed to construct ecological association networks, referred to as molecular ecological networks (MENs), through Random Matrix Theory (RMT)-based methods (Deng et al, BMC Bioinformatics, 2012; Zhou, Deng et al, mBio, 2010; Zhou, Deng et al, mBio, 2011). This approach defines the network automatically and is robust to noise, thus providing an excellent solution to several common issues associated with high-throughput metagenomics data analysis. A total of 3,766 users uploaded 51,621 datasets and constructed 51,652 networks from Mar 2011 to Dec 2020. Under Prof. Jizhong Zhou’s supervision, Dr. Ye Deng and Zhou Shi developed this network analysis tool. Dr. Naijia Xiao and I now serve as maintainers of this pipeline.


Microarray Data Management Pipeline

This pipeline can be used to manage microarray data, including GeoChip 2 (He et al 2007 ISME J), GeoChip 3 (He et al 2010 ISME J), GeoChip 4 (Tu et al 2014 Mol Ecol Resour), GeoChip 5 (Shi et al 2019 mSystems), and HumiChip (Tu et al 2014 PLoS One). The basic functions include raw data upload, data processing (normalization, quality filtering, designation of signal cutoffs), and further statistical analyses. Other types of data, e.g. amplicon sequencing data, can also be used in the pipeline. A total of 233 users analyzed data from 114,564 samples from 2007 to Dec 2020. Under Prof. Jizhong Zhou’s supervision, Dr. Ye Deng and Zhou Shi previously developed this pipeline. Dr. Naijia Xiao and I are now its maintainers. I am developing some new methods to process the data.


Amplicon Sequencing Data Analysis Pipeline

This pipeline is used to process amplicon sequencing data, including raw data upload, demultiplexing, quality trimming, paired-end sequence merging, format conversion, OTU clustering, tree building, etc. It has been a useful tool in many of our studies, e.g. Zhou et al 2016 Nature Communications. A total of 119 users used this pipeline from May 2018 to Dec 2020. Under Prof. Jizhong Zhou’s supervision, Dr. Yujia Qin developed this pipeline, and Dr. Naijia Xiao and I now maintain it.

updated 2021.10.1