I serve as the developer of two packages (iCAMP and NST) and two Galaxy-based pipelines, as well as a maintainer for three web-based pipelines.
R package iCAMP | GitHub site | iCAMP function in web-based pipeline
This package implements a general framework to quantitatively infer Community Assembly Mechanisms by Phylogenetic-bin-based null model analysis, abbreviated as ‘iCAMP’ (Ning et al 2020 Nature Communications). It quantitatively assesses the relative importance of different community assembly processes, such as selection, dispersal, and drift, for both whole communities and each phylogenetic group (‘bin’). Each bin usually consists of different taxa from a family or an order. The package also provides functions for some other published methods, including the neutral taxa percentage (Burns et al 2016 ISME J) based on the neutral theory model (Sloan et al 2006 EM) and the quantification of assembly processes based on entire-community null models (Stegen et al 2013 ISME J). It also includes some handy functions, particularly for big datasets, such as phylogenetic and taxonomic null model analysis at both community and bin levels, between-taxa niche difference and phylogenetic distance calculation, phylogenetic signal testing within phylogenetic groups, and midpoint rooting of big trees.
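As a rough illustration of the entire-community null model idea mentioned above, the following Python sketch assigns each pairwise community comparison to a process from a phylogenetic null model deviation and a Raup-Crick-type value, then reports the fraction of comparisons per process. It is not the iCAMP package's code or API. The function names, the process labels, and the default cutoffs (2.0 and 0.95) are assumptions chosen for illustration; check the cited papers for the exact rules and thresholds.

```python
from collections import Counter

def classify_pairwise_process(phylo_dev, rc, phylo_cutoff=2.0, rc_cutoff=0.95):
    """Assign one pairwise comparison to an assembly process.

    phylo_dev: standardized phylogenetic null model deviation (hypothetical input).
    rc: Raup-Crick-type value in [-1, 1] (hypothetical input).
    The cutoffs are illustrative assumptions, not values taken from this page.
    """
    # A significant phylogenetic deviation is read as selection.
    if phylo_dev < -phylo_cutoff:
        return "homogeneous selection"
    if phylo_dev > phylo_cutoff:
        return "heterogeneous selection"
    # Otherwise the taxonomic null model separates the dispersal extremes.
    if rc < -rc_cutoff:
        return "homogenizing dispersal"
    if rc > rc_cutoff:
        return "dispersal limitation"
    # Nothing significant: attributed to drift and other processes.
    return "drift and others"

def relative_importance(pairs, phylo_cutoff=2.0, rc_cutoff=0.95):
    """Fraction of pairwise comparisons assigned to each process."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pairwise comparison is required")
    counts = Counter(
        classify_pairwise_process(dev, rc, phylo_cutoff, rc_cutoff)
        for dev, rc in pairs
    )
    return {process: n / len(pairs) for process, n in counts.items()}

if __name__ == "__main__":
    example = [(-3.0, 0.0), (2.5, 0.0), (0.5, 0.99), (0.5, -0.99), (0.1, 0.2)]
    print(relative_importance(example))
```

This sketch treats the whole community as one unit. The bin-based approach described above applies the same kind of null model comparison within each phylogenetic bin before summarizing across bins.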
The GitHub site includes the latest version of the package, example data and code, as well as all code used in the iCAMP paper, which can be helpful for new users. The IEG Statistical Analysis Pipeline, based on the Galaxy platform, also has a function to run iCAMP, with a tutorial, which can be easier for people not familiar with the R language.
The package was downloaded 16,046 times from Sep 2020 to Mar 2023. I am the developer of this package under Prof. Jizhong Zhou’s supervision.
R package NST | GitHub site | NST functions in a web-based pipeline
This package estimates ecological stochasticity in community assembly. Understanding the community assembly mechanisms controlling biodiversity patterns is a central issue in ecology. Although it is generally accepted that both deterministic and stochastic processes play important roles in community assembly, quantifying their relative importance is challenging. The new index, normalized stochasticity ratio (NST), estimates ecological stochasticity, i.e. the relative importance of stochastic processes, in community assembly. With functions in this package, NST can be calculated based on different similarity metrics and/or different null model algorithms, along with some earlier indexes, e.g. the previous Stochasticity Ratio (ST), Standard Effect Size (SES), and modified Raup-Crick metrics (RC). Functions for permutational tests and bootstrapping analysis are also included. The earlier ST was published by Zhou et al (2014 PNAS). NST is modified from ST by considering two alternative situations and normalizing the index to range from 0 to 1 (Ning et al 2019 PNAS). A modified version, MST, is a special case of NST and is used in some recent or upcoming publications, e.g. Liang et al (2019). SES is calculated as described in Kraft et al (2011, Science). RC is calculated as reported by Chase et al (2011, Ecosphere) and Stegen et al (2013, ISME J). Version 3 added NST based on phylogenetic beta diversity, used by Ning et al (2020, Nat. Commun.).
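To make the null model indexes mentioned above more concrete, here is a minimal Python sketch of how an SES and a Raup-Crick-type value can be derived from one observed dissimilarity and a set of null model values. It is not the NST package's code or API, and it does not compute NST itself. The function names are hypothetical, the use of the sample standard deviation is an assumption, and the input is assumed to be a dissimilarity rather than a similarity.

```python
import statistics

def standardized_effect_size(observed, null_values):
    """SES = (observed - mean of null values) / standard deviation of null values."""
    mean_null = statistics.mean(null_values)
    sd_null = statistics.stdev(null_values)  # sample SD; an assumption of this sketch
    if sd_null == 0:
        raise ValueError("null values have zero variance; SES is undefined")
    return (observed - mean_null) / sd_null

def raup_crick_type(observed, null_values):
    """Rescale the rank of the observed value among null values to [-1, 1].

    Null values below the observed count fully and ties count half, so a value
    near 1 means the observed dissimilarity exceeds almost all null values.
    """
    n = len(null_values)
    if n == 0:
        raise ValueError("at least one null value is required")
    lower = sum(1 for v in null_values if v < observed)
    ties = sum(1 for v in null_values if v == observed)
    return 2.0 * (lower + 0.5 * ties) / n - 1.0

if __name__ == "__main__":
    null = [0.42, 0.47, 0.51, 0.55, 0.60]  # made-up null dissimilarities
    print(standardized_effect_size(0.70, null))
    print(raup_crick_type(0.70, null))
```

In practice the null values come from many randomizations of the community data, and the choice of randomization algorithm and dissimilarity metric affects both indexes.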
The GitHub site of NST includes the latest version of the package, along with example data and code, which can be helpful for new users. The IEG Statistical Analysis Pipeline, based on the Galaxy platform, also has a function to run taxonomic and phylogenetic NST, with a tutorial, which can be easier for people not familiar with the R language. The package was downloaded 20,902 times from Jun 2019 to Mar 2023. I am the developer of this package under Prof. Jizhong Zhou’s supervision.
IEG Statistical Analysis Pipeline
This pipeline implements various statistical analyses of microbiome data, e.g. taxonomic/phylogenetic diversity metrics, dimension reduction, dissimilarity tests, dispersion tests, null model tests, stochasticity estimation (e.g. normalized stochasticity ratio, NST), and quantification of community assembly processes (e.g. the phylogenetic bin-based null model approach, iCAMP). A total of 177 users ran 12,611 jobs from Sep 2019 to Mar 2023. I am the developer of this pipeline under Prof. Jizhong Zhou’s supervision.
IEG Data Management Pipeline
This pipeline is for managing and processing GeoChip and sequencing data on the Galaxy platform, including microarray normalization, quality control, implementation of QIIME2 and USEARCH, and constrained phylogenetic tree construction. It is now mainly for internal use at OU, with a total of 40 users from Jan 2020 to Mar 2023. I am the developer of this pipeline under Prof. Jizhong Zhou’s supervision.
Molecular Ecological Network Analysis pipeline (MENA)
This novel mathematical and bioinformatics framework was developed to construct ecological association networks, referred to as molecular ecological networks (MENs), through Random Matrix Theory (RMT)-based methods (Deng et al, BMC Bioinformatics, 2012; Zhou, Deng et al, mBio, 2010; Zhou, Deng et al, mBio, 2011). This approach defines the network automatically and is robust to noise, thus providing an excellent solution to several common issues associated with high-throughput metagenomics data analysis. A total of 6,848 users uploaded 105,865 datasets and constructed 101,117 networks from Mar 2011 to Mar 2023. Under Prof. Jizhong Zhou’s supervision, Dr. Ye Deng and Zhou Shi developed this network analysis tool. Dr. Naijia Xiao and I now serve as maintainers of this pipeline.
Microarray Data Management Pipeline
This pipeline can be used to manage microarray data, including GeoChip 2 (He et al 2007 ISME J), GeoChip 3 (He et al 2010 ISME J), GeoChip 4 (Tu et al 2014 Mol Ecol Resour), GeoChip 5 (Shi et al 2019 mSystems), and HumiChip (Tu et al 2014 PLoS One). The basic functions include raw data upload, data processing (normalization, quality filtering, signal cutoff designation), and further statistical analyses. Other types of data, such as amplicon sequencing data, can also be used in the pipeline. A total of 233 users analyzed data from 114,564 samples from 2007 to Dec 2020. Under Prof. Jizhong Zhou’s supervision, Dr. Ye Deng and Zhou Shi were the previous developers of this pipeline. Dr. Naijia Xiao and I are now its maintainers. I am developing some new methods to process the data.
Amplicon Sequencing Data Analysis Pipeline
This pipeline is used to process amplicon sequencing data, including raw data upload, demultiplexing, quality trimming, paired-end sequence merging, format conversion, OTU clustering, tree building, etc. It has been a useful tool in many of our studies, e.g. Zhou et al 2016 Nature Communications. A total of 127 users used this pipeline from May 2018 to Jul 2022. Under Prof. Jizhong Zhou’s supervision, Dr. Yujia Qin developed this pipeline, and Dr. Naijia Xiao and I are maintaining it. Currently, this pipeline is used more for education than for research.