Xianyang Zhang - Software

AutoStat (Beta version) 2025
AutoStat is an R package that uses Large Language Models (LLMs) to assist with statistical analysis workflows. It generates analysis plans, R code, and reports based on your data and research questions.

GitHub

R Package Analytics 2025
A simple web app to analyze R package download statistics from CRAN.

Live App

GitHub

Crypto Portfolio Optimization 2025
A web application for cryptocurrency portfolio optimization and analysis.

Live App

GitHub

KDist (Beta version) 2025
KDist provides a collection of kernel and distance-based methods for nonparametric statistical inference. These methods are suited to settings where traditional parametric approaches may fail, particularly with complex, high-dimensional data.

GitHub

mLLMCelltype 2025
mLLMCelltype is an iterative multi-LLM consensus framework for cell type annotation in single-cell RNA sequencing data. By leveraging the complementary strengths of multiple large language models (OpenAI GPT-4o/4.1, Anthropic Claude-3.7/3.5, Google Gemini-2.0, X.AI Grok-3, DeepSeek-V3, Alibaba Qwen2.5, Zhipu GLM-4, MiniMax, Stepfun, and OpenRouter), this framework significantly improves annotation accuracy while providing transparent uncertainty quantification.

Live App

GitHub

Paper

fastcpd (Beta version) 2023
fastcpd implements an algorithm for change-point analysis based on sequential gradient descent and the quasi-Newton method. It can be applied to change-point detection in linear models, generalized linear models, robust regression, penalized regression, autoregressive models, and other settings.

Live App

GitHub

Paper

MicrobiomeStat: Statistical Methods for Microbiome Compositional Data 2022
A suite of methods for powerful and robust microbiome data analysis addressing zero-inflation, phylogenetic structure and compositional effects (Zhou et al., 2021). The methods can be applied to the analysis of other (high-dimensional) compositional data arising from sequencing experiments.

CRAN

GitHub

Documentation

Paper

MicrobiomeStat.wiki

TDFDR: Two-dimensional false discovery rate control for powerful confounder adjustment in omics association studies 2021
The package implements two-dimensional false discovery rate control for powerful confounder adjustment in omics association analysis. The method is based on the idea that confounders usually affect only part of the omics features, so adjusting for the confounder(s) across ALL omics features is over-adjustment and reduces statistical power. The procedure starts with an unadjusted analysis (first dimension: filtering) to narrow down the list of omics features that are more likely to be affected by the confounder, the variable of interest, or both. In the second dimension, a confounder-adjusted analysis is conducted on these 'top' candidates, which are enriched in signals, to reduce the multiple testing burden and increase power. The method belongs to the general topic of using auxiliary data to increase the power of multiple testing, which has recently received considerable research interest. Here the auxiliary data are the unadjusted statistics, which can inform the probability that each null hypothesis is true. The difficulty is accounting for the correlation between the auxiliary data (unadjusted statistics) and the main data (adjusted statistics). The package provides a procedure that is theoretically guaranteed to control the false discovery rate while maximizing power.

GitHub

Paper
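The two dimensions described for TDFDR can be illustrated with a deliberately naive sketch: filter on unadjusted p-values, then run the Benjamini-Hochberg step-up procedure on the adjusted p-values of the surviving features. This is not the package's procedure. It ignores the correlation between unadjusted and adjusted statistics that the description identifies as the main difficulty, so it carries no FDR guarantee. The function names, the fixed filter cutoff, and the toy p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the sorted indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value falls under the BH line
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def naive_two_stage(unadj_p, adj_p, filter_cutoff, alpha):
    """Naive illustration only (hypothetical helper, no FDR guarantee).

    Dimension 1: keep features whose unadjusted p-value passes the cutoff.
    Dimension 2: apply BH to the adjusted p-values of the kept features.
    """
    kept = [i for i, p in enumerate(unadj_p) if p <= filter_cutoff]
    rejected_local = benjamini_hochberg([adj_p[i] for i in kept], alpha)
    return [kept[j] for j in rejected_local]


# Toy p-values, made up for the example.
unadjusted = [0.001, 0.2, 0.01, 0.8, 0.03]
adjusted = [0.002, 0.001, 0.04, 0.5, 0.02]
print(naive_two_stage(unadjusted, adjusted, filter_cutoff=0.05, alpha=0.05))
```

In this toy input, filtering shrinks the second-stage testing problem from five hypotheses to three, which is the source of the reduced multiple testing burden. It also shows the cost of a fixed cutoff: the second feature has a small adjusted p-value but is discarded at the first stage.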

OrderShapeEM 2021
OrderShapeEM implements the optimal false discovery rate (FDR) control procedure with auxiliary information, particularly prior ordering information. The framework is based on the local FDR with hypothesis-specific null probabilities. The prior null probabilities are estimated using isotonic regression (the pool-adjacent-violators algorithm, PAVA) with respect to the prior ordering information. The only inputs to OrderShapeEM are the p-values and their prior ordering.

GitHub

Paper
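The isotonic-regression step mentioned for OrderShapeEM can be sketched with a generic pool-adjacent-violators routine. The sketch below only shows how PAVA turns a noisy sequence into the closest nondecreasing one in the least-squares sense; it is not the package's code, and the function name `pava` and the rough input values are assumptions for illustration.

```python
def pava(values, weights=None):
    """Weighted least-squares nondecreasing fit via pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Hypothetical rough null-probability estimates, listed in the prior order
# (hypotheses believed most likely to be non-null come first).
rough = [0.2, 0.5, 0.4, 0.9, 0.8]
print(pava(rough))
```

Adjacent values that break the ordering are replaced by their pooled average, so the output respects the prior ordering while staying as close as possible to the input.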

CAMT 2020
The CAMT package implements two covariate adaptive multiple testing procedures, one controlling the FDR and one controlling the FWER, described in "Covariate Adaptive False Discovery Rate Control with Applications to Omics-Wide Multiple Testing" and "Covariate Adaptive Family-wise Error Control with Applications to Genome-wide Association Studies". CAMT allows the prior null probability and/or the alternative distribution to depend on covariates. It is robust to model misspecification and is computationally efficient. The package also contains functions for testing the informativeness of the covariates for multiple testing, and a comprehensive simulation function that covers a wide range of settings.

GitHub

Paper on FDR

Paper on FWER

GUniFrac: Generalized UniFrac Distances, Distance-Based Multivariate Methods and Feature-Based Univariate Methods for Microbiome Data Analysis 2021
A suite of methods for powerful and robust microbiome data analysis, including data normalization, data simulation, community-level association testing, and differential abundance analysis. It implements generalized UniFrac distances, Geometric Mean of Pairwise Ratios (GMPR) normalization, a semiparametric data simulator, distance-based statistical methods, and feature-based statistical methods. The distance-based statistical methods include three extensions of PERMANOVA: (1) PERMANOVA using the Freedman-Lane permutation scheme, (2) a PERMANOVA omnibus test using multiple distance matrices, and (3) an analytical approach to approximating the PERMANOVA p-value. The feature-based statistical methods include linear model-based methods for differential abundance analysis of zero-inflated high-dimensional compositional data.

CRAN
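The idea behind GMPR normalization, as the name suggests, can be sketched in a few lines: for each pair of samples, take the median count ratio over taxa observed in both, then take the geometric mean of those medians across the other samples. This is a simplified illustration and not the GUniFrac implementation. The function name, the exclusion of the self-comparison, the handling of pairs with no shared taxa, and the toy count table are assumptions.

```python
import math
from statistics import median


def gmpr_size_factors(counts):
    """Simplified GMPR sketch.

    counts: list of samples, each a list of taxon counts of equal length.
    Returns one size factor per sample (NaN if a sample shares no
    nonzero taxa with any other sample).
    """
    n = len(counts)
    factors = []
    for i in range(n):
        log_medians = []
        for j in range(n):
            if i == j:
                continue  # assumption: skip the self-comparison
            # Ratios over taxa with nonzero counts in both samples.
            ratios = [a / b for a, b in zip(counts[i], counts[j]) if a > 0 and b > 0]
            if ratios:
                log_medians.append(math.log(median(ratios)))
        if log_medians:
            # Geometric mean of the pairwise median ratios.
            factors.append(math.exp(sum(log_medians) / len(log_medians)))
        else:
            factors.append(float("nan"))
    return factors


# Toy count table: 3 samples by 4 taxa, made up for the example.
toy = [
    [10, 20, 0, 40],
    [5, 10, 3, 20],
    [20, 40, 6, 0],
]
print(gmpr_size_factors(toy))
```

Restricting each ratio to taxa seen in both samples is what makes this kind of pairwise approach usable with zero-inflated counts, where a single shared reference across all samples may have very few nonzero taxa.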

SILM: Simultaneous Inference for Linear Models 2019
Simultaneous inference procedures for high-dimensional linear models as described by Zhang and Cheng (2017).

CRAN

Paper