Research Interests

The primary objective of my research is to develop new statistical theories and methodologies for large-scale, high-dimensional data with complex structures. My current research focuses on deep learning, high-dimensional and large-scale statistical inference, kernel and distance-based methods, and genomics. My research is supported by the NIH, the NSF, and local grants from Texas A&M.

Selected Publications and Preprints

2025

Yang, C., Zhang, X., & Chen, J. (2025). Large language model consensus substantially improves the cell type annotation accuracy for scRNA-seq data. bioRxiv. bioRxiv GitHub: mLLMCelltype

Zhou, H., Chen, J., & Zhang, X. (2025). BMDD: A probabilistic framework for accurate imputation of zero-inflated microbiome sequencing data. bioRxiv. bioRxiv GitHub

Li, X., & Zhang, X. (2025). fastcpd: Fast change point detection in R. Journal of Statistical Software. arXiv R: fastcpd

Zhang, X., & Zhou, H. (2025). Generalization bounds and model complexity for Kolmogorov-Arnold Networks. ICLR. arXiv

Li, X., Li, G., & Zhang, X. (2025). A likelihood based approach for watermark detection. AISTATS.

2024

Roy, A., Zhou, H., Zhao, N., & Zhang, X. (2024). Subsampling-based tests in mediation analysis. arXiv. arXiv

Yan, J., Li, Z., & Zhang, X. (2024). Distance and kernel-based measures for global and local two-sample conditional distribution testing. arXiv. arXiv R: KDist

Li, X., Li, G., & Zhang, X. (2024). Segmenting watermarked texts from language models. NeurIPS. arXiv GitHub

Li, G., & Zhang, X. (2024). A note on e-values and multiple testing. Biometrika. PDF

Deng, L., He, K., & Zhang, X. (2024). Joint mirror procedure: Controlling false discovery rate for identifying simultaneous signals. Biometrics. arXiv GitHub

Yang, L., Zhang, X., & Chen, J. (2024). Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples. Genome Biology. Journal

Deng, L., Tang, Y., Zhang, X., & Chen, J. (2024). Structure-adaptive canonical correlation analysis for microbiome multi-omics data. Frontiers in Genetics. Journal

Crafts, E. S., Zhang, X., & Zhao, B. (2024). Bayesian Cramér–Rao bound estimation with score-based models. IEEE Transactions on Information Theory. IEEE Xplore arXiv

Deng, L., He, K., & Zhang, X. (2024). Powerful spatial multiple testing via borrowing neighboring information. Statistica Sinica. arXiv GitHub

Garg, J., Zhang, X., & Zhou, Q. (2024). Soft-constrained Schrödinger bridge: A stochastic-control approach. AISTATS. arXiv Code

2023

Li, G., & Zhang, X. (2023). E-values, multiple testing and beyond. arXiv. arXiv

Pramanik, S., & Zhang, X. (2023). Structure adaptive elastic-net. arXiv. arXiv GitHub

Ye, H., Zhang, X., Wang, C., Goode, E., & Chen, J. (2023). Batch effect correction with re-measured samples in completely confounded case-control studies. Nature Computational Science. arXiv GitHub Shiny app

Roy, A., Chen, J., & Zhang, X. (2023). A general framework for powerful confounder adjustment in omics association studies. Bioinformatics. Journal GitHub

Xia, Q., & Zhang, X. (2023). Adaptive testing for alphas in high-dimensional factor pricing models. Journal of Business & Economic Statistics. PDF

Zhang, X., & Dawn, T. (2023). Sequential gradient descent and quasi-Newton's method for change-point analysis. AISTATS. PMLR R: fastcpd

Lou, Z., Zhang, X., & Wu, W. (2023). High dimensional analysis of variance in multivariate linear regression. Biometrika, 110, 777-797. arXiv

Yan, J., & Zhang, X. (2023). Kernel two-sample tests in high dimension: Interplay between moment discrepancy and dimension-and-sample orders. Biometrika, 110, 411-430. arXiv

2022

Zhang, X., Zhou, H., & Ye, H. (2022). A modern theory for high-dimensional Cox regression models. arXiv. arXiv

Cirkovic, D., Wang, T., & Zhang, X. (2022). Likelihood-based changepoint detection in preferential attachment networks. arXiv. arXiv

Zhou, H., He, K., Chen, J., & Zhang, X. (2022). LinDA: Linear models for differential abundance analysis of microbiome compositional data. Genome Biology, 23, 95. Journal R: MicrobiomeStat GitHub

Cao, H., Chen, J., & Zhang, X. (2022). Optimal false discovery rate control for large scale multiple testing with auxiliary information. Annals of Statistics, 50(2), 807-857. PDF R: OrderShapeEM

Zhang, X., & Chen, J. (2022). Covariate adaptive false discovery rate control with applications to omics-wide multiple testing. Journal of the American Statistical Association, 117, 411-427. Journal R: CAMT

Yun, S., Zhang, X., & Li, B. (2022). Detection of local differences in spatial characteristics between two spatiotemporal random fields. Journal of the American Statistical Association, 117, 291-306. Journal

Chen, J., & Zhang, X. (2022). D-MANOVA: fast distance-based multivariate analysis of variance for large-scale microbiome association studies. Bioinformatics, 38(1), 286-288. Journal R: GUniFrac

Yi, S., & Zhang, X. (2022). Projection-based inference for high-dimensional linear models. Statistica Sinica, 32, 1-23. PDF

2021

Yi, S., Zhang, X., Yang, L., Huang, J., Liu, Y., Wang, C., Schaid, D. J., & Chen, J. (2021). 2dFDR: a new approach to confounder adjustment substantially increases detection power in omics association studies. Genome Biology, 22, 208. Journal Supplement R: tdfdr

Zhou, H., Zhang, X., & Chen, J. (2021). Covariate adaptive family-wise error rate control for genome-wide association studies. Biometrika, 108, 915-931. Journal R: CAMT

Chakraborty, S., & Zhang, X. (2021). High-dimensional change-point detection using generalized homogeneity metrics. arXiv. arXiv R: KDist

Chakraborty, S., & Zhang, X. (2021). A new framework for distance and kernel-based metrics in high dimensions. Electronic Journal of Statistics, 15, 5455-5522. PDF Slides R: KDist

2020

Huang, J., Bai, L., Cui, B., Wu, L., Wang, L., An, Z., Ruan, S., Yu, Y., Zhang, X., & Chen, J. (2020). Leveraging biological and statistical covariates improves the detection power in epigenome-wide association testing. Genome Biology, 21, 88. Journal

Zhu, C., Zhang, X., Yao, S., & Shao, X. (2020). Distance-based and RKHS-based dependence metrics in high dimension. The Annals of Statistics, 48(6), 3366-3394. Journal R: KDist

Lee, C. E., Zhang, X., & Shao, X. (2020). Testing conditional mean independence for functional data. Biometrika, 107(2), 331-346. Journal R: KDist Supplement

2019

Chakraborty, S., & Zhang, X. (2019). Distance metrics for measuring joint dependence with application to causal inference. Journal of the American Statistical Association, 114(528), 1638-1650. Journal R: KDist

2018

Zhang, X., & Cheng, G. (2018). Gaussian approximation for high dimensional vector under physical dependence. Bernoulli, 24(4A), 2640-2675. Journal Slides

Yao, S., Zhang, X., & Shao, X. (2018). Testing mutual independence in high dimension via distance covariance. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(3), 455-480. Journal R: KDist Supplement

Zhang, X., Yao, S., & Shao, X. (2018). Conditional mean and quantile dependence testing in high dimension. The Annals of Statistics, 46(1), 219-246. Journal R: KDist Supplement

2017

Zhang, X., & Cheng, G. (2017). Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association, 112(518), 757-768. Journal Supplement R: SILM

Zhang, X., & Bhattacharya, A. (2017). Empirical Bayes, SURE and sparse normal mean models. arXiv. PDF

Zhang, X. (2017). Testing high dimensional mean under sparsity. arXiv. PDF

2016

Zhang, X. (2016). White noise testing and model diagnostic checking for functional time series. Journal of Econometrics, 194(1), 76-95. Journal R Codes

Zhang, X. (2016). Fixed-smoothing asymptotics in the generalized empirical likelihood estimation framework. Journal of Econometrics, 193(1), 123-146. Journal

Zhang, X., & Shao, X. (2016). On the coverage bound problem of empirical likelihood methods for time series. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(2), 395-421. Journal R Codes

2015

Zhang, X., & Shao, X. (2015). Two sample inference for the second-order property of temporally dependent functional data. Bernoulli, 21(2), 909-929. Journal

2014

Zhang, X., & Shao, X. (2014). Fixed-b asymptotics for blockwise empirical likelihood. Statistica Sinica, 24(3), 1179-1194. Journal

Zhang, X., & Cheng, G. (2014). Bootstrapping high dimensional time series. arXiv. arXiv

Zhang, X., Li, B., & Shao, X. (2014). Self‐normalization for spatial data. Scandinavian Journal of Statistics, 41(2), 311-324. Journal

2013

Zhang, X., & Shao, X. (2013). Fixed-smoothing asymptotics for time series. The Annals of Statistics, 41(3), 1329-1349. Journal

2011

Zhang, X., Shao, X., Hayhoe, K., & Wuebbles, D. J. (2011). Testing the structural stability of temporally dependent functional observations and application to climate projections. Electronic Journal of Statistics, 5, 1765-1796. Journal

2010

Shao, X., & Zhang, X. (2010). Testing for change points in time series. Journal of the American Statistical Association, 105(491), 1228-1240. PDF R Codes