Copula based learning for directed acyclic graphs

Russul Mohsin; Vahid Rezaei Tabar

doi:10.19139/soic-2310-5070-1634

Rusul Mohsin Moharib Alsarray, Department of Statistics, Faculty of Statistics, Mathematics & Computer, Allameh Tabataba'i University, Tehran, Iran.
Vahid Rezaei Tabar, Assistant Professor, Department of Statistics, Allameh Tabataba'i University, Tehran, Iran.

Keywords: continuous models, copula functions, directed acyclic graph, direction selection, structure function

Abstract

We address the learning of a directed acyclic graph (DAG) model for high-dimensional random variables under both normal and non-normal assumptions. To this end, copula functions are used to connect dependent variables. In addition to the normal copula, three widely used copulas are investigated to model three kinds of dependence structure: negative, positive, and weak. The FGM, Clayton, and Gumbel copulas are considered to cover these situations, and their detailed calculations are presented. The structure function is then computed exactly, with a suitable copula model selected using the statistical software R, for each assumed direction among the nodes. The direction with the maximum structure function is preferred. The corresponding algorithms for finding these directions, together with the maximization procedures, are also provided. Finally, extensive tabulations and simulation studies are reported, and a real-world application is analyzed to illustrate the proposed strategies.
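
For orientation, the three non-normal copula families named in the abstract are usually written in the following bivariate forms. These are the standard textbook parameterizations, given here as an assumption; the symbols u, v, and θ are illustrative, and the parameterization used in the paper itself may differ.

```latex
% Standard bivariate copula families for u, v in [0, 1].
% Assumed textbook forms, not taken from the paper.
\begin{align*}
C^{\mathrm{FGM}}_{\theta}(u,v) &= uv\,\bigl[1 + \theta(1-u)(1-v)\bigr], & \theta &\in [-1, 1],\\
C^{\mathrm{Clayton}}_{\theta}(u,v) &= \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta}, & \theta &> 0,\\
C^{\mathrm{Gumbel}}_{\theta}(u,v) &= \exp\!\Bigl(-\bigl[(-\ln u)^{\theta} + (-\ln v)^{\theta}\bigr]^{1/\theta}\Bigr), & \theta &\ge 1.
\end{align*}
```

In these forms the FGM family can only represent weak dependence, since its Kendall's tau is confined to [-2/9, 2/9], whereas the Clayton and Gumbel families as parameterized here represent positive dependence, concentrated in the lower and upper tails respectively.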
