Copula-based learning for directed acyclic graphs
Abstract
We study the learning of a DAG model for high-dimensional random variables under both normal and non-normal assumptions. To this end, copula functions are used to connect dependent variables. In addition to the normal copula, three of the most widely applicable copulas are investigated, covering all three kinds of dependence structure: negative, positive, and weak. These are the FGM, Clayton, and Gumbel copulas, and their detailed calculations are also presented. Furthermore, the structure function is determined exactly for every assumed direction among the nodes, after a suitable copula model is chosen using the statistical software R. The direction with the maximum structure function is preferred. The algorithms for finding these directions and the corresponding maximization procedures are also provided. Finally, extensive tabulations and simulation studies are reported, and a real-world application is analyzed to illustrate the proposed strategies.
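The abstract names the normal, FGM, Clayton, and Gumbel copulas and says a suitable copula model is chosen before the structure function is computed. As an illustration only, the sketch below implements the standard bivariate log-densities of these four families and picks one for a pair of variables by grid-search maximum likelihood on rank-based pseudo-observations. The selection criterion (largest maximized log-likelihood), the parameter grids, and all helper names (`select_copula`, `fit_family`, `pseudo_observations`, `FAMILIES`) are hypothetical choices made here; this is not the paper's structure function or its R-based procedure. Note that with the usual parameter ranges, Clayton and Gumbel only model positive dependence, FGM only weak dependence of either sign, and the Gaussian copula covers both signs.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gaussian_logpdf(u, v, rho):
    """Log-density of the bivariate Gaussian (normal) copula, |rho| < 1."""
    a, b = _STD_NORMAL.inv_cdf(u), _STD_NORMAL.inv_cdf(v)
    r2 = 1.0 - rho * rho
    return -0.5 * math.log(r2) - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * r2)

def fgm_logpdf(u, v, theta):
    """FGM copula density c(u,v) = 1 + theta(1-2u)(1-2v), theta in [-1, 1]."""
    return math.log(1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v))

def clayton_logpdf(u, v, theta):
    """Clayton copula, theta > 0: c = (1+theta)(uv)^(-1-theta) t^(-2-1/theta)."""
    t = u ** (-theta) + v ** (-theta) - 1.0
    return (math.log1p(theta) - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(t))

def gumbel_logpdf(u, v, theta):
    """Gumbel copula, theta >= 1, with x = -log u, y = -log v, s = x^theta + y^theta."""
    x, y = -math.log(u), -math.log(v)
    s = x ** theta + y ** theta
    a = s ** (1.0 / theta)
    return (-a - math.log(u) - math.log(v) + (theta - 1.0) * (math.log(x) + math.log(y))
            + (1.0 / theta - 2.0) * math.log(s) + math.log(a + theta - 1.0))

def _grid(lo, hi, step):
    """Evenly spaced parameter grid from lo to hi inclusive."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]

# Hypothetical parameter grids; each family has a single parameter.
FAMILIES = {
    "Gaussian": (gaussian_logpdf, _grid(-0.95, 0.95, 0.05)),
    "FGM": (fgm_logpdf, _grid(-1.0, 1.0, 0.05)),
    "Clayton": (clayton_logpdf, _grid(0.1, 10.0, 0.1)),
    "Gumbel": (gumbel_logpdf, _grid(1.0, 10.0, 0.1)),
}

def pseudo_observations(xs):
    """Rank-transform a sample to (0, 1) as rank / (n + 1); assumes no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    u = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (len(xs) + 1)
    return u

def fit_family(name, u, v):
    """Grid-search maximum likelihood for one family; returns (theta, loglik)."""
    logpdf, grid = FAMILIES[name]
    def loglik(th):
        return sum(logpdf(a, b, th) for a, b in zip(u, v))
    best = max(grid, key=loglik)
    return best, loglik(best)

def select_copula(xs, ys):
    """Fit every family and return (best_name, {name: (theta, loglik)})."""
    u, v = pseudo_observations(xs), pseudo_observations(ys)
    fits = {name: fit_family(name, u, v) for name in FAMILIES}
    best_name = max(fits, key=lambda name: fits[name][1])
    return best_name, fits

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0, 1) for _ in range(200)]
    ys = [-x + 0.3 * rng.gauss(0, 1) for x in xs]  # strong negative dependence
    name, fits = select_copula(xs, ys)
    print(name, fits[name])
```

Because every family here has exactly one parameter, ranking by maximized log-likelihood gives the same order as ranking by AIC; with families of different sizes a penalized criterion would be the safer choice.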