The BayesDB project aims to let users query the probable implications of their data as easily as they query the data itself.
Why is this important? Stakeholders in business, humanitarian work, science, and government increasingly recognize the value of drawing statistical inferences from their data. Existing approaches to this problem require experts in statistical modeling or data scientists proficient in applied machine learning, and these skills are projected to be in short supply as demand for statistical inference grows across a variety of fields. Developers new to machine learning may also be stymied by the maze of tools that make up the current machine learning toolkit, which can come up short in settings that don’t match canonical machine learning problems.
BayesDB aims to address some of these issues with three core capabilities:
First, the Bayesian Query Language (BQL) supports flexible data analysis queries while abstracting away the choice of model. Second, this abstraction is made possible by general-purpose default models that can handle arbitrary tabular data. Finally, BayesDB is extensible: it supports integrating external statistical models and domain knowledge when they are available.
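The division of labor among these three capabilities can be sketched in a few lines of Python. This is a toy illustration, not BayesDB's API or modeling approach. The names `PopulationModel`, `IndependentColumnsModel`, `FixedGaussianModel`, and `estimate_probability` are hypothetical. The "default model" here simply treats each column independently (a Gaussian for numeric columns, empirical frequencies for categorical ones), which is far cruder than the default models BayesDB actually uses. The sketch shows only the shape of the idea: a query is written once against an abstract model interface, a generic default model answers it for arbitrary tabular data, and an external model encoding domain knowledge can be swapped in without changing the query.

```python
import random
from collections import Counter
from statistics import mean, pstdev
from typing import Callable, Dict, List, Protocol

Row = Dict[str, object]


class PopulationModel(Protocol):
    """Hypothetical interface: a model only has to draw samples of a column."""

    def simulate(self, column: str, n: int, rng: random.Random) -> List[object]: ...


class IndependentColumnsModel:
    """Toy 'default' model for tabular data (an assumption, not BayesDB's).

    Each column is modeled independently: a Gaussian fitted to numeric columns,
    empirical frequencies for everything else. Missing values (None) are skipped.
    Assumes every row has the same keys as the first row.
    """

    def __init__(self, rows: List[Row]):
        self.columns: Dict[str, tuple] = {}
        for col in rows[0]:
            values = [r[col] for r in rows if r[col] is not None]
            if not values:
                continue  # nothing observed; this column cannot be simulated
            if all(isinstance(v, (int, float)) for v in values):
                self.columns[col] = ("numeric", mean(values), pstdev(values))
            else:
                self.columns[col] = ("categorical", Counter(values))

    def simulate(self, column: str, n: int, rng: random.Random) -> List[object]:
        spec = self.columns[column]
        if spec[0] == "numeric":
            _, mu, sigma = spec
            return [rng.gauss(mu, sigma) for _ in range(n)]
        counts = spec[1]
        # Sample categories in proportion to their observed frequencies.
        return rng.choices(list(counts), weights=list(counts.values()), k=n)


class FixedGaussianModel:
    """Hypothetical external model encoding domain knowledge about one column."""

    def __init__(self, column: str, mu: float, sigma: float):
        self.column, self.mu, self.sigma = column, mu, sigma

    def simulate(self, column: str, n: int, rng: random.Random) -> List[object]:
        if column != self.column:
            raise KeyError(column)
        return [rng.gauss(self.mu, self.sigma) for _ in range(n)]


def estimate_probability(
    model: PopulationModel,
    column: str,
    predicate: Callable[[object], bool],
    n: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(predicate(value)) for one column.

    The query never inspects which model it is given; it only calls simulate().
    """
    rng = random.Random(seed)
    samples = model.simulate(column, n, rng)
    return sum(1 for s in samples if predicate(s)) / n


rows = [
    {"age": 34, "city": "Boston"},
    {"age": 51, "city": "Chicago"},
    {"age": 29, "city": "Boston"},
    {"age": 42, "city": None},
]
default = IndependentColumnsModel(rows)
expert = FixedGaussianModel("age", mu=65.0, sigma=5.0)  # illustrative "domain knowledge"

# The same query function, answered by two interchangeable models.
print(estimate_probability(default, "city", lambda c: c == "Boston"))
print(estimate_probability(default, "age", lambda a: a > 39))
print(estimate_probability(expert, "age", lambda a: a > 39))
```

Because `estimate_probability` only calls `simulate`, any object exposing that method can answer the query. A realistic system would also need richer queries, such as conditioning on observed values or estimating dependence between columns, which this toy interface deliberately omits.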
Probabilistic search for structured data via probabilistic programming and nonparametric Bayes. Saad, F.; Casarsa, L.; and Mansinghka, V. arXiv preprint, arXiv:1704.01087. 2017.
Detecting dependencies in sparse, multivariate databases using probabilistic programming and non-parametric Bayes. Saad, F.; and Mansinghka, V. In Artificial Intelligence and Statistics (AISTATS). 2017.
A Probabilistic Programming Approach To Probabilistic Data Analysis. Saad, F.; and Mansinghka, V. In Advances in Neural Information Processing Systems (NIPS). 2016.
Probabilistic data analysis with probabilistic programming. Saad, F.; and Mansinghka, V. arXiv preprint, arXiv:1608.05347. 2016.
BayesDB: A probabilistic programming system for querying the probable implications of data. Mansinghka, V.; Tibbetts, R.; Baxter, J.; Shafto, P.; and Eaves, B. arXiv preprint, arXiv:1512.05006. 2015.