
Brief synopsis: Differences between Bayesian and Frequentist modeling

Having been trained extensively in both Bayesian and Frequentist methodologies, I aim to elucidate key distinctions between the two approaches in this concise blog post.


First, Bayesian models are specified by a likelihood function and a prior distribution for each parameter to be estimated. In contrast, non-Bayesian models are specified by a population model (so called because it includes error terms) that represents the hypothesized relationship (e.g., linear, nonlinear) between input and output variables, together with a set of assumptions about the asymptotic properties of the variables. Hence, Bayesian models require specifying the distribution of the model parameters, whereas non-Bayesian models require specifying the distribution of the variables.
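
To make the contrast concrete, here is a minimal sketch in Python (assuming NumPy and SciPy; the data, priors, and error scale are made up for illustration) of both specifications for a simple linear model:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)  # simulated data

# Non-Bayesian specification: a population model y = a + b*x + e with
# assumptions about the error term e; estimate a and b by least squares.
b_hat, a_hat = np.polyfit(x, y, deg=1)

# Bayesian specification: a likelihood for the data plus a prior for each
# parameter (vague normal priors; the error scale is fixed for brevity).
def log_prior(a, b):
    return stats.norm.logpdf(a, 0, 10) + stats.norm.logpdf(b, 0, 10)

def log_likelihood(a, b):
    return stats.norm.logpdf(y, loc=a + b * x, scale=0.5).sum()

def log_unnormalized_posterior(a, b):
    return log_prior(a, b) + log_likelihood(a, b)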


Second, Bayesian models are rooted in probability theory because they require specifying the space of all possible outcomes and assigning a probability to each of them. Bayes' Rule, a mathematical identity, is then used to combine the prior with the observed data to obtain the posterior. This approach yields a probability distribution (i.e., the posterior distribution) over the estimated model parameters. Specifically, this distribution tells us how plausible each parameter value is given the data we have observed and our a priori specification of the range of possible parameter values (i.e., the prior distribution). The posterior distribution therefore captures both prior knowledge and the information contained in the data (the likelihood), appropriately weighted by the strength of evidence. In contrast, non-Bayesian models do not assign probabilities to parameters. Instead, they rely on asymptotic properties (e.g., the law of large numbers justifying consistency, the central limit theorem justifying approximation by a normal distribution) and assumptions about the error terms (e.g., partial/strict exogeneity) to establish whether coefficients are unbiased and/or consistent.
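
As a toy illustration of this machinery, consider a hypothetical coin-flip experiment (a sketch in plain NumPy/SciPy; the prior and data are made up). The posterior over the coin's bias θ is obtained on a grid by multiplying the prior by the likelihood and normalizing:

import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 1001)        # space of possible parameter values
prior = stats.beta.pdf(theta, 2, 2)    # a priori plausibility of each value
k, n = 7, 10                           # observed data: 7 heads in 10 flips
likelihood = stats.binom.pmf(k, n, theta)

# Bayes' Rule: posterior is proportional to likelihood times prior;
# dividing by the total plays the role of P(y) on this grid.
posterior = likelihood * prior
posterior = posterior / posterior.sum()

The resulting vector is a discrete approximation of the posterior distribution P(θ|y).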


Third, the output of a Bayesian model is a probability distribution over each parameter's values. These models therefore incorporate a direct quantification of uncertainty. In contrast, non-Bayesian models yield point estimates (together with standard errors) and thus summarize estimation uncertainty only through asymptotic approximations rather than through a full distribution over the parameters.
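
Continuing the hypothetical coin example (same made-up data as in the sketch above), the two kinds of output can be compared directly:

import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 1001)
k, n = 7, 10                              # 7 heads in 10 flips, as above
posterior = stats.binom.pmf(k, n, theta) * stats.beta.pdf(theta, 2, 2)
posterior = posterior / posterior.sum()

# Frequentist output: a point estimate with an asymptotic standard error.
p_hat = k / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald_95ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian output: a full distribution, summarized however one likes.
posterior_mean = np.sum(theta * posterior)
cdf = np.cumsum(posterior)
credible_95 = (theta[np.searchsorted(cdf, 0.025)],
               theta[np.searchsorted(cdf, 0.975)])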


Fourth, Bayesian models use Bayes' theorem to update parameter estimates in light of new incoming evidence. In contrast, non-Bayesian models use asymptotic properties of estimators and focus on hypothesis testing. Inferences based on non-Bayesian models usually follow the so-called frequentist approach, which only allows one to reject a null hypothesis in order to gather evidence for an alternative hypothesis. Bayesian inference, by contrast, benefits from the fact that estimates are associated with probabilities that provide direct evidence for a specific hypothesis. This also explains why the frequentist notion of a Type I error (falsely rejecting a null hypothesis) does not directly apply to Bayesian inference, which quantifies evidence for hypotheses rather than rejecting nulls.
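
A minimal sketch of both points, assuming a conjugate Beta prior for a coin's bias (the batches of flips are made up): each posterior becomes the prior for the next batch, and the final posterior directly yields the probability of a specific hypothesis:

from scipy import stats

a, b = 2, 2                        # Beta(2, 2) prior on the coin's bias
batches = [(7, 10), (12, 20)]      # (heads, flips) arriving over time
for heads, flips in batches:
    a, b = a + heads, b + flips - heads   # conjugate Beta-Binomial update

# Direct evidence for the hypothesis "the coin is biased toward heads":
p_biased = 1 - stats.beta.cdf(0.5, a, b)  # P(theta > 0.5 | all data)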


Differences in computational techniques

Non-Bayesian models typically involve defining an optimization problem: one specifies a model (e.g., the relationship between input and output variables) and a deviance function (e.g., the residual sum of squares). Imposing different types of constraints then leads to different estimation techniques (e.g., OLS, ridge regression, LASSO). These approaches are distribution-free (often loosely called non-parametric) because they do not require a specification of the distribution of the model parameters.

Maximum likelihood estimation (MLE) is considered a parametric approach because the distribution of the data, given the coefficients, is specified. However, MLE itself still represents a non-Bayesian modeling approach, because prior distributions for the coefficients are not explicitly specified. From a Bayesian perspective, MLE is a special case in which one assumes a uniform prior distribution for the model parameters, so that the maximum a posteriori (MAP) estimate, i.e., the posterior mode, coincides with the maximum likelihood estimate.

The Bayesian approach, however, recognizes that while likelihoods permit relative comparisons between different parameter values, they are not suited for estimating absolute probabilities. The likelihood L(θ|y) is numerically equal to P(y|θ), but it is not a probability of θ, and in particular it is not the reverse conditional probability P(θ|y) (which is why calling P(y|θ) itself a likelihood can be misleading: the term properly refers to a function of θ, not of y). Moreover, the likelihood function cannot be treated as a probability density over θ because it does not integrate to 1. Therefore, Bayesian parameter estimation uses Bayes' Theorem,

P(θ|y) = P(y|θ) × P(θ) / P(y)

to obtain information about the probability distribution of the parameters.
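
Both claims can be checked numerically with a toy Binomial example (made-up data, grid approximation): the likelihood does not integrate to 1 over θ, and under a uniform prior the MAP estimate coincides with the MLE:

import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
k, n = 7, 10
lik = stats.binom.pmf(k, n, theta)       # L(theta|y) evaluated on the grid

# The likelihood is not a density in theta: its integral is 1/(n+1), not 1.
print(np.sum(lik) * (theta[1] - theta[0]))   # about 0.0909

# Under a uniform prior, P(theta|y) is proportional to L(theta|y),
# so the posterior mode (MAP) equals the MLE, k/n.
posterior = lik / lik.sum()
print(theta[np.argmax(posterior)], k / n)    # both 0.7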


Some of my favorite references (not an exhaustive list :)):

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Kruschke, J. K. (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews: Cognitive Science, 1(5), 658-676.

McElreath, R. (2018). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC.


