Łukasz Gątarek earned his PhD in Bayesian econometrics from the Tinbergen Institute in the Netherlands. His research focuses on time series analysis, specialising in exploring machine learning concepts for sequential data analysis.
For more than a decade, Łukasz has been advising corporations and startups on statistical data modelling and machine learning. He’s been published in several prestigious journals, and currently serves as a data governance adviser for global companies that are market leaders in their field.
In this post, Łukasz outlines how finite normal mixtures can be used in a regression context. While linear regression is usually considered too inflexible to handle nonlinear data, Łukasz explains how to simulate a finite mixture model for regression using Markov chain Monte Carlo sampling:
Linear regression is usually considered not flexible enough to tackle nonlinear data; from a theoretical viewpoint, it is not capable of dealing with them. However, we can make it work for us on virtually any dataset by using finite normal mixtures in a regression model. This turns it into a very powerful machine learning tool that can be applied even to highly non-normal data with non-linear dependencies across the variables.
What makes this approach particularly interesting is its interpretability. Despite an extremely high level of flexibility, all the detected relations can be interpreted directly.
In this post, we demonstrate how to simulate a finite mixture model for regression using Markov chain Monte Carlo (MCMC) sampling. We generate data with multiple components (groups) and fit a mixture model to recover these components using Bayesian inference. The process combines regression models and mixture models, using MCMC techniques for parameter estimation.
Data simulated as a mixture of three linear regressions
We begin by loading the necessary libraries to work with regression models, MCMC, and multivariate distributions.
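The exact libraries are not reproduced here; the snippets below sketch the workflow in Python and assume the following imports:

```python
import numpy as np                    # arrays and random number generation
from scipy import stats               # normal and inverse-gamma densities/samplers
import matplotlib.pyplot as plt       # plotting the posterior draws at the end

rng = np.random.default_rng(42)       # fixed seed so the simulation is reproducible
```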
We simulate a dataset where each observation belongs to one of several groups (components of the mixture model), and the response variable is generated using a regression model with random coefficients.
We consider a general setup for a regression model using G Normal mixture components.
Each group is modelled using a univariate regression model, where the explanatory variables (X) and the response variable (y) are simulated from normal distributions. The betas represent the regression coefficients for each group, and sigmas represent the variance for each group.
In this model, we allow each mixture component to possess its own variance parameter and set of regression parameters.
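In symbols, for an observation i belonging to group g (with x_i denoting its explanatory variables), this setup assumes

$$
y_i = x_i^{\top}\beta_g + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}\!\left(0,\; \sigma_g^{2}\right), \qquad g = 1,\dots,G,
$$

so every component has its own coefficient vector $\beta_g$ and variance $\sigma_g^{2}$.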
We then randomly assign each observation to a group and mix the data across all the components.
We augment the model with a set of component label vectors z_g, g = 1, …, G, where each element z_gi equals 0 or 1, and thus z_gi=1 implies that the i-th individual is drawn from the g-th component of the mixture.
This random assignment forms the z_original vector, representing the true group each observation belongs to.
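As a concrete illustration, here is a minimal sketch of this simulation step, continuing the imports above; the number of components, the sample size, the number of regressors and the parameter ranges are illustrative choices rather than the values used in the original code:

```python
G, n, k = 3, 300, 2                          # components, observations, regressors (illustrative)

# True parameters: one coefficient vector and one variance per component
betas_true = rng.normal(0.0, 2.0, size=(G, k))
sigmas_true = rng.uniform(0.5, 1.5, size=G)  # component variances

# Random group assignment: z_original[i] = g means observation i comes from component g
z_original = rng.integers(0, G, size=n)

# Explanatory variables and responses, generated component by component and then mixed
X = rng.normal(0.0, 1.0, size=(n, k))
y = np.array([
    rng.normal(X[i] @ betas_true[z_original[i]],
               np.sqrt(sigmas_true[z_original[i]]))   # sigmas are variances, hence the sqrt
    for i in range(n)
])
```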
We set prior distributions for the regression coefficients and variances. These priors will guide our Bayesian estimation.
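The original hyperparameter choices are not reproduced here; a standard conjugate assignment consistent with this setup would be a normal prior on each coefficient vector and an inverse-gamma prior on each variance, for example

$$
\beta_g \sim \mathcal{N}\!\left(\mu_0,\; V_0\right), \qquad \sigma_g^{2} \sim \mathcal{IG}\!\left(a_0,\; b_0\right), \qquad g = 1,\dots,G.
$$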
For the component indicators and component probabilities, we consider the following prior assignment:
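In the notation used below (with assumed Dirichlet hyperparameters $\alpha_1,\dots,\alpha_G$),

$$
z_i \mid \pi \;\sim\; \mathcal{M}\!\left(1;\, \pi_1,\dots,\pi_G\right), \qquad \pi \;\sim\; \mathcal{D}\!\left(\alpha_1,\dots,\alpha_G\right),
$$

where $\pi_g$ is the probability that an observation belongs to the g-th component.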
The multinomial prior M is the multivariate generalisation of the binomial, and the Dirichlet prior D is a multivariate generalisation of the beta distribution.
In this section, we initialise the MCMC process by setting up matrices to store the samples of the regression coefficients, variances, and mixing proportions.
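A minimal sketch of this initialisation, continuing the simulated objects above (the number of iterations is an illustrative choice):

```python
n_iter = 5000                                   # number of MCMC iterations (illustrative)

beta_samples  = np.zeros((n_iter, G, k))        # stored draws of the regression coefficients
sigma_samples = np.zeros((n_iter, G))           # stored draws of the component variances
pi_samples    = np.zeros((n_iter, G))           # stored draws of the mixing proportions

# Current state of the chain, started from rough guesses
beta_curr  = np.zeros((G, k))
sigma_curr = np.ones(G)
pi_curr    = np.full(G, 1.0 / G)
z_curr     = rng.integers(0, G, size=n)         # initial component assignments
```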

If we condition on the values of the component indicator variables z, the conditional likelihood can be expressed as
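In the notation introduced above, this conditional likelihood factorises over the components and the observations assigned to them:

$$
L\!\left(y \mid z, \beta, \sigma^{2}\right) \;=\; \prod_{g=1}^{G} \;\prod_{i:\, z_{gi}=1} \phi\!\left(y_i \mid x_i^{\top}\beta_g,\; \sigma_g^{2}\right),
$$

where $\phi(\cdot \mid m, v)$ denotes the normal density with mean $m$ and variance $v$.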
In the MCMC sampling loop, we update the group assignments (z), regression coefficients (beta), and variances (sigma) based on their posterior distributions. The likelihood of each group assignment is calculated, and each observation is assigned to the group with the highest posterior probability.
The following complete posterior conditionals can be obtained, where the conditioning argument "·" denotes all the parameters in the posterior other than the one currently being drawn, and n_g denotes the number of observations in the g-th component of the mixture.
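Under the conjugate priors sketched earlier (an assumption of this write-up rather than necessarily the exact choices in the original code), these conditionals take the standard forms

$$
\pi \mid \cdot \;\sim\; \mathcal{D}\!\left(\alpha_1 + n_1,\; \dots,\; \alpha_G + n_G\right),
$$

$$
\beta_g \mid \cdot \;\sim\; \mathcal{N}\!\left(D_g\, d_g,\; D_g\right), \qquad
D_g = \left(\frac{X_g^{\top} X_g}{\sigma_g^{2}} + V_0^{-1}\right)^{-1}, \qquad
d_g = \frac{X_g^{\top} y_g}{\sigma_g^{2}} + V_0^{-1}\mu_0,
$$

$$
\sigma_g^{2} \mid \cdot \;\sim\; \mathcal{IG}\!\left(a_0 + \tfrac{n_g}{2},\; b_0 + \tfrac{1}{2}\sum_{i:\, z_{gi}=1}\left(y_i - x_i^{\top}\beta_g\right)^{2}\right),
$$

$$
\Pr\!\left(z_{gi} = 1 \mid \cdot\right) \;\propto\; \pi_g\, \phi\!\left(y_i \mid x_i^{\top}\beta_g,\; \sigma_g^{2}\right),
$$

where $X_g$ and $y_g$ collect the observations currently assigned to the g-th component.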
The algorithm below draws from the series of posterior distributions above in sequential order.

This block of code performs the key steps in MCMC:
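The original block is not reproduced here; the following Python sketch shows one way those steps can be implemented, continuing the objects defined above. The prior hyperparameters are illustrative assumptions, and, following the description above, each observation is assigned to its most probable component (a full Gibbs step would instead sample the assignment from those probabilities).

```python
# Assumed prior hyperparameters (illustrative, matching the conjugate priors sketched earlier)
alpha0 = np.ones(G)                    # Dirichlet prior on the mixing proportions
a0, b0 = 2.0, 1.0                      # inverse-gamma prior on each variance
V0_inv = np.eye(k) / 100.0             # prior precision for beta_g, i.e. beta_g ~ N(0, 100 I)

for it in range(n_iter):
    # 1. Update the mixing proportions pi | z from their Dirichlet posterior
    counts = np.bincount(z_curr, minlength=G)
    pi_curr = rng.dirichlet(alpha0 + counts)

    for g in range(G):
        members = np.where(z_curr == g)[0]
        X_g, y_g = X[members], y[members]

        # 2. Update beta_g | sigma_g^2, z, y from its normal posterior
        D_g = np.linalg.inv(X_g.T @ X_g / sigma_curr[g] + V0_inv)
        d_g = X_g.T @ y_g / sigma_curr[g]          # prior mean is zero, so no V0^{-1} mu_0 term
        beta_curr[g] = rng.multivariate_normal(D_g @ d_g, D_g)

        # 3. Update sigma_g^2 | beta_g, z, y from its inverse-gamma posterior
        resid = y_g - X_g @ beta_curr[g]
        sigma_curr[g] = stats.invgamma.rvs(a0 + len(members) / 2.0,
                                           scale=b0 + 0.5 * resid @ resid,
                                           random_state=rng)

    # 4. Update the group assignments: posterior probability of each component per observation
    log_post = np.log(pi_curr) + stats.norm.logpdf(y[:, None],
                                                   loc=X @ beta_curr.T,
                                                   scale=np.sqrt(sigma_curr))
    z_curr = np.argmax(log_post, axis=1)   # hard assignment, as described in the post;
                                           # a full Gibbs step would sample from these probabilities

    beta_samples[it], sigma_samples[it], pi_samples[it] = beta_curr, sigma_curr, pi_curr
```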
Finally, we visualise the results of the MCMC sampling. We plot the posterior distributions for each regression coefficient, compare them to the true values, and plot the most likely group assignments.
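A minimal Matplotlib sketch of the coefficient plot (the burn-in length is an illustrative choice; the plot of the most likely group assignments is analogous):

```python
burn_in = 1000        # discard early draws before the chain settles (illustrative choice)

fig, axes = plt.subplots(G, k, figsize=(10, 8), squeeze=False)
for g in range(G):
    for j in range(k):
        ax = axes[g][j]
        ax.hist(beta_samples[burn_in:, g, j], bins=40, density=True)
        # Mixture components are identified only up to relabelling, so a recovered
        # component may correspond to a true component under a permuted label.
        ax.axvline(betas_true[g, j], color="red", linestyle="--", label="true value")
        ax.set_title(f"beta[{g}, {j}]")
        ax.legend()
plt.tight_layout()
plt.show()
```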
This plot shows how the MCMC samples (posterior distribution) for the regression coefficients converge to the true values (betas).
Through this process, we demonstrated how finite normal mixtures can be used in a regression context, combined with MCMC for parameter estimation. By simulating data with known groupings and recovering the parameters through Bayesian inference, we can assess how well our model captures the underlying structure of the data.
Here is a link to the full code on GitHub.