A Sketch of Proofs for Some Properties of the Multivariate Gaussian Distribution
Introduction
In the previous blog post, we discussed the extension from univariate to multivariate Gaussian distributions and examined the transformation in their functional form. Moving from a single variable to an N-dimensional space introduces a rich structure where properties like independence, marginal distributions, and conditional behaviors play a crucial role in understanding the multivariate Gaussian distribution in depth. In this post, we will provide a sketch of proofs for several key properties of multivariate Gaussians. These properties include the independence of the components of Gaussian random vectors, the derivation of marginal and conditional distributions, closure under linear transformations, and the geometry of Gaussian contours together with the associated $\chi^2$ distribution. Rather than delving into detailed, rigorous proofs, we aim to offer an intuitive and accessible overview, highlighting the underlying mathematical concepts without overwhelming technical detail.
Proving the Independence of Gaussian Random Vectors
The relationship between the covariance structure of a random vector and the independence of its components plays a key role in many machine learning algorithms, signal processing, and other applied fields. It is particularly relevant in Gaussian processes, where independence (or conditional independence) assumptions about function values are expressed through the structure of a covariance matrix. So, let’s break down the problem mathematically, focusing on when the components of a multivariate Gaussian vector are independent. The path to this insight involves linear algebra, specifically understanding the covariance matrix and its diagonalization.
Covariance Matrix Diagonalization and Independence
Consider a random vector $\mathbf{x} = (x_1, x_2, \ldots, x_N)^\top$ that follows an $N$-dimensional multivariate Gaussian distribution, $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
The probability density function (PDF) of this multivariate Gaussian distribution is given by:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$

Here, $\boldsymbol{\mu}$ is the mean vector, $\boldsymbol{\Sigma}$ is the covariance matrix, and $|\boldsymbol{\Sigma}|$ denotes its determinant.
Now, to examine the independence of the components of $\mathbf{x}$, we need to look at the structure of the covariance matrix $\boldsymbol{\Sigma}$.
Diagonalization of the Covariance Matrix
We know from linear algebra that any symmetric matrix, including the covariance matrix $\boldsymbol{\Sigma}$, can be diagonalized:

$$\boldsymbol{\Sigma} = Q \Lambda Q^\top,$$

where $Q$ is an orthogonal matrix whose columns are the eigenvectors of $\boldsymbol{\Sigma}$ and $\Lambda$ is a diagonal matrix containing the corresponding eigenvalues.
To unpack this geometrically: the matrix $Q$ rotates the coordinate system so that the new axes align with the eigenvectors of $\boldsymbol{\Sigma}$, the principal directions of variation, while $\Lambda$ records how much variance there is along each of those directions. In the rotated coordinates $\mathbf{y} = Q^\top(\mathbf{x} - \boldsymbol{\mu})$, the covariance matrix is the diagonal matrix $\Lambda$, so the components of $\mathbf{y}$ are uncorrelated.
But here’s the critical point: for a multivariate normal distribution, uncorrelated components are also independent. This is special to the Gaussian family, and it follows from the fact that the joint density factorizes into a product of univariate densities whenever the covariance matrix is diagonal, as we show next.
Formal Proof of Independence
Let’s now formalize this with a rigorous proof. Given that the covariance matrix $\boldsymbol{\Sigma}$ is diagonal, write $\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2)$.
The off-diagonal elements of $\boldsymbol{\Sigma}$ are zero, i.e. $\mathrm{Cov}(x_i, x_j) = 0$ for all $i \neq j$.
For two random variables $x_i$ and $x_j$, zero covariance means they are uncorrelated; for jointly Gaussian variables, this is exactly the condition we can upgrade to independence.
If the covariance matrix is diagonal, then $\boldsymbol{\Sigma}^{-1} = \mathrm{diag}(\sigma_1^{-2}, \ldots, \sigma_N^{-2})$ and $|\boldsymbol{\Sigma}| = \prod_{i=1}^N \sigma_i^2$, so the quadratic form in the exponent splits into a sum:

$$(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \sum_{i=1}^N \frac{(x_i - \mu_i)^2}{\sigma_i^2}.$$

This implies that the joint PDF factorizes into a product of univariate Gaussian densities:

$$p(\mathbf{x}) = \prod_{i=1}^N \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right).$$

Thus, if the covariance matrix of a multivariate Gaussian is diagonal, its components are mutually independent.
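As a quick sanity check of this factorization, the short sketch below (with an arbitrary illustrative mean and diagonal covariance) evaluates the joint multivariate Gaussian PDF at a point and compares it with the product of the corresponding univariate Gaussian PDFs; the two values agree up to floating-point error.
import numpy as np
from scipy.stats import multivariate_normal, norm
# Illustrative diagonal-covariance Gaussian (values chosen arbitrarily)
mu = np.array([1.0, -2.0, 0.5])
variances = np.array([0.5, 2.0, 1.5])
Sigma = np.diag(variances)
# Evaluate both densities at an arbitrary point
x = np.array([0.3, -1.0, 2.0])
joint_pdf = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
product_of_marginals = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(variances)))
print(joint_pdf, product_of_marginals)
assert np.isclose(joint_pdf, product_of_marginals)  # the joint PDF factorizes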
Geometrical Intuition
To gain some geometrical intuition, think of a random vector $\mathbf{x}$ as a cloud of points in $N$-dimensional space whose shape is described by the covariance matrix: correlated components stretch the cloud into a tilted ellipsoid.
When we diagonalize the covariance matrix, we essentially rotate the space such that the axes align with the principal directions of variation. Along each axis, the corresponding component of $\mathbf{x}$ varies on its own, with variance given by the associated eigenvalue, independently of the other components.
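To make the rotation picture concrete, here is a minimal sketch (the correlated covariance matrix below is an arbitrary illustrative choice): we sample from a correlated Gaussian, rotate the centered samples into the eigenbasis of the covariance matrix, and check that the empirical covariance of the rotated samples is approximately diagonal, with the eigenvalues on the diagonal.
import numpy as np
rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])  # correlated covariance (illustrative)
# Eigendecomposition Sigma = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(Sigma)
# Sample, then rotate into the eigenbasis: y = Q^T (x - mu)
x = rng.multivariate_normal(mu, Sigma, size=50_000)
y = (x - mu) @ Q
print(np.cov(y, rowvar=False).round(3))  # approximately diag(lam)
print(lam.round(3))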
Demo
In this section, we explore how the covariance matrix influences the distribution of multivariate Gaussian random variables. Specifically, we’ll visualize the difference between correlated and independent random variables by examining two different covariance matrices: one that introduces correlation between the components and one that enforces independence.
We will work with a bivariate Gaussian distribution, where the random vector $\mathbf{x} = (x_1, x_2)^\top$ has zero mean, and compare two covariance structures:
- Non-diagonal Covariance Matrix (Correlated Case): A covariance matrix with non-zero off-diagonal elements implies that the two components $x_1$ and $x_2$ are correlated. For example, a covariance matrix like
$$\boldsymbol{\Sigma}_1 = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}$$
indicates a positive correlation between $x_1$ and $x_2$. When plotted, this produces an elliptical distribution, where the data points are spread along a specific direction, showing a linear relationship between the two variables.
- Diagonal Covariance Matrix (Independent Case): A diagonal covariance matrix implies that the two variables are independent. For example,
$$\boldsymbol{\Sigma}_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
In this case, there is no correlation between $x_1$ and $x_2$, and the distribution is circular: the points are spread equally in all directions, showing no linear dependence between the two variables.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Set plotting style
sns.set(style="white", palette="muted")
# Seed for reproducibility
np.random.seed(42)
# Define mean and covariance matrices
mean = [0, 0]
# Covariance Matrix 1 (Non-diagonal, correlated)
cov_1 = [[1, 0.8], [0.8, 1]]
# Covariance Matrix 2 (Diagonal, independent)
cov_2 = [[1, 0], [0, 1]]
# Generate random samples
data_1 = np.random.multivariate_normal(mean, cov_1, 1000)
data_2 = np.random.multivariate_normal(mean, cov_2, 1000)
# Plotting the samples
fig, ax = plt.subplots(1, 2, figsize=(14, 7))
# Plotting the non-diagonal covariance samples
sns.scatterplot(x=data_1[:, 0], y=data_1[:, 1], ax=ax[0], color="blue", alpha=0.6)
ax[0].set_title("Non-diagonal Covariance Matrix (Correlated)")
# Plotting the diagonal covariance samples
sns.scatterplot(x=data_2[:, 0], y=data_2[:, 1], ax=ax[1], color="green", alpha=0.6)
ax[1].set_title("Diagonal Covariance Matrix (Independent)")
plt.tight_layout()
plt.show()

- Non-diagonal Covariance Matrix (Correlated Case): In the first plot (left), where the covariance matrix contains off-diagonal values (0.8), the points form an elongated ellipse. This shows that the two variables $x_1$ and $x_2$ are positively correlated. The data points are not spread equally in all directions but rather along the principal axes of the ellipse, which reflects the dependency between the variables.
- Diagonal Covariance Matrix (Independent Case): In the second plot (right), the covariance matrix is diagonal, indicating that $x_1$ and $x_2$ are independent. The points form a circular distribution, with no discernible directionality. This indicates that changes in $x_1$ do not depend on $x_2$, and vice versa. The variables are independent, which is precisely what the diagonal covariance structure represents. A quick numerical check of the correlations in both cases is sketched below.
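As a numerical complement to the plots, the sketch below re-samples both cases and prints the empirical correlation matrices; the off-diagonal entry is close to 0.8 in the correlated case and close to 0 in the independent case.
import numpy as np
rng = np.random.default_rng(42)
mean = [0, 0]
cov_corr = [[1, 0.8], [0.8, 1]]  # correlated case
cov_ind = [[1, 0], [0, 1]]       # independent case
data_corr = rng.multivariate_normal(mean, cov_corr, 10_000)
data_ind = rng.multivariate_normal(mean, cov_ind, 10_000)
# Empirical correlation matrices (rows are samples, columns are variables)
print(np.corrcoef(data_corr, rowvar=False).round(3))  # off-diagonal ~ 0.8
print(np.corrcoef(data_ind, rowvar=False).round(3))   # off-diagonal ~ 0.0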
Deriving Marginal and Conditional Distributions of Multivariate Gaussians
A key subtlety here is that while the marginal distributions of a multivariate Gaussian are always Gaussian, the reverse inference is not valid: Gaussian marginals alone do not guarantee a jointly Gaussian vector.
Deriving the Marginal Distribution
In probability theory, the marginal distribution is the distribution of a subset of variables, ignoring the others. Mathematically, for a random vector $\mathbf{x} = (x_1, \ldots, x_N)^\top$, the marginal distribution of a subset of components is obtained by integrating the joint density over the remaining components.
Suppose we have a partition of the vector $\mathbf{x}$ into two subvectors:

$$\mathbf{x} = \begin{pmatrix} \mathbf{x}_a \\ \mathbf{x}_b \end{pmatrix}.$$

The full vector $\mathbf{x}$ follows a multivariate Gaussian distribution, $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
We want to derive the marginal distribution of $\mathbf{x}_a$.
The joint PDF of $\mathbf{x}$ is:

$$p(\mathbf{x}_a, \mathbf{x}_b) = \frac{1}{(2\pi)^{N/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$

Now, let’s partition the mean vector $\boldsymbol{\mu}$ and the covariance matrix $\boldsymbol{\Sigma}$ accordingly:

$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{pmatrix}.$$

Where:
- $\boldsymbol{\mu}_a$ is the mean vector for the subset $\mathbf{x}_a$,
- $\boldsymbol{\mu}_b$ is the mean vector for the complement $\mathbf{x}_b$,
- $\boldsymbol{\Sigma}_{aa}$ is the covariance matrix of the components in $\mathbf{x}_a$,
- $\boldsymbol{\Sigma}_{ab}$ and $\boldsymbol{\Sigma}_{ba}$ are the cross-covariances between $\mathbf{x}_a$ and $\mathbf{x}_b$,
- $\boldsymbol{\Sigma}_{bb}$ is the covariance matrix of $\mathbf{x}_b$.
To find the marginal distribution of $\mathbf{x}_a$, we integrate the joint PDF over $\mathbf{x}_b$:

$$p(\mathbf{x}_a) = \int p(\mathbf{x}_a, \mathbf{x}_b)\, d\mathbf{x}_b.$$

Substituting the partitioned PDF and completing the square in $\mathbf{x}_b$, the integral over $\mathbf{x}_b$ evaluates to a constant, and the result is another Gaussian distribution for $\mathbf{x}_a$:

$$\mathbf{x}_a \sim \mathcal{N}(\boldsymbol{\mu}_a, \boldsymbol{\Sigma}_{aa}).$$

Thus, the marginal distribution of $\mathbf{x}_a$ is Gaussian, with mean $\boldsymbol{\mu}_a$ and covariance $\boldsymbol{\Sigma}_{aa}$: marginalization simply picks out the corresponding blocks of the mean vector and covariance matrix.
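The marginalization result can be verified numerically with a small sketch; the particular 3-dimensional mean and covariance below are illustrative assumptions. We sample from the joint distribution, keep only the first two components (which is what integrating out $\mathbf{x}_b$ amounts to in a simulation), and compare their empirical mean and covariance with $\boldsymbol{\mu}_a$ and $\boldsymbol{\Sigma}_{aa}$ read off from the partition.
import numpy as np
rng = np.random.default_rng(1)
# A 3-D Gaussian, partitioned as x = (x_a, x_b) with x_a the first two components
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, -0.2],
                  [0.3, -0.2, 1.5]])
mu_a = mu[:2]             # partitioned mean
Sigma_aa = Sigma[:2, :2]  # partitioned covariance block
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
x_a = samples[:, :2]      # discard x_b, keeping only the subset of interest
print(x_a.mean(axis=0).round(3), mu_a)      # empirical mean vs mu_a
print(np.cov(x_a, rowvar=False).round(3))   # empirical covariance...
print(Sigma_aa)                             # ...vs Sigma_aa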
Reverse Inference: Marginal Distributions Do Not Imply Multivariate Normality
While it is clear from the above derivation that the marginal distribution of any subset of a multivariate Gaussian is also Gaussian, there is an interesting and subtle issue when trying to reverse the argument. That is, just because the marginal distributions of a random vector are Gaussian does not necessarily mean the entire vector follows a multivariate Gaussian distribution.
Suppose we have a random vector $\mathbf{x} = (x_1, x_2)^\top$ in which each component, taken on its own, follows a univariate Gaussian distribution.
This implies that the marginal distributions of $x_1$ and $x_2$ are Gaussian, but it does not by itself imply that the pair $(x_1, x_2)$ is jointly Gaussian.
Counterexample: Suppose that we have two random variables $x_1$ and $x_2$ whose marginals are standard normal but whose joint distribution is not bivariate normal.
For instance, consider the following classical joint construction:

$$x_1 \sim \mathcal{N}(0, 1), \qquad x_2 = s\, x_1, \quad \text{where } s = \pm 1 \text{ with probability } \tfrac{1}{2} \text{ each, independent of } x_1.$$

Even though the marginals of $x_1$ and $x_2$ are both standard normal (flipping the sign of a symmetric distribution leaves it unchanged), the joint distribution is not bivariate Gaussian: the linear combination $x_1 + x_2$ equals zero with probability $\tfrac{1}{2}$ and is continuous otherwise, whereas every linear combination of jointly Gaussian variables must itself be (possibly degenerate) Gaussian.
In simpler terms: marginal Gaussianity is a necessary but not sufficient condition for the joint distribution to be Gaussian.
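To see this numerically, the sketch below implements the sign-flip construction used as the counterexample above: both marginals pass a normality test, yet the sum $x_1 + x_2$ is exactly zero half the time, which is impossible for a jointly Gaussian pair.
import numpy as np
from scipy.stats import kstest
rng = np.random.default_rng(7)
n = 100_000
x1 = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)  # random sign, independent of x1
x2 = s * x1                          # marginally N(0, 1), but not jointly Gaussian with x1
# Both marginals are consistent with a standard normal (large KS p-values)
print(kstest(x1, "norm").pvalue, kstest(x2, "norm").pvalue)
# But x1 + x2 has an atom at zero -- about half of the values are exactly 0
print(np.mean(x1 + x2 == 0.0))       # ~0.5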
Invariance of the Multivariate Gaussian Distribution under Linear Transformations
In this section, we explore a crucial property of the multivariate Gaussian distribution: its invariance in form under linear transformations, i.e., a linearly transformed Gaussian vector remains Gaussian, only with a new mean and covariance.
The Setup: A Multivariate Gaussian Vector
We begin with a random vector $\mathbf{x} \in \mathbb{R}^N$ that follows a multivariate Gaussian distribution, $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
Here, $\boldsymbol{\mu}$ is the mean vector and $\boldsymbol{\Sigma}$ is the covariance matrix.
The PDF of $\mathbf{x}$ is the multivariate Gaussian density introduced earlier,

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$

We are interested in how the distribution of $\mathbf{x}$ changes when we apply a linear transformation to it.
The Linear Transformation
Consider a linear transformation of the vector $\mathbf{x}$:

$$\mathbf{y} = A\mathbf{x} + \mathbf{b},$$

where $A$ is a constant $M \times N$ matrix and $\mathbf{b}$ is a constant $M$-dimensional vector. We want to determine the distribution of the transformed vector $\mathbf{y}$.
To do this, we use the fact that the multivariate Gaussian distribution is closed under linear transformations: if $\mathbf{x}$ is Gaussian, then $\mathbf{y} = A\mathbf{x} + \mathbf{b}$ is also Gaussian, so it only remains to compute its mean vector and covariance matrix.
Deriving the Mean and Covariance of $\mathbf{y}$
Step 1: The New Mean Vector
The mean of $\mathbf{y}$ is

$$\mathbb{E}[\mathbf{y}] = \mathbb{E}[A\mathbf{x} + \mathbf{b}] = A\,\mathbb{E}[\mathbf{x}] + \mathbf{b}.$$

Since the expectation of $\mathbf{x}$ is $\boldsymbol{\mu}$, we have $\mathbb{E}[\mathbf{y}] = A\boldsymbol{\mu} + \mathbf{b}$.
Thus, the mean vector of the transformed distribution is $\boldsymbol{\mu}_{\mathbf{y}} = A\boldsymbol{\mu} + \mathbf{b}$.
Step 2: The New Covariance Matrix
Next, we derive the covariance matrix of $\mathbf{y}$:

$$\mathrm{Cov}(\mathbf{y}) = \mathbb{E}\!\left[(\mathbf{y} - \mathbb{E}[\mathbf{y}])(\mathbf{y} - \mathbb{E}[\mathbf{y}])^\top\right].$$

Substituting $\mathbf{y} = A\mathbf{x} + \mathbf{b}$ and $\mathbb{E}[\mathbf{y}] = A\boldsymbol{\mu} + \mathbf{b}$, the constant $\mathbf{b}$ cancels:

$$\mathrm{Cov}(\mathbf{y}) = \mathbb{E}\!\left[A(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^\top A^\top\right].$$

Since $A$ is constant, it can be pulled out of the expectation:

$$\mathrm{Cov}(\mathbf{y}) = A\,\mathbb{E}\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^\top\right] A^\top.$$

Recall that $\mathbb{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^\top] = \boldsymbol{\Sigma}$, so $\mathrm{Cov}(\mathbf{y}) = A\boldsymbol{\Sigma}A^\top$.
This shows that the covariance matrix of the transformed vector is $A\boldsymbol{\Sigma}A^\top$, and altogether

$$\mathbf{y} \sim \mathcal{N}\!\left(A\boldsymbol{\mu} + \mathbf{b},\; A\boldsymbol{\Sigma}A^\top\right).$$
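Before turning to the visual demo, here is a compact numerical check of the two formulas just derived; the matrix $A$, offset $\mathbf{b}$, and Gaussian parameters are arbitrary illustrative choices. The sample mean and covariance of $\mathbf{y} = A\mathbf{x} + \mathbf{b}$ match $A\boldsymbol{\mu} + \mathbf{b}$ and $A\boldsymbol{\Sigma}A^\top$ up to Monte Carlo error.
import numpy as np
rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
A = np.array([[2.0, 0.5],
              [-1.0, 1.0],
              [0.3, 0.3]])  # maps R^2 to R^3
b = np.array([0.5, -1.0, 2.0])
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b              # y_i = A x_i + b, applied row-wise
print(y.mean(axis=0).round(3), (A @ mu + b).round(3))  # empirical mean vs A mu + b
print(np.cov(y, rowvar=False).round(3))                # empirical covariance...
print((A @ Sigma @ A.T).round(3))                      # ...vs A Sigma A^T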
Demo
In this experiment, we explore the invariance of multivariate Gaussian distributions under linear transformations. Specifically, we start with a bivariate Gaussian distribution and apply a simple linear transformation, such as rotation, to the data. The transformation is visualized by comparing the original and transformed distributions using scatter plots.

Additionally, we delve into how the covariance structure of the data changes under linear transformations. The covariance ellipses are plotted for both the original and transformed distributions, showing how the shape and orientation of the ellipses reflect the underlying correlations between the variables. This experiment demonstrates that, while the distribution remains Gaussian, the transformation alters the data’s spread and orientation exactly as predicted by the transformed mean $A\boldsymbol{\mu} + \mathbf{b}$ and covariance $A\boldsymbol{\Sigma}A^\top$.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Set random seed for reproducibility
np.random.seed(0)
# Define original mean and covariance matrix
mu = [0, 0] # Mean vector
sigma = [[1, 0.8], [0.8, 1]] # Covariance matrix with positive correlation
# Generate samples from the bivariate Gaussian distribution
x, y = np.random.multivariate_normal(mu, sigma, 5000).T
# Define a simple rotation matrix (45 degrees)
theta = np.pi / 4 # Rotation by 45 degrees
A = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
b = np.array([0, 0]) # No translation
# Apply the linear transformation
xy_transformed = np.dot(A, np.vstack([x, y])) # Perform the linear transformation
xy_transformed = xy_transformed + b[:, np.newaxis] # Apply the translation vector separately
# Plot the original and transformed distributions on the same axes
fig, ax = plt.subplots(figsize=(8, 8))
# Plot the original distribution with transparency
sns.kdeplot(x=x, y=y, cmap="Blues", fill=True, ax=ax, alpha=0.5, label="Original Distribution")
# Plot the transformed distribution with transparency
sns.kdeplot(x=xy_transformed[0], y=xy_transformed[1], cmap="Oranges", fill=True, ax=ax, alpha=0.5, label="Transformed Distribution")
# Add titles and labels
ax.set_title("Comparison of Original and Transformed Bivariate Gaussian Distributions")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
# Display legend to differentiate the distributions
ax.legend()
plt.tight_layout()
plt.show()
# Function to plot covariance ellipse
def plot_cov_ellipse(covariance, mean, ax, color='blue', label=None):
    # Calculate eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = eigenvalues.argsort()[::-1]  # Sort by eigenvalue size
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Calculate rotation angle
    angle = np.degrees(np.arctan2(*eigenvectors[:, 0][::-1]))
    # Plot the covariance ellipse
    width, height = 2 * np.sqrt(eigenvalues)
    ell = patches.Ellipse(mean, width, height, angle=angle, color=color, alpha=0.3)
    ax.add_patch(ell)
# Plot covariance ellipses for both distributions on the same axes
fig, ax = plt.subplots(figsize=(8, 8))
# Covariance ellipse for original distribution
plot_cov_ellipse(sigma, mu, ax, color='blue')
ax.scatter(x, y, color='skyblue', alpha=0.5, label="Original Samples")
# Covariance ellipse for transformed distribution
transformed_sigma = A @ sigma @ A.T
plot_cov_ellipse(transformed_sigma, [0, 0], ax, color='orange')
ax.scatter(xy_transformed[0], xy_transformed[1], color='orange', alpha=0.5, label="Transformed Samples")
# Add titles and labels
ax.set_title("Comparison of Covariance Ellipses for Original and Transformed Distributions")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
# Display legend
ax.legend()
plt.tight_layout()
plt.show()
The $\chi^2$ Distribution and the Ellipsoid Theorem
The $\chi^2$ Distribution: Derivation from the Multivariate Gaussian
Let’s begin with the first concept: the $\chi^2$ (chi-squared) distribution, which arises from the quadratic form

$$Q = (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), \qquad \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}),$$

where $\mathbf{x}$ is $k$-dimensional.
This expression appears frequently in multivariate statistics, particularly when we are testing hypotheses about the mean vector $\boldsymbol{\mu}$ or constructing confidence regions around it.
Step 1: The Transformation to Standard Normal
To analyze the distribution of $Q$, we first transform $\mathbf{x}$ into a standard normal vector. Define

$$\mathbf{z} = \boldsymbol{\Sigma}^{-1/2}(\mathbf{x} - \boldsymbol{\mu}).$$

Here, $\boldsymbol{\Sigma}^{-1/2}$ is the inverse square root of the covariance matrix (well defined because $\boldsymbol{\Sigma}$ is symmetric and positive definite), and by the linear-transformation result above, $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)$.
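A minimal sketch of this whitening step, assuming an illustrative covariance matrix: $\boldsymbol{\Sigma}^{-1/2}$ is built from the eigendecomposition $\boldsymbol{\Sigma} = Q\Lambda Q^\top$ as $Q\Lambda^{-1/2}Q^\top$, and the transformed samples have approximately identity covariance.
import numpy as np
rng = np.random.default_rng(5)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.9],
                  [0.9, 1.0]])  # illustrative covariance
# Sigma^{-1/2} from the eigendecomposition Sigma = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = Q @ np.diag(1.0 / np.sqrt(lam)) @ Q.T
# Whiten: z = Sigma^{-1/2} (x - mu), applied row-wise
x = rng.multivariate_normal(mu, Sigma, size=100_000)
z = (x - mu) @ Sigma_inv_sqrt.T
print(np.cov(z, rowvar=False).round(3))  # approximately the identity matrix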
Step 2: The Sum of Squared Standard Normals
Now, notice that

$$Q = (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \mathbf{z}^\top \mathbf{z} = \sum_{i=1}^{k} z_i^2.$$

Each $z_i$ is an independent standard normal random variable, and a sum of $k$ independent squared standard normals follows, by definition, a chi-squared distribution with $k$ degrees of freedom: $Q \sim \chi^2_k$.
This result tells us that any quadratic form of this type, built from a Gaussian vector and the inverse of its covariance matrix, follows a $\chi^2$ distribution whose degrees of freedom equal the dimension of the vector.
Demo for the $\chi^2$ Distribution
To test this, we simulate 1,000 samples from a 3-dimensional multivariate Gaussian distribution with a mean vector of zeros and an identity covariance matrix. For each sample, we compute the quadratic form $Q = \mathbf{x}^\top \boldsymbol{\Sigma}^{-1} \mathbf{x}$ (the mean is zero here) and compare its empirical distribution with the theoretical $\chi^2_3$ distribution.
The resulting plot includes the simulated distribution of $Q$ (histogram with a KDE overlay), the theoretical $\chi^2$ PDF and CDF, the 95% critical value, and the mean plus or minus one standard deviation of the $\chi^2_3$ distribution.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import chi2
# Set experiment parameters
k = 3 # Degrees of freedom, as we are using a 3-dimensional Gaussian
mu = np.zeros(k) # Mean vector (all zeros)
Sigma = np.eye(k) # Covariance matrix (identity matrix for simplicity)
# Generate samples from a multivariate normal distribution
np.random.seed(42) # Set seed for reproducibility
samples = np.random.multivariate_normal(mu, Sigma, 1000) # 1,000 samples
# Compute the quadratic form Q = x^T * Sigma_inv * x for each sample
Sigma_inv = np.linalg.inv(Sigma) # Inverse of the covariance matrix
Q = np.sum((samples @ Sigma_inv) * samples, axis=1) # Quadratic form values
# Plot the histogram of Q values with kernel density estimation (KDE)
plt.figure(figsize=(10, 6))
sns.histplot(Q, kde=True, stat="density", color="skyblue", label="Empirical Distribution", bins=40, alpha=0.7)
# Plot the theoretical Chi-Squared PDF
x = np.linspace(0, np.max(Q), 100) # Range for theoretical curve
y = chi2.pdf(x, df=k) # Chi-Squared PDF with k degrees of freedom
plt.plot(x, y, 'r-', label=f"Chi-squared Distribution (df={k})", linewidth=2)
# Plot the Chi-Squared CDF for additional reference
y_cdf = chi2.cdf(x, df=k)
plt.plot(x, y_cdf, 'g--', label=f"Chi-squared CDF (df={k})", linewidth=2)
# Mark the 95% critical value for the Chi-Squared distribution
critical_value_95 = chi2.ppf(0.95, df=k)
plt.axvline(critical_value_95, color="orange", linestyle="--", label=f"95% Critical Value (df={k})", linewidth=2)
# Add text annotation for the 95% critical value
plt.text(critical_value_95 + 1, 0.03, f"Critical Value = {critical_value_95:.2f}", color="orange", fontsize=12)
# Mark the mean and standard deviation lines
mean = k # Mean of Chi-Squared with k degrees of freedom
std_dev = np.sqrt(2 * k) # Standard deviation of Chi-Squared with k degrees of freedom
plt.axvline(mean, color="purple", linestyle=":", label=f"Mean = {mean}", linewidth=2)
plt.axvline(mean + std_dev, color="purple", linestyle=":", label=f"Mean + 1 SD", linewidth=2)
plt.axvline(mean - std_dev, color="purple", linestyle=":", label=f"Mean - 1 SD", linewidth=2)
# Beautify and finalize the plot
plt.title(f"Chi-Squared Distribution with {k} Degrees of Freedom", fontsize=16)
plt.xlabel("Q Value", fontsize=14)
plt.ylabel("Density", fontsize=14)
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
# Display the plot
plt.tight_layout()
plt.show()
The Ellipsoid Theorem: Geometrical Interpretation of Gaussian Contours
The second concept we explore is the ellipsoid theorem, which describes the shape of the level sets (contours) of a multivariate Gaussian distribution. Specifically, we will prove that the contour lines of a multivariate Gaussian distribution are ellipsoids, and we will derive their geometric properties.
Let’s consider again a random vector $\mathbf{x} \in \mathbb{R}^N$ with $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
The contour lines of the Gaussian distribution correspond to the set of points $\mathbf{x}$ at which the density is constant, i.e., the level sets

$$(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c,$$

where $c$ is a positive constant (each value of $c$ picks out one contour).
Step 1: Eigenvalue Decomposition of the Covariance Matrix
To understand the geometry of this surface, we perform the eigenvalue decomposition of the covariance matrix $\boldsymbol{\Sigma}$:

$$\boldsymbol{\Sigma} = Q \Lambda Q^\top,$$

where $Q$ is an orthogonal matrix whose columns are the eigenvectors of $\boldsymbol{\Sigma}$, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ contains the corresponding eigenvalues.
Thus, the quadratic form can be rewritten as:

$$(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \mathbf{y}^\top \Lambda^{-1} \mathbf{y} = \sum_{i=1}^{N} \frac{y_i^2}{\lambda_i},$$

where $\mathbf{y} = Q^\top(\mathbf{x} - \boldsymbol{\mu})$ are the coordinates of $\mathbf{x}$ in the rotated (eigenvector) basis.
Step 2: Geometry of the Ellipsoid
In the transformed space, the equation describing the level set becomes:

$$\sum_{i=1}^{N} \frac{y_i^2}{\lambda_i} = c.$$

This is the equation of an ellipsoid in the $\mathbf{y}$-coordinates, centered at the origin, with semi-axes of length $\sqrt{c\,\lambda_i}$ along the eigenvector directions.
Thus, the geometry of the level sets (or contours) of a multivariate Gaussian distribution is determined by the eigenvalues and eigenvectors of the covariance matrix: the eigenvectors determine the directions of the axes of the ellipsoid, and the square roots of the eigenvalues determine their lengths (the standard deviations along those axes).
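This geometry can be checked directly with a short sketch (the 2-D covariance matrix below is an illustrative assumption): points generated as $\boldsymbol{\mu} + Q\Lambda^{1/2}(\cos t, \sin t)^\top$ sweep out the level set with $c = 1$, and the quadratic form evaluates to 1 at every such point.
import numpy as np
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])  # illustrative covariance
lam, Q = np.linalg.eigh(Sigma)  # Sigma = Q diag(lam) Q^T
# Parameterize the c = 1 level set: x(t) = mu + Q diag(sqrt(lam)) [cos t, sin t]^T
t = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.stack([np.cos(t), np.sin(t)])                # shape (2, 200)
ellipse = mu[:, None] + Q @ np.diag(np.sqrt(lam)) @ circle
# The quadratic form equals c = 1 everywhere on this curve
diff = ellipse - mu[:, None]
q = np.einsum("ij,ij->j", diff, np.linalg.solve(Sigma, diff))
print(q.min().round(6), q.max().round(6))                # both ~ 1.0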
Demo for the Ellipsoid Theorem
To illustrate this, we generate samples from three different 2-dimensional Gaussian distributions, each with a unique covariance matrix. Each covariance matrix introduces a different level of correlation between the dimensions, which changes the orientation and shape of the ellipsoidal contours. We visualize these contours using concentric ellipses representing one, two, and three standard deviations from the mean. These ellipses are derived from the eigenvalues and eigenvectors of the covariance matrices, where the eigenvalues define the axis lengths, and the eigenvectors determine the rotation of each ellipse. The final plot overlays scatter plots of each Gaussian sample set to show sample density, and the ellipsoid contours at multiple standard deviations.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from matplotlib.patches import Ellipse
# Set experiment parameters
mu = [0, 0] # Mean vector
covariances = [
[[2, 1], [1, 2]], # Covariance matrix 1
[[3, 1], [1, 1]], # Covariance matrix 2
[[1, -0.8], [-0.8, 1]] # Covariance matrix 3
]
# Generate samples for each covariance matrix
np.random.seed(42)
samples = [np.random.multivariate_normal(mu, cov, 500) for cov in covariances] # 500 samples per covariance matrix
# Set up plot
plt.figure(figsize=(12, 8))
# Define colormaps and colors for each covariance matrix
colormaps = ["Blues", "Greens", "Reds"]
colors = ["blue", "green", "red"]
# Plot sample distributions and ellipsoid contours
for i, (sample, cov) in enumerate(zip(samples, covariances)):
    # Density plot for samples with KDE
    sns.kdeplot(x=sample[:, 0], y=sample[:, 1], fill=True, cmap=colormaps[i], alpha=0.3, thresh=0.1)
    # Scatter plot for samples with explicit color
    sns.scatterplot(x=sample[:, 0], y=sample[:, 1], s=30, color=colors[i], label=f"Samples with Covariance {i+1}")
    # Draw ellipsoids for 1, 2, and 3 standard deviations
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # Eigenvalues and eigenvectors of the covariance matrix
    angle = np.degrees(np.arctan2(*eigenvectors[:, 0][::-1]))  # Rotation angle in degrees
    for n_std in range(1, 4):  # 1, 2, 3 standard deviations
        width, height = 2 * n_std * np.sqrt(eigenvalues)  # Ellipse width and height
        ellipse = Ellipse(xy=mu, width=width, height=height, angle=angle,
                          edgecolor=colors[i], linestyle="--", linewidth=2, fill=False, alpha=0.5)
        plt.gca().add_patch(ellipse)
# Add title and axis labels
plt.title("Ellipsoid Contours for Different Covariance Matrices", fontsize=16)
plt.xlabel("X1", fontsize=14)
plt.ylabel("X2", fontsize=14)
plt.legend(loc="upper right")
# Beautify the plot
plt.grid(True, which="both", linestyle="--", linewidth=0.5)
plt.tight_layout()
plt.show()
Conclusion
In this blog, we introduced the key properties of the multivariate Gaussian distribution, including the independence of random variables, the derivation of marginal and conditional distributions, closure under linear transformations, and its geometric characteristics. These properties form the foundation of probabilistic modeling and serve as building blocks for more complex models.
As a natural extension of the multivariate Gaussian distribution, Gaussian Mixture Models (GMMs) combine multiple Gaussian components to flexibly capture multimodal characteristics in data. In the next blog, we will explore the mathematical principles behind GMMs and their applications as generative models.