Random variables, distributions, and independence
In this section, we will first introduce random variables, which will enable us to model diverse random experiments in a unified mathematical framework. We will also learn about probability distributions, which are useful for describing and predicting how likely random events are to occur, and expected values, which summarize a distribution as a single value.
Random variables
Sample spaces of random experiments are fairly arbitrary objects (e.g., {heads, tails}, values showing on dice, playing cards). This generality makes probability theory powerful. But unfortunately, such generality is also inconvenient – mathematics works best with concrete definitions.
So how can we develop a uniform framework for coin flips, die rolls, card draws, etc.? We define random variables to represent random outcomes. A random variable (RV) assigns a real number to each possible outcome of a trial or experiment. Sometimes the RV can be defined in a natural way; for example, for a die roll, the random variable can be defined as the number showing on the die. In other cases, we have to define an arbitrary mapping, such as \(heads \to 0, tails \to 1\).
In fact, we already represented outcomes with numbers in the last section, with exactly this in mind; we just didn’t use the term random variable (it’s a good idea to review this table and the examples after it).
Random variables are denoted by capital letters, such as \(X\). When a particular outcome occurs, the random variable takes the corresponding value. For example, let \(X\) be the random variable defined for a coin toss, where we assign \(heads\to 0, tails\to 1\). Then if heads shows, we say \(X=0\), while if tails shows, we say \(X=1\).
Random variables allow us to translate general ideas about probability to numeric concepts:
- Finite or discrete sample space: finite or discrete range of random variables (The range of a random variable is the set of values it can take)
- Continuous sample space: continuous range of random variables
- Events as sets of outcomes: events as sets of numbers, intervals, or collections of intervals on the real number line
- Probability function on events: probability function on intervals, or ranges, on the real number line
A random variable whose range is discrete is called a discrete random variable, and one whose range is continuous is called a continuous random variable. For example:
- a birthdate (discrete)
- a temperature (continuous)
Consider a random variable \(X\), defined as the number showing on a die. We can now represent events as membership in a set of numbers. For example, the event that an odd number shows on the die can be written as \(X\in\{1,3,5\}\) and its probability can be written as \(P(X\in\{1,3,5\}).\)
\(P(X=2)=\frac16, \qquad P(3 \leq X \leq 5)=P(X\in\{3,4,5\})=\frac36=\frac12\)
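The event probabilities above can be checked by direct counting. The following is a minimal sketch, assuming a fair die so that every outcome is equally likely; the helper name `prob` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction

# Sample space of one fair die; X is simply the number showing.
outcomes = [1, 2, 3, 4, 5, 6]

def prob(event):
    """P(X satisfies event) for a fair die: favorable outcomes over all outcomes."""
    favorable = [x for x in outcomes if event(x)]
    return Fraction(len(favorable), len(outcomes))

p_odd = prob(lambda x: x in {1, 3, 5})          # P(X in {1, 3, 5})
p_two = prob(lambda x: x == 2)                  # P(X = 2)
p_three_to_five = prob(lambda x: 3 <= x <= 5)   # P(3 <= X <= 5)

print(p_odd, p_two, p_three_to_five)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.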
When an experiment has more than one component, for example, when an action is repeated, we can describe the outcomes as a tuple of random variables. For example, suppose a die is rolled twice. Let the result of the first roll be denoted by \(X\) and the result of the second roll be denoted by \(Y\). We can represent each outcome with a pair of numbers, i.e., $(X,Y)$. For example, (2,3) represents the first die showing 2 and the second die showing 3.
Random variables can also be functions of other random variables. Continuing the two dice example, we can define a third random variable as the sum of \(X\) and \(Y\), i.e., \(Z=X+Y\).
When dealing with more than one random variable, say $X$ and $Y$, we can write $P(X=i,Y=j)$ to indicate the probability of the intersection of the two events $\{X=i\}$ and $\{Y=j\}$, i.e., $P(\{X=i\}\cap \{Y=j\})$.
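To make the intersection notation concrete, here is a small sketch that treats events as sets of outcomes for two die rolls. It assumes all 36 pairs are equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x, y) of rolling a die twice.
sample_space = set(product(range(1, 7), repeat=2))

# The events {X = 2} and {Y = 3} as sets of outcomes.
event_x2 = {(x, y) for (x, y) in sample_space if x == 2}
event_y3 = {(x, y) for (x, y) in sample_space if y == 3}

# P(X = 2, Y = 3) is the probability of the intersection of the two events.
joint = event_x2 & event_y3
p_joint = Fraction(len(joint), len(sample_space))

print(joint)    # the intersection contains only the outcome (2, 3)
print(p_joint)
```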
Describing probabilities: distributions
Probability over discrete sets
Suppose a die is rolled twice, with all outcomes equally likely. Consider the random variable $Z$ defined as the sum of the two die rolls. The sample space is the set of 36 pairs
\[\{(x,y): x,y\in\{1,2,\dots,6\}\},\]and the range of the random variable \(Z\) is the set {2,3,4,5,6,7,8,9,10,11,12}. We can find the probability of each of these values by counting the outcomes that produce it: \[P(Z=z)=\frac{6-|z-7|}{36},\qquad z\in\{2,3,\dots,12\}.\]
Now that we have these probabilities, we can plot \(P(Z=z)\), for \(z\in\{2,3,4,5,6,7,8,9,10,11,12\}\), which you can see below:
The function \(P(Z=z)\) is the distribution of \(Z\). It determines the probability of each value that \(Z\) can take. This is useful for a few reasons. First, after we determine the distribution, we no longer need to recalculate the probabilities of outcomes of interest. Second, we can calculate the probabilities of different events by summing the probabilities of the relevant outcomes. Finally, as we will see, it will allow us to define quantities such as the expected value, which help us better understand the behavior of \(Z\).
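The distribution of \(Z\) can also be obtained by enumerating the 36 outcomes. This is a minimal sketch under the same equally-likely assumption; the event \(Z\le 4\) is chosen arbitrarily to show how an event probability is a sum over the pmf.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(x + y for x, y in product(range(1, 7), repeat=2))

# pmf[z] = P(Z = z), kept as an exact fraction.
pmf = {z: Fraction(n, 36) for z, n in sorted(counts.items())}

for z, p in pmf.items():
    print(z, p)

# Probability of an event = sum of the pmf over the values in the event.
p_at_most_4 = sum(p for z, p in pmf.items() if z <= 4)
print(p_at_most_4)
```

Once `pmf` is computed, any event about \(Z\) can be evaluated from it without going back to the sample space.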
- Find \(P(Z>10)\).
- What is the most likely value?
In this example, our random variable was discrete. The distribution of a discrete random variable is called a probability mass function (pmf). Note that pmfs do not apply to continuous random variables. We will discuss continuous probability distributions later, albeit much more briefly.
For brevity, instead of \(P(X = x)\) we may write \(P(x)\) or \(p(x)\).
- Find the pmf of \(X\).
- \(P(X\ge 1)\)
- What is the most likely value(s)?
Independence
Suppose we have a fair die, which when rolled has equal probability of showing each of the numbers 1 to 6, and a fair coin with equal probability of heads and tails. Let $X$ denote the number showing on the die and $Y$ be equal to 0 if heads shows and equal to 1 if tails shows. If you know what $X$ is, does that provide any information about $Y$? It doesn’t seem so. For example, if $X=4$, that doesn’t affect the probability of $Y=0$. The latter probability is still $1/2$.
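Two random variables $X$ and $Y$ are called independent if knowing the value of one provides no information about the value of the other.
We described above what independence means intuitively. What about a mathematical definition? Two random variables $X$ and $Y$ are independent if, for all values $x$ and $y$, \[P(X=x,Y=y)=P(X=x)P(Y=y).\]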
This definition may seem strange at first sight but it agrees with our intuitive understanding of independence. For example, what is the probability $P(X=4,Y=0)$, with $X$ and $Y$ defined as above? The mathematical definition of independence says if $X$ and $Y$ are independent, as we believe they are, then
\[P(X=4,Y=0)=P(X=4)\times P(Y=0)=1/6\times1/2=1/12.\]Suppose the experiment is performed many, many times. Would we intuitively expect to see $X=4,Y=0$ in a twelfth of the trials? Yes! In 1/6 of the trials, we would expect to see $X=4$. Since the coin doesn’t care about the die, among the trials in which $4$ shows, in about half of them the coin should show heads ($Y=0$). So overall, in $1/6\times1/2$ of the trials, we would expect to see both $X=4$ and $Y=0$.
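The "many trials" argument can be tried out with a short simulation. This is only a sketch: the seed and the number of trials are arbitrary choices, and the relative frequency will be close to, but not exactly, $1/12$.

```python
import random

rng = random.Random(0)   # fixed seed, chosen arbitrarily, for reproducibility
n_trials = 100_000

hits = 0
for _ in range(n_trials):
    x = rng.randint(1, 6)   # die roll X
    y = rng.randint(0, 1)   # coin flip Y: 0 = heads, 1 = tails
    if x == 4 and y == 0:
        hits += 1

relative_frequency = hits / n_trials
print(relative_frequency, 1 / 12)
```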
Let’s give this a try:
- Show that $X$ and $Y$ are independent (using the mathematical definition and the fact that all outcomes are equally likely).
- Show that $Z$ and $X$ are not independent by considering the probability $P(X=1,Z=3)$.
The definition of independence also extends to events: two events $A$ and $B$ are independent if $P(A\cap B)=P(A)P(B)$.
Sometimes the components of an experiment are physically independent from each other, for example, when two dice are rolled, one by one. In such cases, independence is very natural. But independence may also hold when the two events are physically related, or even when they result from a single action. For example, consider a deck of cards from which you draw a card at random. The color of the card could be red or black; its suit can be heart, diamond, club or spade; and its rank may be \(A, 2, 3, 4, \cdots, J, Q, K\). Let’s check the independence of a few events:
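- ‘heart’ and ‘ace’ are independent: \(P(\text{heart}\cap\text{ace})=\frac1{52}=\frac{13}{52}\times\frac4{52}=P(\text{heart})P(\text{ace})\)
- ‘red’ and ‘heart’ are not independent: \(P(\text{red}\cap\text{heart})=\frac{13}{52}=\frac14\neq\frac12\times\frac14=P(\text{red})P(\text{heart})\)
- ‘black’ and ‘heart’ are not independent: \(P(\text{black}\cap\text{heart})=0\neq\frac12\times\frac14=P(\text{black})P(\text{heart})\)
- ‘ace’ and ‘red’ are independent: \(P(\text{ace}\cap\text{red})=\frac2{52}=\frac4{52}\times\frac{26}{52}=P(\text{ace})P(\text{red})\)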
The examples above show that color and suit are not independent. Since the last example works for any rank, not just ace, and for any color, not just red, rank and color are independent. Similarly, the first example shows that rank and suit are independent.
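The four card examples can be verified by enumerating a standard 52-card deck and comparing $P(A\cap B)$ with $P(A)P(B)$. This is a sketch; the helper names `prob` and `independent` and the string encodings of ranks and suits are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_color = {"heart": "red", "diamond": "red", "club": "black", "spade": "black"}

# Each card is a (rank, suit) pair; all 52 cards are equally likely.
deck = list(product(ranks, suit_color))

def prob(event):
    """Probability that a randomly drawn card satisfies `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def independent(a, b):
    """Check the definition P(A and B) == P(A) * P(B) exactly."""
    return prob(lambda c: a(c) and b(c)) == prob(a) * prob(b)

def is_heart(card):
    return card[1] == "heart"

def is_ace(card):
    return card[0] == "A"

def is_red(card):
    return suit_color[card[1]] == "red"

def is_black(card):
    return suit_color[card[1]] == "black"

print(independent(is_heart, is_ace))    # heart vs. ace
print(independent(is_red, is_heart))    # red vs. heart
print(independent(is_black, is_heart))  # black vs. heart
print(independent(is_ace, is_red))      # ace vs. red
```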
- $P(A\cap B).$
- $P(A\cup B).$
Events defined using random variables: If two random variables $X$ and $Y$ are independent, then any event defined in terms of $X$ alone is independent of any event defined in terms of $Y$ alone. (This is an important result but we will not prove it.) For example, if $X$ and $Y$ are two independent die rolls, then
\[P(X<4,Y=3) = P(X<4)P(Y=3)=\frac36\times\frac16=\frac3{36}=\frac1{12}.\]Specifically, $X<4,Y=3$ corresponds to the event \(\{(1,3),(2,3),(3,3)\}\).
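As a check on this calculation, the sketch below computes the joint probability by counting outcomes and compares it with the product of the two separate probabilities. It assumes two fair, independent dice, so that all 36 pairs are equally likely; `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# 36 equally likely outcomes (x, y) for two fair, independent die rolls.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (x, y)."""
    favorable = sum(1 for (x, y) in sample_space if event(x, y))
    return Fraction(favorable, len(sample_space))

p_joint = prob(lambda x, y: x < 4 and y == 3)
p_product = prob(lambda x, y: x < 4) * prob(lambda x, y: y == 3)

print(p_joint, p_product)
```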
What about more than two random variables?
- We can talk about the independence of any pair of random variables.
- We can talk about independence among any group of random variables: \(P( X=x, Y=y, Z=z,\cdots) = P(X=x)P(Y=y)P(Z=z)\cdots\)
- If an entire group of random variables is independent in this sense, we say they are mutually independent.
- Mutual independence implies pairwise independence, but the converse is not true.
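A standard illustration of the last point, not taken from the text above, uses two fair coin flips $X$ and $Y$ (coded 0 and 1) and a third variable $Z$ equal to 1 when exactly one of them is 1. The sketch below checks that every pair is independent while the three together are not.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes (x, y, z) with z = (x + y) mod 2.
outcomes = [(x, y, (x + y) % 2) for x, y in product([0, 1], repeat=2)]

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def pairwise_independent(i, j):
    """Check P(V_i = a, V_j = b) == P(V_i = a) P(V_j = b) for all a, b."""
    return all(
        prob(lambda o: o[i] == a and o[j] == b)
        == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)
        for a, b in product([0, 1], repeat=2)
    )

pairs_ok = all(pairwise_independent(i, j) for i, j in [(0, 1), (0, 2), (1, 2)])

# Mutual independence would require P(X=1, Y=1, Z=1) = 1/2 * 1/2 * 1/2 = 1/8,
# but Z = 0 whenever X = Y = 1, so this probability is 0.
p_all_ones = prob(lambda o: o == (1, 1, 1))

print(pairs_ok, p_all_ones)
```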
Bernoulli and binomial distributions
Bernoulli distribution
The distribution of a random variable $X$ that takes only two values, typically 0 and 1, is called a Bernoulli distribution, where the probability of 1 is usually denoted by $p$, i.e., $p=P(X=1)$. Such a random variable results from an experiment with two outcomes such as flipping a coin, playing a game (no draw), performing any task that may lead to success or failure, etc. The distribution is determined in full by $p$:
\[P(X=1) = p, \qquad P(X=0) = 1-p.\]As an example, for $p=0.3$, the plot of the pmf is given below.
If the distribution of $X$ is Bernoulli with probability of 1 equal to $p$, we write $X\sim\text{Bernoulli}(p)$. The most common case is $p=1/2$ resulting from a fair coin.
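A Bernoulli random variable is easy to write down and simulate. The following is a minimal sketch; the function names, the seed, and the sample size are illustrative choices, and $p=0.3$ matches the example above.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); zero for any value other than 0 or 1."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def bernoulli_sample(p, rng):
    """Draw one value: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(1)   # arbitrary fixed seed
p = 0.3
samples = [bernoulli_sample(p, rng) for _ in range(10_000)]

# The fraction of 1s should be close to p.
fraction_of_ones = sum(samples) / len(samples)
print(bernoulli_pmf(0, p), bernoulli_pmf(1, p), fraction_of_ones)
```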
Binomial distribution
An archer hits the target with probability $p$. She participates in a competition that involves shooting three times and we can assume each shot is independent from the others. Let $X$ denote the number of times she hits the target. What is the distribution of $X$?
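Let’s show each outcome as a binary sequence of length 3, with 1 denoting hitting the target. We have
\[\{X=0\}=\{000\},\quad \{X=1\}=\{001,010,100\},\quad \{X=2\}=\{011,101,110\},\quad \{X=3\}=\{111\}.\]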
Let us now find the probability of each event. $X=3$ corresponds to three hits. Since each hit has probability $p$ and they are independent,
\[P(X=3) = p\times p\times p =p^3.\]Similarly, the probability of $X=0$ is
\[P(X=0) = (1-p)\times (1-p)\times (1-p) =(1-p)^3.\]The case of $X=1$ is trickier. There are three outcomes in this event. But they all have the same probability:
\[P(100)=P(010)=P(001)=p(1-p)^2.\]
So $P(X=1)=3 p (1-p)^2$. Similarly, $P(X=2) = 3p^2(1-p)$.
Now let us consider the general case, when the archer tries $N$ times. What is the probability of $k$ hits, i.e., $P(X=k)$? The probability of a particular sequence of $k$ hits and $N-k$ misses is $p^k(1-p)^{N-k}$. But we also need to consider the number of such sequences. Each sequence of $k$ hits and $N-k$ misses is equivalent to a binary sequence with $k$ 1s and $N-k$ 0s. There are ${N\choose k}$ such sequences. Putting things together,
\[P(X=k) = {N\choose k}p^k (1-p)^{N-k}.\]This is called the binomial distribution, written as $X\sim\text{Binomial}(N,p)$.
Below, you can plot the Binomial pmf for different values of $N$ and $p$.
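As a text-only stand-in for a plot, the sketch below evaluates the binomial pmf from the formula above and prints one bar per value of $k$. The values $N=3$ and $p=0.3$ are example choices; `binomial_pmf` is a hypothetical helper built on the standard-library function `math.comb`.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 3, 0.3   # example values chosen for illustration
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# One bar of '#' per value of k; bar length is proportional to the probability.
for k, prob_k in enumerate(pmf):
    print(f"k={k}  {prob_k:.3f}  " + "#" * round(prob_k * 50))
```

Changing `n` and `p` and rerunning shows how the shape of the pmf depends on the two parameters.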
Probability over continuous sets
What if we are interested in the amount of rainfall, blood pressure, etc.? Here the probability is over a continuous set. The distribution is then described by a probability density function, or pdf for short, which is typically drawn as a curve. An example is shown below. Intuitively, wherever the pdf is larger, the surrounding region has a higher probability.
The pdf shown above is for a random variable $X$ with a Gaussian distribution, whose formula is
\[f(x) = \frac1{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]where the mean $\mu$ and the standard deviation $\sigma$ control the location and spread of the distribution (similar to how $N$ and $p$ control the binomial distribution). In the figure, $\sigma=1,\mu=0$.
If the pdf is given by a function $f(x)$, then we can compute event probabilities using integrals:
\[P(a\le X\le b) = \int_{a}^{b} f(x)dx\]If we are interested in the probability that $X$ is between -1 and 1, we can find it as
\[P(-1\le X\le 1) = \int_{-1}^1 f(x)dx \approx 0.6827,\]which is the area of the shaded region in the graph below.
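The value of this integral can be reproduced numerically. The sketch below uses a simple midpoint rule with an arbitrarily chosen number of subintervals, and cross-checks against the closed form $\operatorname{erf}(1/\sqrt2)$, which holds for the Gaussian with $\mu=0,\sigma=1$; the function names are illustrative.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

p_numeric = integrate(gaussian_pdf, -1.0, 1.0)

# Closed form for the standard Gaussian: P(-1 <= X <= 1) = erf(1 / sqrt(2)).
p_exact = math.erf(1 / math.sqrt(2))

print(round(p_numeric, 4), round(p_exact, 4))
```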
Below, we simulate 20 values from this distribution:
0.48889, 1.0347, 0.72689, -0.30344, 0.29387, -0.78728, 0.8884, -1.1471, -1.0689, -0.8095, -2.9443, 1.4384, 0.32519, -0.75493, 1.3703, -1.7115, -0.10224, -0.24145, 0.31921, 0.31286
Thirteen of them fall in the interval $[-1,1]$, a fraction of $13/20=0.65$, which is close to the probability $0.68$.
Our discussion of continuous distributions will for the most part be limited to the above. We won’t need to compute integrals, and the Gaussian distribution is the only one we will consider, although there are many more common continuous distributions.
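The count can be confirmed directly from the 20 simulated values listed above. This short sketch just copies those values and counts how many lie between -1 and 1.

```python
# The 20 simulated values listed above.
samples = [
    0.48889, 1.0347, 0.72689, -0.30344, 0.29387, -0.78728, 0.8884,
    -1.1471, -1.0689, -0.8095, -2.9443, 1.4384, 0.32519, -0.75493,
    1.3703, -1.7115, -0.10224, -0.24145, 0.31921, 0.31286,
]

# Keep the values that fall in the interval [-1, 1].
inside = [x for x in samples if -1 <= x <= 1]
fraction = len(inside) / len(samples)

print(len(inside), fraction)
```

With only 20 samples the fraction is a rough estimate; a larger sample would typically land closer to the probability computed from the integral.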