A probability mass function (PMF) describes how probability is distributed over the possible values of a discrete random variable.

First, a probability mass function is conventionally denoted with a capital P.

Second, the same letter P is often reused for different probability mass functions, so each function is identified by its random variable rather than by its name. P(x) is usually not the same function as P(y).

Third, P(x) is shorthand for P(X = x), the probability that the random variable X takes the value x.

Fourth, a probability mass function can act on many variables at the same time. This is called a joint probability distribution: P(X = x, Y = y) denotes the probability that X = x and Y = y simultaneously. It can be written more briefly as P(x, y).
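As a rough illustration of a joint distribution, the sketch below stores P(X = x, Y = y) for two fair coin flips in a dictionary. The coin example and the names `joint_pmf` and `joint_probability` are assumptions made for illustration, not part of the definition.

```python
from fractions import Fraction

# Hypothetical joint PMF P(X = x, Y = y) for two independent fair coin flips.
# Keys are (x, y) pairs; values are exact probabilities.
joint_pmf = {
    ("H", "H"): Fraction(1, 4),
    ("H", "T"): Fraction(1, 4),
    ("T", "H"): Fraction(1, 4),
    ("T", "T"): Fraction(1, 4),
}


def joint_probability(x, y):
    """Return P(X = x, Y = y); pairs outside the table have probability 0."""
    return joint_pmf.get((x, y), Fraction(0))


# P(X = "H", Y = "T"), written more briefly as P(x, y) with x = "H", y = "T".
p_ht = joint_probability("H", "T")
print(p_ht)
```

Exact fractions are used here instead of floats so that sums of probabilities can be compared without rounding error.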

To be a probability mass function for a random variable x, a function P must satisfy the following properties:

- The domain of P must be the set of all possible states of x.
- ∀x ∈ x, 0 ≤ P(x) ≤ 1. What does this mean? An impossible event has probability 0, and no state can be less probable than that. An event that is guaranteed to happen has probability 1, and no state can have a greater chance of occurring.
- ∑_{x ∈ x} P(x) = 1, that is, the probabilities of all possible states sum to 1. A function with this property is said to be normalized. Without normalization, adding up the probabilities of several states could give a total greater than one.
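The three requirements can be checked mechanically for a small example. The sketch below is a minimal illustration, assuming a PMF stored as a dictionary from states to probabilities; the fair-die example and the helper name `is_valid_pmf` are hypothetical.

```python
from fractions import Fraction

# Hypothetical PMF of a fair six-sided die: each state 1..6 has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def is_valid_pmf(pmf, states):
    """Return True if pmf satisfies the three requirements on the given states."""
    # Requirement 1: the domain of P is exactly the set of possible states.
    if set(pmf) != set(states):
        return False
    # Requirement 2: every probability lies between 0 and 1 inclusive.
    if any(p < 0 or p > 1 for p in pmf.values()):
        return False
    # Requirement 3: the probabilities sum to 1 (normalization).
    return sum(pmf.values()) == 1


# Each value is a legal probability on its own, but the total is 2,
# so this table is not normalized and is not a valid PMF.
unnormalized = {face: Fraction(1, 3) for face in range(1, 7)}

print(is_valid_pmf(die_pmf, range(1, 7)))
print(is_valid_pmf(unnormalized, range(1, 7)))
```

The second table shows why normalization is a separate requirement: every entry lies between 0 and 1, yet the probability that the die shows any face at all would come out as 2.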