A probability mass function (PMF) describes how probability is distributed over the possible states of a discrete random variable.
First, a probability mass function is usually denoted with a capital P.
Second, each random variable usually has its own probability mass function, and the function is identified by the random variable it is applied to rather than by its name. P(x) is usually not the same function as P(y).
Third, P(X = x) is the same as P(x). The longer form simply names the random variable X explicitly.
Fourth, probability mass functions can act on many variables at the same time. Such a function is called a joint probability distribution: P(X = x, Y = y) denotes the probability that X = x and Y = y at the same time, and it can be abbreviated as P(x, y).
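To make the joint notation concrete, here is a minimal sketch in Python. It assumes a made-up example that is not from the text above: X and Y are two independent fair coin flips, and the names `joint`, `joint_pmf` and `marginal_x` are purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X and Y are two independent fair coin flips
# (0 = tails, 1 = heads), so each of the four pairs has probability 1/4.
joint = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def joint_pmf(x, y):
    """P(X = x, Y = y); pairs that are not possible states get probability 0."""
    return joint.get((x, y), Fraction(0))

def marginal_x(x):
    """P(X = x), found by summing the joint PMF over every state of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

print(joint_pmf(1, 0))   # probability that X = 1 and Y = 0 at the same time
print(marginal_x(1))     # probability that X = 1, whatever Y is
```

Using exact fractions instead of floats keeps the arithmetic free of rounding error, which makes small examples like this easier to check by hand.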
For a function P to be a probability mass function of a random variable x, it must satisfy the following properties:

 The domain of P must be the set of all possible states of x.
 ∀x ∈ x, 0 ≤ P(x) ≤ 1. What does this mean? An impossible event has a probability of 0, and no state can be less probable than that. An event that is guaranteed to happen has a probability of 1, and no state can have a greater chance of occurring.
 ∑_{x ∈ x} P(x) = 1. This property is called being normalized: the probabilities of all possible states sum to one. Without it, we could obtain probabilities greater than one when computing the probability that one of several events occurs.
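The properties above can be checked mechanically for a small distribution. The following is only a sketch under stated assumptions: the PMF is represented as a Python dict whose keys stand in for the domain, and `is_valid_pmf`, `fair_die` and `unnormalized` are hypothetical names chosen for illustration.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Check the range and normalization properties of a PMF stored as a dict.

    The dict's keys play the role of the domain (all possible states).
    """
    # Every probability must lie between 0 and 1 inclusive.
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    # The probabilities must sum to 1, up to floating-point rounding.
    normalized = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and normalized

# Assumed examples: a fair six-sided die, and a table that is not normalized.
fair_die = {face: 1 / 6 for face in range(1, 7)}
unnormalized = {"a": 0.7, "b": 0.6}

print(is_valid_pmf(fair_die))
print(is_valid_pmf(unnormalized))
```

The tolerance is there because floating-point sums such as six copies of 1/6 may not come out to exactly 1.0.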