np.percentile: How to Calculate Percentile in Numpy Array
A percentile is a measure used in statistics: it is the value below which a given percentage of the observations in a group falls.
The np.percentile() function is a NumPy function that computes the ith percentile of the input data, optionally along a specified axis of an array.
The ith percentile of a data set is the value below which i percent of the data lies. With np.percentile(), you can calculate it directly in Python.
For example, a student who scores in the 90th percentile has outperformed 90 percent of the students who took the test: out of 100 students, 90 scored below that student.
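The student example can be checked numerically. The sketch below assumes a made-up class of 100 students whose scores are simply the integers 1 to 100 (the `scores` array is hypothetical, not data from this tutorial), computes the 90th percentile, and then measures what share of the scores falls below it.

```python
import numpy as np

# Hypothetical test scores: the integers 1 through 100, one per student.
scores = np.arange(1, 101)

# Value below which 90 percent of the scores fall.
p90 = np.percentile(scores, 90)

# Fraction of scores strictly below that value.
share_below = np.mean(scores < p90)

print("90th percentile:", p90)
print("Share of scores below it:", share_below)
```

With this data the 90th percentile comes out at about 90.1, and 90 of the 100 scores lie below it.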
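numpy.percentile(arr, i, axis=None, out=None)
The percentile() function accepts further optional parameters, but the four used in this tutorial are described below. (NumPy's own documentation names the first two parameters a and q; arr and i are used here for readability.)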
arr: array_like. The input array, or any object that can be converted into an array, such as a list.
i: The percentile to compute. It must be in the range 0 to 100, inclusive. A sequence of percentiles can also be passed to compute several at once.
axis: Optional. The axis along which the percentile is computed. If no axis is given, the input array is flattened and a single percentile is computed over all of its elements. With axis=0, the percentile is computed down each column (one result per column); with axis=1, it is computed across each row (one result per row).
out: ndarray, optional. An existing array in which to store the result. Its shape must match the shape of the expected output.
It returns a scalar or an ndarray holding the ith percentile. When i is a single number and axis is None, the result is a scalar; when an axis is specified, an ndarray is returned containing the percentile values computed along that axis.
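The parameters and return shapes described above can be seen in a short sketch. The `data` array and the `buffer` name are made up for illustration; the calls themselves use only the arguments listed in this section.

```python
import numpy as np

# Hypothetical 2 x 4 input used only for this illustration.
data = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

# Single percentile, no axis: the array is flattened and a scalar comes back.
overall = np.percentile(data, 50)

# A sequence of percentiles returns one value per requested percentile.
quartiles = np.percentile(data, [25, 50, 75])

# Sequence of percentiles plus an axis: the first dimension of the result
# indexes the percentiles, the second indexes the rows of `data`.
per_row = np.percentile(data, [25, 75], axis=1)

# `out` stores the result in an existing array of the expected shape.
buffer = np.empty(2)
np.percentile(data, 50, axis=1, out=buffer)

print(overall)          # scalar
print(quartiles.shape)  # one entry per percentile
print(per_row.shape)    # (number of percentiles, number of rows)
print(buffer)           # per-row medians written into buffer
```

Passing out is mainly useful when the same result array is reused across many calls; for one-off computations the return value is usually enough.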
The following program shows how numpy.percentile() works on one-dimensional data (a list and a 1-D array):
# importing the numpy module
import numpy as np

# Making a list
lst = [16, 18, 22, 14, 22, 23, 28, 17, 19, 15, 16, 22, 14, 16, 18]
print("Given list is:\n", lst)

# Calculating percentile value in the list
res = np.percentile(lst, 35)
print("35th percentile of given list is: ", res)

# Creating an array
arr = np.array([100, 200, 150, 125, 175, 150])
print("\nArray elements are:\n", arr)

# Calculating percentile value in the array
output = np.percentile(arr, 75)
print("75th percentile of given array is: ", output)
Given list is:
 [16, 18, 22, 14, 22, 23, 28, 17, 19, 15, 16, 22, 14, 16, 18]
35th percentile of given list is:  16.0

Array elements are:
 [100 200 150 125 175 150]
75th percentile of given array is:  168.75
In the above program, we took a list named lst containing some integers and calculated its 35th percentile. We then created an array named arr and calculated its 75th percentile. Note that 168.75 is not an element of arr: when the requested percentile falls between two data points, NumPy by default interpolates linearly between the two nearest sorted values (here 150 and 175).
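To see where a value like 168.75 comes from, the calculation can be reproduced by hand. The helper below, percentile_linear, is a hypothetical function written for this illustration and not part of NumPy; it is a minimal sketch of linear interpolation between sorted values, which is the behaviour of np.percentile() with its default settings.

```python
import numpy as np

def percentile_linear(values, q):
    """Illustrative re-implementation of a linearly interpolated percentile."""
    ordered = np.sort(np.asarray(values, dtype=float))
    # Fractional position of the percentile within the sorted data.
    position = (q / 100) * (len(ordered) - 1)
    lower = int(np.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    # Move `fraction` of the way from the lower neighbour to the upper one.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

arr = np.array([100, 200, 150, 125, 175, 150])
manual = percentile_linear(arr, 75)
builtin = np.percentile(arr, 75)
print(manual, builtin)
```

For arr, the sorted values are 100, 125, 150, 150, 175, 200, and the 75th percentile sits at position 0.75 * 5 = 3.75, so the result is 150 + 0.75 * (175 - 150) = 168.75.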
The following program shows how numpy.percentile() works on a multi-dimensional array:
# importing the numpy module
import numpy as np

# Making a 2D array
arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print("Elements in the 2D array are:\n", arr)

out = np.percentile(arr, 50)
print("Calculating 50th percentile without specifying axis: ", out)

out0 = np.percentile(arr, 50, axis=0)
print("Calculating 50th percentile along axis 0: ", out0)

out1 = np.percentile(arr, 50, axis=1)
print("Calculating 50th percentile along axis 1: ", out1)
Elements in the 2D array are:
 [[10 20 30]
 [40 50 60]
 [70 80 90]]
Calculating 50th percentile without specifying axis:  50.0
Calculating 50th percentile along axis 0:  [40. 50. 60.]
Calculating 50th percentile along axis 1:  [20. 50. 80.]
In this program, we created a two-dimensional array named arr and printed its contents.
We then computed the 50th percentile of the array in three ways:
First, without specifying an axis. The result is a scalar because, by default, the input array is flattened before the percentile is computed.
Second, along axis 0. The percentile is computed down each column, giving one value per column. The result is stored in out0.
Lastly, along axis 1. The percentile is computed across each row, giving one value per row. The result is stored in out1.
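Because the array in the example is square, the two axis results have the same length, which can hide which axis is which. The sketch below uses a hypothetical 2 x 3 array named grid so that the output shapes differ, and also checks that the 50th percentile agrees with np.median().

```python
import numpy as np

# Hypothetical non-square array: 2 rows, 3 columns.
grid = np.array([[10, 20, 30],
                 [40, 50, 60]])

# axis=0 collapses the rows: one value per column.
by_column = np.percentile(grid, 50, axis=0)

# axis=1 collapses the columns: one value per row.
by_row = np.percentile(grid, 50, axis=1)

print("Per column:", by_column, "shape", by_column.shape)
print("Per row:", by_row, "shape", by_row.shape)

# The 50th percentile is the median, so both functions should agree.
print(np.allclose(by_column, np.median(grid, axis=0)))
print(np.allclose(by_row, np.median(grid, axis=1)))
```

The per-column result has three entries and the per-row result has two, matching the number of columns and rows in grid.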
That’s it for this tutorial.