PERCENTILE_CONT
Returns the value at the position in an ordered list where a specified percentage falls. If the percentile value does not correspond to an exact data point in the dataset, PERCENTILE_CONT
interpolates between the two closest values surrounding the specified percentile to estimate the result. The interpolation is based on the dataset’s ascending or descending order.
For example, if you want to find the 40%
percentile on the ordered list [10,20,30]
, position 1
corresponds to the value 10
, and position 2
corresponds to the value 20
. The position that corresponds to the specified percentile is calculated as 1
plus the percentile multiplied by (n-1
), where n
is the number of data points in the set. Then, the 40th
percentile corresponds to 1 + 0.40 * (3 - 1) = 1.8
, which falls between position 1
and 2
, or the values 10
and 20
. Because the desired percentile does not correspond to an exact data point, PERCENTILE_CONT
interpolates an estimated value using the following formula:
Formula to interpolate values
RN = 1 + <percentile> * (<number_of_values> - 1)
CRN = CEILING(RN)
FRN = FLOOR(RN)
EXP_CRN = <sorted_expression>[CRN]
EXP_FRN = <sorted_expression>[FRN]
If RN is a whole number, RESULT = <ordered_set>[RN].
If RN is not a whole number, RESULT = (CRN - RN) * EXP_FRN + (RN - FRN) * EXP_CRN.
Formula definitions In the previous formula for interpolation, the following apply:
RN
: The row number of the exact position of a value within an ordered dataset.CRN
: The ceiling row number, which determines the upper row position for interpolation.FRN
: The floor row number, which determines the lower row position for interpolation.EXP_CRN
: The value from the sorted dataset that corresponds to theCRN
, used as the upper bound in percentile calculations when interpolation is required.EXP_FRN
: The value from the sorted dataset that corresponds to the FRN, used as the lower bound in percentile calculations when interpolation is required.<sorted_expression>
: The ordered list of values derived from the input expression, arranged in ascending or descending order, used as the basis for percentile calculations.<number_of_values>
: The total count of values within the input expression, used to calculate the exact position of a specified percentile by multiplying the percentile (as a decimal) by (number_of_values
− 1). This calculated position then identifies the two values in the sorted dataset used for interpolation.
In the previous example, PERCENTILE_CONT
estimates the value between 10
and 20
in the ordered list [10,20,30]
to find the 40th percentile as follows:
result = (CRN - RN) * (EXP_FRN) + (RN - FRN) * EXP_CRN = (2 - 1.8) * 10 + (1.8 - 1) * 20 = 18
.
In the previous example, the set [10,20,30]
is in ascending order. If the set were in descending order as [30,20,10]
, the 40th
percentile would correspond to:
result = (CRN - RN) * (EXP_FRN) + (RN - FRN) * EXP_CRN = (2 - 1.8) * 30 + (1.8 - 1) * 20 = 22
.
Syntax
PERCENTILE_CONT
( <percentile> ) WITHIN GROUP ( ORDER BY <expression> [ { ASC | DESC } ] )
Parameters
Parameter | Description | Supported input types |
---|---|---|
<percentile> | The desired percentile position within the ordered dataset. | DOUBLE PRECISION , REAL literal between 0.0 and 1.0 |
<sorted_expression> | The expression used to calculate the percentile, which is sorted in ascending or descending order prior to calculation. | DOUBLE PRECISION , REAL , BIGINT , INT |
Return Types
PERCENTILE_CONT
returns a value of type DOUBLE PRECISION
.
- This function ignores
NULL
values. - This function returns
NULL
if the input is either empty or contains onlyNULL
values.
Examples
These calculations use 1-based indexing, where the first element is at position 1
.
Example
The following code example calculates the 20th percentile of values from a generated series ranging from 0
to 5
, and returns it as result
:
SELECT PERCENTILE_CONT(0.2) WITHIN GROUP (ORDER BY x) as result FROM GENERATE_SERIES(0, 5) as x;
Returns
result (DOUBLE PRECISION) |
---|
1 |
The previous code example calculates the 20th percentile from a series that contains the following six values: 0
, 1
, 2
, 3
, 4
, and 5
. The position for the 20th percentile is calculated as follows: RN = 1 + 0.2 * (6 - 1) = 2
. Since the row number is a whole number, it corresponds directly to the value at position 2
, which is EXP_RN = 1
, using one-based indexing.
Example
The following code example calculates the 20th percentile of values from a generated series ranging from 0
to 6
, interpolates the position, and returns it as result
:
SELECT PERCENTILE_CONT(0.2) WITHIN GROUP (ORDER BY x) as result FROM GENERATE_SERIES(0, 6) as x;
Returns
result (DOUBLE PRECISION) |
---|
1.2000000000000002 |
The previous code example calculates the 20th percentile from a series that contains the following seven values: 0
, 1
, 2
, 3
, 4
, 5
, and 6
. The position for the 20th percentile is calculated as follows: RN = 1 + 0.2 * (7 - 1) = 2.2
. Since 2.2
is not a whole number, PERCENTILE_CONT
interpolates the position as follows:
- The
CRN
isCEILING(2.2) = 3
. - The
FRN
isFLOOR(2.2) = 2
. - The
EXP_CRN
is the value at position3
of the ordered set which is2
using one-based indexing. - The
EXP_FRN
is the value at position2
of the ordered set which is1
using one-based indexing. - The result is
(CRN - RN) * EXP_FRN + (RN - FRN) * EXP_CRN
which is(3 − 2.2) × 1 + (2.2 − 2) × 2 = 0.8 + 0.4 = 1.2
.