APPROX_COUNT_DISTINCT

Counts the approximate number of unique values that are not NULL. Internally, APPROX_COUNT_DISTINCT uses HyperLogLog (HLL) sketches. Calling APPROX_COUNT_DISTINCT returns the same value as calling HLL_COUNT_DISTINCT with precision 17.

Syntax

APPROX_COUNT_DISTINCT(<expression>)

Parameters

Parameter Description Supported input types
<expression> Expression on which to approximate the distinct count Any type

Return Type

BIGINT

Example

When aggregating on few distinct values, APPROX_COUNT_DISTINCT has no estimation error and returns exact results:

SELECT
    APPROX_COUNT_DISTINCT(number) as approximate,
    COUNT(DISTINCT number) as exact
FROM
    generate_series(1, 1000) r(number);

Returns:

approximate exact
1,000 1,000

NULL values do not change the result of APPROX_COUNT_DISTINCT:

SELECT
    APPROX_COUNT_DISTINCT(number) as approximate,
    COUNT(DISTINCT number) as exact
FROM
    (SELECT * FROM generate_series(1, 1000)
       UNION ALL
     SELECT NULL) r(number);

Returns:

approximate exact
1,000 1,000

As the number of distinct values grows, the result becomes an approximation:

SELECT
    APPROX_COUNT_DISTINCT(number) as approximate,
    COUNT(DISTINCT number) as exact
FROM
    generate_series(1, 50000) r(number);

Returns:

approximate exact
50,160 50,000

APPROX_COUNT_DISTINCT also works for compound types:

SELECT 
    APPROX_COUNT_DISTINCT(arr) as approximate,
    COUNT(DISTINCT arr) as exact
FROM 
    unnest([[1, 2], [3, 4], NULL, [NULL], [1, NULL]]) r(arr)

Returns:

approximate exact
4 4