HLL_COUNT_BUILD

Counts the approximate number of unique not NULL values, aggregating the values to HLL++ sketches represented as the BYTEA data type. Multiple sketches can be merged to a single sketch using the aggregate function HLL_COUNT_MERGE. To estimate the final distinct count value, the scalar function HLL_COUNT_ESTIMATE can be used. HLL_COUNT_BUILD uses the HLL++ algorithm and allows you to control the set sketch size precision, similar to HLL_COUNT_DISTINCT.

HLL_COUNT_BUILD requires less memory than exact count distinct aggregation, but also introduces statistical uncertainty. The default precision is 12, with a maximum of 20 set optionally.

Higher precision comes at a memory and performance cost.

Syntax

HLL_COUNT_BUILD(<expression> [, <precision> ])

Parameters

Parameter Description Supported input types
<expression> Any column name or function that return a column name. Any type
<precision> Optional literal integer value to set precision. If not included, the default precision is 12. Precision range: 12-20. INTEGER, BIGINT

Return Type

BYTEA

Example

CREATE TABLE data_to_count AS
SELECT *
FROM generate_series(0, 10000000, 3) a;

SELECT count(distinct a) AS accurate_count
FROM data_to_count;
accurate_count (BIGINT)
3333334
CREATE TABLE data_to_count2 AS
SELECT *
FROM generate_series(0, 10000000, 2) a;

SELECT count(distinct a) AS accurate_count
FROM data_to_count2;
accurate_count (BIGINT)
5000001
CREATE TABLE sketch_of_data_to_count AS
SELECT hll_count_build(a) a
FROM data_to_count;

INSERT INTO sketch_of_data_to_count
SELECT hll_count_build(a)
FROM data_to_count2;

SELECT hll_count_estimate(a) AS hll_estimate, a AS sketch
FROM sketch_of_data_to_count
ORDER BY 1;
hll_estimate (BIGINT) sketch (BYTEA)
3291008 \x2f41676772656761746546….
4948957 \x2f41676772656761746546….
SELECT hll_count_estimate(hll_count_merge(a)) AS hll_estimate
FROM sketch_of_data_to_count;
hll_estimate (BIGINT)
6606880