NGRAM

This function takes an integer n and a text value and splits the text into overlapping, contiguous substrings (n-grams) of n characters each.

Syntax

NGRAM( <n>, <text> )

Parameters

Parameter Description Datatype
<n> An integer specifying the length of each n-gram, in characters. Must be at least 1. INTEGER
<text> The text sequence to split into n-grams. TEXT

Return Types

ARRAY(TEXT)

  • If any of the inputs is nullable, the result type is ARRAY(TEXT) NULL.

Behavior

The function slides a window of n characters across the input text one character at a time, so consecutive n-grams overlap by n − 1 characters.
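For a non-empty input of L characters, the number of elements in the result follows from the sliding window. This formula is derived from the rules and examples on this page rather than stated separately by it:

$$
\left|\mathrm{NGRAM}(n, t)\right| =
\begin{cases}
L - n + 1, & 1 \le n \le L \\
1, & n > L
\end{cases}
$$

For example, 'hello world' has L = 11, so n = 2 yields 11 − 2 + 1 = 10 bigrams and n = 3 yields 11 − 3 + 1 = 9 trigrams, matching the examples below.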

  • If n is greater than or equal to the length of the input text, the result is an array containing the entire input text as its single element.
  • If n is smaller than 1, an error is thrown.
  • If any input is NULL, the result is NULL regardless of the other input value.
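To make these rules concrete, here is a minimal Python sketch of the documented behavior. The function name `ngram`, the use of `None` for NULL, and `ValueError` for the error are assumptions for illustration; this models the rules on this page and is not Firebolt's implementation.

```python
from typing import List, Optional


def ngram(n: Optional[int], text: Optional[str]) -> Optional[List[str]]:
    """Illustrative model of NGRAM(<n>, <text>); not Firebolt's actual code."""
    # NULL propagation: any NULL input yields NULL, regardless of the other input.
    if n is None or text is None:
        return None
    # Documented error condition: n must be at least 1.
    if n < 1:
        raise ValueError(f"Invalid n-gram size: {n}. Must be greater than 0.")
    # n at least the text length: the whole text is the only element.
    # (For empty text this sketch returns [''], a case the page does not specify.)
    if n >= len(text):
        return [text]
    # Sliding window: one n-gram starting at each position 0 .. len(text) - n.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngram(2, "hello world"))
print(ngram(10, "hi"))
```

Checking NULL before validating n follows the statement that a NULL input produces NULL regardless of the other input value.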

Errors

An error is thrown if n is smaller than 1.

Respect/Ignore Nulls

Propagates nulls: If any input is NULL, the result is NULL.

Examples

The following example generates 2-grams (bigrams) from the string ‘hello world’:

SELECT NGRAM(2, 'hello world') AS result;
result (ARRAY(TEXT))
{he,el,ll,lo,"o "," w",wo,or,rl,ld}

The following example generates 3-grams (trigrams) from the string ‘hello world’:

SELECT NGRAM(3, 'hello world') AS result;
result (ARRAY(TEXT))
{hel,ell,llo,"lo ","o w"," wo",wor,orl,rld}

The following example generates 1-grams (unigrams) from the string ‘hello’:

SELECT NGRAM(1, 'hello') AS result;
result (ARRAY(TEXT))
{h,e,l,l,o}

The following example generates 10-grams from the string ‘hi’. Because the n-gram size (10) exceeds the length of the string (2 characters), the result contains the entire string as its single element:

SELECT NGRAM(10, 'hi') AS result;
result (ARRAY(TEXT))
{hi}

The following example uses an n-gram size of 0, which is invalid and throws an error:

SELECT NGRAM(0, 'hi') AS result;

ERROR: Line 1, Column 8: Invalid n-gram size: 0. Must be greater than 0. Choose an n-gram size larger than 0 or NULL.

The following example uses a negative n-gram size, which is invalid and throws an error:

SELECT NGRAM(-1, 'hi') AS result;

ERROR: Line 1, Column 8: Invalid n-gram size: -1. Must be greater than 0. Choose an n-gram size larger than 0 or NULL.

The following example generates 2-grams (bigrams) from the Japanese string ‘こんにちは’:

SELECT NGRAM(2, 'こんにちは') AS result;
result (ARRAY(TEXT))
{こん,んに,にち,ちは}

The following example generates 2-grams (bigrams) from the string of emojis ‘😊👍🎉’:

SELECT NGRAM(2, '😊👍🎉') AS result;
result (ARRAY(TEXT))
{😊👍,👍🎉}
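The Japanese and emoji examples indicate that n counts characters rather than bytes: each of these emoji occupies 4 bytes in UTF-8, so a byte-based split of size 2 could not produce whole emoji. The Python sketch below illustrates this for the emoji string. It assumes a character corresponds to one Unicode code point, which matches these examples but is not stated on this page.

```python
# Compare character (code point) length with UTF-8 byte length for the emoji
# example, then apply a code-point sliding window of size 2.
text = "😊👍🎉"

char_len = len(text)                  # Python str length counts code points: 3
byte_len = len(text.encode("utf-8"))  # each of these emoji is 4 bytes in UTF-8: 12

n = 2
bigrams = [text[i:i + n] for i in range(char_len - n + 1)]

print(char_len, byte_len)  # 3 12
print(bigrams)             # ['😊👍', '👍🎉']
```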