NGRAM
Reference material for the NGRAM function
This function takes an integer n and a text sequence, then splits the sequence into overlapping contiguous subsequences of length n.
Syntax
NGRAM(<n>, <text>)
Parameters
Parameter | Description | Datatype |
---|---|---|
<n> | An integer specifying the length of each n-gram. | INTEGER |
<text> | The text sequence to split into n-grams. | TEXT |
Return Types
ARRAY(TEXT)
- If any of the inputs is nullable, the result type is ARRAY(TEXT) NULL.
Behavior
The function splits the input text into overlapping contiguous subsequences of length n. When n is at most the length L of the text, the result contains L - n + 1 n-grams.
- If n is greater than the length of the input text, an array containing the input text as its only element is returned.
- If n is smaller than 1, an error is thrown.
- If any input is NULL, the result is NULL regardless of the other input value.
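The rules above can be condensed into a small reference model. The following is a minimal Python sketch of the documented semantics, not Firebolt's implementation: it assumes Python `None` stands in for SQL `NULL` and a `ValueError` stands in for the SQL error, and the function name `ngram` is hypothetical.

```python
from typing import List, Optional


def ngram(n: Optional[int], text: Optional[str]) -> Optional[List[str]]:
    """Hypothetical model of the documented NGRAM behavior (not Firebolt's code)."""
    # NULL propagation: a NULL in either argument yields NULL, regardless of the other one.
    if n is None or text is None:
        return None
    # Sizes below 1 are rejected, mirroring the documented error.
    if n < 1:
        raise ValueError(f"Invalid n-gram size: {n}. Must be greater than 0.")
    # n longer than the text: the whole text comes back as the only element.
    # (Assumption: the model also applies this to the empty string, which the docs do not cover.)
    if n > len(text):
        return [text]
    # Otherwise slide a window of width n across the text, one character at a time,
    # which yields len(text) - n + 1 overlapping n-grams.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngram(2, "hello world"))  # ['he', 'el', 'll', 'lo', 'o ', ' w', 'wo', 'or', 'rl', 'ld']
print(ngram(10, "hi"))          # ['hi']
print(ngram(None, "hi"))        # None
```

The NULL check runs before the size check, so a call such as `ngram(0, None)` returns `None` rather than raising, matching "the result is NULL regardless of the other input value".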
Errors
An error is thrown if n is smaller than 1.
Respect/Ignore Nulls
Propagates nulls: If any input is NULL, the result is NULL.
Examples
The following example generates 2-grams (bigrams) from the string ‘hello world’:
result (ARRAY(TEXT)) |
---|
{he,el,ll,lo,"o "," w",wo,or,rl,ld} |
The following example generates 3-grams (trigrams) from the string ‘hello world’:
result (ARRAY(TEXT)) |
---|
{hel,ell,llo,"lo ","o w"," wo",wor,orl,rld} |
The following example generates 1-grams (unigrams) from the string ‘hello’:
result (ARRAY(TEXT)) |
---|
{h,e,l,l,o} |
The following example generates 10-grams from the string ‘hi’. Because the n-gram size (10) is larger than the string length (2), the result contains the entire string as its only element:
result (ARRAY(TEXT)) |
---|
{hi} |
The following example uses an n-gram size of 0, which is invalid and throws an error:
ERROR: Line 1, Column 8: Invalid n-gram size: 0. Must be greater than 0. Choose an n-gram size larger than 0 or NULL.
The following example uses a negative n-gram size, which is invalid and throws an error:
ERROR: Line 1, Column 8: Invalid n-gram size: -1. Must be greater than 0. Choose an n-gram size larger than 0 or NULL.
The following example generates 2-grams (bigrams) from the Japanese string ‘こんにちは’:
result (ARRAY(TEXT)) |
---|
{こん,んに,にち,ちは} |
The following example generates 2-grams (bigrams) from the string of emojis ‘😊👍🎉’:
result (ARRAY(TEXT)) |
---|
{😊👍,👍🎉} |
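The Japanese and emoji examples split on characters rather than bytes: each of those characters occupies several bytes in UTF-8, yet every bigram in the tables holds exactly two of them. The Python sketch that follows, using a hypothetical helper `windows`, compares how many bigrams result from sliding over Unicode code points (a Python `str`) versus raw UTF-8 bytes. Only the character-based counts (4 and 2) match the tables. It illustrates the counting unit only; it assumes nothing about how Firebolt stores text and does not settle how multi-code-point sequences (for example, emoji with skin-tone modifiers) are handled.

```python
from typing import List, Sequence


def windows(seq: Sequence, n: int) -> List:
    """Hypothetical helper: all overlapping windows of width n over any sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


for s in ["こんにちは", "😊👍🎉"]:
    by_char = windows(s, 2)                  # str slicing works on Unicode code points
    by_byte = windows(s.encode("utf-8"), 2)  # bytes slicing works on single UTF-8 bytes
    print(s, len(by_char), len(by_byte))
# こんにちは 4 14
# 😊👍🎉 2 11
```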