Creates a user-defined function (UDF) that extends Firebolt's SQL capabilities with custom Python logic. UDFs allow you to encapsulate complex business logic, data transformations, or calculations that can be reused across queries. User-defined functions execute Python code within a secure, isolated environment. Each function invocation processes data row-by-row by default, or in batches when using vectorized execution with Pandas or PyArrow.
Syntax
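The statement below is a reconstruction of the general shape, assembled from the parameters described in the table that follows; square brackets mark optional clauses, and the `$$` dollar-quoting around the Python body is an assumption:

```sql
CREATE FUNCTION <function_name> ( [<parameter_name> <data_type> [, ...]] )
RETURNS <return_type>
LANGUAGE PYTHON
[PACKAGES ['<package>[==<version>]' [, ...]]]
[SET <option> = <value>]
AS $$
<python_code>
$$;
```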
Parameters
| Parameter | Description |
|---|---|
| `<function_name>` | A unique identifier for the function. Function names are case-insensitive and cannot match existing built-in function names. |
| `<parameter_name>` | The name of a function parameter. Parameters are passed to the Python function by name. |
| `<data_type>` | The SQL data type of the parameter. See Supported data types for the list of supported types. |
| `<return_type>` | The SQL data type returned by the function. Must be one of the supported data types. |
| `LANGUAGE PYTHON` | Specifies Python as the implementation language. Currently, only Python is supported. |
| `PACKAGES` | An optional array of pip packages to install, specified as `'<package>[==<version>]'`. For example: `PACKAGES ['numpy', 'pandas==2.2.3']`. |
| `SET <option>=<value>` | Optional configuration settings. See Configuration options for available options. |
| `<python_code>` | The Python code implementing the function. Must contain either a single function or a function named `process`. |
Configuration options
| Option | Description |
|---|---|
| `memory_limit_mb` | Maximum memory in megabytes allocated to the UDF worker. Default is 1024 MB (1 GB). |
| `memory_limit` | Maximum memory in bytes allocated to the UDF worker. Cannot be used together with `memory_limit_mb`. |
| `description` | A text description of the function, visible in `information_schema.routines`. |
Supported data types
The following SQL data types are supported for UDF parameters and return values:

| SQL Type | Python Type | Notes |
|---|---|---|
| `INT` / `INTEGER` | `int` | 32-bit signed integer |
| `BIGINT` | `int` | 64-bit signed integer |
| `REAL` | `float` | 32-bit floating point |
| `DOUBLE` / `DOUBLE PRECISION` | `float` | 64-bit floating point |
| `DECIMAL(p, s)` | `decimal.Decimal` | Fixed-point decimal |
| `BOOLEAN` | `bool` | True or False |
| `TEXT` | `str` | Variable-length string |
| `BYTEA` | `bytes` | Binary data |
| `DATE` | `datetime.date` | Calendar date |
| `TIMESTAMP` | `datetime.datetime` | Date and time without timezone |
| `TIMESTAMPTZ` | `datetime.datetime` | Date and time with timezone |
| `ARRAY(<type>)` | `list` | Array of any supported type, including nested arrays |
The `GEOGRAPHY` and `STRUCT` types are not supported.
Python code requirements
The Python code must contain either a single function definition or a function named `process`. This function receives the SQL parameters as Python arguments and returns the result.
- Function naming: If your code contains multiple functions, the entry point must be named `process`. If there is only one function, it can have any name.
- Parameter names: Python parameter names do not need to match SQL parameter names, but they are passed positionally in the order defined.
- Return value: Return the computed value directly. Return `None` to represent SQL `NULL`.
- NULL handling: NULL SQL values are passed as Python `None`. Your code should handle `None` values appropriately.
Examples
Basic arithmetic function
The following example creates a function that adds two integers:
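A sketch of such a function; the name `add_two_integers` and the `$$` quoting are illustrative:

```sql
CREATE FUNCTION add_two_integers(a INT, b INT)
RETURNS INT
LANGUAGE PYTHON
AS $$
def add_two_integers(a, b):
    return a + b
$$;

SELECT add_two_integers(3, 5);  -- returns 8
```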
Function with no parameters
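A function can also take no parameters at all; a minimal sketch (the name `answer` is illustrative):

```sql
CREATE FUNCTION answer()
RETURNS INT
LANGUAGE PYTHON
AS $$
def answer():
    return 42
$$;

SELECT answer();  -- returns 42
```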
Using external packages
The following example uses NumPy to compute the percentile of an array:
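A sketch of such a function, assuming a `percentile` name and an `ARRAY(DOUBLE)` input; `np.percentile` interpolates linearly by default:

```sql
CREATE FUNCTION percentile(values ARRAY(DOUBLE), p DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
PACKAGES ['numpy']
AS $$
import numpy as np

def percentile(values, p):
    # np.percentile returns a NumPy scalar; convert to a plain float.
    return float(np.percentile(values, p))
$$;

SELECT percentile([12.0, 13.0], 90.0);  -- returns 12.9
```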
Working with dates
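`DATE` parameters arrive as `datetime.date` objects, so standard `datetime` arithmetic applies. A sketch (the name `add_days` is illustrative):

```sql
CREATE FUNCTION add_days(d DATE, n INT)
RETURNS DATE
LANGUAGE PYTHON
AS $$
import datetime

def add_days(d, n):
    # d is a datetime.date; timedelta handles month/year rollover.
    return d + datetime.timedelta(days=n)
$$;

SELECT add_days(DATE '2026-01-31', 2);  -- returns 2026-02-02
```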
Setting memory limit and description
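Configuration options are attached with `SET` clauses. A sketch combining both options described above (the function itself is illustrative):

```sql
CREATE FUNCTION normalize_text(s TEXT)
RETURNS TEXT
LANGUAGE PYTHON
SET memory_limit_mb = 2048
SET description = 'Lowercases and trims a string'
AS $$
def normalize_text(s):
    if s is None:
        return None  -- represents SQL NULL
    return s.strip().lower()
$$;
```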
Vectorized UDFs
For improved performance when processing large datasets, you can write vectorized UDFs that process data in batches using PyArrow or Pandas. Vectorized UDFs receive entire columns of data at once instead of individual values. Unlike scalar UDFs, the Python function in a vectorized UDF must always have a single argument annotated with type `pandas.DataFrame` or `pyarrow.RecordBatch`.
This type annotation is what distinguishes vectorized UDFs from scalar UDFs.
Pandas vectorized UDF
A Pandas vectorized UDF receives a `pandas.DataFrame` and must return a `pandas.Series`:
Column names and positions in the DataFrame match the parameter names and order from the SQL function declaration.
You can access columns by name (e.g., df['param_name']) or by position (e.g., df.iloc[:, 0]), where position 0 corresponds to the first parameter, position 1 to the second, and so on.
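The pattern above might look like the following sketch (the function name and parameters are illustrative; the `pd.DataFrame` annotation is what marks it as vectorized):

```sql
CREATE FUNCTION weighted_value(x DOUBLE, w DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
import pandas as pd

def process(df: pd.DataFrame) -> pd.Series:
    # Columns match the SQL parameter names and order:
    # df['x'] (or df.iloc[:, 0]) and df['w'] (or df.iloc[:, 1]).
    return df['x'] * df['w']
$$;
```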
PyArrow vectorized UDF
A PyArrow vectorized UDF receives a `pyarrow.RecordBatch` and must return a `pyarrow.RecordBatch`:
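A sketch of the PyArrow form, using `pyarrow.compute` for the column arithmetic (the function name and output column name are illustrative):

```sql
CREATE FUNCTION double_it(x BIGINT)
RETURNS BIGINT
LANGUAGE PYTHON
AS $$
import pyarrow as pa
import pyarrow.compute as pc

def process(batch: pa.RecordBatch) -> pa.RecordBatch:
    # Column 0 corresponds to the first SQL parameter.
    doubled = pc.multiply(batch.column(0), 2)
    return pa.RecordBatch.from_arrays([doubled], names=['x'])
$$;
```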
Vectorized UDFs automatically handle NULL values according to the library’s conventions.
Viewing user-defined functions
Query `information_schema.routines` to view all user-defined functions:
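A minimal query; the column names shown follow the standard `information_schema.routines` layout and may differ slightly in Firebolt:

```sql
SELECT routine_name, data_type, routine_definition
FROM information_schema.routines
WHERE routine_type = 'FUNCTION';
```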
Limitations
- Language: Only Python is supported. Other languages such as JavaScript or WebAssembly are not available.
- Unsupported types: The `GEOGRAPHY` and `STRUCT` types cannot be used as parameters or return types.
- Schema restriction: User-defined functions can only be created in the `public` schema. Custom schemas are not supported.
- Name conflicts: Function names cannot match existing built-in function names.
- Overloading: Function overloading (multiple functions with the same name but different signatures) is not supported. Each function name must be unique.
- Package availability: Only packages available on PyPI can be installed. Private or custom packages are not supported.
- Execution context: UDFs run in isolated containers and cannot access external network resources, the file system, or other system resources.