COPY FROM
Copies data from Amazon S3 into a Firebolt table.
Syntax
COPY
[INTO] <table_name> [ <column_mapping> ]
FROM <externalLocation>
[ WHERE <condition> ]
[ LIMIT <count> ]
[ OFFSET <start> ]
[ WITH <options> ]
<column_mapping>:
( <column_name> [DEFAULT <default_value>] [ { $<source_column_index> | <source_column_name> } ] [, ...] )
<options>:
[ CREDENTIALS = ( <credentials> ) ]
[ PATTERN = <regex_pattern> ]
[ TYPE = { **AUTO** | CSV | TSV | PARQUET } ]
[ AUTO_CREATE = { **TRUE** | FALSE } ]
[ ERROR_FILE = <externalLocation> ]
[ ERROR_FILE_CREDENTIALS = <credentials> ]
[ MAX_ERRORS_PER_FILE = { integer | '<percentage>' } ]
[ <csv_options> ]
<credentials>:
{ AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>' }
<csv_options>:
[ HEADER = { **FALSE** | TRUE } ]
[ DELIMITER = 'character' ]
[ QUOTE = { **DOUBLE_QUOTE** | SINGLE_QUOTE } ]
[ ESCAPE = 'character' ]
[ NULL_STRING = 'string' ]
[ EMPTY_FIELD_AS_NULL = { **TRUE** | FALSE } ]
[ SKIP_BLANK_LINES = { **FALSE** | TRUE } ]
[ DATE_FORMAT = <date_format> ]
[ TIMESTAMP_FORMAT = <timestamp_format> ]
Parameters
Parameter | Description |
---|---|
<table_name> | The name of the target table. |
<column_mapping> | Optional. Only available if the target table exists. Specifies the mapping between the source schema and the target schema. For each column in the target table, you can choose a source column by name or by index. |
<column_name> | The name of a target column in the table. If <source_column_index/name> is not specified, source columns are mapped to target columns by this name. |
<default_value> | A value that replaces each NULL value produced by the source. It must be convertible to the target column's data type. |
<source_column_index> | The index of the source column to map (the same index applies to all files). The valid range is [1, 2^64-1]. It must follow the $ character. |
<source_column_name> | The name of the source column to map (the same name applies to all files). |
<externalLocation> | The path to an Amazon S3 location. If it ends with / , it is interpreted as a directory; otherwise, as a single file. Directory example: s3://my_bucket/my_folder/ . File example: s3://my_bucket/my_folder/my_file . |
CREDENTIALS | The Amazon S3 credentials for accessing the specified <externalLocation> . See CREDENTIALS. |
PATTERN | A glob pattern string used to select which files to load. For more information, see https://en.wikipedia.org/wiki/Glob_(programming). |
TYPE | Specifies the file type Firebolt expects to ingest given the PATTERN . If not specified, the type is auto-detected from the file suffix. If a file referenced using PATTERN does not conform to the specified TYPE , an error occurs. |
AUTO_CREATE | Specifies whether the target table can be created automatically using automatic schema discovery. Automatic table creation is enabled by default (value: TRUE). If the value is TRUE and the target table already exists, AUTO_CREATE is ignored. |
ERROR_FILE | Defines the S3 location where rejected records are stored. See the <externalLocation> parameter definition above for format details. By default, or if the specified path doesn't exist, no error files are created. If specified, a child directory is created, named from the time of load submission in the format YearMonthDay-HourMinuteSecond plus the query ID (for example, 20220330-173205/). |
ERROR_FILE_CREDENTIALS | The Amazon S3 credentials for accessing the specified <externalLocation> for error file creation. See CREDENTIALS. |
MAX_ERRORS_PER_FILE | Specifies the maximum number of rejected rows allowed per file, as either an integer or literal text in the format 'integer%', e.g. '100%'. By default, no errors are allowed (0). If '100%' is given, all errors are allowed. If the threshold is exceeded, the load fails with an error. |
HEADER | Specifies that the file contains a header line with the names of each column in the file. If set to TRUE, the first line is interpreted as the file schema's column names. |
DELIMITER | Specifies the field delimiter character. The default is ',' . |
QUOTE | Specifies the quote character. The default is DOUBLE_QUOTE (" ). If SINGLE_QUOTE is specified, the quote character is ' . |
ESCAPE | Specifies the escape character. The default is the QUOTE character. |
NULL_STRING | Specifies the null string. The default is empty, which means no null string is used. |
EMPTY_FIELD_AS_NULL | Specifies whether empty fields are treated as NULL. |
SKIP_BLANK_LINES | Specifies whether to skip blank lines. |
DATE_FORMAT | Specifies the date format for parsing text into date columns. Applies to all columns ingested into date columns, and only to those. For supported formats, see TO_DATE. |
TIMESTAMP_FORMAT | Specifies the timestamp format for parsing text into timestamp columns. Applies to all columns ingested into timestamp columns, and only to those. For supported formats, see TO_TIMESTAMP. |
Notes:
Non-existing columns: by default, if a column does not exist in the source file, it produces NULLs. For the CSV format, this applies to missing fields as well.
CSV: the accepted new line sequences are \n , \r\n , or \r .
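The non-existing-columns note pairs naturally with DEFAULT in a column mapping: NULLs produced by a missing source column are replaced by the default value. A minimal sketch (the table, column names, and bucket are hypothetical):

```sql
-- 'region' may be absent from some source files; the NULLs it
-- produces are replaced by the DEFAULT value 'unknown'.
COPY INTO sales (id, region DEFAULT 'unknown') FROM 's3://my_bucket/sales/';
```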
Examples
COPY all files under a directory, including all columns in the files
COPY INTO public.games FROM 's3://Bucket/directory/';
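COPY only files matching a glob, storing rejected rows in an error location (a sketch; the bucket, paths, and error threshold are hypothetical)

```sql
COPY INTO public.games FROM 's3://Bucket/directory/' WITH
    TYPE = CSV PATTERN = '*.csv'
    ERROR_FILE = 's3://Bucket/errors/' MAX_ERRORS_PER_FILE = '10%';
```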
COPY with explicit list of columns
COPY INTO test_table (c1 DEFAULT 34, c3 $6) FROM 's3://Bucket/directory/' WITH
TYPE = CSV CREDENTIALS = ( AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>' ) QUOTE = DOUBLE_QUOTE;
COPY without auto-schema discovery
COPY INTO test_table FROM 's3://Bucket/directory/' WITH
TYPE = PARQUET CREDENTIALS = ( AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>' ) AUTO_CREATE = FALSE;
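COPY with combined CSV options (a sketch; the table, bucket, null string, and date format string are assumptions — see TO_DATE for the supported format elements)

```sql
COPY INTO events FROM 's3://Bucket/events/' WITH
    TYPE = CSV HEADER = TRUE DELIMITER = '|'
    NULL_STRING = 'NULL'
    DATE_FORMAT = 'YYYY-MM-DD';
```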