COPY TO
Copies (exports or unloads) the results of a SELECT
query to an Amazon S3 location in the file format that you specify.
Syntax
COPY (<select_query>)
TO '<location>'
CREDENTIALS = <aws_credentials>
[ TYPE = CSV | TSV | JSON | PARQUET ]
[ COMPRESSION = GZIP | SNAPPY | NONE ]
[ INCLUDE_QUERY_ID_IN_FILE_NAME = TRUE | FALSE ]
[ FILE_NAME_PREFIX = <file_string> ]
[ SINGLE_FILE = FALSE | TRUE ]
[ MAX_FILE_SIZE = <bytes> ]
[ OVERWRITE_EXISTING_FILES = FALSE | TRUE ]
Parameters
Parameter | Description |
---|---|
<select_query> | Any valid SELECT statement. |
<location> | The path to an S3 location where the query result file or files are saved. For example, s3://my_bucket/my_folder . |
CREDENTIALS | The Amazon S3 credentials for accessing the specified <s3_location> . See CREDENTIALS below. |
TYPE | Specifies the file type to save to S3. If omitted, CSV is the default. |
COMPRESSION | Specifies the compression algorithm to use for the output files. If omitted, defaults to GZIP. If NONE is specified, exported files are not compressed. For TYPE=PARQUET , COMPRESSION can be set to NONE , the default GZIP , or SNAPPY . Each output file is written in Parquet format, using the specified algorithm to compress each column according to the Parquet specification. This kind of compression is internal to the Parquet file. For TYPE=CSV , TSV , or JSON , COMPRESSION can be set to NONE or the default GZIP . With GZIP , the file is first created in the specified format and then compressed into a GZIP file. |
INCLUDE_QUERY_ID_IN_FILE_NAME | Specifies whether a query ID is included in the file name. Each time the statement runs, Firebolt generates a new query ID. If omitted, defaults to TRUE , and Firebolt saves file names using the pattern <query_id>.[type].gz , for example, 123ABCXY2.csv.gz . This allows Firebolt to generate unique file names by default. If FALSE , and FILE_NAME_PREFIX is not specified, files are exported with a generic output file name, for example, output.csv.gz . |
FILE_NAME_PREFIX | Specifies an optional string to use in the file name. If FILE_NAME_PREFIX is omitted and INCLUDE_QUERY_ID_IN_FILE_NAME is TRUE , the exported file name is in the pattern <query_id><file_string>.<type>.gz . If FILE_NAME_PREFIX is specified and INCLUDE_QUERY_ID_IN_FILE_NAME is set to FALSE , the specified string replaces output and the file is in the pattern <file_string>.<type>.gz . |
SINGLE_FILE | Specifies whether the export should be a single file or multiple files. If omitted, the default is FALSE , and the export is split based on the MAX_FILE_SIZE value. Exported files are appended with _<n> incrementally to indicate the position in series, starting with 0 . For example, the first file in a series might be named 123ABCXY2_0.parquet.gz . If TRUE , only a single file is written. If set to TRUE and the file exceeds MAX_FILE_SIZE , an error occurs. |
MAX_FILE_SIZE | Specifies the max file size in bytes. If omitted, the default value is 16000000 (16 MB). The maximum file size that can be specified is 5000000000 (5 GB). |
OVERWRITE_EXISTING_FILES | Specifies whether exported files should overwrite existing files of the same name in the specified S3 location. If omitted, defaults to FALSE , and files of the same name are not overwritten. |
CREDENTIALS
Firebolt needs permissions to write query results to the specified S3 location. You can specify IAM credentials using the AWS access keys, and the specified credentials must be associated with a user with permissions to write objects to the bucket.
Specifying AWS access keys
Specify access key credentials using the syntax shown below.
CREDENTIALS = (AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>')
<aws_key_id>
is the AWS access key id associated with a user or role, for example,AKIAIOSFODNN7EXAMPLE
.<aws_secret_key>
is the AWS secret key, for example,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
.
For more information on how to create access keys, see Creating Access Key and Secret ID.
Least privileged permissions
The example AWS IAM policy statement below demonstrates the minimum actions that must be allowed for Firebolt to write query files to an example S3 location. A permissions policy that allows at least these actions for the <s3_location>
that you specify in the COPY TO
statement must be attached to the user or role specified in the CREDENTIALS
clause.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:Get*",
"s3:List*",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::my_s3_bucket",
"arn:aws:s3:::my_s3_bucket/*"
]
}
]
}
Examples
- COPY TO with defaults and role ARN
- COPY TO with single file and AWS access keys
- COPY TO with custom file name prefix and ARN with external ID
- COPY TO with custom file name and overwrite set to TRUE
COPY TO with defaults and role ARN
The example below shows a COPY TO
statement with minimal parameters that specifies an AWS_ROLE_ARN
. Because TYPE
is omitted, the file or files will be written in CSV format, and because COMPRESSION
is omitted, they are compressed using GZIP (*.csv.gz
).
COPY (SELECT * FROM test_table)
TO 's3://my_bucket/my_fb_queries'
CREDENTIALS = (AWS_ROLE_ARN='arn:aws:iam::123456789012:role/my-firebolt-role');
Firebolt assigns the query ID 16B903C4206098FD
to the query at runtime. The compressed output is 40 MB, exceeding the default of 16 MB, so Firebolt writes 4 files as shown below.
s3://my_bucket/my_fb_queries/
16B903C4206098FD_0.csv.gz
16B903C4206098FD_1.csv.gz
16B903C4206098FD_2.csv.gz
16B903C4206098FD_3.csv.gz
COPY TO with single file and AWS access keys
The example below exports a single, uncompressed JSON file with the same name as the query ID. If the file to be written exceeds the specified MAX_FILE_SIZE
(5 GB), an error occurs. AWS access keys are specified for credentials.
COPY (SELECT * FROM test_table)
TO 's3://my_bucket/my_fb_queries'
TYPE=JSON
COMPRESSION=NONE
CREDENTIALS=(AWS_KEY_ID='AKIAIOSFODNN7EXAMPLE' AWS_SECRET_KEY='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
INCLUDE_QUERY_ID_IN_FILE_NAME=TRUE
SINGLE_FILE=TRUE
MAX_FILE_SIZE=5000000000;
Firebolt writes a single file as shown below.
s3://my_bucket/my_fb_queries/
16B90E96716236D0.json
COPY TO with custom file name prefix and ARN with external ID
In the example below, because FILE_NAME_PREFIX
parameter is specified, Firebolt adds a string to the query ID to form file names. The IAM role specified for CREDENTIALS
also has an external ID configured in AWS, so the ID is specified.
COPY (SELECT * FROM test_table)
TO 's3://my_bucket/my_fb_queries'
TYPE=JSON
COMPRESSION=NONE
CREDENTIALS = (AWS_ROLE_ARN='arn:aws:iam::123456789012:role/my-firebolt-role' AWS_ROLE_EXTERNAL_ID='99291')
FILE_NAME_PREFIX='_query_result';
Firebolt assigns the query an id of 16B90E96716236D6
at runtime and writes files as shown below.
s3://my_bucket/my_fb_queries/
16B90E96716236D6_query_result_0.json
16B90E96716236D6_query_result_1.json
16B90E96716236D6_query_result_2.json
16B90E96716236D6_query_result_3.json
16B90E96716236D6_query_result_4.json
COPY TO with custom file name and overwrite set to TRUE
In the example below, INCLUDE_QUERY_ID_IN_FILE_NAME
is set to FALSE
and a FILE_NAME_PREFIX
is specified. In this case, Firebolt writes files using only the specified FILE_NAME_PREFIX
as the file name. In addition, SINGLE_FILE
and OVERWRITE_EXISTING_FILES
are set to TRUE
so that the S3 location always contains a single file with the latest query results.
COPY (SELECT * FROM test_table)
TO 's3://my_bucket/my_fb_query'
TYPE=JSON
COMPRESSION=NONE
CREDENTIALS=(AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-firebolt-role')
INCLUDE_QUERY_ID_IN_FILE_NAME=FALSE
SINGLE_FILE=TRUE
FILE_NAME_PREFIX='latest-fb-query-result'
MAX_FILE_SIZE=5000000000
OVERWRITE_EXISTING_FILES=TRUE;
Firebolt writes a single file as shown below.
s3://my_bucket/my_fb-query/
latest-fb-query-result.json