LIST_OBJECTS
A table-valued function (TVF) that lists objects and their metadata from Amazon S3. The function can use either a location object (recommended) or direct credentials to access the data. Each row in the output contains four columns: object_name
(TEXT
), object_type
(TEXT
), object_bytes
(BIGINT
), and file_timestamp
(TIMESTAMPTZ
).
Syntax
-- Using location object (recommended)
LIST_OBJECTS (
LOCATION => location_name
)
|
-- Using static credentials
LIST_OBJECTS (
URL => <url>
[, AWS_ACCESS_KEY_ID => <aws_access_key_id>]
[, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key>]
[, AWS_SESSION_TOKEN => <aws_session_token>]
[, AWS_ROLE_ARN => <aws_role_arn>]
[, AWS_ROLE_EXTERNAL_ID => <aws_role_externakl_id>]
)
Parameters
Parameter | Description | Supported input types |
---|---|---|
LOCATION | The name of a location object that contains the S3 URL and credentials. This is the recommended approach. See CREATE LOCATION for details. | IDENTIFIER |
URL | The location of the Amazon S3 bucket containing your files. The expected format is s3://{bucket_name}/{full_file_path} . | TEXT |
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
Return Type
The output is a table with four columns:
object_name
(TEXT
)object_type
(TEXT
)object_bytes
(BIGINT
)last_modified
(TIMESTAMPTZ
)
Column Descriptions
object_name
: Contains both the full path and the file extension.-
object_type
: Can be either “file” or “folder”.-
If
object_type
= “folder”, theobject_bytes
andlast_modified
columns will containNULL
values, as folders do not have associated sizes or timestamps. -
If
object_type
= “file”, the following apply:- The
last_modified
column is populated from theLastModified
attribute in Amazon S3. Note that AWS does not expose the creation timestamp, so the values in this column only differ from the creation time if an immutable object has been overwritten. - The
object_bytes
column contains the size of the file in bytes.
- The
-
Amazon S3 is not a traditional filesystem. In AWS, what is commonly referred to as “folders” is called “common_prefixes,” and what are typically considered “files” are referred to as “objects.”
Examples
Using LOCATION object to store credentials
Best practice
Firebolt recommends using a LOCATION
object to store credentials for authentication.
The following code example retrieves all objects from the specified LOCATION
using the LIST_OBJECTS
function:
SELECT * FROM LIST_OBJECTS(
LOCATION => my_location
);
Returns:
object_name | object_type | object_bytes | last_modified |
---|---|---|---|
data/file1.csv | file | 1421277 | 2023-02-27 10:49:13+01 |
data/folder1/ | folder | NULL | NULL |
Using static credentials
The following code examples show how to list all objects (folders) that start with a specified prefix within an Amazon S3 bucket. You can specify either a part of the prefix or the full prefix. For example, the URLs ending with the prefixes fire
or firebolt_sample_dataset
both return identical results because both are valid matches for the firebolt_sample_dataset
folder as follows:
SELECT * FROM LIST_OBJECTS(url => 's3://firebolt-publishing-public/help_center_assets/fire')
and
SELECT * FROM LIST_OBJECTS(url => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset')
Both return:
object_name (TEXT) | object_type (TEXT) | object_bytes (BIGINT) | last_modified (TIMESTAMPTZ) |
---|---|---|---|
help_center_assets/firebolt_sample_dataset/ | folder | NULL | NULL |
The following code example shows how to list all objects such as files, folders, and associated metadata, that start with a specified prefix in an Amazon S3 bucket:
SELECT * FROM LIST_OBJECTS(url => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/')
Returns:
object_name (TEXT) | object_type (TEXT) | object_bytes (BIGINT) | last_modified (TIMESTAMPTZ) |
---|---|---|---|
help_center_assets/firebolt_sample_dataset/players.json | file | 1,421,277 | 2023-02-27 10:49:13+01 |
help_center_assets/firebolt_sample_dataset/levels.csv | file | 83,596 | 2023-02-27 11:06:52+01 |
help_center_assets/firebolt_sample_dataset/tournaments.csv | file | 83,351 | 2022-12-15 15:34:14+01 |
help_center_assets/firebolt_sample_dataset/games.json | file | 872 | 2023-02-27 13:18:54+01 |
help_center_assets/firebolt_sample_dataset/rankings/ | folder | NULL | NULL |
help_center_assets/firebolt_sample_dataset/playstats/ | folder | NULL | NULL |
The following code examples show how to list all objects such as files and associated metadata, that start with a specified prefix in an Amazon S3 bucket. The URLs ending with the prefixes lev
or levels.csv
return identical results because both are valid matches for the levels.csv
file as follows:
SELECT * FROM LIST_OBJECTS(url => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/lev')
and
SELECT * FROM LIST_OBJECTS(url => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_dataset/levels.csv')
Both Return:
object_name (TEXT) | object_type (TEXT) | object_bytes (BIGINT) | last_modified (TIMESTAMPTZ) |
---|---|---|---|
help_center_assets/firebolt_sample_dataset/levels.csv | file | 83,596 | 2023-02-27 11:06:52+01 |
The following code example shows how to use your Amazon credentials to list objects in an Amazon S3 bucket that is not publicly accessible:
SELECT * FROM LIST_OBJECTS(url => 's3://example_bucket/foo.csv', AWS_ACCESS_KEY_ID => 'my_key_id', AWS_SECRET_ACCESS_KEY => 'my_secret_key')
Returns:
object_name (TEXT) | object_type (TEXT) | object_bytes (BIGINT) | last_modified (TIMESTAMPTZ) |
---|---|---|---|
foo.csv | file | 10 | 2024-06-12 9:30:00 UTC |