How to load a CSV file from S3 to Redshift?

Road2Data
3 min read · Apr 13, 2021

S3 (Simple Storage Service) is an object storage service offered by AWS, used to store objects through a web service interface.

It is widely used for building data lakes and for storing raw data.

Redshift is a cloud database service offered by AWS for building data warehouse (DWH) solutions.

When we build a BI solution around S3 and Redshift, data moves back and forth between these two AWS analytical services quite often, as one is the file store and the other is the database.

Here is a step-by-step guide for loading a file from an S3 bucket into a Redshift table.

Step 1: Placing a file in an S3 bucket.

Open the AWS console, search for the S3 service in the search box, and select it.


Create a new bucket by clicking the Create bucket button and following the instructions.

Once the S3 bucket is created, we need to upload into it the CSV file that we want to load into Redshift.

For this example, I am taking a simple CSV with two fields, id and value, as below, and will upload it to the S3 bucket.
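A minimal sample file could look like this (the rows are made up purely for illustration):

id,value
1,100
2,200
3,300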

Click on Upload, then Add files, and select the CSV from your local machine.

Step 2: Creating credentials for accessing the S3 bucket objects.

Search for the IAM service in the AWS console, then click on Users and Add user, as in the screenshot below.

User creation page on IAM service

Attach a permissions policy to the user so that it can read from the S3 bucket, for example the AWS managed policy AmazonS3ReadOnlyAccess.

Permission page for new user creation.
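If you prefer a policy scoped to just this one bucket instead of a broad managed policy, a minimal sketch could look like the following (the bucket name matches the COPY example later in this post; adjust it to yours):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::ayush-test-2021",
        "arn:aws:s3:::ayush-test-2021/*"
      ]
    }
  ]
}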

Once the user creation in IAM is finished, it will generate an access key ID and a secret access key for accessing AWS services; please save them on your local machine or in a secrets manager. In the next step, we will use the Redshift COPY command to load the CSV into a Redshift table.

Step 3: Loading data into Redshift.

Copy the S3 URI of the CSV file from the S3 console (for this example, s3://ayush-test-2021/test.csv).

To load data into Redshift from S3, Redshift provides the COPY command, which we can use as in the example below.

-- Table creation DDL
CREATE TABLE sandbox.test
(
    id    INTEGER,
    value INTEGER
);

-- COPY command
COPY sandbox.test                        -- target table
FROM 's3://ayush-test-2021/test.csv'     -- S3 URI of the file
ACCESS_KEY_ID '1234'                     -- access key ID of the IAM user (placeholder)
SECRET_ACCESS_KEY 'asdfg'                -- secret access key of the IAM user (placeholder)
IGNOREHEADER 1                           -- skip the header row of the CSV
DELIMITER ',';                           -- column delimiter
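Embedding access keys directly in SQL is fine for a quick test, but COPY also accepts an IAM role associated with the cluster, which avoids handling keys altogether. A minimal sketch, assuming a role my-redshift-s3-role already exists and is attached to the cluster (the ARN below is a placeholder):

-- Alternative COPY using an IAM role attached to the cluster
COPY sandbox.test
FROM 's3://ayush-test-2021/test.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-s3-role' -- placeholder ARN
IGNOREHEADER 1
DELIMITER ',';

-- Quick sanity check after the load
SELECT COUNT(*) FROM sandbox.test;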

All errors related to the COPY command are stored in the stl_load_errors system table.
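For example, a query along these lines pulls up the most recent load failures:

-- Inspect recent COPY errors
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;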

Do let me know in the comments if you have any issues loading a CSV into Redshift.


Road2Data

A data enthusiast with 8 years of experience in the Data Warehousing and Business Intelligence field, currently working as a Senior BI Developer at Home24, Berlin.