Amazon S3 (Simple Storage Service) is the leading object storage platform for cloud-native apps, data lakes, backups, and archives. Its protocol underpins most other object storage providers too.
S3 is a powerful service, but one challenge you'll face is moving data in and out of your storage buckets. When S3 is used for large amounts of data, such as backups, you need a way to efficiently transfer your content and then restore it after an incident.
In this article, we’ll show how to use the AWS CLI s3 sync command to easily synchronize an S3 bucket to and from a local directory.
The aws s3 sync command is part of the AWS CLI. It "synchronizes" directories to and from S3 by recursively copying files and subdirectories.
To use the command, you must specify a source and destination. New or changed files will be copied from the source (whether a local directory or an S3 path) to the destination, ensuring complete replication while offering good performance even for large directories.
Although the command performs a recursive copy, it ignores empty directories. Empty folders in the source path will not be copied to the destination; it’s not possible to disable this feature.
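For example (using placeholder names, not the demo files created later in this article), a directory that contains only an empty subdirectory gives sync nothing to copy:

# Illustrative only: empty-dir holds no files, so nothing is uploaded for it
$ mkdir -p example-src/empty-dir
$ aws s3 sync example-src s3://<your-bucket>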
aws s3 cp vs aws s3 sync
The aws s3 cp command is an alternative way to move content between your local machine and an S3 bucket.
cp is mainly used to copy individual files. However, it can also copy entire folders when the --recursive flag is used. This has a similar result to sync, but the copy behavior is different:
- aws s3 cp --recursive – Copies all files and folders from the source to the destination, even if they already exist in the destination. Existing files will be overwritten. Deletion of destination files that no longer exist in the source is not possible.
- aws s3 sync – Before copying, the destination's content is inspected to determine which files already exist. Only the new or changed files from the source will be copied to the destination. Deletion of destination files that no longer exist in the source can be optionally enabled.
The sync command should be used in preference to cp when you're copying large directories that already exist in your S3 bucket. Only the changed files will be copied, improving performance and reducing your transfer costs.
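As a side-by-side sketch (using the same <source> and <destination> placeholders as the option examples later in this article):

# Copies every file, overwriting anything that already exists in the destination
$ aws s3 cp <source> <destination> --recursive

# Copies only the new and changed files
$ aws s3 sync <source> <destination>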
Here’s a comparison table you can reference when deciding which command to use:
|  | cp | sync |
| --- | --- | --- |
| Copied files | All files in the source location | New and changed files in the source location |
| Deletion of removed files | Not supported | Optional (must be enabled) |
| Performance | Good for individual files and new directories | Good for large directories that already exist in the destination |
| Cost | Optimal for individual files and new directory structures (no need to check the destination to determine if the content exists) | Optimal for large directories that already exist in the destination (no wasted transfer of existing files) |
| Use case | Copy operations where the destination doesn't exist or its content should be replaced | Keeping the source and destination synchronized when files are incrementally added to the source |
The sync command can be used whenever you need to mirror a local directory to an S3 bucket, or vice versa, without wastefully replacing files that already exist in the destination.
A few common use cases include:
- Creating backups – You can easily synchronize local directories to S3 to create a remote backup. Only the files that have changed since the last backup will be copied.
- Uploading websites to S3 static hosting – S3 buckets can be used to host static websites. The sync command will upload the website files produced by your static site generator while preserving any unchanged assets.
- Downloading the contents of an S3 bucket – Sometimes, you might need to create a local copy of an S3 bucket, either to more conveniently inspect the bucket's content or so you can transfer it to another service. Using sync ensures any files that already exist on your machine won't be unnecessarily replaced.
- Synchronizing two different S3 buckets – You can use the command to synchronize two S3 buckets. This can be helpful if you need to make a separate clone or backup of a bucket.
You’ll need the AWS CLI on your system to follow along with this tutorial.
You can use Docker to quickly get started without manually installing the CLI:
$ docker run --rm -it --entrypoint bash amazon/aws-cli:latest
Run the aws configure command to interactively supply your credentials to the CLI. You'll be prompted to enter your Access Key ID and Secret Access Key.
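The interactive session looks similar to the following (the values shown here are placeholders):

$ aws configure
AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: us-east-1
Default output format [None]: json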
You can generate new credentials in the AWS Console by creating an IAM user and assigning it S3-related policies.
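If you prefer to script this step, you can attach a managed policy with the CLI. The example below grants the broad AmazonS3FullAccess policy to a hypothetical user named s3-sync-demo; a narrower custom policy is preferable in production:

# Attach S3 permissions to the (hypothetical) IAM user "s3-sync-demo"
$ aws iam attach-user-policy \
    --user-name s3-sync-demo \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess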
Next, use the CLI to create two new S3 buckets for demonstration purposes:
$ aws s3 mb s3://demo-bucket.example.com
make_bucket: demo-bucket.example.com
$ aws s3 mb s3://demo-bucket-2.example.com
make_bucket: demo-bucket-2.example.com
Bucket names must be globally unique across all AWS accounts. Change example.com to your own domain to avoid name collisions.
You can create buckets using either the mb sub-command, shown here, or s3api create-bucket.
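For reference, a roughly equivalent s3api call looks like this (note that regions other than us-east-1 require an explicit location constraint):

$ aws s3api create-bucket \
    --bucket demo-bucket.example.com \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1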
Finally, create a few local files ready to synchronize to S3:
$ mkdir demo-content
$ mkdir demo-content/files
$ touch demo-content/foo
$ touch demo-content/bar
$ touch demo-content/files/example
Also check out how to create and manage an AWS S3 bucket using Terraform.
Example 1: Sync a local directory to S3
The basic s3 sync syntax is as follows:
aws s3 sync <source> <destination>
<source> and <destination> can be either a local filesystem path or an S3 URI in s3://bucket/folder form.
To synchronize your local directory to S3, you can run the following command:
$ aws s3 sync demo-content s3://demo-bucket.example.com
upload: demo-content/bar to s3://demo-bucket.example.com/bar
upload: demo-content/foo to s3://demo-bucket.example.com/foo
upload: demo-content/files/example to s3://demo-bucket.example.com/files/example
If you list the bucket’s content, you’ll see your files are now available:
$ aws s3 ls demo-bucket.example.com
PRE files/
2023-08-25 08:44:21 0 bar
2023-08-25 08:44:22 0 foo
If you repeat the sync operation, you'll see that no files are uploaded:
$ aws s3 sync demo-content s3://demo-bucket.example.com
This is because no changes have occurred to the source directory. You can create a new file and try another sync to observe that only new files are uploaded:
$ touch demo-content/new
$ aws s3 sync demo-content s3://demo-bucket.example.com
upload: demo-content/new to s3://demo-bucket.example.com/new
Example 2: Download from S3 to a local directory
The same syntax can be used to move files in the opposite direction, from S3 to your machine.
To download from S3 to a local directory, you can run:
$ aws s3 sync s3://demo-bucket.example.com demo-bucket-download
download: s3://demo-bucket.example.com/bar to demo-bucket-download/bar
download: s3://demo-bucket.example.com/foo to demo-bucket-download/foo
download: s3://demo-bucket.example.com/files/example to demo-bucket-download/files/example
Your files will now be available in the demo-bucket-download folder within your working directory:
$ ls demo-bucket-download
bar files foo
$ ls demo-bucket-download/files
example
Example 3: Synchronize two S3 buckets
To synchronize content between buckets, use an S3 URI as both the source and destination paths:
$ aws s3 sync s3://demo-bucket.example.com s3://demo-bucket-2.example.com
Now the files exist in both buckets:
$ aws s3 ls demo-bucket.example.com
PRE files/
2023-08-25 08:44:21 0 bar
2023-08-25 08:44:22 0 foo
2023-08-25 08:44:46 0 new
$ aws s3 ls demo-bucket-2.example.com
PRE files/
2023-08-25 08:47:39 0 bar
2023-08-25 08:47:39 0 foo
2023-08-25 08:47:39 0 new
Example 4: Allow deletions at the destination
The sync command does not delete anything from the destination by default.
You can optionally enable this behavior with the --delete flag. It will remove any destination files that no longer exist in the source location.
$ rm demo-content/new
$ aws s3 sync demo-content s3://demo-bucket.example.com --delete
delete: s3://demo-bucket.example.com/new
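Because --delete is destructive, it's worth pairing it with the --dryrun flag (covered below) to preview the removals before running the command for real:

# Preview which files would be deleted without changing anything
$ aws s3 sync demo-content s3://demo-bucket.example.com --delete --dryrun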
s3 sync supports many different options that change copy behavior and customize the attributes of created files. Here's a quick look at some of the most useful capabilities.
How to include and exclude files with S3 sync
You can include and exclude file paths with the --include and --exclude flags. These support UNIX-style wildcards that determine which files will be considered as part of the sync operation. The flags can be repeated multiple times in a single sync command.
# Copy only HTML files
$ aws s3 sync <source> <destination> --exclude="*" --include="*.html"
The flags are applied in the order they appear; later flags override earlier ones.
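Reversing the order of the flags from the snippet above demonstrates this: the trailing --exclude now takes precedence, so nothing is copied:

# Nothing is copied: the later --exclude="*" overrides the earlier --include
$ aws s3 sync <source> <destination> --include="*.html" --exclude="*"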
How to use an S3 sync dry run
You can perform a dry run to see what changes would be made by a sync operation without actually applying them to the destination:
$ aws s3 sync <source> <destination> --dryrun
The regular s3 sync output will be shown in your terminal, with each operation prefixed by (dryrun), so you can check your options are correct before anything is transferred.
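Using the demo bucket from earlier, the preview looks something like this (illustrative output, shown for a file that hasn't been uploaded yet):

$ aws s3 sync demo-content s3://demo-bucket.example.com --dryrun
(dryrun) upload: demo-content/foo to s3://demo-bucket.example.com/foo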
How to disable S3 sync symlink resolution
Symlinks are automatically followed when uploading to S3 from your filesystem.
If this is undesirable, you can disable symlink resolution by setting the --no-follow-symlinks flag. This will ensure that files and folders in the linked path don't appear in S3.
$ aws s3 sync <source> <destination> --no-follow-symlinks
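As a quick illustrative check, you could add a symlink to the demo directory and confirm that its target is skipped:

# The contents of the linked path will not be uploaded
$ ln -s /var/log demo-content/linked-logs
$ aws s3 sync demo-content s3://demo-bucket.example.com --no-follow-symlinks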
How to set the ACL for synced S3 files
S3 supports several predefined ACLs that can be used to control access to uploaded files. Policies are available for common use cases, including private, public-read, public-read-write, and bucket-owner-full-control.
To set the ACL on newly synced files, pass the desired policy's name to the --acl flag:
$ aws s3 sync <source> <destination> --acl=private
How to enable server-side encryption for synced S3 files
The --sse flag enables server-side encryption on the S3 files that sync creates. Set aws:kms as the value to use your AWS-managed key from the AWS Key Management Service:
$ aws s3 sync <source> <destination> --sse=aws:kms
A specific KMS key can be selected with the --sse-kms-key-id flag:
$ aws s3 sync <source> <destination> --sse=aws:kms --sse-kms-key-id=<id>
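You can confirm the encryption applied to an uploaded object with the s3api head-object command; its response includes a ServerSideEncryption field (the object key below is from the earlier examples):

$ aws s3api head-object --bucket demo-bucket.example.com --key foo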
How to set the storage class for synced S3 files
S3 storage classes determine the performance, pricing, and access frequency restrictions for your files. The default standard class covers most regular use cases, but alternative classes such as Glacier or Deep Archive can be more optimal for long-term retention of infrequently retrieved objects.
The --storage-class flag allows you to set the storage class to apply to newly synced files:
$ aws s3 sync <source> <destination> --storage-class=GLACIER
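Other classes are set the same way; for example, Deep Archive for long-term retention:

# Accepted values also include STANDARD, STANDARD_IA, ONEZONE_IA,
# INTELLIGENT_TIERING, and GLACIER_IR
$ aws s3 sync <source> <destination> --storage-class=DEEP_ARCHIVE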
The aws s3 sync command should be your go-to tool when you need to synchronize a local directory and an S3 bucket. It can also be used to synchronize two existing S3 buckets.
Sync is a recursive operation that matches the content of the source and destination. However, deletion of redundant files from the destination is optional, so you must remember to set the --delete flag if you need this behavior.
Sync is useful in a variety of scenarios where an S3 bucket acts as a mirror of an existing directory, such as for backups, or when seeding a new bucket with initial content. Sync is also an effective way to create a local copy of an S3 bucket, ready to move elsewhere or transfer to another provider.
Does your organization have extra compliance concerns? Here you can learn more about self-hosting Spacelift in AWS, to ensure your organization’s compliance, control ingress, egress, internal traffic, and certificates, and have the flexibility to run it within GovCloud.
The Most Flexible CI/CD Automation Tool
Spacelift is an alternative to using homegrown solutions on top of a generic CI. It helps overcome common state management issues and adds several must-have capabilities for infrastructure management.