AWS S3 Sync Command – Guide with Examples

23 Oct 2023·11 min read

Reviewed by: Paweł Piwosz

Amazon S3 (Simple Storage Service) is the leading object storage platform for cloud-native apps, data lakes, backups, and archives. Its API has become a de facto standard that many other object storage providers also implement.

S3 is a powerful service, but one challenge you can face is how to move data in and out of your storage buckets. When S3 is used for large amounts of data, such as backups, you need a way to efficiently transfer your content and then restore it after an incident.

In this article, we’ll show how to use the AWS CLI s3 sync command to easily synchronize an S3 bucket to and from a local directory.

What is the aws s3 sync command?

The aws s3 sync command is part of the AWS CLI. It “synchronizes” directories to and from S3 by recursively copying files and subdirectories.

To use the command, you specify a source and a destination. Files that are new or have changed in the source (whether a local directory or an S3 path) are copied to the destination, with changes detected by comparing file size and last-modified time. Unchanged files are skipped, so repeated syncs stay fast even for large directories.

Although the command copies recursively, it ignores empty directories. S3 has no real folders, only object key prefixes, so an empty source folder has nothing to copy and won't appear in the destination. This behavior can't be disabled.
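The "new or changed" rule can be sketched locally. The hypothetical `plan_sync` function below (not part of the AWS CLI) compares two local directories and reports which files a sync would copy: files missing from the destination, files whose size differs, and files modified more recently in the source. This approximates the size and timestamp comparison the AWS CLI documents for uploads; the real command compares against S3 object metadata and also offers `--size-only` and `--exact-timestamps` to tune the check. Because it walks only regular files, empty directories never show up.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the AWS CLI: print the relative paths that
# a sync from directory $1 to directory $2 would copy, using a size and
# modification-time comparison similar to the one aws s3 sync documents.
plan_sync() {
  local src=$1 dst=$2 file rel
  # Only regular files are visited, so empty directories are never reported.
  while IFS= read -r -d '' file; do
    rel=${file#"$src"/}
    if [[ ! -e "$dst/$rel" ]]; then
      echo "new: $rel"                 # missing at the destination
    elif [[ $(wc -c < "$file") -ne $(wc -c < "$dst/$rel") ]]; then
      echo "size changed: $rel"        # sizes differ
    elif [[ "$file" -nt "$dst/$rel" ]]; then
      echo "newer: $rel"               # same size, but the source is newer
    fi
  done < <(find "$src" -type f -print0 | sort -z)
}
```

Running `plan_sync demo-content backup-copy` against a destination that doesn't exist yet reports every file as new, just as a first sync uploads everything.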

aws s3 cp vs aws s3 sync

The aws s3 cp command is an alternative way to move content between your local machine and an S3 bucket.

cp is mainly used to copy individual files, but it can also copy entire folders when the --recursive flag is used. This gives a similar result to sync, but the copy behavior is different:

aws s3 cp --recursive – Copies all files and folders from the source to the destination, even if they already exist in the destination. Existing files will be overwritten. Deletion of destination files that no longer exist in the source is not possible.
aws s3 sync – Before copying, the destination’s content is inspected to determine which files already exist. Only the new or changed files from the source will be copied to the destination. Deletion of destination files that no longer exist in the source can be optionally enabled.

Use sync in preference to cp when you're copying large directories that already exist in your S3 bucket. Only the changed files are copied, which improves performance and avoids repeating PUT requests (and, for downloads, data transfer charges) for files that are already up to date.

Here’s a comparison table you can reference when deciding which command to use:

| Aspect | `cp` | `sync` |
|---|---|---|
| Copied files | All files in the source location | New and changed files in the source location |
| Deletion of removed files | Not supported | Optional (must be enabled) |
| Performance | Good for individual files and new directories | Good for large directories that already exist in the destination |
| Cost | Optimal for individual files and new directory structures (no need to check the destination to determine if the content exists) | Optimal for large directories that already exist in the destination (no wasted transfer of existing files) |
| Use case | Copy operations where the destination doesn't exist or its content should be replaced | Keeping the source and destination synchronized when files are incrementally added to the source |

When to use aws s3 sync?

The sync command can be used whenever you need to mirror a local directory to an S3 bucket, or vice versa, without wastefully replacing files that already exist in the destination.

A few common use cases include:

Creating backups – You can easily synchronize local directories to S3 to create a remote backup. Only the files that have changed since the last backup will be copied.
Uploading websites to S3 static hosting – S3 buckets can be used to host static websites. The sync command uploads the website files produced by your static site generator and skips any assets that haven't changed.
Downloading the contents of an S3 bucket – Sometimes, you might need to create a local copy of an S3 bucket, either to more conveniently inspect the bucket’s content or so you can transfer it to another service. Using sync ensures any files that already exist on your machine won’t be unnecessarily replaced.
Synchronizing two different S3 buckets – You can use the command to synchronize two S3 buckets. This can be helpful if you need to make a separate clone or backup of a bucket.
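Each use case above maps to a short sync invocation. The sketch below wraps them in hypothetical shell functions (`backup_dir`, `deploy_site`, `download_bucket`, `clone_bucket`); bucket names, the `backup` prefix, and paths are placeholders. `AWS_CLI` is an assumed override variable, so you can set it to `echo` to preview the commands without touching AWS.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers for the four use cases. Set AWS_CLI=echo to print the
# commands instead of running them; it defaults to the real aws binary.
aws_cli() { "${AWS_CLI:-aws}" "$@"; }

# Back up a local directory under a placeholder "backup" prefix;
# only changed files are uploaded on each run.
backup_dir() { aws_cli s3 sync "$1" "s3://$2/backup"; }

# Publish a static site build; --delete removes pages that no longer exist.
deploy_site() { aws_cli s3 sync "$1" "s3://$2" --delete; }

# Download a bucket into a local directory.
download_bucket() { aws_cli s3 sync "s3://$1" "$2"; }

# Copy one bucket into another, e.g. to create a separate clone.
clone_bucket() { aws_cli s3 sync "s3://$1" "s3://$2"; }
```

For example, `AWS_CLI=echo backup_dir demo-content demo-bucket.example.com` prints `s3 sync demo-content s3://demo-bucket.example.com/backup` instead of calling AWS.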

How to use aws s3 sync - examples

You’ll need the AWS CLI on your system to follow along with this tutorial.

You can use Docker to quickly get started without manually installing the CLI:

$ docker run --rm -it --entrypoint bash amazon/aws-cli:latest

Run the aws configure command to interactively supply your credentials to the CLI. You'll be prompted for your Access Key ID and Secret Access Key, followed by a default region name and output format.

You can generate new credentials in the AWS Console by creating an IAM user, attaching policies that grant the S3 permissions you need, and then creating an access key for that user.
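As a starting point for the IAM user, a policy along these lines covers the permissions sync relies on: listing the bucket, reading and writing objects, and deleting objects when --delete is used. This is a minimal sketch scoped to the demo bucket name used in this tutorial; the file name `s3-sync-policy.json` is an assumption, and you should adjust the ARNs and drop actions you don't need.

```bash
#!/usr/bin/env bash
# Write an assumed, minimal IAM policy for syncing to and from one bucket.
# s3:ListBucket applies to the bucket ARN; object actions apply to bucket/*.
bucket=demo-bucket.example.com
cat > s3-sync-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
```

You could then attach it with `aws iam put-user-policy --user-name <user> --policy-name s3-sync --policy-document file://s3-sync-policy.json`, where `<user>` and the policy name are placeholders. Add matching statements for demo-bucket-2.example.com, and note that creating buckets with `aws s3 mb` additionally requires `s3:CreateBucket`.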

Next, use the CLI to create two new S3 buckets for demonstration purposes:

$ aws s3 mb s3://demo-bucket.example.com
make_bucket: demo-bucket.example.com

$ aws s3 mb s3://demo-bucket-2.example.com
make_bucket: demo-bucket-2.example.com

Bucket names must be globally unique across all AWS accounts. Change example.com to your own domain to avoid name collisions.

You can create buckets using either the mb sub-command, shown here, or s3api create-bucket.

Finally, create a few local files ready to synchronize to S3:

$ mkdir demo-content
$ mkdir demo-content/files
$ touch demo-content/foo
$ touch demo-content/bar
$ touch demo-content/files/example
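Because sync only transfers files, it can help to preview what it will pick up. Standard `find` commands list the regular files (which sync uploads) and any empty directories (which it skips). The `demo-content/empty` directory is a hypothetical extra, added only to show the skipped case.

```bash
#!/usr/bin/env bash
# Recreate the demo layout (mkdir -p and touch are safe to re-run) plus a
# hypothetical empty directory that sync will not copy.
mkdir -p demo-content/files demo-content/empty
touch demo-content/foo demo-content/bar demo-content/files/example

echo "Files sync will consider:"
find demo-content -type f | sort

echo "Empty directories sync will skip:"
find demo-content -type d -empty | sort
```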

Example 1: Sync a local directory to S3

The basic s3 sync syntax is as follows:

aws s3 sync <source> <destination>

<source> and <destination> can be either a local filesystem path or an S3 URI in s3://bucket/folder form.

To synchronize your local directory to S3, you can run the following command:

$ aws s3 sync demo-content s3://demo-bucket.example.com
upload: demo-content/bar to s3://demo-bucket.example.com/bar
upload: demo-content/foo to s3://demo-bucket.example.com/foo
upload: demo-content/files/example to s3://demo-bucket.example.com/files/example

If you list the bucket’s content, you’ll see your files are now available:

$ aws s3 ls demo-bucket.example.com
                           PRE files/
2023-08-25 08:44:21          0 bar
2023-08-25 08:44:22          0 foo

If you run the sync operation again, no files are uploaded:

$ aws s3 sync demo-content s3://demo-bucket.example.com

This is because no changes have occurred to the source directory. You can create a new file and try another sync to observe that only new files are uploaded:

$ touch demo-content/new
$ aws s3 sync demo-content s3://demo-bucket.example.com
upload: demo-content/new to s3://demo-bucket.example.com/new

Example 2: Download from S3 to a local directory

The same syntax can be used to move files in the opposite direction, from S3 to your machine.

To download from S3 to a local directory you can run:

$ aws s3 sync s3://demo-bucket.example.com demo-bucket-download
download: s3://demo-bucket.example.com/bar to demo-bucket-download/bar
download: s3://demo-bucket.example.com/foo to demo-bucket-download/foo
download: s3://demo-bucket.example.com/new to demo-bucket-download/new
download: s3://demo-bucket.example.com/files/example to demo-bucket-download/files/example

Your files will now be available in the demo-bucket-download folder within your working directory:

$ ls demo-bucket-download
bar files foo new

$ ls demo-bucket-download/files
example

Example 3: Synchronize two S3 buckets

To synchronize content between buckets, use an S3 URI as both the source and destination paths:

$ aws s3 sync s3://demo-bucket.example.com s3://demo-bucket-2.example.com

Now the files exist in both buckets:

$ aws s3 ls demo-bucket.example.com
                           PRE files/
2023-08-25 08:44:21          0 bar
2023-08-25 08:44:22          0 foo
2023-08-25 08:44:46          0 new

$ aws s3 ls demo-bucket-2.example.com
                           PRE files/
2023-08-25 08:47:39          0 bar
2023-08-25 08:47:39          0 foo
2023-08-25 08:47:39          0 new

Example 4: Allow deletions at the destination

The sync command does not delete anything from the destination by default.

You can optionally enable this behavior with the --delete flag. It will remove any destination files that no longer exist in the source location.

$ rm demo-content/new

$ aws s3 sync demo-content s3://demo-bucket.example.com --delete
delete: s3://demo-bucket.example.com/new
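Because --delete is destructive, one cautious pattern is to run the same command with --dryrun first and refuse to continue if it plans more deletions than you expect. The `safe_sync_delete` function, its threshold argument, and the `AWS_CLI` override variable are hypothetical; the only CLI flags used are --delete and --dryrun.

```bash
#!/usr/bin/env bash
# Hypothetical guard around "aws s3 sync --delete".
# Usage: safe_sync_delete <source> <destination> <max_deletions>
safe_sync_delete() {
  local src=$1 dst=$2 max=$3 plan deletions
  # Preview the operation first; --dryrun transfers and deletes nothing.
  plan=$("${AWS_CLI:-aws}" s3 sync "$src" "$dst" --delete --dryrun) || return 1
  # Count the planned removals (dry-run lines containing "delete:").
  deletions=$(grep -c 'delete:' <<< "$plan")
  if (( deletions > max )); then
    echo "Refusing to sync: $deletions deletions planned (limit $max)" >&2
    return 1
  fi
  # Within the limit, so run the real sync.
  "${AWS_CLI:-aws}" s3 sync "$src" "$dst" --delete
}
```

For example, `safe_sync_delete demo-content s3://demo-bucket.example.com 10` only deletes objects when the preview shows ten removals or fewer. The source could still change between the preview and the real run, so treat this as a safety net rather than a guarantee.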

Using S3 sync options

s3 sync supports many different options that change copy behavior and customize the attributes of created files. Here’s a quick look at some of the most useful capabilities.

How to include and exclude files with S3 sync

You can include and exclude file paths with the --include and --exclude flags. By default, every file is included; the flags accept UNIX-style wildcards, matched against paths relative to the source directory, that determine which files the sync operation considers. Each flag can be passed multiple times in a single sync command.

# Copy only HTML files
$ aws s3 sync <source> <destination> --exclude="*" --include="*.html"

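The flags are applied in the order they appear, so later flags take precedence over earlier ones. This is why the example above excludes everything first and then re-includes HTML files; putting --include="*.html" before --exclude="*" would exclude every file.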
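The precedence rule can be modeled in a few lines. The hypothetical `is_included` function below starts with every path included, checks each filter in order, and lets the last matching filter decide. Filters are written as `exclude:PATTERN` or `include:PATTERN`, an invented notation for this sketch; it approximates the CLI's wildcard matching with bash glob patterns rather than reproducing it exactly.

```bash
#!/usr/bin/env bash
# Hypothetical model of --exclude/--include evaluation: every path starts as
# included, each filter is checked in order, and the last match wins.
# Filters use an invented "exclude:PATTERN" / "include:PATTERN" notation.
is_included() {
  local path=$1 result=include filter pattern
  shift
  for filter in "$@"; do
    pattern=${filter#*:}
    # Unquoted $pattern is a glob here; inside [[ ]], * also matches "/".
    if [[ $path == $pattern ]]; then
      result=${filter%%:*}
    fi
  done
  [[ $result == include ]]
}
```

For example, `is_included index.html 'exclude:*' 'include:*.html'` succeeds, while swapping the two filters makes it fail, matching the ordering behavior described above.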

How to use an S3 sync dry run

You can perform a dry run to see what changes would be made by a sync operation without actually applying them to the destination:

$ aws s3 sync <source> <destination> --dryrun

The usual sync output is printed to your terminal, with each line prefixed by (dryrun), so you can check your options are correct before anything is transferred.
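To review a long dry run at a glance, you can tally its actions. The hypothetical `summarize_plan` function below reads dry-run output on standard input and counts each action (upload, download, copy, delete). It assumes the `(dryrun) action: ...` line format the CLI currently prints, which is human-oriented output rather than a stable interface.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the actions in "aws s3 sync --dryrun" output read
# from stdin, e.g.
#   aws s3 sync demo-content s3://demo-bucket.example.com --dryrun | summarize_plan
summarize_plan() {
  awk '
    # Assumed line format: "(dryrun) upload: <from> to <to>"; field 2 is the action.
    $1 == "(dryrun)" { sub(/:$/, "", $2); count[$2]++ }
    END { for (action in count) print action, count[action] }
  ' | sort
}
```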

How to disable S3 sync symlink resolution

Symlinks are automatically followed when uploading to S3 from your filesystem.

If this is undesirable, pass the --no-follow-symlinks flag to disable symlink resolution. Symlinked files and folders are then skipped, so the content of the linked paths doesn't appear in S3.

$ aws s3 sync <source> <destination> --no-follow-symlinks
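To see which entries --no-follow-symlinks would leave out, you can list the symlinks in the source directory with `find -type l` before syncing. The `etc-link` symlink below is a hypothetical example; pointing it at `/etc` shows why following links blindly can upload far more than you intended.

```bash
#!/usr/bin/env bash
# Create a hypothetical symlink inside the source directory, then list every
# symlink that --no-follow-symlinks would skip during an upload.
mkdir -p demo-content
ln -sfn /etc demo-content/etc-link
find demo-content -type l
```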

How to set the ACL for synced S3 files

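S3 supports several predefined ("canned") ACLs that control access to uploaded files. They cover common use cases, including private, public-read, public-read-write, and bucket-owner-full-control. Buckets created since April 2023 have ACLs disabled by default (S3 Object Ownership is set to "Bucket owner enforced"), so requests that set most ACLs will fail unless ACLs are enabled on the bucket.

To set the ACL on newly synced files, pass the desired canned ACL's name to the --acl flag: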

$ aws s3 sync <source> <destination> --acl=private

How to enable server-side encryption for synced S3 files

The --sse flag enables server-side encryption on the S3 objects that sync creates. Set it to AES256 for S3-managed keys (SSE-S3), or to aws:kms to use the AWS managed key for S3 from the AWS Key Management Service:

$ aws s3 sync <source> <destination> --sse=aws:kms

A specific KMS key can be selected with the --sse-kms-key-id flag:

$ aws s3 sync <source> <destination> --sse=aws:kms --sse-kms-key-id=<id>

How to set the storage class for synced S3 files

S3 storage classes determine the performance, pricing, minimum storage duration, and retrieval characteristics of your files. The default STANDARD class covers most regular use cases, but alternative classes such as GLACIER (S3 Glacier Flexible Retrieval) or DEEP_ARCHIVE can be more cost-effective for long-term retention of rarely retrieved objects.

The --storage-class flag allows you to set the storage class to apply to newly synced files:

$ aws s3 sync <source> <destination> --storage-class=GLACIER
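Storage class names are written in uppercase, and a typo is easy to miss in a long script. A small hypothetical guard like `valid_storage_class` can catch mistakes such as `Glacier` before a large sync starts. The accepted list below is an assumption based on the CLI documentation at the time of writing and may be incomplete, so check `aws s3 sync help` for your CLI version.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for --storage-class values. The list is an
# assumption based on the CLI documentation and may be incomplete.
valid_storage_class() {
  case $1 in
    STANDARD|REDUCED_REDUNDANCY|STANDARD_IA|ONEZONE_IA|INTELLIGENT_TIERING|GLACIER|DEEP_ARCHIVE|GLACIER_IR)
      return 0 ;;
    *)
      echo "Unknown storage class: $1" >&2
      return 1 ;;
  esac
}

# Example: only start the sync when the class name is recognized.
# valid_storage_class GLACIER && aws s3 sync demo-content s3://demo-bucket.example.com --storage-class=GLACIER
```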

Key points

The aws s3 sync command should be your go-to tool when you need to synchronize a local directory and an S3 bucket. It can also be used to synchronize two existing S3 buckets.

Sync is a recursive operation that copies new and changed files so the destination matches the source. However, deleting redundant files from the destination is optional, so remember to set the --delete flag if you need this behavior.

Sync is useful in a variety of scenarios where an S3 bucket acts as a mirror of an existing directory, such as for backups, or when seeding a new bucket with initial content. Sync is also an effective way to create a local copy of an S3 bucket, ready to move elsewhere or transfer to another provider.

Written by

James Walker

James Walker is the founder of Heron Web, a UK-based software development studio providing bespoke solutions for SMEs. He has experience managing complete end-to-end web development workflows with DevOps, CI/CD, Docker, and Kubernetes. James is also a technical writer and has written extensively about the software development lifecycle, current industry trends, and DevOps concepts and technologies.

jhwalker.net