Amazon S3 (Simple Storage Service) is the leading object storage platform for cloud-native apps, data lakes, backups, and archives. Its protocol underpins most other object storage providers too.
S3 is a powerful service, but one challenge you can face is how to move data in and out of your storage buckets. When S3 is used for large amounts of data, such as backups, you need a way to efficiently transfer your content and then restore it after an incident.
In this article, we’ll show how to use the AWS CLI s3 sync command to easily synchronize an S3 bucket to and from a local directory.
The aws s3 sync command is part of the AWS CLI. It “synchronizes” directories to and from S3 by recursively copying files and subdirectories.
To use the command, you must specify a source and destination. New or changed files will be copied from the source (whether a local directory or an S3 path) to the destination, ensuring complete replication while offering good performance even for large directories.
Although the command performs a recursive copy, it ignores empty directories. Empty folders in the source path will not be copied to the destination, and this behavior cannot be disabled.
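If you do need empty directories to show up in the destination, one possible workaround is to place a placeholder file in each of them before syncing. The sketch below is only an illustration: the function name add_placeholders and the .keep filename are assumptions chosen for this example, not part of the AWS CLI.

```shell
# Hypothetical helper: add a ".keep" placeholder to every empty directory
# under the given path so that "aws s3 sync" has a file to upload there.
add_placeholders() {
  # -type d -empty matches directories that contain no entries at all
  find "$1" -type d -empty -exec sh -c 'touch "$1/.keep"' _ {} \;
}

# Example usage (run before syncing):
#   add_placeholders demo-content
#   aws s3 sync demo-content s3://demo-bucket.example.com
```

The placeholder files are uploaded like any other file, so the matching prefixes then appear in the bucket.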
aws s3 cp vs aws s3 sync
The aws s3 cp command is an alternative way to move content between your local machine and an S3 bucket.
cp is mainly used to copy individual files. However, it can also copy entire folders when the --recursive flag is used. This has a similar result to sync but the copy behavior is different:
- aws s3 cp --recursive – Copies all files and folders from the source to the destination, even if they already exist in the destination. Existing files will be overwritten. Deletion of destination files that no longer exist in the source is not possible.
- aws s3 sync – Before copying, the destination’s content is inspected to determine which files already exist. Only the new or changed files from the source will be copied to the destination. Deletion of destination files that no longer exist in the source can be optionally enabled.
The sync command should be used in preference to cp when you’re copying large directories that already exist in your S3 bucket. Only the changed files will be copied, improving performance and reducing your transfer costs.
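By default, sync decides whether a file has changed by comparing its size and last-modified time with the copy at the destination. Two CLI flags adjust that comparison: --size-only and --exact-timestamps. The wrapper functions below are a sketch; their names are hypothetical and only the flags come from the AWS CLI.

```shell
# Hypothetical wrappers around "aws s3 sync"; only the flags are real CLI options.

# Compare by size alone. This can help when local timestamps are unreliable,
# for example after a fresh checkout has reset modification times.
sync_size_only() {
  aws s3 sync "$1" "$2" --size-only
}

# When downloading from S3, treat same-sized files as unchanged only if
# their timestamps match exactly.
sync_exact_timestamps() {
  aws s3 sync "$1" "$2" --exact-timestamps
}

# Example usage:
#   sync_size_only demo-content s3://demo-bucket.example.com
#   sync_exact_timestamps s3://demo-bucket.example.com demo-bucket-download
```

Note that --size-only will skip a file whose content changed without changing its size, so it trades accuracy for fewer transfers.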
Here’s a comparison table you can reference when deciding which command to use:
| | cp | sync |
| --- | --- | --- |
| Copied files | All files in the source location | New and changed files in the source location | 
| Deletion of removed files | Not supported | Optional (must be enabled) | 
| Performance | Good for individual files and new directories | Good for large directories that already exist in the destination | 
| Cost | Optimal for individual files and new directory structures (no need to check the destination to determine if the content exists) | Optimal for large directories that already exist in the destination (no wasted transfer of existing files) | 
| Use case | Copy operations where the destination doesn’t exist or its content should be replaced | Keeping the source and destination synchronized when files are incrementally added to the source | 
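One way to see the difference summarized in the table is to run both commands with --dryrun against a destination that already holds most of the files. The helper below is a rough sketch with a hypothetical function name; it only prints what each command would transfer.

```shell
# Hypothetical helper: print what each command would transfer, without
# transferring anything (--dryrun is accepted by both cp and sync).
compare_cp_and_sync() {
  src="$1"
  dest="$2"
  echo "cp --recursive would transfer:"
  aws s3 cp "$src" "$dest" --recursive --dryrun
  echo "sync would transfer:"
  aws s3 sync "$src" "$dest" --dryrun
}

# Example usage:
#   compare_cp_and_sync demo-content s3://demo-bucket.example.com
```

When the destination is already up to date, the cp section should list every file while the sync section stays empty.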
The sync command can be used whenever you need to mirror a local directory to an S3 bucket, or vice versa, without wastefully replacing files that already exist in the destination.
A few common use cases include:
- Creating backups – You can easily synchronize local directories to S3 to create a remote backup. Only the files that have changed since the last backup will be copied.
- Uploading websites to S3 static hosting – S3 buckets can be used to host static websites. The sync command will upload the website files produced by your static site generator while preserving any unchanged assets.
- Downloading the contents of an S3 bucket – Sometimes, you might need to create a local copy of an S3 bucket, either to more conveniently inspect the bucket’s content or so you can transfer it to another service. Using sync ensures any files that already exist on your machine won’t be unnecessarily replaced.
- Synchronizing two different S3 buckets – You can use the command to synchronize two S3 buckets. This can be helpful if you need to make a separate clone or backup of a bucket.
You’ll need the AWS CLI on your system to follow along with this tutorial.
You can use Docker to quickly get started without manually installing the CLI:
$ docker run --rm -it --entrypoint bash amazon/aws-cli:latest
Run the aws configure command to interactively supply your credentials to the CLI. You’ll be prompted to enter your Access Key ID and Secret Access Key.
You can generate new credentials in the AWS Console by creating an IAM user and assigning it S3-related policies.
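If you prefer not to run aws configure interactively, the CLI also reads credentials from environment variables. The sketch below uses placeholder values that you would replace with your own, and wraps a call to sts get-caller-identity, which reports the identity the CLI is authenticated as. The function name check_aws_identity and the us-east-1 region are assumptions for this example.

```shell
# Placeholder values: replace with the credentials of your own IAM user.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_DEFAULT_REGION="us-east-1"  # assumed region for this example

# Hypothetical helper: confirm the CLI can authenticate before syncing.
check_aws_identity() {
  if aws sts get-caller-identity >/dev/null; then
    echo "credentials OK"
  else
    echo "credentials rejected or missing" >&2
    return 1
  fi
}
```

Running check_aws_identity before the first sync can surface a credential problem early, rather than partway through a transfer.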
Next, use the CLI to create two new S3 buckets for demonstration purposes:
$ aws s3 mb s3://demo-bucket.example.com
make_bucket: demo-bucket.example.com
$ aws s3 mb s3://demo-bucket-2.example.com
make_bucket: demo-bucket-2.example.com
Bucket names must be unique across all AWS users. Change example.com to your own domain to avoid name collisions.
You can create buckets using either the mb sub-command, shown here, or s3api create-bucket.
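For reference, the s3api route might look like the sketch below. Outside us-east-1, create-bucket expects a LocationConstraint that matches the region, while us-east-1 must omit it. The function name create_bucket and the eu-west-1 region in the usage comment are assumptions for illustration.

```shell
# Hypothetical helper wrapping "aws s3api create-bucket".
create_bucket() {
  bucket="$1"
  region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 is the default location and rejects an explicit constraint
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi
}

# Example usage:
#   create_bucket demo-bucket.example.com eu-west-1
```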
Finally, create a few local files ready to synchronize to S3:
$ mkdir demo-content
$ mkdir demo-content/files
$ touch demo-content/foo
$ touch demo-content/bar
$ touch demo-content/files/example
You can also check out how to create and manage an AWS S3 bucket using Terraform.
Example 1: Sync a local directory to S3
The basic s3 sync syntax is as follows:
aws s3 sync <source> <destination>
<source> and <destination> can be either a local filesystem path or an S3 URI in s3://bucket/folder form.
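The folder part of an S3 URI acts as a key prefix, so a subdirectory can be synced to a matching prefix rather than to the bucket root. The helper below is a minimal sketch; the function name sync_subdir is hypothetical.

```shell
# Hypothetical helper: sync one subdirectory to the matching prefix in a bucket.
sync_subdir() {
  root="$1"
  bucket="$2"
  subdir="$3"
  aws s3 sync "$root/$subdir" "s3://$bucket/$subdir"
}

# Example usage: uploads the contents of demo-content/files under the files/ prefix
#   sync_subdir demo-content demo-bucket.example.com files
```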
To synchronize your local directory to S3, you can run the following command:
$ aws s3 sync demo-content s3://demo-bucket.example.com
upload: demo-content/bar to s3://demo-bucket.example.com/bar
upload: demo-content/foo to s3://demo-bucket.example.com/foo
upload: demo-content/files/example to s3://demo-bucket.example.com/files/example
If you list the bucket’s content, you’ll see your files are now available:
$ aws s3 ls demo-bucket.example.com
                           PRE files/
2023-08-25 08:44:21          0 bar
2023-08-25 08:44:22          0 foo
If you repeat the sync operation, you’ll see that no files upload:
$ aws s3 sync demo-content s3://demo-bucket.example.com
This is because no changes have occurred in the source directory. You can create a new file and try another sync to observe that only new files are uploaded:
$ touch demo-content/new
$ aws s3 sync demo-content s3://demo-bucket.example.com
upload: demo-content/new to s3://demo-bucket.example.com/new
Example 2: Download from S3 to a local directory
The same syntax can be used to move files in the opposite direction, from S3 to your machine.
To download from S3 to a local directory you can run:
$ aws s3 sync s3://demo-bucket.example.com demo-bucket-download
download: s3://demo-bucket.example.com/bar to demo-bucket-download/bar
download: s3://demo-bucket.example.com/foo to demo-bucket-download/foo
download: s3://demo-bucket.example.com/files/example to demo-bucket-download/files/example
Your files will now be available in the demo-bucket-download folder within your working directory:
$ ls demo-bucket-download
bar files foo
$ ls demo-bucket-download/files
example
Example 3: Synchronize two S3 buckets
To synchronize content between buckets, use an S3 URI as both the source and destination paths:
$ aws s3 sync s3://demo-bucket.example.com s3://demo-bucket-2.example.com
Now the files exist in both buckets:
$ aws s3 ls demo-bucket.example.com
                           PRE files/
2023-08-25 08:44:21          0 bar
2023-08-25 08:44:22          0 foo
2023-08-25 08:44:46          0 new
$ aws s3 ls demo-bucket-2.example.com
                           PRE files/
2023-08-25 08:47:39          0 bar
2023-08-25 08:47:39          0 foo
2023-08-25 08:47:39          0 new
Example 4: Allow deletions at the destination
The sync command does not delete anything from the destination by default. 
You can optionally enable this behavior with the --delete flag. It will remove any destination files that no longer exist in the source location.
$ rm demo-content/new
$ aws s3 sync demo-content s3://demo-bucket.example.com --delete
delete: s3://demo-bucket.example.com/new
s3 sync supports many different options that change copy behavior and customize the attributes of created files. Here’s a quick look at some of the most useful capabilities.
How to include and exclude files with S3 sync
You can include and exclude file paths with the --include and --exclude flags. These support UNIX-style wildcards that determine which files will be considered as part of the sync operation. The flags can be repeated multiple times in a single sync command.
# Copy only HTML files
$ aws s3 sync <source> <destination> --exclude="*" --include="*.html"
The flags are applied in the order they appear, so later flags override earlier ones.
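Because order matters, a common pattern is to start broad and then narrow. The sketch below shows two variations; the function names are hypothetical and the patterns (.git/*, *.tmp, *.png, *.jpg) are examples only.

```shell
# Hypothetical helper: upload everything except Git metadata and temp files.
sync_without_clutter() {
  aws s3 sync "$1" "$2" \
    --exclude ".git/*" \
    --exclude "*.tmp"
}

# Hypothetical helper: upload only images. The initial --exclude "*" drops
# everything, then each later --include adds a pattern back.
sync_images_only() {
  aws s3 sync "$1" "$2" \
    --exclude "*" \
    --include "*.png" \
    --include "*.jpg"
}
```

If the order in sync_images_only were reversed, the trailing --exclude "*" would override the includes and nothing would be copied.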
How to use an S3 sync dry run
You can perform a dry run to see what changes would be made by a sync operation without actually applying them to the destination:
$ aws s3 sync <source> <destination> --dryrun
The regular s3 sync command output will be shown in your terminal so you can check your options are correct before anything is transferred.
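A dry run is especially useful together with --delete, where a mistake in the source path could remove files from the destination. The helper below is a sketch of one cautious workflow: it shows the dry run, then asks for confirmation before running the real sync. The function name and the prompt wording are assumptions for this example.

```shell
# Hypothetical helper: show a dry run first, then ask before syncing with --delete.
confirm_and_sync() {
  src="$1"
  dest="$2"
  # Preview uploads and deletions without changing anything
  aws s3 sync "$src" "$dest" --delete --dryrun
  printf 'Apply these changes? [y/N] '
  read -r answer
  if [ "$answer" = "y" ]; then
    aws s3 sync "$src" "$dest" --delete
  else
    echo "Aborted; nothing was changed."
  fi
}

# Example usage:
#   confirm_and_sync demo-content s3://demo-bucket.example.com
```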
How to disable S3 sync symlink resolution
Symlinks are automatically followed when uploading to S3 from your filesystem.
If this is undesirable, you can disable symlink resolution by setting the --no-follow-symlinks flag. This will ensure that files and folders in the linked path don’t appear in S3.
$ aws s3 sync <source> <destination> --no-follow-symlinks
How to set the ACL for synced S3 files
S3 supports several predefined (canned) ACLs that can be used to control access to uploaded files. Canned ACLs are available for common use cases, including private, public-read, public-read-write, and bucket-owner-full-control.
To set the ACL on newly synced files, pass the desired ACL’s name to the --acl flag:
$ aws s3 sync <source> <destination> --acl=private
How to enable server-side encryption for synced S3 files
The --sse flag enables server-side encryption on the S3 files that sync creates. Set aws:kms as the value to use your AWS-managed key from the AWS Key Management Service:
$ aws s3 sync <source> <destination> --sse=aws:kms
A specific KMS key can be selected with the --sse-kms-key-id flag:
$ aws s3 sync <source> <destination> --sse=aws:kms --sse-kms-key-id=<id>
How to set the storage class for synced S3 files
S3 storage classes determine the performance, pricing, and access frequency restrictions for your files. The default standard class covers most regular use cases, but alternative classes such as Glacier or Deep Archive can be a better fit for long-term retention of infrequently retrieved objects.
The --storage-class flag allows you to set the storage class to apply to newly synced files:
$ aws s3 sync <source> <destination> --storage-class=GLACIER
The aws s3 sync command should be your go-to tool when you need to synchronize a local directory and an S3 bucket. It can also be used to synchronize two existing S3 buckets.
Sync is a recursive operation that matches the content of the source and destination. However, deletion of redundant files from the destination is optional so you must remember to set the --delete flag if you need this behavior.
Sync is useful in a variety of scenarios where an S3 bucket acts as a mirror of an existing directory, such as for backups, or when seeding a new bucket with initial content. Sync is also an effective way to create a local copy of an S3 bucket, ready to move elsewhere or transfer to another provider.
Does your organization have extra compliance concerns? Here you can learn more about self-hosting Spacelift in AWS, to ensure your organization’s compliance, control ingress, egress, internal traffic, and certificates, and have the flexibility to run it within GovCloud.
Frequently asked questions
- What is the difference between sync and replication in S3?
  The key difference is that S3 sync is a one-time operation you initiate manually or via script, while S3 replication is an automatic, continuous process configured between buckets. Use sync for ad-hoc or local workflows, and replication for compliance, disaster recovery, or ongoing multi-region availability.
- Does AWS S3 sync create a folder?
  No, aws s3 sync does not explicitly create a folder in the destination. It creates only the key prefixes that simulate folder structures through object paths. To simulate an empty folder, you can upload a placeholder file (e.g., .keep) or create a zero-byte object whose key ends in a slash with aws s3api put-object.
- Is AWS S3 sync multithreaded?
  Yes, aws s3 sync is multithreaded by default.
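The degree of parallelism mentioned in the last answer is configurable through the CLI's max_concurrent_requests setting, which defaults to 10. The helper below is a small sketch; the function name is hypothetical and it changes the default profile only.

```shell
# Hypothetical helper: set the number of parallel S3 requests for the
# default profile. The CLI's documented default is 10.
set_s3_concurrency() {
  aws configure set default.s3.max_concurrent_requests "$1"
}

# Example usage:
#   set_s3_concurrency 20
```

Higher values can speed up syncs of many small files, while lower values reduce bandwidth and CPU pressure on the machine running the CLI.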
