In this post we will review four S3 announcements that were not mentioned in Andy Jassy’s keynote at all: strong consistency, multiple replication destinations, two-way replication and bucket keys.
For more re:Invent day 1 announcements, see my other post AWS re:Invent 2020 Day 1: Top 5 Announcements.
Did Amazon just break the CAP theorem? Since the release of S3 in 2006, Amazon and many others have documented that overwriting an object in an S3 bucket is eventually consistent: when you read an object after overwriting it, you might receive the latest version or the previous version, until the change has propagated to all storage nodes. When this process is complete, all nodes will return the latest version.
With the release of strong read-after-write consistency, this is no longer true. When you read an object you will get the latest version, regardless of the number of objects or versions you write.
In many use cases eventual consistency is fine. Let’s say you store profile pictures for users on social media. When somebody updates their image and somebody else views their profile in the same second, it doesn’t matter if the old image is still there. It will be updated the next time they visit.
However, S3 is used for data lakes more and more. In these use cases, your S3 buckets might contain reports, analytics data, clickstreams, and many other types of time-sensitive data. With the release of strong read-after-write consistency, these data processing applications are now guaranteed to have the latest data available.
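To make the guarantee concrete, here is a minimal sketch of an overwrite followed by a read. It assumes `s3` is a boto3 S3 client (`boto3.client("s3")`) or any object with the same `put_object` and `get_object` methods; the helper name `overwrite_then_read` is hypothetical and only used for illustration.

```python
from typing import Any, Sequence


def overwrite_then_read(s3: Any, bucket: str, key: str, versions: Sequence[bytes]) -> bytes:
    """Overwrite the same key several times, then read it back once.

    `s3` is assumed to be a boto3 S3 client or a compatible stand-in.
    """
    for body in versions:
        # Each call replaces the previous content stored under the key.
        s3.put_object(Bucket=bucket, Key=key, Body=body)

    # Under strong read-after-write consistency, this read is expected to
    # return the body of the last successful write, never an older one.
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

Under the old eventually consistent model, the return value could have been any of the earlier bodies for a short while. With strong consistency, a data processing job can rely on it being the last element of `versions`.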
S3 replication launched in 2015. At that time it was limited to cross-region setups only. In 2019, AWS added same-region replication. Both solutions are great for sharing and synchronizing objects between multiple accounts, for example for secure data sharing or disaster recovery. However, these features could only be configured for a single target bucket. If you wanted to replicate your source data to multiple buckets you would either need to daisy-chain multiple replications or build your own solution using S3 events.
With the release of multiple destination buckets, you can configure a single source bucket to replicate incoming changes to a number of target buckets.
A common use case for S3 replication is Cost and Usage Reports (CURs). These (very) detailed billing reports contain every single billable resource in your AWS account or AWS organization. The bucket containing these reports lives in your root account, which you want to protect very well. This includes prohibiting access to the buckets in the root account. To keep the root account secure, you can use S3 replication to propagate the CURs to an AWS account owned by your finance team. But what if your sales team or ops teams also need access to this data? With multiple destination buckets you can easily replicate your data to each of these teams’ accounts, while maintaining the security and consistency of your source data and root account.
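As a rough sketch of what such a setup could look like, the helper below builds a replication configuration with one rule per destination bucket. The function name, role ARN and bucket ARNs are hypothetical; the dictionary follows the shape that boto3's `put_bucket_replication` accepts for its `ReplicationConfiguration` parameter. Cross-account details such as ownership overrides and bucket policies are left out.

```python
from typing import Any, Dict, List


def multi_destination_replication(role_arn: str, destination_bucket_arns: List[str]) -> Dict[str, Any]:
    """Build one replication rule per destination bucket (illustrative only)."""
    rules = []
    for priority, bucket_arn in enumerate(destination_bucket_arns, start=1):
        bucket_name = bucket_arn.split(":")[-1]
        rules.append({
            "ID": f"replicate-to-{bucket_name}",
            "Priority": priority,            # each rule gets its own priority
            "Status": "Enabled",
            "Filter": {"Prefix": ""},        # empty prefix: replicate every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": bucket_arn},
        })
    return {"Role": role_arn, "Rules": rules}


if __name__ == "__main__":
    config = multi_destination_replication(
        "arn:aws:iam::111111111111:role/example-replication-role",  # hypothetical
        ["arn:aws:s3:::example-finance-curs", "arn:aws:s3:::example-sales-curs"],
    )
    print([rule["ID"] for rule in config["Rules"]])
```

The result would be passed along the lines of `boto3.client("s3").put_bucket_replication(Bucket="source-bucket", ReplicationConfiguration=config)`. Replication requires versioning to be enabled on the source and destination buckets.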
To set up two-way replication you have to configure two replication rules: one for bucket A to bucket B and one for the reverse direction. Two-way replication makes sure that the metadata on these objects stays in sync as well. From the FAQ:
Make sure to enable replica modification sync on both buckets A and B to replicate replica metadata changes like object access control lists (ACLs), object tags, or object locks on the replicated objects.
Update 06 December: The original article stated that two-way replication removed the need to set up two replication rules. This is not the case: the two replication rules still need to be in place, but the two-way sync makes sure the metadata on the replicated objects stays in sync as well.
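The two rules can be sketched as follows. This is a minimal illustration, not a complete setup: the function names, bucket names and the single shared IAM role are assumptions. The `ReplicaModifications` setting under `SourceSelectionCriteria` is the replica modification sync option mentioned in the FAQ quote.

```python
from typing import Any, Dict


def one_way_rule(rule_id: str, destination_bucket: str) -> Dict[str, Any]:
    """One replication rule with replica modification sync enabled."""
    return {
        "ID": rule_id,
        "Priority": 1,
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # replicate every object
        "DeleteMarkerReplication": {"Status": "Disabled"},
        # Also replicate metadata changes made on replicas (ACLs, tags, locks).
        "SourceSelectionCriteria": {"ReplicaModifications": {"Status": "Enabled"}},
        "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
    }


def two_way_replication(role_arn: str, bucket_a: str, bucket_b: str) -> Dict[str, Dict[str, Any]]:
    """Return one replication configuration per source bucket: A to B and B to A."""
    return {
        bucket_a: {"Role": role_arn, "Rules": [one_way_rule("a-to-b", bucket_b)]},
        bucket_b: {"Role": role_arn, "Rules": [one_way_rule("b-to-a", bucket_a)]},
    }
```

Each entry of the returned dictionary would be applied to its own source bucket, for example with boto3's `put_bucket_replication(Bucket=name, ReplicationConfiguration=config)`.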
Buckets are regional entities. If you’re building a multi-master, multi-region application you want to read and write your objects from a bucket that is in the application’s region, while also having access to the objects written by other regions. With two-way replication, creating the infrastructure to support this has become as easy as the click of a button.
S3 bucket keys are a cost reduction mechanism. Normally, when you use object encryption with a customer-managed key (CMK) on your S3 bucket, S3 will use this CMK to generate a new data key for every object you write to your bucket. When decrypting the objects, S3 will use the CMK to decrypt the object’s data key. Each of these operations is a KMS request and incurs a small cost. With S3 bucket keys, an intermediate key is added between the CMK and the data key. The intermediate key is used for the encryption and decryption of data keys, reducing the number of KMS calls by up to 99%.
Image source: AWS Documentation
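Enabling a bucket key is a single flag on the bucket's default encryption settings. The sketch below builds the structure that boto3's `put_bucket_encryption` accepts as `ServerSideEncryptionConfiguration`; the helper name and the key ARN are hypothetical.

```python
from typing import Any, Dict


def bucket_key_encryption(kms_key_arn: str) -> Dict[str, Any]:
    """Default SSE-KMS encryption with an S3 bucket key enabled (illustrative)."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # The flag that switches on the intermediate bucket-level key.
                "BucketKeyEnabled": True,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical key ARN, for illustration only.
    print(bucket_key_encryption("arn:aws:kms:eu-west-1:111111111111:key/example"))
```

It would be applied with something like `boto3.client("s3").put_bucket_encryption(Bucket="my-bucket", ServerSideEncryptionConfiguration=config)`.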
Werner Vogels said: dance like nobody is watching, encrypt like everybody is. This is good advice, but when you’re using S3 object encryption on a high-traffic bucket, your good intentions and high security could become quite expensive. With S3 bucket keys, this no longer has to be the case.
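To get a feel for the numbers, here is a back-of-the-envelope estimate. It rests on three assumptions that are not from the announcement itself: a KMS price of $0.03 per 10,000 requests (check the current pricing for your region), one KMS request per object read or write, and the best-case 99% reduction. The function names are made up for this example.

```python
def kms_request_cost(requests: float, price_per_10k: float = 0.03) -> float:
    """Cost in USD of a number of KMS requests (price is an assumption)."""
    return requests / 10_000 * price_per_10k


def cost_with_bucket_keys(requests: float, reduction: float = 0.99,
                          price_per_10k: float = 0.03) -> float:
    """Cost in USD when bucket keys remove `reduction` of the KMS requests."""
    remaining_requests = requests * (1 - reduction)
    return kms_request_cost(remaining_requests, price_per_10k)


if __name__ == "__main__":
    monthly_requests = 1_000_000_000  # one billion encrypted reads and writes
    print(f"without bucket keys: ${kms_request_cost(monthly_requests):,.2f}")
    print(f"with bucket keys:    ${cost_with_bucket_keys(monthly_requests):,.2f}")
```

Under these assumptions, a billion requests per month works out to roughly $3,000 in KMS charges without bucket keys and roughly $30 with them.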
It says something about re:Invent and the sheer waterfall of releases when important features like these don’t even make it to the keynote. Even if that keynote is 3 hours long! I’m glad to see these iterative improvements to the core services, making AWS a little better every year.
I share posts like these and smaller news articles on Twitter. Follow me there for regular updates! If you have questions or remarks, or would just like to get in touch, you can also find me on LinkedIn.