Transparently Generate Pre-Signed URLs with S3 Object Lambdas

Luc van Donkersgoed


AWS recently introduced S3 Object Lambdas. These Lambda functions sit behind an S3 Access Point and can transparently mutate objects as they are retrieved from S3. In this post we will see how this mechanism can be combined with pre-signed URLs to protect assets, while simplifying application code and improving the user experience.

Let’s say you have a paid blog or news site. Paying subscribers should have access to your content, but others shouldn’t. Your architecture is completely serverless so you maintain your site’s structure in DynamoDB and store the actual articles and assets in S3.

A common security measure is to store your assets in a private bucket where they can’t be accessed by unauthorized users or bots. You then generate pre-signed URLs for users with a valid session.

Simple setup

In our example setup the articles are stored as Markdown files like the example below.

# This is the header of the article
The introduction paragraph goes here.

## Then there is a second header
This contains some more text, and very importantly, an image:
![My Fancy Graph](assets/fancy_graph.png)
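The embedded asset references in a file like this are easy to pick out with a regex; a quick sketch using the same pattern that appears in the code later in this post:

```python
import re

article = """# This is the header of the article
The introduction paragraph goes here.

## Then there is a second header
This contains some more text, and very importantly, an image:
![My Fancy Graph](assets/fancy_graph.png)"""

# Capture the alt text and the asset path of every embedded image
assets = re.findall(r'!\[(.*)\]\((assets\/(?:.*))\)', article)
print(assets)  # → [('My Fancy Graph', 'assets/fancy_graph.png')]
```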

When a signed-in user requests this article, they call your API, which returns some basic info and a signed URL for the content.

The response from Lambda might look like this:

{
    "title": "My article",
    "published_at": "2021-03-31T20:50:52+0000",
    "body_url": "https://mybucket.s3.amazonaws.com/articles/my-article.md?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA2Q5L6SKH3REFJWBK..."
}

The frontend website will retrieve this JSON, fetch the markdown from the signed URL and render it. This design keeps the Lambda function nimble (and thus cheap) and the user experience responsive. So far, so good. But the attentive reader will ask: “What about that fancy graph image? Shouldn’t it be signed too?” And indeed it should - otherwise these images could be downloaded by any user.

Before S3 Object Lambdas the solution for this problem would be to have the Lambda function fetch the content of the Markdown file from S3, use regex to retrieve all the assets from the text and generate a signed URL for each of them. This might be implemented as follows in Python.

import os
import re

import boto3

s3_resource = boto3.resource('s3')
obj = s3_resource.Object(
    os.getenv('ASSETS_BUCKET'),
    f'articles/{slug}.md'
)
article_body = obj.get()['Body'].read().decode('utf-8')

# Replace every matched asset reference with a pre-signed URL
transformed_object = re.sub(
    r'!\[(.*)\]\((assets\/(?:.*))\)',
    generate_signed_url,
    article_body
)

Heavy setup

The problem with this solution is that a relatively lightweight Lambda function suddenly becomes a lot slower: it now needs to retrieve the full article from S3, parse its contents, generate pre-signed URLs for every asset, and return the full body of text to the user. This also means the Lambda response has exploded in size, which further reduces efficiency. A function that should respond in a few dozen milliseconds can now take hundreds of milliseconds before a reply is returned to the user.

S3 Object Lambda design

With the newly released S3 Object Lambdas we can redesign this architecture so the article content is fetched directly from S3 again, while all assets are transformed into signed URLs. This means the API Lambda function no longer has this responsibility and can be converted back to its old nimble self.

Object Lambda

In this setup, the Lambda function called by the frontend looks like this (validating user access and retrieving article details from DynamoDB are left out).

import json
import boto3
from botocore.config import Config

def lambda_handler(event, context):
    s3_client = boto3.client(
        's3',
        config=Config(
            signature_version='s3v4',
            s3={'addressing_style': 'path'}
        ),
        region_name='eu-west-1'
    )

    slug = 'my-article'
    object_lambda_access_point_arn = 'arn:aws:s3-object-lambda:eu-west-1:123412341234:accesspoint/object-lambda-access-point'

    signed_url = s3_client.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': object_lambda_access_point_arn,
            'Key': f'articles/{slug}.md'
        },
        ExpiresIn=3600,
    )

    return {
        'statusCode': 200,
        'body': json.dumps({'signed_url': signed_url})
    }

As you can see, we call the generate_presigned_url operation with the Object Lambda Access Point ARN as the Bucket parameter. The resulting URL can be called by the frontend, which will retrieve the articles/my-article.md object through the Object Lambda Access Point. Please note that generating the pre-signed URL requires botocore version 1.20.31 or later, or the operation will fail with an error like the following:

"Parameter validation failed: Invalid bucket name \"arn:aws:s3-object-lambda:eu-west-1:123412341234:accesspoint/object-lambda-access-point\": Bucket name must match the regex \"^[a-zA-Z0-9.\\-_]{1,255}$\" or be an ARN matching the regex \"^arn:(aws).*:s3:[a-z\\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\\-]{1,63}$\""

Because the object is retrieved through the object lambda access point we can process the markdown file before it is returned to the user. Our processing lambda looks like this:

import boto3
import requests
import re
import json
from botocore.config import Config

supporting_access_point_arn = None

s3_client = boto3.client(
    's3',
    config=Config(
        signature_version='s3v4',
        s3={'addressing_style': 'path'}
    ),
    region_name='eu-west-1'
)

def generate_signed_url(match):
    alt_text = match.group(1)
    asset_path = match.group(2)
    signed_url = s3_client.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': supporting_access_point_arn,
            'Key': asset_path
        },
        ExpiresIn=3600,
    )
    return f'![{alt_text}]({signed_url})'

def lambda_handler(event, context):
    print(json.dumps(event))
    global supporting_access_point_arn
    supporting_access_point_arn = event['configuration']['supportingAccessPointArn']

    object_get_context = event["getObjectContext"]
    request_route = object_get_context["outputRoute"]
    request_token = object_get_context["outputToken"]
    s3_url = object_get_context["inputS3Url"]

    # Get object from S3
    response = requests.get(s3_url)
    original_object = response.content.decode('utf-8')

    # Replace every asset with a signed url
    transformed_object = re.sub(
        r'!\[(.*)\]\((assets\/(?:.*))\)',
        generate_signed_url,
        original_object
    )

    # Write object back to S3 Object Lambda
    s3 = boto3.client('s3')
    s3.write_get_object_response(
        Body=transformed_object,
        RequestRoute=request_route,
        RequestToken=request_token)

    return {'status_code': 200}

Please ignore the fact that I used the global keyword - this is just a tech demo. The Lambda function replaces all the assets with signed URLs, with the result looking exactly the way we want it:

# This is the header of the article
The introduction paragraph goes here.

## Then there is a second header
This contains some more text, and very importantly, an image:
![My Fancy Graph](https://object-demo-ap-123412341234.s3-accesspoint.eu-west-1.amazonaws.com/assets/fancy_graph.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAR4CQPUGOOM5EYJUI%2F20210411%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20210411T154616Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEFAaCWV1LXdlc3QtMSJHMEUCIQCSMZrlF4MpZIn44zAqWf4LHiNO21b1IOu5NiIFohAyrwIgYVQPuRKKwoGVNqyvqTBVmVcNxEadSkO3nIf7eOzICcQq1wEIqf%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARACGgwxMjkwMTc4MTU0NTIiDDweJwDsPIqqCVYIUyqrATzBgTncAjtK8D5U0Ox2q0sT6NKLR35%2Fr4Ob%2Ftc9I4%2BYw7eZuQ%2Flw5W7bauXs6VWPZUyDt60BAsob6%2B%2FeKmUTuaiJeqgfE9jcT%2FPnNVncHyIH0RvKhNFW2CjScrqzw7mcqjsj5AMd1ukDK55aHPGNlKCSFGzandgXSMk7tvYeGlqmLhuCSyZOON9iD5ufQGycxsFpNGD20d0e4MczBYEDA9Oy9B5%2FNGEaIBZTDDuscyDBjrgAd5bMLQ0nHKRLOMZuoLUvtnIcO6GTE%2FeqXpw2EYF2NKbERjWzCf6vD50sI3jD1nsGwlJ2mEjUDzCKOFJL5%2Bce8rSaJ7HAtTr9%2BlmYQeOUIyxmdXKgfZ2tGjtnHFBOY78te7oVi0Fbt8ivm%2FhTfxl4KYyjODTFd8nKnO8QVwjGSFNASZRlypXNtWA5UBv6Vj802Y9CpSNtouJIBdZoXFEGxJIwUTZPa9iFGnX7kMVb23Bd1%2FZogiaXCAyBJzx6FLbJ5lU%2BsPcWtiqWI06uKWaOxOidX55Sa9B%2F4XY8l40ocyF&X-Amz-Signature=5d742a2f506db654eb4717082972cadc68a82ae45b8a592ea0655cff34c5d417)
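One deployment detail worth noting: the transforming function's execution role must be allowed to call WriteGetObjectResponse, or the write-back at the end of the handler will fail. A minimal policy statement might look like the following (the resource ARN mirrors this post's example account and access point, so adjust it to your own):

```json
{
    "Effect": "Allow",
    "Action": "s3-object-lambda:WriteGetObjectResponse",
    "Resource": "arn:aws:s3-object-lambda:eu-west-1:123412341234:accesspoint/object-lambda-access-point"
}
```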

Conclusion

It has always felt like an anti-pattern to load a file from S3 into a Lambda function and return its contents through an API. After all, that’s what we have pre-signed URLs for. Sometimes, however, you simply had to. For cases where the object’s content had to be mutated - for example in PII scenarios or when accessing embedded assets - there was no alternative. This made the Lambda functions bloated and slow, and their responsibilities fuzzy.

The new S3 Object Lambdas allow for much clearer designs with better segmented responsibilities. In this article’s example the API Lambda is clearly responsible for assessing a user’s permissions and returning article metadata, while the Object Lambda Access Point and its underlying function are responsible for pre-processing and returning the article’s content. The S3 access point’s responsibilities are limited to returning objects without transformation, like the article’s assets. Three access patterns, three responsibilities. Simple and clean.

I share posts like these and smaller news articles on Twitter, follow me there for regular updates! If you have questions or remarks, or would just like to get in touch, you can also find me on LinkedIn.
