AppSync Insights Part 3: Minimizing Data Transfer on all Layers

Luc van Donkersgoed

In part one of the AppSync Insights series we saw how VTL templates can minimize the code required to implement OAuth scope authorization. In part two we saw how we can use Python to implement a generic GraphQL filter. This third installment is less earth-shaking, yet it shows a cool way to reduce data transfer and latency.

One of the core tenets of GraphQL is that it only returns the fields requested by the client. The underlying idea is that when millions of users connect to an API it is wasteful and costly to send data they won’t use. So instead of sending everything and the kitchen sink…

{
  "data": {
    "getCars": {
      "items": [
        {
          "id": "078a9ebe-db9d-47c3-ab14-b1d923fae04f",
          "make": "Tesla",
          "model": "Model Y",
          "licensePlate": "XX-123-B",
          "color": "white",
          "countryRegistered": "NL",
          "insured": true,
          "fuelType": "electric"
        }
      ]
    }
  }
}

… we let the client define what they need, and return only those fields requested.

{
  "data": {
    "getCars": {
      "items": [
        {
          "make": "Tesla",
          "model": "Model Y",
          "color": "white",
          "countryRegistered": "NL"
        }
      ]
    }
  }
}

This is an important and very effective optimization. But it is not the only component in the response flow. Let’s take a look at the following diagram.

Data Flows

In this process the user tells AppSync which fields they want to receive in Request 1. AppSync uses a data source (in this case Lambda and DynamoDB) to retrieve the data. Generally, AppSync will receive the full object and all its details from Lambda, after which AppSync will strip what the client doesn’t need. This leads to an optimized Response 3.

Today we will cover how we can forward which fields the user requested from AppSync to Lambda (Request 2) and from Lambda to DynamoDB (Request 3). We can use this information to only retrieve relevant fields from DynamoDB (Response 1) and have Lambda only return this subset of data (Response 2). We will use VTL Templates and DynamoDB Projection Expressions to achieve this. All code in this article as well as a fully functional environment (Python app and CDK infrastructure) are available in my GraphQL Playground repository on GitHub.

This is part three in a three-part series about useful features I’ve built in AppSync.

Step 1: VTL templates

We’ve extensively covered VTL in Restricting Access with OAuth Scopes and VTL. Today we will use VTL again, this time to forward the requested fields from AppSync to Lambda. The default request mapping template for a Lambda data source looks like this:

{
  "version" : "2017-02-28",
  "operation": "Invoke",
  "payload": $util.toJson($context.args)
}

This template takes the arguments provided by the client, converts them to JSON and invokes a Lambda function with them. To include the “selection set list” (the list of fields the client wants to receive), we amend the template to:

{
  "version" : "2017-02-28",
  "operation": "Invoke",
  "payload": {
    "arguments": $util.toJson($context.args),
    "selectionSetList": $util.toJson($context.info.selectionSetList)
  }
}

This way the arguments are available under the arguments key in the Lambda Event JSON, and the selection set list is available under the selectionSetList key. Please note that $context.info.selectionSetGraphQL exists as well, but that field contains the selection set as a single string formatted in GraphQL Schema Definition Language (SDL), rather than as a list of field paths. More information can be found in the documentation.

With this VTL request template in place we execute the following GraphQL call…

{
  getCars(
    filter:{
      make: {
        containsOr: ["Tesla", "Volkswagen"]
      }
      model: {
        containsOr: ["Model"]
        notEquals: ["Model Y"]
      }
    }
  ) {
    resultCount
    items {
      make
      model
      licensePlate
    }
  }
}

… and the Lambda Event JSON will look like this.

{
    "arguments": {
        "filter": {
            "make": {
                "containsOr": [
                    "Tesla",
                    "Volkswagen"
                ]
            },
            "model": {
                "containsOr": [
                    "Model"
                ],
                "notEquals": [
                    "Model Y"
                ]
            }
        }
    },
    "selectionSetList": [
        "resultCount",
        "items",
        "items/make",
        "items/model",
        "items/licensePlate"
    ]
}
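On the Lambda side, reading this event could look roughly like the sketch below. The function name parse_event is hypothetical and not taken from the repository; the only thing it relies on is the two keys set by the request mapping template above.

```python
from __future__ import annotations

from typing import Any


def parse_event(event: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Split the Lambda event into GraphQL arguments and the selection set list.

    Hypothetical helper: it assumes the event has the shape produced by the
    amended request mapping template ("arguments" and "selectionSetList").
    """
    # Fall back to empty values so the handler keeps working if a key is
    # missing or null, for example when the query takes no arguments.
    arguments = event.get("arguments") or {}
    selection_set_list = event.get("selectionSetList") or []
    return arguments, selection_set_list
```

A handler would call parse_event(event) first and pass the resulting selection set list on to whatever code builds the DynamoDB query.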

Let’s take another look at the data flow diagram. With the implementation above (see it in context on GitHub) we have implemented forwarding the client’s selection to Lambda (Request 2).
Data Flows

Next up: forwarding these fields to DynamoDB.

Introducing Projection Expressions

DynamoDB has a built-in mechanism to let it know which fields you want to retrieve with a query. For example, our cars are stored in DynamoDB like so…

{
    "PK": "ITEM",
    "SK": "CAR#<uuid>",
    "make": "Tesla",
    "model": "Model Y",
    "color": "white",
    "continentOfOrigin": "Europe",
    "countryOfOrigin": "Netherlands",
    "licensePlate": "XX-123-B"
}

… and if you query DynamoDB without a Projection Expression, it returns all of these fields. That might be fine when you’re retrieving a single object, but in large requests it might impact performance. So let’s tell DynamoDB we only need the make and model of our cars.

    inventory_table = boto3.resource('dynamodb').Table(table_name)
    ddb_response = inventory_table.query(
        KeyConditionExpression=Key('PK').eq('ITEM') & Key('SK').begins_with('CAR#'),
        ProjectionExpression='make, model'
    )
    items = ddb_response['Items']

Easy peasy. However, this query will fail when the Projection Expression contains reserved keywords like ‘region’. So to be safe, it’s better to implement it like this:

    inventory_table = boto3.resource('dynamodb').Table(table_name)
    ddb_response = inventory_table.query(
        KeyConditionExpression=Key('PK').eq('ITEM') & Key('SK').begins_with('CAR#'),
        ProjectionExpression='#K0, #K1',
        ExpressionAttributeNames={'#K0': 'make', '#K1': 'model'}
    )
    items = ddb_response['Items']

This solution will also succeed with reserved keywords. So all that’s left to do is to convert the incoming selection set to a projection expression and expression attribute names.

Converting the Selection Set to a Projection Expression in Python

Our GraphQL schema defines that the cars returned by a getCars call are put in an items array. That’s why you will see the items/ prefix in the selection set.

{
    "selectionSetList": [
        "resultCount",
        "items",
        "items/make",
        "items/model",
        "items/licensePlate"
    ]
}

The properties in DynamoDB don’t have this prefix, so let’s filter the list and strip the prefix.

selection_set = [
    set_item[len('items/'):] for set_item in params['selection_set'] if set_item.startswith('items/')
]

With the example input above, this will result in a selection_set that looks like ["make", "model", "licensePlate"]. We can use this list to create both a ProjectionExpression and ExpressionAttributeNames.

# The ProjectionExpression can't contain words like 'Region', so we use numbered references.
# After building, the ProjectionExpression looks like this: "#K0, #K1"
projection_expression = ', '.join(
    f'#K{index}' for index in range(len(selection_set))
)

# Then we create a map to link the #Kx values to the actual keys we want to resolve.
# The final ExpressionAttributeNames look like this: {"#K0": "make", "#K1": "model"}
expression_attribute_names = {
    f'#K{index}': selection_key for index, selection_key in enumerate(selection_set)
}

Now we can use these variables as input to our DynamoDB query.

    inventory_table = boto3.resource('dynamodb').Table(table_name)
    ddb_response = inventory_table.query(
        KeyConditionExpression=Key('PK').eq('ITEM') & Key('SK').begins_with('CAR#'),
        ProjectionExpression=projection_expression,
        ExpressionAttributeNames=expression_attribute_names
    )
    items = ddb_response['Items']

Assuming the ["make", "model", "licensePlate"] selection set, our items list will look like this.

[
    {
        "make": "Tesla",
        "model": "Model 3",
        "licensePlate": "BB-915-Q"
    },
    {
        "make": "Tesla",
        "model": "Model X",
        "licensePlate": "AA-112-B"
    },
    {
        "make": "Tesla",
        "model": "Model X",
        "licensePlate": "AA-123-B"
    }
]
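The conversion steps can also be bundled into a single helper. The sketch below is one possible way to do that, not the code from the repository: the name build_projection and its prefix parameter are made up for illustration. It adds two guards the snippets above leave out. It skips nested paths, assuming their parent field is listed separately in the same way items is, and it returns no query arguments at all when nothing under the prefix was requested, because DynamoDB rejects an empty ProjectionExpression.

```python
from __future__ import annotations


def build_projection(
    selection_set_list: list[str], prefix: str = "items/"
) -> dict[str, object]:
    """Return DynamoDB query keyword arguments for the fields under `prefix`.

    Hypothetical helper; the name and signature are assumptions.
    """
    fields: list[str] = []
    for path in selection_set_list:
        if not path.startswith(prefix):
            continue
        field = path[len(prefix):]
        # Keep direct children only, and keep each field once.
        if field and "/" not in field and field not in fields:
            fields.append(field)

    # An empty ProjectionExpression is invalid, so add no arguments at all.
    # The query then returns every attribute.
    if not fields:
        return {}

    # Numbered placeholders avoid clashes with reserved words such as 'region'.
    names = {f"#K{index}": field for index, field in enumerate(fields)}
    return {
        # Joining a dict joins its keys in insertion order: "#K0, #K1, ...".
        "ProjectionExpression": ", ".join(names),
        "ExpressionAttributeNames": names,
    }
```

The result can be unpacked straight into the query call, for example inventory_table.query(KeyConditionExpression=..., **build_projection(selection_set_list)).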

Results

With the Projection Expression in place, we’re telling DynamoDB to only fetch the fields requested by the user. Again, the diagram:
Data Flows

Our Projection Expression is now part of Request 3. This will result in Response 1 only containing the required fields, which we can transfer unchanged back to AppSync (Response 2). AppSync receives exactly the data it needs to return, so no data transfer and processing power are wasted.

As I said at the top, this implementation isn’t quite groundbreaking. The reductions in data transfer all take place within AWS’ boundaries, so there is no data transfer cost, and the available network performance and bandwidth are very, very high. But in large-scale APIs, whether measured by the number of requests or the amount of data processed, this optimization might actually make an impact.
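To get a rough feeling for the size difference, the sketch below compares the JSON-serialized size of the example car item with a make-and-model projection of it. This is only an approximation made for illustration: the bytes DynamoDB and Lambda actually put on the wire are encoded differently, and the helper names project and json_size are hypothetical.

```python
import json

# The example item from earlier in this article, as stored in DynamoDB.
full_item = {
    "PK": "ITEM",
    "SK": "CAR#<uuid>",
    "make": "Tesla",
    "model": "Model Y",
    "color": "white",
    "continentOfOrigin": "Europe",
    "countryOfOrigin": "Netherlands",
    "licensePlate": "XX-123-B",
}


def project(item, fields):
    """Keep only the requested attributes, mimicking a Projection Expression."""
    return {key: value for key, value in item.items() if key in fields}


def json_size(obj):
    """Size in bytes of the compact JSON encoding, used as a rough proxy."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


projected_item = project(full_item, ["make", "model"])
saved_bytes = json_size(full_item) - json_size(projected_item)
print(f"full: {json_size(full_item)} bytes, projected: {json_size(projected_item)} bytes")
print(f"saved per item: {saved_bytes} bytes")
```

Multiplying the per-item saving by the number of items per response and the number of requests gives a first estimate of whether the optimization is worth the effort for a given API.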

Conclusion

This has been a bit of a pet feature. Realistically, I should probably say it’s a premature optimization in almost any use case. Yet I like how I’ve been able to connect a few cool technologies - VTL, Selection Sets, and Projection Expressions - to create something pretty. Although almost nobody will see the benefits, this implementation just feels right.

I share posts like these and smaller news articles on Twitter. Follow me there for regular updates! If you have questions or remarks, or would just like to get in touch, you can also find me on LinkedIn.