AppSync Insights Part 2: Implementing a reusable and generic String Filter in Python and DynamoDB

AppSync Insights Part 2: Implementing a reusable and generic String Filter in Python and DynamoDB

Luc van Donkersgoed

Luc van Donkersgoed

GraphQL is my go-to solution for any resource based API. I don’t know what the future might bring, but at this time I consider it the best platform for standardizing CRUDL operations. One very common feature of List operations (the L in CRUDL) is to filter results, for example “list the resources where ‘owner_name’ contains ‘Donkersgoed’”. In this post I will describe a standardized solution for filtering in AppSync, Lambda and DynamoDB.

This is part two in a three-part series about useful features I’ve built in AppSync. A full overview of the series:

All code in this article as well as a fully functional environment (Python app and CDK infrastructure) are available in my GraphQL Playground repository on GitHub.

The case for filtering

One of the most powerful features in GraphQL is that the client can tell the server exactly what data it wants to receive. For example, the following query …

{
  getCars {
    items {
      id
      make
      model
      licensePlate
    }
  }
}

… will return the following data:

{
  "data": {
    "getCars": {
      "items": [
        {
          "id": "078a9ebe-db9d-47c3-ab14-b1d923fae04f",
          "make": "Tesla",
          "model": "Model Y",
          "licensePlate": "XX-123-B"
        },
        {
          "id": "7ba706b4-ac16-49c1-ba33-7ceb34f2025c",
          "make": "Tesla",
          "model": "Model 3",
          "licensePlate": "AB-888-Z"
        },
        {
          "id": "a0a8b85a-0098-4c59-860d-45931782aad4",
          "make": "Volkswagen",
          "model": "Golf GTE",
          "licensePlate": "TR-543-X"
        }
      ]
    }
  }
}

If you don’t need the ID or make of the cars, adjust your query to…

{
  getCars {
    items {
      model
      licensePlate
    }
  }
}

… and the response will be:

{
  "data": {
    "getCars": {
      "items": [
        {
          "model": "Model Y",
          "licensePlate": "XX-123-B"
        },
        {
          "model": "Model 3",
          "licensePlate": "AB-888-Z"
        },
        {
          "model": "Golf GTE",
          "licensePlate": "TR-543-X"
        }
      ]
    }
  }
}

But what if you only want to retrieve Teslas or Volkswagens? With the solution above you would need to apply client-side filtering, which would go against the entire principle of putting as little data on the wire as possible. So we implement a filter.

In our solution a query with a filter looks like this:

{
  getCars(
    filter:{
      make: {
        containsOr: ["Tesla"]
      }
      model: {
        containsOr: ["Y"]
      }
    }
  ) {
    resultCount
    items {
      make
      model
    }
  }
}

And the result:

{
  "data": {
    "getCars": {
      "resultCount": 1,
      "items": [
        {
          "make": "Tesla",
          "model": "Model Y"
        }
      ]
    }
  }
}

This solution allows the client to filter on any field. The relevant parts of the GraphQL schema are:

input GetCarsFilter{
    make: StringOperators
    model: StringOperators
    color: StringOperators
    continentOfOrigin: StringOperators
    countryOfOrigin: StringOperators
    licensePlate: StringOperators
}

input StringOperators {
    containsOr: [String]
    containsAnd: [String]
    notContains: [String]
    equalsOr: [String]
    notEquals: [String]
}

Adding an additional filter field is as easy as adding a <new_field>: StringOperators to GetCarsFilter. But the important question is what the filter implementation looks like in the Python (Lambda) backend. After all, the solution wouldn’t scale if we had to add an additional 100 lines of code for every field. Luckily, our implementation is 100% generic and not aware of the model’s properties at all.

An introduction to Filter Expressions

The cars in this example are stored in DynamoDB with single table design. This means the object in DynamoDB looks like this:

{
    "PK": "ITEM",
    "SK": "CAR#<uuid>",
    "make": "Tesla",
    "model": "Model Y"
}

With this design we can also store books (or any other item type) in the same table, for example:

{
    "PK": "ITEM",
    "SK": "BOOK#<uuid>",
    "title": "One Flew Over the Cuckoo's Nest",
    "author": "Ken Kesey"
}

And querying all cars is as easy as KeyConditionExpression = Key('PK').eq('ITEM') & Key('SK').begins_with('CAR#'). To filter the results returned by this query, for example on the make and model described above, we apply Filter Expressions. In its simplest form these can be implemented like this:

    inventory_table = boto3.resource('dynamodb').Table(table_name)
    ddb_response = inventory_table.query(
        KeyConditionExpression=Key('PK').eq('ITEM') & Key('SK').begins_with('CAR#'),
        FilterExpression=Attr('make').contains('Tesla')
    )
    items = ddb_response['Items']

Querying on multiple fields is easy. The following code filters on ‘make’ and ‘model’.

    inventory_table = boto3.resource('dynamodb').Table(table_name)
    ddb_response = inventory_table.query(
        KeyConditionExpression=Key('PK').eq('ITEM') & Key('SK').begins_with('CAR#'),
        FilterExpression=Attr('make').contains('Tesla') & Attr('model').contains('Y')
    )
    items = ddb_response['Items']

But the GraphQL filter might be a lot more complex than this simple example. Take the following query, which is perfectly valid:

{
  getCars(
    filter:{
      make: {
        containsOr: ["Tesla", "Volkswagen"]
      }
      model: {
        containsOr: ["Model", "Golf"]
        notEquals: ["Model Y"]
      }
      licensePlate: {
        containsAnd: ["AA", "12"]
        notEquals: ["AA-123-T", "AA-112-W"]
      }
      countryOfOrigin: {
        containsOr: ["Belg", "Netherl"]
      }
    }
  ) {
    resultCount
    items {
      make
      model
    }
  }
}

The corresponding Filter Expression in Python would be:

filter_expression = \
    (Attr('make').contains('Tesla') | Attr('make').contains('Volkswagen')) & 
    ((Attr('model').contains('Model') | Attr('model').contains('Golf')) & ~Attr('model').eq('Model Y')) & 
    ((Attr('licensePlate').contains('AA') & Attr('licensePlate').contains('12')) & 
    (~Attr('licensePlate').eq('AA-123-T') & ~Attr('licensePlate').eq('AA-112-W'))) & 
    (Attr('countryOfOrigin').contains('Belg') | Attr('countryOfOrigin').contains('Netherl'))

… aaand that’s where we need a generic solution. This solution should take the filter input from GraphQL and translate it to a complete filter expression, without any custom code per keyword.

Boolean logic: nested ANDs and ORs

As seen in the code above, specifying different keywords (make, model) in the filter results in an AND comparison. In other words, if you filter on make and model, both components need to match their queries. Let’s look at a simplified filter that only compares exact strings:

filter: {
    make: "Tesla"
    model: "Model Y"
}

The matching filter expression would be <make_filter> AND <model_filter>: Attr('make').eq('Tesla') & Attr('model').eq('Model Y'). In Python, we can generate this filter expression like this:

filter_expression = None
# filter_key might be 'make' or 'model' and 'filter_value' might be 'Tesla' or 'Model Y'
for filter_key, filter_value in filter_dict:
    key_filter = Attr(filter_key).eq(filter_value)
    if not filter_expression:
        filter_expression = key_filter
    else:
        filter_expression & key_filter

return filter_expression

In plain English: loop over every keyword. If no filter_expression is set, initialize it with Attr(filter_key).eq(filter_value). If the filter_expression exists, append the next one with the binary AND operator.

However, our solution is more complex than literal string matching. First of all, we support multiple conditions on the same keyword:

filter: {
    make: {
        contains: "esl"
        notEquals: "Vesla"
    }
}

If multiple conditions are specified for the same keyword, we again combine them with the AND operator: in the example above, the ‘make’ must contain ‘esl’ AND it must not equal ‘Vesla’. To implement this in Python we need a nested loop.

filter_expression = None
# filter_key might be 'make' or 'model' and 'filter_value' is a dict of conditions and their values:
for filter_key, filter_value in filter_dict:
    # The part below replaces `key_filter = Attr(filter_key).eq(filter_value)`
    key_filter = None
    for condition, condition_value in filter_dict:
        if condition == 'contains':
            condition_filter = Attr(filter_key).contains(condition_value)
        elif condition == 'notEquals':
            condition_filter = ~Attr(filter_key).eq(condition_value)

        if not key_filter:
            # In case of the first condition (contains: "esl")
            key_filter = condition_filter
        else:
            # We always AND multiple conditions, so we AND (notEquals: "Vesla") to the
            # existing condition.
            key_filter & condition_filter

    # At this point, `key_filter` is `Attr('make').contains('esl') & ~Attr('make').eq('Vesla')`

    # From here on the code is the same as before, so we AND multiple keywords (make, model). Each
    # of these contains a key_filter as generated above.
    if not filter_expression:
        filter_expression = key_filter
    else:
        filter_expression & key_filter

return filter_expression

But our filter implementation is not complete yet. The third level of filtering introduces multiple keywords per condition:

filter: {
    make: {
        containsOr: ["esl", "olks"]
        notEquals: ["Vesla", "Bolkswagen"]
    }
}

This involves another loop, where each of the values (“esl”, “olks”) gets its own filter expression. These expressions can be compounded using an AND or an OR operation, depending on the condition (containsAnd vs. containsOr). The code becomes a bit harder to follow:

filter_expression = None
# filter_key might be 'make' or 'model' and 'filter_value' is a dict of conditions and their values:
for filter_key, filter_values in filter_dict.items():

    key_filter = None
    for condition, condition_values in filter_values.items():
        condition_filter = None
        if condition == 'containsOr':
            for condition_value in condition_values:  # This the third loop, for the values in the array
                sub_key_filter = Attr(filter_key).contains(condition_value)

                if not condition_filter:
                    condition_filter = sub_key_filter
                else:
                    condition_filter | sub_key_filter  # Note the OR here. 

            # At this point, `condition_filter` is `Attr('make').contains('esl') | Attr('make').contains('olks')`

        elif condition == 'notEquals':
            for condition_value in condition_values:  # This the third loop, for the values in the array
                sub_key_filter = ~Attr(filter_key).eq(condition_value)

                if not condition_filter:
                    condition_filter = sub_key_filter
                else:
                    condition_filter & sub_key_filter  # Note the AND here. 

            # At this point, `condition_filter` is `~Attr('make').eq('Vesla') | ~Attr('make').eq('Bolkswagen')`

        if not key_filter:
            # In case of the first condition (Attr('make').contains('esl') | Attr('make').contains('olks'))
            key_filter = condition_filter
        else:
            # We always AND multiple conditions, so we AND 
            # (~Attr('make').eq('Vesla') | ~Attr('make').eq('Bolkswagen')) to the existing key_filter.
            key_filter & condition_filter

    # At this point, `key_filter` is `Attr('make').contains('esl') & ~Attr('make').eq('Vesla')`

    # From here the code is the same as before, so we AND multiple keywords (make, model). Each
    # of these contains a key_filter as generated above.
    if not filter_expression:
        filter_expression = key_filter
    else:
        filter_expression & key_filter

return filter_expression

The code above is not entirely complete and polished - containsAnd, notContains, and equalsOr are missing, and there is a lot of duplicate code. The final product, optimized and stripped of comments, looks like this:

@staticmethod
def _append_filter(source_filter, operation, additional_filter):
    if source_filter is None:
        return additional_filter
    elif additional_filter is None:
        return source_filter

    if operation == 'AND':
        return source_filter & additional_filter
    if operation == 'OR':
        return source_filter | additional_filter
    raise RuntimeError(f'Invalid operation: {operation}')

def _build_query_filter_expression(self, filter_dict):
    if not filter_dict:
        return None
    filter_expression = None
    for filter_key, filter_values in filter_dict.items():
        key_filter = None
        for condition, condition_values in filter_values.items():
            sub_filter = None
            for condition_value in condition_values:
                if condition == 'containsOr':
                    sub_key_filter = Attr(filter_key).contains(condition_value)
                    sub_filter = self._append_filter(sub_filter, 'OR', sub_key_filter)
                elif condition == 'containsAnd':
                    sub_key_filter = Attr(filter_key).contains(condition_value)
                    sub_filter = self._append_filter(sub_filter, 'AND', sub_key_filter)
                elif condition == 'notContains':
                    sub_key_filter = ~Attr(filter_key).contains(condition_value)
                    sub_filter = self._append_filter(sub_filter, 'AND', sub_key_filter)
                elif condition == 'equalsOr':
                    sub_key_filter = Attr(filter_key).eq(condition_value)
                    sub_filter = self._append_filter(sub_filter, 'OR', sub_key_filter)
                elif condition == 'notEquals':
                    sub_key_filter = ~Attr(filter_key).eq(condition_value)
                    sub_filter = self._append_filter(sub_filter, 'AND', sub_key_filter)
            key_filter = self._append_filter(key_filter, 'AND', sub_filter)
        filter_expression = self._append_filter(filter_expression, 'AND', key_filter)
    return filter_expression

Pretty complex, yet only 40 lines of code. To use this function you simply pass the ‘filter’ value from the GraphQL query as a parameter, and forward the result directly to the DynamoDB query:

inventory_table = boto3.resource('dynamodb').Table(table_name)
ddb_response = inventory_table.query(
    KeyConditionExpression=Key('PK').eq('ITEM') & Key('SK').begins_with('CAR#'),
    FilterExpression=self._build_query_filter_expression(filter_parameters)
)
items = ddb_response['Items']

To see the full code in context, check out the files on GitHub.

The Python implementation does not perform many value checks (e.g. if the keyword exists as an attribute in the database) because the input data is generated by GraphQL. This guarantees sane input which makes the Python implementation easier. Another plus for GraphQL right there. The code above is visualized in the following diagram:

Filter Diagram

Conclusion

AppSync, Lambda and DynamoDB are a powerful combo. AppSync serves as a strong typing mechanism for your APIs, guaranteeing that only known fields and types are passed to Lambda. DynamoDB offers built-in features to easily apply filters on your data, and Lambda (and in this case Python) is the glue to put it all together. This allows us to build generic solutions that can serve many use cases.

Looking back at the APIs I built in the past I can hardly believe how much heavy lifting I no longer need to worry about, while API consistency is greatly improved and documentation writes itself. And on top of it all, the solution is completely serverless, which means it’s a pay-as-you-go model, it automatically scales with demand, and is inherently highly available. This allows me to focus on the application, not on patching and maintaining servers or databases.

I share posts like these and smaller news articles on Twitter, follow me there for regular updates! If you have questions or remarks, or would just like to get in touch, you can also find me on LinkedIn.