Reduce API calls to backend systems with a Static Data Cache

Koen Werdler

Within Sentia we use many different systems for our processes, such as ServiceNow, Microsoft Dynamics 365, Okta and others. To provide a streamlined and centralized solution for automated communication with these systems, we created a single cloud-based solution called the Sentia Aggregation Layer API (SALA). To process thousands of requests each day without overloading the connected systems, we’ve implemented a Static Data Cache. Read on to find out more.

SALA is a system built with serverless technology on AWS. Serverless means you provide the code and configuration to the provider (AWS), where it’s executed without you having to worry about the operating system or hardware; this is all taken care of for you. In our case we use the following serverless technologies:

  • AppSync
    fully managed AWS service for GraphQL APIs with features like caching, subscriptions and automatic scaling to handle any kind of traffic

  • Lambda functions
    executable (Python) code that performs its work in response to an event, for example from AppSync

  • API Gateway
    fully managed service for creating and publishing a RESTful API with features like CORS support, authorization, throttling, etc.

  • DynamoDB
    AWS’s fast NoSQL Key-Value database

  • EventBridge
    event bus that is able to asynchronously transfer events between different services like AppSync and Lambda functions

Most requests, if not all, come in via the GraphQL API at AppSync. From there, Lambda functions act as data sources that retrieve or update data in the backend systems like ServiceNow. See the (simplified) diagram below.

SALA Diagram
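
To give an idea of what such a data source looks like, below is a minimal sketch of a resolver Lambda. It assumes a direct Lambda resolver (so AppSync passes the GraphQL field name and arguments in the event) and a hypothetical fetch_account_from_servicenow helper; it is not SALA’s actual code.

def fetch_account_from_servicenow(account_number: str) -> dict:
    # Hypothetical placeholder for the real backend call.
    raise NotImplementedError

def lambda_handler(event, context):
    # With a direct Lambda resolver, the event carries the GraphQL
    # field name and its arguments.
    field = event["info"]["fieldName"]
    arguments = event["arguments"]

    if field == "getAccount":
        return fetch_account_from_servicenow(arguments["accountNumber"])

    raise ValueError(f"Unknown GraphQL field: {field}")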

In the diagram you’ll also notice a design decision to use EventBridge for asynchronous calls. Some of our data-update calls take this asynchronous route to process large requests; without it, they could exceed AppSync’s maximum request execution time of 30 seconds and be aborted.
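
As a sketch of that asynchronous route, a resolver could hand the work off to EventBridge roughly like this (the bus name, source and detail type are made up for the example):

import json
import boto3

events = boto3.client("events")

def submit_large_update(payload: dict) -> None:
    # Publish the request on the event bus; a separate Lambda picks it
    # up and processes it outside AppSync's 30-second window.
    events.put_events(
        Entries=[
            {
                "EventBusName": "sala-bus",            # hypothetical name
                "Source": "sala.appsync",              # hypothetical source
                "DetailType": "LargeUpdateRequested",  # hypothetical type
                "Detail": json.dumps(payload),
            }
        ]
    )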

Need for cache

We noticed that some of the Lambdas were making requests that always yield the same response for the same input. Think of retrieving the account number for a company by name, or retrieving ServiceNow’s unique ID (sys_id) for a location or a support schedule. Every time the same request is made, it yields the same account number or unique ID. These requests make excellent candidates for caching.

Caching is usually something you do to increase the performance of your application: if you already have the data stored somewhere, it saves you the time of calculating or retrieving it again. In our case, we’re more worried about overloading the connected systems with the number of requests going through the system.
For example, with ServiceNow we only have a limited number of workers that can request or update data. If we make too many requests per second, ServiceNow returns a “Too Many Requests” error: this aborts the query or makes the update retry automatically. Both cases cause problems for automated processes. To mitigate this, we’ve implemented the static data cache. If the cache also increases performance, that’s a nice gain but not the main goal.
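
As an aside, a common generic way to cope with a 429 response (not SALA’s actual retry code) is exponential backoff, sketched here with the requests library:

import time
import requests

def get_with_backoff(url: str, max_attempts: int = 5) -> requests.Response:
    # Retry on 429 Too Many Requests, honoring the Retry-After header
    # when present and otherwise backing off 1, 2, 4, ... seconds.
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=30)
        if response.status_code != 429:
            return response
        time.sleep(float(response.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")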

AWS AppSync cache

But… wait! AWS AppSync already has a cache you can use! That’s right; however, AppSync caches either the whole (GraphQL) request or per resolver. That’s not fine-grained enough for us, and it doesn’t apply to mutations. Our static data cache applies to specific pieces of data that we know we can safely cache for longer than AppSync’s maximum cache TTL of 1 hour.

Design Decision: DynamoDB

To add caching functionality to your (enterprise) application, you usually use separate high-performance caching software for fast cache storage and retrieval. AWS offers ElastiCache for this purpose: a high-performance, fully managed and secure solution with either Redis or Memcached as the engine.

Apart from that, it’s also possible to use any other data storage solution as a makeshift caching storage, for example using AWS DynamoDB.

In a preliminary investigation we compared the costs of DynamoDB and ElastiCache. Two ElastiCache nodes (t2.medium, on demand) cost about $100 per month, while DynamoDB with 10,000 read operations and 2,000 write operations per hour costs only about $6 per month. With this huge price difference, and given that performance isn’t the main goal, the choice to use DynamoDB was easily made.
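
To sketch how DynamoDB can act as a makeshift cache (the table and attribute names here are assumptions, not SALA’s real schema), we can store each item with an epoch-seconds expiry and let DynamoDB’s TTL feature clean up stale items:

import time
import boto3

# Assumed table: partition key "name", with DynamoDB TTL enabled on
# the "expires_at" attribute.
table = boto3.resource("dynamodb").Table("static-data-cache")

def cache_put(name: str, value: str, time_to_live: int) -> None:
    table.put_item(Item={
        "name": name,
        "value": value,
        "expires_at": int(time.time()) + time_to_live,
    })

def cache_get(name: str):
    item = table.get_item(Key={"name": name}).get("Item")
    # TTL deletion is asynchronous, so double-check the expiry ourselves.
    if item and item["expires_at"] > time.time():
        return item["value"]
    return None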

Design Decision: In-memory caching

When running Lambda functions in AWS, AWS reuses the same Lambda function instance to process multiple events. This means that any variables outside of the main lambda_handler method in the Lambda function stay in memory and get reused.

With this in mind, we keep a small cache in the Lambda that stays in memory for as long as the instance is reused. This is very useful, as repeated requests can be answered directly from memory.
Keep in mind that this is only a short-lived cache: it’s gone as soon as AWS shuts the Lambda function down. Nonetheless, this small in-memory cache saves us a lot of requests; we currently see that about 75% of the cacheable requests are served from memory. More details below.
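
A minimal sketch of the mechanism (the event field and lookup_account helper are hypothetical): any dict defined outside the handler survives across invocations on the same instance.

# Module-level state lives for as long as AWS reuses this Lambda instance.
_memory_cache: dict = {}

def lookup_account(account_number: str) -> str:
    # Hypothetical placeholder for the real backend lookup.
    raise NotImplementedError

def lambda_handler(event, context):
    key = event["account_number"]  # hypothetical event field
    if key not in _memory_cache:
        _memory_cache[key] = lookup_account(key)
    return _memory_cache[key]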

Design Decision: Python decorator

To easily integrate the static data cache into the SALA codebase, we decided to use a Python decorator. Decorators are a powerful and useful tool in Python: they allow programmers to modify the behaviour of a function or class from outside that function or class. A decorator wraps another function to extend its behaviour without permanently modifying it.

The decorator for the static data cache allows us to implement the behaviour separately while allowing it to be used anywhere necessary (and applicable) without much modification to the original implementation.
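
For readers less familiar with decorators, here is a small generic example (unrelated to SALA) of the wrapping mechanism:

from functools import wraps

def log_calls(func):
    # The decorator receives a function and returns a wrapped version.
    @wraps(func)  # keeps the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_calls
def add(a: int, b: int) -> int:
    return a + b

add(1, 2)  # prints "calling add" and returns 3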

For example, take the following Python method from SALA to retrieve a company ID from ServiceNow:

def _get_account_sys_id(self, account_number: str) -> str:
    # (...)

By applying the static data cache’s decorator (@cacheable) to this method, caching is automatically applied to all the code contained in the method.

@cacheable(logger_name="my application logger")
def _get_account_sys_id(self, account_number: str) -> str:
    # (...)

The only required argument for this decorator is the name of the logger. This logger (looked up or created by name) is used to log messages about a cache hit, a cache miss, or an error if something goes wrong. These log messages are what we use to count the number of cache hits and misses on our dashboards (see below).

Implementation Schematic

SALA Schematic

This picture outlines the basic flow of the static data cache:

  1. Request for data comes in
  2. Check if the data for the request is available in memory. If so, return it.
  3. Check if the data for the request is available in DynamoDB. If so, store it in memory and return it.
  4. Retrieve the data from the external system as we would’ve done without caching. Store the retrieved data in DynamoDB and memory and then return it.

Implementation

Below is an excerpt from the decorator’s code, which (thanks to clean code principles) is quite readable.

from functools import wraps
from typing import Any, Callable, Mapping

def cacheable(logger_name: str, time_to_live: int = DEFAULT_TIME_TO_LIVE):

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:

        @wraps(func)
        def wrapper(*args: Any, **kwargs: Mapping[str, Any]) -> Any:
            initialize_logger()
            initialize_dynamodb_table()

            try:
                cache_item_name = compile_cache_item_name(args=args, kwargs=kwargs)

                # Try memory first, then DynamoDB, then the original function.
                return cache_from_memory(name=cache_item_name) or \
                    cache_from_dynamodb(name=cache_item_name) or \
                    exec_function_and_cache_result(name=cache_item_name, args=args, kwargs=kwargs)

            except CacheItemNameTooLargeError as error:
                # Fall back to an uncached call when the compiled name is too large.
                cacheable.logger.exception(f"Cache ignore: {error}")
                return exec_function(args=args, kwargs=kwargs)

        return wrapper

    return decorator

A few highlights on this piece of code:

  • compile_cache_item_name(args=args, kwargs=kwargs)
    to store the cache data, we compile a cache item name that’s unique to this particular use of the decorator. We do this by combining the name of the original function (for example _get_account_sys_id) with the arguments passed to it (the account number, for example). A simplified sketch follows after this list.

  • cache_from_memory(name=cache_item_name)
    retrieve the cache item data if it’s available in memory and return it

  • cache_from_dynamodb(name=cache_item_name)
    retrieve the cache item data if it’s available in DynamoDB and return it

  • exec_function_and_cache_result(name=cache_item_name, args=args, kwargs=kwargs)
    execute the original function and cache the results in DynamoDB and memory
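
A simplified, standalone sketch of how such a cache item name could be compiled (in the real implementation the function name comes from the decorator’s closure, and the length limit here is an assumption):

CACHE_ITEM_NAME_MAX_LENGTH = 2048  # assumed limit, e.g. a DynamoDB key size

class CacheItemNameTooLargeError(Exception):
    pass

def compile_cache_item_name(func_name: str, args: tuple, kwargs: dict) -> str:
    # Combine the function name and its arguments into a unique key,
    # e.g. "_get_account_sys_id#ACC0012345".
    parts = [func_name, *map(str, args)]
    parts += [f"{key}={value}" for key, value in sorted(kwargs.items())]
    name = "#".join(parts)
    if len(name) > CACHE_ITEM_NAME_MAX_LENGTH:
        raise CacheItemNameTooLargeError(f"name is {len(name)} characters")
    return name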

Caching overview in New Relic Dashboard

New Relic is advanced monitoring software with many features, such as application monitoring, error tracking and infrastructure monitoring. Of all the features New Relic offers, we mainly use log management, dashboards and error tracking. All of our Lambda logs from AWS are gathered in New Relic, where we can easily look up the logging of specific cases and generate statistics and overviews for reporting.

Below is a screenshot of our Caching dashboard which shows the amount of cache hits (from memory and DynamoDB), the amount of cache misses and cache errors for one week.

Cache Dashboard

As you can see, in a single week we save about 3.7 million requests to our connected systems.

Performance

We also noticed a small increase in performance when the caching solution was put into production. We didn’t measure this in more detail as it was not the main goal.

The graph below shows the time the Lambda spends executing. On 08/02 we deployed the caching solution; you can see the improvement from that point onward, as the Lambda spends less time executing.

Lambda function execution time

Want to know more?

Unfortunately, as it stands, the static data cache implementation is quite specific to SALA and isn’t currently suitable for use outside it. However, please let us know if you’re interested or if you have any questions by leaving a comment below. With enough interest, we can see about open-sourcing it as a Python package.