Data Reload Got You Cringing?

Jul 20, 2022

By Stephanie Soto, Senior Software Engineer at Integral Ad Science

Let me paint you a picture of a monolithic application, something every company has, and a tool that has been around since Integral Ad Science (IAS) was shiny and new. New features are added daily, but rarely does it get the love and attention needed to fix the tech debt piling up. This application requires updated information throughout the day, which is no small task when you are a global company serving global ads at a pace of 12 thousand requests per minute!

Boiled down, the current monolith solves the data reload problem by restarting itself every couple of hours. That’s not so bad; how long could it take? Our monolithic application took up to 15 minutes to restart itself and load up to 4 GB of data comprising roughly 13M records. Did anyone else just cringe?

This post addresses how IAS handles data reloads so that they interrupt traffic as little as possible and keep response times low for the services we’ve moved into Kubernetes.

The Problem

A full restart of the service every few hours doesn’t fit the Kubernetes model, which boasts scalability, visibility, and efficient use of your developers’ time. IAS wants to take advantage of all these benefits, especially when responding to unpredictable peaks in user requests worldwide.

Fun fact: During major events, traffic at IAS peaks to astonishing new heights. During such an event, IAS broke into the 1 million impressions per second arena when the Spain vs. Croatia and France vs. Switzerland Soccer games were aired. And again for the NCAA Basketball Tournament 2022 *AIR HORN*.

The Options

Discussing options brought up a few interesting alternatives to the way we load data today. One option was to use an external scheduler to trigger the data reload on each node through cron jobs. Preventing every node from reloading at the exact same time would help us respond quickly to traffic and orchestrate controlled reloads. As you can imagine, in Kubernetes, nodes start and stop unpredictably; with our current k8s architecture using horizontal pod and node auto-scaling, there would be no way for a fixed schedule to keep up with the churn.

It was a good first draft, but not a sustainable solution for our applications.

The team looked into Apache Gossip, Zookeeper, and an unused in-house application using change data capture. Reloads would be centralized, and each node would know when to reload based on a central source. However, at our scale these options were too radical an approach.

The last suggested approach would let the applications read and write to the databases from which we already load our data. Again, this was shot down due to scale and permissions. The fewer privileges we grant, the more secure our databases can be.

Back to the drawing board, we needed a queue that would deliver confidence while we reloaded data and handled incoming traffic. I present to you AWS’s DynamoDB Lock Client. Trumpets please!

The Lock Client is a distributed locking library built on AWS’s DynamoDB. Each application attempts to acquire a single record in DynamoDB. The library comes with some handy bells and whistles that let you tune how the lock records are handled, including a background session monitor designed to combat deadlocks. Pods frequently go down in k8s, so making sure we wouldn’t end up with a deadlocked record was crucial. We chose this library for its distributed abilities, scalability, and global availability.
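To make the "single record" idea concrete, here is a minimal sketch of a lease-based lock, written against an in-memory map rather than DynamoDB. It is not the Lock Client's API: the class LeaseLockTable, its method names, and the explicit nowMillis parameter are all hypothetical, chosen only to show why an expiring lease keeps a crashed pod from holding the record forever.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a lock table (hypothetical; not the DynamoDB Lock Client API). */
final class LeaseLockTable {
    /** One lock record: who holds it and when the lease runs out. */
    static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<String, Lease> records = new ConcurrentHashMap<>();

    /** Takes the lock if it is free or its lease has expired; returns empty otherwise. */
    Optional<Lease> tryAcquire(String key, String owner, long leaseMillis, long nowMillis) {
        Lease candidate = new Lease(owner, nowMillis + leaseMillis);
        // compute() runs atomically per key, so only one caller can win the record.
        Lease winner = records.compute(key, (k, current) ->
                current == null || current.expiresAtMillis <= nowMillis ? candidate : current);
        return winner == candidate ? Optional.of(candidate) : Optional.empty();
    }

    /** Heartbeat: extends the lease, but only if the caller still holds this exact lease. */
    Optional<Lease> heartbeat(String key, Lease held, long leaseMillis, long nowMillis) {
        Lease renewed = new Lease(held.owner, nowMillis + leaseMillis);
        return records.replace(key, held, renewed) ? Optional.of(renewed) : Optional.empty();
    }

    /** Releases the lock only if the caller still holds this exact lease. */
    boolean release(String key, Lease held) {
        return records.remove(key, held);
    }
}
```

In this sketch a pod that dies simply stops sending heartbeats, and the next caller takes over once the lease has expired. A stale holder that wakes up later cannot release or renew a record it no longer owns.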

Implementation

We built a wrapper application to handle some of our business logic before and after calling the Lock Service; namely, we needed a callback function to run while the lock was active, and proper error handling for the timing edge cases we could come across. The more we tested, the clearer it became that reloading data on only one pod at a time prevented us from deploying changes to the cluster, which led us to add support for a preset number of concurrent reloads.
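The shape of such a wrapper can be sketched in a few lines. This is an illustration under assumptions, not our production code: ReloadGate, SlotLock, InMemorySlotLock, and the "data-reload-N" key naming are all hypothetical, and the in-memory lock stands in for whatever the lock service provides. The idea is to model N concurrent reloads as N separate lock keys, run the callback while one of them is held, and always release in a finally block.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical wrapper: runs a reload callback while holding one of N lock slots. */
final class ReloadGate {
    /** Minimal lock abstraction; a real one would delegate to the lock service. */
    interface SlotLock {
        boolean tryAcquire(String key);
        void release(String key);
    }

    private final SlotLock lock;
    private final int maxConcurrentReloads;

    ReloadGate(SlotLock lock, int maxConcurrentReloads) {
        this.lock = lock;
        this.maxConcurrentReloads = maxConcurrentReloads;
    }

    /** Returns true if a slot was free and the callback ran to completion. */
    boolean reloadIfSlotFree(Runnable reloadCallback) {
        for (int slot = 0; slot < maxConcurrentReloads; slot++) {
            String key = "data-reload-" + slot; // assumed key naming scheme
            if (!lock.tryAcquire(key)) {
                continue; // slot is busy, try the next one
            }
            try {
                reloadCallback.run();
                return true;
            } finally {
                lock.release(key); // release even if the callback throws
            }
        }
        return false; // every slot is taken; the caller can try again later
    }
}

/** In-memory lock used only to exercise the sketch. */
final class InMemorySlotLock implements ReloadGate.SlotLock {
    private final Set<String> held = ConcurrentHashMap.newKeySet();

    @Override
    public boolean tryAcquire(String key) {
        return held.add(key); // add() is false when the key is already held
    }

    @Override
    public void release(String key) {
        held.remove(key);
    }
}
```

With new ReloadGate(new InMemorySlotLock(), 2), two callers can reload at once and a third gets false back until one of them finishes.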

Integration testing was really easy with localstack/localstack, which provides a local DynamoDB table in Docker. LocalStack supports local cloud service emulators for a lot of AWS resources. Tear down the table after the tests are done with no worries about charges in AWS or forgotten resources (developers never forget to tear down resources, right?).

Learning Points

While there were some inevitable road bumps, there were plenty of pros to using AWS’s Lock Client. Here are a few of the roadblocks we hit.

We needed to implement a connection retry to DynamoDB because sometimes two pods would try to acquire the lock at the same time and the second would receive an error.
We implemented an internal leaseWaitTime that was shorter than the requested lease time to prevent indefinite locks.
You need AWS SDK for Java version 1.11.704 or later in order to give your Service Account access to DynamoDB.
On a Kubernetes cluster, the code relied on WebIdentityTokenCredentialsProvider being in the default CredentialsProvider from AWS. It is not. A quick dependency add fixes this. Find the most recent available here.
With over 100 pods running at any given time in a single region, deployments of new code became blocked by Flagger (our progressive delivery tool on K8s). We needed to build a way to have 2 or 3 concurrent reloads.
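The first two points above, retrying after a lost race and capping how long a pod waits, can be sketched together. This is a hedged illustration, not the code we shipped: BoundedRetry and acquireWithin are hypothetical names, the attempt argument stands in for the real lock call, and waitBudgetMillis plays the role of the internal wait time that the caller picks to be shorter than the lease.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retry a lock attempt, but never wait longer than a fixed budget. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls attempt until it returns true or waitBudgetMillis has elapsed.
     * A RuntimeException from attempt (for example, losing a race with
     * another pod) is treated as a failed try and retried.
     */
    static boolean acquireWithin(BooleanSupplier attempt, long waitBudgetMillis, long pauseMillis) {
        long deadlineNanos = System.nanoTime() + waitBudgetMillis * 1_000_000L;
        while (true) {
            try {
                if (attempt.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException transientError) {
                // Assumed to be transient; fall through and retry after a pause.
            }
            // Give up if another pause would run past the wait budget.
            if (deadlineNanos - System.nanoTime() < pauseMillis * 1_000_000L) {
                return false;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

Because the loop is bounded by a deadline rather than an attempt count, a pod that cannot get the lock gives up and goes back to serving traffic instead of waiting indefinitely.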

Conclusion

Sometimes you have to build your own solution to a problem, but that doesn’t mean you can’t borrow from other places. If your company works with any of the cloud services, take a look at some of their solutions for a spark of inspiration. For now, our queued data reloads work within the k8s environment and allow us to control when applications stop receiving traffic so they can update data safely.

Join Our Innovative Team

IAS is a global leader in digital media quality. Our engineers collaborate daily to design for excellence as we strive to build high performing platforms and leverage impactful tools to make every impression count. We analyze emerging industry trends in order to drive innovation, research new areas of interest, and enhance our revolutionary technology to provide top-tier media quality outcomes. IAS is an ever-expanding company in a constantly evolving space, and we are always looking for new collaborative, self-starting technologists to join our team. If you are interested, we would love to have you on board! Check out our job opportunities here.