by Feng Fan

Recently I worked on a project to evaluate different left-join options for a Spark application we are building to modernize our largest data pipeline. The pipeline processes about 2B events per hour, creating a data set of about 0.5B records. In the old pipeline, a long-running left-join operation took 20 minutes to finish using Pig over MapReduce. My task was to benchmark this left-join operation with different Spark join options. This article shares what I learned during that project.

Dataset Sizes and Test Environment

On the left side of the join we had a big dataset of…


by Akshay Tambe

At Integral Ad Science (IAS), we measure over 100 billion data events daily, giving our customers unmatched scale, coverage, and accuracy. We process this data with hundreds of big data processing and data science pipelines. As we’ve continued to scale globally, IAS migrated to a cloud-based infrastructure hosted on Amazon Web Services (AWS), resulting in cost savings and increased performance. One great strategy to control and reduce AWS costs is to leverage spot instances.

Spot Instances are spare EC2 instances in the AWS Cloud which are offered at up to 90% cost savings compared to on-demand instances…


Is banana bread a bread? An unexpected UXR journey in cookbook design.

by Joey Stempel

After organizers announced they were putting together a company cookbook, I was quick to volunteer. As a UX Designer who loves to cook and consumes a significant amount of food media, I found this too perfect an opportunity to pass up. What I didn’t realize was just how helpful UX Research would be; by the end, I had conducted competitive research, run a card sorting exercise, and applied user-centered design principles.

In early discussions with the team, the first question to arise was: how should the…


by Yuva Mahendran

At Integral Ad Science we constantly experiment with technologies to process massive datasets and get insightful performance details for customers. One of our major initiatives over the upcoming quarters is to introduce streaming in our multi-billion-events-per-hour data ingestion layer and provide real-time metrics for our customers. Introducing streaming into this massive pipeline could easily span multiple quarters before reaping any benefit, if not properly planned. This blog covers our phased plan to introduce streaming in our system and highlights tracers we added to automatically test data consistency in the streaming pipeline.

Batch processing pipeline

The current log processing pipeline is…


by Yuva Mahendran

At Integral Ad Science, with billions of events hourly, milliseconds can make a difference in downstream processing. Is Apache Pulsar ready to replace Kafka as our go-to streaming data provider? We put it to the test.

Goal

Our main goal was to make data available for downstream processing within milliseconds of the actual event happening.

Candidates for experimenting

Apache Kafka is a framework that’s been on the market since 2011, and has stood the test of time inside and outside IAS. Given that we have our core-pipes already running in AWS, MSK (Amazon managed Kafka) was a natural…


by Janus Chung

I love technology and I am lucky enough to get paid for pursuing this hobby. So it is no surprise that in my spare time at home I am experimenting with new technologies and leveraging those to make my life at home a bit more convenient. One of the latest projects I worked on at home was rebuilding my home server.

In this new socially distant reality, the tools I use at work (Docker and automation with Git and Jenkins) have helped me build a home server to unify and simplify entertainment, and connect with family and…


by Bruce Rudolph

At IAS, we have Java-based web services that require response times as low as 10 to 20 milliseconds for several billion requests per hour. Therefore, having a deterministic time for Garbage Collection (GC) is essential to avoid service request/response timeouts.

Repurposing a Java-based Web Service

One such web service had an average response time of 50 ms with periodic spikes greater than 200 ms. The consumers of this application tolerated these response times. …


by Emiliya Trakhtenberg

A new member of your team is your most important user — you are trying to create an experience for them to fit into your team and ultimately stay on your team and continue growing with the company.

Do you remember how you felt the first day of your job? I was eager, excited, nervous but playing it cool, and above all, I was ready to jump right in and hit the ground running! Naturally, there was a whole lot I needed to learn about my role, the company, our processes, and our technology before I’d be…


by Dhanush Soundarapandyan, Rene Haase & Yuva Mahendran

Integral Ad Science (IAS) is the global leader in digital ad verification, offering technologies that drive high-quality advertising media. IAS equips advertisers and publishers with both the insight and technology to protect their advertising investments from ad fraud and control where their ads appear to capture consumer attention and drive business outcomes.

In short: we sell data and we have lots of it as we mostly process web traffic data — we process several billion events per hour. In order to enable unconstrained growth, improve developer velocity, and reduce the cost of…


The coronavirus outbreak is forcing us all to make many lifestyle changes in order to limit transmission and protect our most vulnerable citizens.

Many tech companies, including IAS, are encouraging or requiring all employees to work from home. This may be a jarring transition for people used to working from the office, but for many of us here at IAS, working remotely is just part of the job.

We asked some of our seasoned remote employees to share their tips on how to be productive and stay sane while working from home.

Aurelian Săndulescu, Senior Scrum Master

As a remote employee since 2008, these are…

IAS Tech Blog
