How IAS Productionalized SonarQube

IAS Tech Blog
Jul 13, 2022

Written by Milap Jhumkhawala, Senior Software Engineer at Integral Ad Science

As Integral Ad Science (IAS) continues to scale globally, the wide array of software products and services it develops evolves and becomes more complex, and the risk of application failures and bugs leaking into production grows. To mitigate this, we need a system that consistently monitors the code for potential bugs and catches them at a very early stage of development.

This article showcases how we at IAS adopted GitOps to host SonarQube in a Kubernetes cluster.

Why did IAS decide to run SonarQube as a production service?

Borrowing an anonymous software testing quote —

A bug in hand is worth two in the box.

Programmers aren’t perfect. Manual code review only goes so far and will never catch every error in the code. The relative cost, in developer time and effort, of fixing issues in production is high; it is much easier to fix them while the developers are still in the coding phase. Here is an example of a very costly error:

July 22, 1962: NASA’s Mariner 1 Done In by a Typo

Code quality matters because it directly impacts overall software quality, which is critical. SonarQube is a great tool that consistently monitors the code for bugs and security vulnerabilities and gives developers visual feedback in a dashboard.

Architecture Overview

The entire architecture is divided into three separate processes that are implemented in the following order:

  1. SonarQube Helm Chart
  2. Configure Database
  3. Deploy to Cluster

SonarQube Helm Chart

The official Helm webpage describes Helm as a package manager for Kubernetes, but it is more than that. Helm not only packages an application as a collection of pre-configured Kubernetes resources, it also provides CLI commands to manage, deploy, upgrade, roll back, and delete the application in the cluster. In other words, instead of individually deploying Kubernetes resources like pods, services, ReplicaSets, deployments, etc., we can deploy all of them with a single command, helm install; it’s that easy.

The application container image and the Kubernetes resources required by that application are bundled together into a chart. At IAS, we decided to use the open-source SonarQube Helm chart maintained by Oteemo.
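As a rough sketch of what that looks like in practice (the release name, namespace, and values file below are illustrative, and the chart repository URL should be double-checked against Oteemo’s documentation), installing the chart comes down to a couple of Helm commands:

    # Register the Oteemo chart repository and refresh the local index
    helm repo add oteemo https://oteemo.github.io/charts
    helm repo update

    # Install the SonarQube chart into its own namespace with our overrides
    helm install sonarqube oteemo/sonarqube \
      --namespace sonarqube --create-namespace \
      -f values.yaml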

Configure Database

The architectural diagram above shows the high-level flow of provisioning SonarQube’s supporting infrastructure: a PostgreSQL database (db) set up using AWS RDS (Relational Database Service) in the IAS prod environment.

The SonarQube Helm chart provides an option to deploy the application along with a PostgreSQL database in the cluster, but we decided to use an external database server. We wanted to take advantage of high availability (HA), easier administration, and the ability to restore the database from snapshots.

The architecture consists of two different pipeline flows. The first one starts when the tag pipeline of the git repository sonarqube-cdk-repository is triggered manually. sonarqube-cdk-repository is a GitHub repository that holds AWS CDK definitions for the PostgreSQL database and its secrets. The diagram shows the pipeline stages, with numbers indicating their order.

Pipeline 1

Provision Database using AWS RDS

We implemented a simple SonarQube AWS CDK construct that configures the following properties and provisions an RDS instance; a minimal sketch follows the list below.

  • DB credentials: Two database password configs are created, pg-admin-user-pw for the admin user and pg-sonar-user-pw for SonarQube. Both passwords are randomly generated and encrypted, and they are stored in AWS Systems Manager Parameter Store (SSM Parameter Store).
  • DB engine type: db_engine sets a PostgreSQL version as the database engine.
  • DB security group config: The same security group assigned to the AWS EKS cluster is used to allow communication between the SonarQube server and the database.
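Below is a minimal sketch of what such a construct could look like in AWS CDK (TypeScript). It is not the actual IAS implementation: the instance size, PostgreSQL version, and the SSM parameter version passed to SecretValue.ssmSecure are illustrative assumptions.

    import { SecretValue } from 'aws-cdk-lib';
    import { Construct } from 'constructs';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';
    import * as rds from 'aws-cdk-lib/aws-rds';

    // Hypothetical construct mirroring the properties listed above.
    export class SonarqubeDatabase extends Construct {
      constructor(
        scope: Construct,
        id: string,
        vpc: ec2.IVpc,
        eksSecurityGroup: ec2.ISecurityGroup,
      ) {
        super(scope, id);

        new rds.DatabaseInstance(this, 'SonarqubePostgres', {
          // DB engine type: PostgreSQL
          engine: rds.DatabaseInstanceEngine.postgres({
            version: rds.PostgresEngineVersion.VER_13,
          }),
          instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
          vpc,
          // DB credentials: admin password resolved from the SSM Parameter Store
          // (the parameter version is a placeholder).
          credentials: rds.Credentials.fromPassword(
            'admin',
            SecretValue.ssmSecure('pg-admin-user-pw', '1'),
          ),
          // DB security group config: reuse the EKS cluster's security group so
          // the SonarQube pods can reach the database.
          securityGroups: [eksSecurityGroup],
        });
      }
    }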

Bootstrap the DB

Once the database is provisioned, we need to bootstrap it, i.e., perform the initial setup so that SonarQube can talk to it. This is done as part of the second pipeline stage: the SonarQube database username and password are fetched from the Parameter Store, a database connection is established, and the sonarqube database is created, roughly as sketched below.
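A hedged sketch of that stage, assuming the parameter names from Pipeline 1 and an RDS_ENDPOINT variable that is not part of the original post:

    # Fetch the passwords created by Pipeline 1 from the SSM Parameter Store.
    ADMIN_PW=$(aws ssm get-parameter --name pg-admin-user-pw \
      --with-decryption --query 'Parameter.Value' --output text)
    SONAR_PW=$(aws ssm get-parameter --name pg-sonar-user-pw \
      --with-decryption --query 'Parameter.Value' --output text)

    # Connect as the admin user and run the bootstrap script, passing the
    # SonarQube password in as a psql variable.
    PGPASSWORD="$ADMIN_PW" psql \
      --host "$RDS_ENDPOINT" --username admin --dbname postgres \
      --set=sonar_password="$SONAR_PW" \
      -f sonar-db-setup.sql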

The sonar-db-setup.sql file contains the SQL statements that create the SonarQube database role and database.
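A minimal version of that script, assuming the sonar role and sonarqube database named in the surrounding text and the sonar_password psql variable set in the previous step, could look like this:

    -- Create the SonarQube database role with the password from the Parameter Store.
    CREATE USER sonar WITH ENCRYPTED PASSWORD :'sonar_password';

    -- Create the SonarQube database owned by that role.
    CREATE DATABASE sonarqube OWNER sonar;
    GRANT ALL PRIVILEGES ON DATABASE sonarqube TO sonar;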

This completes the bootstrap, and the SonarQube server can now establish a connection with the database, logging in with the SonarQube username sonar and the password stored in the Parameter Store.

Pipeline 2

This pipeline kicks off when the tag pipeline of sonarqube-backups-repository is triggered. Its purpose is to replicate the current production database state across the dev and staging environments. SonarQube test instances are hosted in dev and staging environments in different AWS accounts, and we use them to test a new version and custom features before rolling it out to production. The pipeline stage uses pg_dump to dump the contents of the sonarqube database into a SQL file and uploads it to an AWS S3 bucket.

Restoring the dev and staging databases from production AWS RDS snapshots (that is, prod snapshots shared with the dev and staging accounts) doesn’t work: the production database credentials are preserved in the snapshot, while dev and staging are configured with different credentials when their stacks are created by running Pipeline 1, so the pods would fail with FATAL: Cannot connect to database, password authentication failed for user "admin". We also chose not to allow the dev and staging accounts to read production secrets from the Parameter Store.

There is no cron setup for this pipeline, so the backup file is not uploaded to the bucket on a schedule. Instead, it is treated as an on-demand database backup generator, because we do not keep the dev and staging stacks around all the time; they are torn down after a feature or upgrade is tested. We do, however, use RDS snapshots to regularly back up the prod database and restore it as part of the disaster recovery procedure.

Upload db backup file to S3
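A rough sketch of the dump-and-upload stage; the bucket name and the RDS_ENDPOINT variable are illustrative, not the actual IAS values:

    # Timestamp each backup file so restores can pick a specific point in time.
    TS=$(date +%Y%m%d-%H%M%S)

    # Dump the sonarqube database to a SQL file.
    PGPASSWORD="$SONAR_PW" pg_dump \
      --host "$RDS_ENDPOINT" --username sonar --dbname sonarqube \
      --file "sonarqube-backup-${TS}.sql"

    # Upload the backup to S3.
    aws s3 cp "sonarqube-backup-${TS}.sql" "s3://sonarqube-db-backups/"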

Each SQL file is timestamped. When provisioning infrastructure, the Pipeline 1 automation includes logic that offers the option to restore the database from such a backup SQL file (sketched below), which makes production data easy to replicate in other environments.
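That restore step, again with illustrative file and bucket names, amounts to downloading the chosen backup and replaying it against the freshly bootstrapped database:

    # Pull a specific backup from S3 and replay it into the dev or staging database.
    aws s3 cp "s3://sonarqube-db-backups/sonarqube-backup-20220713-120000.sql" .
    PGPASSWORD="$SONAR_PW" psql \
      --host "$RDS_ENDPOINT" --username sonar --dbname sonarqube \
      -f sonarqube-backup-20220713-120000.sql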

Provisioning and configuring the PostgreSQL database is achieved with the push of a button.

Deploy to Cluster

SonarQube is deployed to an AWS EKS cluster using a GitOps workflow. The core concept of GitOps is declaring the application’s state in Git and using that as the only source of truth. With the configuration versioned in Git, applications can be easily deployed or rolled back.

Compare this with the traditional way of deploying an application (CIOps): a CI tool like Jenkins grabs the artifact from an artifact repository and follows a set of instructions to deploy that artifact to an environment, while a configuration management tool like Puppet makes sure the required environment properties are set. We end up maintaining three systems, and if a change to the application is required, the application is updated, packaged again, uploaded to the artifact repository, and the whole process repeats. With GitOps, the application config is stored in Git; there is a single place from which everything is derived and driven. This trivializes deploys and rollbacks, making them as simple as updating versions in Git. To understand GitOps in detail, this article is a good place to start.

To implement GitOps, we use Flux CD, a tool that keeps Kubernetes clusters in sync with a centralized source of configuration (a Git repository) and performs automated updates to the application when that source repository changes. Before we dive into the implementation, below are some of the core Flux CD terms and their definitions:

  • Sources: A Source defines the origin of a repository containing the desired state of the system and the requirements to obtain it (e.g. credentials, version selectors). Sources produce an artifact consumed by other Flux components to perform actions, like applying the contents of the artifact to the cluster. Examples of sources are a Git repository, a Helm repository, and an AWS S3 bucket. We use a Helm repository for SonarQube.
  • Reconciliation: Refers to ensuring that a given state (e.g., an application running in the cluster, infrastructure) matches the desired state declaratively defined somewhere (e.g., a Git repository). There are a few types of reconciliation, but we only need to know about HelmRelease. HelmRelease reconciliation ensures the state of the Helm release matches what is defined in the resource and performs a release if this is not the case (including revision changes of a HelmChart resource).

Flux CD components:

  • Helm Controller: A Kubernetes operator that watches for HelmRelease objects and generates HelmChart objects, which it monitors for changes and performs automated Helm actions like installs, rollbacks, and uninstalls.
  • Source Controller: A Kubernetes operator that detects source changes and fetches resources on demand and on a schedule. The source types it handles include GitRepository, HelmRepository, and Bucket.

At IAS, we have implemented a multi-tenant architecture in which there is one central repository, called the platform admin repository, and the rest are tenant (IAS dev team) repositories. The source controller that monitors the GitHub repository sonarqube-tenant-repository for updates at 1-minute intervals is deployed from the admin repository. sonarqube-tenant-repository houses the SonarQube-related custom cluster resource definitions.
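That monitoring boils down to a GitRepository source resource; a hedged sketch, with a placeholder organization, branch, and namespace, might look like this:

    apiVersion: source.toolkit.fluxcd.io/v1beta2
    kind: GitRepository
    metadata:
      name: sonarqube-tenant-repository
      namespace: flux-system
    spec:
      # Check the tenant repository for updates every minute.
      interval: 1m
      url: https://github.com/example-org/sonarqube-tenant-repository  # placeholder org
      ref:
        branch: main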

There are two custom resource files needed:

  • repository.yaml — This is a HelmRepository definition. The index YAML of the oteemo Helm chart repository is fetched every 5 minutes. The SonarQube Helm chart is one of many charts hosted by Oteemo.
  • release.yaml — This is a HelmRelease definition. It creates a Helm deployment; the Helm chart (v9.6.2) is fetched from the oteemo Helm repository created by repository.yaml. A sketch of both files appears after the next paragraph.

The SonarQube version used in the deployment (v8.9.8, the current LTS) is also set in release.yaml, overriding the application version baked into the Helm chart, along with other values such as the db username and password and an ingress URL to access the application. The interval value tells the reconciler how often it should reconcile the release.
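Putting that together, here is a hedged sketch of the two files; the chart repository URL, namespaces, value keys, and ingress host are illustrative and should be checked against the Oteemo chart’s values.yaml:

    # repository.yaml
    apiVersion: source.toolkit.fluxcd.io/v1beta2
    kind: HelmRepository
    metadata:
      name: oteemo
      namespace: sonarqube
    spec:
      interval: 5m
      url: https://oteemo.github.io/charts
    ---
    # release.yaml
    apiVersion: helm.toolkit.fluxcd.io/v2beta1
    kind: HelmRelease
    metadata:
      name: sonarqube
      namespace: sonarqube
    spec:
      # How often the reconciler checks that the release matches this definition.
      interval: 5m
      chart:
        spec:
          chart: sonarqube
          version: 9.6.2
          sourceRef:
            kind: HelmRepository
            name: oteemo
      values:
        # Illustrative value keys; the real keys come from the chart's values.yaml.
        image:
          tag: 8.9.8-community   # pin the SonarQube application version (current LTS)
        postgresql:
          enabled: false         # use the external RDS database instead of the bundled one
        ingress:
          enabled: true
          hosts:
            - name: sonarqube.example.internal   # placeholder ingress URL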

Deployment Workflow:

Updating the source sonarqube-tenant-repository triggers reconciliation, and the SonarQube Helm chart version defined in release.yaml is deployed as a Helm release in the cluster.
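If you want to nudge or watch the process instead of waiting for the next interval, the standard Flux CLI can be used (the namespaces here are the illustrative ones from the sketches above):

    # Force an immediate sync of the tenant repository, then inspect the release.
    flux reconcile source git sonarqube-tenant-repository -n flux-system
    flux get helmreleases -n sonarqube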

Conclusion

Upgrading SonarQube is as easy as it gets; all it takes is opening and merging a single pull request that updates the Helm chart version.
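Such a pull request is a one-line change to release.yaml; the newer chart version below is purely illustrative:

      chart:
        spec:
          chart: sonarqube
    -     version: 9.6.2
    +     version: 9.6.3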

And boom, the server is upgraded, the newly created pods auto-connect to the database, and SonarQube is back up in about a minute.

Join Our Innovative Team

IAS is a global leader in digital media quality. Our engineers collaborate daily to design for excellence as we strive to build high-performing platforms and leverage impactful tools to make every impression count. We analyze emerging industry trends in order to drive innovation, research new areas of interest, and enhance our revolutionary technology to provide top-tier media quality outcomes. IAS is an ever-expanding company in a constantly evolving space, and we are always looking for new collaborative, self-starting technologists to join our team. If you are interested, we would love to have you on board! Check out our job opportunities here.
