Query Engine

Print Tracker

Query Engine

At Print Tracker, customers needed the ability to quickly design and build column-based reports. I designed and implemented an in-memory SQLite-based query engine that manages isolated SQLite instances for all our customers and shuts them down when they're inactive.

Go SQLite REST Kubernetes Istio

THE PROBLEM

At Print Tracker our customers have many different reporting needs. We collect thousands of different data points that somehow need to be easily accessible to the customers that need them. We needed a way for customers to:

Build and customize reports with the data that they need
Schedule the reports to be generated on configurable schedules
Distribute the reports to the people that need them

THE SOLUTION

Many of our customers are familiar with SQL-based solutions so I thought, “why not just give them a SQL interface to our data?” I designed and implemented an in-memory SQLite-based query engine that manages isolated SQLite instances for all our customers and shuts them down when they’re inactive.

It acts as a hypervisor for SQLite instances. When a user runs a query, we initialize a new SQLite instance and populate the database using our persistent database (MongoDB) one-time. The first query can take a few seconds while the database is being populated, but subsequent queries are very fast. Users can then make as many queries as they like, adjusting the report as needed. Once we detect that the user is no-longer running queries, we shut-down the instance and free up the memory. If the user happens to run a query again, we can quickly spin-up a new instance and populate it with the data just like before.

This approach has a couple of benefits:

It is more flexible then a glorified column-picker. Since we support all of SQLite syntax, our customers can do mathematical expressions, sub-queries, window functions, and more.
It reduces the load on our persistent data store. Once the data is initialized once, we don’t have to query the persistent data store again.
Users and automated systems can run the same queries. This ensures that when a user schedules a report to be distributed to them regularly, they get the same data that they see when they run the report manually.

HORIZONTAL SCALABILITY

We wanted the implementation to be horizontally scalable: during the periods of the month, where automated report generation is high, our system to scale to meet the demand. Since we run all our infrastructure on Kubernetes, we could simply increase the number of replicas using a horizontal pod autoscaler (HPA).

The challenge is that these SQLite instances are stateful, meaning that when a user sends a query, it needs to be sent to the same instance that they used before. We don’t want a user’s second request to get sent to a second instance of the query engine when the first instance already has their in-memory database initialized. Simple round-robbin routing across the replicas would not work.

The solution was actually remarkably simple: we already handled all of our routing with Istio. Using an istio destination rule, we could ensure that subsequent requests using the same HTTP header (httpQueryParameterName) would be routed to the same instance of the query engine. This way, we could scale the query engine horizontally and still ensure that users’ requests were routed to the same instance.