Print Tracker collects a lot of data from customer printers, and providing actionable insights is the only way to make all of that data valuable. You can think of the data as densely packed time series: tens to hundreds of data points for a single device attached to a single timestamp. Many of the insights customers care about require looking at trends in this data, so to analyze a single data point we may need to load several of the previous data points for comparison.
We solved this problem by implementing a processing queue that uses Redis lists, sets, and streams to queue up new data that needs to be processed and to store the most recent *n* data points. Whenever we process one data point, the previous data points are already loaded in memory and ready for analysis, which avoids expensive database calls against our time-series collection.
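As a rough illustration, here's a minimal sketch of that write-then-read pattern using the `redis` crate (with its `streams` feature enabled). The key names (`meters:device-123`, `meters:pending`), the field names, and the window size are placeholders for this example, not our actual schema:

```rust
use redis::{streams::StreamRangeReply, Commands};

// Hypothetical window size: how many previous readings we keep around.
const RECENT_POINTS: usize = 25;

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;

    // Hypothetical key layout: one stream per device.
    let stream_key = "meters:device-123";

    // Append the newest data point to the device's stream ("*" = auto-generated ID).
    let _id: String = con.xadd(
        stream_key,
        "*",
        &[("page_count", "48213"), ("toner_black", "62")],
    )?;

    // Push the stream key onto the work queue so a worker knows there is work to do.
    let _: () = con.lpush("meters:pending", stream_key)?;

    // Load the most recent N data points (newest first) for trend analysis,
    // without touching the time-series database.
    let recent: StreamRangeReply =
        con.xrevrange_count(stream_key, "+", "-", RECENT_POINTS)?;
    for entry in recent.ids {
        println!("{} -> {:?}", entry.id, entry.map);
    }

    Ok(())
}
```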
After the data is analyzed, we trim the Redis streams so that we only keep the data points that are needed when the next data point is received.
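The trim itself is a single `XTRIM`. A sketch of that step, again with an assumed key name and window size; the `~` (approximate) form lets Redis trim lazily, which is cheaper than an exact trim:

```rust
use redis::{streams::StreamMaxlen, Commands};

// After a data point has been analyzed successfully, cap the stream so it only
// retains the window needed when the next data point arrives.
fn trim_processed_stream(
    con: &mut redis::Connection,
    stream_key: &str,
) -> redis::RedisResult<()> {
    // XTRIM <key> MAXLEN ~ 25; returns how many entries were removed.
    let _removed: usize = con.xtrim(stream_key, StreamMaxlen::Approx(25))?;
    Ok(())
}
```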
The queue is designed to be fault-tolerant: we make sure that every data point is analyzed, and we don't trim the streams until we've successfully processed the data point. If the queue crashes, it picks up where it left off once a new data point is received. We found that most errors are type-related: missing fields, unexpected values, and so on. By modeling data points with Rust's type system and the serde crate, malformed payloads are rejected at the deserialization boundary, and everything downstream of that boundary is checked at compile time.
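To give a flavor of what that boundary looks like, here is a minimal, hypothetical data-point type; the real structs are larger, and `serde_json` plus the field names here are assumptions made for the sake of the example:

```rust
use serde::Deserialize;

// Hypothetical shape of one incoming data point.
#[derive(Debug, Deserialize)]
struct MeterReading {
    device_id: String,
    timestamp: i64,
    page_count: u64,
    // Fields that may legitimately be absent are modeled explicitly.
    toner_black: Option<u8>,
}

fn parse_reading(raw: &str) -> Result<MeterReading, serde_json::Error> {
    // A missing `page_count` or a value that doesn't fit the type fails here,
    // at the edge, instead of surfacing deep inside the analysis code.
    serde_json::from_str(raw)
}

fn main() {
    let ok = parse_reading(r#"{"device_id":"device-123","timestamp":1700000000,"page_count":48213}"#);
    let bad = parse_reading(r#"{"device_id":"device-123","timestamp":1700000000,"page_count":-5}"#);
    println!("{}", ok.is_ok());   // true
    println!("{}", bad.is_err()); // true: -5 doesn't fit a u64
}
```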
Because we use Redis as the primary data structure for queueing data points that need to be processed, we can run as many instances of the Rust service as we need. Redis lists support the `BLPOP` command, which lets multiple clients block until data is available; blocked clients are served in the order their requests were received, so the list acts as a naive load balancer.
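A worker loop built on that idea might look roughly like the sketch below; the queue key and the use of a plain blocking connection are assumptions to keep the example short:

```rust
fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;

    loop {
        // BLPOP blocks until an item is available on the queue. Redis hands the
        // item to whichever blocked worker asked first, so running N copies of
        // this binary spreads the load with no extra coordination.
        // (Timeout 0 = block forever.) The reply is a [queue, value] pair.
        let (_queue, stream_key): (String, String) = redis::cmd("BLPOP")
            .arg("meters:pending")
            .arg(0)
            .query(&mut con)?;

        // Analyze the data point using the recent history already in its stream,
        // then trim the stream (see the earlier sketches).
        println!("processing {stream_key}");
    }
}
```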