Cloudspace | Common Mistakes When Building Analytics Platforms: "I need real-time data"

Posted on March 19, 2013 by Tim Rosenblatt

Business is moving faster than it used to. This is a cliche, and you're probably rolling your eyes that I started a blog post with that line. But it's important to keep in mind, because of what I'm about to tell you.

Because things really do move faster than they used to, any chance to get ahead of the competition is seen as a good thing. Real-time has become what everyone aspires to, and with good reason; in a sense, even faster-than-real-time is on the way.

Knowing what's going on right now means you can make better decisions, and that's the whole point of an analytics platform.

There is a catch when you ask an engineer for real-time data in an analytics platform, and it's a big one: real-time data can be very expensive, and it is almost never needed when near-real-time is available as an alternative.

The main thing you need to think about is called latency -- how long it takes for new data to show up in the platform after something happens.
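To make latency concrete, here is a minimal Python sketch that measures it as the gap between when an event happened and when it became available in the analytics store. The record fields `occurred_at` and `loaded_at`, the sample timestamps, and the 15-minute target are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical event records: when each thing happened, and when it
# became visible in the analytics store.
events = [
    {"occurred_at": datetime(2013, 3, 19, 9, 0), "loaded_at": datetime(2013, 3, 19, 9, 12)},
    {"occurred_at": datetime(2013, 3, 19, 9, 5), "loaded_at": datetime(2013, 3, 19, 9, 12)},
    {"occurred_at": datetime(2013, 3, 19, 9, 20), "loaded_at": datetime(2013, 3, 19, 9, 27)},
]

def worst_latency(records):
    """Return the longest delay between an event happening and it showing up."""
    return max(r["loaded_at"] - r["occurred_at"] for r in records)

# Hypothetical latency target for this data source.
TARGET = timedelta(minutes=15)

latency = worst_latency(events)
within_target = latency <= TARGET
print(latency, within_target)  # 0:12:00 True
```

Measuring the worst case rather than the average reflects what a user actually experiences: the stalest number on the report.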

If you can wait 15 minutes for data to roll in, you'll save yourself a lot of money. If you can wait 4 or more hours, there's almost no added cost to make the system perform at that level. For what it's worth, Target's faster-than-real-time system could easily have 24-hour latency. They're looking for long-term trends, not 5-second trends.

One other thing to keep in mind: real-time is not an all-or-nothing thing. Even within a single platform, different data sources can have different latencies. You can pick and choose: if one data source needs 5-minute updates but four others only need 24-hour updates, you can set it up that way.
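Here is a minimal Python sketch of per-source latency, assuming a hypothetical scheduler that refreshes each source on its own interval. The source names, the intervals, and the `sources_due` function are made up for illustration; they mirror the example of one 5-minute source alongside four 24-hour ones.

```python
from datetime import datetime, timedelta

# Hypothetical sources and latency targets: one source refreshed every
# 5 minutes, four others once a day.
REFRESH_INTERVALS = {
    "web_clicks": timedelta(minutes=5),
    "orders": timedelta(hours=24),
    "inventory": timedelta(hours=24),
    "crm_contacts": timedelta(hours=24),
    "ad_spend": timedelta(hours=24),
}

def sources_due(last_refreshed, now):
    """Return, sorted by name, the sources whose data is older than their refresh interval."""
    return sorted(
        name
        for name, interval in REFRESH_INTERVALS.items()
        if now - last_refreshed[name] >= interval
    )

now = datetime(2013, 3, 19, 12, 0)
last_refreshed = {
    "web_clicks": datetime(2013, 3, 19, 11, 54),  # 6 minutes ago
    "orders": datetime(2013, 3, 19, 1, 0),        # 11 hours ago
    "inventory": datetime(2013, 3, 18, 11, 0),    # 25 hours ago
    "crm_contacts": datetime(2013, 3, 19, 1, 0),  # 11 hours ago
    "ad_spend": datetime(2013, 3, 19, 1, 0),      # 11 hours ago
}
print(sources_due(last_refreshed, now))  # ['inventory', 'web_clicks']
```

In this example only the click stream and the stale inventory feed get refreshed; the other daily sources are left alone, so the frequent (and more expensive) work is limited to the one source that actually needs it.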