Based on this chart with numbers from DB-Engines.com, one might think time-series databases are the best thing since sliced bread. I wouldn't go that far, but it does pique my interest.
What is driving this change? Why has this growth exploded in the last 12 months? What is this trend signaling?
It is important to note that this chart only shows the percentage change in popularity relative to other database types. According to DB-Engines.com, this popularity ranking is calculated from many factors, including but not limited to: number of mentions on websites, general interest, the frequency of technical discussions, relevance in social networks, and number of job listings.
Before we can understand the trends, we must first have a firm understanding of time-series data and databases. If you're like me, you had never heard of time-series databases until recently.
What is Time-series Data?
Time-series data is a sequence of data points, typically consisting of successive measurements made from the same source over a period of time. Generally, it's any data that has a timestamp, like sensor measurements, system stats, and log files. For example:
- Many enterprise websites use time-series data to measure the performance of the web application and to help predict issues before they occur. For example, let's say you have an architecture that scales based on the response time of the server. When the time-series data shows a threshold has been hit, a new server can spin up to balance the load.
- Another example is a package delivery company. We all know that feeling of checking the status of a package that is needed before an important event. The ability of these organizations to ensure on-time delivery relies on knowing where your package is and how long it spends at each step of the transit, and on using that data to optimize the process for a faster delivery time.
- Time-series data does not have to be produced by systems. It was collected long before databases were invented in the 1960s. Stock prices and weather information used to be recorded in ledgers, which is a form of time-series data.
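To make the scaling example concrete, here is a minimal sketch of a threshold check over recent response times. The function name, the window size, and the 500 ms threshold are all assumptions chosen for illustration, not values from any particular autoscaler.

```python
from collections import deque
from statistics import mean


def should_scale_up(response_times_ms, window=5, threshold_ms=500.0):
    """Return True if the average of the last `window` samples exceeds the threshold."""
    recent = deque(response_times_ms, maxlen=window)  # keeps only the newest samples
    if len(recent) < window:
        return False  # not enough data yet to make a decision
    return mean(recent) > threshold_ms


# Hypothetical response times in milliseconds, oldest first
samples = [120, 130, 480, 650, 700, 720, 810]
print(should_scale_up(samples))
```

Averaging over a window rather than reacting to a single slow request is one way to avoid spinning up servers on a momentary spike.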
Time-series data comes in two forms, regular and irregular.
- Regular: Data coming in at a fixed interval like a weather monitor recording the temperature every minute. You do not know what the temperature reading will be, but you know to expect it every minute.
- Irregular: Data is recorded at irregular time intervals. For example, a website may log every time a user interacts with the contact form. Data will be logged but you do not know when.
Irregular data collection is based on events; regular data collection is based on measurements/metrics.
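The difference between the two forms shows up in the gaps between timestamps. The sketch below is one rough way to tell them apart; the helper name and the one-second tolerance are assumptions, and it expects the timestamps to already be sorted.

```python
from datetime import datetime, timedelta


def is_regular(timestamps, tolerance=timedelta(seconds=1)):
    """Return True if consecutive timestamps are evenly spaced within `tolerance`."""
    if len(timestamps) < 3:
        return True  # too few points to observe any irregularity
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return max(gaps) - min(gaps) <= tolerance


start = datetime(2017, 1, 1, 12, 0, 0)

# Regular: a temperature reading every minute
temperature_times = [start + timedelta(minutes=i) for i in range(5)]

# Irregular: contact-form submissions arriving whenever users act
form_times = [start + timedelta(seconds=s) for s in (0, 42, 310, 315, 2000)]

print(is_regular(temperature_times))  # evenly spaced measurements
print(is_regular(form_times))         # event-driven timestamps
```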
If you think this type of data can be stored in another database type, you are correct. Many databases can handle time-series data, but they are not optimized to do so. In the next section, I highlight the features of time-series databases.
What are Time-series Databases?
Time-series databases are optimized for collecting, storing, retrieving, and processing time-series data. Compare this to:
- Document databases which are optimized for storing documents.
- Search databases which are optimized for full-text searches.
- Traditional relational databases which are optimized for the tabular storage of related data in rows and columns, and transactional data.
As noted above, time-series data is not a new concept. However, the need to process the growing amount of time-series data is becoming a challenge for many enterprises.
Time-series databases offer the following features:
- High Write Speeds: OLTP databases normally optimize their partitioning and indexing for updates. With time-series data you're mostly inserting at the end of the set, so time-series databases have been optimized for that specific use case.
- Data Life Cycle Management: Typically, time-series data needs to be kept at a low-level granularity for the first week. As time passes, you may want to summarize the data and archive it to save space. Most time-series databases make this straightforward out of the box.
- Efficient Long-Range Reads: Relational database indexes are generally not optimized to return a long time range for specific tags. Time-series databases are optimized for these types of reads, but they lack functionality to perform complex queries efficiently.
- Time-Series Functions: Most time-series databases have built-in functions like time aggregations, continuous queries, and math operations.
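The features above can be sketched with a toy in-memory series. This is only an illustration of the ideas (append-only writes, range reads over sorted timestamps, and downsampling into time buckets); the `TinySeries` class and its method names are hypothetical and do not reflect how any real time-series database is implemented.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from statistics import mean


class TinySeries:
    """Toy in-memory series; assumes points arrive in timestamp order."""

    def __init__(self):
        self.times = []   # epoch seconds, sorted because writes are append-only
        self.values = []

    def append(self, ts, value):
        # Writes only ever go to the end, so nothing has to be reshuffled
        if self.times and ts < self.times[-1]:
            raise ValueError("out-of-order write")
        self.times.append(ts)
        self.values.append(value)

    def read_range(self, start, end):
        # Binary search finds the slice boundaries for start <= ts <= end
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))

    def downsample(self, bucket_seconds):
        # Average all points that fall into the same fixed-width time bucket
        buckets = defaultdict(list)
        for ts, value in zip(self.times, self.values):
            buckets[ts - ts % bucket_seconds].append(value)
        return {start: mean(vals) for start, vals in sorted(buckets.items())}


series = TinySeries()
for minute in range(120):                 # two hours of per-minute readings
    series.append(minute * 60, 20.0 + minute % 2)

hourly = series.downsample(3600)          # one averaged point per hour
first_ten_minutes = series.read_range(0, 600)
```

Here 120 per-minute readings collapse into two hourly averages, which is the kind of summarizing a retention policy might apply to older data.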
Popular time-series databases include InfluxDB, RRDtool, Graphite, OpenTSDB, and Prometheus. Some NoSQL databases like MongoDB or Cassandra offer ways to efficiently hold time-series data but can be a more significant lift to set up.
3 Trends Impacting the Uptick in Popularity for Time-Series Databases
- Growth in the IoT Market: According to Gartner, 8.4 billion connected "things" will be in use in 2017, up 31% from 2016, and that number will reach 20.4 billion by 2020. Many of these devices produce time-series data through sensors and logs. Autonomous cars, for example, will generate and consume roughly 40 terabytes of data for every eight hours of driving, according to Intel CEO Brian Krzanich, speaking at the auto show's technology pavilion, Automobility. Even with some IoT data being summarized by edge analytics, there is still a massive inflow of data that needs to be processed efficiently.
- The Growth of Machine Learning: According to a study performed by McKinsey, tech giants including Baidu and Google spent between $20B and $30B on AI in 2016, with 90% of this spent on R&D and deployment and 10% on AI acquisitions. This level of investment has a direct impact on the popularity of time-series databases, as many AI/ML use cases rely on time-series data. For example, machine learning is helping companies identify problems from a loss of traffic even when no error codes are being logged. Time-series data is needed both to train the model and to make predictions.
- Need for High Throughput Efficiency: Gone are the days of monthly or even nightly data loads. Companies lose money every time systems go down, and the ability to understand system and application performance in near real time allows fixes to occur before an issue becomes a problem.
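As a rough illustration of spotting a loss of traffic when no error code is logged, the sketch below flags points that fall far below the recent average. It uses a rolling z-score, which is a simple statistical stand-in rather than a trained machine learning model; the function name, window size, and threshold are assumptions.

```python
from statistics import mean, stdev


def traffic_drops(requests_per_minute, window=5, z_threshold=3.0):
    """Return indexes whose value sits far below the mean of the preceding window."""
    flagged = []
    for i in range(window, len(requests_per_minute)):
        history = requests_per_minute[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined, skip this point
        z = (requests_per_minute[i] - mu) / sigma
        if z < -z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical per-minute request counts; traffic collapses at index 6
counts = [1000, 1010, 990, 1005, 995, 1002, 200, 1001]
print(traffic_drops(counts))
```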
The ability of enterprises to pivot and adapt to the ever-changing market is paramount to their success. Think about Blockbuster, Circuit City, and Sears. While these companies failed for multiple reasons, I think one common theme is their inability to capitalize on newer technology.
Companies need to understand what data they can collect to gain insight into the market and their customers for a better customer experience. Those that can successfully gather insights from data in near real time will have the upper hand.
I believe the sharp increase in the popularity of time-series databases is a strong indicator that many companies are facing challenges with using relational databases to solve their use cases. Companies are looking for other ways to solve their time-series data problems to capitalize on the influx of data available.