One of the primary responsibilities of a data architect is to maintain a balanced workload on data systems. This can be a daunting task that involves a variety of complexities. Instead of focusing on the details of those complexities, I want to introduce the basic concepts from which they stem. I will provide a real-world example to illustrate how these concepts apply and finish by summarizing why a data architect's balancing act matters.
At a high level, two basic groups of processes make up the workload on a data system. Extract, Transform, Load (ETL) processes bring data into the system, while business intelligence (BI) processes extract information from it. Some characteristics of each are listed below.
- ETL
  - Processes for acquiring, transforming, and integrating data
  - Business relevance
- BI
  - Processes for requests, transactions, and queries accessing data
  - Business value
A data architect must keep the system in balance as dictated by the business. To illustrate this, consider a typical 24-hour weekday for a clothing retailer. The day starts just after midnight, when the longer-running ETL processes are run during off hours. For example, my fiancé and I recently registered at Kohl's, and we were told our selected items could not be uploaded to our registry until midnight. This suggests that Kohl's runs the ETL processes that load data into its central registry system at night, when few BI processes are taking place. Come 8:00 AM, the long-running ETL processes fade out as business users log on and begin using BI processes to run ad-hoc queries and reporting applications. Certain data that is pertinent to business operations, such as data for loyalty rewards call centers, still needs to be loaded throughout the day in small increments. After 5:00 PM, users begin logging off and ETL processes gradually ramp up to load large amounts of data again.
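The daily rhythm described above can be expressed as a simple time-window policy. The sketch below is a minimal illustration under stated assumptions, not how Kohl's or any particular scheduler actually works: the 8:00 AM and 5:00 PM boundaries come from the example, while the three job categories and the function names `window_for` and `may_run` are hypothetical.

```python
from datetime import time

# Assumed window boundaries, taken from the retailer example:
# business hours run from 8:00 AM to 5:00 PM; everything else is off hours.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)


def window_for(t: time) -> str:
    """Classify a time of day as 'business' or 'off_hours'."""
    if BUSINESS_START <= t < BUSINESS_END:
        return "business"
    return "off_hours"


def may_run(job_kind: str, t: time) -> bool:
    """Return True if a job of the given (hypothetical) kind may start at time t.

    Job kinds:
      'bulk_etl'        - long-running loads, such as a nightly registry load
      'incremental_etl' - small loads that must trickle in during the day
      'bi'              - ad-hoc queries and reporting
    """
    window = window_for(t)
    if job_kind == "bulk_etl":
        # Heavy loads are deferred to off hours so they do not compete with BI users.
        return window == "off_hours"
    if job_kind in ("incremental_etl", "bi"):
        # Small incremental loads and BI requests are allowed at any hour.
        return True
    raise ValueError(f"unknown job kind: {job_kind!r}")
```

For example, `may_run("bulk_etl", time(0, 5))` returns `True`, while `may_run("bulk_etl", time(10, 0))` returns `False`. Note that incremental loads and BI queries are both permitted during business hours, so the two workloads still share the system throughout the day.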
This daily cycle creates competition for resources, so the available resources must be used optimally. The database or warehouse should be designed so that ETL processes do not take an excessive amount of time to load data, and BI processes do not take an excessive amount of time to retrieve data and present valuable information. The right data needs to be available at the right time, just as the valuable information needs to be readily accessible at the right moment. The data needs to be loaded accurately, and the information needs to be aggregated appropriately. A data architect must take both ETL and BI requirements into consideration when designing a data system to ensure a balanced workload, which means designing the database and its surrounding processes with both in mind.
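One way to make this competition concrete is to treat the system's capacity as a fixed pool of concurrent slots and split it between the two workloads. The sketch below assumes such a pool exists (some warehouses expose something similar through workload-management settings, but the function `allocate_slots` and its parameters are hypothetical). It uses a simple priority-with-floor rule: BI receives the slots it requests, capped so that a reserved minimum always remains for ETL, and ETL receives the rest.

```python
def allocate_slots(total_slots: int, bi_demand: int, min_etl_slots: int) -> tuple[int, int]:
    """Split a fixed pool of concurrent slots between BI and ETL work.

    Hypothetical policy: BI gets what it asks for, capped so that at least
    min_etl_slots always remain for ETL (keeping incremental loads from
    starving); ETL gets whatever is left. Returns (bi_slots, etl_slots).
    """
    if total_slots < 0 or bi_demand < 0 or min_etl_slots < 0:
        raise ValueError("slot counts must be non-negative")
    if min_etl_slots > total_slots:
        raise ValueError("reserved ETL slots exceed the total pool")
    # Cap BI so the ETL floor is always honored.
    bi_slots = min(bi_demand, total_slots - min_etl_slots)
    # Any capacity BI does not use goes to ETL.
    etl_slots = total_slots - bi_slots
    return bi_slots, etl_slots
```

During business hours, heavy BI demand such as `allocate_slots(20, 30, 2)` leaves ETL only its floor, `(18, 2)`; at night, low BI demand such as `allocate_slots(20, 1, 2)` hands almost the whole pool to ETL, `(1, 19)`. A reserved floor is only one possible design; weighted shares or queue priorities are common alternatives.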