Case Study

Our collaborative approach helps businesses grow, engage their customers and turn data into powerful insights.

A Fortune 500 energy company had a data warehouse that could not keep up with the volume of data generated by its smart meters and grid monitoring technologies. Limited computer processing power also kept the company from using its customer data in a meaningful way. With over 3 million customers, data analysis became a time-consuming, piecemeal process. By creating a data ingestion product integrated with a secure cloud solution, CapTech helped the organization store and analyze its data quickly and make better predictions of its customers’ energy usage. Now, the client can leverage its data to improve customer experience, save money, use resources more efficiently, reduce its ecological impact, and optimize marketing efforts.

Business Case

A workflow bottleneck prevented a Fortune 500 energy company from performing data analysis on its entire customer population. Analysts queried a data warehouse and pulled the results straight to their local machines for further examination. With a customer base of over 3 million, this resulted in a time-consuming, piecemeal analysis. The unacceptable processing times restricted the analysts’ ability to deliver actionable marketing insights to company decision-makers.
Meanwhile, the amount of data available to the company continued to grow rapidly due to their increased adoption of residential smart meters. Their customer analytics team was looking for a solution that could process large volumes of data, integrate internal and external data, and allow the team to become more predictive and proactive, rather than descriptive and reactive.


The client’s customer analytics team engaged CapTech to first propose an architecture and develop a roadmap that would allow the team to set up, maintain, and use a big data insight platform. We implemented a big data cloud solution that uses a Hadoop cluster, with RStudio Server and SparkR for analysis. The solution relies on open-source tools, which saved costs and integrated easily with the client’s existing systems. The platform allows the client’s analysts to interact with large volumes of data from different sources and generate analytical insights. Within the first week, the cloud tool was up and running and able to process up to 40 terabytes of data, a capacity that can be scaled as needed.

We also developed a data governance policy that covers security, data requisition, roles and responsibilities to support the platform, data usage guidelines, and a metadata policy. This policy allows the client to expand its data governance across the enterprise and provides a framework that enables smart decisions. Part of the policy implementation was a custom data ingestion engine, created by CapTech, that acts as a gatekeeper to prevent ungoverned data from being loaded onto the platform. This process simplifies data management and ensures a documented data catalog to assist in data audits and lineage tracking.
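
The gatekeeper idea can be illustrated with a minimal Python sketch. It is not CapTech's ingestion engine: the required metadata fields, the `DataCatalog` class, and the `ingest` function are all hypothetical names chosen for illustration. The sketch only shows the general pattern of refusing a dataset until its governance metadata is complete, and recording accepted datasets in a catalog.

```python
from dataclasses import dataclass, field

# Hypothetical governance fields; a real policy would define its own list.
REQUIRED_METADATA = ("owner", "source_system", "description", "classification")


@dataclass
class DataCatalog:
    """In-memory stand-in for a documented data catalog."""
    entries: dict = field(default_factory=dict)

    def register(self, dataset_name, metadata):
        # Store a copy so later changes by the caller do not alter the catalog.
        self.entries[dataset_name] = dict(metadata)


def missing_metadata(metadata):
    """Return the required fields that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not metadata.get(key)]


def ingest(dataset_name, metadata, catalog):
    """Accept a dataset only if its governance metadata is complete.

    Returns (accepted, missing_fields). Rejected datasets never reach
    the catalog, so everything on the platform stays documented.
    """
    missing = missing_metadata(metadata)
    if missing:
        return False, missing
    catalog.register(dataset_name, metadata)
    return True, []
```

In this sketch, a call such as `ingest("ad_hoc", {"owner": "analytics"}, catalog)` is rejected and reports the three fields that still need to be supplied.
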
Throughout the process, CapTech provided ongoing training to the client’s data analysts on how to use the architecture and tools most effectively, so that they could operate self-sufficiently.


The cloud solution we implemented allows the energy company’s analysts to access data in a fraction of the time that was previously possible. Some queries that used to take days can now be completed in minutes. This has freed previously stalled analysts to respond to requests quickly and leverage data more effectively. Now that the client can store and analyze all of its data, it can make better predictions of its customers’ energy usage.

These predictions have a number of downstream benefits. With this new information, the client can:
• actively monitor the way that customers are using energy and see when energy usage drops. This could allow them to send out a truck to fix problems before customers even realize the power has gone off. 
• more accurately prepare their supply to prevent blackouts and brownouts. 
• lower margins on excess energy production, which saves money for both customers and the client.
• utilize fewer resources and reduce their ecological impact.  
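
As a rough illustration of the first item in the list above, the following Python sketch flags a meter whose latest reading falls far below its recent average. It is not the client's method: the function name, the 24-reading window, and the 20 percent threshold are assumptions made for the example, and a production model would also account for seasonality and normal day-to-day variation.

```python
from statistics import mean


def usage_dropped(readings_kwh, window=24, drop_ratio=0.2):
    """Return True if the latest reading is far below the trailing average.

    readings_kwh: chronological meter readings, most recent last.
    window:       number of earlier readings used as the baseline (assumed).
    drop_ratio:   fraction of the baseline below which a drop is flagged (assumed).
    """
    # Not enough history to form a baseline.
    if len(readings_kwh) < window + 1:
        return False
    # Baseline is the mean of the `window` readings before the latest one.
    baseline = mean(readings_kwh[-(window + 1):-1])
    latest = readings_kwh[-1]
    return baseline > 0 and latest < drop_ratio * baseline
```

A meter that has reported about 1.0 kWh per interval and then reports 0.0 would be flagged, while a dip to 0.9 would not.
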

Additionally, the energy company will be able to market to its customers more effectively with this information. For example, by monitoring for an unusual energy pattern, the company could estimate how old a customer’s heat pump is and then target its marketing efforts accordingly. As data needs increase or decrease, the platform can be quickly scaled to handle different workloads.

Tools & Methodologies

  • Agile/Scrum
  • BDaaS
  • Python
  • Pydoop
  • Impyla
  • Java
  • Spring
  • Spring Boot
  • JavaScript
  • React
  • RStudio
  • Git
  • Spark
  • SparkR
  • Spark MLlib
  • Hive
  • Hadoop