CapTech conducted a data strategy assessment for the PGA of America (PGA), one of the world’s largest sports organizations, with the goal of establishing the largest golf consumer data platform in the industry. The PGA produces a wealth of golf consumer data through its day-to-day operations, but this data was heavily siloed within each individual operational pillar. This structure limited the PGA’s ability to realize shared value from its consumer data assets at an enterprise level. Interviews with stakeholders revealed a need to create a unified consumer data environment. CapTech collaborated with the PGA to create a data warehouse hosted on Amazon Web Services (AWS) that end-users from across the organization could access to drive business value.
The PGA had limited access to valuable data across its business units and lacked data governance and lineage, which left significant potential untapped in large volumes of data that could be related across sources. At each professional tournament, the PGA produced retail, volunteer, event ticket sales, event hospitality sales, and marketing data. These data sources were operated separately, so one operational team could not easily access another team’s data to drive value. This model created data silos within the organization. The challenge was to create a centralized data platform that would drive value for the PGA and consumers alike.
Another challenge was finding a way to implement this process at scale. With millions of rows of data and hundreds of columns in some data sources, it was necessary to carefully consider how to create a reliable, valuable, timely, and cost-effective data warehouse.
The PGA currently uses AWS for its computing environment and opted to continue using AWS to build the consumer data platform. The first step was to prioritize the data sources that would ultimately populate the data warehouse. These sources were then ingested in raw form into Amazon S3, AWS’s Simple Storage Service. S3 serves as a data lake, a place to store raw, unstructured data from all ingested source systems. To bring these varying data sources together, CapTech implemented a different ingestion pattern for each use case, using Amazon AppFlow, webhooks, AWS Lambda functions, and JDBC connections through AWS Glue.
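As an illustration of the webhook pattern, the following is a minimal sketch of a Lambda-style handler that stores an incoming payload unchanged in a raw zone. It is not the PGA's implementation: the key layout, the `source` and `id` fields, and the injected `put_object` callable are all assumptions made so the example runs without an AWS account. In a real function, `put_object` could wrap an S3 client call.

```python
import json
from datetime import datetime, timezone


def raw_object_key(source: str, received_at: datetime, event_id: str) -> str:
    """Build a date-partitioned key for the raw zone (hypothetical layout)."""
    return (
        f"raw/{source}/"
        f"year={received_at:%Y}/month={received_at:%m}/day={received_at:%d}/"
        f"{event_id}.json"
    )


def handle_webhook(event: dict, put_object, now=None) -> dict:
    """Store the webhook body as received and report where it was written.

    `put_object(key, body)` is injected (an assumption of this sketch) so the
    storage backend can be S3 in production or a plain dict in a test.
    """
    now = now or datetime.now(timezone.utc)
    body = event.get("body") or "{}"
    payload = json.loads(body)  # parsed only to read the id; body is stored raw
    key = raw_object_key(event["source"], now, str(payload["id"]))
    put_object(key, body)
    return {"statusCode": 200, "body": json.dumps({"stored": key})}
```

Keeping the stored body identical to what the source sent means later validation rules can change without re-ingesting anything.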
From there, the data went through validation and initial cleaning, a minimal process to standardize email and phone number formats while dropping null values. Data that did not pass validation requirements was sent to “quarantine,” where it would be stored but not included in the data warehouse.
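The validation step can be sketched roughly as follows. The specific rules here are assumptions for illustration, not the rules CapTech used: null-valued fields are dropped, an email is required and must match a simple pattern, and a phone number (if present) must reduce to a 10-digit or 11-digit North American number. Records that fail are returned separately so they can be stored in quarantine.

```python
import re

# Deliberately simple pattern; real-world email validation is looser or stricter
# depending on the business rule (assumption).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(value):
    """Trim and lowercase an email; return None if it does not look valid."""
    if value is None:
        return None
    email = value.strip().lower()
    return email if EMAIL_RE.match(email) else None


def normalize_phone(value, country_code="1"):
    """Reduce a phone number to +<country><10 digits>; return None on failure."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        digits = country_code + digits
    if len(digits) == 11 and digits.startswith(country_code):
        return "+" + digits
    return None


def validate(records):
    """Split records into (trusted, quarantine) lists."""
    trusted, quarantine = [], []
    for record in records:
        # Drop null-valued fields before checking the rest.
        cleaned = {k: v for k, v in record.items() if v is not None}
        email = normalize_email(cleaned.get("email"))
        phone_ok = True
        if "phone" in cleaned:
            cleaned["phone"] = normalize_phone(cleaned["phone"])
            phone_ok = cleaned["phone"] is not None
        if email is None or not phone_ok:
            quarantine.append(record)  # keep the original for later inspection
        else:
            cleaned["email"] = email
            trusted.append(cleaned)
    return trusted, quarantine
```

Quarantined records are kept in their original form, which makes it possible to reprocess them if the rules are later relaxed.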
Once trusted data was identified, extensive transformation was done using AWS Glue before loading into the data warehouse. Because of the number of sources, the data needed to be joined, split apart, and reshaped to match the tables and data types of the target system for a successful data load. This process was based upon an Entity Relationship Diagram (ERD), which is a means of visualizing the schema of a relational database. End-users can then use the ERD to understand the tables, fields, and data types within the data warehouse and write SQL queries for their analytical efforts.
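To make the idea of reshaping sources into a target schema concrete, here is a small plain-Python sketch (the actual work was done in AWS Glue, which is not shown here). It joins two cleaned sources on email and casts values to the types a target table might expect. The source field names, the target columns, and the choice of email as the join key are all hypothetical.

```python
from datetime import date
from decimal import Decimal


def _empty_row(email):
    """Start a target-table row with typed defaults (hypothetical columns)."""
    return {
        "email": email,
        "ticket_spend": Decimal("0"),
        "retail_spend": Decimal("0"),
        "last_activity": None,
    }


def _later(current, candidate):
    """Return the more recent of two dates, treating None as 'no date yet'."""
    return candidate if current is None or candidate > current else current


def to_consumer_rows(ticket_sales, retail_orders):
    """Join ticket and retail records on email and cast to target types."""
    rows = {}
    for sale in ticket_sales:
        row = rows.setdefault(sale["email"], _empty_row(sale["email"]))
        row["ticket_spend"] += Decimal(sale["amount"])
        row["last_activity"] = _later(
            row["last_activity"], date.fromisoformat(sale["sold_on"])
        )
    for order in retail_orders:
        row = rows.setdefault(order["email"], _empty_row(order["email"]))
        row["retail_spend"] += Decimal(order["total"])
        row["last_activity"] = _later(
            row["last_activity"], date.fromisoformat(order["ordered_on"])
        )
    # Sort for a stable load order.
    return sorted(rows.values(), key=lambda r: r["email"])
```

Amounts are parsed as `Decimal` rather than `float` so that monetary totals are not subject to binary rounding before they reach the warehouse.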
The new PGA Consumer Data Mesh environment is built to scale to any number and format of endpoints. At its foundation, enriched consumer data is persisted into a structured data mart built on Amazon Redshift. The PGA’s technical and operations teams can access unstructured but validated and enriched data directly in S3 via Amazon Athena. Finally, CapTech implemented a direct file ingestion framework for integration into the PGA’s Marketing Technology stack.
The solution resulted in a centralized repository that end-users can leverage to quickly retrieve data for business reporting and intelligence. It also enabled permissions at an enterprise user level, rather than per system, allowing the PGA to establish a well-structured, role-based approach to accessing sensitive data. Among the highlights:
Reporting time was reduced from hours or days to minutes.
Validation processes have captured over 30 GB of malformed data in production instances, which translates to 250,000 records that can be used in reports to drive critical business decisions.
These key benefits pave the way for the PGA to become a data-driven business as it continues its mission to become one of the largest golf consumer data organizations in the world. With a strong foundation in place, the PGA can continue to build on top of its capabilities and unlock features like consumer segmentation, propensity modeling, frictionless transactions, and more.