Our clients have many different teams moving data to the lake and, as we've seen in many cases, the data is redundant, resulting in multiple copies of the same data being stored. Worse than storing multiple copies of the same data is the time spent building the processes to clean, load, and validate this data. Data scientists should be spending their time testing hypotheses rather than finding data, wrangling data, and working with the IT team to get that data loaded.
Alfred addresses these needs by putting a process in place to manage data flowing into the lake while keeping it organized and curated. This process gives data scientists the ability to quickly load their own experimental data without IT intervention, while also giving the data governance team peace of mind that data is catalogued, secure, validated, and well managed. A well-curated set of business metadata also gives data consumers an easier way to find specific data. When insights are ready to be operationalized, Alfred provides IT with the code they need to quickly set up the ingestion and refinement processes developed by the data science team. Increased speed to insight, easy-to-find data, simple to operationalize: everybody wins with Alfred!
CapTech created Alfred without a vendor solution in mind. This means that data scientists and IT can continue to use the data movement and analysis tools of their choice. Alfred is flexible enough to work with data lakes in the cloud, on premises, or both, and supports XML, JSON, and delimited datasets. Further, there is no requirement to use a specific data mining or data wrangling tool, which gives users the flexibility to adjust to an evolving set of technologies.
Open sourcing our solution provides CapTech with the unique opportunity to share our thought leadership with our clients and gives our consultants a way to give back to the data engineering community. Our hope is that Alfred provides a starting point for analytics-hungry organizations to solve their modern data governance challenges.
Alfred is available on GitHub: https://captechconsulting.github.io/alfred/.