This post is part of a series about the open-source data ingestion engine Alfred. For an overview of Alfred, read this blog. You can also learn what Alfred means for data scientists and how using the tool can save you time and money.
Data stewards benefit from Alfred's features because it was designed by actual CapTech scientists with data governance in mind. These design decisions were driven by our clients' needs around data security, data lineage, data quality, and metadata access.
Alfred extends table- and field-level technical and business metadata to data stewards and business users alike. The technical metadata assures the quality of all data ingested into the platform, while the business metadata provides the means to store and retrieve key insights and definitions about every data element. Exposing this metadata to business users reduces the back-and-forth between users and data stewards by putting all aspects of data discovery in the user's hands.
All production data ingested by Alfred adheres to strict data quality and validation rules. Each file ingested into the platform is copied, and both the original and the copy are loaded into Hive tables as string values and validated against each other to guarantee an exact match. Once validation is confirmed, Alfred uses the technical metadata from the corresponding Alfred template to dynamically build the query statement that loads the raw data into the final production table. At ingestion time, each table is created in the database that corresponds to its indicated source (a file-level metadata field) and its security designation. Together, these steps give data stewards peace of mind that production data has passed strict validation.
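To make the flow above concrete, here is a minimal sketch in plain Python. It is not Alfred's code: the template layout, the function names, and the table names (`sales_restricted.orders`, `staging.orders_raw`) are all hypothetical, and in-memory CSV parsing stands in for the Hive string tables. It illustrates two ideas: comparing both copies as strings, and generating the load statement from field-level metadata.

```python
import csv
import io


def read_as_strings(text):
    """Parse CSV text, keeping every value as a string (no type inference)."""
    return list(csv.reader(io.StringIO(text)))


def copies_match(raw_rows, copy_rows):
    """Return True only when both copies hold identical rows in identical order."""
    return raw_rows == copy_rows


def build_insert_statement(template, raw_table):
    """Build an INSERT ... SELECT that casts each string column to its declared type.

    `template` is a hypothetical stand-in for an Alfred template.
    """
    casts = ",\n  ".join(
        f"CAST({field['name']} AS {field['type']}) AS {field['name']}"
        for field in template["fields"]
    )
    target = f"{template['database']}.{template['table']}"
    return f"INSERT INTO TABLE {target}\nSELECT\n  {casts}\nFROM {raw_table}"


# Hypothetical template: the target database reflects source and security designation.
template = {
    "database": "sales_restricted",
    "table": "orders",
    "fields": [
        {"name": "id", "type": "INT"},
        {"name": "amount", "type": "DECIMAL(10,2)"},
    ],
}

original = "id,amount\n1,10.50\n2,7.00\n"
copy = "id,amount\n1,10.50\n2,7.00\n"

if copies_match(read_as_strings(original), read_as_strings(copy)):
    statement = build_insert_statement(template, "staging.orders_raw")
    print(statement)
```

Comparing values as strings means that a copy containing `10.5` instead of `10.50` fails validation, even though the two would be equal once cast to a numeric type.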
Data lineage, another metadata feature for refined datasets, gives data stewards the forensic capability to identify the source data behind every report and statistical model available to users. This feature helps stewards work with their internal compliance and risk departments to conduct forensic analysis when needed.
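As a rough illustration of what a lineage lookup involves, the sketch below walks a lineage map upstream until it reaches tables with no recorded parents. The map structure, the function name, and the dataset names are hypothetical and are not taken from Alfred's catalog.

```python
def trace_sources(dataset, parents):
    """Return the sorted root sources upstream of `dataset`.

    `parents` maps each dataset name to the datasets it was derived from.
    A dataset with no recorded parents is treated as a root source.
    """
    sources = set()
    seen = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against revisiting shared or cyclic lineage
        seen.add(node)
        upstream = parents.get(node, [])
        if not upstream:
            sources.add(node)
        stack.extend(upstream)
    return sorted(sources)


# Hypothetical lineage: a report built from a refined dataset with two raw sources.
lineage = {
    "churn_report": ["customer_features"],
    "customer_features": ["crm.accounts", "billing.invoices"],
}

print(trace_sources("churn_report", lineage))
```

For the sample map, the lookup for `churn_report` resolves to the two raw tables, which is the kind of answer a steward would hand to a compliance or risk team.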
Finally, stewards can manage user access to Alfred's data catalog and warehouse to protect your organization's most sensitive data. Built-in security reports extend data stewards' system-wide security monitoring. Security can be set at the table level for individual users or groups of users.
If you're a data steward interested in learning more about Alfred, you can visit the GitHub page for the tool here.