What Is Data Profiling? Process, Best Practices and Tools

Data processing and analysis can’t happen without data profiling: reviewing source data for content and quality. As data gets bigger and infrastructure moves to the cloud, data profiling is increasingly important.

What is data profiling?

Data profiling is the process of reviewing source data, understanding its structure, content and interrelationships, and identifying its potential for data projects. It is useful in several kinds of projects:

- Data warehouse and business intelligence (DW/BI) projects: data profiling can uncover data quality issues in data sources and show what needs to be corrected in ETL.
- Data conversion and migration projects: data profiling can identify data quality issues, which you can handle in scripts and data integration tools that copy data from source to target. It can also uncover new requirements for the target system.
- Source system data quality projects: data profiling can highlight data that suffers from serious or numerous quality issues, and the source of those issues (e.g. user inputs, errors in interfaces, data corruption).

Data profiling typically involves:

- Collecting descriptive statistics like min, max, count and sum.
- Collecting data types, length and recurring patterns.
- Tagging data with keywords, descriptions or categories.
- Performing data quality assessment and assessing the risk of performing joins on the data.
- Discovering metadata and assessing its accuracy.
- Identifying distributions, key candidates, foreign-key candidates, functional dependencies and embedded value dependencies, and performing inter-table analysis.

There are three main types of data profiling:

Structure discovery: validating that data is consistent and formatted correctly, and performing mathematical checks on the data. Structure discovery helps you understand how well data is structured, for example what percentage of phone numbers do not have the correct number of digits.

Content discovery: looking into individual data records to discover errors.
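Collecting descriptive statistics (min, max, count, sum) together with data types, value lengths and recurring patterns can be illustrated for a single column. What follows is a minimal sketch, assuming the column arrives as a plain Python list with `None` for missing values. `profile_column` and `value_pattern` are hypothetical helpers. Encoding digits as `9` and letters as `A` is one common pattern convention, not a standard.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def value_pattern(value: str) -> str:
    """Map a value to its shape: digits become '9', letters become 'A', other characters are kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Hypothetical single-column profile: counts, types, lengths, patterns and numeric stats."""
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it from numeric statistics
    numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    as_text = [str(v) for v in non_null]
    lengths = [len(s) for s in as_text]
    profile: Dict[str, Any] = {
        "count": len(non_null),
        "nulls": len(values) - len(non_null),
        "types": Counter(type(v).__name__ for v in non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        # the three most frequent value shapes, with their counts
        "top_patterns": Counter(value_pattern(s) for s in as_text).most_common(3),
    }
    if numeric:
        profile.update(min=min(numeric), max=max(numeric), sum=sum(numeric))
    return profile
```

For example, `profile_column([10, 25, None, 7])` reports three non-null integers with min 7, max 25 and sum 42, and `99` as the most frequent pattern. For a text column such as `["AB-12", "CD-34", "x"]`, the dominant pattern `AA-99` and the odd one out (`A`) stand out immediately.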
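The structure-discovery example (what percentage of phone numbers have the wrong number of digits) reduces to a simple check. Here is a sketch, assuming phone numbers are strings and that a well-formed number has exactly 10 digits once punctuation is stripped. The function name and the `expected_digits` default are hypothetical. The right value depends on the numbering plan of your data.

```python
import re
from typing import Iterable


def pct_wrong_digit_count(phone_numbers: Iterable[str], expected_digits: int = 10) -> float:
    """Percentage of phone numbers whose digit count differs from expected_digits.

    Non-digit characters (spaces, dashes, dots, parentheses, '+') are ignored.
    """
    numbers = list(phone_numbers)
    if not numbers:
        return 0.0
    # strip everything that is not a digit, then compare the remaining length
    bad = sum(1 for n in numbers if len(re.sub(r"\D", "", n)) != expected_digits)
    return 100.0 * bad / len(numbers)
```

With `["(555) 123-4567", "555-1234", "555.123.4567", "+1 555 123 4567"]`, two of the four values are flagged (50%). The last one is flagged only because its country code adds an eleventh digit. Profiling surfaces this kind of finding so that someone can decide whether it is an error or a format the target system should accept.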
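Key candidates and functional dependencies from the activity list can be found naively by scanning the data. This brute-force sketch assumes rows are dicts that share the same keys and hold hashable values. It checks only single columns and single-column dependencies. A reported result means the property holds in this sample, not that the schema guarantees it. `key_candidates` and `functional_dependencies` are hypothetical names.

```python
from itertools import permutations
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def key_candidates(rows: List[Row]) -> List[str]:
    """Columns with no missing values and no duplicate values in this sample."""
    if not rows:
        return []
    result = []
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        if None not in values and len(set(values)) == len(values):
            result.append(col)
    return result


def functional_dependencies(rows: List[Row]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where every value of column a maps to exactly one value of column b."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    deps = []
    for a, b in permutations(columns, 2):
        seen: Dict[Any, Any] = {}
        holds = True
        for r in rows:
            key, val = r.get(a), r.get(b)
            # a second, different b-value for the same a-value breaks a -> b
            if key in seen and seen[key] != val:
                holds = False
                break
            seen[key] = val
        if holds:
            deps.append((a, b))
    return deps
```

On rows `{"id": 1, "zip": "10001", "city": "New York"}`, `{"id": 2, "zip": "10001", "city": "New York"}` and `{"id": 3, "zip": "94105", "city": "San Francisco"}`, only `id` is a key candidate. `zip -> city` holds, and so does `city -> zip`, which may just be an accident of the small sample. Any key candidate trivially determines every other column, so dependencies with `id` on the left carry little information on their own.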