Some common data quality issues, and how to deal with them
1. Duplicate data
Modern organizations face an onslaught of data from all directions – local databases, cloud data lakes, and streaming sources – often compounded by application and system silos. Duplication and overlap across these sources are inevitable. Duplicated contact details, for example, hurt customer experience significantly: marketing campaigns suffer when some prospects are missed entirely while others are contacted repeatedly. Duplicate data also increases the likelihood of skewed analytical results, and when used as training data it can produce skewed ML models.
Rule-based data quality management can help you keep duplicate and overlapping records in check. With predictive DQ, rules are auto-generated and continuously improved by learning from the data itself. Predictive DQ identifies both exact and fuzzy matches, quantifies them into a likelihood score for duplicates, and helps deliver continuous data quality across all applications.
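To make the idea of exact and fuzzy matching with a likelihood score concrete, here is a minimal sketch in Python using only the standard library. The contact records, field weights, and threshold are illustrative assumptions, not how any particular predictive DQ product actually works.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def duplicate_likelihood(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted average of per-field similarities, used as a duplicate score."""
    total = sum(weights.values())
    score = sum(
        weights[f] * similarity(str(rec_a.get(f, "")), str(rec_b.get(f, "")))
        for f in weights
    )
    return score / total

# Hypothetical contact records: the second row is a near-duplicate of the first.
contacts = [
    {"name": "Jane Doe",   "email": "jane.doe@example.com", "phone": "555-0100"},
    {"name": "Jane  Doe",  "email": "jane.doe@example.com", "phone": "555 0100"},
    {"name": "John Smith", "email": "j.smith@example.com",  "phone": "555-0199"},
]

weights = {"name": 0.4, "email": 0.4, "phone": 0.2}
THRESHOLD = 0.9  # illustrative cut-off for flagging a pair as a likely duplicate

for i in range(len(contacts)):
    for j in range(i + 1, len(contacts)):
        score = duplicate_likelihood(contacts[i], contacts[j], weights)
        if score >= THRESHOLD:
            print(f"Likely duplicates (score {score:.2f}): {contacts[i]} ~ {contacts[j]}")
```

Exact duplicates score 1.0 and fuzzy near-matches score just below it, so the same rule catches both; in practice the weights and threshold would be learned from the data rather than hard-coded.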
2. Ambiguous data
In large databases or data lakes, errors can creep in even under strict supervision, and the problem is compounded when data streams in at high speed. Column headings can be misleading, formatting can be inconsistent, and spelling errors can go undetected. Such ambiguous data can introduce multiple flaws into reporting and analytics.
By continuously monitoring with auto-generated rules, predictive DQ resolves ambiguity quickly, tracking down issues as soon as they arise and delivering high-quality data pipelines for real-time analytics and trusted outcomes.
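One simple way such a rule can be auto-generated is to profile a column, infer its dominant format, and flag values that deviate. The sketch below is a toy illustration of that idea; the column name, sample values, and shape-based rule are assumptions for the example, not a real product's rule engine.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Reduce a value to its 'shape': letters become 'a', digits become 'd'."""
    return re.sub(r"\d", "d", re.sub(r"[A-Za-z]", "a", value))

def infer_format_rule(values: list[str]) -> str:
    """Auto-generate a crude rule: the most common shape observed in the column."""
    return Counter(shape(v) for v in values).most_common(1)[0][0]

def flag_violations(values: list[str], rule: str) -> list[str]:
    """Return values whose shape deviates from the inferred rule."""
    return [v for v in values if shape(v) != rule]

# Hypothetical column with one mixed date format and one typo (letter O for zero).
order_dates = ["2024-01-05", "2024-02-17", "17/03/2024", "2024-04-02", "2024-O5-11"]

rule = infer_format_rule(order_dates)   # 'dddd-dd-dd' for this sample
print("Inferred format rule:", rule)
print("Flagged values:", flag_violations(order_dates, rule))
```

Running this flags both the differently formatted date and the value with the typo, showing how a rule inferred from the data itself can surface ambiguous records without anyone writing the rule by hand.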