CSE 450 (AB-213): Mention Some Data Quality issues

When working with data, we can run into several quality problems in a dataset. For example, noise and outliers, missing values, and duplicate values are all data quality issues. Noise refers to the modification of original values: it distorts them and can make the affected data unreliable. Missing values, on the other hand, can make a row of a dataset unusable. Finally, duplicate values, such as duplicate rows, make a dataset larger and slow down processing without adding any useful information. They mainly waste storage.
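As a rough illustration of how these issues show up in practice, here is a minimal pandas sketch that counts missing values and duplicate rows. The toy dataset and its column names (`name`, `age`) are made up for this example and do not come from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: one missing age and one repeated row.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cara", "Dan"],
    "age": [23.0, 31.0, 31.0, np.nan, 250.0],
})

# Number of missing (NaN) values in each column.
missing_per_column = df.isna().sum()

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

Checking these counts first gives a quick idea of how much cleaning the dataset needs before any mining step.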

Dealing with this kind of dataset is really challenging. Noise can be reduced by finding the outliers and handling them. Missing values come in two types: null values, and non-numeric objects (such as text) in place of numbers. We can remove them or replace them with the mean value. The mean is a reasonable stand-in for the original value because it is the average of that specific column. Duplicate values can simply be removed: in a dataset, we can drop the duplicate rows. Using pandas in Python, it is easy to drop all the duplicate rows in a DataFrame.
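The cleaning steps described above could be sketched in pandas roughly as follows. This is only one possible approach under stated assumptions: the toy data and column names are invented, and the 1.5 x IQR rule used to flag outliers is a common convention chosen here for illustration, not something the post specifies.

```python
import pandas as pd

# Hypothetical toy dataset: a duplicate row, a null, a text value in a
# numeric column, and one extreme value.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cara", "Dan", "Eve", "Finn"],
    "age": [23, 31, 31, None, "unknown", 27, 250],
})

# 1. Drop rows that exactly repeat an earlier row.
df = df.drop_duplicates().reset_index(drop=True)

# 2. Convert the column to numbers; entries that cannot be parsed
#    (such as "unknown") become NaN, like the null values.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# 3. Flag outliers with the 1.5 * IQR rule (an assumption of this sketch)
#    and drop them, keeping rows whose age is still missing.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range | df["age"].isna()].reset_index(drop=True)

# 4. Replace the remaining missing values with the column mean.
mean_age = df["age"].mean()
df["age"] = df["age"].fillna(mean_age)

print(df)
```

Outliers are handled before the mean is computed so that an extreme value does not pull the replacement value away from the typical range.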

