What is Feature Engineering?

by shoab siddiq

Feature engineering is a crucial step in developing machine learning models. It is the process of selecting, transforming, or creating relevant features (input variables or attributes) from raw data to improve the performance of a machine learning algorithm. The goal is to extract meaningful information from the data that helps the model capture the underlying patterns and relationships.

Here are some common tasks and techniques involved in feature engineering:

1. Feature Selection: This involves choosing a subset of the most relevant features from the original dataset while discarding irrelevant or redundant ones. Feature selection methods include statistical tests, feature importance from tree-based models, and domain knowledge.
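
For instance, here is a minimal selection sketch in Python using scikit-learn's univariate F-test; the bundled dataset and the choice of k = 10 are just illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Load a small example dataset with 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)

# Keep the 10 features with the strongest univariate F-test scores.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (569, 30) -> (569, 10)
```

Tree-based importances (e.g., a fitted RandomForestClassifier's feature_importances_ attribute) are a common alternative when features interact with each other.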

2. Feature Transformation: This involves altering the representation of existing features to make them more suitable for the model. Common techniques, each illustrated in the sketch after this sub-list, include:

  1. Normalization: Rescaling features to a common range, often between 0 and 1, so that no feature dominates simply because of its units.
  2. Standardization: Transforming features to have a mean of 0 and a standard deviation of 1.
  3. Logarithmic Transformation: Applying logarithmic functions to handle data with skewed distributions.
  4. Binning: Grouping continuous values into discrete bins to capture non-linear relationships.
  5. Encoding Categorical Variables: Converting categorical variables into numerical representations (e.g., one-hot encoding or label encoding).
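
A compact sketch of these five transformations, assuming a hypothetical two-column dataset and scikit-learn >= 1.2 (for the sparse_output argument):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import (MinMaxScaler, StandardScaler,
                                   KBinsDiscretizer, OneHotEncoder)

# Hypothetical toy data: a skewed numeric column and a categorical column.
df = pd.DataFrame({"income": [20_000, 35_000, 50_000, 1_000_000],
                   "city": ["Dhaka", "Delhi", "Dhaka", "Mumbai"]})

# 1. Normalization: rescale to the [0, 1] range.
norm = MinMaxScaler().fit_transform(df[["income"]])

# 2. Standardization: zero mean, unit variance.
std = StandardScaler().fit_transform(df[["income"]])

# 3. Logarithmic transformation: compress the long right tail.
log_income = np.log1p(df["income"])

# 4. Binning: group continuous values into 3 ordinal quantile bins.
bins = KBinsDiscretizer(n_bins=3, encode="ordinal",
                        strategy="quantile").fit_transform(df[["income"]])

# 5. One-hot encoding for the categorical column.
onehot = OneHotEncoder(sparse_output=False).fit_transform(df[["city"]])
```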

3. Feature Creation: Sometimes, creating new features provides additional information the model can use to make better predictions (see the sketch after this list). This might involve:

  1. Interaction Features: Combining two or more existing features to capture interactions between them.
  2. Polynomial Features: Generating higher-order features (e.g., squaring or cubing) to capture non-linear relationships.
  3. Time-Based Features: Extracting date and time information from timestamps, such as day of the week, month, or year.
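
As a rough illustration, assuming a small hypothetical DataFrame with two numeric columns and a timestamp column:

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({"length": [2.0, 3.0, 5.0],
                   "width": [1.0, 4.0, 2.0],
                   "timestamp": pd.to_datetime(["2023-01-02",
                                                "2023-06-15",
                                                "2023-12-31"])})

# Interaction feature: combine two existing features.
df["area"] = df["length"] * df["width"]

# Polynomial features: degree-2 terms (x1, x2, x1^2, x1*x2, x2^2).
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(df[["length", "width"]])

# Time-based features extracted from the timestamp.
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month
df["year"] = df["timestamp"].dt.year
```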

4. Handling Missing Data: Deciding how to handle missing values in the dataset, which may involve imputing missing values with statistical measures or using domain-specific knowledge.
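
A minimal imputation sketch with scikit-learn's SimpleImputer, using a hypothetical DataFrame with one numeric and one categorical column:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 33],
                   "city": ["Dhaka", "Delhi", np.nan, "Dhaka"]})

# Numeric column: fill missing values with the median.
df["age"] = SimpleImputer(strategy="median").fit_transform(df[["age"]]).ravel()

# Categorical column: fill missing values with the most frequent value.
df["city"] = SimpleImputer(strategy="most_frequent").fit_transform(df[["city"]]).ravel()
```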

5. Feature Scaling: Ensuring that all features are on a similar scale to prevent some features from dominating others during model training. This is particularly important for distance-based algorithms like k-means or support vector machines.
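
One way to see the effect, sketched here with scikit-learn's bundled wine dataset and an SVM (the specific dataset and model are just illustrative choices):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Without scaling, features measured in large units dominate the SVM's
# distance computations. Putting the scaler inside a pipeline means it is
# re-fit on each training fold, avoiding leakage into the test folds.
unscaled = cross_val_score(SVC(), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), SVC()), X, y, cv=5).mean()
print(f"accuracy without scaling: {unscaled:.2f}, with scaling: {scaled:.2f}")
```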

6. Feature Extraction: Reducing the dimensionality of high-dimensional data using techniques like Principal Component Analysis (PCA) or Singular Value Decomposition (SVD).
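
A short PCA sketch on scikit-learn's bundled digits dataset; retaining 95% of the variance is an arbitrary illustrative threshold:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 pixel features per image

# Project onto the directions of greatest variance, keeping just enough
# components to explain 95% of the total variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print("components kept:", pca.n_components_)
```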

Effective feature engineering can have a significant impact on the performance of machine learning models. By creating informative features and preprocessing the data appropriately, you can help the model learn patterns and relationships more effectively, leading to better predictive accuracy and model interpretability. It often requires a combination of domain knowledge and experimentation to determine the most effective feature engineering strategies for a particular problem.