When handling large amounts of information, data cleansing is essential: it organizes the data properly, making algorithms easier to implement and run. With a well-cleaned dataset, even a very simple algorithm can produce the desired results, which can be very beneficial at times.
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
The data is collected from a website such as Kaggle, which hosts datasets in .csv format; after cleansing, the result is stored in a separate .csv file.
Salient Features
Aims to provide a general solution capable of cleansing many kinds of tabular data.
Two-level cleaning: the first level parses the file into a clean format; the second level handles null values and outliers.
Data is handled efficiently by storing it in appropriate data structures.
Move semantics are used with dynamic array structures to avoid heavy node-based structures such as doubly linked lists.
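The second cleaning level (nulls and outliers) can be sketched on a single numeric column as follows. The null sentinel value, the mean-imputation strategy, and the k-standard-deviation cutoff are illustrative assumptions, not the project's exact rules:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Second-level pass (a sketch): impute sentinel "nulls" with the column
// mean, then drop values more than k standard deviations from that mean.
std::vector<double> cleanColumn(std::vector<double> col,
                                double nullSentinel = -1.0, double k = 3.0) {
    // Mean over non-null entries only.
    double sum = 0;
    std::size_t n = 0;
    for (double v : col)
        if (v != nullSentinel) { sum += v; ++n; }
    if (n == 0) return {};
    const double mean = sum / n;

    // Impute nulls with the mean.
    for (double& v : col)
        if (v == nullSentinel) v = mean;

    // Standard deviation of the imputed column.
    double var = 0;
    for (double v : col) var += (v - mean) * (v - mean);
    const double sd = std::sqrt(var / col.size());

    // Keep only values within k standard deviations of the mean.
    std::vector<double> out;
    for (double v : col)
        if (sd == 0 || std::fabs(v - mean) <= k * sd) out.push_back(v);
    return out;
}
```

Returning and passing the vector by value here relies on move semantics, in line with the feature above: no element-by-element copying occurs when the result is handed back to the caller.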
Learned/Covered Topics
Sorting Algorithms (Bubble, Insertion, Selection)
Dynamic Safe Arrays (as Vector & Strings)
Copy Semantics (Rule of Three)
Move Semantics (Rule of Five)
Trees (BST, AVL)
Stacks (LIFO order; implemented both over a singly linked list and over vectors)
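The Rule of Five from the list above can be illustrated with a minimal dynamic array; the class name and layout are illustrative, not the project's actual implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Minimal move-enabled dynamic array demonstrating the Rule of Five:
// destructor, copy constructor, copy assignment, move constructor,
// and move assignment.
class IntArray {
    int* data_ = nullptr;
    std::size_t size_ = 0;
public:
    explicit IntArray(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~IntArray() { delete[] data_; }                        // 1. destructor
    IntArray(const IntArray& o)                            // 2. copy ctor
        : data_(new int[o.size_]), size_(o.size_) {
        std::copy(o.data_, o.data_ + size_, data_);
    }
    IntArray& operator=(const IntArray& o) {               // 3. copy assign
        if (this != &o) { IntArray tmp(o); swap(tmp); }    // copy-and-swap
        return *this;
    }
    IntArray(IntArray&& o) noexcept                        // 4. move ctor
        : data_(o.data_), size_(o.size_) {
        o.data_ = nullptr;                                 // steal, don't copy
        o.size_ = 0;
    }
    IntArray& operator=(IntArray&& o) noexcept {           // 5. move assign
        if (this != &o) {
            delete[] data_;
            data_ = o.data_; size_ = o.size_;
            o.data_ = nullptr; o.size_ = 0;
        }
        return *this;
    }
    void swap(IntArray& o) noexcept {
        std::swap(data_, o.data_);
        std::swap(size_, o.size_);
    }
    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
};
```

Moving from such an array transfers ownership of the buffer in O(1), which is why the project can favour contiguous dynamic arrays over node-heavy doubly linked lists.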