About the course
This course covers in detail the fundamentals of Data Mining and basic algorithms such as Data Preprocessing, Association Rule Mining, Classification Basics, Decision Tree, Bayes Classifier, and K Nearest Neighbor. It then covers Support Vector Machine and Kernel Machine, followed by Clustering, Outlier Detection, and Sequence Mining. In addition, it covers Evaluation, Visualization, and implementations in open source software. Finally, the course concludes with demonstrations of case studies on industrial problems. This is a fully online course, and you can access it from anywhere in the world.
After completing this course, you will be able to:
- Demonstrate advanced knowledge of data mining concepts and techniques.
- Apply the techniques of clustering, classification, association finding, feature selection, and visualization to real-world data.
- Set up a Data Mining process for an application, including data preparation, modeling, and evaluation.
- Boost your hireability through innovative and independent learning.
- Get a certificate on successful completion of the course.
The course can be taken by:
Students: All students who are pursuing any technical/professional courses in IT / Data Science.
Teachers/Faculties: All teachers/faculties who wish to acquire new skills or improve their efficiency in Data Science.
Professionals: All working professionals, who wish to enhance their skills by learning data mining techniques.
Why Learn Data Mining?
Modern businesses are complex and rely on data, which means that the amount of data they handle has increased. Data Mining is therefore of great importance in today’s highly competitive business environment. Data Science has gained unprecedented traction in the job market as Big Data, Data Analytics, Data Mining, and Machine Learning become more relevant to the mainstream IT industry. Around the world, organizations compete fiercely for the skilled data professionals available in the market. As a result, the financial packages for data science roles keep rising. An estimated 2.7 million job postings for Data Analytics and Data Science are projected by 2020.
- 24X7 Access: You can view lectures as per your own convenience.
- Online lectures: 19 hours of online lectures with high-quality videos
- Updated quality content: Content is current and is updated regularly to meet industry demands.
Each lecture will have a quiz containing a set of multiple choice questions. Apart from that, there will be a final test based on multiple choice questions.
Your evaluation will include the overall scores achieved in each lecture quiz and the final test.
Certification requires you to complete all the lectures, quizzes, and the final test. Your certificate will be generated online after successful completion of the course.
Topics to be covered
- Module_1: Introduction, Knowledge Discovery Process
Here, we have looked at data mining and its motivations, followed by the drawbacks of traditional data analysis. We have then discussed data and data mining functionalities, studied the process of knowledge discovery, and reviewed the issues in data mining. Finally, the typical architecture of Data Mining has been covered in detail.
- Why is Data Mining important?
- What is Data Mining, what are the drawbacks of Traditional Data Analysis?
- Data mining is done on what kinds of data, and what are the functionalities of data mining?
- What is the process of Knowledge Discovery in Databases (KDD)?
- What are the major issues in Data Mining and what is the typical architecture of Data Mining?
- Module_2: Data Preprocessing - I
Here, we have looked at data, the different types of attributes and their properties, and the different types of data sets.
- What is Data?
- What are the different types of attributes?
- What are the properties of attribute values?
- What are Discrete and Continuous Attributes and what are the various types of data sets?
- What are the different types of Data?
- Module_3: Data Preprocessing - II
The main objective of this lecture is to understand the issues that are considered before performing the preprocessing along with some of the preprocessing techniques.
- What are the various data quality problems?
- What are the different kinds of preprocessing algorithms (part 1)?
- What are the different kinds of preprocessing algorithms (part 2)?
- What are the different kinds of preprocessing algorithms (part 3)?
- Module_4: Association Rules
The focus of this lecture is on understanding the association rule mining and the different steps of discovering association rules.
- What is Association Rule Mining?
- What are the steps in discovering Association rules (part 1)?
- What are the steps in discovering Association rules (part 2)?
- Module_5: Data Frames
Here, we have looked at frequent itemset generation, which is computationally expensive, and then covered the Apriori principle and its algorithm.
- What is the concept of Frequent Itemset Generation?
- What is the Apriori Principle and its Algorithm?
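As a rough illustration of the Apriori principle (every subset of a frequent itemset must itself be frequent), the following Python sketch generates frequent itemsets level by level. It is a minimal sketch and not the lecture's implementation: the function name `apriori`, the absolute `min_support` count, and the toy basket data are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count of transactions (an assumption here).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate itemset.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): drop any candidate that has an
        # infrequent (k-1)-subset, without scanning the transactions.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what reduces the cost: candidates such as {bread, diapers, beer} are discarded because {bread, beer} is already known to be infrequent.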
- Module_6: Rule Generation
Here, we have continued to look at the Apriori algorithm and covered rule generation for it, along with pattern evaluation and how these evaluations are expressed in terms of interestingness.
- What is the Apriori Algorithm (continued)?
- How to efficiently generate rules from frequent itemsets for the Apriori Algorithm?
- How to evaluate patterns and compute interestingness measure?
- Module_7: Classification
Here, we have covered what classification means and the classification task, along with classification techniques.
- What is Classification (part 1)?
- What is Classification (part 2)?
- Module_8: Decision Tree - I
The focus of this lecture is to understand Decision trees along with the study of the representation of rules in Decision trees.
- What is a Decision Tree and the Classification Task?
- What is a Decision Tree Algorithm?
- How to implement a Decision Tree?
- Module_9: Decision Tree - II
Here, we have looked at obtaining a decision tree for a classification problem, including a worked example.
- How to create a Decision Tree (part 1)?
- How to create a Decision Tree (part 2)?
- How to create a Decision Tree (part 3)?
- Module_10: Decision Tree - III
Here, we have continued looking at obtaining a decision tree and also covered the top-down construction rule for obtaining a decision tree for a classification problem.
- How to create a Decision Tree (part 4)?
- How to use Top-Down Construction rule in Decision Tree Creation?
- What is the best attribute to split and what is the principle of Decision Tree Construction?
- What is Entropy?
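To make the "best attribute to split" question concrete, here is a small Python sketch of entropy and information gain, one common criterion for choosing a split. It is an illustrative sketch only: the function names, the dictionary-based row format, and the toy weather data are assumptions, and the lecture may use a different criterion or notation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["yes", "yes", "no", "no"]

gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

Under this criterion, a top-down construction would split on the attribute with the highest gain and then repeat the procedure on each resulting subset.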
- Module_11: Decision Tree - IV
We have continued looking at obtaining a decision tree and also covered the problems with obtaining a decision tree, along with a few extensions of the basic tree algorithm and some of the advantages of decision trees.
- What is the concept of Decision Tree Pruning and Decision Tree Extensions?
- Module_12: Bayes Classifier I
We have covered probability distributions through an example and looked at an important concept known as class-conditional probabilities.
- Class Conditional Probabilities example (part 1)
- Class Conditional Probabilities example (part 2)
- Class Conditional Probabilities example (part 3)
- Module_13: Bayes Classifier II
Here, we have looked at the posterior distribution for the previous example and covered the MAP representation of the Bayes classifier and the MAP multiclass classifier, along with the multivariate Bayes classifier.
- What is Posterior Probability?
- How is the MAP representation of the Bayes Classifier and the Multiclass Classifier obtained?
- What is Multivariate Bayes Classifier?
- Module_14: Bayes Classifier III
Here, we have continued to look at the Multivariate Bayes classifier and its special case.
- What is Multivariate Bayes Classifier (part 1)?
- What is Multivariate Bayes Classifier (part 2)?
- What is Multivariate Bayes Classifier (part 3)?
- Module_15: Bayes Classifier IV
Here, we have looked at the types of distance measurement between two distributions and an example of a Bayes classifier, along with the Naive Bayes classifier.
- What are the different types of distance between two distributions?
- Example of Bayes Classifier
- What is the Naive Bayes Classifier, with an example?
- Module_16: Bayes Classifier V
We have continued to look at the Naive Bayes classifier and its example. We have also looked at conditional independence and an exercise on it, along with a comprehensive look at the directed acyclic graph (DAG).
- Example of a Naive Bayes Classifier (part 1)
- Example of a Naive Bayes Classifier (part 2)
- What is Conditional Independence?
- Exercise on Conditional Independence
- Module_17: K Nearest Neighbor I
Here, we have covered the Classification algorithm called the K nearest neighbor classifier.
- Recap of Bayes Classifiers
- K Nearest Neighbour Classifiers
- Module_18: K Nearest Neighbor II
The focus of this lecture is to understand the basics of K Nearest Neighbor and also understand the Voronoi diagram of the Nearest Neighbor. Also, we have looked at the distance-weighted K-NN followed by different issues of nearest-neighbor classifiers.
- What is the definition of nearest neighbors and Voronoi Diagram?
- What is Distance Weighted K Nearest Neighbour Rule and how to predict continuous values?
- What are the issues in Nearest Neighbour Classifiers?
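The distance-weighted K nearest neighbour rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical, Euclidean distance is used, and each neighbour's vote is weighted by 1/d², which is one common choice and not necessarily the weighting used in the lecture.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available from Python 3.8


def weighted_knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a distance-weighted vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = defaultdict(float)
    for point, label in neighbours:
        d = dist(point, query)
        if d == 0:
            # The query coincides with a training point: return its label.
            return label
        votes[label] += 1.0 / d ** 2  # closer neighbours count for more
    return max(votes, key=votes.get)


# Hypothetical data: one close neighbour of class "a", two distant ones of class "b".
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
labels = ["a", "b", "b"]
prediction = weighted_knn_predict(points, labels, query=(0.5, 0.0), k=3)
print(prediction)
```

In this toy case an unweighted majority vote over the three neighbours would pick "b", while the weighted vote favours the much closer "a" point.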
- Module_19: K Nearest Neighbor III
Here, we have looked at K-nearest neighbor classification technique (KNN) and the computational complexity of KNN followed by reduction of computational complexity.
- Example of K Nearest Neighbour (KNN) Classifier (part 1)
- Example of K Nearest Neighbour (KNN) Classifier (part 2)
- What is the computational complexity in KNN Classifiers?
- What is Condensing and Condensed Nearest Neighbour?
- Module_20: K Nearest Neighbor IV
Here, we have covered the reduction of computational complexity using High dimensional search and also looked at the K dimensional tree structure along with some alternate terminologies in KNN.
- What is the concept of High Dimension Search?
- What is a KD-tree and how is it used for range search?
- What are the alternate terminologies in KNN?
- Module_21: K Nearest Neighbor V
Here, we have looked at how to evaluate classification algorithms in order to determine which one performs better and which one should be chosen.
- How to evaluate a classifier?
- What are the metrics for performance evaluation?
- What are the methods for performance evaluation and model comparison?
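As an illustration of performance metrics for a classifier, the Python sketch below derives accuracy, precision, recall, and F1 score from confusion-matrix counts. The choice of these four metrics, the function names, and the example labels are assumptions for illustration; the lecture may cover a different set of metrics.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions from some classifier.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

Comparing two classifiers on the same held-out data with metrics like these is one simple way to decide which model to prefer.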
- Module_22: Support Vector Machine - I
The main objective of this lecture is to understand discriminant analysis and the case of linear discriminants: given 2 features and 2 classes, we want to draw a line that separates the classes.
- What is Discriminant Analysis?
- What is Linear Discriminant Analysis and Design (part 1)?
- What is Linear Discriminant Analysis and Design (part 2)?
- What is Linear Discriminant Analysis and Design (part 3)?
- Module_23: Support Vector Machine - II
Here, we have continued to look at linear discriminants (linear separators).
- What are Linear Separators (part 1)?
- What are Linear Separators (part 2)?
- What are Linear Separators (part 3)?
- What are Linear Separators (part 4)?
- Module_24: Support Vector Machine - III
Here, we have again looked at linear discriminants (linear separators) and at good and bad decision boundaries, and then covered how to find the line with the highest margin, which gives the equation of the separating line.
- What are Linear Separators (part 5)?
- What are good and bad decision boundaries?
- How to choose the optimal linear separator (part 1)?
- How to choose the optimal linear separator (part 2)?
- Module_25: Support Vector Machine - IV
Here, we have covered the Primal and Dual optimization problem followed by understanding the solution of the dual optimization problem. We have also covered the concept of quadratic programming (QP).
- What is the Primal Optimization Problem and the Dual Problem?
- What is Dual Optimization Problem (part 1)?
- What is Dual Optimization Problem (part 2)?
- Module_26: Support Vector Machine - V
Here, we have covered the quadratic programming (QP) problem and looked at the Karush–Kuhn–Tucker (KKT) theorem, which can be used for solving QP problems.
- What is the Quadratic Programming (QP) problem?
- What is Karush–Kuhn–Tucker (KKT) theorem (part 1)?
- What is Karush–Kuhn–Tucker (KKT) theorem (part 2)?
- What is Karush–Kuhn–Tucker (KKT) theorem (part 3)?
- Module_27: Kernel Machines
Here, we have introduced the concept of slack variables and of hard and soft margins for non-separable classes, followed by the corresponding optimization problems. We have also looked at the kernel machine for solving the non-linearly separable class problem.
- What is the concept of Slack Variable?
- What are the hard and soft margins for non-separable classes, and what is the optimization problem for non-separable classes?
- What is the Dual optimization problem for Soft Margin Hyperplane?
- What is the problem of non-linearly separable classes and how is it solved using a Kernel machine?
- Module_28: Artificial Neural Networks I
The main objective of this lecture is to study neural networks, connectionism, and the biological neuron. We have also covered the artificial neural network and its simplest model, the Perceptron.
- What are Neural Networks, what is Connectionism and the Biological Neuron?
- When should Artificial Neural Networks (ANNs) be considered and what is the Perceptron?
- What is the Perceptron (continued)?
- How does the Perceptron work as a Linear Discriminant?
- Module_29: Artificial Neural Networks II
Here, we have looked at the mechanism of finding the correct set of weights to solve a prediction problem.
- What are the training rules to be followed for determining weights in ANNs?
- What is the Perceptron training rule?
- What is the Delta rule and Gradient Descent?
- What is the Gradient Descent Technique and Algorithm?
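The idea of finding a correct set of weights can be illustrated with the perceptron training rule, in which each weight is adjusted by learning_rate × (target − output) × input whenever an example is misclassified. The Python sketch below is a minimal illustration under assumptions: the function names, the 0/1 threshold output, the learning rate, and the choice of the logical AND function as training data are not taken from the lecture.

```python
def predict(weights, x):
    """Threshold unit: weights[0] is the bias, the rest multiply the inputs."""
    activation = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, targets, learning_rate=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + learning_rate * (t - o) * x_i."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, t in zip(samples, targets):
            o = predict(weights, x)
            if o != t:
                errors += 1
                weights[0] += learning_rate * (t - o)  # bias input is fixed at 1
                for i, xi in enumerate(x):
                    weights[i + 1] += learning_rate * (t - o) * xi
        if errors == 0:
            # Every training example is classified correctly: stop early.
            break
    return weights


# Logical AND is linearly separable, so the rule converges on it.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]
and_weights = train_perceptron(and_inputs, and_targets)
print([predict(and_weights, x) for x in and_inputs])
```

The rule is only guaranteed to converge when the classes are linearly separable; the delta rule with gradient descent is the usual alternative when they are not.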
- Module_30: Artificial Neural Networks III
Here, we have explained the difference between the Perceptron and Gradient Descent algorithms, the logic gates that can be realized with the perceptron model, and the Multilayer Perceptron.
- Comparison between Perceptron and Gradient Descent algorithm
- How to realize logic gates using Perceptron?
- What are Multi-Layered Perceptrons (MLP)?
- What are MLPs (continued)?
- Module_31: Artificial Neural Networks IV
Here, we have covered the sigmoid unit and the weight update rule for the multilayer perceptron, along with the issues with ANNs and extensions of ANNs.
- How to use a Sigmoid function in a Multi-Layered Perceptron, and what are its training rules?
- What is Forward and Back Propagation?
- What is the error gradient for a sigmoid unit?
- What is the procedure of backpropagation?
- Module_32: Clustering I
Here, we have looked at the basics of Clustering and Scatter coefficient for observing the goodness of clustering including an understanding of Hierarchical & Partitional clustering.
- What is Clustering?
- How to find groups of similar objects?
- What is the distance measure?
- What is partitional clustering?
- What is Hierarchical clustering?
- Module_33: Clustering II
Here, we have covered the desirable properties of a clustering algorithm and Hierarchical Agglomerative and Hierarchical Divisive clustering. We have also looked at the three ways of measuring the distance between two clusters.
- What are the desirable properties of a Clustering algorithm and what is Hierarchical Agglomerative Clustering?
- What is the Hierarchical Divisive Clustering and how to measure closeness between two clusters?
- What is Single Linkage Clustering, its advantages and disadvantages?
- Module_34: Clustering III
Here, we have looked at the K-means clustering algorithm and an example of it.
- What is the K-Means Clustering Algorithm (part 1)?
- What is the K-Means Clustering Algorithm (part 2)?
- Example of the K-Means Clustering Algorithm
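For readers who want to see the mechanics of K-means, the following Python sketch alternates the two standard steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is a sketch under assumptions: the function name is hypothetical, the first k points are used as initial centroids purely to keep the example deterministic, and the toy data are made up.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def k_means(points, k, max_iterations=100):
    """Return (centroids, assignments) after running K-means on 2-D or n-D tuples."""
    # Assumption: naive initialisation with the first k points.
    centroids = [points[i] for i in range(k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Hypothetical data: two well-separated groups of three points each.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignments = k_means(data, k=2)
print(centroids, assignments)
```

Real implementations usually pick random or spread-out initial centroids and restart several times, because the result depends on the initialisation.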
- Module_35: Clustering IV
The main objective of this lecture is to learn about the idea of density-based clustering and a density-based algorithm, DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
- What is Density-based Clustering algorithm?
- How to measure the density of a point?
- Some definitions related to Density-Based Clustering
- What is the DBSCAN Algorithm?
- Module_36: Clustering V
Here, we have covered the hybrid clustering algorithm CLARA, followed by the evaluation of clustering algorithms.
- What is Hybrid Clustering Algorithm?
- What is Cluster Validity, what are the different aspects of Cluster Validation and what are the measures of Cluster Validity?
- What is Scatter Coefficient and what are internal and external measures of Cluster Validity?
- Module_37: Regression I
The focus of this lecture is to understand the regression problem, univariate and multivariate regression, and the regression model, along with the most common regression technique, Linear Regression.
- What is Regression?
- What is Univariate and Multivariate Regression?
- What is a Regression Model?
- What is Linear Regression?
- Module_38: Regression II
Here, we have looked at the Linear Regression model.
- What is Linear Regression Model (part 1)?
- What is Linear Regression Model (part 2)?
- What is Linear Regression Model (part 3)?
- What is Linear Regression Model (part 4)?
- Module_39: Regression III
Here, we have continued to look at the Linear Regression model and some of the limitations of Linear regression followed by the Non-Linear regression.
- What is the Error in Linear Regression Model?
- How to solve Error in Linear Regression Model?
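One standard way to handle the error in a linear regression model is to choose the line that minimizes the sum of squared errors. The Python sketch below uses the closed-form least-squares solution for the univariate case; the function names and sample data are assumptions for illustration, and the lecture may derive the solution differently (for example, in matrix form).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
error = sum_squared_error(xs, ys, slope, intercept)
print(slope, intercept, error)
```

With noisy data the error would be greater than zero, and the fitted line is the one for which that sum is smallest.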
- What are the limitations of the Linear Regression model and what is Non-Linear Regression?
- Module_40: Regression IV
Here, we have covered the problem of overfitting, the Occam's razor principle, and the time series prediction problem along with its solution.
- What is the problem of Overfitting?
- How are Complexity and Goodness of Fit compared and what is the Occam's Razor Principle?
- What are Complexity and Generalization?
- What is Training, Validation and Test Data and what is time series prediction problem?
- Module_41: Dimensionality Reduction I
- What is the purpose of Dimensionality Reduction?
- What are the Dimensionality Reduction Techniques?
- What is the Evaluation Index, Kullback-Leibler Divergence?
- What are the search algorithms to find the best subset and what are the techniques of feature subset selection?
- Module_42: Dimensionality Reduction II
- What is Feature Selection?
- What is Feature Extraction Problem?
- Module_43: Tutorial
- Basics of R programming (part 1)
- Basics of R programming (part 2)
- How to use the Apriori Algorithm for extracting Association Rules from a dataset?
- How to generate Decision Trees?
- How to apply K-Means Clustering in R?
- How to classify the data based on Naive Bayesian classification?
- Data Mining - Final Quiz