About The Course

AICouncil creates training and development programs with maximum focus on hands-on learning experiences. This training builds in-depth knowledge of R programming and its use cases in the data science domain. You will gain experience with R's built-in functions and libraries and come to understand its robustness, flexibility and ease of coding. Techniques such as clustering, time-series analysis, classification, linear and nonlinear modelling, and classical statistical tests are used in different data science computations. The course covers data exploration, data visualization, predictive analytics and descriptive analytics techniques with the R language. You will learn about R packages, how to import and export data in R, data structures in R, various statistical concepts, cluster analysis and forecasting.

  • Various analysis and visualization tools such as ggplot2 and plotly
  • Building meaningful models and understanding statistics fundamentals once the behaviour of the data is known
  • R libraries such as dplyr and data.table
  • Data manipulation, data preparation and data exploration
  • How to use R graphics libraries such as ggvis and plotly
  • Advanced statistics and predictive modeling using OLS, logistic regression using MLE, KNN and decision trees

Mode of Learning and Duration

  • Weekdays – 5 to 6 weeks
  • Weekend – 6 to 7 weeks
  • FastTrack – 4 to 5 weeks

Course Agenda

  • What is Data Science
  • What is Machine Learning
  • Machine Learning vs. Data Science vs. AI
  • How leading companies are harnessing the power of Data Science with Python
  • Different phases of a typical Analytics/Data Science project and the role of Python
  • Anaconda vs. Python
  • Machine Learning flow to code
  • Regression vs. Classification
  • Features, Labels, Classes
  • Supervised Learning, Semi-Supervised and Unsupervised Learning
  • Cost Function and Optimizers
  • Installation and Setup
  • Installing R
  • Installing RStudio
  • Installing Packages
  • Working with Vectors
  • Vectors
  • Random Numbers, Rounding, and Binning
  • Missing Values
  • The which() Operator
  • R Essentials
  • Set Operations
  • Sampling and Sorting
  • Check Conditions
  • For Loops
  • Dataframes and Matrices
  • Importing and Exporting Data
  • Matrices and Frequency Tables
  • Merging Dataframes
  • Aggregation
  • Melting and Cross Tabulations with dcast()
  • Functions
  • Debugging and Error Handling
  • Fast Loops with apply()
  • Fast Loops with sapply(), lapply() and vapply()
  • Normal Distribution, Central Limit Theorem, and Confidence Intervals
  • Skewness in data
  • Correlation and Covariance
  • Statistical Tests – F Test, T-Test
  • dplyr and caret Packages
  • Aggregation and Special Functions
  • Understanding Syntax, Creating and Updating Columns
  • Chaining, Functions, and .SD
  • Fast Loops with set(), Keys, and Joins
  • Probability
  • Probability Distribution: Discrete
  • Probability Distribution: Continuous
  • Sampling Distribution
  • Estimation
  • Hypothesis Testing, ANOVA
  • Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
  • Database Input (Connecting to database)
  • Viewing Data objects - subsetting, methods
  • Exporting Data to various formats
  • Cleansing Data with R Programming
  • Data Manipulation steps (sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, data type conversions, renaming, formatting, etc.)
  • Data manipulation tools (operators, functions, packages, control structures, loops, arrays, etc.)
  • Scaling and Normalizing data
  • Pre-processing and Formatting data
  • Feature selection – Correlation, P Values, Multi-Collinearity, etc.
  • Introduction to exploratory data analysis
  • Basic Plots vs. the ggplot2 Library
  • Making Plots with Base Graphics
  • Drawing Plots with 2 Y Axes
  • Multiplots and Custom Layouts
  • Creating Basic Graph Types
  • Creating graphs using ggplot2
  • Descriptive statistics, Frequency Tables and summarization
  • Univariate Analysis (Distribution of data & Graphical Analysis)
  • Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
  • Creating Graphs (bar, pie, line chart, histogram, boxplot, scatter, density plot, etc.)
  • Overview
  • Introduction to Regression Analysis
  • Types of Regression Analysis Models
  • Linear Regression
  • Model
  • Model statistics
  • Gradient Descent Algorithm
  • Demo: Simple Linear Regression
  • Demo: Regression Analysis with Multiple Variables
  • Cross Validation
  • Factor Analysis
  • Fitting model and Predictions
  • Logistic Regression
  • Decision Tree Classification
  • Entropy & Gini Index
  • Classification and Regression Trees
  • Decision Tree Statistics
  • Decision Tree
  • Demo: Decision Tree Classification
  • Evaluating Classification Report
  • ROC Curve
  • Random Forest Classification
  • Gradient Boosting
  • Introduction to Data mining principles
  • Data mining and knowledge discovery
  • Overview of Data warehousing and mining
  • Advantages and challenges
  • Data mining applications in various application areas
  • Data warehousing
  • Data warehouse architectures
  • Data warehouse design
  • Steps in Data warehousing (ETL)
  • Data marts and OLAP
  • Design and performance considerations
  • Overview
  • Introduction to Clustering
  • Clustering Example
  • Clustering Methods: Prototype Based Clustering
  • Centroids and Means
  • Euclidean Distance Formula
  • Elbow Method – Picking values of K
  • Demo: K-means Clustering
  • Clustering and association Rule mining
  • EM technique
  • Hierarchical Clustering
  • Dendrogram
  • Density based methods
  • Grid based methods
  • Cluster Analysis and Outlier Analysis
  • Association Rule mining
  • Stream mining and Fraud Detection



Industry: Banking and Finance

Problem Statement: Analyse customer and applicant data to identify possible fraudulent activity

This project lets you work with a credit card dataset for real-time analysis, defining attributes that describe customer characteristics in order to build a classification model that predicts which customers are likely to default on a credit card payment next month. The process involves performing data analysis and plotting score performance against the input variables.
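As a rough illustration of the kind of classification model this project describes, the sketch below fits a logistic regression in base R. The data is simulated and the column names (credit_limit, months_late, defaulted) are hypothetical stand-ins, not the course dataset; the 0.5 cut-off is also just an assumption for the example.

```r
set.seed(42)
n <- 500

# Simulated stand-in for customer attributes (NOT real credit card data).
customers <- data.frame(
  credit_limit = runif(n, 1000, 20000),
  months_late  = rpois(n, 1)
)

# Simulated outcome: more late months raise the chance of default.
p_default <- plogis(-2 + 0.9 * customers$months_late -
                      0.00005 * customers$credit_limit)
customers$defaulted <- rbinom(n, 1, p_default)

# Hold out 30% of the rows for evaluation.
train_idx <- sample(n, size = round(0.7 * n))
train_set <- customers[train_idx, ]
test_set  <- customers[-train_idx, ]

# Logistic regression: default flag as a function of the attributes.
model <- glm(defaulted ~ credit_limit + months_late,
             data = train_set, family = binomial)

# Predicted default probability ("score") for each held-out customer.
test_set$score     <- predict(model, newdata = test_set, type = "response")
test_set$predicted <- as.integer(test_set$score > 0.5)

# Compare predictions with the simulated outcomes.
confusion <- table(actual = test_set$defaulted, predicted = test_set$predicted)
accuracy  <- mean(test_set$predicted == test_set$defaulted)
print(confusion)
print(accuracy)
```

Plotting test_set$score against each input variable is one simple way to approach the score-performance plots mentioned in the project description.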

Industry: Entertainment and E-Commerce

Problem Statement: Develop a recommender system for show and movie recommendations on a platform like Netflix

This project shows how to work with raw data and use processes such as Data Cleaning, Data Visualization, Distribution and Recommender Lab to develop a system that recommends a specific movie or show after understanding and analysing users' preferences.
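To give a feel for how such a recommender can work, here is a minimal item-based collaborative filtering sketch in base R using cosine similarity. It does not use Recommender Lab; the ratings matrix, user names and titles are made up for illustration, and treating 0 as "not rated" is an assumption of this example.

```r
# Hypothetical ratings: rows are users, columns are titles, 0 = not rated.
ratings <- matrix(
  c(5, 4, 0, 1,
    4, 5, 1, 0,
    1, 0, 5, 4,
    0, 1, 4, 5,
    5, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("user", 1:5),
                  c("DramaA", "DramaB", "ComedyA", "ComedyB"))
)

# Cosine similarity between every pair of title columns.
norms    <- sqrt(colSums(ratings^2))
item_sim <- crossprod(ratings) / outer(norms, norms)

# Rank a user's unrated titles by a similarity-weighted average
# of the ratings that user has already given.
recommend <- function(user, ratings, item_sim) {
  r     <- ratings[user, ]
  rated <- r > 0
  sims  <- item_sim[, rated, drop = FALSE]
  scores <- drop(sims %*% r[rated]) / rowSums(sims)
  names(sort(scores[!rated], decreasing = TRUE))
}

print(recommend("user5", ratings, item_sim))
```

In this toy data user5 rated DramaA highly, so the title most similar to it (DramaB) is ranked ahead of ComedyA.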

Industry: Healthcare

Problem Statement: Build a prediction model to estimate how likely a patient is to develop chronic kidney disease

This project uses a patient's health history data to estimate the probability of developing or suffering from chronic kidney disease.

Industry: Inventory Management

Problem Statement: A company wants to build a tool that helps its managers analyse inventory to increase cross-selling

This project uses tools and techniques such as Association Rule Mining, Data Extraction and Data Manipulation to work on real-time inventory data and build a model that helps manage inventory effectively and keep each product stocked alongside the corresponding products it is commonly required with.
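The core quantities behind association rules (support, confidence and lift) can be sketched in a few lines of base R. The transactions and item names below are hypothetical toy data, and a real project would more likely rely on a dedicated package than on hand-written helpers like these.

```r
# Hypothetical toy transactions; item names are illustrative only.
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "jam"),
  c("milk", "butter"),
  c("bread", "butter", "jam")
)

# Fraction of transactions that contain every item in `items`.
support <- function(items, txns) {
  mean(vapply(txns, function(t) all(items %in% t), logical(1)))
}

# Support, confidence and lift for the rule lhs -> rhs.
rule_metrics <- function(lhs, rhs, txns) {
  s_both <- support(c(lhs, rhs), txns)
  s_lhs  <- support(lhs, txns)
  s_rhs  <- support(rhs, txns)
  c(support    = s_both,
    confidence = s_both / s_lhs,
    lift       = s_both / (s_lhs * s_rhs))
}

metrics <- rule_metrics("bread", "butter", transactions)
print(metrics)
```

A lift above 1 suggests the two items are bought together more often than chance would predict, which is the signal a cross-selling tool looks for; a lift below 1 suggests the opposite.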

Industry: Banking

Problem Statement: Predict the loan approval rate for an applicant

This project helps the banking system by using Data Pre-processing, PCA, Cleaning Ops and Data Visualization to predict how safe it is to grant a loan to a person. You will build a machine learning model that predicts loan application status based on the applicant's earnings, expenses, banking transactions, financial behaviour and other loan or lending status.
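The PCA step mentioned above might look something like the following base R sketch. The applicant data is simulated, the column names are hypothetical, and the 90% variance threshold is an arbitrary choice for the example rather than a course recommendation.

```r
set.seed(1)
n <- 200

# Simulated applicant features (NOT real banking data).
income <- rnorm(n, mean = 50000, sd = 12000)
applicants <- data.frame(
  income         = income,
  expenses       = 0.6 * income + rnorm(n, sd = 3000),  # correlated with income
  txn_count      = rpois(n, 40),
  existing_loans = rpois(n, 1)
)

# Centre and scale the features so no single unit dominates, then run PCA.
pca <- prcomp(applicants, center = TRUE, scale. = TRUE)

# Share of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the smallest number of components reaching 90% cumulative variance.
n_keep  <- which(cumsum(var_explained) >= 0.9)[1]
reduced <- pca$x[, seq_len(n_keep), drop = FALSE]

print(round(var_explained, 3))
print(dim(reduced))
```

The reduced matrix can then be used as input to a classifier in place of the original, partly redundant features.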



