What is Data Science?


  • Data science is an interdisciplinary field concerned with the processes and systems used to extract knowledge or insights from data in various forms, whether structured or unstructured. It is a continuation of data analysis fields such as statistics, data mining, and predictive analytics.

Who is a Data Scientist?

Data scientists solve complex business problems using data mining techniques.

Data is to a data scientist what oxygen is to human beings. It is a profession in which a statistically adept practitioner works on data end to end: from data collection to data cleansing, data mining and statistical analysis, right through to forecasting, predictive modelling and finally data optimization. A data scientist does not provide just any solution; they provide the most optimized solution out of the many available.

Gartner predicted in 2012 that Data Scientist and Business Analytics jobs would increase to the tune of millions by the end of 2015. This is evident from the rise in openings on various job portals. As a data scientist or an aspirant, you should not simply take our word for it. Go and research this yourself, and confirm the facts and figures.

How do data scientists add value?

Data scientists add value by using data mining techniques to do one of four things:

  • Predicting the bad
  • Identifying the good
  • Automating existing processes
  • Identifying patterns in data

What is Business Analytics / Data Analytics?

Business Analytics, or Data Analytics, is a profession in very high demand. It requires sound knowledge of analyzing data in all its dimensions to uncover what is not immediately visible, coupled with the logic and domain knowledge needed to impact the top line (revenue) and the bottom line (profit). Google Trends also shows an upward trajectory, with search volume for these terms rising sharply. This supports the statements made by Harvard Business Review and the major business research firms that Business Analytics will be the most sought-after profession the world has ever witnessed.

 

 

What does the world say about Analytics and Data Science?

  • Sexiest job of the 21st century (Harvard Business Review)
  • Best new job in America (CNN)
  • Job with the hottest prospects (The Economic Times)
  • Shortage of 190,000 analysts in the US alone by 2018 (McKinsey)
  • There will be a shortage of 200,000 trained data analysts in India (The Hindu)

What is Data Analytics?
Studying Data Analytics can open up wider opportunities for you, no matter what your role is. A basic understanding of it allows you to see the bigger picture and predict issues, trends, and habits, and these skills can enhance any organisation.
Statistical Analysis and Data Mining ranked No. 2 in the list of “The 25 Skills That Can Get You Hired in 2016” (LinkedIn).
  • Who is a Data scientist?
  • What is business analytics / data analytics?
  • Analytics
  • Business analytics
  • Business intelligence (BI)
  • Why do organizations need BI?
  • Challenges of building BI solutions
  • Data warehousing
  • Users of business intelligence (BI)
  • What is predictive analytics?
  • Statistics
  • Artificial intelligence
  • Machine learning
  • Predictive analytics software
  • Predictive analytics data flow
  • Data science components
  • Prospects
  • R – pros & cons
  • R & other analytical products
  • Data science tools & technologies

Introduction to Data analytics 

  • Origin of R
  • Downloading & installing R and RStudio
  • Interface of R
  • R components
  • Data types
  • Data structures

Data

  • Definition of data
  • Types of data
  • Raw data
  • Processed or transformed data
  • Information
  • Decision making/ Decision support system
  • Statistics: Making sense of data
  • Definition of data analysis

Variables

What are variables?

Variable types

  • Qualitative variables
  • Quantitative variables

Continuous variables

Categorical variables

Discrete versus continuous variable

Data types

  • Strings
  • Vector
  • Data frame
  • List
  • Factors
  • Arrays

Vector

  • Vector Creation
  • Single Element Vector
  • Multiple Elements Vector
  • Using sequence (Seq.) operator
  • Using the c() function
  • Accessing Vector Elements
  • Vector Manipulation
  • Vector element recycling
  • Vector Element Sorting

Data Frame

  • Create Data Frame
  • Structure of the Data Frame
  • Summary of Data in Data Frame
  • Extract Data from Data Frame
  • Expand Data Frame
  • Data Frame Column Slice
  • Data Frame Row Slice
  • Merging data frames

List

  • Create a list containing strings, numbers, vectors and logical values.
  • Create a list containing a vector, a matrix and a list.
  • Accessing List Elements
  • Manipulating List Elements
  • Merging Lists
  • Converting List to Vector

Factors

  • Build Factors
  • Factors in Data Frame
  • Changing the Order of Levels
  • Generating Factor Levels

Array

  • Create Array
  • Naming Columns and Rows
  • Accessing Array Elements
  • Manipulating Array Elements
  • Calculations Across Array Elements

String

  • Creating a string
  • Valid Strings
  • Invalid Strings
  • String Manipulation
  • Formatting numbers & strings
  • Counting number of characters in a string
  • Changing the case
  • Extracting parts of a string

R data interfaces

  • R – csv files
  • R – excel files
  • R – binary files
  • R – xml files
  • R – json files
  • R – web data
  • R – database
  • Reading tabular data files
  • Writing data

Functions in R

  • Numeric functions
  • Character functions
  • Statistical probability functions
  • Other statistical functions
  • Other useful functions
  • Operators
  • Logical operators
  • Relational operators
  • Aggregations
  • Data Aggregation
  • Multiple Aggregations
  • Control structures & functions
  • Debugging

Statistics

  • Univariate analysis
  • Measures of central tendency
  • Mean
  • Median
  • Mode
  • Dispersion techniques
  • Range
  • IQR
  • Variance
  • Standard deviation
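
The measures listed above can be sketched in a few lines. The course covers them in R; this illustration uses Python, the other language in the syllabus, with only its standard library. The sample data is made up, and the quartile values depend on the interpolation method (here the default of `statistics.quantiles`), so other tools may report slightly different IQR figures.

```python
import statistics

# Hypothetical sample: monthly sales figures (made up for illustration).
sales = [12, 15, 15, 18, 21, 24, 30]

mean = statistics.mean(sales)            # arithmetic average
median = statistics.median(sales)        # middle value of the sorted data
mode = statistics.mode(sales)            # most frequent value
value_range = max(sales) - min(sales)    # spread between the extremes

# Quartiles; the exact cut points depend on the interpolation method used.
q1, q2, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1                            # interquartile range

variance = statistics.variance(sales)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(sales)        # sample standard deviation

print(mean, median, mode, value_range, iqr, variance, std_dev)
```

Mean, median and mode describe where the data is centred, while range, IQR, variance and standard deviation describe how widely it is spread.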

Distributions

Frequency distributions

  • Symmetric/ Asymmetric
  • Skewness
  • Kurtosis
  • Normal distribution
  • Binomial distributions
  • Poisson distributions
  • Tests
  • Hypothesis
  • Chi-square test
  • t-test
  • F-test
  • Z-test
  • ANOVA
  • Bivariate / multivariate analysis
  • Correlation
  • Regression analysis

Regression Analysis

  • Linear regression models
  • Non-linear regression models
  • Logistic regression
  • Poisson regression

Data science with R

Data mining

  • Analyzing the past
  • Predicting the future

Data exploration

  • Variable identification
  • Univariate analysis
  • Bi-variate analysis

Missing Value Treatment

  • Why is missing value treatment required?
  • Why does data have missing values?
  • What are the methods for treating missing values?
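
As a rough illustration of two common treatments, deletion and mean imputation, here is a minimal Python sketch. The column of ages is hypothetical, and these are only two of several possible methods; which one suits a dataset depends on why the values are missing.

```python
import statistics

# Hypothetical column of ages; None marks a missing value.
ages = [25, None, 31, 40, None, 28]

# Option 1: deletion, keep only the observed values.
observed = [a for a in ages if a is not None]

# Option 2: mean imputation, replace each missing value with the mean
# of the observed values.
fill_value = statistics.mean(observed)
imputed = [fill_value if a is None else a for a in ages]

print(observed)
print(imputed)
```

Deletion shrinks the dataset, while imputation keeps every row at the cost of reducing the variability of the column.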

Techniques Of Outlier Detection And Treatment

  • What is an outlier?
  • What are the types of outliers?
  • What are the causes of outliers?
  • What is the impact of outliers on a dataset?
  • How to detect outliers?
  • How to remove outliers?
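
One widely used detection technique is the IQR rule, which flags values more than 1.5 × IQR beyond the quartiles. The Python sketch below assumes that rule and a made-up dataset; the function name `iqr_outliers` and the multiplier `k` are illustrative choices, not part of the course material.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical data with one obviously extreme value.
data = [10, 12, 12, 13, 14, 15, 95]
lower, upper, outliers = iqr_outliers(data)

# Removal: keep only the values inside the fences.
cleaned = [v for v in data if lower <= v <= upper]

print(outliers, cleaned)
```

Removal is only one treatment; capping values at the fences or transforming the variable are alternatives when the extreme observations are genuine.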

The art of feature engineering

  • What is feature engineering?
  • What is the process of feature engineering?
  • What is variable transformation?
  • When should we use variable transformation?
  • What is feature / variable creation and what are its benefits?

Data manipulation

  • What is Data manipulation?
  • Different ways to manipulate / treat data
  • List of packages for data manipulation
  • Working with packages for Data manipulation.

Data visualisation

  • How to create a scatter plot?
  • How to create a histogram?
  • How to create a bar chart?
  • How to create a stacked bar chart?
  • How to create a box plot?
  • How to create an area chart?
  • How to create a heat map?
  • How to create a correlogram?
  • How to plot a geographical map?
  • How to plot the entire data in a single command?

Machine learning

Introduction to machine learning

Categories of machine learning algorithms

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning
  • Classification
  • Regression
  • Classification vs regression
  • Clustering

Cluster analysis

  • What is cluster analysis?
  • Why clustering?
  • Similarity /dissimilarity
  • Similarity measurement
  • Dissimilarity measurement
  • Clustering classified
  • K-means clustering
  • Process flow of k-means
  • Number of clusters k=?
  • Case study
  • K-means clustering implementation
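
The k-means process flow (assign each point to its nearest centre, recompute the centres, repeat until nothing changes) can be sketched in plain Python. This is a teaching sketch under simplifying assumptions: the points are hypothetical, and the starting centres are passed in explicitly so the result is reproducible, whereas practical implementations usually pick them at random.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm); k is the number of start centres."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(c)  # keep the centre of an empty cluster
        if new_centroids == centroids:   # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

The choice of k is an input to the algorithm, which is why the syllabus treats "number of clusters k=?" as a separate topic.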

K-Nearest Neighbour

  • Introduction
  • What is the KNN algorithm?
  • How to select an appropriate k value?
  • Calculating distance
  • KNN algorithm – pros and cons
  • Case study
  • KNN algorithm implementation
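
The idea behind KNN, finding the k closest training examples and taking a majority vote, fits in a short Python sketch. The training data, the feature meaning (height and weight) and the function name `knn_predict` are all hypothetical, and Euclidean distance is assumed; other distance measures are possible.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (features, label) pairs.
    """
    # Sort training rows by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (height_cm, weight_kg) -> size label.
train = [
    ((158, 58), "M"), ((160, 59), "M"), ((163, 61), "M"),
    ((170, 68), "L"), ((172, 70), "L"), ((175, 72), "L"),
]
prediction = knn_predict(train, (161, 60), k=3)
print(prediction)
```

An odd k avoids tied votes in two-class problems, and features on very different scales are usually normalized first so that no single feature dominates the distance.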

Regression

  • Linear regression
  • Logistic regression

Tree based models

  • What is a decision tree?
  • Types of decision tree
  • Decision trees terminology
  • Advantages:
  • Disadvantages:
  • Decision tree algorithms
  • How does it work?
  • Case study
  • Implementation

Ensemble methods for tree-based models

Random forest

  • What is random forest?
  • How does it work?
  • Advantages of random forest
  • Disadvantages of random forest
  • Case study
  • Random forest implementation

Bagging

  • What is bagging?
  • How does it work?
  • Working with gbm in R
  • Case study
  • Implementation

Boosting

  • What is boosting?
  • How does it work?
  • Working with xgboost in R

Deep Learning And Neural Network

  • Introduction
  • Artificial neural networks
  • What is a neural network?
  • What is deep learning?
  • How does a single neuron work?
  • Why are multi-layer networks useful?
  • General structure of a neural network and back-propagation

Support Vector Machines

  • Overview
  • Maximum Margin Classifier
    • What is a Hyperplane?
    • Classification Using a Separating Hyperplane
    • The Maximal Margin Classifier
    • Non-separable Case
  • Support Vector Classifiers
    • Details
  • Support Vector Machines
    • Classification with non-linear boundaries
    • The SVM
  • SVMs with More than Two Classes
    • One-Vs-One Classification
    • One-Vs-All Classification

Model evaluations

  • Mean Squared Error
  • K-fold cross-validation
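
Both evaluation ideas can be written out directly. The Python sketch below computes mean squared error and generates the train/test index splits for k-fold cross-validation; the helper names are illustrative, the numbers are made up, and the folds are contiguous without shuffling, which real projects would normally add.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

mse = mean_squared_error([3, 5, 7], [2, 5, 9])   # (1 + 0 + 4) / 3
folds = list(k_fold_indices(10, 3))

print(mse)
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

Each observation lands in the test set exactly once, so averaging the error over the k folds gives a more stable estimate than a single train/test split.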

Text analytics

  • Natural language processing.
  • Text mining
  • Sentiment analysis
  • Social network Analysis

PCA

  • Introduction to PCA
  • What is Principal Component Analysis?
  • What are principal components?
  • Why is normalization of variables necessary?
  • PCA run with unscaled and scaled predictors
  • Implement PCA in R

Association rule mining

  • Market basket analysis – concepts
  • Lift
  • Support
  • Confidence
  • Implement & inspect rules
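
Support, confidence and lift are simple ratios over a set of transactions. The Python sketch below computes them for one rule, bread → milk, on a handful of made-up baskets; the function names are illustrative and the data is not from any real store.

```python
# Hypothetical shopping baskets (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

antecedent, consequent = {"bread"}, {"milk"}
rule_support = support(antecedent | consequent)
rule_confidence = confidence(antecedent, consequent)
rule_lift = lift(antecedent, consequent)

print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests the two items are bought together more often than chance would predict; a lift below 1, as in this toy data, suggests the opposite.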

Time series analysis

  • Importance
  • What is time series analysis in business analytics?
  • How is it implemented in business analytics?
  • Forecasting in business analytics
  • Time series forecasting in R
  • Time series decomposition in R
  • Time series best practices.

Data science in Python

  1. Basics of Python for data analysis
    • Why learn Python for data analysis?
    • Python 2.7 vs 3.4
    • How to install Python?
    • Running a few simple programs in Python
  2. Python libraries and data structures
    • Python data structures
    • Python iteration and conditional constructs
    • Python libraries
  3. Exploratory analysis in Python using pandas
    • Introduction to Series and DataFrames
  4. Data munging in Python using pandas
    • Pandas for data wrangling
      • Overview
      • Reading data
      • Exploration
      • GroupBy
      • Plotting
      • Advanced indexing
      • Categorical data
  5. Building predictive models using machine learning algorithms in Python
    • Linear regression
    • Logistic regression
    • Decision tree
    • Random forest
    • Boosting

Dictionary

  • Creating a Dictionary
  • Accessing Values in a Dictionary
  • Updating Dictionary
  • Delete Dictionary Elements
  • Properties of Dictionary Keys
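
A short Python sketch of the dictionary operations listed above. The `student` record and its values are hypothetical.

```python
# Creating a dictionary (hypothetical student record).
student = {"name": "Asha", "course": "Data Science", "score": 82}

# Accessing values: by key, or with get() to supply a default.
name = student["name"]
city = student.get("city", "unknown")   # key is absent, so the default is used

# Updating: change an existing value and add a new key.
student["score"] = 90
student["city"] = "Hyderabad"

# Deleting elements: del removes a key, pop() removes it and returns the value.
del student["course"]
score = student.pop("score")

# Key properties: keys are unique and must be hashable (immutable types
# such as str, int or tuple); a list cannot be used as a key.
list_key_allowed = True
try:
    student[["a", "list"]] = 1
except TypeError:
    list_key_allowed = False

print(student, name, city, score, list_key_allowed)
```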

Tuples

  • Creating a Tuple
  • Accessing Values in Tuples
  • Updating Tuples
  • Delete Tuple Elements
  • Basic Tuples Operations
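
The tuple topics above can be illustrated with a few lines of Python. The values are arbitrary; the key point is that tuples are immutable, so "updating" and "deleting" work differently from lists.

```python
# Creating tuples; a single-element tuple needs a trailing comma.
point = (3, 4)
single = (5,)

# Accessing values by index and by slice.
x = point[0]
tail = (1, 2, 3, 4)[1:]

# "Updating": tuples are immutable, so build a new tuple instead.
try:
    point[0] = 10
except TypeError:
    point = (10,) + point[1:]

# Deleting: individual elements cannot be removed; del drops the whole name.
temp = (1, 2)
del temp

# Basic operations: concatenation, repetition, membership, length.
combined = point + single
repeated = single * 3
has_four = 4 in point
size = len(combined)

print(point, combined, repeated, has_four, size)
```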

Strings

  • Creating a String
  • Accessing Values in Strings
  • Updating Strings
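
A minimal Python sketch of the string topics above, using arbitrary example text. Like tuples, strings are immutable, so "updating" always produces a new string.

```python
# Creating strings: single, double or triple quotes.
course = "Data Science"
note = '''spans
two lines'''

# Accessing values by index and slice.
first = course[0]
word = course[5:]
last = course[-1]

# "Updating": strings are immutable, so operations return new strings.
updated = course.replace("Science", "Analytics")
shout = course.upper()

print(first, word, last, updated, shout)
```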

List

  • Creating a list
  • Accessing Values in Lists
  • Updating Lists
  • Delete List Elements
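
The list operations above, sketched in Python with an arbitrary list of tool names. Unlike tuples and strings, lists are mutable, so they can be changed in place.

```python
# Creating a list.
tools = ["R", "Python", "Tableau"]

# Accessing values by index and slice.
first = tools[0]
last_two = tools[-2:]

# Updating: elements can be replaced or added in place.
tools[2] = "Tableau Desktop"
tools.append("Jupyter")
tools.insert(1, "RStudio")

# Deleting elements: by position with del, by value with remove().
del tools[0]
tools.remove("Jupyter")

print(first, last_two, tools)
```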

Data warehouse concepts

  • Introduction to OLTP
  • Introduction to DWH/OLAP
  • Reporting fundamentals
  • Differences OLTP/OLAP
  • DWH detailed
  • Dimensional modeling
    • Star schema
    • Snowflake schema
    • Fact constellation schema
  • Dimensions
  • Fact tables
  • Data modeling
  • Relational modeling – normalized schema

Data visualization with Tableau

Tableau – Desktop

  • Overview
  • Tableau products
  • Why visualization?
  • Getting started:
  • Tableau workspace
  • Data window

Toolbar:

  • Cards & shelves
  • Workbook
  • Creating dashboards

Connecting to data

  • Data sources to connect to tableau
  • Open a data source
  • Developing a sample worksheet
  • Using Show Me
  • Show Me with many fields
  • Save your work
  • Joining multiple tables
  • Copying and pasting formatting
  • Creating an extract
  • Analysis

Visualization charts

  • Types of charts in tableau
  • Bar chart
  • Heat map
  • Scatter plot
  • Building a map view
  • Map options
  • Pie charts

Analyzing

  • Sorting & grouping
  • Sorting specific fields
  • Grouping
  • Creating groups
  • Aliases
  • Filtering
  • Quick filters
  • Text table
  • Drilling and drill through
  • Trend lines and statistics

Formatting

  • Annotations and mark labels
  • Point annotations
  • Area annotations
  • Titles
  • Captions

Calculated fields

  • Calculated fields
  • How to create calculated field
  • String functions
  • Date functions
  • Logical functions
  • Aggregate functions

Building workbooks & interactive dashboards

  • Create workbooks
  • Dashboards
  • Creating dashboards
  • Adding sheets to the dashboards
  • Adding dashboard objects
  • Sharing and saving workbooks to Tableau Public
  • Publish as PDF

Tableau with R

Integration of R with Tableau

What if I have no experience in Data Science?

Data science is a profession that has caught the world's attention only in the last two years. For this very reason, most companies are struggling to close the demand-supply gap. Hence, people who are trained and have decent exposure to data science techniques are recruited quickly.

What if I am a fresher? Can I still get a job in Data Science?

There are a lot of opportunities for freshers on various job portals. The key thing an employer will want to know is whether you have the conceptual knowledge. The concepts and projects covered in the various modules will reinforce your learning and make you market-ready for these jobs.

Do I need to have strong programming skills to be a data scientist?

Yes and no. Yes, in the sense that programming skills are required; no, in the sense that you need not have extremely strong programming skills. We ensure that you get sufficient exposure to the statistical programming tool R. We start right from the basics, assuming you have no prior exposure to programming.

Why should we learn R? Can’t we learn any other tool for Data Science?

R has approximately 50% market share and is open source (free of cost), which makes it a very attractive skill in the analytics space. Almost all job postings ask for experience in and exposure to R. Demand for other statistical tools is decreasing steadily, so it is advisable to look ahead and invest time in learning R.

What are the salaries that we can expect in the profession of Data Science?

Salary ranges vary based on experience, industry, domain, geography and various other parameters. However, as a general rule of thumb, the following formula can be applied:

Salary = No. of years of experience * 3 lakhs per annum (India – INR)

Salary = No. of years of experience * $1,200 to $1,500 per annum (Overseas – USD)

Are you wondering what it takes to succeed in Data Science?

Do you have a curious mind? An ability to see the bigger picture? Can you make the connections that build larger ideas? Do you want to bring edge, individuality, and passion into the professional space? If you do, then check with us at INDEX IT for a demo on what you can do to make the switch to a career in Data Science and Analytics.

  • Intensive Training on Data concepts with different sources
  • Intensive Training on Statistics
  • Intensive Training on R Programming
  • Intensive Training on Machine Learning
  • Intensive Training on Artificial Intelligence and Deep Learning
  • Intensive Training on Anaconda and Jupyter Python Notebooks
  • Intensive Training on Data Warehousing
  • Intensive Training on Data Visualization using Tableau
  • Exclusive Live Projects Implementation
  • Mock Resumes / Interview Preparation
  • Well-designed material / Hands-on Labs

Course                  Demo Date
Pega                    19th July @ 8:30 am
Data Science            19th July @ 9 am
DevOps (online)         20th July @ 9 pm
Data Science (online)   20th July @ 8 pm

© 2016 Copyright. All rights reserved.
