An Overview Of Data Science As A Career: FAQs

Before I jump into the basics of Data Science, I would like to answer some frequently asked questions about this field.

FAQs-

Q.1- Is Data Science Hard or Easy?

Ans- I would not say that it is very hard to learn, but it is no cakewalk either. With determination and the right skills, we will make it happen.

Q.2- What are the prerequisite skills needed to start Data Science?

Ans- The skills needed to start with Data Science are a programming language (any of Python, Julia or R), SQL and Statistics.

Now let’s start with our course.

Some people confuse the Data Scientist and Data Analyst roles. So here is the difference between them-

1. What is the role of a Data Analyst?

A Data Analyst’s main role is to understand the data and provide reports and visualizations that explain the insights in the data which are hidden from ordinary viewers.

2. What is the role of a Data Scientist?

1. Programming in a statistical language, e.g. R, Python or Julia.

2. Cleaning, extracting and exploring data received from various sources.

3. Research, development and implementation of statistical models.

4. Checking the accuracy, precision, recall and F1 score of the model built.
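
To make point 4 a little more concrete, here is a minimal sketch of how these four scores can be computed for a two-class problem. The function name classification_metrics and the labels are made up for illustration; in practice you would normally use a library for this.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Count the four cells of the confusion matrix
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision asks "of everything the model flagged, how much was right?", while recall asks "of everything it should have flagged, how much did it find?". The F1 score combines the two into a single number.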

What is Data Science?

Data Science is a field that uses scientific methods and algorithms to extract meaningful insights from structured or unstructured data.

Data Science mainly uses 2 types of learning-

1. Supervised Learning- The input data are labelled and a training dataset is used; it is used for prediction. Ex- Classification and Regression.

2. Unsupervised Learning- The input data are unlabelled and the input data set is used as it is; it is used for analysis. Ex- Clustering.

What is a Classification Problem?

It is the problem of identifying which class (category) a new observation belongs to. Examples of this problem include assigning a mail to the spam or non-spam section, diagnosing patients, etc.
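
As a rough illustration of the spam example, here is a tiny nearest-centroid classifier in plain Python. The two features (number of links and number of exclamation marks in a mail) and all the numbers are hypothetical, and this is only one simple way to classify, not the method real spam filters use.

```python
import math

# Hypothetical training data:
# (number_of_links, number_of_exclamation_marks) -> label
training = [
    ((8, 6), "spam"),
    ((7, 9), "spam"),
    ((9, 7), "spam"),
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
    ((2, 1), "not spam"),
]


def fit_centroids(samples):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids,
               key=lambda label: math.dist(centroids[label], features))


centroids = fit_centroids(training)
print(predict(centroids, (6, 8)))  # a new mail with many links
print(predict(centroids, (1, 1)))  # a new mail with few links
```

The labelled mails are the training dataset, and the new mail is the "new observation" whose class we want to identify.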

What is a Regression Problem?

It is a predictive modelling technique that estimates the relationship between a dependent variable and one or more independent variables, usually to predict a numeric value. Examples of this problem include forecasting the temperature on a particular day, or estimating the probability that a politician will win an election or that a cricket team will win a match at a particular stadium. (Predicting only the win-or-lose outcome itself would be a classification problem.)
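
Here is a small sketch of the simplest case, a straight-line fit between one independent and one dependent variable using ordinary least squares. The helper fit_line and the sunshine/temperature numbers are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hours of sunshine (independent variable)
# vs. maximum temperature in degrees Celsius (dependent variable)
sunshine_hours = [2, 4, 6, 8, 10]
max_temp = [15, 19, 23, 27, 31]

slope, intercept = fit_line(sunshine_hours, max_temp)
predicted = slope * 7 + intercept  # forecast for a day with 7 hours of sunshine
print(slope, intercept, predicted)
```

Unlike classification, the output here is a number on a continuous scale rather than a category.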

What is a Clustering Problem?

Clustering is an unsupervised learning method. It is the task of dividing the data points into groups such that the points in the same group are more similar to each other than to the points in other groups. It basically separates the data points based on their dissimilarity with each other.

Various machine learning algorithms are available, and they are grouped into the Classification, Regression or Clustering family. These algorithms help us build whichever model we require for a use case.
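
To see the idea in action, below is a bare-bones sketch of k-means, one common clustering method. The points and the starting centroids are made up, and the starting centroids are fixed by hand (an assumption made here so that the result is repeatable; real implementations usually pick them at random).

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster;
        # keep the old centroid if a cluster ends up empty
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up, unlabelled 2-D points that visibly form two groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Notice that no labels are given anywhere: the groups come only from how close the points are to each other.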

NOTE- We will study the important algorithms in detail in upcoming blogs.

Types of algorithms in the Regression family-

1. Linear Regression

2. Polynomial Regression

3. Logistic Regression

4. Quantile Regression

5. Ridge Regression

6. Lasso Regression

7. Elastic Net Regression

8. Principal Component Regression

9. Partial Least Squares Regression

10. Support Vector Regression

Types of algorithms in the Classification family-

1. Decision Trees

2. Logistic Regression

3. Naive Bayes

4. Random Forest

5. Support Vector Machine

NOTE- Logistic Regression appears in both lists: it models the probability of a class (a regression), and that probability is then used to assign a class (a classification).

Types of algorithms in the Clustering family-
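
A short sketch may help show why Logistic Regression sits in both families. The coefficients below are hypothetical (they are not fitted to any real data), and the function names are my own.

```python
import math


def predict_probability(x, weight, bias):
    """Logistic model: squash a linear score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def predict_class(x, weight, bias, threshold=0.5):
    """Turn the probability into a class label by thresholding."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0


# Hypothetical coefficients, chosen by hand for illustration only
weight, bias = 1.2, -6.0

for x in (2, 6, 8):
    p = predict_probability(x, weight, bias)
    print(x, round(p, 3), predict_class(x, weight, bias))
```

The first function gives a continuous number (the regression side), and the second turns that number into a class label (the classification side).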

1. Hierarchical Clustering

2. KNN (K-Nearest Neighbours; strictly a supervised method that is often confused with K-means)

3. K-means Clustering

4. Mean Shift Clustering

That’s it from my side for today. Have a great day. Catch you soon.