Week 1: Introduction to Python & Data Science

What is Data Science?

Data science is a scientific field that aims to uncover latent knowledge from large amounts of data. The area is not new, in the sense that traditional fields like economics and business administration have always had a data analytics component. However, modern data science is different: the analyst does not run the analysis to accept or reject a predefined hypothesis (null hypothesis $H_0$ vs. alternative $H_a$), but instead relies on the data and algorithms to reveal new knowledge. The data science field has many different substages.

For example, according to the UC Berkeley School of Information, the data scientist’s life cycle can be classified into five tasks:

Capture data
Maintain data
Process data
Analyze
Communicate

Although every task is important in data science, I will focus mostly on the Analyze stage (a.k.a. data mining).

Data Mining

As mentioned above, data mining (a.k.a. knowledge discovery from data, or KDD) is one of the many tasks a data scientist performs. There are many interpretations of this task, but one of the clearest explanations comes from Jiawei Han, who describes the knowledge discovery process in seven stages:

Data cleaning (to remove noise and inconsistent data)
Data integration (where multiple data sources are combined)
Data selection (where data relevant to the analysis task are retrieved from the database)
Data transformation (where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations)
Data mining (an essential process where intelligent methods are applied to extract data patterns)
Pattern evaluation (to identify the truly interesting patterns)
Knowledge presentation (where visualization and knowledge representation techniques are used to present mined knowledge to users)
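The seven stages can be chained as a pipeline, where each stage consumes the output of the previous one. The sketch below is a toy illustration in plain Python (standard library only), not Han's own code: the store records, the "share of total spend" pattern and the 10% threshold are all hypothetical choices made just for this example. Real projects would usually rely on libraries such as pandas for these steps.

```python
from collections import defaultdict

# Toy (customer_id, amount) records from two hypothetical stores.
# The data, names and thresholds are invented for illustration only.
STORE_A = [("c1", "25.0"), ("c2", None), ("c3", "40.5"), ("c1", "abc")]
STORE_B = [("c2", "10.0"), ("c3", "15.5"), ("c4", "300.0"), ("c4", "-20.0")]


def clean(records):
    """1. Data cleaning: drop records whose amount is missing or not numeric."""
    cleaned = []
    for customer, amount in records:
        try:
            cleaned.append((customer, float(amount)))
        except (TypeError, ValueError):
            continue  # noisy or inconsistent record, skip it
    return cleaned


def integrate(*sources):
    """2. Data integration: combine several sources into one collection."""
    return [row for source in sources for row in source]


def select(records):
    """3. Data selection: keep only purchases (positive amounts), not refunds."""
    return [(c, a) for c, a in records if a > 0]


def transform(records):
    """4. Data transformation: aggregate the total spend per customer."""
    totals = defaultdict(float)
    for customer, amount in records:
        totals[customer] += amount
    return dict(totals)


def mine(totals):
    """5. Data mining: a deliberately simple 'pattern', namely each
    customer's share of the overall spend."""
    grand_total = sum(totals.values())
    return {c: t / grand_total for c, t in totals.items()}


def evaluate(shares, min_share=0.1):
    """6. Pattern evaluation: keep only shares at or above a chosen threshold."""
    return {c: s for c, s in shares.items() if s >= min_share}


def present(patterns):
    """7. Knowledge presentation: render the result as lines of a text report."""
    return [f"{c}: {s:.1%} of total spend" for c, s in sorted(patterns.items())]


if __name__ == "__main__":
    data = integrate(clean(STORE_A), clean(STORE_B))
    report = present(evaluate(mine(transform(select(data)))))
    print("\n".join(report))
    # c3: 14.3% of total spend
    # c4: 76.7% of total spend
```

Keeping each stage in its own small function makes it easy to swap one stage (for example, a real mining algorithm in place of `mine`) without touching the others.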

Reference: Data Mining: Concepts and Techniques, 3rd edition, Jiawei Han, Micheline Kamber and Jian Pei, 2011

What is Python?


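Python is an interpreted programming language, first released in 1991. Although the language also has a compilation step (the reference implementation, CPython, compiles source code to bytecode before running it), it is usually regarded as an interpreted language, unlike languages such as C, C++ and Haskell, which are typically compiled ahead of time into machine code.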
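To make Python's compile step visible, the sketch below uses the built-in `compile()` function and the standard-library `dis` module. It assumes CPython; the exact instructions printed by `dis.dis` differ between Python versions, so no output is shown for that call. The variable names `price`, `quantity` and `total` are just illustrative.

```python
import dis

# A tiny piece of source code, held in a string for the demo.
source = "total = price * quantity"

# Compile step: compile() turns source text into a code object
# containing bytecode, without running it.
code_obj = compile(source, "<example>", "exec")

# Inspect the bytecode instructions CPython generated (version-dependent).
dis.dis(code_obj)

# Interpret step: the virtual machine executes the bytecode.
namespace = {"price": 2.5, "quantity": 4}
exec(code_obj, namespace)
print(namespace["total"])  # 10.0
```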

<aside> 📌 Interpreter vs. Compiler

People often describe a compiled language as one that requires a separate “build” stage before the program runs, and an interpreted language as one whose code is executed line by line.

For example, say you want to read a book in Latin, but you don’t speak Latin.

You could have:

The whole book translated before you start reading (compiled)
Or ask someone to translate each line as you read it (interpreted) </aside>
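For Python the analogy is a simplification. CPython first compiles a whole source file (or string) to bytecode and only then interprets it, so a syntax error anywhere in a file stops it before any of its lines run, while a runtime error appears only when the offending line is reached. The sketch below shows this with the built-in `exec()`; the two tiny programs and the `run` helper are hypothetical examples written for this demo.

```python
def run(program):
    """Execute `program` and report which lines ran before an error occurred."""
    ran = []  # the program appends to this list as each line executes
    try:
        exec(program, {"ran": ran})
    except SyntaxError:
        return ran, "SyntaxError"
    except ZeroDivisionError:
        return ran, "ZeroDivisionError"
    return ran, None


syntax_error = "ran.append(1)\nran.append(2"        # line 2 is missing ")"
runtime_error = "ran.append(1)\nran.append(1 / 0)"  # line 2 divides by zero

print(run(syntax_error))   # ([], 'SyntaxError'): compilation failed, nothing ran
print(run(runtime_error))  # ([1], 'ZeroDivisionError'): line 1 ran first
```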