Data Science - Data Analysis

This is the direction I intend to take my career. These notes are based on the 10-episode Computerphile series on YouTube (about 14 minutes per episode).

Episode 0

Machine Learning is a subset of A.I., but not the other way around.

Dr. Mike Pound introduced this using R.

R is good for plotting graphs.

R Commands demonstrated by Dr. Mike:

data <- read.csv(file = 'norm.csv', header = FALSE)
(reads norm.csv into the variable data using the function "read.csv"; header = FALSE tells R the file has no header row, so the first line is treated as data rather than as column names)
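
To see what the header argument changes, here is a small sketch. The file contents are made up and written to a temporary file as a stand-in for norm.csv, which is not shown in the video notes.

```r
# Hypothetical stand-in for norm.csv: a temporary file with no header row.
path <- tempfile(fileext = ".csv")
writeLines(c("1.2,3.4", "5.6,7.8"), path)

# header = FALSE: the first line is data; columns get default names V1, V2.
data <- read.csv(file = path, header = FALSE)
print(names(data))
print(nrow(data))         # both lines are kept as data rows

# header = TRUE would instead use the first line as column names,
# leaving one fewer row of data.
with_header <- read.csv(file = path, header = TRUE)
print(nrow(with_header))
```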

View(data)
(opens the data in a spreadsheet-style viewer; note the capital V in base R)

data[3, 4] shows the value at row 3, column 4
data[4, ] shows the entire row 4
data[, 3] shows the entire column 3

Typing the name of the variable that holds the data (namedata, for example) prints the table, with headers if it has them.
namedata$firstname shows just the firstname values of the table.
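
A quick sketch of the indexing commands above. The data frame and its column names are made up for illustration; only the indexing syntax comes from the video.

```r
# Hypothetical data frame; names and values are invented for this example.
namedata <- data.frame(
  firstname = c("Ada", "Alan", "Grace"),
  lastname  = c("Lovelace", "Turing", "Hopper"),
  born      = c(1815, 1912, 1906),
  stringsAsFactors = FALSE
)

namedata             # prints the whole table with its headers
namedata[3, 2]       # single value at row 3, column 2
namedata[2, ]        # entire row 2 (still a data frame)
namedata[, 3]        # entire column 3 (a plain vector)
namedata$firstname   # the firstname column, selected by name
```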

Process: Analysis -> Visualization -> Preprocessing (remove nonsense data, etc.) -> back to Analysis, and the cycle repeats.

Episode 1

4 types of data:
Nominal: categories with no order, so values can only be checked for equality, like colors or numbers used only as labels.
Ordinal: has order, but the distance between values is not defined. Some statistics still work, like the median.
Interval: has order and distance, but no absolute zero. For example, zero degrees Celsius does not mean "no temperature".
Ratio: similar to Interval, but also has an absolute zero, like the Kelvin scale. 100 Kelvin is half of 200 Kelvin, but you can't say 100 Celsius is half of 200 Celsius. Another example: number of children.
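
One way these four types might look in R, as a rough sketch. The example values are made up, and using factor for nominal and ordinal data is my own choice, not something shown in the episode.

```r
# Nominal: categories with no order; only equality checks make sense.
colour <- factor(c("red", "blue", "red"))
colour[1] == colour[3]

# Ordinal: ordered categories; comparisons and the median rank are meaningful.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size[1] < size[2]
levels(size)[median(as.integer(size))]

# Interval vs. ratio: the same two temperatures in Celsius and in Kelvin.
celsius <- c(100, 200)
kelvin  <- celsius + 273.15
celsius[2] / celsius[1]  # gives 2, but the ratio has no physical meaning
kelvin[2] / kelvin[1]    # about 1.27, the meaningful ratio
```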

In a data frame, columns are attributes. Rows are instances/samples.
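
To make this concrete, here is a short sketch using iris, a data set that ships with R. It is my own pick for illustration and not necessarily the data used in the series.

```r
# iris is built into R: each row is one flower, each column one attribute.
nrow(iris)    # number of instances (rows)
ncol(iris)    # number of attributes (columns)
names(iris)   # attribute names
iris[1, ]     # one instance: all attributes of the first flower
```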

Episode 2

Use bar charts; pie charts don't help much.

The R function aggregate was mentioned. It computes a summary statistic for each group in the data.
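
A minimal sketch of aggregate, assuming the built-in iris data set as a stand-in (the notes don't record which data was used in the episode).

```r
# Mean sepal length for each species, using the formula interface:
# "Sepal.Length grouped by Species".
result <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(result)   # one row per species
```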

Plotting functions mentioned: hist (histogram function in R), boxplot, and ggplot (from the ggplot2 package).
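
A rough sketch of the base R plotting functions, again assuming the built-in iris data as a stand-in. ggplot is left out because it needs the separate ggplot2 package to be installed.

```r
# Histogram: distribution of a single numeric attribute.
h <- hist(iris$Sepal.Length, main = "Sepal length", xlab = "cm")

# Boxplot: compare the same attribute across groups.
b <- boxplot(Sepal.Length ~ Species, data = iris,
             ylab = "Sepal length (cm)")

# Bar chart: counts per category (the alternative to a pie chart).
counts <- table(iris$Species)
barplot(counts, ylab = "Number of flowers")
```

In an interactive session each call opens or redraws a plot window; when run as a script, R writes the plots to a file instead.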
