Data Charts in the News Workshop | ggplot2 Plotting Tutorial

3/19/2017

Getting started

Anyone who hasn't installed the software probably couldn't get in today 😈😈😈

Summit Suen

Full Stack Developer / Data Scientist.
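News and e-commerce recommender systems, public-opinion monitoring, sports data analytics
Co-organizer, Taiwan R User Group | Community events | Meetup recordings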

RStudio & R & …

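Introduction to the R ecosystem

Getting started with RStudio

Chinese text encoding settings

Tools > Global Options > Default text encoding

Getting started with RStudio

Initial environment

Tools > Global Options

Getting started with RStudio

Setting the working directory (important!)

`setwd('/path/where/your/data/located')`

Getting started with RStudio

The `console`

Watch the console prompt: `>` (ready for input) or `+` (the previous command is incomplete). Life is short: use `tab` and `?` often.

Getting started with RStudio

The `source` editor

Make a habit of writing your code in the editor

Getting started with RStudio

RStudio's trusty keyboard shortcuts

All RStudio keyboard shortcuts

Function	Windows & Linux	Mac
Show keyboard shortcuts	Alt+Shift+K	Option+Shift+K
Autocomplete	Tab or Ctrl+Space	Tab or Command+Space
Run (current line / selection)	Ctrl+Enter	Command+Enter
Comment (current line / selection)	Ctrl+Shift+C	Command+Shift+C
Save	Ctrl+S	Command+S
Reindent	Ctrl+I	Command+I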

So what exactly is R?

Start by treating R as a calculator

1 + 1

## [1] 2

sin(2017)

## [1] 0.09736191

pi

## [1] 3.141593

So what exactly is R?

Statistics is its home turf

# Kolmogorov-Smirnov Tests
ks.test(iris$Sepal.Length, iris$Petal.Length)

## Warning in ks.test(iris$Sepal.Length, iris$Petal.Length): p-value will be
## approximate in the presence of ties

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  iris$Sepal.Length and iris$Petal.Length
## D = 0.56, p-value < 2.2e-16
## alternative hypothesis: two-sided

So what exactly is R?

Statistics is its home turf

plot(density(iris$Sepal.Length), xlim = range(c(iris$Sepal.Length, iris$Petal.Length)), main = "Sample PDF")
lines(density(iris$Petal.Length), col = 2)

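So what exactly is R?

A/B Testing (an essential tool for media, e-commerce, and growth hacking)

Our app has shipped a redesign, and we want to compare how the new and old UI (features) perform, so we sampled a group of users to try the new interface (feature)
New interface (feature): 10 purchases out of 3,000 clicks
Old interface (feature): 30 purchases out of 50,000 clicks
How can I tell whether the conversion rates (purchases / clicks) of the two differ significantly?

So what exactly is R?

A/B Testing (an essential tool for media, e-commerce, and growth hacking)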

# install.packages("binom")
library(binom)
# Binomial confidence intervals
binom.confint(c(10, 30), c(3000, 50000), methods = "exact")

##   method  x     n        mean        lower        upper
## 1  exact 10  3000 0.003333333 0.0015995846 0.0061215478
## 2  exact 30 50000 0.000600000 0.0004048529 0.0008564274
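
The two intervals above do not overlap, which already hints at a real difference. As a complementary check, here is a minimal sketch using base R's `fisher.test()` on the same counts. Choosing Fisher's exact test is an assumption on our part rather than something from the original slides; it is picked because purchases are rare in both groups.

```r
# Purchases and non-purchases for each interface, using the counts from the slide.
conversions <- matrix(
  c(10, 3000 - 10,    # new UI: purchases, clicks without a purchase
    30, 50000 - 30),  # old UI: purchases, clicks without a purchase
  nrow = 2, byrow = TRUE,
  dimnames = list(ui = c("new", "old"), outcome = c("purchase", "no_purchase"))
)

# Fisher's exact test of the null hypothesis that both interfaces convert at the same rate.
ab_test <- fisher.test(conversions)
ab_test$p.value   # a small value is evidence against equal conversion rates
ab_test$estimate  # estimated odds ratio (new vs. old)
```

An odds ratio above 1 means the new interface converts better in this sample.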

Data preprocessing with `dplyr`

Back to basic R syntax

Vectorized operations

c(1, 2, 3, 4) + 1

## [1] 2 3 4 5

c(1, 2, 3, 4) + c(2, 3, 4, 5)

## [1] 3 5 7 9

c(1, 2, 3, 4) + c(2, 10)

## [1]  3 12  5 14

%&@$^*%@!
"Ah Gui, just say it in plain language, please!"
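
In plain language: when two vectors differ in length, R recycles the shorter one. The sketch below spells that out with `rep()`; the variable names are ours, for illustration only.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 10)

# R recycles the shorter vector until it matches the longer one...
recycled <- rep(y, length.out = length(x))   # 2 10 2 10

# ...so these two expressions give the same result.
identical(x + y, x + recycled)

# When the longer length is not a multiple of the shorter one,
# R still recycles but emits a warning, e.g. c(1, 2, 3) + c(2, 10)
```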

Back to basic R syntax

"To understand computations in R, two slogans are helpful:

Everything that exists is an object.
Everything that happens is a function call."

— John Chambers

`+`

## function (e1, e2)  .Primitive("+")

`<-`

## .Primitive("<-")

`[`

## .Primitive("[")

`c`

## function (...)  .Primitive("c")

Back to basic R syntax

What is a function call?

1 + 1

## [1] 2

`+`(1, 1)

## [1] 2
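
Since operators are ordinary functions, they can be handed to other functions. A small sketch with base R only; the variable names are illustrative.

```r
# Because `+` is an ordinary function, it can be passed to other functions.
plus_one <- sapply(c(1, 2, 3, 4), `+`, 1)   # same as c(1, 2, 3, 4) + 1
total <- Reduce(`+`, c(1, 2, 3, 4))         # ((1 + 2) + 3) + 4

# Indexing is a function call too.
v <- c(10, 20, 30)
second <- `[`(v, 2)                         # same as v[2]
```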

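Back to basic R syntax

Don't worry, that part isn't essential; just get comfortable with the data frame
A data frame is, quite literally, a table
It looks roughly like this (?)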

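library(Lahman)  # provides the Pitching table
library(dplyr)   # provides %>%, filter(), arrange(), desc()
wang <- Pitching %>% filter(playerID == "wangch01") %>% arrange(desc(yearID))
wang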

##   playerID yearID stint teamID lgID  W L  G GS CG SHO SV IPouts   H ER HR
## 1 wangch01   2013     1    TOR   AL  1 2  6  6  0   0  0     81  40 23  5
## 2 wangch01   2012     1    WAS   NL  2 3 10  5  0   0  0     97  50 24  5
## 3 wangch01   2011     1    WAS   NL  4 3 11 11  0   0  0    187  67 28  8
## 4 wangch01   2009     1    NYA   AL  1 6 12  9  0   0  0    126  66 45  7
## 5 wangch01   2008     1    NYA   AL  8 2 15 15  1   0  0    285  90 43  4
## 6 wangch01   2007     1    NYA   AL 19 7 30 30  1   0  0    598 199 82  9
## 7 wangch01   2006     1    NYA   AL 19 6 34 33  2   1  1    654 233 88 12
## 8 wangch01   2005     1    NYA   AL  8 5 18 17  0   0  0    349 113 52  9
##   BB  SO BAOpp  ERA IBB WP HBP BK BFP GF  R SH SF GIDP
## 1  9  14 0.351 7.67   0  2   0  0 123  0 24  0  0   NA
## 2 15  15 0.376 6.68   0  5   3  0 158  0 24  4  3   NA
## 3 13  25 0.272 4.04   0  2   1  0 264  0 35  2  2   NA
## 4 19  29 0.365 9.64   1  3   2  0 206  2 46  3  1   NA
## 5 35  54 0.249 4.07   1  0   3  0 402  0 44  0  3   NA
## 6 59 104 0.265 3.70   1  9   8  1 823  0 84  2  3   NA
## 7 52  76 0.277 3.63   4  6   2  1 900  1 92  3  2   NA
## 8 32  47 0.256 4.02   3  3   6  0 486  0 58  3  4   NA

Shiny new tools for R

A ready-made toolkit 🎒👝👛👜💼
tidyverse

# install.packages("tidyverse")
library(tidyverse)

## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr

## Conflicts with tidy packages ----------------------------------------------

## filter(): dplyr, stats
## lag():    dplyr, stats

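Read the data in before plotting

readr::read_csv()
Faster than the base-R reader (read.csv()), and it does not convert text columns to factors
Slightly slower than data.table::fread(), but the syntax is simpler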

traffic <- read_csv("ggplot2-slides/traffic_eng.csv")

## Parsed with column specification:
## cols(
##   time.year = col_integer(),
##   time.month = col_integer(),
##   time.day = col_integer(),
##   time.hour = col_integer(),
##   time.minute = col_integer(),
##   event.level = col_character(),
##   location.district = col_character(),
##   location.address = col_character(),
##   number.dead = col_integer(),
##   number.injury = col_integer(),
##   party.sn = col_integer(),
##   vehicle.type = col_character(),
##   party.gender = col_character(),
##   party.age = col_integer(),
##   party.injury = col_character(),
##   location.weather = col_character(),
##   location.speed.limit = col_integer(),
##   location.road.type = col_character(),
##   location.type = col_character()
## )

Read the data in before plotting

readr::read_csv()

head(traffic, n=3)

## # A tibble: 3 × 19
##   time.year time.month time.day time.hour time.minute event.level
##       <int>      <int>    <int>     <int>       <int>       <chr>
## 1       104          1        1         0          18        一般
## 2       104          1        1         0          18        一般
## 3       104          1        1         0          18        一般
## # ... with 13 more variables: location.district <chr>,
## #   location.address <chr>, number.dead <int>, number.injury <int>,
## #   party.sn <int>, vehicle.type <chr>, party.gender <chr>,
## #   party.age <int>, party.injury <chr>, location.weather <chr>,
## #   location.speed.limit <int>, location.road.type <chr>,
## #   location.type <chr>

Read the data in before plotting

readr::read_csv()

traffic <- read_csv("ggplot2-slides/traffic_eng.csv", col_types = 
                      cols(
                        time.year = col_integer(),
                        time.month = col_integer(),
                        time.day = col_integer(),
                        time.hour = col_integer(),
                        time.minute = col_integer(),
                        event.level = col_character(),
                        location.district = col_character(),
                        location.address = col_character(),
                        number.dead = col_integer(),
                        number.injury = col_integer(),
                        party.sn = col_integer(),
                        vehicle.type = col_character(),
                        party.gender = col_character(),
                        party.age = col_integer(),
                        party.injury = col_character(),
                        location.weather = col_character(),
                        location.speed.limit = col_integer(),
                        location.road.type = col_character(),
                        location.type = col_character()
                      ))
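
To try explicit column types without the workshop file, here is a minimal sketch that writes a tiny CSV to a temporary file first. The rows and the values "A1"/"A2" are made up for illustration and only mimic two column names from the traffic data.

```r
library(readr)

# Made-up rows that only mimic two columns of the traffic file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("time.hour,event.level", "0,A1", "13,A2", "23,A2"), tmp)

# Declaring the column types up front skips readr's type guessing.
sample_rows <- read_csv(tmp, col_types = cols(
  time.hour   = col_integer(),
  event.level = col_character()
))

# Base R for comparison; stringsAsFactors = TRUE was the default before R 4.0.0.
base_rows <- read.csv(tmp, stringsAsFactors = TRUE)

class(sample_rows$event.level)  # character
class(base_rows$event.level)    # factor
```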

Read the data in before plotting

readr::read_csv()

head(traffic, n=3)

## # A tibble: 3 × 19
##   time.year time.month time.day time.hour time.minute event.level
##       <int>      <int>    <int>     <int>       <int>       <chr>
## 1       104          1        1         0          18        一般
## 2       104          1        1         0          18        一般
## 3       104          1        1         0          18        一般
## # ... with 13 more variables: location.district <chr>,
## #   location.address <chr>, number.dead <int>, number.injury <int>,
## #   party.sn <int>, vehicle.type <chr>, party.gender <chr>,
## #   party.age <int>, party.injury <chr>, location.weather <chr>,
## #   location.speed.limit <int>, location.road.type <chr>,
## #   location.type <chr>

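Read the data in before plotting

dplyr::data_frame()
Faster than the base-R data.frame(), and it does not convert text columns to factors
Slightly slower than data.table::data.table(), but the syntax is simpler
(Newer releases deprecate data_frame() in favor of tibble::tibble())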

traffic <- data_frame(time.year = c(104), time.month = c(1), 
                      time.day = c(1), time.hour = c(0), 
                      time.minute = c(0, 1, 2), event.level = c("一般"))
head(traffic, n=3)

## # A tibble: 3 × 6
##   time.year time.month time.day time.hour time.minute event.level
##       <dbl>      <dbl>    <dbl>     <dbl>       <dbl>       <chr>
## 1       104          1        1         0           0        一般
## 2       104          1        1         0           1        一般
## 3       104          1        1         0           2        一般

Read the data in before plotting

magrittr (pipe %>%)
Chain every stop of the workflow together, simplifying both the logic and the code

head(traffic, n=3)

## # A tibble: 3 × 6
##   time.year time.month time.day time.hour time.minute event.level
##       <dbl>      <dbl>    <dbl>     <dbl>       <dbl>       <chr>
## 1       104          1        1         0           0        一般
## 2       104          1        1         0           1        一般
## 3       104          1        1         0           2        一般

Read the data in before plotting

magrittr (pipe %>%)
Chain every stop of the workflow together, simplifying both the logic and the code

traffic %>% head(3)

## # A tibble: 3 × 6
##   time.year time.month time.day time.hour time.minute event.level
##       <dbl>      <dbl>    <dbl>     <dbl>       <dbl>       <chr>
## 1       104          1        1         0           0        一般
## 2       104          1        1         0           1        一般
## 3       104          1        1         0           2        一般
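
The two slides above show that `head(traffic, n=3)` and `traffic %>% head(3)` are the same call. A slightly longer chain makes the benefit clearer; this is a sketch on a throwaway numeric vector, not on the workshop data.

```r
library(magrittr)

# Nested calls read inside-out...
nested <- round(mean(sqrt(c(1, 4, 9, 16))), 1)

# ...while the pipe reads left to right: x %>% f(y) is f(x, y).
piped <- c(1, 4, 9, 16) %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```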

Read the data in before plotting

dplyr::select()

wang %>% select(yearID, teamID, W, L, ERA)

##   yearID teamID  W L  ERA
## 1   2013    TOR  1 2 7.67
## 2   2012    WAS  2 3 6.68
## 3   2011    WAS  4 3 4.04
## 4   2009    NYA  1 6 9.64
## 5   2008    NYA  8 2 4.07
## 6   2007    NYA 19 7 3.70
## 7   2006    NYA 19 6 3.63
## 8   2005    NYA  8 5 4.02

Read the data in before plotting

dplyr::filter()

wang %>% filter(GS > 10)

##   playerID yearID stint teamID lgID  W L  G GS CG SHO SV IPouts   H ER HR
## 1 wangch01   2011     1    WAS   NL  4 3 11 11  0   0  0    187  67 28  8
## 2 wangch01   2008     1    NYA   AL  8 2 15 15  1   0  0    285  90 43  4
## 3 wangch01   2007     1    NYA   AL 19 7 30 30  1   0  0    598 199 82  9
## 4 wangch01   2006     1    NYA   AL 19 6 34 33  2   1  1    654 233 88 12
## 5 wangch01   2005     1    NYA   AL  8 5 18 17  0   0  0    349 113 52  9
##   BB  SO BAOpp  ERA IBB WP HBP BK BFP GF  R SH SF GIDP
## 1 13  25 0.272 4.04   0  2   1  0 264  0 35  2  2   NA
## 2 35  54 0.249 4.07   1  0   3  0 402  0 44  0  3   NA
## 3 59 104 0.265 3.70   1  9   8  1 823  0 84  2  3   NA
## 4 52  76 0.277 3.63   4  6   2  1 900  1 92  3  2   NA
## 5 32  47 0.256 4.02   3  3   6  0 486  0 58  3  4   NA

Read the data in before plotting

dplyr::summarise()

wang %>% group_by(lgID) %>% summarise(mean(ERA))

## # A tibble: 2 × 2
##     lgID `mean(ERA)`
##   <fctr>       <dbl>
## 1     AL       5.455
## 2     NL       5.360
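
The column name `mean(ERA)` is awkward to reuse later. Here is a sketch that names the summary columns instead; it rebuilds just the `lgID` and `ERA` columns from the `wang` table printed earlier, so it runs without the Lahman package. The names `wang_era`, `mean_ERA`, and `seasons` are ours.

```r
library(dplyr)

# League and ERA for each season, copied from the `wang` table shown earlier.
wang_era <- data.frame(
  lgID = c("AL", "NL", "NL", "AL", "AL", "AL", "AL", "AL"),
  ERA  = c(7.67, 6.68, 4.04, 9.64, 4.07, 3.70, 3.63, 4.02)
)

# Naming each summary gives columns that are easier to reuse than `mean(ERA)`.
era_by_league <- wang_era %>%
  group_by(lgID) %>%
  summarise(mean_ERA = mean(ERA), seasons = n())
era_by_league
```

Note that this is a plain average of season ERAs, not one weighted by innings pitched.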

Data visualization with `ggplot2`

Peorth Chen

Economist.
Data Analyst, KKBOX Data science team.
Amateur pâtissier and home chef.

What is data science?

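What does a data scientist do? (Reversed)

Present findings with data visualization tools
- ggplot!
Interpret the data, or analyze it with appropriate statistical methods
Acquire, clean, and transform the data
- "If you torture the data enough, nature will always confess." (Ronald Coase, 1981)
- it can even be a mutually torturous activity

What does a data scientist do? (Reversed)

Understand your data
- What is in it?
- What does each field represent?
- Distributions, and relationships between variables
- Are there any errors in it?
- This is another place where ggplot comes in handy!

What does a data scientist do? (Reversed)

Define the problem (though this usually means "redefining" the problem)
- "Will this song be a hit?" (X)
- "Which data or metrics can effectively predict a song's popularity?" (O)
These are the two most important stages, but good questions often emerge only after you have explored the data
This is the difference between a mediocre, tedious data science project and an intriguing, great one

A picture is worth a thousand words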

1918 flu pandemic

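Environment

library(tidyverse)

Reading the data

Taipei City Police Department, Traffic Division: traffic accident data
Use readr's read_csv function rather than the built-in read.csv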

traffic <- read_csv("traffic_eng.csv")
head(traffic, n=3)

## # A tibble: 3 × 19
##   time.year time.month time.day time.hour time.minute event.level
##       <int>      <int>    <int>     <int>       <int>       <chr>
## 1       104          1        1         0          18        一般
## 2       104          1        1         0          18        一般
## 3       104          1        1         0          18        一般
## # ... with 13 more variables: location.district <chr>,
## #   location.address <chr>, number.dead <int>, number.injury <int>,
## #   party.sn <int>, vehicle.type <chr>, party.gender <chr>,
## #   party.age <int>, party.injury <chr>, location.weather <chr>,
## #   location.speed.limit <int>, location.road.type <chr>,
## #   location.type <chr>

Exploring the data

Let's look at the time distribution first

traffic %>%
    select(time.month, time.day, time.hour, time.minute) %>%
    summary()

##    time.month       time.day       time.hour      time.minute  
##  Min.   : 1.00   Min.   : 1.00   Min.   : 0.00   Min.   : 0.0  
##  1st Qu.: 4.00   1st Qu.: 8.00   1st Qu.: 9.00   1st Qu.:13.0  
##  Median : 7.00   Median :16.00   Median :14.00   Median :30.0  
##  Mean   : 6.79   Mean   :15.67   Mean   :13.65   Mean   :27.6  
##  3rd Qu.:10.00   3rd Qu.:23.00   3rd Qu.:18.00   3rd Qu.:42.0  
##  Max.   :12.00   Max.   :31.00   Max.   :23.00   Max.   :59.0

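You can tell the records use a 24-hour clock, but this gives little feel for the overall distribution

Exploring the data

How many records fall in each hour?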

traffic %>%
    group_by(time.hour) %>%
    tally() -> pivot.time.hour
    # count() -> pivot.time.hour
pivot.time.hour %>% head(5)

## # A tibble: 5 × 2
##   time.hour     n
##       <int> <int>
## 1         0   849
## 2         1   513
## 3         2   360
## 4         3   282
## 5         4   298

Five values are manageable, but what about 24? 60? 365?

Exploring the data: your first ggplot

qq <- ggplot(data = pivot.time.hour)
qq <- qq + geom_bar(aes(x=time.hour, y=n), stat = "identity")
print(qq)

What just happened?
ggplot() prepares the canvas and specifies that the data to draw is pivot.time.hour
geom_bar() adds a bar-chart layer, and aes() specifies the x and y axes
The counts are already computed, so stat = "identity" tells geom_bar() to use the values as they are
print() renders the chart!
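
For pre-computed counts, ggplot2 also offers `geom_col()`, which uses the y values directly. The sketch below is an alternative to the slide's `geom_bar(stat = "identity")`, not the slide's own code; it uses only the first five rows of `pivot.time.hour` as printed earlier.

```r
library(ggplot2)

# First five rows of pivot.time.hour, copied from the output shown earlier.
hourly <- data.frame(time.hour = 0:4, n = c(849, 513, 360, 282, 298))

# geom_col() takes bar heights straight from y, like geom_bar(stat = "identity").
bars <- ggplot(data = hourly) + geom_col(aes(x = time.hour, y = n))
print(bars)
```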

ggplot and Grammar of Graphics

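ggplot follows a set of design principles called the "Grammar of Graphics", which breaks a chart down into many elements
Layering!
aes() stands for "aesthetic mapping" and specifies the visible elements
Position (x, y), color (color, fill), shape (shape), size (size)...
The geom_ family of functions sets the "geometric object", that is, which kind of chart these elements finally form
Bar chart (bar), scatter plot (point), line chart (line), etc.

ggplot and Grammar of Graphics

The chart type is easy to change: use the line chart geom_line() in place of the bar chart geom_bar()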

qq <- ggplot(data = pivot.time.hour)
qq <- qq + geom_line(aes(x=time.hour, y=n))
print(qq)

ggplot and Grammar of Graphics

You can even stack layers!

qq <- ggplot(data = pivot.time.hour)
qq <- qq + geom_line(aes(x=time.hour, y=n))
qq <- qq + geom_point(aes(x=time.hour, y=n))
print(qq)
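
When several layers share the same mapping, it can be written once in `ggplot()` and every layer inherits it. A sketch of that variation, again using only the first five rows of `pivot.time.hour` shown earlier:

```r
library(ggplot2)

# First five rows of pivot.time.hour, copied from the output shown earlier.
hourly <- data.frame(time.hour = 0:4, n = c(849, 513, 360, 282, 298))

# Mappings given to ggplot() are inherited by every layer added afterwards.
qq <- ggplot(data = hourly, aes(x = time.hour, y = n)) +
  geom_line() +
  geom_point()
print(qq)
```

A layer can still override the shared mapping by passing its own aes().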

ggplot and Grammar of Graphics

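Exploring the data: more!

~~Time for a north-versus-south showdown... oh wait, this is Taipei City data~~
So let's pit men against women instead!!! Split the hourly counts from before by gender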

traffic %>% 
    group_by(time.hour, party.gender) %>% 
    tally() -> pivot.hour.gender
    # count() -> pivot.hour.gender

pivot.hour.gender %>% head(3)

## Source: local data frame [3 x 3]
## Groups: time.hour [1]
## 
##   time.hour         party.gender     n
##       (int)                (chr) (int)
## 1         0                   女   195
## 2         0 無或物(動物、堆置物)    25
## 3         0                   男   629

Exploring the data: men vs. women!

qq <- ggplot(data = pivot.hour.gender)
qq <- qq + geom_line(aes(x=time.hour, y=n, colour=party.gender))
print(qq)

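aes() still carries the position information (x, y), but this time there is a newcomer: colour=party.gender
Everything used to be the same black; can color itself carry information?
colour=party.gender means "use the gender field to assign different colors!"
colour, size, shape, and so on can all be mapped this way!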
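
It helps to contrast mapping a colour (inside aes(), driven by the data) with setting one (outside aes(), a fixed value). The sketch below uses made-up counts and a hypothetical `group` column, not the traffic data.

```r
library(ggplot2)

# Made-up counts for illustration only (not taken from the traffic data).
toy <- data.frame(
  hour  = rep(0:2, times = 2),
  n     = c(5, 3, 4, 9, 7, 8),
  group = rep(c("a", "b"), each = 3)
)

# Mapping: colour depends on the data, so each group gets its own colour and a legend.
mapped <- ggplot(toy) +
  geom_line(aes(x = hour, y = n, colour = group))

# Setting: colour is a fixed value outside aes(); `group` only keeps the lines separate.
fixed <- ggplot(toy) +
  geom_line(aes(x = hour, y = n, group = group), colour = "steelblue")
```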

Exploring the data: men vs. women!

We need an alien interpreter...

More than data

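R is a foreign invention, and so is ggplot... ~~Kill the Tatars on the fifteenth of the eighth month!!!!~~
The default font cannot display Chinese!
We need to specify a compatible font, and theme() takes care of such non-data matters!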

More than data

qq <- ggplot(data = pivot.hour.gender)
qq <- qq + geom_line(aes(x=time.hour, y=n, colour=party.gender))
# On Windows, use this line in place of the next one:
# qq <- qq + theme(text=element_text(family = "Microsoft JhengHei",
# On macOS:
qq <- qq + theme(text=element_text(family = "STHeiti",
                                   colour="red"))
print(qq)

More than data

The text in the plot is now set in Microsoft JhengHei / Apple's STHeiti!

Improve it!

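Hand this chart in as it is and you will get chewed out ~~and if you don't, you should question your boss's competence~~
There is no chart title
What do the x and y axes mean?
We want a tick mark every 3 hours on the x axis, and can the y-axis title be rotated?
Give the legend a title and move it to the bottom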

Improve it!

qq <- ggplot(data = pivot.hour.gender)
qq <- qq + geom_line(aes(x=time.hour, y=n, colour=party.gender))
# On Windows, use this line in place of the next one:
# qq <- qq + theme(text=element_text(family = "Microsoft JhengHei"),
qq <- qq + theme(text=element_text(family = "STHeiti"),
                 axis.title.y = element_text(angle = 0, vjust = 0.5),
                 legend.position = "bottom")
# Title: "Hourly number of people in traffic accidents in Taipei City, by gender"; x: "Time"; y: "People"
qq <- qq + labs(title="台北市每時交通事故人數，按性別分",
                x="時間", y="人數")
qq <- qq + scale_x_continuous(breaks = seq(0,24,3))
# Legend title: "Gender of the party involved"
qq <- qq + scale_colour_discrete(name="當事人性別")
print(qq)

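theme() now also modifies the y-axis title, setting the angle (angle) and position (vjust) of the text, and moves the legend (legend) to the bottom
labs() sets the x- and y-axis labels as well as the main title (title) of the chart
scale_x_continuous() places a break (breaks) every 3 hours
scale_colour_discrete() changes the title of the legend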

Improve it!

It's show time!

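Try exploring this dataset with simple descriptive statistics and ggplot!
Draw a bar chart of the "minute" field and you will discover a habit the police have when logging records
Besides this dataset, we have prepared some other fun things to play with
20 years of university admission lists

Your Name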

college.admission <- read_csv("college_admission.csv")
college.admission %>% arrange(-n) %>% head(5)

## # A tibble: 5 × 6
##    year first.name first.pinyin        n national.colleges top.5.colleges
##   <int>      <chr>        <chr>    <dbl>             <dbl>          <dbl>
## 1  1999       怡君        yijun 407.0708         126.86109       44.61049
## 2  2000       怡君        yijun 398.6149         112.73957       36.23772
## 3  2001       怡君        yijun 379.4943         109.54474       28.69029
## 4  2002       雅婷       yating 379.2310         113.12653       28.28163
## 5  2003       雅婷       yating 352.9057          90.54816       19.73486

Names you hear at every market

Sounds just like a junior classmate

Feels like flipping to FTV (民視)

Let's try it!

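Hands-on time

Learning Materials

BONUS time

Drawing maps

At the last 知了小學堂 session, Miss Monday taught everyone QGIS
Did you know you can draw maps in R too? 😂😂😂
Read a shapefile map file (you can preprocess it in QGIS first)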

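# install.packages(c("rgdal", "ggmap"))
# Note: rgdal was retired from CRAN in 2023; this code reflects the 2017 setup.
# brew install gdal   # macOS: install the GDAL system library via Homebrew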
library(rgdal)
library(ggplot2)
library(ggmap)
shapefile <- readOGR("shapefile", "Town_MOI_1041215_C_Name_臺北市")

## OGR data source with driver: ESRI Shapefile 
## Source: "shapefile", layer: "Town_MOI_1041215_C_Name_臺北市"
## with 12 features
## It has 10 fields

Drawing maps

Use ggplot2::fortify() to convert the shapefile we read in into a data frame

shapefile_df <- fortify(shapefile)

## Regions defined for each Polygons

shapefile_df %>% head()

##       long      lat order  hole piece id group
## 1 121.5714 25.07429     1 FALSE     1  0   0.1
## 2 121.5715 25.07424     2 FALSE     1  0   0.1
## 3 121.5715 25.07398     3 FALSE     1  0   0.1
## 4 121.5717 25.07361     4 FALSE     1  0   0.1
## 5 121.5717 25.07344     5 FALSE     1  0   0.1
## 6 121.5718 25.07321     6 FALSE     1  0   0.1

Drawing maps

Use ggplot2::geom_path() to draw the map

map <- ggplot() + 
  geom_path(data = shapefile_df, 
            aes(x = long, y = lat, group = group),
            color = 'gray', size = 1)
print(map)

Drawing maps

Apply a map projection to correct the map we drew

map_projected <- map + coord_map()
print(map_projected)
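
If a full projection is more than you need, ggplot2 also provides `coord_quickmap()`, a faster approximation. The sketch below is an alternative to the slide's `coord_map()`, not a replacement for it, and the square outline is made up for illustration rather than read from the shapefile.

```r
library(ggplot2)

# A made-up square outline near Taipei's longitude/latitude, for illustration only.
outline <- data.frame(
  long = c(121.5, 121.6, 121.6, 121.5, 121.5),
  lat  = c(25.0, 25.0, 25.1, 25.1, 25.0)
)

# coord_quickmap() sets the aspect ratio from the latitude at the centre of the plot,
# which is usually close enough for an area as small as a single city.
quick <- ggplot(outline, aes(x = long, y = lat)) +
  geom_path(colour = "gray") +
  coord_quickmap()
print(quick)
```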
