R-intro

-------------------------------------------------
【１】Ｒへの入門
-------------------------------------------------
今回の集中講義で用いる統計言語「Ｒ」は，フリーであるにもかかわらず，信頼のおけるソフトウェアです．オンラインでのドキュメントがたくさんあり，下記はそのいくつかです．日本語による解説や事例紹介もありますので，参考になると思います．

○ 「Ｒ」による統計解析（群馬大学・青木さん）
http://aoki2.si.gunma-u.ac.jp/R/
すぐ使える事例がたくさん．解説も親切．

○ 日本語の解説としては、他にも：

R - 統計解析とグラフィックスの環境
http://datamining.tama.ac.jp/~yama/R/

Notes on R (翻訳)
http://datamining.tama.ac.jp/~yama/R/notes.html

○ 「Ｒ」の本家である CRAN のウェブサイト：

The Comprehensive R Archive Network
http://cran.r-project.org/

にあるソフトウェア集：

Packaged Softwares
http://cran.r-project.org/src/contrib/PACKAGES.html

にも膨大な数の「Ｒ」プログラムが。

-------------------------------------------------
【２】教科書と参考書

-------------------------------------------------

統計言語「Ｒ」については、教科書が今まではなかったのですが、今年になってやっと出ました:

著者 Peter Dalgaard 2002.
書名 "Introductory Statistics with R"
刊行 2002年
出版 Springer Verlag, New York
頁数 xvi+267 pp.
価格 EURO 29.95 (paperback)
ISBN 0-387-95475-9

これ↑は、今回の講義で私がテキストとして使っている本です。「Ｒ」を横に立ち上げながら使える実習書として利用できます。「Ｒ」の機能などひととおりのことはさらっと把握できます。

他にもこんな参考書があります．

Ｒは，商業ベースのＳあるいはＳ-plus とほぼ同一なので，Ｓ（Ｓ-plus）の本はそのまま使えます．ＳやＳ-plus は日本語の本もすでに出ているので，オンライン検索すればヒットするはず．ここでは，私が持っている新刊を２冊紹介します：

著者 W.N. Venables and B.D. Ripley
書名 Modern Applied Statistics with S (Fourth Edition)
刊行 2002年
出版 Springer Verlag, New York
頁数 xii+495 pp.
価格 EURO 74.95 (hardcover)
ISBN 0-387-95457-0

この↑本は，Ｓの定評ある教科書で，第３版はすでに翻訳されてい
ます（訳文に難ありとのこと）．タイトルは「Ｓ」と銘打たれてい
ますが，中では「Ｓ専用」とか「Ｒでも可能」と付箋されているの
で，Ｒのテキストとして利用できます．複雑なデータ解析へのＳや
Ｒの利用法が書かれているので，実践的に役立ちます．

著者 A. Krause and M. Olsen

書名 The Basics of S-Plus (Third Edition)

刊行 2002年
出版 Springer Verlag, New York
頁数 xx+419 pp.
価格 EURO 59.95 (paperback)
ISBN 0-387-95456-2

これ↑はＳの入門書．Ｒについての章もあります．

-------------------------------------------------
【３】データの読みこみ
-------------------------------------------------
Ｒにデータを入力するには：

１）コマンドラインからの入力
２）ファイルからの入力

の二つの方法があります．

１）コマンドラインからの入力
簡単なデータならばコマンドラインから直接キー入力してもいいでしょう．たとえば：

> daily.intake <- c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
※11個の数値データを daily.intake に格納する（「c( )」はベクトル）
> daily.intake
[1] 5260 5470 5640 6180 6390 6515 6805 7515 7515 8230 8770
※ daily.intake の内容表示

このように入力されたデータについては，たとえば，平均（mean）・標準偏差（sd）・分位点（quantile）などの記述統計量を下記のように計算できます：

> mean(daily.intake)
[1] 6753.636

> sd(daily.intake)

[1] 1142.123
> quantile(daily.intake)
0% 25% 50% 75% 100%
5260 5910 6515 7515 8770

また，母平均μに関する仮説をｔ検定することも：

> t.test(daily.intake, mu=7725)

One Sample t-test

data: daily.intake
t = -2.8208, df = 10, p-value = 0.01814
alternative hypothesis: true mean is not equal to 7725
95 percent confidence interval:
5986.348 7520.925
sample estimates:
mean of x
6753.636

２）ファイルからの読みこみ
しかし，大きなデータの場合には，事前にデータ・ファイルとして別に作成するのが得策です．Ｒに入力できるファイルの形式はプレーンテキストだけです．たとえば，Box1演習に用いたデータファイル"Box1_R.tab"は下記のような内容です：

TRT DATA
001 DM1 2537
002 DM1 2069
003 DM1 2104
004 DM1 1797
005 DM2 3366
006 DM2 2591
007 DM2 2211

008 DM2 2544

009 DDT 2536
010 DDT 2459
011 DDT 2827
012 DDT 2385
013 AZO 2387
014 AZO 2453
015 AZO 1556
016 AZO 2116
017 DB 1997
018 DB 1679
019 DB 1649
020 DB 1859
021 DK 1796
022 DK 1704
023 DK 1904
024 DK 1320
025 Con 1401
026 Con 1516
027 Con 1270
028 Con 1077

このデータを Box1 に読みこんで表示させると下記のようになります：

> Box1 <- read.table("Box1_R.tab")
> Box1
TRT DATA
1 DM1 2537
2 DM1 2069
3 DM1 2104
4 DM1 1797
5 DM2 3366

///////////中略

28 Con 1077

しかし，行番号をあらかじめもたなくても読みこむことができます．たとえば，つぎの"Box1_R.data"：

TRT DATA
DM1 2104
DM1 1797
DM2 3366
DM2 2591
DM2 2211
DM2 2544
DDT 2536
DDT 2459
DDT 2827
DDT 2385
AZO 2387
AZO 2453
AZO 1556
AZO 2116
DB 1997
DB 1679
DB 1649
DB 1859
DK 1796
DK 1704
DK 1904
DK 1320
Con 1401
Con 1516
Con 1270
Con 1077

をＲに入力するには，下記の指定をします：

> Box1 <- read.table("Box1_R.data", header=T)

「header=T」とは，第１行に列の名称（ヘッダ）が指定されているという意味で，この読みこみをした結果は，上の場合と同一です：

TRT DATA
1 DM1 2537
2 DM1 2069
3 DM1 2104
4 DM1 1797
5 DM2 3366
///////////中略
28 Con 1077

その他，コンマやセミコロンで区切られた csv 形式のファイル，あるいはタブ区切りされたファイルも読みこみ可能です．

-------------------------------------------------
【４】正規分布に関連する関数
-------------------------------------------------
正規分布に関連する関数（dnorm, pnorm, qnorm, rnorm）

●平均0, 標準偏差0.8の正規分布の確率密度関数（dnorm）
> x <- seq(-3, 3, 0.05)
> plot(x,dnorm(x, mean=0, sd=0.4), type="n")
> curve(dnorm(x, mean=0, sd=0.8), type="l",add=T)

●正規分布の確率分布関数（pnorm）とその逆関数（qnorm）
> curve(pnorm(x, mean=0, sd=0.8), type="l", lty=3, add=T)

●５％点の表示

> abline(h=0.05)

> lower.alpha5 <- qnorm(0.05, mean=0, sd=0.8)
> lower.alpha5
[1] -1.315883
> abline(v=lower.alpha5)
> points(lower.alpha5, 0.05, cex=3.0, pch="*")

> abline(h=0.95)
> upper.alpha5 <- qnorm(0.05, mean=0, sd=0.8, lower.tail = FALSE)
> upper.alpha5
[1] 1.315883
> abline(v=upper.alpha5)
> points(upper.alpha5, 0.95, cex=3.0, pch="*")

●１％点の表示
> abline(h=0.01, lty=2)
> lower.alpha1 <- qnorm(0.01, mean=0, sd=0.8)
> lower.alpha1
[1] -1.861078
> abline(v=lower.alpha1, lty=2)
> points(lower.alpha1, 0.01, cex=3.0, pch="*")

> abline(h=0.99, lty=2)
> upper.alpha1 <- qnorm(0.01, mean=0, sd=0.8, lower.tail = FALSE)
> upper.alpha1
[1] 1.861078
> abline(v=upper.alpha1, lty=2)
> points(upper.alpha1, 0.99, cex=3.0, pch="*")

●正規乱数（rnorm）の生成とヒストグラム表示
> random.norm <- rnorm(10, mean=0, sd=0.8)
> hist(random.norm, freq=F)

> random.norm <- rnorm(100, mean=0, sd=0.8)

> hist(random.norm, freq=F)
> random.norm <- rnorm(1000, mean=0, sd=0.8)
> hist(random.norm, freq=F)
> random.norm <- rnorm(10000, mean=0, sd=0.8)
> hist(random.norm, freq=F)
> random.norm <- rnorm(100000, mean=0, sd=0.8)
> hist(random.norm, freq=F)
> random.norm <- rnorm(1000000, mean=0, sd=0.8)
> hist(random.norm, freq=F)
> curve(dnorm(x, mean=0, sd=0.8), add=T)

●正規分布のパラメーター（１）――平均μを変える
> x <- seq(-4, 4, 0.01)
> plot(x, dnorm(x, mean=0, sd=0.5), type="n")
> title("Normal Distribution\nmean=0 -> 2.0")
> for (i in 1:5) curve(dnorm(x, mean=0.5*(i-1), sd=0.5), type="l", add=T)

●正規分布のパラメーター（２）――分散σ^2を変える
> x <- seq(-4, 4, 0.01)
> plot(x, dnorm(x, mean=0, sd=0.5), type="n")
> title("Normal Distribution\nsd=0.5 -> 2.5")
> for (i in 1:5) curve(dnorm(x, mean=0, sd=0.5*i), type="l", add=T)

●標準正規分布（平均０，分散１）
> mean1 <- 1.0
> sd2 <- 2.0
> plot(x, dnorm(x, mean=0, sd=1), type="n")
> x <- rnorm(10000, mean=mean1, sd=sd2)
> hist(x, freq=F, density=25, angle=135, add=T)
> curve(dnorm(x, mean=mean1, sd=sd2), type="l", lty=2, lwd=2, add=T)
> hist((x - mean1)/sd2, freq=F, density=25, angle=45, add=T)

> curve(dnorm(x, mean=0, sd=1), type="l", lty=1, lwd=2, add=T)

-------------------------------------------------
【５】ｔ分布に関連する関数
-------------------------------------------------
ｔ分布に関連する関数（dt, pt, qt, rt）

●ｔ分布の密度関数（dt）を表示死，標準正規分布と比較
> x <- seq(-4, 4, 0.01)
> plot(x, dt(x, 20), type="n")
> curve(dt(x, 5), type="l", add=T)
> curve(dnorm(x), type="l", lty=2, add=T)

●５％点の表示
> abline(h=0.05)
> lower.alpha5 <- qt(0.05, 5)
> lower.alpha5
[1] -2.015048
> abline(v=lower.alpha5)
> points(lower.alpha5, 0.05, cex=3.0, pch="*")

> upper.alpha5 <- -lower.alpha5
> upper.alpha5
[1] 2.015048
> abline(v=upper.alpha5)
> points(upper.alpha5, 0.95, cex=3.0, pch="*")

●ｔ分布のパラメーター――自由度を変える
> x <- seq(-4, 4, 0.01)
> plot(x, dt(x, 20), type="n")
> title("t Distribution\ndf=5 -> 1")
> for (i in 1:5) curve(dt(x, 5-(i-1)), type="l", add=T)

-------------------------------------------------
【６】χ二乗分布に関連する関数
-------------------------------------------------
χ二乗分布に関連する関数（dchisq, pchisq, qchisq, rchisq）

●χ二乗分布の密度関数（dchisq）を表示
> x <- seq(0, 20, 0.01)
> plot(x, dchisq(x, 5), type="n")
> curve(dchisq(x, 10), type="l", add=T)

●ｔ分布のパラメーター――自由度を変える
> x <- seq(0, 20, 0.01)
> plot(x, dchisq(x, 5), type="n")
> title("Chi-square Distribution\ndf=5 -> 10")
> for (i in 1:5) curve(dchisq(x, 5+i), type="l", add=T)

-------------------------------------------------
【７】Ｆ分布に関連する関数
-------------------------------------------------
Ｆ分布に関連する関数（df, pf, qf, rf）

●Ｆ分布の密度関数（df）を表示
> x <- seq(0, 4, 0.01)
> plot(x, df(x, 15, 50), type="n")
> curve(df(x, 13, 50), type="l", add=T)

●Ｆ分布の確率分布関数（pf）を表示
> curve(pf(x, 13, 50), type="l", lty=3, add=T)

●５％点の表示
> abline(h=0.05)

> lower.alpha5 <- qf(0.05, 13, 50)

> lower.alpha5
[1] 0.4321874
> abline(v=lower.alpha5)
> points(lower.alpha5, 0.05, cex=3.0, pch="*")

> abline(h=0.95)
> upper.alpha5 <- qf(0.05, 13, 50, lower.tail = FALSE)
> upper.alpha5
[1] 1.921429
> abline(v=upper.alpha5)
> points(upper.alpha5, 0.95, cex=3.0, pch="*")

●１％点の表示
> abline(h=0.01, lty=2)
> lower.alpha1 <- qf(0.01, 13, 50)
> lower.alpha1
[1] 0.2962809
> abline(v=lower.alpha1, lty=2)
> points(lower.alpha1, 0.01, cex=3.0, pch="*")

> abline(h=0.99, lty=2)
> upper.alpha1 <- qf(0.01, 13, 50, lower.tail = FALSE)
> upper.alpha1
[1] 2.508328
> abline(v=upper.alpha1, lty=2)
> points(upper.alpha1, 0.99, cex=3.0, pch="*")

●Ｆ分布のパラメーター（１）――分子自由度n1を変える
> x <- seq(0, 4, 0.01)
> plot(x, df(x, 15, 50), type="n")
> title("F Distribution\nn1=13 -> 5")

> for (i in 1:5) curve(df(x, 13-2*(i-1), 50), type="l", add=T)

●Ｆ分布のパラメーター（２）――分母自由度n2を変える
> x <- seq(0, 4, 0.01)
> plot(x, df(x, 15, 50), type="n")
> title("F Distribution\nn2=50 -> 10")
> for (i in 1:5) curve(df(x, 13, 50-10*(i-1)), type="l", add=T)

●Ｆ分布のパラメーター（３）――分子と分母の自由度を同時に変える
> x <- seq(0, 4, 0.01)
> plot(x, df(x, 15, 50), type="n")
> title("F Distribution\nn1=13 -> 5\nn2=50 -> 10")
> for (i in 1:5) curve(df(x, 13-2*(i-1), 50-10*(i-1)), type="l", add=T)

-------------------------------------------------
【８】正規分布のもとでの棄却域の図示
-------------------------------------------------

●正規分布（平均0, 標準偏差0.8）の図示
> x <- seq(-3,3,0.01)
> plot(x,dnorm(x,mean=0,sd=0.8),type="n")
> curve(dnorm(x,mean=0,sd=0.8),type="l",add=T)

●棄却水準（α＝0.05）を設定と表示
> alpha <- 0.05
> title("Alpha=0.05")

●左側棄却域の表示
> xmin <- -3
> xmax <- 3
> critical.left <- qnorm(alpha/2, mean=0, sd=0.8)
> xaxis <- seq(xmin, critical.left, length=100)

> yaxis <- c(dnorm(xaxis, mean=0, sd=0.8), 0, 0)

> yaxis <- c(dnorm(xaxis, mean=0, sd=0.8), 0, 0)
> xaxis <- c(xaxis, critical.left, xmin)
> polygon(xaxis, yaxis, density=25)

●右側棄却域の表示
> critical.right <- qnorm(alpha/2, mean=0,sd=0.8,lower.tail=F)
> xaxis <- seq(critical.right, xmax, length=100)
> yaxis <- c(dnorm(xaxis, mean=0, sd=0.8), 0, 0)
> xaxis <- c(xaxis, xmax, critical.right)
> polygon(xaxis, yaxis, density=25)

●棄却域タイトル表示
> ypos <- dnorm(critical.left, mean=0, sd=0.8)
> text(xmin, ypos, "rejection\nregion", adj=0)
> text(xmax, ypos, "rejection\nregion", adj=1)

●受容域タイトル表示
> text((critical.left+critical.right)/2, 2*ypos+0.02, "acceptance region")
> xaxis <- c(rep(critical.left,2), rep(critical.right,2))
> yaxis <- c(2*ypos-0.02, 2*ypos, 2*ypos, 2*ypos-0.02)
> lines(xaxis,yaxis)

●α＝0.05での棄却水準値
> critical.left
[1] -1.567971
> critical.right
[1] 1.567971

-------------------------------------------------