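The basic idea of design of experiments is to compare the variation between the levels of a factor with the variation due to disturbance (error). If the between-level variation is large relative to the error variation, we judge that the difference in the outcome (its amount) is a meaningful (significant) difference caused by the factor. The idea that makes this comparison possible is that the total variation can be decomposed arithmetically into the between-level variation and the error variation:

    S_T = S_A + S_e

Because variation is measured by variance, the judgment is made through an analysis of variance (ANOVA). The two variances are compared by taking their ratio, and the test uses the fact that this ratio follows an F distribution under the null hypothesis H0: there is no difference between the levels of the factor.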
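The decomposition can be checked by hand on the data of Lesson 15-01. The following is a minimal sketch in Python, used here only as a calculator outside SAS; the variable names are illustrative and not part of the lesson. The 20 observations are copied from the proc print listing of that lesson. The sketch computes S_T, S_A and S_e from their definitions and forms the F ratio.

```python
# One-way ANOVA by hand: S_T = S_A + S_e, then F = (S_A / df_A) / (S_e / df_e).
# Data: Y values for the four levels of factor A (5 replicates each),
# copied from the proc print listing of Lesson 15-01.
groups = {
    1: [10.8, 9.9, 10.7, 10.4, 9.7],
    2: [10.7, 10.6, 11.0, 10.8, 10.9],
    3: [11.9, 11.2, 11.0, 11.1, 11.3],
    4: [11.4, 10.7, 10.9, 11.3, 11.7],
}


def mean(xs):
    return sum(xs) / len(xs)


all_y = [y for ys in groups.values() for y in ys]
n_total = len(all_y)
grand_mean = mean(all_y)

# Total variation: every observation against the grand mean.
s_t = sum((y - grand_mean) ** 2 for y in all_y)
# Between-level variation: each level mean against the grand mean.
s_a = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
# Error variation: each observation against its own level mean.
s_e = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

df_a = len(groups) - 1        # degrees of freedom of the factor
df_e = n_total - len(groups)  # degrees of freedom of the error
f_value = (s_a / df_a) / (s_e / df_e)

print(f"S_T = {s_t:.2f}, S_A = {s_a:.2f}, S_e = {s_e:.2f}")
print(f"F({df_a}, {df_e}) = {f_value:.2f}")
```

The three sums of squares correspond to the Corrected Total, Model and Error rows of the proc glm table (5.28, 3.10, 2.18), and the ratio corresponds to its F Value (7.58). The p-value is left to SAS, since it needs the F distribution.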
/* Lesson 15-01 */
/*    File Name = les1501.sas   02/02/21   */

options nocenter linesize=78 pagesize=30;
options locale='en_US';
/* options locale='ja_JP'; */

proc printto print = 'StatM20/les1501-Results.txt' new;
ods listing gpath='StatM20/SAS_ODS15';

data polymer;
  infile 'StatM20/table811.csv'
    firstobs=2 dlm=',' dsd encoding=sjis termstr=crlf ;
  input A R Y;

proc print data=polymer;
run;

proc glm data=polymer;     /* design of experiments (analysis of variance) */
  class A;                 /* variable that holds the levels of the factor */
  model Y = A;             /* model */
  means A / tukey;         /* comparison between levels (multiple comparison) */
run;
Monday, February 1, 2021 06:21:52 PM  63

Obs    A    R       Y
  1    1    1    10.8
  2    1    2     9.9
  3    1    3    10.7
  4    1    4    10.4
  5    1    5     9.7
  6    2    1    10.7
  7    2    2    10.6
  8    2    3    11.0
  9    2    4    10.8
 10    2    5    10.9
 11    3    1    11.9
 12    3    2    11.2
 13    3    3    11.0
 14    3    4    11.1
 15    3    5    11.3
 16    4    1    11.4
 17    4    2    10.7
 18    4    3    10.9
 19    4    4    11.3
 20    4    5    11.7

The GLM Procedure

Class Level Information

Class    Levels    Values
A             4    1 2 3 4

Number of Observations Read    20
Number of Observations Used    20

Dependent Variable: Y

                              Sum of
Source             DF        Squares    Mean Square    F Value    Pr > F
Model               3     3.10000000     1.03333333       7.58    0.0022
Error              16     2.18000000     0.13625000
Corrected Total    19     5.28000000

R-Square    Coeff Var    Root MSE      Y Mean
0.587121     3.386427    0.369121    10.90000

Source    DF      Type I SS    Mean Square    F Value    Pr > F
A          3     3.10000000     1.03333333       7.58    0.0022

Source    DF    Type III SS    Mean Square    F Value    Pr > F
A          3     3.10000000     1.03333333       7.58    0.0022

Tukey's Studentized Range (HSD) Test for Y

NOTE: This test controls the Type I experimentwise error rate, but it
      generally has a higher Type II error rate than REGWQ.

Alpha                                     0.05
Error Degrees of Freedom                    16
Error Mean Square                      0.13625
Critical Value of Studentized Range    4.04606
Minimum Significant Difference          0.6679
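The Minimum Significant Difference in the Tukey output can be reproduced from the other printed values. The sketch below, in Python and outside SAS, assumes the equal-sample-size form of the Tukey HSD, MSD = q * sqrt(MSE / n), with n = 5 observations per level; q and MSE are copied from the output above. The list of pairs is computed by the sketch from the listed data, not taken from a SAS grouping table, which is not shown in this excerpt.

```python
from itertools import combinations
from math import sqrt

# Y values per level of A, copied from the proc print listing of Lesson 15-01.
groups = {
    1: [10.8, 9.9, 10.7, 10.4, 9.7],
    2: [10.7, 10.6, 11.0, 10.8, 10.9],
    3: [11.9, 11.2, 11.0, 11.1, 11.3],
    4: [11.4, 10.7, 10.9, 11.3, 11.7],
}

mse = 0.13625      # "Error Mean Square" from the GLM output
q_crit = 4.04606   # "Critical Value of Studentized Range" from the GLM output
n_per_level = 5    # every level has 5 observations

# Tukey HSD for equal group sizes (assumed form): q * sqrt(MSE / n).
msd = q_crit * sqrt(mse / n_per_level)

level_means = {a: sum(ys) / len(ys) for a, ys in groups.items()}

# Pairs of levels whose mean difference exceeds the MSD.
significant = [
    (a, b)
    for a, b in combinations(sorted(level_means), 2)
    if abs(level_means[a] - level_means[b]) > msd
]

print(f"MSD = {msd:.4f}")
print("level means:", {a: round(m, 2) for a, m in level_means.items()})
print("pairs exceeding MSD:", significant)
```

With these numbers the MSD agrees with the printed 0.6679, and only level 1 differs from levels 3 and 4 by more than that amount.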
/* Lesson 15-02 */
/*    File Name = les1502.sas   02/02/21   */

options nocenter linesize=78 pagesize=30;
options locale='en_US';
/* options locale='ja_JP'; */

proc printto print = 'StatM20/les1502-Results.txt' new;
ods listing gpath='StatM20/SAS_ODS15';

data polymer;
  infile 'StatM20/table821.csv'
    firstobs=2 dlm=',' dsd encoding=sjis termstr=crlf ;
  input A R Y;

proc print data=polymer;
run;

proc glm data=polymer;
  class A;
  model Y = A;
  means A / tukey;
run;
/* Lesson 15-03 */
/*    File Name = les1503.sas   02/02/21   */

options nocenter linesize=78 pagesize=30;
options locale='en_US';
/* options locale='ja_JP'; */

proc printto print = 'StatM20/les1503-Results.txt' new;
ods listing gpath='StatM20/SAS_ODS15';

/* Variable names are romanized Japanese:                              */
/*   shintyou = height, taijyuu = weight, kyoui = chest girth,         */
/*   jitaku = living at home or in lodgings, kodukai = allowance,      */
/*   carryer = mobile carrier, tsuuwa = phone calls                    */
data gakusei;
  infile 'StatM20/StudAll20e.csv'
    firstobs=8 dlm=',' dsd missover encoding=sjis termstr=crlf;
  input sex $ shintyou taijyuu kyoui
        jitaku : $10. kodukai carryer $ tsuuwa;
  /* drop observations with a missing measurement (numeric missing is .) */
  if shintyou=. or taijyuu=. or kyoui=. then delete;
  if kyoui<60 then delete;
  if taijyuu>85 then delete;

proc print data=gakusei(obs=5);
run;
proc means data=gakusei;
run;

/* write the results to out_clust and set the number of clusters to 2 */
proc fastclus data=gakusei out=out_clust maxclusters=2;
  var shintyou taijyuu kyoui;            /* variables used for clustering */
run;

proc plot data=out_clust;
  plot shintyou*taijyuu=cluster;         /* plot symbol is the cluster number */
  plot taijyuu*kyoui=cluster;
  plot kyoui*shintyou=cluster;
run;

proc print data=out_clust(obs=20);       /* output of the results (format 1) */
run;

/* Output to text file: write the results to a text file (format 2) */
data _null_;                             /* write to a file */
  set out_clust;                         /* dataset to write out */
  file 'StatM20/les1503-OutValue.txt';   /* output file name */
  put shintyou taijyuu kyoui cluster distance;   /* variables to write */
run;

/* Output to CSV file: write the results to a CSV file (format 3) */
proc export data=out_clust               /* dataset to write out */
  outfile= "StatM20/les1503-OutCSV.csv"  /* output file name */
  dbms=CSV replace;
run;
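proc fastclus forms clusters around seed points using Euclidean distance. The core step, assigning each observation to its nearest seed and then recomputing each seed as the mean of its members, can be sketched as follows. This is a simplified illustration in Python, not the FASTCLUS algorithm itself, which has its own rules for selecting and replacing seeds. The two seeds are the initial seeds printed in the FASTCLUS output of this lesson; the six points are copied from the same output (five rows of the out_clust listing plus the point used as initial seed 1). The function names are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Initial seeds (shintyou, taijyuu, kyoui) as printed by proc fastclus.
seeds = {1: (178.0, 78.0, 110.0), 2: (152.0, 35.0, 77.0)}

# A few observations copied from the lesson's output (illustration only).
observations = [
    (146.7, 41.0, 85.0),
    (152.0, 35.0, 77.0),
    (155.0, 48.0, 83.0),
    (156.0, 61.0, 90.0),
    (158.0, 46.0, 80.0),
    (178.0, 78.0, 110.0),
]


def nearest_seed(point, seeds):
    """Return the id of the seed closest to point."""
    return min(seeds, key=lambda c: dist(point, seeds[c]))


def update_seeds(points, labels, seeds):
    """Replace each seed by the mean of the points assigned to it."""
    updated = {}
    for c, old in seeds.items():
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            updated[c] = tuple(sum(col) / len(members) for col in zip(*members))
        else:
            updated[c] = old  # keep a seed that attracted no points
    return updated


labels = [nearest_seed(p, seeds) for p in observations]
new_seeds = update_seeds(observations, labels, seeds)

print("assigned clusters:", labels)
print("updated seeds:", new_seeds)
```

Repeating the two steps until the labels stop changing gives the usual k-means iteration; the output of this lesson reports Maxiter=1, so SAS stops much earlier than a fully converged run would.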
Monday, February 1, 2021 05:27:17 PM  18

The MEANS Procedure

Variable      N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------
shintyou    149    168.6973154      8.3515005    146.7000000    185.0000000
taijyuu     149     59.1489933      8.9139400     35.0000000     80.0000000
kyoui       149     86.2590604      6.4126671     63.0000000    110.0000000
kodukai     138       47083.33       54619.87              0      300000.00
tsuuwa       74        6511.46        4026.49    350.0000000       25000.00
--------------------------------------------------------------------------

The FASTCLUS Procedure
Replace=FULL  Radius=0  Maxclusters=2  Maxiter=1

Initial Seeds

Cluster       shintyou        taijyuu          kyoui
-------------------------------------------------------------
   1       178.0000000     78.0000000    110.0000000
   2       152.0000000     35.0000000     77.0000000

Criterion Based on Final Seeds = 5.4963

Cluster Summary

                                    Maximum Distance
                        RMS Std            from Seed      Radius    Nearest
Cluster    Frequency  Deviation       to Observation    Exceeded    Cluster
-----------------------------------------------------------------------------
   1           84        5.2983              23.4776                   2
   2           65        5.7051              20.9336                   1

Cluster Summary

           Distance Between
Cluster    Cluster Centroids
-----------------------------
   1            20.1845
   2            20.1845

Statistics for Variables

Variable    Total STD    Within STD    R-Square    RSQ/(1-RSQ)
------------------------------------------------------------------
shintyou      8.35150       5.60318    0.552909       1.236682
taijyuu       8.91394       5.50325    0.621423       1.641470
kyoui         6.41267       5.32740    0.314498       0.458785
OVER-ALL      7.96509       5.47913    0.530001       1.127665

Pseudo F Statistic = 165.77

Approximate Expected Over-All R-Squared = 0.32147

Cubic Clustering Criterion = 12.325

WARNING: The two values above are invalid for correlated variables.
Cluster Means

Cluster       shintyou        taijyuu          kyoui
-------------------------------------------------------------
   1       174.1416667     65.3095238     89.4119048
   2       161.6615385     51.1876923     82.1846154

Cluster Standard Deviations

Cluster       shintyou        taijyuu          kyoui
-------------------------------------------------------------
   1       4.896426244    5.472661938    5.503830096
   2       6.404629485    5.542661468    5.089487657

[Plot of shintyou*taijyuu. Symbol is value of CLUSTER. Vertical axis: shintyou (140 to 200); horizontal axis: taijyuu (30 to 80). NOTE: 51 obs hidden.]

[Plot of taijyuu*kyoui. Symbol is value of CLUSTER. Vertical axis: taijyuu (20 to 80); horizontal axis: kyoui (60 to 110). NOTE: 61 obs hidden.]

[Plot of kyoui*shintyou. Symbol is value of CLUSTER. Vertical axis: kyoui (60 to 120); horizontal axis: shintyou (140 to 190). NOTE: 44 obs hidden.]

Obs  sex  shintyou  taijyuu  kyoui  jitaku  kodukai  carryer   tsuuwa  CLUSTER  DISTANCE
  1   F     146.7     41.0     85   自宅生    10000  Vodafone    6000     2     19.9233
  2   F     148.0     43.0     80   自宅生    50000  DoCoMo      4000     2     17.6836
  3   F     150.0     46.0     86             40000                 .     2     14.8815
  4   F     151.7     41.5     80   自宅生    35000                 .     2     15.6260
  5   F     152.0     35.0     77   自宅生    60000  DoCoMo      2000     2     20.9336
  6   F     153.0     46.5     87   下宿生    10000                 .     2     12.4492
  7   F     153.0     55.0     78   自宅生    30000                 .     2     11.2980
  8   F     154.4     44.0     75   自宅生     9000  au          2000     2     13.8010
  9   F     155.0     48.0     83   下宿生   180000                 .     2      9.0430
 10   F     156.0     42.0     85   自宅生        0  DoCoMo     15000     2     12.5706
 11   F     156.0     46.0     82   自宅生    10000  Vodafone    7000     2      9.2728
 12   F     156.0     48.0     70   自宅生    30000                 .     2     14.6260
 13   F     156.0     49.0     85   自宅生    25000                 .     2      8.1843
 14   F     156.0     50.0     82   自宅生    40000  Vodafone   10000     2      7.3459
 15   M     156.0     61.0     90   自宅生        0                 .     2     13.8662
 16   F     156.5     45.0     80   下宿生    60000  au         10000     2      9.8403
 17   F     157.0     53.0     84   自宅生    30000                 .     2      6.4154
 18   F     158.0     46.0     80             27500  Willcom     3000     2      8.1473

(jitaku values: 自宅生 = student living at home, 下宿生 = student living in lodgings)
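The summary statistics in the FASTCLUS output are tied together by simple formulas, which gives a quick consistency check. The sketch below, in Python and outside SAS, assumes the usual definitions: R-Square = 1 - (n - k) * (Within STD)^2 / ((n - 1) * (Total STD)^2) and Pseudo F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)), with n = 149 observations and k = 2 clusters as reported in the output of this lesson. The function names are hypothetical, and the inputs are the rounded values printed by SAS, so agreement is only to the printed precision.

```python
def r_square(total_std, within_std, n, k):
    """R-Square from the total and pooled within-cluster standard deviations."""
    ss_total = (n - 1) * total_std ** 2
    ss_within = (n - k) * within_std ** 2
    return 1.0 - ss_within / ss_total


def pseudo_f(r2, n, k):
    """Pseudo F statistic from the over-all R-Square."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


n, k = 149, 2  # observations used and number of clusters

# Values copied from the "Statistics for Variables" table.
r2_shintyou = r_square(8.35150, 5.60318, n, k)
r2_overall = r_square(7.96509, 5.47913, n, k)
f_overall = pseudo_f(0.530001, n, k)

print(f"R-Square (shintyou) = {r2_shintyou:.4f}")
print(f"R-Square (OVER-ALL) = {r2_overall:.4f}")
print(f"Pseudo F            = {f_overall:.2f}")
```

The results agree with the printed 0.552909, 0.530001 and 165.77 up to rounding. The same two functions apply to any other FASTCLUS listing once n and k are replaced by the values of that run.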
/* Lesson 14-05 */
/*    File Name = les1405.sas   01/26/21   */

options nocenter linesize=78 pagesize=30;
options locale='en_US';
/* options locale='ja_JP'; */

proc printto print = 'StatM20/les1405-Results.txt' new;
ods listing gpath='StatM20/SAS_ODS14';

title "Sashelp.iris --- Fisher's Iris Data (1936)";
proc contents data=sashelp.iris varnum;   /* show the variable information of the dataset */
  ods select position;                    /* note also how the dataset is specified */
run;

title "The First Five Observations Out of 150";
proc print data=sashelp.iris(obs=5) noobs;   /* show the first 5 observations */
run;

title "The Species Variable";
proc freq data=sashelp.iris;              /* frequency tabulation */
  tables Species;
run;

proc fastclus data=sashelp.iris out=out_clust maxclusters=3;   /* cluster analysis */
  var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc plot data=out_clust;
  plot SepalLength*SepalWidth=cluster;
  plot SepalLength*PetalLength=cluster;
  plot SepalLength*PetalWidth=cluster;
  plot SepalWidth*PetalLength=cluster;
  plot SepalWidth*PetalWidth=cluster;
  plot PetalLength*PetalWidth=cluster;
run;

title "Scatterplot Matrix for Iris Data";
proc sgscatter data=sashelp.iris;         /* [extra 1] scatterplot matrix */
  matrix SepalLength SepalWidth PetalLength PetalWidth
    / group=Species;
run;

title "Scatterplot Matrix with histogram for Iris Data";
proc sgscatter data=sashelp.iris;         /* [extra 2] scatterplot matrix with histograms */
  matrix SepalLength SepalWidth PetalLength PetalWidth
    / group=Species diagonal=(kernel histogram);
run;

title;
Sashelp.iris --- Fisher's Iris Data (1936)  242
Monday, January 25, 2021 06:31:16 PM

The CONTENTS Procedure

Variables in Creation Order

#    Variable       Type    Len    Label
1    Species        Char     10    Iris Species
2    SepalLength    Num       8    Sepal Length (mm)
3    SepalWidth     Num       8    Sepal Width (mm)
4    PetalLength    Num       8    Petal Length (mm)
5    PetalWidth     Num       8    Petal Width (mm)

The First Five Observations Out of 150

            Sepal    Sepal     Petal    Petal
Species    Length    Width    Length    Width
Setosa         50       33        14        2
Setosa         46       34        14        3
Setosa         46       36        10        2
Setosa         51       33        17        5
Setosa         55       35        13        2

The Species Variable

The FREQ Procedure

Iris Species

                                       Cumulative    Cumulative
Species       Frequency    Percent      Frequency       Percent
---------------------------------------------------------------
Setosa               50      33.33             50         33.33
Versicolor           50      33.33            100         66.67
Virginica            50      33.33            150        100.00

The FASTCLUS Procedure
Replace=FULL  Radius=0  Maxclusters=3  Maxiter=1

Initial Seeds

Cluster    SepalLength     SepalWidth    PetalLength     PetalWidth
---------------------------------------------------------------------------
   1       77.00000000    38.00000000    67.00000000    22.00000000
   2       57.00000000    44.00000000    15.00000000     4.00000000
   3       49.00000000    25.00000000    45.00000000    17.00000000

Criterion Based on Final Seeds = 3.7097

Cluster Summary

                                    Maximum Distance
                        RMS Std            from Seed      Radius    Nearest
Cluster    Frequency  Deviation       to Observation    Exceeded    Cluster
-----------------------------------------------------------------------------
   1           33        3.8831              12.9226                   3
   2           50        2.7803              12.4803                   3
   3           67        4.1797              18.5320                   1

Cluster Summary

           Distance Between
Cluster    Cluster Centroids
-----------------------------
   1            18.3409
   2            34.2516
   3            18.3409

Statistics for Variables

Variable       Total STD    Within STD    R-Square    RSQ/(1-RSQ)
---------------------------------------------------------------------
SepalLength      8.28066       4.48242    0.710915       2.459187
SepalWidth       4.35866       3.24819    0.452092       0.825123
PetalLength     17.65298       4.29764    0.941527      16.101961
PetalWidth       7.62238       2.38707    0.903243       9.335201
OVER-ALL        10.69224       3.70171    0.881751       7.456709

Pseudo F Statistic = 548.07

Approximate Expected Over-All R-Squared = 0.62728

Cubic Clustering Criterion = 24.559

WARNING: The two values above are invalid for correlated variables.

Cluster Means

Cluster    SepalLength     SepalWidth    PetalLength     PetalWidth
---------------------------------------------------------------------------
   1       69.00000000    30.96969697    58.27272727    21.27272727
   2       50.06000000    34.28000000    14.62000000     2.46000000
   3       59.47761194    27.61194030    44.52238806    14.53731343

Cluster Standard Deviations

Cluster    SepalLength     SepalWidth    PetalLength     PetalWidth
---------------------------------------------------------------------------
   1       5.012484414    2.909948974    4.577613511    2.401467354
   2       3.524896872    3.790643691    1.736639965    1.053855894
   3       4.831582365    2.953966126    5.360795421    3.011736428

[Plot of SepalLength*SepalWidth. Symbol is value of CLUSTER. Vertical axis: Sepal Length (mm), 36 to 84; horizontal axis: Sepal Width (mm), 20 to 45. NOTE: 64 obs hidden.]

[Plot of SepalLength*PetalLength. Symbol is value of CLUSTER. Vertical axis: Sepal Length (mm), 36 to 84; horizontal axis: Petal Length (mm), 10 to 70. NOTE: 53 obs hidden.]

[Plot of SepalLength*PetalWidth. Symbol is value of CLUSTER. Vertical axis: Sepal Length (mm), 36 to 84; horizontal axis: Petal Width (mm), 0 to 25. NOTE: 74 obs hidden.]

[Plot of SepalWidth*PetalLength. Symbol is value of CLUSTER. Vertical axis: Sepal Width (mm), 20 to 50; horizontal axis: Petal Length (mm), 10 to 70. NOTE: 39 obs hidden.]

[Plot of SepalWidth*PetalWidth. Symbol is value of CLUSTER. Vertical axis: Sepal Width (mm), 20 to 50; horizontal axis: Petal Width (mm), 0 to 25. NOTE: 69 obs hidden.]

[Plot of PetalLength*PetalWidth. Symbol is value of CLUSTER. Vertical axis: Petal Length (mm), 0 to 72; horizontal axis: Petal Width (mm), 0 to 25. NOTE: 88 obs hidden.]