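● Contents
1. Simple regression analysis: relationships between continuous variables, used for prediction and similar tasks
2. How do we run the analysis "excluding the heavy individuals"?
3. Multiple regression analysis: explaining one variable (the response variable) with two or more explanatory variables
4. Analysis within a specific group
5. [Key points] Points to watch when analyzing
6. Misuse?!
7. The four scales of measurement and regression analysis
8. Mind the significant digits: how many digits are "meaningful"?
9. Variable selection in regression analysis; the all-subsets method
10. Next time, ...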
● Regression analysis: predicting a continuous variable
/* Lesson 11-2 */
/*    File Name = les1102.sas   06/28/07   */

data gakusei;
  infile 'all07ae.prn' firstobs=2;
  input sex $ shintyou taijyuu kyoui jitaku $ kodukai carryer $ tsuuwa;
  if sex^='M' & sex^='F' then delete;

proc print data=gakusei(obs=10);
run;

proc reg data=gakusei;                                  : regression analysis
  model taijyuu=shintyou;                               : specify the variables
  output out=outreg1 predicted=pred1 residual=resid1;   : save the result items
run;

proc print data=outreg1(obs=15);                        : display them
run;

proc plot data=outreg1;                                 : draw scatter plots
  plot taijyuu*shintyou/vaxis=20 to 100 by 20;          : weight vs. height (vertical axis specified)
  plot pred1*taijyuu;                                   : predicted vs. observed values
  plot resid1*pred1   /vref=0;                          : residuals vs. predicted values (residual analysis; horizontal reference line at 0)
  plot resid1*shintyou/vref=0;                          : residuals vs. explanatory variable (residual analysis)
  plot resid1*taijyuu /vref=0;                          : residuals vs. response variable (residual analysis)
run;

proc univariate data=outreg1 plot normal;               : check the residuals with a normal probability plot
  var resid1;
run;

[Remark] The text after each colon above is explanatory; it is not part of the SAS program.
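[Supplement] Strictly speaking, it is better to add the following line under proc plot:
  where shintyou^=. and taijyuu^=.;
This statement excludes observations with missing values from the procedure.
The only effect is that the "observations had missing values" note disappears; the plots themselves are identical, because missing values cannot be plotted anyway.
Try running the program both with and without this line.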
SAS System   1   08:52 Thursday, June 28, 2007

OBS  SEX  SHINTYOU  TAIJYUU  KYOUI  JITAKU  KODUKAI  CARRYER   TSUUWA
  1   F     145.0     38.0      .     J       10000                 .
  2   F     146.7     41.0     85     J       10000  Vodafone    6000
  3   F     148.0     42.0      .     J       50000                 .
  4   F     148.0     43.0     80     J       50000  DoCoMo      4000
  5   F     148.9       .       .     J       60000                 .
  6   F     149.0     45.0      .     G       60000                 .
  7   F     150.0     46.0     86             40000                 .
  8   F     151.0     45.0      .     J       20000  docomo      5000
  9   F     151.0     50.0      .     G       60000  J-PHONE        .
 10   F     151.7     41.5     80     J       35000                 .

Model: MODEL1
Dependent Variable: TAIJYUU

                     Analysis of Variance
                      Sum of          Mean
Source      DF       Squares        Square    F Value    Prob>F
Model        1   14055.20146   14055.20146    318.560    0.0001
Error      323   14251.10026      44.12105
C Total    324   28306.30172

Root MSE     6.64237    R-square   0.4965
Dep Mean    58.78092    Adj R-sq   0.4950
C.V.        11.30021

                     Parameter Estimates
                 Parameter      Standard    T for H0:
Variable   DF     Estimate         Error    Parameter=0    Prob > |T|
INTERCEP    1   -79.351524    7.74803757      -10.241        0.0001
SHINTYOU    1     0.818831    0.04587737       17.848        0.0001

OBS  SEX  SHINTYOU  TAIJYUU  KYOUI  JITAKU  KODUKAI  CARRYER   TSUUWA    PRED1     RESID1
  1   F     145.0     38.0      .     J       10000                 .  39.3789    -1.3789
  2   F     146.7     41.0     85     J       10000  Vodafone    6000  40.7709     0.2291
  3   F     148.0     42.0      .     J       50000                 .  41.8354     0.1646
  4   F     148.0     43.0     80     J       50000  DoCoMo      4000  41.8354     1.1646
  5   F     148.9       .       .     J       60000                 .  42.5724      .
  6   F     149.0     45.0      .     G       60000                 .  42.6542     2.3458
  7   F     150.0     46.0     86             40000                 .  43.4731     2.5269
  8   F     151.0     45.0      .     J       20000  docomo      5000  44.2919     0.7081
  9   F     151.0     50.0      .     G       60000  J-PHONE        .  44.2919     5.7081
 10   F     151.7     41.5     80     J       35000                 .  44.8651    -3.3651
 11   F     152.0     35.0     77     J       60000  DoCoMo      2000  45.1107   -10.1107
 12   F     152.0     43.0      .     J       20000  au          3500  45.1107    -2.1107
 13   F     152.0     44.0      .             45000  DoCoMo      4000  45.1107    -1.1107
 14   F     153.0     41.0      .     J      125000  No             .  45.9296    -4.9296
 15   F     153.0     42.0      .     G           0  Vodafone    1000  45.9296    -3.9296

Plots (legend for all: A = 1 obs, B = 2 obs, ...; each plot carries the NOTE: 45 observations had missing values):
[Plot omitted: TAIJYUU*SHINTYOU. Vertical axis TAIJYUU, 20 to 100; horizontal axis SHINTYOU, 140 to 190.]
[Plot omitted: PRED1*TAIJYUU. Vertical axis PRED1, 40 to 80; horizontal axis TAIJYUU, 20 to 100.]
[Plot omitted: RESID1*PRED1. Vertical axis Residual, -25 to 50; horizontal axis Predicted Value of TAIJYUU, 30 to 80.]
[Plot omitted: RESID1*SHINTYOU. Vertical axis Residual, -25 to 50; horizontal axis SHINTYOU, 140 to 190.]
[Plot omitted: RESID1*TAIJYUU. Vertical axis Residual, -25 to 50; horizontal axis TAIJYUU, 20 to 100.]

Univariate Procedure
Variable=RESID1     Residual

                 Moments
N              325      Sum Wgts        325
Mean             0      Sum               0
Std Dev    6.63211      Variance   43.98488
Skewness   1.42133      Kurtosis   4.000837
USS        14251.1      CSS         14251.1
CV               .      Std Mean   0.367883
T:Mean=0         0      Pr>|T|       1.0000
Num ^= 0       325      Num > 0         140
M(Sign)      -22.5      Pr>=|M|      0.0145
Sgn Rank   -3248.5      Pr>=|S|      0.0552
W:Normal  0.916608      Pr<W         0.0001

             Quantiles(Def=5)
100% Max   33.59967      99%   22.24447
 75% Q3    2.693822      95%   11.59967
 50% Med   -1.02372      90%   8.244467
 25% Q1    -4.03799      10%   -7.28364
  0% Min   -13.9438       5%   -8.57436
                          1%   -10.9438
Range      47.54351
Q3-Q1      6.731815
Mode       -2.30618

        Histogram                                       #   Boxplot
  35+*                                                  1      *
    .**                                                 5      0
    .****                                              15      0
    .******************************                   119   +--+--+
    .*********************************************    178   *-----*
 -15+**                                                 7      |
     ----+----+----+----+----+----+----+----+----+
     * may represent up to 4 counts

[Normal probability plot of RESID1 omitted: vertical axis -15 to 35; horizontal axis -2 to +2.]
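[Supplement] The summary figures in the Analysis of Variance table above are tied together by simple arithmetic, and recomputing them is a quick way to learn how to read the table. The following is a minimal sketch in Python (not part of the SAS course material; the variable names are chosen here for illustration). It takes the two sums of squares and their degrees of freedom from the table and recomputes C Total, R-square, Root MSE and F Value, which should agree with the printed values up to rounding.

```python
import math

# Figures copied from the Analysis of Variance table of Lesson 11-2.
ss_model, df_model = 14055.20146, 1     # "Model" row
ss_error, df_error = 14251.10026, 323   # "Error" row

ss_total = ss_model + ss_error              # "C Total" sum of squares
ms_error = ss_error / df_error              # Mean Square of the Error row
r_square = ss_model / ss_total              # share of the variation explained
root_mse = math.sqrt(ms_error)              # "Root MSE"
f_value = (ss_model / df_model) / ms_error  # "F Value"

print(f"C Total  = {ss_total:.5f}")
print(f"R-square = {r_square:.4f}")
print(f"Root MSE = {root_mse:.5f}")
print(f"F Value  = {f_value:.3f}")
```
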
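[Note] The error is measured perpendicular to the axis of the explanatory variable, that is, along the axis of the response variable, and not perpendicular to the regression line. This is because the model is built on the assumption that the error enters when the response is measured.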
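[Supplement] To make the direction of the error concrete, here is a minimal sketch in Python (not part of the SAS course material; the function names are hypothetical). It uses the INTERCEP and SHINTYOU estimates printed above to recompute PRED1 and RESID1 for the first two observations of the listing. The residual is simply the observed weight minus the predicted weight at the same height, so it is a distance along the weight axis.

```python
# Estimates copied from the Parameter Estimates table of Lesson 11-2.
INTERCEPT = -79.351524
SLOPE = 0.818831          # kg of weight per cm of height


def predict_weight(height_cm):
    """Predicted weight on the fitted line for a given height."""
    return INTERCEPT + SLOPE * height_cm


def residual(height_cm, weight_kg):
    """Observed minus predicted weight, measured along the weight axis."""
    return weight_kg - predict_weight(height_cm)


# OBS 1 and OBS 2 of the listing, as (shintyou, taijyuu).
observations = [(145.0, 38.0), (146.7, 41.0)]
results = [(predict_weight(h), residual(h, w)) for h, w in observations]

for (h, w), (pred, resid) in zip(observations, results):
    print(f"height={h} weight={w} pred1={pred:.4f} resid1={resid:.4f}")
```

Because the printed estimates are themselves rounded, the recomputed values can differ from the PRED1 and RESID1 columns in the last printed digit.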
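[Note] This does not mean that "anyone who disturbs normality may be excluded". In this case, going back to the original data showed that the individuals in question were stocky members of athletic clubs, so we judged that they differ in nature from ordinary university students and excluded them. Whenever you exclude data, state the grounds clearly; otherwise the work may be called an "arbitrary analysis".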
/* Lesson 11-3 */
/*    File Name = les1103.sas   06/28/07   */

data gakusei;
  infile 'all07ae.prn' firstobs=2;
  input sex $ shintyou taijyuu kyoui jitaku $ kodukai carryer $ tsuuwa;
  if sex^='M' & sex^='F' then delete;
  if shintyou=. | taijyuu=. then delete;                : exclude observations with missing values

proc print data=gakusei(obs=10);
run;

proc corr data=gakusei;
  where taijyuu<85;                                     : restrict the target data
run;

proc reg data=gakusei;
  model taijyuu=shintyou;
  where taijyuu<85;                                     : restrict the target data
  output out=outreg1 predicted=pred1 residual=resid1;
run;

proc print data=outreg1(obs=15);
run;

proc plot data=outreg1;
  where taijyuu<85;                                     : restrict the target data
  plot taijyuu*shintyou;
  plot taijyuu*pred1;
  plot resid1*(pred1 shintyou taijyuu)/vref=0;          : several plots can be requested at once
run;

proc univariate data=outreg1 plot normal;
  var resid1;
run;
SAS System   2   08:52 Thursday, June 28, 2007

Correlation Analysis
5 'VAR' Variables: SHINTYOU TAIJYUU KYOUI KODUKAI TSUUWA

                          Simple Statistics
Variable     N       Mean    Std Dev        Sum    Minimum    Maximum
SHINTYOU   321      168.6     8.0251    54118.5      145.0      186.0
TAIJYUU    321    58.3498     8.5473    18730.3    35.0000    84.0000
KYOUI      111    85.7477     7.9561     9518.0    46.0000      110.0
KODUKAI    303    49107.3    51750.8   14879500          0     350000
TSUUWA     132     6742.4     4469.7     890002          0    30000.0

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / Number of Observations
(for each variable: first row = correlation, second row = p-value, third row = number of observations)

            SHINTYOU    TAIJYUU      KYOUI    KODUKAI     TSUUWA
SHINTYOU     1.00000    0.72880    0.28729    0.06533   -0.05960
              0.0        0.0001     0.0022     0.2569     0.4972
              321        321        111        303        132
TAIJYUU      0.72880    1.00000    0.38406    0.06408   -0.04543
              0.0001     0.0        0.0001     0.2662     0.6050
              321        321        111        303        132
KYOUI        0.28729    0.38406    1.00000   -0.28125   -0.17722
              0.0022     0.0001     0.0        0.0033     0.2940
              111        111        111        107         37
KODUKAI      0.06533    0.06408   -0.28125    1.00000    0.26949
              0.2569     0.2662     0.0033     0.0        0.0021
              303        303        107        303        128
TSUUWA      -0.05960   -0.04543   -0.17722    0.26949    1.00000
              0.4972     0.6050     0.2940     0.0021     0.0
              132        132         37        128        132

Model: MODEL1
Dependent Variable: TAIJYUU

                     Analysis of Variance
                      Sum of          Mean
Source      DF       Squares        Square    F Value    Prob>F
Model        1   12417.15747   12417.15747    361.385    0.0001
Error      319   10960.80502      34.35989
C Total    320   23377.96249

Root MSE     5.86173    R-square   0.5311
Dep Mean    58.34984    Adj R-sq   0.5297
C.V.        10.04584

                     Parameter Estimates
                 Parameter      Standard    T for H0:
Variable   DF     Estimate         Error    Parameter=0    Prob > |T|
INTERCEP    1   -72.515375    6.89174111      -10.522        0.0001
SHINTYOU    1     0.776218    0.04083178       19.010        0.0001

Plots (legend for all: A = 1 obs, B = 2 obs, ...):
[Plot omitted: TAIJYUU*SHINTYOU. Vertical axis TAIJYUU, 25 to 100; horizontal axis SHINTYOU, 140 to 190.]
[Plot omitted: TAIJYUU*PRED1. Vertical axis TAIJYUU, 25 to 100; horizontal axis Predicted Value of TAIJYUU, 40 to 80.]
[Plot omitted: RESID1*PRED1. Vertical axis Residual, -20 to 40; horizontal axis Predicted Value of TAIJYUU, 40 to 80.]
[Plot omitted: RESID1*SHINTYOU. Vertical axis Residual, -20 to 40; horizontal axis SHINTYOU, 140 to 190.]
[Plot omitted: RESID1*TAIJYUU. Vertical axis Residual, -20 to 40; horizontal axis TAIJYUU, 30 to 90.]

Univariate Procedure
Variable=RESID1     Residual

                 Moments
N              321      Sum Wgts        321
Mean             0      Sum               0
Std Dev   5.852565      Variance   34.25252
Skewness  0.822649      Kurtosis   1.167031
USS       10960.81      CSS        10960.81
CV               .      Std Mean   0.326658
T:Mean=0         0      Pr>|T|       1.0000
Num ^= 0       321      Num > 0         142
M(Sign)      -18.5      Pr>=|M|      0.0443
Sgn Rank   -2359.5      Pr>=|S|      0.1565
W:Normal  0.954643      Pr<W         0.0001

         Histogram                                     #   Boxplot
  22.5+*                                               2      0
      .**                                              4      0
      .*****                                          13      0
      .*************                                  38      |
      .*****************************                  85   +--+--+
      .*******************************************   127   *-----*
      .***************                                45      |
 -12.5+***                                             7      |
       ----+----+----+----+----+----+----+----+---
       * may represent up to 3 counts

[Normal probability plot of RESID1 omitted: vertical axis -12.5 to 22.5; horizontal axis -2 to +2.]
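[Supplement] In a simple regression with one explanatory variable, R-square equals the squared Pearson correlation, and the slope equals the correlation multiplied by the ratio of the two standard deviations. The following minimal sketch in Python (not part of the SAS course material; variable names are illustrative) checks both relations with the figures printed by proc corr and proc reg above. Because the inputs are rounded printed values, agreement is only expected to about three or four decimals.

```python
# Figures copied from the PROC CORR output of Lesson 11-3 (taijyuu < 85).
r = 0.72880            # Pearson correlation of SHINTYOU and TAIJYUU
sd_height = 8.0251     # Std Dev of SHINTYOU
sd_weight = 8.5473     # Std Dev of TAIJYUU

# Relations that hold for simple (one-variable) least-squares regression.
r_square_from_r = r ** 2                  # compare with R-square = 0.5311
slope_from_r = r * sd_weight / sd_height  # compare with SHINTYOU = 0.776218

print(f"r^2   = {r_square_from_r:.4f}")
print(f"slope = {slope_from_r:.4f}")
```

For comparison, before the individuals with taijyuu of 85 or more were excluded, Lesson 11-2 reported R-square = 0.4965 and a residual skewness of 1.42133; after the exclusion these are 0.5311 and 0.822649.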
/* Lesson 12-1 */
/*    File Name = les1201.sas   07/05/07   */

data gakusei;
  infile 'all07ae.prn' firstobs=2;
  input sex $ shintyou taijyuu kyoui jitaku $ kodukai carryer $ tsuuwa;
  if sex^='M' & sex^='F' then delete;

proc print data=gakusei(obs=10);
run;

proc reg data=gakusei;                                  : regression analysis
  model taijyuu=shintyou kyoui;                         : specify several explanatory variables
  output out=outreg1 predicted=pred1 residual=resid1;   : save the result items
run;

proc print data=outreg1(obs=15);
run;

proc plot data=outreg1;                                 : draw scatter plots
  where shintyou^=. and taijyuu^=. and kyoui^=.;        : only the data used in the analysis
  plot taijyuu*shintyou;
  plot taijyuu*kyoui;
  plot taijyuu*pred1;                                   : observed vs. predicted values
  plot resid1*pred1   /vref=0;                          : residuals vs. predicted values (residual analysis)
  plot resid1*shintyou/vref=0;                          : residuals vs. explanatory variable (residual analysis)
  plot resid1*kyoui   /vref=0;                          : residuals vs. explanatory variable (residual analysis)
  plot resid1*taijyuu /vref=0;                          : residuals vs. response variable (residual analysis)
run;

proc univariate data=outreg1 plot normal;               : check the residuals with a normal probability plot
  var resid1;
run;
SAS System   2   16:41 Wednesday, July 4, 2007

Model: MODEL1
Dependent Variable: TAIJYUU

                     Analysis of Variance
                      Sum of          Mean
Source      DF       Squares        Square    F Value    Prob>F
Model        2    8070.70705    4035.35353     85.102    0.0001
Error      111    5263.40733      47.41808
C Total    113   13334.11439

Root MSE     6.88608    R-square   0.6053
Dep Mean    58.79298    Adj R-sq   0.5982
C.V.        11.71242

                     Parameter Estimates
                  Parameter       Standard    T for H0:
Variable   DF      Estimate          Error    Parameter=0    Prob > |T|
INTERCEP    1   -106.300229    12.75196946       -8.336        0.0001
SHINTYOU    1      0.806547     0.07854137       10.269        0.0001
KYOUI       1      0.349475     0.08192373        4.266        0.0001

OBS  SEX  SHINTYOU  TAIJYUU  KYOUI  JITAKU  KODUKAI  CARRYER   TSUUWA    PRED1     RESID1
  1   F     145.0     38.0      .     J       10000                 .      .         .
  2   F     146.7     41.0     85     J       10000  Vodafone    6000  41.7256   -0.72559
  3   F     148.0     42.0      .     J       50000                 .      .         .
  4   F     148.0     43.0     80     J       50000  DoCoMo      4000  41.0267    1.97328
  5   F     148.9       .       .     J       60000                 .      .         .
  6   F     149.0     45.0      .     G       60000                 .      .         .
  7   F     150.0     46.0     86             40000                 .  44.7367    1.26333
  8   F     151.0     45.0      .     J       20000  docomo      5000      .         .
  9   F     151.0     50.0      .     G       60000  J-PHONE        .      .         .
 10   F     151.7     41.5     80     J       35000                 .  44.0109   -2.51095
 11   F     152.0     35.0     77     J       60000  DoCoMo      2000  43.2045   -8.20449
 12   F     152.0     43.0      .     J       20000  au          3500      .         .
 13   F     152.0     44.0      .             45000  DoCoMo      4000      .         .
 14   F     153.0     41.0      .     J      125000  No             .      .         .
 15   F     153.0     42.0      .     G           0  Vodafone    1000      .         .

Plots (legend for all: A = 1 obs, B = 2 obs, ...):
[Plot omitted: TAIJYUU*SHINTYOU. Vertical axis TAIJYUU, 0 to 100; horizontal axis SHINTYOU, 140 to 190.]
[Plot omitted: TAIJYUU*KYOUI. Vertical axis TAIJYUU, 0 to 100; horizontal axis KYOUI, 40 to 120.]
[Plot omitted: TAIJYUU*PRED1. Vertical axis TAIJYUU, 0 to 100; horizontal axis Predicted Value of TAIJYUU, 40 to 80.]
[Plot omitted: RESID1*PRED1. Vertical axis Residual, -25 to 50; horizontal axis Predicted Value of TAIJYUU, 40 to 80.]
[Plot omitted: RESID1*SHINTYOU. Vertical axis Residual, -25 to 50; horizontal axis SHINTYOU, 140 to 190.]
[Plot omitted: RESID1*KYOUI. Vertical axis Residual, -25 to 50; horizontal axis KYOUI, 40 to 120.]
[Plot omitted: RESID1*TAIJYUU. Vertical axis Residual, -25 to 50; horizontal axis TAIJYUU, 20 to 100.]

Univariate Procedure
Variable=RESID1     Residual

                 Moments
N              114      Sum Wgts        114
Mean             0      Sum               0
Std Dev   6.824868      Variance   46.57883
Skewness  2.026813      Kurtosis   7.211418
USS       5263.407      CSS        5263.407
CV               .      Std Mean   0.639207
T:Mean=0         0      Pr>|T|       1.0000
Num ^= 0       114      Num > 0          43
M(Sign)        -14      Pr>=|M|      0.0111
Sgn Rank    -517.5      Pr>=|S|      0.1442
W:Normal  0.865365      Pr<W         0.0001

        Histogram                             #   Boxplot
  35+*                                        1      *
    .*                                        2      *
    .**                                       4      0
    .******************                      36   +--+--+
    .***********************************     69   *-----*
 -15+*                                        2      0
     ----+----+----+----+----+----+----+
     * may represent up to 2 counts

[Normal probability plot of RESID1 omitted: vertical axis -15 to 35; horizontal axis -2 to +2.]
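[Supplement] With two explanatory variables, both R-square and the degrees-of-freedom-adjusted Adj R-sq can be recomputed from the Analysis of Variance table, and a predicted value is the intercept plus each estimate times its variable. The following is a minimal sketch in Python (not part of the SAS course material; names are illustrative). Note that only 114 observations enter the fit (C Total DF = 113), because observations with a missing kyoui are dropped; this is why PRED1 and RESID1 are missing for many rows in the listing.

```python
# Figures copied from the PROC REG output of Lesson 12-1.
n = 114                  # observations used in the fit (C Total DF + 1)
p = 2                    # explanatory variables: SHINTYOU and KYOUI
ss_error = 5263.40733    # Error sum of squares
ss_total = 13334.11439   # C Total sum of squares

r_square = 1 - ss_error / ss_total
# Adjusted R-square compares mean squares instead of sums of squares.
adj_r_square = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))

B0, B_SHINTYOU, B_KYOUI = -106.300229, 0.806547, 0.349475


def predict_weight(shintyou, kyoui):
    """Predicted weight from the fitted two-variable model."""
    return B0 + B_SHINTYOU * shintyou + B_KYOUI * kyoui


# OBS 2 of the listing: shintyou = 146.7, kyoui = 85, taijyuu = 41.0.
pred_obs2 = predict_weight(146.7, 85)
resid_obs2 = 41.0 - pred_obs2

print(f"R-square = {r_square:.4f}, Adj R-sq = {adj_r_square:.4f}")
print(f"OBS 2: pred1 = {pred_obs2:.4f}, resid1 = {resid_obs2:.5f}")
```
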
/* Lesson 12-2 */
/*    File Name = les1202.sas   07/05/07   */

data gakusei;
  infile 'all07ae.prn' firstobs=2;
  input sex $ shintyou taijyuu kyoui jitaku $ kodukai carryer $ tsuuwa;
  if sex^='M' & sex^='F' then delete;                   : exclude observations of unknown sex
  if shintyou=. | taijyuu=. | kyoui=. then delete;      : exclude observations with missing values

proc print data=gakusei(obs=10);
run;

proc corr data=gakusei;                                 : correlation coefficients
  where sex='M';                                        : males only
run;

proc reg data=gakusei;                                  : regression analysis
  model taijyuu=shintyou kyoui;
  where sex='M';                                        : males only
  output out=outreg1 predicted=pred1 residual=resid1;
run;

proc print data=outreg1(obs=15);
run;

proc plot data=outreg1;
  where sex='M';                                        : the target data only
  plot taijyuu*shintyou;
  plot taijyuu*kyoui;
  plot taijyuu*pred1;
  plot resid1*(pred1 shintyou kyoui taijyuu)/vref=0;    : written as one combined request
/*
  plot resid1*pred1   /vref=0;
  plot resid1*shintyou/vref=0;
  plot resid1*kyoui   /vref=0;
  plot resid1*taijyuu /vref=0;
*/
run;

proc univariate data=outreg1 plot normal;
  var resid1;
run;
SAS System   2   16:41 Wednesday, July 4, 2007

Correlation Analysis
5 'VAR' Variables: SHINTYOU TAIJYUU KYOUI KODUKAI TSUUWA

                          Simple Statistics
Variable     N       Mean    Std Dev        Sum    Minimum    Maximum
SHINTYOU    71      172.5     5.9351    12250.1      156.0      185.0
TAIJYUU     71    64.7282     9.0651     4595.7    46.0000      100.0
KYOUI       71    88.0986     9.6853     6255.0    46.0000      112.0
KODUKAI     67    56358.2    66471.6    3776000          0     350000
TSUUWA      14     6632.1     4247.9    92850.0      350.0    15000.0

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / Number of Observations
(for each variable: first row = correlation, second row = p-value, third row = number of observations)

            SHINTYOU    TAIJYUU      KYOUI    KODUKAI     TSUUWA
SHINTYOU     1.00000    0.39968    0.15872    0.09516    0.11552
              0.0        0.0006     0.1862     0.4437     0.6941
               71         71         71         67         14
TAIJYUU      0.39968    1.00000    0.40227    0.11042    0.14591
              0.0006     0.0        0.0005     0.3737     0.6187
               71         71         71         67         14
KYOUI        0.15872    0.40227    1.00000   -0.37945   -0.38661
              0.1862     0.0005     0.0        0.0015     0.1721
               71         71         71         67         14
KODUKAI      0.09516    0.11042   -0.37945    1.00000    0.53783
              0.4437     0.3737     0.0015     0.0        0.0473
               67         67         67         67         14
TSUUWA       0.11552    0.14591   -0.38661    0.53783    1.00000
              0.6941     0.6187     0.1721     0.0473     0.0
               14         14         14         14         14

Model: MODEL1
Dependent Variable: TAIJYUU

                     Analysis of Variance
                      Sum of          Mean
Source      DF       Squares        Square    F Value    Prob>F
Model        2    1596.38065     798.19033     13.060    0.0001
Error       68    4155.98301      61.11740
C Total     70    5752.36366

Root MSE     7.81776    R-square   0.2775
Dep Mean    64.72817    Adj R-sq   0.2563
C.V.        12.07784

                     Parameter Estimates
                 Parameter       Standard    T for H0:
Variable   DF     Estimate          Error    Parameter=0    Prob > |T|
INTERCEP    1   -54.721337    27.50850038       -1.989        0.0507
SHINTYOU    1     0.526195     0.15945819        3.300        0.0015
KYOUI       1     0.325335     0.09771516        3.329        0.0014

Plots (legend for all: A = 1 obs, B = 2 obs, ...):
[Plot omitted: TAIJYUU*SHINTYOU. Vertical axis TAIJYUU, 25 to 100; horizontal axis SHINTYOU, 155 to 185.]
[Plot omitted: TAIJYUU*KYOUI. Vertical axis TAIJYUU, 25 to 100; horizontal axis KYOUI, 40 to 120.]
[Plot omitted: TAIJYUU*PRED1. Vertical axis TAIJYUU, 25 to 100; horizontal axis Predicted Value of TAIJYUU, 50 to 80.]
[Plot omitted: RESID1*PRED1. Vertical axis Residual, -25 to 50; horizontal axis Predicted Value of TAIJYUU, 50 to 80.]
[Plot omitted: RESID1*SHINTYOU. Vertical axis Residual, -25 to 50; horizontal axis SHINTYOU, 155 to 185.]
[Plot omitted: RESID1*KYOUI. Vertical axis Residual, -25 to 50; horizontal axis KYOUI, 40 to 120.]
[Plot omitted: RESID1*TAIJYUU. Vertical axis Residual, -25 to 50; horizontal axis TAIJYUU, 40 to 100.]

Univariate Procedure
Variable=RESID1     Residual

                 Moments
N               71      Sum Wgts         71
Mean             0      Sum               0
Std Dev    7.70527      Variance   59.37119
Skewness  2.013612      Kurtosis   5.673909
USS       4155.983      CSS        4155.983
CV               .      Std Mean   0.914447
T:Mean=0         0      Pr>|T|       1.0000
Num ^= 0        71      Num > 0          27
M(Sign)       -8.5      Pr>=|M|      0.0568
Sgn Rank      -260      Pr>=|S|      0.1374
W:Normal  0.837353      Pr<W         0.0001

 Stem Leaf                          #   Boxplot
    3 3                             1      *
    2 5                             1      *
    2 4                             1      0
    1 6                             1      0
    1 0                             1      |
    0 5566799                       7      |
    0 111222222334444              15   +--+--+
   -0 4444444333333322221111110    25   *-----*
   -0 99877766655555555            17   +-----+
   -1 21                            2      |
      ----+----+----+----+----+
  Multiply Stem.Leaf by 10**+1

[Normal probability plot of RESID1 omitted: vertical axis -12.5 to 32.5; horizontal axis -2 to +2.]
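[Supplement] The column "T for H0: Parameter=0" in the Parameter Estimates table is the estimate divided by its standard error. The following minimal sketch in Python (not part of the SAS course material; names are illustrative) recomputes the three t values for the males-only model above. The recomputed values should match the printed ones to three decimals.

```python
# (estimate, standard error, printed t) from Parameter Estimates, Lesson 12-2.
rows = {
    "INTERCEP": (-54.721337, 27.50850038, -1.989),
    "SHINTYOU": (0.526195, 0.15945819, 3.300),
    "KYOUI": (0.325335, 0.09771516, 3.329),
}

# t statistic for H0: parameter = 0 is estimate / standard error.
t_values = {name: est / se for name, (est, se, _printed) in rows.items()}

for name, t in t_values.items():
    print(f"{name:8s} t = {t:7.3f}   (printed: {rows[name][2]:7.3f})")
```

In this output the intercept has Prob > |T| = 0.0507, so it is not significant at the 5% level, while both explanatory variables are.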
where sex='M' and taijyuu<85;
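A computed result can be printed to more digits than the measurement precision supports, but the extra digits carry no meaning.
[Important note] Statistical software is only a tool. Using it well is up to each of you.
[Example 1] Think in terms of rounded values: note that the precision (accuracy) differs.
    68    <===  67.5  to just under 68.5
    67.8  <===  67.75 to just under 67.85
    12    <===  11.5  to just under 12.5
    12.3  <===  12.25 to just under 12.35
[Example 2] In Japan's observational record at the time of writing,
the highest temperature was 40.9 degrees, observed in Kumagaya and Tajimi on August 16, 2007 (Heisei 19), and
the lowest temperature was -41 degrees, in Asahikawa, Hokkaido, on January 25, 1902 (Meiji 35). ===> this should be written -41.0 degrees
[Example 3] Ichiro's batting average in 2001 was .35 (3-wari 5-bu).
In 2006 it was .331 (3-wari 3-bu 1-rin). ===> the 2001 figure should be written .350 (3-wari 5-bu 0-rin)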
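[Supplement] The idea that a reported value stands for a whole interval of true values can be written down mechanically. The following is a minimal sketch in Python (not part of the SAS course material; the function name implied_interval is hypothetical). It assumes ordinary rounding to the last written digit, and it takes the reported value as a string so that trailing zeros such as "-41.0" are preserved.

```python
from decimal import Decimal


def implied_interval(reported):
    """Return (low, high): values with low <= x < high round to `reported`.

    Assumes ordinary rounding to the last digit that was written down.
    """
    value = Decimal(reported)
    exponent = value.as_tuple().exponent         # -1 for "67.8", 0 for "68"
    half_unit = Decimal(5).scaleb(exponent - 1)  # half a unit of the last digit
    return value - half_unit, value + half_unit


for text in ["68", "67.8", "12", "12.3", "-41", "-41.0"]:
    low, high = implied_interval(text)
    print(f"{text:>6}  <===  {low} <= x < {high}")
```

Writing -41 therefore claims far less than writing -41.0: the first interval is one degree wide, the second a tenth of a degree.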
/* Lesson 12-4 */
/* File Name = les1204.sas 07/05/07 */
data air;
infile 'usair2.prn';
input id $ y x1 x2 x3 x4 x5 x6;
/*
label y='SO2 of air in micrograms per cubic metre'
x1='Average annual temperature in F'
x2='Number of manufacturing enterprises employing 20 or more workers'
x3='Population size (1970 census); in thousands'
x4='Average annual wind speed in miles per hour'
x5='Average annual precipitation in inches'
x6='Average number of days with precipitation per year'
;
*/
proc print data=air(obs=10);
run;
proc corr data=air;
run;
proc reg data=air; :
model y=x1 x2 x3 x4 x5 x6; : full model
output out=outreg1 predicted=pred1 residual=resid1; :
run; :
proc plot data=outreg1;
plot resid1*pred1 /vref=0; :
plot resid1*x1 /vref=0; : listed out one by one
plot resid1*x2 /vref=0; :
plot resid1*x3 /vref=0; :
plot resid1*x4 /vref=0; :
plot resid1*x5 /vref=0; :
plot resid1*x6 /vref=0; :
plot resid1*y /vref=0; :
run;
proc reg data=air; :
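model y=x1-x6 / selection=stepwise; : stepwise method (variables are added and removed step by step)
output out=outreg1 predicted=pred1 residual=resid1; : (x1-x6 above is the shorthand for a run of consecutively numbered variables)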
run; :
proc print data=outreg1(obs=15);
run;
proc plot data=outreg1;
plot resid1*pred1 /vref=0; :
plot resid1*(x1 x2 x3 x4 x5 x6) /vref=0; : short form (compare with the program above)
plot resid1*(x1-x6) /vref=0; : short form (this means the same thing)
plot resid1*y /vref=0; :
run;
proc reg data=air; :
model y=x1-x6 / selection=rsquare; : all-subsets method
run; :
SAS System 1
10:28 Thursday, July 5, 2007
OBS ID Y X1 X2 X3 X4 X5 X6
1 Phoenix 10 70.3 213 582 6.0 7.05 36
2 Little_R 13 61.0 91 132 8.2 48.52 100
3 San_Fran 12 56.7 453 716 8.7 20.66 67
4 Denver 17 51.9 454 515 9.0 12.95 86
5 Hartford 56 49.1 412 158 9.0 43.37 127
6 Wilmingt 36 54.0 80 80 9.0 40.25 114
7 Washingt 29 57.3 434 757 9.3 38.89 111
8 Jacksonv 14 68.4 136 529 8.8 54.47 116
9 Miami 10 75.5 207 335 9.0 59.80 128
10 Atlanta 24 61.5 368 497 9.1 48.34 115
Correlation Analysis
7 'VAR' Variables: Y X1 X2 X3 X4 X5 X6
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Y 41 30.0488 23.4723 1232.0 8.0000 110.0
X1 41 55.7634 7.2277 2286.3 43.5000 75.5000
X2 41 463.1 563.5 18987.0 35.0000 3344.0
X3 41 608.6 579.1 24953.0 71.0000 3369.0
X4 41 9.4439 1.4286 387.2 6.0000 12.7000
X5 41 36.7690 11.7715 1507.5 7.0500 59.8000
X6 41 113.9 26.5064 4670.0 36.0000 166.0
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 41
Y X1 X2 X3 X4 X5 X6
Y 1.00000 -0.43360 0.64477 0.49378 0.09469 0.05429 0.36956
0.0 0.0046 0.0001 0.0010 0.5559 0.7360 0.0174
X1 -0.43360 1.00000 -0.19004 -0.06268 -0.34974 0.38625 -0.43024
0.0046 0.0 0.2340 0.6970 0.0250 0.0126 0.0050
X2 0.64477 -0.19004 1.00000 0.95527 0.23795 -0.03242 0.13183
0.0001 0.2340 0.0 0.0001 0.1341 0.8405 0.4113
X3 0.49378 -0.06268 0.95527 1.00000 0.21264 -0.02612 0.04208
0.0010 0.6970 0.0001 0.0 0.1819 0.8712 0.7939
X4 0.09469 -0.34974 0.23795 0.21264 1.00000 -0.01299 0.16411
0.5559 0.0250 0.1341 0.1819 0.0 0.9357 0.3052
X5 0.05429 0.38625 -0.03242 -0.02612 -0.01299 1.00000 0.49610
0.7360 0.0126 0.8405 0.8712 0.9357 0.0 0.0010
X6 0.36956 -0.43024 0.13183 0.04208 0.16411 0.49610 1.00000
0.0174 0.0050 0.4113 0.7939 0.3052 0.0010 0.0
Model: MODEL1
Dependent Variable: Y
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 6 14754.63603 2459.10601 11.480 0.0001
Error 34 7283.26641 214.21372
C Total 40 22037.90244
Root MSE 14.63604 R-square 0.6695
Dep Mean 30.04878 Adj R-sq 0.6112
C.V. 48.70761
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 111.728481 47.31810073 2.361 0.0241
X1 1 -1.267941 0.62117952 -2.041 0.0491
X2 1 0.064918 0.01574825 4.122 0.0002
X3 1 -0.039277 0.01513274 -2.595 0.0138
X4 1 -3.181366 1.81501910 -1.753 0.0887
X5 1 0.512359 0.36275507 1.412 0.1669
X6 1 -0.052050 0.16201386 -0.321 0.7500
[Plot omitted: RESID1*Y. Legend: A = 1 obs, B = 2 obs, ... Vertical axis Residual, -25 to 50; horizontal axis Y, 0 to 120.]
Stepwise Procedure for Dependent Variable Y
Step 1 Variable X2 Entered R-square = 0.41572671 C(p) = 23.10893175
DF Sum of Squares Mean Square F Prob>F
Regression 1 9161.74469120 9161.74469120 27.75 0.0001
Error 39 12876.15774782 330.15789097
Total 40 22037.90243902
Parameter Standard Type II
Variable Estimate Error Sum of Squares F Prob>F
INTERCEP 17.61057438 3.69158676 7513.50474182 22.76 0.0001
X2 0.02685872 0.00509867 9161.74469120 27.75 0.0001
Bounds on condition number: 1, 1
------------------------------------------------------------------------
Step 2 Variable X3 Entered R-square = 0.58632019 C(p) = 7.55859687
DF Sum of Squares Mean Square F Prob>F
Regression 2 12921.26717485 6460.63358743 26.93 0.0001
Error 38 9116.63526417 239.91145432
Total 40 22037.90243902
Parameter Standard Type II
Variable Estimate Error Sum of Squares F Prob>F
INTERCEP 26.32508332 3.84043919 11272.71964000 46.99 0.0001
X2 0.08243410 0.01469656 7548.02378137 31.46 0.0001
X3 -0.05660660 0.01429968 3759.52248365 15.67 0.0003
Bounds on condition number: 11.43374, 45.73494
------------------------------------------------------------------------
Step 3 Variable X6 Entered R-square = 0.61740155 C(p) = 6.36100514
DF Sum of Squares Mean Square F Prob>F
Regression 3 13606.23518823 4535.41172941 19.90 0.0001
Error 37 8431.66725079 227.88289867
Total 40 22037.90243902
Parameter Standard Type II
Variable Estimate Error Sum of Squares F Prob>F
INTERCEP 6.96584888 11.77690656 79.72552238 0.35 0.5578
X2 0.07433399 0.01506613 5547.32153619 24.34 0.0001
X3 -0.04939437 0.01454421 2628.36952166 11.53 0.0016
X6 0.16435940 0.09480151 684.96801338 3.01 0.0913
Bounds on condition number: 12.65025, 78.63322
------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the
model.
Summary of Stepwise Procedure for Dependent Variable Y
Variable Number Partial Model
Step Entered Removed In R**2 R**2 C(p) F Prob>F
1 X2 1 0.4157 0.4157 23.1089 27.7496 0.0001
2 X3 2 0.1706 0.5863 7.5586 15.6705 0.0003
3 X6 3 0.0311 0.6174 6.3610 3.0058 0.0913
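[Supplement] The stepwise summary can be read with two small calculations: the "Partial R**2" of a step is the increase in the model R-square caused by the variable entered at that step, and the F statistic of the entered variable is its Type II sum of squares divided by the error mean square of that step. The following is a minimal sketch in Python (not part of the SAS course material; names are illustrative) using the figures printed in Steps 1 to 3 above.

```python
# Model R-square after each step, copied from the stepwise output above.
entered = ["X2", "X3", "X6"]
model_r2 = [0.41572671, 0.58632019, 0.61740155]

# Partial R**2 = gain in model R-square when the variable enters.
partial_r2 = [model_r2[0]] + [b - a for a, b in zip(model_r2, model_r2[1:])]

# Step 3: F for X6 = Type II sum of squares of X6 / error mean square.
type2_ss_x6 = 684.96801338
ms_error_step3 = 227.88289867
f_x6 = type2_ss_x6 / ms_error_step3

for name, part, total in zip(entered, partial_r2, model_r2):
    print(f"{name}: partial R**2 = {part:.4f}, model R**2 = {total:.4f}")
print(f"F for X6 at step 3 = {f_x6:.4f}")
```

X6 enters with Prob>F = 0.0913, which passes the 0.1500 level quoted in the output even though it would not pass a 5% criterion.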
OBS ID Y X1 X2 X3 X4 X5 X6 PRED1 RESID1
1 Phoenix 10 70.3 213 582 6.0 7.05 36 -0.032 10.0316
2 Little_R 13 61.0 91 132 8.2 48.52 100 23.646 -10.6461
3 San_Fran 12 56.7 453 716 8.7 20.66 67 16.285 -4.2849
4 Denver 17 51.9 454 515 9.0 12.95 86 29.410 -12.4103
5 Hartford 56 49.1 412 158 9.0 43.37 127 50.661 5.3392
6 Wilmingt 36 54.0 80 80 9.0 40.25 114 27.698 8.3020
7 Washingt 29 57.3 434 757 9.3 38.89 111 20.079 8.9208
8 Jacksonv 14 68.4 136 529 8.8 54.47 116 10.011 3.9887
9 Miami 10 75.5 207 335 9.0 59.80 128 26.844 -16.8439
10 Atlanta 24 61.5 368 497 9.1 48.34 115 28.673 -4.6731
11 Chicago 110 50.6 3344 3369 10.4 34.44 122 109.181 0.8191
12 Indianap 28 52.3 361 746 9.7 38.74 121 16.840 11.1603
13 Des_Moin 17 49.0 104 201 11.2 30.85 103 21.697 -4.6973
14 Wichita 8 56.6 125 277 12.7 30.58 82 16.053 -8.0528
15 Louisvil 30 55.6 291 593 8.3 43.11 123 19.522 10.4776
[Plot omitted: RESID1*Y. Legend: A = 1 obs, B = 2 obs, ... Vertical axis Residual, -50 to 50; horizontal axis Y, 0 to 120.]
N = 41 Regression Models for Dependent Variable: Y
Number in R-square Variables in Model
Model
1 0.41572671 X2
1 0.24381828 X3
1 0.18800913 X1
1 0.13657727 X6
1 0.00896628 X4
1 0.00294788 X5
--------------------------
2 0.58632019 X2 X3
2 0.51611499 X1 X2
2 0.49813569 X2 X6
2 0.42138706 X2 X5
2 0.41938296 X2 X4
(rows omitted)
2 0.01204980 X4 X5
-----------------------------
3 0.61740155 X2 X3 X6
3 0.61254683 X1 X2 X3
3 0.59304760 X2 X3 X5
3 0.59298732 X2 X3 X4
3 0.56222293 X1 X2 X5
3 0.54523587 X1 X2 X6
3 0.54521259 X1 X2 X4
3 0.50833841 X2 X4 X6
(rows omitted)
3 0.15899893 X4 X5 X6
--------------------------------
4 0.63964257 X1 X2 X3 X5
4 0.63287070 X1 X2 X3 X4
4 0.62909408 X1 X2 X3 X6
4 0.62847667 X2 X3 X4 X6
4 0.61759495 X2 X3 X5 X6
4 0.60282531 X1 X2 X4 X5
4 0.59965327 X2 X3 X4 X5
4 0.57466704 X1 X2 X4 X6
(rows omitted)
4 0.25499437 X1 X4 X5 X6
-----------------------------------
5 0.66850854 X1 X2 X3 X4 X5
5 0.65012088 X1 X2 X3 X4 X6
5 0.63964824 X1 X2 X3 X5 X6
5 0.62901313 X2 X3 X4 X5 X6
5 0.60403117 X1 X2 X4 X5 X6
5 0.50433666 X1 X3 X4 X5 X6
--------------------------------------
6 0.66951181 X1 X2 X3 X4 X5 X6
-----------------------------------------
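[Supplement] The all-subsets method fits every non-empty subset of the six explanatory variables, so it is worth knowing how many models that is and why the plain R-square cannot be used to choose among models of different size. The following is a minimal sketch in Python (not part of the SAS course material; names are illustrative). It counts the subsets and then applies the degrees-of-freedom adjustment, with n = 41, to the best R-square of each size taken from the table above. Choosing by adjusted R-square is only one possible criterion and is shown here as an assumption, not as the method prescribed in the lecture.

```python
from itertools import combinations

variables = ["X1", "X2", "X3", "X4", "X5", "X6"]

# Every non-empty subset of the six explanatory variables.
subsets = [c for k in range(1, len(variables) + 1)
           for c in combinations(variables, k)]
n_models = len(subsets)

# Best R-square for each model size, copied from the table above.
best_r2 = {1: 0.41572671, 2: 0.58632019, 3: 0.61740155,
           4: 0.63964257, 5: 0.66850854, 6: 0.66951181}
n = 41  # number of observations

# Adjusted R-square penalizes each additional explanatory variable.
adj_r2 = {k: 1 - (1 - r2) * (n - 1) / (n - k - 1) for k, r2 in best_r2.items()}
best_size = max(adj_r2, key=adj_r2.get)

print(f"number of models fitted: {n_models}")
for k in sorted(adj_r2):
    print(f"{k} variables: R-square = {best_r2[k]:.4f}, adjusted = {adj_r2[k]:.4f}")
print(f"size with the largest adjusted R-square: {best_size}")
```

R-square never decreases when a variable is added, so the six-variable model always has the largest plain R-square; the adjusted value for that model reproduces the Adj R-sq printed for the full model earlier.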