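● Introduction
● Contents
1. Simple regression analysis: the relationship between continuous variables, used for prediction and similar purposes
2. How do we rerun the analysis with the heaviest students excluded?
3. Multiple regression analysis: explaining one variable (the response variable) with two or more variables (the explanatory variables)
4. Analysis restricted to a specific group
5. [Key points] Cautions when carrying out an analysis
6. Misuse?!
7. The four scales of measurement and regression analysis
8. Mind the significant digits: how many digits are actually meaningful?
9. Variable selection in regression analysis and the all-possible-subsets method
● Regression analysis: predicting a continuous variable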
/* Lesson 11-2 */
/* File Name = les1102.sas 06/28/09 */
data gakusei;
infile 'all07ae.prn'
firstobs=2;
input sex $ shintyou taijyuu kyoui
jitaku $ kodukai carryer $ tsuuwa;
if sex^='M' & sex^='F' then delete;
proc print data=gakusei(obs=10);
run;
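proc reg data=gakusei; : regression analysis
model taijyuu=shintyou; : specify the variables
output out=outreg1 predicted=pred1 residual=resid1; : save the results
run; :
:
proc print data=outreg1(obs=15); : display the saved results
run; :
:
proc plot data=outreg1; : draw scatter plots
plot taijyuu*shintyou/vaxis=20 to 100 by 20; : weight vs. height (vertical axis specified)
plot pred1*taijyuu; : predicted vs. observed values
plot resid1*pred1 /vref=0; : residuals vs. predicted values (residual analysis; horizontal reference line specified)
plot resid1*shintyou/vref=0; : residuals vs. the explanatory variable (residual analysis)
plot resid1*taijyuu /vref=0; : residuals vs. the response variable (residual analysis)
run; :
:
proc univariate data=outreg1 plot normal; : check the residuals with a normal probability plot
var resid1; :
run; :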
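[Note] The text after each colon above is explanatory and is not part of the SAS program.
[Supplement] Strictly speaking, it is better to add the following line directly below the proc plot statement:
where shintyou^=. and taijyuu^=.;
This statement excludes observations that contain missing values from the analysis.
It only suppresses the "observations had missing values" note; the resulting plots are the same, because missing values cannot be plotted anyway.
Try running the program both with and without this line.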
The SAS System 1
08:52 Thursday, June 28, 2009
OBS  SEX  SHINTYOU  TAIJYUU  KYOUI  JITAKU  KODUKAI  CARRYER   TSUUWA
  1  F       145.0     38.0      .  J         10000                 .
  2  F       146.7     41.0     85  J         10000  Vodafone    6000
  3  F       148.0     42.0      .  J         50000                 .
  4  F       148.0     43.0     80  J         50000  DoCoMo      4000
  5  F       148.9        .      .  J         60000                 .
  6  F       149.0     45.0      .  G         60000                 .
  7  F       150.0     46.0     86            40000                 .
  8  F       151.0     45.0      .  J         20000  docomo      5000
  9  F       151.0     50.0      .  G         60000  J-PHONE        .
 10  F       151.7     41.5     80  J         35000                 .
The SAS System 2
08:52 Thursday, June 28, 2009
Model: MODEL1
Dependent Variable: TAIJYUU
                    Analysis of Variance
                    Sum of          Mean
Source     DF      Squares        Square    F Value    Prob>F
Model       1  14055.20146   14055.20146    318.560    0.0001
Error     323  14251.10026      44.12105
C Total   324  28306.30172
Root MSE    6.64237    R-square   0.4965
Dep Mean   58.78092    Adj R-sq   0.4950
C.V.       11.30021
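The summary statistics in this table follow from its sums of squares and degrees of freedom. The sketch below is written in Python rather than SAS, purely as a way to check the arithmetic; the variable names are illustrative, and the formulas are the usual textbook definitions, which are assumed here to be what PROC REG reports.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_model, df_model = 14055.20146, 1
ss_error, df_error = 14251.10026, 323
ss_total, df_total = 28306.30172, 324

ms_model = ss_model / df_model            # mean square for the model
ms_error = ss_error / df_error            # mean square error
f_value = ms_model / ms_error             # F statistic
r_square = ss_model / ss_total            # share of variation explained
adj_r_square = 1 - (1 - r_square) * df_total / df_error
root_mse = math.sqrt(ms_error)            # residual standard deviation

print(f"F Value  = {f_value:.3f}")
print(f"R-square = {r_square:.4f}")
print(f"Adj R-sq = {adj_r_square:.4f}")
print(f"Root MSE = {root_mse:.5f}")
```

An R-square of about 0.50 means that height alone accounts for roughly half of the variation in weight in these data.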
The SAS System 3
08:52 Thursday, June 28, 2009
                  Parameter Estimates
               Parameter      Standard    T for H0:
Variable  DF    Estimate         Error  Parameter=0  Prob > |T|
INTERCEP   1  -79.351524    7.74803757      -10.241      0.0001
SHINTYOU   1    0.818831    0.04587737       17.848      0.0001
The SAS System 4
08:52 Thursday, June 28, 2009
OBS  SEX  SHINTYOU  TAIJYUU  KYOUI  JITAKU  KODUKAI  CARRYER   TSUUWA    PRED1    RESID1
  1  F       145.0     38.0      .  J         10000                 .  39.3789   -1.3789
  2  F       146.7     41.0     85  J         10000  Vodafone    6000  40.7709    0.2291
  3  F       148.0     42.0      .  J         50000                 .  41.8354    0.1646
  4  F       148.0     43.0     80  J         50000  DoCoMo      4000  41.8354    1.1646
  5  F       148.9        .      .  J         60000                 .  42.5724         .
  6  F       149.0     45.0      .  G         60000                 .  42.6542    2.3458
  7  F       150.0     46.0     86            40000                 .  43.4731    2.5269
  8  F       151.0     45.0      .  J         20000  docomo      5000  44.2919    0.7081
  9  F       151.0     50.0      .  G         60000  J-PHONE        .  44.2919    5.7081
 10  F       151.7     41.5     80  J         35000                 .  44.8651   -3.3651
 11  F       152.0     35.0     77  J         60000  DoCoMo      2000  45.1107  -10.1107
 12  F       152.0     43.0      .  J         20000  au          3500  45.1107   -2.1107
 13  F       152.0     44.0      .            45000  DoCoMo      4000  45.1107   -1.1107
 14  F       153.0     41.0      .  J        125000  No             .  45.9296   -4.9296
 15  F       153.0     42.0      .  G             0  Vodafone    1000  45.9296   -3.9296
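The PRED1 and RESID1 columns can be reproduced by hand from the parameter estimates. The following Python sketch (not part of the SAS program; the function name `predict_weight` is made up for illustration) applies the fitted line to a few of the listed observations. Because the printed estimates are rounded, the last digit may differ slightly from the listing.

```python
# Parameter estimates copied from the regression output (rounded as printed).
INTERCEPT = -79.351524
SLOPE = 0.818831

def predict_weight(height_cm):
    """Predicted weight (kg) from height (cm) using the fitted line."""
    return INTERCEPT + SLOPE * height_cm

# (obs, height, observed weight) taken from the listing above.
rows = [(1, 145.0, 38.0), (2, 146.7, 41.0), (11, 152.0, 35.0)]

for obs, height, weight in rows:
    pred = predict_weight(height)
    resid = weight - pred          # residual = observed - predicted
    print(f"obs {obs:2d}: pred = {pred:.4f}, resid = {resid:.4f}")
```

Observation 5 has a predicted value but no residual in the listing: the height is known, so the line can be evaluated, but the weight is missing, so there is nothing to subtract from.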
The SAS System 6
08:52 Thursday, June 28, 2009
[Plot of TAIJYUU*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 20 to 100; horizontal axis: SHINTYOU, 140 to 190. NOTE: 45 observations had missing values.]
The SAS System 7
08:52 Thursday, June 28, 2009
[Plot of PRED1*TAIJYUU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: PRED1, 40 to 80; horizontal axis: TAIJYUU, 20 to 100. NOTE: 45 observations had missing values.]
The SAS System 8
08:52 Thursday, June 28, 2009
[Plot of RESID1*PRED1. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: Predicted Value of TAIJYUU, 30 to 80. NOTE: 45 observations had missing values.]
The SAS System 9
08:52 Thursday, June 28, 2009
[Plot of RESID1*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: SHINTYOU, 140 to 190. NOTE: 45 observations had missing values.]
The SAS System 10
08:52 Thursday, June 28, 2009
[Plot of RESID1*TAIJYUU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: TAIJYUU, 20 to 100. NOTE: 45 observations had missing values.]
The SAS System 11
08:52 Thursday, June 28, 2009
Univariate Procedure
Variable=RESID1 Residual
Moments
N 325 Sum Wgts 325
Mean 0 Sum 0
Std Dev 6.63211 Variance 43.98488
Skewness 1.42133 Kurtosis 4.000837
USS 14251.1 CSS 14251.1
CV . Std Mean 0.367883
T:Mean=0 0 Pr>|T| 1.0000
Num ^= 0 325 Num > 0 140
M(Sign) -22.5 Pr>=|M| 0.0145
Sgn Rank -3248.5 Pr>=|S| 0.0552
W:Normal 0.916608 Pr<W 0.0001
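Several entries in this Moments table are tied to the regression output by simple arithmetic. The Python sketch below (illustrative only, with made-up variable names) recomputes them from N, CSS and the count of positive residuals. Note that Std Dev divides by N-1, whereas Root MSE in the regression output divides by the error degrees of freedom (N-2 here), which is why the two differ slightly.

```python
import math

# Values copied from the Moments table above.
n = 325               # number of residuals
css = 14251.1         # corrected sum of squares (equals the error SS)
num_positive = 140    # "Num > 0"; every residual is nonzero (Num ^= 0 is 325)

std_dev = math.sqrt(css / (n - 1))        # "Std Dev"
std_mean = std_dev / math.sqrt(n)         # "Std Mean" (standard error of the mean)
root_mse = math.sqrt(css / (n - 2))       # "Root MSE" from PROC REG
sign_m = (num_positive - (n - num_positive)) / 2   # "M(Sign)"

print(f"Std Dev  = {std_dev:.5f}")
print(f"Std Mean = {std_mean:.6f}")
print(f"Root MSE = {root_mse:.5f}")
print(f"M(Sign)  = {sign_m}")
```

The negative sign statistic reflects that more residuals are below zero than above it, which is consistent with the positive skewness reported in the same table.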
The SAS System 12
08:52 Thursday, June 28, 2009
Univariate Procedure
Variable=RESID1 Residual
Quantiles(Def=5)
100% Max 33.59967 99% 22.24447
75% Q3 2.693822 95% 11.59967
50% Med -1.02372 90% 8.244467
25% Q1 -4.03799 10% -7.28364
0% Min -13.9438 5% -8.57436
1% -10.9438
Range 47.54351
Q3-Q1 6.731815
Mode -2.30618
The SAS System 15
08:52 Thursday, June 28, 2009
Univariate Procedure
Variable=RESID1 Residual
Histogram # Boxplot
35+* 1 *
.** 5 0
.**** 15 0
.****************************** 119 +--+--+
.********************************************* 178 *-----*
-15+** 7 |
----+----+----+----+----+----+----+----+----+
* may represent up to 4 counts
The SAS System 16
08:52 Thursday, June 28, 2009
Univariate Procedure
Variable=RESID1 Residual
[Normal Probability Plot of RESID1. Vertical axis: -15 to 35; horizontal axis (normal quantiles): -2 to +2. Data points are shown as *, the reference line as +.]
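[Note] The error is measured perpendicular to the axis of the explanatory variable, that is, in the direction of the response variable. This is because the model is built on the assumption that the error enters when the measurement is taken.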
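The note above can be made concrete with a small least-squares fit. This Python sketch is an illustration under stated assumptions, not the SAS computation: `fit_line` is a hypothetical helper, and the data are only the ten complete cases among observations 1 to 11 of the listing, so the coefficients differ from the full-data estimates. The residuals are the vertical differences (observed weight minus fitted weight), and least squares makes them sum to zero.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Heights (cm) and weights (kg) of observations 1-4 and 6-11 in the listing.
heights = [145.0, 146.7, 148.0, 148.0, 149.0, 150.0, 151.0, 151.0, 151.7, 152.0]
weights = [38.0, 41.0, 42.0, 43.0, 45.0, 46.0, 45.0, 50.0, 41.5, 35.0]

intercept, slope = fit_line(heights, weights)

# Residuals are measured in the weight (vertical) direction only.
residuals = [w - (intercept + slope * h) for h, w in zip(heights, weights)]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"sum of residuals = {sum(residuals):.2e}")
```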
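[Note] This does not mean that anyone who disturbs normality may simply be excluded. In this case, going back to the original data showed that the students in question were stockily built members of athletic clubs, so they were judged to differ in nature from ordinary university students and were excluded. Unless the grounds for an exclusion are stated clearly, the analysis may be criticized as arbitrary.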
/* Lesson 11-3 */
/* File Name = les1103.sas 06/28/09 */
data gakusei;
infile 'all07ae.prn'
firstobs=2;
input sex $ shintyou taijyuu kyoui
jitaku $ kodukai carryer $ tsuuwa;
if sex^='M' & sex^='F' then delete;
if shintyou=. | taijyuu=. then delete; : exclude observations with missing values
proc print data=gakusei(obs=10);
run;
proc corr data=gakusei;
where taijyuu<85; : restrict the data being analyzed
run;
proc reg data=gakusei;
model taijyuu=shintyou;
where taijyuu<85; : restrict the data being analyzed
output out=outreg1 predicted=pred1 residual=resid1;
run;
proc print data=outreg1(obs=15);
run;
proc plot data=outreg1;
where taijyuu<85; : restrict the data being analyzed
plot taijyuu*shintyou;
plot taijyuu*pred1;
plot resid1*(pred1 shintyou taijyuu)/vref=0; : several plots can be requested at once
run;
run;
proc univariate data=outreg1 plot normal;
var resid1;
run;
The SAS System 2
08:52 Thursday, June 28, 2009
Correlation Analysis
5 'VAR' Variables: SHINTYOU TAIJYUU KYOUI KODUKAI TSUUWA
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
SHINTYOU 321 168.6 8.0251 54118.5 145.0 186.0
TAIJYUU 321 58.3498 8.5473 18730.3 35.0000 84.0000
KYOUI 111 85.7477 7.9561 9518.0 46.0000 110.0
KODUKAI 303 49107.3 51750.8 14879500 0 350000
TSUUWA 132 6742.4 4469.7 890002 0 30000.0
The SAS System 3
08:52 Thursday, June 28, 2009
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0
/ Number of Observations
SHINTYOU TAIJYUU KYOUI KODUKAI TSUUWA
SHINTYOU 1.00000 0.72880 0.28729 0.06533 -0.05960
0.0 0.0001 0.0022 0.2569 0.4972
321 321 111 303 132
TAIJYUU 0.72880 1.00000 0.38406 0.06408 -0.04543
0.0001 0.0 0.0001 0.2662 0.6050
321 321 111 303 132
KYOUI 0.28729 0.38406 1.00000 -0.28125 -0.17722
0.0022 0.0001 0.0 0.0033 0.2940
111 111 111 107 37
KODUKAI 0.06533 0.06408 -0.28125 1.00000 0.26949
0.2569 0.2662 0.0033 0.0 0.0021
303 303 107 303 128
TSUUWA -0.05960 -0.04543 -0.17722 0.26949 1.00000
0.4972 0.6050 0.2940 0.0021 0.0
132 132 37 128 132
The SAS System 6
08:52 Thursday, June 28, 2009
Model: MODEL1
Dependent Variable: TAIJYUU
                    Analysis of Variance
                    Sum of          Mean
Source     DF      Squares        Square    F Value    Prob>F
Model       1  12417.15747   12417.15747    361.385    0.0001
Error     319  10960.80502      34.35989
C Total   320  23377.96249
Root MSE    5.86173    R-square   0.5311
Dep Mean   58.34984    Adj R-sq   0.5297
C.V.       10.04584
The SAS System 7
08:52 Thursday, June 28, 2009
                  Parameter Estimates
               Parameter      Standard    T for H0:
Variable  DF    Estimate         Error  Parameter=0  Prob > |T|
INTERCEP   1  -72.515375    6.89174111      -10.522      0.0001
SHINTYOU   1    0.776218    0.04083178       19.010      0.0001
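With a single explanatory variable, the regression output can be recovered from the PROC CORR output alone: R-square is the squared correlation coefficient, and the slope is the correlation times the ratio of the standard deviations. The Python sketch below checks this with the printed values; it is an arithmetic illustration only, and small discrepancies come from rounding in the listing.

```python
# Values copied from the PROC CORR output for the data with taijyuu < 85.
r = 0.72880                      # correlation of SHINTYOU and TAIJYUU
n = 321
sd_height, sd_weight = 8.0251, 8.5473
mean_height = 54118.5 / n        # Sum / N, more precise than the printed mean
mean_weight = 58.3498

r_square = r ** 2                               # compare with R-square 0.5311
slope = r * sd_weight / sd_height               # compare with 0.776218
intercept = mean_weight - slope * mean_height   # compare with -72.515375

print(f"R-square  = {r_square:.4f}")
print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.2f}")
```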
The SAS System 10
08:52 Thursday, June 28, 2009
[Plot of TAIJYUU*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 25 to 100; horizontal axis: SHINTYOU, 140 to 190.]
The SAS System 11
08:52 Thursday, June 28, 2009
[Plot of TAIJYUU*PRED1. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 25 to 100; horizontal axis: Predicted Value of TAIJYUU, 40 to 80.]
The SAS System 12
08:52 Thursday, June 28, 2009
[Plot of RESID1*PRED1. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -20 to 40, with a reference line at 0; horizontal axis: Predicted Value of TAIJYUU, 40 to 80.]
The SAS System 13
08:52 Thursday, June 28, 2009
[Plot of RESID1*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -20 to 40, with a reference line at 0; horizontal axis: SHINTYOU, 140 to 190.]
The SAS System 14
08:52 Thursday, June 28, 2009
[Plot of RESID1*TAIJYUU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -20 to 40, with a reference line at 0; horizontal axis: TAIJYUU, 30 to 90.]
The SAS System 15
08:52 Thursday, June 28, 2009
Univariate Procedure
Variable=RESID1 Residual
Moments
N 321 Sum Wgts 321
Mean 0 Sum 0
Std Dev 5.852565 Variance 34.25252
Skewness 0.822649 Kurtosis 1.167031
USS 10960.81 CSS 10960.81
CV . Std Mean 0.326658
T:Mean=0 0 Pr>|T| 1.0000
Num ^= 0 321 Num > 0 142
M(Sign) -18.5 Pr>=|M| 0.0443
Sgn Rank -2359.5 Pr>=|S| 0.1565
W:Normal 0.954643 Pr<W 0.0001
The SAS System 18
08:52 Thursday, June 28, 2009
Univariate Procedure
Variable=RESID1 Residual
Histogram # Boxplot
22.5+* 2 0
.** 4 0
.***** 13 0
.************* 38 |
.***************************** 85 +--+--+
.******************************************* 127 *-----*
.*************** 45 |
-12.5+*** 7 |
----+----+----+----+----+----+----+----+---
* may represent up to 3 counts
The SAS System 19
08:52 Thursday, June 28, 2009
Univariate Procedure
Variable=RESID1 Residual
[Normal Probability Plot of RESID1. Vertical axis: -12.5 to 22.5; horizontal axis (normal quantiles): -2 to +2. Data points are shown as *, the reference line as +.]
/* Lesson 12-1 */
/* File Name = les1201.sas 07/05/09 */
data gakusei;
infile 'all07ae.prn'
firstobs=2;
input sex $ shintyou taijyuu kyoui
jitaku $ kodukai carryer $ tsuuwa;
if sex^='M' & sex^='F' then delete;
proc print data=gakusei(obs=10);
run;
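proc reg data=gakusei; : regression analysis
model taijyuu=shintyou kyoui; : specify several explanatory variables
output out=outreg1 predicted=pred1 residual=resid1; : save the results
run; :
proc print data=outreg1(obs=15);
run;
:
proc plot data=outreg1; : draw scatter plots
where shintyou^=. and taijyuu^=. and kyoui^=.; : only the observations used in the analysis
plot taijyuu*shintyou; :
plot taijyuu*kyoui; :
plot taijyuu*pred1; : observed vs. predicted values
plot resid1*pred1 /vref=0; : residuals vs. predicted values (residual analysis)
plot resid1*shintyou/vref=0; : residuals vs. an explanatory variable (residual analysis)
plot resid1*kyoui /vref=0; : residuals vs. an explanatory variable (residual analysis)
plot resid1*taijyuu /vref=0; : residuals vs. the response variable (residual analysis)
run; :
:
proc univariate data=outreg1 plot normal; : check the residuals with a normal probability plot
var resid1; :
run; :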
The SAS System 2
16:41 Wednesday, July 4, 2009
Model: MODEL1
Dependent Variable: TAIJYUU
                    Analysis of Variance
                    Sum of          Mean
Source     DF      Squares        Square    F Value    Prob>F
Model       2   8070.70705    4035.35353     85.102    0.0001
Error     111   5263.40733      47.41808
C Total   113  13334.11439
Root MSE    6.88608    R-square   0.6053
Dep Mean   58.79298    Adj R-sq   0.5982
C.V.       11.71242
The SAS System 3
16:41 Wednesday, July 4, 2009
                  Parameter Estimates
                Parameter      Standard    T for H0:
Variable  DF     Estimate         Error  Parameter=0  Prob > |T|
INTERCEP   1  -106.300229   12.75196946       -8.336      0.0001
SHINTYOU   1     0.806547    0.07854137       10.269      0.0001
KYOUI      1     0.349475    0.08192373        4.266      0.0001
The SAS System 4
16:41 Wednesday, July 4, 2009
OBS  SEX  SHINTYOU  TAIJYUU  KYOUI  JITAKU  KODUKAI  CARRYER   TSUUWA    PRED1    RESID1
  1  F       145.0     38.0      .  J         10000                 .        .         .
  2  F       146.7     41.0     85  J         10000  Vodafone    6000  41.7256  -0.72559
  3  F       148.0     42.0      .  J         50000                 .        .         .
  4  F       148.0     43.0     80  J         50000  DoCoMo      4000  41.0267   1.97328
  5  F       148.9        .      .  J         60000                 .        .         .
  6  F       149.0     45.0      .  G         60000                 .        .         .
  7  F       150.0     46.0     86            40000                 .  44.7367   1.26333
  8  F       151.0     45.0      .  J         20000  docomo      5000        .         .
  9  F       151.0     50.0      .  G         60000  J-PHONE        .        .         .
 10  F       151.7     41.5     80  J         35000                 .  44.0109  -2.51095
 11  F       152.0     35.0     77  J         60000  DoCoMo      2000  43.2045  -8.20449
 12  F       152.0     43.0      .  J         20000  au          3500        .         .
 13  F       152.0     44.0      .            45000  DoCoMo      4000        .         .
 14  F       153.0     41.0      .  J        125000  No             .        .         .
 15  F       153.0     42.0      .  G             0  Vodafone    1000        .         .
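In this listing a prediction exists only where both height and chest measurement are present. The Python sketch below (illustrative; `predict_weight` and the use of `None` for a missing value are assumptions of this example, not SAS behavior) evaluates the fitted two-variable equation with the printed estimates and returns no prediction when an explanatory variable is missing.

```python
# Parameter estimates copied from the multiple regression output.
INTERCEPT = -106.300229
B_HEIGHT = 0.806547
B_CHEST = 0.349475

def predict_weight(height, chest):
    """Predicted weight; None when any explanatory variable is missing."""
    if height is None or chest is None:
        return None
    return INTERCEPT + B_HEIGHT * height + B_CHEST * chest

# (obs, height, weight, chest) from the listing; None marks a missing value.
rows = [(1, 145.0, 38.0, None), (2, 146.7, 41.0, 85.0),
        (4, 148.0, 43.0, 80.0), (11, 152.0, 35.0, 77.0)]

for obs, height, weight, chest in rows:
    pred = predict_weight(height, chest)
    if pred is None:
        print(f"obs {obs:2d}: no prediction (missing explanatory variable)")
    else:
        print(f"obs {obs:2d}: pred = {pred:.4f}, resid = {weight - pred:.5f}")
```

This is also why the multiple regression uses far fewer observations than the simple regression (C Total DF of 113 against 324): every observation without a chest measurement drops out.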
The SAS System 6
16:41 Wednesday, July 4, 2009
[Plot of TAIJYUU*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 0 to 100; horizontal axis: SHINTYOU, 140 to 190.]
The SAS System 7
16:41 Wednesday, July 4, 2009
[Plot of TAIJYUU*KYOUI. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 0 to 100; horizontal axis: KYOUI, 40 to 120.]
The SAS System 8
16:41 Wednesday, July 4, 2009
[Plot of TAIJYUU*PRED1. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 0 to 100; horizontal axis: Predicted Value of TAIJYUU, 40 to 80.]
The SAS System 9
16:41 Wednesday, July 4, 2009
[Plot of RESID1*PRED1. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: Predicted Value of TAIJYUU, 40 to 80.]
The SAS System 10
16:41 Wednesday, July 4, 2009
[Plot of RESID1*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: SHINTYOU, 140 to 190.]
The SAS System 11
16:41 Wednesday, July 4, 2009
[Plot of RESID1*KYOUI. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: KYOUI, 40 to 120.]
The SAS System 12
16:41 Wednesday, July 4, 2009
[Plot of RESID1*TAIJYUU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: TAIJYUU, 20 to 100.]
The SAS System 13
16:41 Wednesday, July 4, 2009
Univariate Procedure
Variable=RESID1 Residual
Moments
N 114 Sum Wgts 114
Mean 0 Sum 0
Std Dev 6.824868 Variance 46.57883
Skewness 2.026813 Kurtosis 7.211418
USS 5263.407 CSS 5263.407
CV . Std Mean 0.639207
T:Mean=0 0 Pr>|T| 1.0000
Num ^= 0 114 Num > 0 43
M(Sign) -14 Pr>=|M| 0.0111
Sgn Rank -517.5 Pr>=|S| 0.1442
W:Normal 0.865365 Pr<W 0.0001
The SAS System 17
16:41 Wednesday, July 4, 2009
Univariate Procedure
Variable=RESID1 Residual
Histogram # Boxplot
35+* 1 *
.* 2 *
.** 4 0
.****************** 36 +--+--+
.*********************************** 69 *-----*
-15+* 2 0
----+----+----+----+----+----+----+
* may represent up to 2 counts
The SAS System 18
16:41 Wednesday, July 4, 2009
Univariate Procedure
Variable=RESID1 Residual
[Normal Probability Plot of RESID1. Vertical axis: -15 to 35; horizontal axis (normal quantiles): -2 to +2. Data points are shown as *, the reference line as +.]
/* Lesson 12-2 */
/* File Name = les1202.sas 07/05/09 */
data gakusei;
infile 'all07ae.prn'
firstobs=2;
input sex $ shintyou taijyuu kyoui
jitaku $ kodukai carryer $ tsuuwa;
if sex^='M' & sex^='F' then delete; : exclude observations of unknown sex
if shintyou=. | taijyuu=. | kyoui=. then delete; : exclude observations with missing values
proc print data=gakusei(obs=10);
run;
proc corr data=gakusei; : correlation coefficients
where sex='M'; : men only
run; :
:
proc reg data=gakusei; : regression analysis
model taijyuu=shintyou kyoui; :
where sex='M'; : men only
output out=outreg1 predicted=pred1 residual=resid1; :
run; :
proc print data=outreg1(obs=15);
run;
proc plot data=outreg1;
where sex='M'; : only the data being analyzed
plot taijyuu*shintyou;
plot taijyuu*kyoui;
plot taijyuu*pred1;
plot resid1*(pred1 shintyou kyoui taijyuu)/vref=0; : all four residual plots in one statement
/*
plot resid1*pred1 /vref=0;
plot resid1*shintyou/vref=0;
plot resid1*kyoui /vref=0;
plot resid1*taijyuu /vref=0;
*/
run;
proc univariate data=outreg1 plot normal;
var resid1;
run;
The SAS System 2
16:41 Wednesday, July 4, 2009
Correlation Analysis
5 'VAR' Variables: SHINTYOU TAIJYUU KYOUI KODUKAI TSUUWA
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
SHINTYOU 71 172.5 5.9351 12250.1 156.0 185.0
TAIJYUU 71 64.7282 9.0651 4595.7 46.0000 100.0
KYOUI 71 88.0986 9.6853 6255.0 46.0000 112.0
KODUKAI 67 56358.2 66471.6 3776000 0 350000
TSUUWA 14 6632.1 4247.9 92850.0 350.0 15000.0
The SAS System 3
16:41 Wednesday, July 4, 2009
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0
/ Number of Observations
SHINTYOU TAIJYUU KYOUI KODUKAI TSUUWA
SHINTYOU 1.00000 0.39968 0.15872 0.09516 0.11552
0.0 0.0006 0.1862 0.4437 0.6941
71 71 71 67 14
TAIJYUU 0.39968 1.00000 0.40227 0.11042 0.14591
0.0006 0.0 0.0005 0.3737 0.6187
71 71 71 67 14
KYOUI 0.15872 0.40227 1.00000 -0.37945 -0.38661
0.1862 0.0005 0.0 0.0015 0.1721
71 71 71 67 14
KODUKAI 0.09516 0.11042 -0.37945 1.00000 0.53783
0.4437 0.3737 0.0015 0.0 0.0473
67 67 67 67 14
TSUUWA 0.11552 0.14591 -0.38661 0.53783 1.00000
0.6941 0.6187 0.1721 0.0473 0.0
14 14 14 14 14
The SAS System 6
16:41 Wednesday, July 4, 2009
Model: MODEL1
Dependent Variable: TAIJYUU
                    Analysis of Variance
                    Sum of          Mean
Source     DF      Squares        Square    F Value    Prob>F
Model       2   1596.38065     798.19033     13.060    0.0001
Error      68   4155.98301      61.11740
C Total    70   5752.36366
Root MSE    7.81776    R-square   0.2775
Dep Mean   64.72817    Adj R-sq   0.2563
C.V.       12.07784
The SAS System 7
16:41 Wednesday, July 4, 2009
                  Parameter Estimates
               Parameter      Standard    T for H0:
Variable  DF    Estimate         Error  Parameter=0  Prob > |T|
INTERCEP   1  -54.721337   27.50850038       -1.989      0.0507
SHINTYOU   1    0.526195    0.15945819        3.300      0.0015
KYOUI      1    0.325335    0.09771516        3.329      0.0014
The SAS System 10
16:41 Wednesday, July 4, 2009
[Plot of TAIJYUU*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 25 to 100; horizontal axis: SHINTYOU, 155 to 185.]
The SAS System 11
16:41 Wednesday, July 4, 2009
[Plot of TAIJYUU*KYOUI. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 25 to 100; horizontal axis: KYOUI, 40 to 120.]
The SAS System 12
16:41 Wednesday, July 4, 2009
[Plot of TAIJYUU*PRED1. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: TAIJYUU, 25 to 100; horizontal axis: Predicted Value of TAIJYUU, 50 to 80.]
The SAS System 13
16:41 Wednesday, July 4, 2009
[Plot of RESID1*PRED1. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: Predicted Value of TAIJYUU, 50 to 80.]
The SAS System 14
16:41 Wednesday, July 4, 2009
[Plot of RESID1*SHINTYOU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: SHINTYOU, 155 to 185.]
The SAS System 15
16:41 Wednesday, July 4, 2009
[Plot of RESID1*KYOUI. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: KYOUI, 40 to 120.]
The SAS System 16
16:41 Wednesday, July 4, 2009
[Plot of RESID1*TAIJYUU. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: TAIJYUU, 40 to 100.]
The SAS System 17
16:41 Wednesday, July 4, 2009
Univariate Procedure
Variable=RESID1 Residual
Moments
N 71 Sum Wgts 71
Mean 0 Sum 0
Std Dev 7.70527 Variance 59.37119
Skewness 2.013612 Kurtosis 5.673909
USS 4155.983 CSS 4155.983
CV . Std Mean 0.914447
T:Mean=0 0 Pr>|T| 1.0000
Num ^= 0 71 Num > 0 27
M(Sign) -8.5 Pr>=|M| 0.0568
Sgn Rank -260 Pr>=|S| 0.1374
W:Normal 0.837353 Pr<W 0.0001
The SAS System 20
16:41 Wednesday, July 4, 2009
Univariate Procedure
Variable=RESID1 Residual
Stem Leaf # Boxplot
3 3 1 *
2 5 1 *
2 4 1 0
1 6 1 0
1 0 1 |
0 5566799 7 |
0 111222222334444 15 +--+--+
-0 4444444333333322221111110 25 *-----*
-0 99877766655555555 17 +-----+
-1 21 2 |
----+----+----+----+----+
Multiply Stem.Leaf by 10**+1
The SAS System 21
16:41 Wednesday, July 4, 2009
Univariate Procedure
Variable=RESID1 Residual
[Normal Probability Plot of RESID1. Vertical axis: -12.5 to 32.5; horizontal axis (normal quantiles): -2 to +2. Data points are shown as *, the reference line as +.]
where sex='M' and taijyuu<85;
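[Important] Statistical software is merely a tool; using it well is up to each of you.
A result can be printed with more digits than the measurement precision supports, but the extra digits carry no meaning.
[Example 1] Think in terms of rounded values, and note that the precision (accuracy) differs from case to case:
67.8 <=== 67.75 to 67.84
68 <=== 67.5 to 68.4
12.3 <=== 12.25 to 12.34
12 <=== 11.5 to 12.4
[Example 2] In Japan's observational record, the highest temperature is 40.9 degrees, observed in Kumagaya and Tajimi on August 16, 2007 (Heisei 19), and the lowest is -41 degrees, observed in Asahikawa, Hokkaido, on January 25, 1902 (Meiji 35). ===> This should be written as -41.0 degrees.
[Example 3] Ichiro's batting average in 2001 was .35 (3-wari 5-bu). In 2006 it was .331 (3-wari 3-bu 1-rin). ===> The 2001 figure should be written as .350 (3-wari 5-bu 0-rin).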
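The range of true values implied by a reported figure can be computed mechanically from the number of digits written. The Python sketch below is an illustration under the assumption of ordinary round-half-up reporting; `implied_interval` is a hypothetical helper. It takes the figure as a string so that trailing zeros are preserved, and it returns a half-open interval whose upper bound is exclusive, so the upper end 67.84 written above appears here as "less than 67.85".

```python
from decimal import Decimal

def implied_interval(reported: str):
    """Return (low, high) such that low <= true value < high rounds to `reported`."""
    value = Decimal(reported)
    exponent = value.as_tuple().exponent      # -1 for "67.8", 0 for "68"
    half_unit = Decimal(5).scaleb(exponent - 1)   # half of the last written digit
    return value - half_unit, value + half_unit

for text in ["67.8", "68", "12.3", "12", "-41", "-41.0"]:
    low, high = implied_interval(text)
    print(f"{text:>6} <=== {low} up to (but not including) {high}")
```

Writing -41.0 instead of -41 narrows the implied interval by a factor of ten, which is exactly the point of Example 2.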
/* Lesson 12-4 */
/* File Name = les1204.sas 07/05/09 */
data air;
infile 'usair2.prn';
input id $ y x1 x2 x3 x4 x5 x6;
/*
label y='SO2 of air in micrograms per cubic metre'
x1='Average annual temperature in F'
x2='Number of manufacturing enterprises employing 20 or more workers'
x3='Population size (1970 census); in thousands'
x4='Average annual wind speed in miles per hour'
x5='Average annual precipitation in inches'
x6='Average number of days with precipitation per year'
;
*/
proc print data=air(obs=10);
run;
proc corr data=air;
run;
proc reg data=air; :
model y=x1 x2 x3 x4 x5 x6; : full model
output out=outreg1 predicted=pred1 residual=resid1; :
run; :
proc plot data=outreg1;
plot resid1*pred1 /vref=0; :
plot resid1*x1 /vref=0; : every plot written out one by one
plot resid1*x2 /vref=0; :
plot resid1*x3 /vref=0; :
plot resid1*x4 /vref=0; :
plot resid1*x5 /vref=0; :
plot resid1*x6 /vref=0; :
plot resid1*y /vref=0; :
run;
proc reg data=air; :
model y=x1-x6 / selection=stepwise; : stepwise selection (x1-x6 is the shorthand for consecutively numbered variables)
output out=outreg1 predicted=pred1 residual=resid1; :
run; :
proc print data=outreg1(obs=15);
run;
proc plot data=outreg1;
plot resid1*pred1 /vref=0; :
plot resid1*(x1 x2 x3 x4 x5 x6) /vref=0; : shorthand form (compare with the listing above)
plot resid1*(x1-x6) /vref=0; : shorthand form (this means the same thing)
plot resid1*y /vref=0; :
run;
proc reg data=air; :
model y=x1-x6 / selection=rsquare; : all possible subsets
run; :
The SAS System 1
10:28 Thursday, July 5, 2009
OBS ID Y X1 X2 X3 X4 X5 X6
1 Phoenix 10 70.3 213 582 6.0 7.05 36
2 Little_R 13 61.0 91 132 8.2 48.52 100
3 San_Fran 12 56.7 453 716 8.7 20.66 67
4 Denver 17 51.9 454 515 9.0 12.95 86
5 Hartford 56 49.1 412 158 9.0 43.37 127
6 Wilmingt 36 54.0 80 80 9.0 40.25 114
7 Washingt 29 57.3 434 757 9.3 38.89 111
8 Jacksonv 14 68.4 136 529 8.8 54.47 116
9 Miami 10 75.5 207 335 9.0 59.80 128
10 Atlanta 24 61.5 368 497 9.1 48.34 115
The SAS System 2
10:28 Thursday, July 5, 2009
Correlation Analysis
7 'VAR' Variables: Y X1 X2 X3 X4
X5 X6
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Y 41 30.0488 23.4723 1232.0 8.0000 110.0
X1 41 55.7634 7.2277 2286.3 43.5000 75.5000
X2 41 463.1 563.5 18987.0 35.0000 3344.0
X3 41 608.6 579.1 24953.0 71.0000 3369.0
X4 41 9.4439 1.4286 387.2 6.0000 12.7000
X5 41 36.7690 11.7715 1507.5 7.0500 59.8000
X6 41 113.9 26.5064 4670.0 36.0000 166.0
The SAS System 3
10:28 Thursday, July 5, 2009
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 41
Y X1 X2 X3 X4 X5 X6
Y 1.00000 -0.43360 0.64477 0.49378 0.09469 0.05429 0.36956
0.0 0.0046 0.0001 0.0010 0.5559 0.7360 0.0174
X1 -0.43360 1.00000 -0.19004 -0.06268 -0.34974 0.38625 -0.43024
0.0046 0.0 0.2340 0.6970 0.0250 0.0126 0.0050
X2 0.64477 -0.19004 1.00000 0.95527 0.23795 -0.03242 0.13183
0.0001 0.2340 0.0 0.0001 0.1341 0.8405 0.4113
X3 0.49378 -0.06268 0.95527 1.00000 0.21264 -0.02612 0.04208
0.0010 0.6970 0.0001 0.0 0.1819 0.8712 0.7939
X4 0.09469 -0.34974 0.23795 0.21264 1.00000 -0.01299 0.16411
0.5559 0.0250 0.1341 0.1819 0.0 0.9357 0.3052
X5 0.05429 0.38625 -0.03242 -0.02612 -0.01299 1.00000 0.49610
0.7360 0.0126 0.8405 0.8712 0.9357 0.0 0.0010
X6 0.36956 -0.43024 0.13183 0.04208 0.16411 0.49610 1.00000
0.0174 0.0050 0.4113 0.7939 0.3052 0.0010 0.0
The SAS System 5
10:28 Thursday, July 5, 2009
Model: MODEL1
Dependent Variable: Y
                    Analysis of Variance
                    Sum of          Mean
Source     DF      Squares        Square    F Value    Prob>F
Model       6  14754.63603    2459.10601     11.480    0.0001
Error      34   7283.26641     214.21372
C Total    40  22037.90244
Root MSE   14.63604    R-square   0.6695
Dep Mean   30.04878    Adj R-sq   0.6112
C.V.       48.70761
The SAS System 6
10:28 Thursday, July 5, 2009
                  Parameter Estimates
               Parameter      Standard    T for H0:
Variable  DF    Estimate         Error  Parameter=0  Prob > |T|
INTERCEP   1  111.728481   47.31810073        2.361      0.0241
X1         1   -1.267941    0.62117952       -2.041      0.0491
X2         1    0.064918    0.01574825        4.122      0.0002
X3         1   -0.039277    0.01513274       -2.595      0.0138
X4         1   -3.181366    1.81501910       -1.753      0.0887
X5         1    0.512359    0.36275507        1.412      0.1669
X6         1   -0.052050    0.16201386       -0.321      0.7500
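The "T for H0: Parameter=0" column is simply each estimate divided by its standard error. The Python sketch below (an arithmetic check only, using the printed and therefore rounded values) reproduces the column and orders the variables by the size of the t value, which is one rough way to see which terms contribute least to the full model.

```python
# (estimate, standard error) copied from the Parameter Estimates table.
estimates = {
    "INTERCEP": (111.728481, 47.31810073),
    "X1": (-1.267941, 0.62117952),
    "X2": (0.064918, 0.01574825),
    "X3": (-0.039277, 0.01513274),
    "X4": (-3.181366, 1.81501910),
    "X5": (0.512359, 0.36275507),
    "X6": (-0.052050, 0.16201386),
}

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in estimates.items()}

# Smallest |t| first: these terms are the weakest in the full model.
for name in sorted(t_values, key=lambda k: abs(t_values[k])):
    print(f"{name:8s} t = {t_values[name]:7.3f}")
```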
The SAS System 14
13:09 Thursday, July 5, 2009
[Plot of RESID1*Y. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -25 to 50, with a reference line at 0; horizontal axis: Y, 0 to 120.]
The SAS System 15
10:28 Thursday, July 5, 2009
Stepwise Procedure for Dependent Variable Y
Step 1 Variable X2 Entered R-square = 0.41572671 C(p) = 23.10893175
DF Sum of Squares Mean Square F Prob>F
Regression 1 9161.74469120 9161.74469120 27.75 0.0001
Error 39 12876.15774782 330.15789097
Total 40 22037.90243902
Parameter Standard Type II
Variable Estimate Error Sum of Squares F Prob>F
INTERCEP 17.61057438 3.69158676 7513.50474182 22.76 0.0001
X2 0.02685872 0.00509867 9161.74469120 27.75 0.0001
Bounds on condition number: 1, 1
------------------------------------------------------------------------
Step 2 Variable X3 Entered R-square = 0.58632019 C(p) = 7.55859687
DF Sum of Squares Mean Square F Prob>F
Regression 2 12921.26717485 6460.63358743 26.93 0.0001
Error 38 9116.63526417 239.91145432
Total 40 22037.90243902
Parameter Standard Type II
Variable Estimate Error Sum of Squares F Prob>F
INTERCEP 26.32508332 3.84043919 11272.71964000 46.99 0.0001
X2 0.08243410 0.01469656 7548.02378137 31.46 0.0001
X3 -0.05660660 0.01429968 3759.52248365 15.67 0.0003
Bounds on condition number: 11.43374, 45.73494
------------------------------------------------------------------------
Step 3 Variable X6 Entered R-square = 0.61740155 C(p) = 6.36100514
DF Sum of Squares Mean Square F Prob>F
Regression 3 13606.23518823 4535.41172941 19.90 0.0001
Error 37 8431.66725079 227.88289867
Total 40 22037.90243902
Parameter Standard Type II
Variable Estimate Error Sum of Squares F Prob>F
INTERCEP 6.96584888 11.77690656 79.72552238 0.35 0.5578
X2 0.07433399 0.01506613 5547.32153619 24.34 0.0001
X3 -0.04939437 0.01454421 2628.36952166 11.53 0.0016
X6 0.16435940 0.09480151 684.96801338 3.01 0.0913
Bounds on condition number: 12.65025, 78.63322
------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the
model.
The SAS System 20
10:28 Thursday, July 5, 2009
Summary of Stepwise Procedure for Dependent Variable Y
Variable Number Partial Model
Step Entered Removed In R**2 R**2 C(p) F Prob>F
1 X2 1 0.4157 0.4157 23.1089 27.7496 0.0001
2 X3 2 0.1706 0.5863 7.5586 15.6705 0.0003
3 X6 3 0.0311 0.6174 6.3610 3.0058 0.0913
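The Partial R**2 and C(p) columns of this summary can be recomputed from the step-by-step output. The Python sketch below is illustrative: it assumes the usual definition of Mallows' C(p), namely SSE_p / MSE_full - (n - 2p), with p counting the intercept, which agrees with the printed values. Partial R**2 is the increase in model R**2 when a variable enters.

```python
n_obs = 41
mse_full = 7283.26641 / 34     # error mean square of the full six-variable model

# (variable entered, number of parameters incl. intercept, model R-square, error SS)
steps = [
    ("X2", 2, 0.41572671, 12876.15774782),
    ("X3", 3, 0.58632019, 9116.63526417),
    ("X6", 4, 0.61740155, 8431.66725079),
]

def mallows_cp(sse, n_params):
    """Mallows' C(p) relative to the full model."""
    return sse / mse_full - (n_obs - 2 * n_params)

summary = []
previous_r2 = 0.0
for name, n_params, r2, sse in steps:
    partial_r2 = r2 - previous_r2          # gain in R-square at this step
    summary.append((name, partial_r2, r2, mallows_cp(sse, n_params)))
    previous_r2 = r2

for name, partial_r2, r2, cp in summary:
    print(f"{name}: partial R2 = {partial_r2:.4f}, model R2 = {r2:.4f}, C(p) = {cp:.4f}")
```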
The SAS System 21
10:28 Thursday, July 5, 2009
OBS ID Y X1 X2 X3 X4 X5 X6 PRED1 RESID1
1 Phoenix 10 70.3 213 582 6.0 7.05 36 -0.032 10.0316
2 Little_R 13 61.0 91 132 8.2 48.52 100 23.646 -10.6461
3 San_Fran 12 56.7 453 716 8.7 20.66 67 16.285 -4.2849
4 Denver 17 51.9 454 515 9.0 12.95 86 29.410 -12.4103
5 Hartford 56 49.1 412 158 9.0 43.37 127 50.661 5.3392
6 Wilmingt 36 54.0 80 80 9.0 40.25 114 27.698 8.3020
7 Washingt 29 57.3 434 757 9.3 38.89 111 20.079 8.9208
8 Jacksonv 14 68.4 136 529 8.8 54.47 116 10.011 3.9887
9 Miami 10 75.5 207 335 9.0 59.80 128 26.844 -16.8439
10 Atlanta 24 61.5 368 497 9.1 48.34 115 28.673 -4.6731
11 Chicago 110 50.6 3344 3369 10.4 34.44 122 109.181 0.8191
12 Indianap 28 52.3 361 746 9.7 38.74 121 16.840 11.1603
13 Des_Moin 17 49.0 104 201 11.2 30.85 103 21.697 -4.6973
14 Wichita 8 56.6 125 277 12.7 30.58 82 16.053 -8.0528
15 Louisvil 30 55.6 291 593 8.3 43.11 123 19.522 10.4776
The SAS System 35
13:09 Thursday, July 5, 2009
[Plot of RESID1*Y. Legend: A = 1 obs, B = 2 obs, ... Vertical axis: Residual, -50 to 50, with a reference line at 0; horizontal axis: Y, 0 to 120.]
The SAS System 36
10:28 Thursday, July 5, 2009
N = 41 Regression Models for Dependent Variable: Y
Number in R-square Variables in Model
Model
1 0.41572671 X2
1 0.24381828 X3
1 0.18800913 X1
1 0.13657727 X6
1 0.00896628 X4
1 0.00294788 X5
--------------------------
2 0.58632019 X2 X3
2 0.51611499 X1 X2
2 0.49813569 X2 X6
2 0.42138706 X2 X5
2 0.41938296 X2 X4
(omitted)
2 0.01204980 X4 X5
-----------------------------
3 0.61740155 X2 X3 X6
3 0.61254683 X1 X2 X3
3 0.59304760 X2 X3 X5
3 0.59298732 X2 X3 X4
3 0.56222293 X1 X2 X5
3 0.54523587 X1 X2 X6
3 0.54521259 X1 X2 X4
3 0.50833841 X2 X4 X6
(omitted)
3 0.15899893 X4 X5 X6
--------------------------------
4 0.63964257 X1 X2 X3 X5
4 0.63287070 X1 X2 X3 X4
4 0.62909408 X1 X2 X3 X6
4 0.62847667 X2 X3 X4 X6
4 0.61759495 X2 X3 X5 X6
4 0.60282531 X1 X2 X4 X5
4 0.59965327 X2 X3 X4 X5
4 0.57466704 X1 X2 X4 X6
(omitted)
4 0.25499437 X1 X4 X5 X6
-----------------------------------
5 0.66850854 X1 X2 X3 X4 X5
5 0.65012088 X1 X2 X3 X4 X6
5 0.63964824 X1 X2 X3 X5 X6
5 0.62901313 X2 X3 X4 X5 X6
5 0.60403117 X1 X2 X4 X5 X6
5 0.50433666 X1 X3 X4 X5 X6
--------------------------------------
6 0.66951181 X1 X2 X3 X4 X5 X6
-----------------------------------------
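The all-possible-subsets listing covers every non-empty subset of the six explanatory variables. Because R-square can only rise as variables are added, it cannot by itself choose between models of different sizes. The Python sketch below counts the models and, as one possible criterion that is an assumption of this example rather than something the listing reports, computes the adjusted R-square for the best model of each size shown above.

```python
from itertools import combinations

candidates = ["X1", "X2", "X3", "X4", "X5", "X6"]
all_models = [c for k in range(1, len(candidates) + 1)
              for c in combinations(candidates, k)]
print(f"number of candidate models: {len(all_models)}")

# Best R-square for each model size, copied from the listing above.
best_r2 = {
    1: 0.41572671,   # X2
    2: 0.58632019,   # X2 X3
    3: 0.61740155,   # X2 X3 X6
    4: 0.63964257,   # X1 X2 X3 X5
    5: 0.66850854,   # X1 X2 X3 X4 X5
    6: 0.66951181,   # X1 X2 X3 X4 X5 X6
}

def adjusted_r2(r2, k, n=41):
    """Adjusted R-square for a model with k explanatory variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adjusted = {k: adjusted_r2(r2, k) for k, r2 in best_r2.items()}
for k in sorted(adjusted):
    print(f"{k} variables: R2 = {best_r2[k]:.4f}, adjusted R2 = {adjusted[k]:.4f}")
```

For six variables the result matches the Adj R-sq of 0.6112 reported for the full model. Other criteria, such as C(p), may favor a different size.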