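The previous sessions introduced several indicators for describing the characteristics of a distribution,
and pointed out how to use them and what to watch out for. We also saw that splitting the data into groups is useful.
In the course of an analysis, samples with different characteristics, or outliers, are sometimes excluded,
so this session introduces how to do that.
It also introduces frequency tabulation and cross tabulation,
which are commonly used for simple tabulation.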
/* Lesson 8-1 */
/* File Name = les0801.sas 11/21/07 */
data gakusei;
infile 'all07be.prn'
firstobs=2;
input sex $ shintyou taijyuu kyoui
jitaku $ kodukai carryer $ tsuuwa;
if kodukai>=200000 then delete; /* exclude records with an allowance of 200,000 yen or more */
if sex^='M' & sex^='F' then delete; /* exclude records whose sex is neither M nor F */
(rest omitted)
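As a rough sketch of how the two `delete` conditions behave, the step below applies them to a handful of made-up records. The dataset name `toy` and the values are assumptions for illustration and are not taken from `all07be.prn`. One point worth checking in practice: SAS treats a missing numeric value as smaller than any number, so `kodukai>=200000` does not remove records whose allowance is missing.

```sas
/* Sketch with made-up records (not the course data) */
data toy;
  infile datalines truncover;
  input sex $ kodukai;
  if kodukai>=200000 then delete;       /* drops 250000 and 200000 */
  if sex^='M' & sex^='F' then delete;   /* drops the record with sex X */
datalines;
M 30000
F 250000
X 10000
F .
M 200000
;
run;

proc print data=toy;
run;
```

With these five records, only `M 30000` and the record with a missing allowance (`F .`) should remain, which shows that missing values have to be excluded explicitly if that is what is wanted.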
The SAS System 2
21:48 Monday, November 19, 2007
Variable N Mean Std Dev Minimum Maximum
---------------------------------------------------------------------
SHINTYOU 360 167.7697222 8.2095196 145.0000000 186.0000000
TAIJYUU 324 58.6753086 9.2548611 35.0000000 100.0000000
KYOUI 111 86.5585586 7.5566764 56.0000000 112.0000000
KODUKAI 346 44976.88 41679.15 0 180000.00
TSUUWA 152 6478.83 4416.28 0 30000.00
---------------------------------------------------------------------
The SAS System 21
21:48 Monday, November 19, 2007
Univariate Procedure
Variable=KODUKAI
Moments
N 346 Sum Wgts 346
Mean 44976.88 Sum 15562000
Std Dev 41679.15 Variance 1.7372E9
Skewness 1.180932 Kurtosis 0.769481
USS 1.299E12 CSS 5.993E11
CV 92.66795 Std Mean 2240.685
T:Mean=0 20.07282 Pr>|T| 0.0001
Num ^= 0 291 Num > 0 291
M(Sign) 145.5 Pr>=|M| 0.0001
Sgn Rank 21243 Pr>=|S| 0.0001
The SAS System 22
21:48 Monday, November 19, 2007
Univariate Procedure
Variable=KODUKAI
Quantiles(Def=5)
100% Max 180000 99% 160000
75% Q3 60000 95% 150000
50% Med 30000 90% 100000
25% Q1 20000 10% 0
0% Min 0 5% 0
1% 0
Range 180000
Q3-Q1 40000
Mode 0
The SAS System 25
21:48 Monday, November 19, 2007
Univariate Procedure
Variable=KODUKAI
Histogram # Boxplot
190000+* 1 0
.** 6 0
.**** 12 0
130000+***** 13 0
.******** 23 |
.**** 11 |
70000+************ 35 +-----+
.****************** 52 | + |
.************************************* 109 *-----*
10000+**************************** 84 |
----+----+----+----+----+----+----+--
* may represent up to 3 counts
The SAS System 32
21:48 Monday, November 19, 2007
--------------------------------- SEX=F --------------------------------
Variable N Mean Std Dev Minimum Maximum
---------------------------------------------------------------------
SHINTYOU 119 158.9386555 5.3375566 145.0000000 171.0000000
TAIJYUU 83 48.7228916 4.7244906 35.0000000 60.0000000
KYOUI 42 82.9523810 3.9752428 70.0000000 90.0000000
KODUKAI 115 44330.43 35037.19 0 180000.00
TSUUWA 62 6640.06 4331.96 80.0000000 25000.00
---------------------------------------------------------------------
The SAS System 33
21:48 Monday, November 19, 2007
--------------------------------- SEX=M --------------------------------
Variable N Mean Std Dev Minimum Maximum
---------------------------------------------------------------------
SHINTYOU 241 172.1302905 5.3891979 156.0000000 186.0000000
TAIJYUU 241 62.1029046 7.8482663 46.0000000 100.0000000
KYOUI 69 88.7536232 8.3620392 56.0000000 112.0000000
KODUKAI 231 45298.70 44687.24 0 165000.00
TSUUWA 90 6367.76 4494.19 0 30000.00
---------------------------------------------------------------------
The SAS System 90
21:48 Monday, November 19, 2007
Univariate Procedure
Schematic Plots
Variable=SHINTYOU
200 +
|
| 0
180 + |
| | *--+--*
| | +-----+
160 + *--+--* 0
| +-----+ 0
| 0
140 +
------------+-----------+-----------
SEX F M
The SAS System 91
21:48 Monday, November 19, 2007
Univariate Procedure
Schematic Plots
Variable=TAIJYUU
|
100 + *
| 0
| | *--+--*
50 + *--+--* +-----+
| 0
|
0 +
------------+-----------+-----------
SEX F M
The SAS System 105
21:48 Monday, November 19, 2007
SEX SHINTYOU Cum. Cum.
Midpoint Freq Freq Percent Percent
|
F 146 | 2 2 0.56 0.56
150 |** 9 11 2.50 3.06
154 |*** 17 28 4.72 7.78
158 |****** 32 60 8.89 16.67
162 |******* 34 94 9.44 26.11
166 |**** 21 115 5.83 31.94
170 |* 4 119 1.11 33.06
174 | 0 119 0.00 33.06
178 | 0 119 0.00 33.06
182 | 0 119 0.00 33.06
186 | 0 119 0.00 33.06
|
M 146 | 0 119 0.00 33.06
150 | 0 119 0.00 33.06
154 | 0 119 0.00 33.06
158 | 2 121 0.56 33.61
162 |*** 13 134 3.61 37.22
166 |***** 26 160 7.22 44.44
170 |************** 72 232 20.00 64.44
174 |************** 69 301 19.17 83.61
178 |******* 35 336 9.72 93.33
182 |**** 19 355 5.28 98.61
186 |* 5 360 1.39 100.00
|
----+---+---+--
20 40 60
Frequency
The SAS System 111
21:48 Monday, November 19, 2007
SEX KODUKAI Cum. Cum.
Midpoint Freq Freq Percent Percent
|
F 0 |****** 14 14 4.05 4.05
20000 |********** 25 39 7.23 11.27
40000 |*********** 27 66 7.80 19.08
60000 |*********** 27 93 7.80 26.88
80000 |**** 10 103 2.89 29.77
100000 |** 5 108 1.45 31.21
120000 |* 3 111 0.87 32.08
140000 | 1 112 0.29 32.37
160000 |* 2 114 0.58 32.95
180000 | 1 115 0.29 33.24
|
M 0 |******************** 50 165 14.45 47.69
20000 |****************** 45 210 13.01 60.69
40000 |********************* 52 262 15.03 75.72
60000 |*********** 28 290 8.09 83.82
80000 |***** 12 302 3.47 87.28
100000 |******* 18 320 5.20 92.49
120000 |*** 8 328 2.31 94.80
140000 |* 3 331 0.87 95.66
160000 |****** 15 346 4.34 100.00
180000 | 0 346 0.00 100.00
|
----+---+---+---+---+-
10 20 30 40 50
Frequency
data seito07;
infile 'seito.prn';
input id $ sex $ kesseki $ univ $
koku $ suu1 $ suu2 $ tireki $ koumin $ rika $;
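if sex^='M' then delete; /* male only */
if kesseki^='0' then delete; /* attendees only (drop absentees) */
length area $ 12; /* added on the assumption that area is not declared elsewhere: keeps longer values such as "東日本" from being truncated */
area="不明"; /* unknown */
if univ="早稲田大学" then area="東日本"; /* Waseda University: East Japan */
if univ="慶応大学" then area="東日本"; /* Keio University: East Japan */
if univ="関西大学" then area="西日本"; /* Kansai University: West Japan */
if univ="同志社大学" then area="西日本"; /* Doshisha University: West Japan */
if tireki="世界史-0" then tireki="世界史"; /* merge the variants of World History */
if tireki="世界史-2" then tireki="世界史";
if tireki="日本史-2" then tireki="日本史"; /* merge the variants of Japanese History */
if tireki="日本史-3" then tireki="日本史";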
...
[Example 4] To execute several statements under one condition: enclose them in do ... end
if tireki="世界史-0" then do;
tireki="世界史";
koumin=.; /* "." is the numeric missing value; if koumin is a character variable, use koumin=" " instead */
end;
...
[Comparison operators]
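SAS comparison operators can be written either as symbols or as mnemonics: `=` (EQ), `^=` (NE), `>` (GT), `<` (LT), `>=` (GE), `<=` (LE), plus `in` for membership in a list. The following is a minimal sketch on made-up values; the dataset name `optest` and the flag variables are hypothetical. A comparison evaluates to 1 when true and 0 when false, so each result can be stored in a variable and inspected.

```sas
/* Sketch: symbolic and mnemonic comparison operators on toy values */
data optest;
  input x;
  eq_flag = (x =  3);           /* same as x eq 3 */
  ne_flag = (x ^= 3);           /* same as x ne 3 */
  gt_flag = (x >  3);           /* same as x gt 3 */
  lt_flag = (x <  3);           /* same as x lt 3 */
  ge_flag = (x >= 3);           /* same as x ge 3 */
  le_flag = (x <= 3);           /* same as x le 3 */
  in_flag = (x in (1, 3, 5));   /* membership in a list */
datalines;
1
3
4
.
;
run;

proc print data=optest;
run;
```

In the last record `x` is missing, and `lt_flag` and `le_flag` come out as 1 because a missing numeric value sorts below every number. This is worth keeping in mind when a condition such as `x < 3` is used to delete records.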
/* Lesson 8-2 */
/* File Name = les0802.sas 06/14/07 */
data gakusei;
infile 'all07be.prn'
firstobs=2;
input sex $ shintyou taijyuu kyoui
jitaku $ kodukai carryer $ tsuuwa;
proc print data=gakusei(obs=5);
run;

proc freq data=gakusei; /* compute frequencies */
tables sex jitaku carryer; /* one variable at a time */
run;
proc freq data=gakusei; /* compute frequencies */
tables sex*jitaku; /* for pairs of variables (cross tabulation) */
tables sex*carryer;
tables jitaku*carryer;
run;
The SAS System 1
21:48 Monday, November 19, 2007
OBS SEX SHINTYOU TAIJYUU KYOUI JITAKU KODUKAI CARRYER TSUUWA
1 F 145.0 38 . J 10000 .
2 F 146.7 41 85 J 10000 Vodafone 6000
3 F 148.0 42 . J 50000 .
4 F 148.0 43 80 J 50000 DoCoMo 4000
5 F 148.9 . . J 60000 .
The SAS System 2
21:48 Monday, November 19, 2007
Cumulative Cumulative
SEX Frequency Percent Frequency Percent
-------------------------------------------------
F 128 34.0 128 34.0
M 248 66.0 376 100.0
Frequency Missing = 5
Cumulative Cumulative
JITAKU Frequency Percent Frequency Percent
----------------------------------------------------
G 121 37.2 121 37.2
J 204 62.8 325 100.0
Frequency Missing = 56
Cumulative Cumulative
CARRYER Frequency Percent Frequency Percent
------------------------------------------------------
DDIp 2 1.3 2 1.3
DoCoMo 60 39.5 62 40.8
J-PHONE 10 6.6 72 47.4
KDDI 1 0.7 73 48.0
No 5 3.3 78 51.3
Vodafone 20 13.2 98 64.5
Willcom 1 0.7 99 65.1
au 41 27.0 140 92.1
au+willc 1 0.7 141 92.8
docomo 5 3.3 146 96.1
docomo+w 1 0.7 147 96.7
softbank 4 2.6 151 99.3
vodafone 1 0.7 152 100.0
Frequency Missing = 229
The SAS System 6
21:48 Monday, November 19, 2007
TABLE OF SEX BY JITAKU
SEX JITAKU
Frequency|
Percent |
Row Pct |
Col Pct |G |J | Total
---------+--------+--------+
F | 36 | 73 | 109
| 11.15 | 22.60 | 33.75
| 33.03 | 66.97 |
| 30.00 | 35.96 |
---------+--------+--------+
M | 84 | 130 | 214
| 26.01 | 40.25 | 66.25
| 39.25 | 60.75 |
| 70.00 | 64.04 |
---------+--------+--------+
Total 120 203 323
37.15 62.85 100.00
Frequency Missing = 58
The SAS System 9
21:48 Monday, November 19, 2007
TABLE OF SEX BY CARRYER
SEX CARRYER
Frequency|
Percent |
Row Pct |
Col Pct |DDIp |DoCoMo |J-PHONE |KDDI |No | Total
---------+--------+--------+--------+--------+--------+
F | 1 | 25 | 4 | 0 | 1 | 60
| 0.66 | 16.56 | 2.65 | 0.00 | 0.66 | 39.74
| 1.67 | 41.67 | 6.67 | 0.00 | 1.67 |
| 50.00 | 41.67 | 44.44 | 0.00 | 20.00 |
---------+--------+--------+--------+--------+--------+
M | 1 | 35 | 5 | 1 | 4 | 91
| 0.66 | 23.18 | 3.31 | 0.66 | 2.65 | 60.26
| 1.10 | 38.46 | 5.49 | 1.10 | 4.40 |
| 50.00 | 58.33 | 55.56 | 100.00 | 80.00 |
---------+--------+--------+--------+--------+--------+
Total 2 60 9 1 5 151
1.32 39.74 5.96 0.66 3.31 100.00
(Continued)
The SAS System 11
21:48 Monday, November 19, 2007
TABLE OF SEX BY CARRYER
SEX CARRYER
Frequency|
Percent |
Row Pct |
Col Pct |Vodafone|Willcom |au |au+willc|docomo | Total
---------+--------+--------+--------+--------+--------+
F | 9 | 1 | 14 | 1 | 1 | 60
| 5.96 | 0.66 | 9.27 | 0.66 | 0.66 | 39.74
| 15.00 | 1.67 | 23.33 | 1.67 | 1.67 |
| 45.00 | 100.00 | 34.15 | 100.00 | 20.00 |
---------+--------+--------+--------+--------+--------+
M | 11 | 0 | 27 | 0 | 4 | 91
| 7.28 | 0.00 | 17.88 | 0.00 | 2.65 | 60.26
| 12.09 | 0.00 | 29.67 | 0.00 | 4.40 |
| 55.00 | 0.00 | 65.85 | 0.00 | 80.00 |
---------+--------+--------+--------+--------+--------+
Total 20 1 41 1 5 151
13.25 0.66 27.15 0.66 3.31 100.00
(Continued)
The SAS System 13
21:48 Monday, November 19, 2007
TABLE OF SEX BY CARRYER
SEX CARRYER
Frequency|
Percent |
Row Pct |
Col Pct |docomo+w|softbank|vodafone| Total
---------+--------+--------+--------+
F | 0 | 3 | 0 | 60
| 0.00 | 1.99 | 0.00 | 39.74
| 0.00 | 5.00 | 0.00 |
| 0.00 | 75.00 | 0.00 |
---------+--------+--------+--------+
M | 1 | 1 | 1 | 91
| 0.66 | 0.66 | 0.66 | 60.26
| 1.10 | 1.10 | 1.10 |
| 100.00 | 25.00 | 100.00 |
---------+--------+--------+--------+
Total 1 4 1 151
0.66 2.65 0.66 100.00
Frequency Missing = 230
The SAS System 16
21:48 Monday, November 19, 2007
TABLE OF JITAKU BY CARRYER
JITAKU CARRYER
Frequency|
Percent |
Row Pct |
Col Pct |DDIp |DoCoMo |J-PHONE |KDDI |No | Total
---------+--------+--------+--------+--------+--------+
G | 1 | 21 | 4 | 1 | 0 | 47
| 0.78 | 16.28 | 3.10 | 0.78 | 0.00 | 36.43
| 2.13 | 44.68 | 8.51 | 2.13 | 0.00 |
| 100.00 | 41.18 | 44.44 | 100.00 | 0.00 |
---------+--------+--------+--------+--------+--------+
J | 0 | 30 | 5 | 0 | 4 | 82
| 0.00 | 23.26 | 3.88 | 0.00 | 3.10 | 63.57
| 0.00 | 36.59 | 6.10 | 0.00 | 4.88 |
| 0.00 | 58.82 | 55.56 | 0.00 | 100.00 |
---------+--------+--------+--------+--------+--------+
Total 1 51 9 1 4 129
0.78 39.53 6.98 0.78 3.10 100.00
(Continued)
The SAS System 18
21:48 Monday, November 19, 2007
TABLE OF JITAKU BY CARRYER
JITAKU CARRYER
Frequency|
Percent |
Row Pct |
Col Pct |Vodafone|Willcom |au |au+willc|docomo | Total
---------+--------+--------+--------+--------+--------+
G | 4 | 0 | 12 | 0 | 2 | 47
| 3.10 | 0.00 | 9.30 | 0.00 | 1.55 | 36.43
| 8.51 | 0.00 | 25.53 | 0.00 | 4.26 |
| 23.53 | . | 34.29 | 0.00 | 40.00 |
---------+--------+--------+--------+--------+--------+
J | 13 | 0 | 23 | 1 | 3 | 82
| 10.08 | 0.00 | 17.83 | 0.78 | 2.33 | 63.57
| 15.85 | 0.00 | 28.05 | 1.22 | 3.66 |
| 76.47 | . | 65.71 | 100.00 | 60.00 |
---------+--------+--------+--------+--------+--------+
Total 17 0 35 1 5 129
13.18 0.00 27.13 0.78 3.88 100.00
(Continued)
The SAS System 20
21:48 Monday, November 19, 2007
TABLE OF JITAKU BY CARRYER
JITAKU CARRYER
Frequency|
Percent |
Row Pct |
Col Pct |docomo+w|softbank|vodafone| Total
---------+--------+--------+--------+
G | 1 | 1 | 0 | 47
| 0.78 | 0.78 | 0.00 | 36.43
| 2.13 | 2.13 | 0.00 |
| 100.00 | 33.33 | 0.00 |
---------+--------+--------+--------+
J | 0 | 2 | 1 | 82
| 0.00 | 1.55 | 0.78 | 63.57
| 0.00 | 2.44 | 1.22 |
| 0.00 | 66.67 | 100.00 |
---------+--------+--------+--------+
Total 1 3 1 129
0.78 2.33 0.78 100.00
Frequency Missing = 252
(preceding lines omitted)
if carryer="au+willc" then carryer="au+Willc";
if carryer="docomo" then carryer="DoCoMo";
if carryer="docomo+w" then carryer="DoCoMo+W";
if carryer="vodafone" then carryer="Vodafone";
(following lines omitted)
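Replacing each spelling one by one only covers the variants that were noticed by eye. One possible alternative, sketched here under the assumption that letter case is the only difference between variants, is to normalise the values with the `upcase` function before tabulating. The dataset `carriers`, the variable `carryer_u` and the sample values are made up for illustration.

```sas
/* Sketch: unify spelling variants that differ only in letter case */
data carriers;
  infile datalines truncover;
  input carryer $;
  carryer_u = upcase(carryer);   /* DoCoMo and docomo both become DOCOMO */
datalines;
DoCoMo
docomo
Vodafone
vodafone
au
;
run;

proc freq data=carriers;
  tables carryer_u;
run;
```

With these five records, DOCOMO and VODAFONE should each be counted twice and AU once. Variants that differ in more than case (for example a renamed carrier) would still need explicit `if ... then` replacements.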
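(preceding lines omitted)
proc freq data=gakusei order=freq; /* list levels in descending order of frequency */
tables sex jitaku carryer;
run;

proc freq data=gakusei order=freq; /* list levels in descending order of frequency */
tables sex*jitaku;
tables sex*carryer;
tables jitaku*carryer;
run;
(following lines omitted)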
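Besides `order=freq` on the `proc freq` statement, the `tables` statement accepts options after a slash. The sketch below, on made-up records in a hypothetical dataset `toy`, shows two that are often handy with cross tabulations: `norow nocol nopercent` suppress the percentages so that each cell shows only the frequency, and `missing` treats missing values as a category of their own instead of reporting them under "Frequency Missing".

```sas
/* Sketch: TABLES statement options on toy data */
data toy;
  infile datalines truncover;
  input sex $ jitaku $;   /* a lone period is read as a missing value */
datalines;
F J
F G
M J
M .
. J
;
run;

proc freq data=toy;
  tables sex*jitaku / norow nocol nopercent;   /* frequencies only */
  tables sex*jitaku / missing;                 /* missing values form their own row and column */
run;
```

The first table is built from the three complete records only, whereas the second one includes all five.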
/* Lesson 8-5 */
/* File Name = les0805.sas 06/14/07 */
data gakusei;
infile 'all07be.prn'
firstobs=2;
input sex $ shintyou taijyuu kyoui
jitaku $ kodukai carryer $ tsuuwa;
proc format; /* define classes; clshint stands for "class shintyou" */
value clshint low-<150=' -149' /* class definition 1 */
150-<160='150-159' /* 2 */
160-<170='160-169' /* 3 */
170-<180='170-179' /* 4 */
180-high='180- ' /* 5 */
other ='missing'; /* 6 */
run;
proc print data=gakusei(obs=5);
run;
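proc freq data=gakusei; /* compute frequencies */
tables shintyou; /* for a single variable */
format shintyou clshint.; /* group the continuous variable into classes */
run;

proc freq data=gakusei; /* compute frequencies */
tables sex*shintyou; /* for the combination of two variables */
format shintyou clshint.; /* group the continuous variable into classes */
run;

proc sort data=gakusei; /* the same result obtained the way we have used so far */
by sex;
run;
proc freq data=gakusei;
tables shintyou;
format shintyou clshint.; /* group the continuous variable into classes */
by sex; /* separately for each sex */
run;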
The SAS System 2
21:48 Monday, November 19, 2007
Cumulative Cumulative
SHINTYOU Frequency Percent Frequency Percent
------------------------------------------------------
-149 6 1.6 6 1.6
150-159 56 15.3 62 16.9
160-169 126 34.4 188 51.4
170-179 154 42.1 342 93.4
180- 24 6.6 366 100.0
Frequency Missing = 15
The SAS System 3
21:48 Monday, November 19, 2007
TABLE OF SEX BY SHINTYOU
SEX SHINTYOU
Frequency|
Percent |
Row Pct |
Col Pct | -149 |150-159 |160-169 |170-179 |180- | Total
---------+--------+--------+--------+--------+--------+
F | 6 | 54 | 59 | 2 | 0 | 121
| 1.64 | 14.79 | 16.16 | 0.55 | 0.00 | 33.15
| 4.96 | 44.63 | 48.76 | 1.65 | 0.00 |
| 100.00 | 96.43 | 47.20 | 1.30 | 0.00 |
---------+--------+--------+--------+--------+--------+
M | 0 | 2 | 66 | 152 | 24 | 244
| 0.00 | 0.55 | 18.08 | 41.64 | 6.58 | 66.85
| 0.00 | 0.82 | 27.05 | 62.30 | 9.84 |
| 0.00 | 3.57 | 52.80 | 98.70 | 100.00 |
---------+--------+--------+--------+--------+--------+
Total 6 56 125 154 24 365
1.64 15.34 34.25 42.19 6.58 100.00
Frequency Missing = 16
The SAS System 6
21:48 Monday, November 19, 2007
------------------------------- SEX=' ' --------------------------------
Cumulative Cumulative
SHINTYOU Frequency Percent Frequency Percent
------------------------------------------------------
160-169 1 100.0 1 100.0
Frequency Missing = 4
The SAS System 7
21:48 Monday, November 19, 2007
-------------------------------- SEX=F ---------------------------------
Cumulative Cumulative
SHINTYOU Frequency Percent Frequency Percent
------------------------------------------------------
-149 6 5.0 6 5.0
150-159 54 44.6 60 49.6
160-169 59 48.8 119 98.3
170-179 2 1.7 121 100.0
Frequency Missing = 7
The SAS System 8
21:48 Monday, November 19, 2007
-------------------------------- SEX=M ---------------------------------
Cumulative Cumulative
SHINTYOU Frequency Percent Frequency Percent
------------------------------------------------------
150-159 2 0.8 2 0.8
160-169 66 27.0 68 27.9
170-179 152 62.3 220 90.2
180- 24 9.8 244 100.0
Frequency Missing = 4
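The `format` statement groups the values only for display inside the procedure. If the class label is needed as a variable in its own right, one option is to apply the same format with the `put` function in a data step. The following is a minimal sketch on made-up heights; the dataset `toy` and the variable `hclass` are hypothetical, and the format is repeated so that the block runs on its own.

```sas
/* Sketch: store the class label produced by a user-defined format */
proc format;
  value clshint low-<150 = ' -149'
                150-<160 = '150-159'
                160-<170 = '160-169'
                170-<180 = '170-179'
                180-high = '180- '
                other    = 'missing';
run;

data toy;
  input sex $ shintyou;
  hclass = put(shintyou, clshint.);   /* character variable holding the class label */
datalines;
F 148.9
F 159.9
M 160
M 180
M .
;
run;

proc print data=toy;
run;
```

Because `150-<160` excludes its upper end, 159.9 falls in '150-159' and 160 in '160-169'. The missing height is not covered by `low`, so it receives the `other` label here. In `proc freq`, by contrast, missing values are counted under "Frequency Missing" regardless of the format, which is why no 'missing' row appears in the listings above.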
data mon2007;
infile 'd:\home\mon05d.csv' dlm=','
firstobs=2
dsd
truncover /* missover could be given here instead; specify only one of the two */
;
input No $ Univ : $30. SName : $40. Faculty : $50. Dept : $50.
Center1 : $8. Center2 : $8. Sel1 : $8. Sel2 : $8.
Book1 : $10. Book2 : $10.
Vol0 VolS VolT
ZenKou $ ScoreS ScoreT KoKouSi
;
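The effect of `dsd` is easiest to see on a few lines of comma-separated text. The sketch below uses in-stream data instead of the CSV file; the dataset name `csvtoy`, the shortened variable list and all values are made up for illustration. With `dsd`, two consecutive commas are read as a missing value, and a quoted field may itself contain a comma; `truncover` keeps SAS from moving on to the next line when a record ends early.

```sas
/* Sketch: reading comma-separated records with dsd and truncover */
data csvtoy;
  infile datalines dlm=',' dsd truncover;
  input No $ Univ : $30. Vol0 VolS;
datalines;
1,Waseda,100,20
2,,80,
3,"Keio, Mita",,15
;
run;

proc print data=csvtoy;
run;
```

In the second record `Univ` and `VolS` should come out missing, and in the third record the quoted text is read as a single value for `Univ` while `Vol0` is missing. Without `dsd`, consecutive delimiters are treated as one, so the remaining values would shift into the wrong variables.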
data mon2007;
infile 'd:\home\mon05e.txt' dlm='09'x
firstobs=2
truncover;
data math; infile 'foo.dat' lrecl=230;
data math; infile 'foo.dat' lrecl=230 truncover;
input
kamoku $ 2
kesseki $ 3
k_code $ 10-11
t_score 12-14
s_scor01 103-104
s_scor02 105-106
s_scor03 107-108
s_scor04 109-110
;
data math; infile 'foo.dat' firstobs=4;