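In the previous sessions we introduced several indicators for characterizing a distribution
and pointed out how to use them and what to watch out for. We also explained that splitting the data into groups is useful.
During an analysis it is sometimes necessary to exclude outliers, or samples whose characteristics differ from the rest,
so this session shows how to do that.
It also introduces frequency tabulation and cross tabulation,
which are widely used as simple summaries.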
 /* Lesson 8-1 */
 /*    File Name = les0801.sas   11/21/07   */

 data gakusei;
   infile 'all07be.prn' firstobs=2;
   input sex $ shintyou taijyuu kyoui jitaku $ kodukai carryer $ tsuuwa;

   if kodukai>=200000 then delete;         /* exclude if 200,000 yen or more */
   if sex^='M' & sex^='F' then delete;     /* exclude if neither male nor female */

 (rest omitted)
                              SAS System                                 2
                                           21:48 Monday, November 19, 2007

 Variable    N          Mean       Std Dev       Minimum       Maximum
 ---------------------------------------------------------------------
 SHINTYOU  360   167.7697222     8.2095196   145.0000000   186.0000000
 TAIJYUU   324    58.6753086     9.2548611    35.0000000   100.0000000
 KYOUI     111    86.5585586     7.5566764    56.0000000   112.0000000
 KODUKAI   346      44976.88      41679.15             0     180000.00
 TSUUWA    152       6478.83       4416.28             0      30000.00
 ---------------------------------------------------------------------

                              SAS System                                21
                                           21:48 Monday, November 19, 2007

                          Univariate Procedure

 Variable=KODUKAI

                                Moments

                N               346  Sum Wgts        346
                Mean       44976.88  Sum        15562000
                Std Dev    41679.15  Variance   1.7372E9
                Skewness   1.180932  Kurtosis   0.769481
                USS        1.299E12  CSS        5.993E11
                CV         92.66795  Std Mean   2240.685
                T:Mean=0   20.07282  Pr>|T|       0.0001
                Num ^= 0        291  Num > 0         291
                M(Sign)       145.5  Pr>=|M|      0.0001
                Sgn Rank      21243  Pr>=|S|      0.0001

                              SAS System                                22
                                           21:48 Monday, November 19, 2007

                          Univariate Procedure

 Variable=KODUKAI

                            Quantiles(Def=5)

                 100% Max    180000       99%    160000
                  75% Q3      60000       95%    150000
                  50% Med     30000       90%    100000
                  25% Q1      20000       10%         0
                   0% Min         0        5%         0
                                           1%         0
                 Range       180000
                 Q3-Q1        40000
                 Mode             0

                              SAS System                                25
                                           21:48 Monday, November 19, 2007

                          Univariate Procedure

 Variable=KODUKAI

                 Histogram                          #    Boxplot
  190000+*                                          1       0
        .**                                         6       0
        .****                                      12       0
  130000+*****                                     13       0
        .********                                  23       |
        .****                                      11       |
   70000+************                              35    +-----+
        .******************                        52    |  +  |
        .*************************************    109    *-----*
   10000+****************************              84       |
         ----+----+----+----+----+----+----+--
         * may represent up to 3 counts

                              SAS System                                32
                                           21:48 Monday, November 19, 2007

 --------------------------------- SEX=F --------------------------------

 Variable    N          Mean       Std Dev       Minimum       Maximum
 ---------------------------------------------------------------------
 SHINTYOU  119   158.9386555     5.3375566   145.0000000   171.0000000
 TAIJYUU    83    48.7228916     4.7244906    35.0000000    60.0000000
 KYOUI      42    82.9523810     3.9752428    70.0000000    90.0000000
 KODUKAI   115      44330.43      35037.19             0     180000.00
 TSUUWA     62       6640.06       4331.96    80.0000000      25000.00
 ---------------------------------------------------------------------

                              SAS System                                33
                                           21:48 Monday, November 19, 2007

 --------------------------------- SEX=M --------------------------------

 Variable    N          Mean       Std Dev       Minimum       Maximum
 ---------------------------------------------------------------------
 SHINTYOU  241   172.1302905     5.3891979   156.0000000   186.0000000
 TAIJYUU   241    62.1029046     7.8482663    46.0000000   100.0000000
 KYOUI      69    88.7536232     8.3620392    56.0000000   112.0000000
 KODUKAI   231      45298.70      44687.24             0     165000.00
 TSUUWA     90       6367.76       4494.19             0      30000.00
 ---------------------------------------------------------------------

                              SAS System                                90
                                           21:48 Monday, November 19, 2007

                          Univariate Procedure
                            Schematic Plots

 Variable=SHINTYOU

      200 +
          |
          |                        0
      180 +                        |
          |            |        *--+--*
          |            |        +-----+
      160 +         *--+--*        0
          |         +-----+        0
          |            0
      140 +
           ------------+-----------+-----------
       SEX             F           M

                              SAS System                                91
                                           21:48 Monday, November 19, 2007

                          Univariate Procedure
                            Schematic Plots

 Variable=TAIJYUU

          |
      100 +                        *
          |                        0
          |
          |                     *--+--*
       50 +         *--+--*     +-----+
          |            0
          |
        0 +
           ------------+-----------+-----------
       SEX             F           M

                              SAS System                               105
                                           21:48 Monday, November 19, 2007

 SEX  SHINTYOU                           Cum.              Cum.
      Midpoint                   Freq    Freq  Percent  Percent
              |
 F       146  |                     2       2     0.56     0.56
         150  |**                   9      11     2.50     3.06
         154  |***                 17      28     4.72     7.78
         158  |******              32      60     8.89    16.67
         162  |*******             34      94     9.44    26.11
         166  |****                21     115     5.83    31.94
         170  |*                    4     119     1.11    33.06
         174  |                     0     119     0.00    33.06
         178  |                     0     119     0.00    33.06
         182  |                     0     119     0.00    33.06
         186  |                     0     119     0.00    33.06
              |
 M       146  |                     0     119     0.00    33.06
         150  |                     0     119     0.00    33.06
         154  |                     0     119     0.00    33.06
         158  |                     2     121     0.56    33.61
         162  |***                 13     134     3.61    37.22
         166  |*****               26     160     7.22    44.44
         170  |**************      72     232    20.00    64.44
         174  |**************      69     301    19.17    83.61
         178  |*******             35     336     9.72    93.33
         182  |****                19     355     5.28    98.61
         186  |*                    5     360     1.39   100.00
              |
              ----+---+---+--
                 20  40  60

                Frequency

                              SAS System                               111
                                           21:48 Monday, November 19, 2007

 SEX   KODUKAI                                  Cum.              Cum.
      Midpoint                          Freq    Freq  Percent  Percent
              |
 F         0  |******                     14      14     4.05     4.05
       20000  |**********                 25      39     7.23    11.27
       40000  |***********                27      66     7.80    19.08
       60000  |***********                27      93     7.80    26.88
       80000  |****                       10     103     2.89    29.77
      100000  |**                          5     108     1.45    31.21
      120000  |*                           3     111     0.87    32.08
      140000  |                            1     112     0.29    32.37
      160000  |*                           2     114     0.58    32.95
      180000  |                            1     115     0.29    33.24
              |
 M         0  |********************       50     165    14.45    47.69
       20000  |******************         45     210    13.01    60.69
       40000  |*********************      52     262    15.03    75.72
       60000  |***********                28     290     8.09    83.82
       80000  |*****                      12     302     3.47    87.28
      100000  |*******                    18     320     5.20    92.49
      120000  |***                         8     328     2.31    94.80
      140000  |*                           3     331     0.87    95.66
      160000  |******                     15     346     4.34   100.00
      180000  |                            0     346     0.00   100.00
              |
              ----+---+---+---+---+-
                 10  20  30  40  50

                    Frequency
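
The exclusion step at the top of les0801.sas can be tried in isolation. The following is only a minimal sketch: the data set name demo and the four inline records are hypothetical, and only the two delete conditions are taken from the lesson program.

```sas
/* Hypothetical inline data; only the two IF ... THEN DELETE lines  */
/* come from les0801.sas.                                           */
data demo;
  input sex $ kodukai;
  if kodukai >= 200000 then delete;         /* drop allowances of 200,000 yen or more */
  if sex ^= 'M' & sex ^= 'F' then delete;   /* drop records that are neither M nor F  */
  datalines;
F 30000
M 250000
X 10000
M 0
;
run;

proc print data=demo;
run;
```

With these four records, the second is dropped by the first condition and the third by the second condition, so the printout should contain the two remaining records. A record whose kodukai is missing is not dropped, because a missing value never satisfies kodukai>=200000.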
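 data seito07;
   infile 'seito.prn';
   input id $ sex $ kesseki $ univ $ koku $ suu1 $ suu2 $ tireki $ koumin $ rika $;

   if sex^='M' then delete;              /* male only */
   if kesseki^='0' then delete;          /* attendees only (kesseki = absence) */

   length area $ 12;                     /* reserve room so that longer values are not truncated */
   area="不明";                          /* default: unknown */
   if univ="早稲田大学" then area="東日本";   /* Waseda   -> eastern Japan */
   if univ="慶応大学"   then area="東日本";   /* Keio     -> eastern Japan */
   if univ="関西大学"   then area="西日本";   /* Kansai   -> western Japan */
   if univ="同志社大学" then area="西日本";   /* Doshisha -> western Japan */

   if tireki="世界史-0" then tireki="世界史"; /* collapse the variants of world history */
   if tireki="世界史-2" then tireki="世界史";
   if tireki="日本史-2" then tireki="日本史"; /* collapse the variants of Japanese history */
   if tireki="日本史-3" then tireki="日本史";
   ...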
[Example 4] To run several statements under one condition: enclose them in do ... end
   if tireki="世界史-0" then do;
     tireki="世界史";
     koumin=.;          /* set to missing; write koumin=" " if koumin is a character variable */
   end;
   ...
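
To see what the do ... end group changes, the following self-contained sketch may help. The variable names subject, score and note and the two inline records are made up for illustration and are not part of the lesson data.

```sas
/* Hypothetical data: subject, score and note are illustrative names only. */
data demo;
  length subject $ 12 note $ 8;
  input subject $ score;
  if subject = "history-0" then do;   /* both statements below run only for matching records */
    subject = "history";
    note    = "recoded";
  end;
  datalines;
history-0 70
math 80
;
run;

proc print data=demo;
run;
```

Without do ... end, only the statement immediately after then would depend on the condition, and the assignment to note would be executed for every record.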
[Comparison operators]
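
As a quick reminder, the comparison operators that SAS accepts in an IF statement, each with its mnemonic form, can be tried with the small sketch below. The variable x and the messages are illustrative only.

```sas
/* Each symbol has an equivalent mnemonic: = eq, ^= ne, > gt, < lt, >= ge, <= le */
data _null_;
  x = 5;
  if x =  5 then put "x = 5   (same as: x eq 5)";
  if x ^= 4 then put "x ^= 4  (same as: x ne 4)";
  if x >  4 then put "x > 4   (same as: x gt 4)";
  if x <  6 then put "x < 6   (same as: x lt 6)";
  if x >= 5 then put "x >= 5  (same as: x ge 5)";
  if x <= 5 then put "x <= 5  (same as: x le 5)";
run;
```

All six conditions hold for x = 5, so each put statement writes its message to the log. Conditions can be combined with & (and) and | (or), as in the sex^='M' & sex^='F' test of Lesson 8-1.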
 /* Lesson 8-2 */
 /*    File Name = les0802.sas   06/14/07   */

 data gakusei;
   infile 'all07be.prn' firstobs=2;
   input sex $ shintyou taijyuu kyoui jitaku $ kodukai carryer $ tsuuwa;

 proc print data=gakusei(obs=5);
 run;

 proc freq data=gakusei;               /* compute frequencies */
   tables sex jitaku carryer;          /* one variable at a time */
 run;

 proc freq data=gakusei;               /* compute frequencies */
   tables sex*jitaku;                  /* for pairs of variables */
   tables sex*carryer;
   tables jitaku*carryer;
 run;
                              SAS System                                 1
                                           21:48 Monday, November 19, 2007

 OBS   SEX   SHINTYOU   TAIJYUU   KYOUI   JITAKU   KODUKAI   CARRYER    TSUUWA

   1    F      145.0       38        .      J        10000                   .
   2    F      146.7       41       85      J        10000   Vodafone     6000
   3    F      148.0       42        .      J        50000                   .
   4    F      148.0       43       80      J        50000   DoCoMo       4000
   5    F      148.9        .        .      J        60000                   .

                              SAS System                                 2
                                           21:48 Monday, November 19, 2007

                                 Cumulative  Cumulative
 SEX        Frequency   Percent   Frequency     Percent
 ------------------------------------------------------
 F                128      34.0         128        34.0
 M                248      66.0         376       100.0

                  Frequency Missing = 5

                                 Cumulative  Cumulative
 JITAKU     Frequency   Percent   Frequency     Percent
 ------------------------------------------------------
 G                121      37.2         121        37.2
 J                204      62.8         325       100.0

                  Frequency Missing = 56

                                 Cumulative  Cumulative
 CARRYER    Frequency   Percent   Frequency     Percent
 ------------------------------------------------------
 DDIp               2       1.3           2         1.3
 DoCoMo            60      39.5          62        40.8
 J-PHONE           10       6.6          72        47.4
 KDDI               1       0.7          73        48.0
 No                 5       3.3          78        51.3
 Vodafone          20      13.2          98        64.5
 Willcom            1       0.7          99        65.1
 au                41      27.0         140        92.1
 au+willc           1       0.7         141        92.8
 docomo             5       3.3         146        96.1
 docomo+w           1       0.7         147        96.7
 softbank           4       2.6         151        99.3
 vodafone           1       0.7         152       100.0

                  Frequency Missing = 229

                              SAS System                                 6
                                           21:48 Monday, November 19, 2007

 TABLE OF SEX BY JITAKU

 SEX       JITAKU

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  |G       |J       |  Total
 ---------+--------+--------+
 F        |     36 |     73 |    109
          |  11.15 |  22.60 |  33.75
          |  33.03 |  66.97 |
          |  30.00 |  35.96 |
 ---------+--------+--------+
 M        |     84 |    130 |    214
          |  26.01 |  40.25 |  66.25
          |  39.25 |  60.75 |
          |  70.00 |  64.04 |
 ---------+--------+--------+
 Total         120      203      323
             37.15    62.85   100.00

 Frequency Missing = 58

                              SAS System                                 9
                                           21:48 Monday, November 19, 2007

 TABLE OF SEX BY CARRYER

 SEX       CARRYER

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  |DDIp    |DoCoMo  |J-PHONE |KDDI    |No      |  Total
 ---------+--------+--------+--------+--------+--------+
 F        |      1 |     25 |      4 |      0 |      1 |     60
          |   0.66 |  16.56 |   2.65 |   0.00 |   0.66 |  39.74
          |   1.67 |  41.67 |   6.67 |   0.00 |   1.67 |
          |  50.00 |  41.67 |  44.44 |   0.00 |  20.00 |
 ---------+--------+--------+--------+--------+--------+
 M        |      1 |     35 |      5 |      1 |      4 |     91
          |   0.66 |  23.18 |   3.31 |   0.66 |   2.65 |  60.26
          |   1.10 |  38.46 |   5.49 |   1.10 |   4.40 |
          |  50.00 |  58.33 |  55.56 | 100.00 |  80.00 |
 ---------+--------+--------+--------+--------+--------+
 Total           2       60        9        1        5      151
              1.32    39.74     5.96     0.66     3.31   100.00
 (Continued)

                              SAS System                                11
                                           21:48 Monday, November 19, 2007

 TABLE OF SEX BY CARRYER

 SEX       CARRYER

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  |Vodafone|Willcom |au      |au+willc|docomo  |  Total
 ---------+--------+--------+--------+--------+--------+
 F        |      9 |      1 |     14 |      1 |      1 |     60
          |   5.96 |   0.66 |   9.27 |   0.66 |   0.66 |  39.74
          |  15.00 |   1.67 |  23.33 |   1.67 |   1.67 |
          |  45.00 | 100.00 |  34.15 | 100.00 |  20.00 |
 ---------+--------+--------+--------+--------+--------+
 M        |     11 |      0 |     27 |      0 |      4 |     91
          |   7.28 |   0.00 |  17.88 |   0.00 |   2.65 |  60.26
          |  12.09 |   0.00 |  29.67 |   0.00 |   4.40 |
          |  55.00 |   0.00 |  65.85 |   0.00 |  80.00 |
 ---------+--------+--------+--------+--------+--------+
 Total          20        1       41        1        5      151
             13.25     0.66    27.15     0.66     3.31   100.00
 (Continued)

                              SAS System                                13
                                           21:48 Monday, November 19, 2007

 TABLE OF SEX BY CARRYER

 SEX       CARRYER

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  |docomo+w|softbank|vodafone|  Total
 ---------+--------+--------+--------+
 F        |      0 |      3 |      0 |     60
          |   0.00 |   1.99 |   0.00 |  39.74
          |   0.00 |   5.00 |   0.00 |
          |   0.00 |  75.00 |   0.00 |
 ---------+--------+--------+--------+
 M        |      1 |      1 |      1 |     91
          |   0.66 |   0.66 |   0.66 |  60.26
          |   1.10 |   1.10 |   1.10 |
          | 100.00 |  25.00 | 100.00 |
 ---------+--------+--------+--------+
 Total           1        4        1      151
              0.66     2.65     0.66   100.00

 Frequency Missing = 230

                              SAS System                                16
                                           21:48 Monday, November 19, 2007

 TABLE OF JITAKU BY CARRYER

 JITAKU    CARRYER

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  |DDIp    |DoCoMo  |J-PHONE |KDDI    |No      |  Total
 ---------+--------+--------+--------+--------+--------+
 G        |      1 |     21 |      4 |      1 |      0 |     47
          |   0.78 |  16.28 |   3.10 |   0.78 |   0.00 |  36.43
          |   2.13 |  44.68 |   8.51 |   2.13 |   0.00 |
          | 100.00 |  41.18 |  44.44 | 100.00 |   0.00 |
 ---------+--------+--------+--------+--------+--------+
 J        |      0 |     30 |      5 |      0 |      4 |     82
          |   0.00 |  23.26 |   3.88 |   0.00 |   3.10 |  63.57
          |   0.00 |  36.59 |   6.10 |   0.00 |   4.88 |
          |   0.00 |  58.82 |  55.56 |   0.00 | 100.00 |
 ---------+--------+--------+--------+--------+--------+
 Total           1       51        9        1        4      129
              0.78    39.53     6.98     0.78     3.10   100.00
 (Continued)

                              SAS System                                18
                                           21:48 Monday, November 19, 2007

 TABLE OF JITAKU BY CARRYER

 JITAKU    CARRYER

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  |Vodafone|Willcom |au      |au+willc|docomo  |  Total
 ---------+--------+--------+--------+--------+--------+
 G        |      4 |      0 |     12 |      0 |      2 |     47
          |   3.10 |   0.00 |   9.30 |   0.00 |   1.55 |  36.43
          |   8.51 |   0.00 |  25.53 |   0.00 |   4.26 |
          |  23.53 |    .   |  34.29 |   0.00 |  40.00 |
 ---------+--------+--------+--------+--------+--------+
 J        |     13 |      0 |     23 |      1 |      3 |     82
          |  10.08 |   0.00 |  17.83 |   0.78 |   2.33 |  63.57
          |  15.85 |   0.00 |  28.05 |   1.22 |   3.66 |
          |  76.47 |    .   |  65.71 | 100.00 |  60.00 |
 ---------+--------+--------+--------+--------+--------+
 Total          17        0       35        1        5      129
             13.18     0.00    27.13     0.78     3.88   100.00
 (Continued)

                              SAS System                                20
                                           21:48 Monday, November 19, 2007

 TABLE OF JITAKU BY CARRYER

 JITAKU    CARRYER

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  |docomo+w|softbank|vodafone|  Total
 ---------+--------+--------+--------+
 G        |      1 |      1 |      0 |     47
          |   0.78 |   0.78 |   0.00 |  36.43
          |   2.13 |   2.13 |   0.00 |
          | 100.00 |  33.33 |   0.00 |
 ---------+--------+--------+--------+
 J        |      0 |      2 |      1 |     82
          |   0.00 |   1.55 |   0.78 |  63.57
          |   0.00 |   2.44 |   1.22 |
          |   0.00 |  66.67 | 100.00 |
 ---------+--------+--------+--------+
 Total           1        3        1      129
              0.78     2.33     0.78   100.00

 Frequency Missing = 252
 (beginning omitted)

   if carryer="au+willc" then carryer="au+Willc";   /* unify the spelling of each carrier name */
   if carryer="docomo"   then carryer="DoCoMo";
   if carryer="docomo+w" then carryer="DoCoMo+W";
   if carryer="vodafone" then carryer="Vodafone";

 (rest omitted)
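
Listing every spelling variant by hand works, but when the variants differ only in letter case, comparing an upper-cased copy is one possible alternative. The sketch below is not the method used in the lesson program; the data set demo and its four inline records are hypothetical, and it relies only on the standard upcase function.

```sas
/* Alternative sketch: match on an upper-cased copy so that any mix of */
/* upper and lower case is mapped to one spelling. Data are made up.   */
data demo;
  length carryer $ 8;
  input carryer $;
  if upcase(carryer) = "DOCOMO"   then carryer = "DoCoMo";
  if upcase(carryer) = "VODAFONE" then carryer = "Vodafone";
  datalines;
docomo
DoCoMo
vodafone
au
;
run;

proc freq data=demo;
  tables carryer;
run;
```

After the recoding, the two DoCoMo spellings fall into a single category in the frequency table, while values that match neither test (here au) are left untouched.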
 (beginning omitted)

 proc freq data=gakusei order=freq;    /* list in descending order of frequency */
   tables sex jitaku carryer;
 run;

 proc freq data=gakusei order=freq;    /* list in descending order of frequency */
   tables sex*jitaku;
   tables sex*carryer;
   tables jitaku*carryer;
 run;

 (rest omitted)
 /* Lesson 8-5 */
 /*    File Name = les0805.sas   06/14/07   */

 data gakusei;
   infile 'all07be.prn' firstobs=2;
   input sex $ shintyou taijyuu kyoui jitaku $ kodukai carryer $ tsuuwa;

 proc format;                            /* define classes; clshint stands for "class shintyou" */
   value clshint low-<150=' -149'        /* class definition 1 */
                 150-<160='150-159'      /*                  2 */
                 160-<170='160-169'      /*                  3 */
                 170-<180='170-179'      /*                  4 */
                 180-high='180- '        /*                  5 */
                 other   ='missing';     /*                  6 */
 run;

 proc print data=gakusei(obs=5);
 run;

 proc freq data=gakusei;                 /* compute frequencies */
   tables shintyou;                      /* one variable at a time */
   format shintyou clshint.;             /* group the continuous variable into classes */
 run;

 proc freq data=gakusei;                 /* compute frequencies */
   tables sex*shintyou;                  /* for a pair of variables */
   format shintyou clshint.;             /* group the continuous variable into classes */
 run;

 proc sort data=gakusei;                 /* the same result with the method used so far */
   by sex;
 run;
 proc freq data=gakusei;
   tables shintyou;
   format shintyou clshint.;             /* group the continuous variable into classes */
   by sex;                               /* separately for each sex */
 run;
                              SAS System                                 2
                                           21:48 Monday, November 19, 2007

                                 Cumulative  Cumulative
 SHINTYOU   Frequency   Percent   Frequency     Percent
 ------------------------------------------------------
 -149               6       1.6           6         1.6
 150-159           56      15.3          62        16.9
 160-169          126      34.4         188        51.4
 170-179          154      42.1         342        93.4
 180-              24       6.6         366       100.0

                  Frequency Missing = 15

                              SAS System                                 3
                                           21:48 Monday, November 19, 2007

 TABLE OF SEX BY SHINTYOU

 SEX       SHINTYOU

 Frequency|
 Percent  |
 Row Pct  |
 Col Pct  | -149   |150-159 |160-169 |170-179 |180-    |  Total
 ---------+--------+--------+--------+--------+--------+
 F        |      6 |     54 |     59 |      2 |      0 |    121
          |   1.64 |  14.79 |  16.16 |   0.55 |   0.00 |  33.15
          |   4.96 |  44.63 |  48.76 |   1.65 |   0.00 |
          | 100.00 |  96.43 |  47.20 |   1.30 |   0.00 |
 ---------+--------+--------+--------+--------+--------+
 M        |      0 |      2 |     66 |    152 |     24 |    244
          |   0.00 |   0.55 |  18.08 |  41.64 |   6.58 |  66.85
          |   0.00 |   0.82 |  27.05 |  62.30 |   9.84 |
          |   0.00 |   3.57 |  52.80 |  98.70 | 100.00 |
 ---------+--------+--------+--------+--------+--------+
 Total           6       56      125      154       24      365
              1.64    15.34    34.25    42.19     6.58   100.00

 Frequency Missing = 16

                              SAS System                                 6
                                           21:48 Monday, November 19, 2007

 ------------------------------- SEX=' ' --------------------------------

                                 Cumulative  Cumulative
 SHINTYOU   Frequency   Percent   Frequency     Percent
 ------------------------------------------------------
 160-169            1     100.0           1       100.0

                  Frequency Missing = 4

                              SAS System                                 7
                                           21:48 Monday, November 19, 2007

 -------------------------------- SEX=F ---------------------------------

                                 Cumulative  Cumulative
 SHINTYOU   Frequency   Percent   Frequency     Percent
 ------------------------------------------------------
 -149               6       5.0           6         5.0
 150-159           54      44.6          60        49.6
 160-169           59      48.8         119        98.3
 170-179            2       1.7         121       100.0

                  Frequency Missing = 7

                              SAS System                                 8
                                           21:48 Monday, November 19, 2007

 -------------------------------- SEX=M ---------------------------------

                                 Cumulative  Cumulative
 SHINTYOU   Frequency   Percent   Frequency     Percent
 ------------------------------------------------------
 150-159            2       0.8           2         0.8
 160-169           66      27.0          68        27.9
 170-179          152      62.3         220        90.2
 180-              24       9.8         244       100.0

                  Frequency Missing = 4
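
In les0805.sas the format is attached only for the duration of each procedure. If the class label is needed as a variable of its own, one option is to store it with the put function. This is a sketch under assumptions: the data set demo, the variable hclass and the four inline heights are hypothetical, and the labels are written without padding blanks.

```sas
/* Same class boundaries as clshint in les0805.sas; labels unpadded here. */
proc format;
  value clshint low-<150 = '-149'
                150-<160 = '150-159'
                160-<170 = '160-169'
                170-<180 = '170-179'
                180-high = '180-'
                other    = 'missing';
run;

/* hclass is a hypothetical variable holding the formatted class label. */
data demo;
  input shintyou;
  length hclass $ 7;
  hclass = put(shintyou, clshint.);   /* convert the height to its class label */
  datalines;
145.0
158.9
172.3
.
;
run;

proc print data=demo;
run;
```

A missing height is not covered by the low range of a numeric format, so it falls through to other and is labelled missing. That is the reason for the sixth line of the value statement.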
 data mon2007;
   /* truncover (or, alternatively, missover) handles short records; */
   /* dsd treats consecutive delimiters as missing values            */
   infile 'd:\home\mon05d.csv' dlm=',' dsd firstobs=2 truncover;
   input No $ Univ : $30. SName : $40. Faculty : $50. Dept : $50.
         Center1 : $8. Center2 : $8. Sel1 : $8. Sel2 : $8.
         Book1 : $10. Book2 : $10. Vol0 VolS VolT ZenKou $
         ScoreS ScoreT KoKouSi ;
 data mon2007;
   infile 'd:\home\mon05e.txt' dlm='09'x firstobs=2 truncover;

 data math;
   infile 'foo.dat' lrecl=230;

 data math;
   infile 'foo.dat' lrecl=230 truncover;

   input kamoku   $   2
         kesseki  $   3
         k_code   $  10-11
         t_score     12-14
         s_scor01   103-104
         s_scor02   105-106
         s_scor03   107-108
         s_scor04   109-110
   ;

 data math;
   infile 'foo.dat' firstobs=4;