16.6 Factorial Experiments

are as many parameters as cases. It's important to understand the coding here, so look at the X-matrix.

> model.matrix(g)
  (Intercept) h d l b j f n a i e m c k g o
1           1 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
...etc...

We see that + is coded as 0 and - is coded as 1. This unnatural ordering arises from the order of the two characters in the ASCII alphabet.

We don't have any degrees of freedom left for error, so we can't make the usual F-tests. We need a different method. Suppose there were no significant effects and the errors were normally distributed. The estimated effects would then just be linear combinations of the errors and hence normal. We now make a normal quantile plot of the main effects, with the idea that outliers represent significant effects. The qqnorm() function is not suitable because we want to label the points.

> coef <- g$coef[-1]
> i <- order(coef)
> plot(qnorm(1:15/16),coef[i],type="n",xlab="Normal Quantiles", ylab="Effects")
> text(qnorm(1:15/16),coef[i],names(coef)[i])

[Figure 16.10: Fractional Factorial analysis. Left panel: Effects against Normal Quantiles. Right panel: Effects against Half-Normal Quantiles.]

See Figure 16.10. Notice that e and possibly g are extreme. Since the e effect is negative, the + level of e increases the response. Since shrinkage is a bad thing, increasing the response is not good, so we'd prefer whatever wire braid type corresponds to the - level of e. The same reasoning for g leads us to expect that a larger g (assuming that is the + level) would decrease shrinkage.

A half-normal plot is better for detecting extreme points. This plots the sorted absolute values against $\Phi^{-1}\left(\frac{n+i}{2n+1}\right)$ for $i = 1, \ldots, n$. Thus it compares the absolute values of the data against the upper half of a normal distribution. We don't particularly care if the coefficients are not normally distributed; it's just the extreme cases we want to detect. Because the half-normal plot folds over the ends of a QQ plot, it doubles our resolution for the detection of outliers.
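The half-normal plotting positions described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the vector `effects` and its values are made up for the example and are not the coefficients of the fitted model. It shows that the positions (n + i)/(2n + 1) all lie above 0.5, so the quantiles come from the upper half of the normal distribution.

```r
# Hypothetical effect estimates, for illustration only (not from the fitted model)
effects <- c(a = 0.02, b = -0.05, c = 0.01, d = -0.03, e = -0.20)

n <- length(effects)
abs_eff <- sort(abs(effects))             # sorted absolute values; names are kept

# Plotting positions (n + i) / (2n + 1) for i = 1, ..., n
probs <- (n + seq_len(n)) / (2 * n + 1)
hn_q <- qnorm(probs)                      # upper-half normal quantiles

# Labelled half-normal plot: extreme effects appear at the top right
plot(hn_q, abs_eff, type = "n",
     xlab = "Half-Normal Quantiles", ylab = "|Effects|")
text(hn_q, abs_eff, names(abs_eff))
```

With n = 15 the same positions are 16:30/31, which is the expression used in the plotting commands of the text.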
> coef <- abs(coef)
> i <- order(coef)
> plot(qnorm(16:30/31),coef[i],type="n",xlab="Half-Normal Quantiles", ylab="Effects")
> text(qnorm(16:30/31),coef[i],names(coef)[i])

We might now conduct another experiment focusing on the effect of e and g.

Appendix A Recommended Books

A.1 Books on R

There are currently no books written specifically for R, although several guides can be downloaded from the R web site. R is very similar to S-PLUS, so most material on S-PLUS applies immediately to R. I highly recommend Venables and Ripley (1999). Alternative introductory books are Spector (1994) and Krause and Olson (2000). You may also find Becker, Chambers, and Wilks (1998) and Chambers and Hastie (1991) useful references to the S language. Ripley and Venables (2000) is a more advanced text on programming in S or R.

A.2 Books on Regression and Anova

There are many books on regression analysis. Weisberg (1985) is a very readable book, while Sen and Srivastava (1990) contains more theoretical content. Draper and Smith (1998) is another well-known book. One popular textbook is Kutner, Nachtsheim, Wasserman, and Neter (1996). This book has everything spelled out in great detail and will certainly strengthen your biceps (1400 pages) if not your knowledge of regression.

Appendix B R functions and data

This book uses some functions and data that are not part of base R. You may wish to download these functions from the R web site. The additional packages used are

MASS leaps xgobi ellipse nlme

The MASS functions are part of the VR package that comes with the book by Venables and Ripley (1999). The xgobi data visualization application will also need to be installed. This is an X-windows application, but with some effort it can be made to run under Windows. See the documentation that comes with the package for details. This is not essential, so don't sweat it if you can't install it.
In addition, you will need the splines, mva and lqs packages, but these come with the basic R installation so no extra work is necessary.

I have packaged the data and functions that I have used in this book as an R package that you may obtain from my web site www.stat.lsa.umich.edu/~faraway. The functions available are

halfnorm    Half normal plot
Cpplot      Cp plot
qqnorml     Case-labeled Q-Q plot
maxadjr     Models with maximum adjusted R^2
vif         Variance Inflation factors
prplot      Partial residual plot

These are very simple functions and no documentation has been written for them. In addition the following datasets are used:

breaking     Breaking strengths of material by day, supplier, operator
cathedral    Cathedral nave heights and lengths in England
chicago      Chicago insurance redlining
chiczip      Chicago zip codes north/south
chmiss       Chicago data with some missing values
coagulation  Blood coagulation times by diet
corrosion    Corrosion loss in Cu-Ni alloys
eco          Ecological regression example
gala         Species diversity on the Galapagos Islands
odor         Odor of chemical by production settings
penicillin   Penicillin yields by block and treatment
rabbit       Rabbit weight gain by diet and litter
rats         Rat survival times by treatment and poison
savings      Savings rates in 50 countries
speedo       Speedometer cable shrinkage
star         Star light intensities and temperatures
strongx      Strong interaction experiment data
twins        Twin IQs from Burt

Again, no documentation has been written other than that found in the text. Where add-on libraries are needed in the text, you will find the appropriate library() command. However, I have assumed that the faraway library is always loaded.

Appendix C Quick introduction to R

C.1 Reading the data in

The first step is to read the data in. You can use the read.table() or scan() function to read data in from outside R. You can also use the data() function to access data already available within R.

> data(stackloss)
> stackloss
   Air.Flow Water.Temp Acid.Conc. stack.loss
1        80         27         89         42
2        80         27         88         37
... stuff deleted ...
21       70         20         91         15

Type the following to see the documentation for this dataset:

> help(stackloss)

We can check the dimension of the data:

> dim(stackloss)
[1] 21 4

C.2 Numerical Summaries

One easy way to get the basic numerical summaries is:

> summary(stackloss)
    Air.Flow      Water.Temp     Acid.Conc.     stack.loss
 Min.   :50.0   Min.   :17.0   Min.   :72.0   Min.   : 7.0
 1st Qu.:56.0   1st Qu.:18.0   1st Qu.:82.0   1st Qu.:11.0
 Median :58.0   Median :20.0   Median :87.0   Median :15.0
 Mean   :60.4   Mean   :21.1   Mean   :86.3   Mean   :17.5
 3rd Qu.:62.0   3rd Qu.:24.0   3rd Qu.:89.0   3rd Qu.:19.0
 Max.   :80.0   Max.   :27.0   Max.   :93.0   Max.   :42.0

We can also compute these numbers separately:

> stackloss$Air.Flow
 [1] 80 80 75 62 62 62 62 62 58 58 58 58 58 58 50 50 50 50 50 56 70
> mean(stackloss$Air.Flow)
[1] 60.429
> median(stackloss$Air.Flow)
[1] 58
> range(stackloss$Air.Flow)
[1] 50 80
> quantile(stackloss$Air.Flow)
  0%  25%  50%  75% 100%
  50   56   58   62   80

We can get the variance and sd:

> var(stackloss$Air.Flow)
[1] 84.057
> sqrt(var(stackloss$Air.Flow))
[1] 9.1683

We can write a function to compute sd's:

> sd <- function(x) sqrt(var(x))
> sd(stackloss$Air.Flow)
[1] 9.1683

We might also want the correlations:

> cor(stackloss)
           Air.Flow Water.Temp Acid.Conc. stack.loss
Air.Flow    1.00000    0.78185    0.50014    0.91966
Water.Temp  0.78185    1.00000    0.39094    0.87550
Acid.Conc.  0.50014    0.39094    1.00000    0.39983
stack.loss  0.91966    0.87550    0.39983    1.00000

Another numerical summary with a graphical element is the stem plot:

> stem(stackloss$Air.Flow)

  The decimal point is 1 digit(s) to the right of the |

  5 | 000006888888
  6 | 22222
  7 | 05
  8 | 00

C.3 Graphical Summaries