
6 Tables

The alternative to using graphics is to summarize your data in tabular form. Broadly speaking, if you want to convey detail use a table, and if you want to show effects then use graphics. You are more likely to want to use a table to summarize data when your explanatory variables are categorical (such as people's names, or different commodities) than when they are continuous (in which case a scatterplot is likely to be more informative; see p. 189). There are two very important functions that you need to distinguish:

• table for counting things;
• tapply for averaging things, and applying other functions across factor levels.

6.1 Tables of counts

The table function is perhaps the most useful of all the simple vector functions, because it does so much work behind the scenes. We have a vector of objects (they could be numbers or character strings) and we want to know how many of each is present in the vector. Here are 1000 integers from a Poisson distribution with mean 0.6:

    counts <- rpois(1000, 0.6)

We want to count up all of the zeros, ones, twos, and so on. A big task, but here is the table function in action:

    table(counts)

    counts
      0   1   2   3   4   5
    539 325 110  24   1   1

There were 539 zeros, 325 ones, 110 twos, 24 threes, 1 four, 1 five and nothing larger than 5 (your counts will differ slightly, because rpois draws random numbers). That is a lot of work (imagine tallying them for yourself). The function works for characters as well as for numbers, and for …

The R Book, Second Edition. Michael J. Crawley. 2013 John Wiley

… this variable has five levels: vulgaris, kochii, splendens, viridis and knowlesii.
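To illustrate the point that table works on character data as well as on numbers, here is a minimal sketch using a small, made-up character vector (species, all_levels and the values in them are hypothetical, not the parasite data). Converting the vector to a factor with an explicit levels argument makes table report every declared level, including levels that never occur.

```r
# Hypothetical character vector (illustrative values, not the parasite data)
species <- c("vulgaris", "kochii", "vulgaris", "viridis", "vulgaris")

# On a character vector, table() reports only the values that are present,
# sorted alphabetically: kochii 1, viridis 1, vulgaris 3
counts_chr <- table(species)
print(counts_chr)

# Declaring all five levels makes table() report zero counts as well
all_levels <- c("knowlesii", "kochii", "splendens", "viridis", "vulgaris")
counts_fac <- table(factor(species, levels = all_levels))
print(counts_fac)

# Individual counts can be extracted by name
counts_fac["vulgaris"]
```

Declaring the levels is a convenient way to get count vectors of the same length from different samples, because every level appears whether or not it was observed.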
Note that there was no header row in the data file, so the variable name parasite had to be added subsequently, using names:

    data <- read.table("c:\\temp\\parasites.txt", stringsAsFactors = TRUE)
    names(data) <- "parasite"
    attach(data)
    head(data)

       parasite
    1  vulgaris
    2 splendens
    3 knowlesii
    4  vulgaris
    5 knowlesii
    6   viridis

    levels(parasite)

    [1] "knowlesii" "kochii"    "splendens" "viridis"   "vulgaris"

(In R 4.0.0 and later, read.table no longer converts character columns to factors by default, hence the stringsAsFactors = TRUE argument; without it, levels(parasite) returns NULL.)

In our modelling we want to create a two-level dummy variable (present or absent) for each parasite species (in five extra columns), so that we can ask questions such as whether the mean value of the response variable is significantly different in cases where each parasite was present and when it was absent. So for the first row of the dataframe, we want vulgaris = TRUE, knowlesii = FALSE, kochii = FALSE, splendens = FALSE and viridis = FALSE. The long-winded way of doing this is to create a new factor for each species separately:

    vulgaris <- factor(1*(parasite == "vulgaris"))
    kochii <- factor(1*(parasite == "kochii"))
    table(vulgaris)

    vulgaris
     0  1
    99 52

    table(kochii)

    kochii
      0   1
    134  17

and so on, with 1 for TRUE (meaning present) and 0 for FALSE (meaning absent). This is how easy it is to do with model.matrix:

    model.matrix(~parasite-1)

        parasiteknowlesii parasitekochii parasitesplendens parasiteviridis parasitevulgaris
    1                   0              0                 0               0                1
    2                   0              0                 1               0                0
    3                   1              0                 0               0                0
    4                   0              0                 0               0                1
    5                   1              0                 0               0                0
    6                   0              0                 0               1                0
    . etc. down to .
    147                 1              0                 0               0                0
    148                 0              0                 0               1                0
    149                 0              0                 0               0                1
    150                 0              1                 0               0                0
    151                 0              0                 1               0                0
    attr(,"assign")
    [1] 1 1 1 1 1
    attr(,"contrasts")
    attr(,"contrasts")$parasite
    [1] "contr.treatment"

The -1 in the model formula ensures that we create a dummy variable for each of the five parasite species (technically, it suppresses the creation of an intercept). Now we can join these five columns of dummy variables to the dataframe containing the response variable and the other explanatory variables. Suppose that these are held in a dataframe called original.frame.
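The long-winded and model.matrix approaches can be checked against each other on a small example. This is a minimal sketch assuming a made-up factor: the values in parasite here, and the toy.frame dataframe with its response y, are hypothetical stand-ins for the data file. It builds one 0/1 indicator column per level both ways, confirms that they agree, and binds the model.matrix columns to a dataframe.

```r
# Hypothetical factor standing in for the parasite column (illustrative values only)
parasite <- factor(c("vulgaris", "splendens", "knowlesii",
                     "vulgaris", "kochii", "viridis"))

# Long-winded way: one 0/1 indicator column per level, built by comparison
long_way <- sapply(levels(parasite), function(lev) 1 * (parasite == lev))

# model.matrix way: '- 1' drops the intercept, so every level gets its own column
dummies <- model.matrix(~ parasite - 1)

# The two approaches give the same 0/1 values, column for column
stopifnot(all(long_way == dummies))
colnames(dummies)

# Each observation has exactly one level, so every row sums to 1
rowSums(dummies)

# Bind the indicator columns to a hypothetical dataframe with a made-up response y
toy.frame <- data.frame(y = c(2.1, 3.4, 1.8, 2.9, 3.1, 2.2))
toy.new <- data.frame(toy.frame, dummies)
names(toy.new)
```

Because model.matrix names each column by pasting the variable name onto the level (parasiteknowlesii and so on), the new columns are easy to identify once joined. The same joining step applies to original.frame.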
We just join the new columns to it:

    new.frame <- data.frame(original.frame, model.matrix(~parasite-1))
    attach(new.frame)

after which we can use variable names like parasiteknowlesii in statistical modelling.

6.9 Comparing table and tabulate

You will often want to count how many times different values are represented in a vector. This simple example illustrates the difference between the two functions. Here is table in action:

    table(c(2,2,2,7,7,11))

     2  7 11
     3  2  1

It produces a name for each distinct value in the vector (2, 7, 11), and counts only those values that are present (e.g. there are no zeros or ones in the output vector). The tabulate function counts all of the integers (converting non-integer values with as.integer, which truncates them towards zero), starting at 1 and ending at the maximum (11 in this case), putting a zero in the resulting vector for every missing integer, like this:

    tabulate(c(2,2,2,7,7,11))

     [1] 0 3 0 0 0 0 2 0 0 0 1

Because there are no 1s in our example, a count of zero is returned for the first element. There are three 2s but then a long gap to two 7s, then another gap to the maximum 11. It is important that you understand that tabulate will ignore negative numbers and zeros without warning:

    tabulate(c(2,0,-3,2,2,7,-1,0,0,7,11))

     [1] 0 3 0 0 0 0 2 0 0 0 1

For most applications, table is much more useful than tabulate, but there are occasions when you want the zero counts to be retained. The commonest case is where you are generating a set of vectors, and you want all the vectors to be the same length (e.g. so that you can bind them to a dataframe). Suppose, for instance, that you want to make a dataframe containing three different realizations of