Sets: union, intersect and setdiff
There are three essential functions for manipulating sets. The principles are easy to see if we work with an example of two sets:
setA <- c("a", "b", "c", "d", "e")
setB <- c("d", "e", "f", "g")
Make a mental note of what the two sets have in common, and what is unique to each.
The union of two sets is everything in the two sets taken together, with elements that are common to both sets counted only once:
union(setA,setB)
[1] "a" "b" "c" "d" "e" "f" "g"
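The "counted only once" behaviour also applies to repeats within a single vector. A minimal sketch, assuming two hypothetical vectors x and y that are not part of the example above:

```r
# x and y are illustrative vectors, not taken from the text
x <- c("a", "b", "b", "c")
y <- c("c", "c", "d")

# union() concatenates the two vectors and drops repeated values
u <- union(x, y)
print(u)
```

Each distinct value appears once in u, in order of first appearance.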
The intersection of two sets is the material that they have in common:
intersect(setA,setB)
[1] "d" "e"
Note, however, that the difference between two sets is order-dependent. It is the material that is in the first named set, that is not in the second named set. Thus setdiff(A,B) gives a different answer than setdiff(B,A). For our example,
setdiff(setA,setB)
[1] "a" "b" "c"
setdiff(setB,setA)
[1] "f" "g"
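If an order-independent difference is wanted, the two one-sided differences can be combined. The sketch below is one way to do this; the name sym_diff is a hypothetical variable introduced here for illustration, not a function from the text:

```r
setA <- c("a", "b", "c", "d", "e")
setB <- c("d", "e", "f", "g")

# elements that belong to exactly one of the two sets
sym_diff <- union(setdiff(setA, setB), setdiff(setB, setA))
print(sym_diff)

# swapping the arguments changes the order of the elements, not the members
sym_diff_rev <- union(setdiff(setB, setA), setdiff(setA, setB))
print(setequal(sym_diff, sym_diff_rev))
```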
Thus, it should be the case that setdiff(setA,setB) plus intersect(setA,setB) plus setdiff(setB,setA) is the same as the union of the two sets. Let's check:
all(c(setdiff(setA,setB),intersect(setA,setB),setdiff(setB,setA)) == union(setA,setB))
[1] TRUE
There is also a built-in function, setequal, for testing whether two sets are equal regardless of the order of their elements:
setequal(c(setdiff(setA,setB),intersect(setA,setB),setdiff(setB,setA)), union(setA,setB))
[1] TRUE
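The two checks are not interchangeable: == compares element by element, so it depends on the pieces arriving in the same order as the union, whereas setequal looks only at membership. A small sketch, assuming we assemble the same three pieces in a different order (the names pieces, elementwise and as_sets are illustrative):

```r
setA <- c("a", "b", "c", "d", "e")
setB <- c("d", "e", "f", "g")

# same three pieces as before, but starting with what is unique to setB
pieces <- c(setdiff(setB, setA), intersect(setA, setB), setdiff(setA, setB))

# position-by-position comparison fails because the order differs
elementwise <- all(pieces == union(setA, setB))

# membership comparison still succeeds
as_sets <- setequal(pieces, union(setA, setB))

print(elementwise)
print(as_sets)
```

Here elementwise is FALSE and as_sets is TRUE, which suggests setequal is the safer test when order is not guaranteed.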
You can use %in% for comparing sets. The result is a logical vector whose length matches that of the vector on the left:
setA %in% setB
[1] FALSE FALSE FALSE TRUE TRUE
setB ...
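Because %in% returns a logical vector, it can be used as a subscript to pick out elements. The following is a sketch under that assumption, with common and only_A as illustrative names; for these duplicate-free vectors the results agree with intersect and setdiff:

```r
setA <- c("a", "b", "c", "d", "e")
setB <- c("d", "e", "f", "g")

# keep the elements of setA that also occur in setB
common <- setA[setA %in% setB]

# keep the elements of setA that do not occur in setB
only_A <- setA[!(setA %in% setB)]

print(common)
print(only_A)
```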