Sets: union, intersect and setdiff

There are three essential functions for manipulating sets. The principles are easy to see if we work with an example of two sets:

setA<-c("a", "b", "c", "d", "e")
setB<-c("d", "e", "f", "g")

Make a mental note of what the two sets have in common, and what is unique to each.

The union of two sets is everything in the two sets taken together, with any element that is common to both sets counted only once:

union(setA,setB)

[1] "a"  "b"  "c"  "d"  "e"  "f"  "g"
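A useful way to think about union (a sketch, not a claim about how R implements it internally) is to concatenate the two vectors and then drop repeats with unique. The helper name my_union below is hypothetical:

```r
setA<-c("a", "b", "c", "d", "e")
setB<-c("d", "e", "f", "g")

# Hypothetical helper: join the two vectors, then keep only the
# first occurrence of each element
my_union<-function(x,y) unique(c(x,y))

my_union(setA,setB)
identical(my_union(setA,setB), union(setA,setB))   # TRUE
```

Because of the unique step, repeats inside a single argument are dropped as well, so union(c("a","a","b"), c("b","c")) gives "a" "b" "c".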

The intersection of two sets is the material that they have in common:

intersect(setA,setB)

[1] "d" "e"
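In the same spirit, the intersection can be sketched as the elements of the first vector that also occur in the second, with repeats removed. This is an illustration using %in% (introduced below), and my_intersect is a hypothetical name:

```r
setA<-c("a", "b", "c", "d", "e")
setB<-c("d", "e", "f", "g")

# Hypothetical helper: keep elements of x that appear somewhere in y,
# then remove any repeats
my_intersect<-function(x,y) unique(x[x %in% y])

my_intersect(setA,setB)                                   # "d" "e"
identical(my_intersect(setA,setB), intersect(setA,setB))  # TRUE
```

Unlike the set difference, the intersection contains the same elements whichever set is named first.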

Note, however, that the difference between two sets is order-dependent: it is the material that is in the first-named set but not in the second-named set. Thus setdiff(A,B) gives a different answer from setdiff(B,A). For our example:

setdiff(setA,setB)

[1] "a" "b" "c"

setdiff(setB,setA)

[1] "f" "g"
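The order dependence is easy to see in a sketch that keeps the elements of the first vector which do not appear in the second (my_setdiff is a hypothetical name, used only for illustration). Swapping the arguments changes which vector the elements are taken from:

```r
setA<-c("a", "b", "c", "d", "e")
setB<-c("d", "e", "f", "g")

# Hypothetical helper: keep elements of x that are absent from y,
# then remove any repeats
my_setdiff<-function(x,y) unique(x[!(x %in% y)])

my_setdiff(setA,setB)   # "a" "b" "c"
my_setdiff(setB,setA)   # "f" "g"
identical(my_setdiff(setA,setB), setdiff(setA,setB))   # TRUE
```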

It follows that setdiff(setA,setB), intersect(setA,setB) and setdiff(setB,setA), taken together, should contain exactly the same elements as the union of the two sets. Let's check:

all(c(setdiff(setA,setB),intersect(setA,setB),setdiff(setB,setA))==
   union(setA,setB))

[1] TRUE

There is also a built-in function, setequal, for testing whether two sets are equal regardless of the order in which their elements appear:

setequal(c(setdiff(setA,setB),intersect(setA,setB),setdiff(setB,setA)),
   union(setA,setB))

[1] TRUE
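The earlier check with all(...==...) only worked because the concatenated pieces happened to come out in the same order as union. A small contrived example shows the difference between element-by-element comparison and setequal, which ignores both order and repeated elements:

```r
# The same three elements, listed in a different order
x<-c("a", "b", "c")
y<-c("c", "a", "b")

all(x==y)       # compares position by position: FALSE
setequal(x,y)   # compares as sets: TRUE

# Repeated elements make no difference to setequal either
setequal(c("a", "a", "b"), c("b", "a"))   # TRUE
```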

You can also use %in% for comparing sets. The result is a logical vector whose length matches that of the vector on the left:

setA %in% setB

[1] FALSE FALSE FALSE TRUE TRUE

setB ...
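Because %in% returns a logical vector, it can be used as a subscript to pick out the shared or unshared elements directly. A minimal sketch (note that, unlike intersect and setdiff, subscripting does not remove repeated elements, so the results match only when the vector has no repeats, as here):

```r
setA<-c("a", "b", "c", "d", "e")
setB<-c("d", "e", "f", "g")

# Logical subscripts built from %in% select elements of setA
setA[setA %in% setB]      # elements shared with setB: "d" "e"
setA[!(setA %in% setB)]   # elements not in setB: "a" "b" "c"

# %in% is documented as equivalent to match(x, table, nomatch = 0) > 0
identical(setA %in% setB, match(setA, setB, nomatch=0) > 0)   # TRUE
```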
