You want to eliminate duplicate values from a list, such as when you build the list from a file or from the output of another command. This recipe is equally applicable to removing duplicates as they occur in input and to removing duplicates from an array you’ve already populated.
Use a hash to record which items have been seen, then keys to extract them. You can use Perl’s idea of truth to shorten and speed up your code.
%seen = ();
@uniq = ();
foreach $item (@list) {
    unless ($seen{$item}) {
        # if we get here, we have not seen it before
        $seen{$item} = 1;
        push(@uniq, $item);
    }
}
The question at the heart of the matter is “Have I seen this element before?” Hashes are ideally suited to such lookups. The first technique (Section 4.6.2.1) builds up the array of unique values as we go along, using a hash to record whether something is already in the array.
The second technique (Section 4.6.2.2) is the most natural way to write this sort of thing in Perl. It creates a new entry in the hash every time it sees an element that hasn’t been seen before, using the ++ operator. This has the side effect of making the hash record the number of times the element was seen. This time we only use the hash for its property of working like a set.
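A minimal sketch of this second technique, assuming the input is already in an array named @list; the sample data is invented for illustration. The post-increment returns the old value, which is false the first time an item appears:

```perl
use strict;
use warnings;

# Sample data, invented for illustration.
my @list = qw(apple pear apple fig pear apple);

my %seen;
my @uniq;
foreach my $item (@list) {
    # $seen{$item}++ yields the previous count: undef (false) on the
    # first sighting, a positive number (true) on every later one.
    push @uniq, $item unless $seen{$item}++;
}

print "@uniq\n";    # apple pear fig
```

Afterward, %seen also holds the per-item counts (here, apple maps to 3), although this technique does not use them.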
The third example (Section 4.6.2.3) is similar to the second but rather than storing the item away, we call some user-defined function with that item as its argument. If that’s all we’re doing, keeping a spare array of those unique values is unnecessary.
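As a sketch of that third form: some_func below is a hypothetical stand-in for whatever per-item work you need, and the @called_with array exists only so the example has something to inspect. No array of unique values is kept.

```perl
use strict;
use warnings;

# Sample data, invented for illustration.
my @list = qw(apple pear apple fig pear apple);

# Records each call so the effect is visible; not part of the technique.
my @called_with;

# Hypothetical callback standing in for the real per-item work.
sub some_func {
    my ($item) = @_;
    push @called_with, $item;
    print "processing $item\n";
}

my %seen;
foreach my $item (@list) {
    # Call the function only on the first sighting of each item.
    some_func($item) unless $seen{$item}++;
}
```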
The next mechanism (Section 4.6.2.4) waits until it’s done processing the list to extract the unique keys from the %seen hash. This may be convenient, but the original order has been lost.
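One way this might look, again with invented sample data. The sort in the print statement is there only to make the output predictable; it is not part of the technique.

```perl
use strict;
use warnings;

# Sample data, invented for illustration.
my @list = qw(apple pear apple fig pear apple);

# First pass: count every item.
my %seen;
$seen{$_}++ foreach @list;

# Afterwards: pull out the distinct items. The order of keys is
# unspecified and generally differs from the order in @list.
my @uniq = keys %seen;

print join(" ", sort @uniq), "\n";    # apple fig pear
```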
The final approach (Section 4.6.2.5) merges the construction of the %seen hash with the extraction of unique elements. This preserves the original order of elements.
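A sketch of how the merged form can be written with grep, assuming the same invented sample data as input. The grep block is true only the first time each element is tested, so the first occurrences pass through in their original order:

```perl
use strict;
use warnings;

# Sample data, invented for illustration.
my @list = qw(apple pear apple fig pear apple);

my %seen;
# !$seen{$_}++ is true when the previous count was zero, that is, on
# the first sighting; grep keeps exactly those elements, in order.
my @uniq = grep { !$seen{$_}++ } @list;

print "@uniq\n";    # apple pear fig
```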
Using a hash to record the values has two side effects: processing long lists can take a lot of memory, and the list returned by keys is not in alphabetical, numeric, or insertion order.
Here’s an example of processing input as it is read. We use `who` to gather information on the current user list, and then we extract the username from each line before updating the hash:
# generate a list of users logged in, removing duplicates
%ucnt = ();
for (`who`) {
    s/\s.*\n//;     # kill from first space till end-of-line, yielding username
    $ucnt{$_}++;    # record the presence of this user
}
# extract and print unique keys
@users = sort keys %ucnt;
print "users logged in: @users\n";
The “Foreach Loops” section of perlsyn(1) and Chapter 2 of Programming Perl; the keys function in perlfunc(1) and Chapter 3 of Programming Perl; the “Hashes (Associative Arrays)” section of Chapter 2 of Programming Perl; Chapter 5; we use hashes in a similar fashion in Section 4.7 and Section 4.8