join
command is a useful tool for joining two files on a common field. It allows you to join two files, similar to the way you would join two tables in a SQL database.
The following example illustrates the power of the join
command. You have two files, one containing a list of employees with their department ids and the other containing departments and their ids. You want to find out the names of the departments for each employee. You MUST first sort the files on the department id column (using the sort
command) and then join them on that column.
$ cat employees.txt Jones,33 Steinberg,33 Robinson,34 Smith,34 Rafferty,31 John, $ cat departments.txt 31,Sales 33,Engineering 34,Clerical 35,Marketing $ join -a 1 -t, -1 2 -2 1 -o 1.1 2.2 <(sort -t, -k2 employees.txt) <(sort -t, -k1 departments.txt) John, Rafferty,Sales Jones,Engineering Steinberg,Engineering Robinson,Clerical Smith,ClericalJoining on multiple columns
The
join
command joins on a single field. What do you do if you want to join on multiple fields? You create a composite field by combining the multiple fields together! This can be done using awk
. For example:
$ cat employees2.txt Jones,33,50 Steinberg,33,51 Robinson,34,50 Smith,34,50 Rafferty,31,51 $ awk -F, '{print $2"_"$3","$0}' employees2.txt 33_50,Jones,33,50 33_51,Steinberg,33,51 34_50,Robinson,34,50 34_50,Smith,34,50 31_51,Rafferty,31,51As you can see, an additional field has been created by concatenating the second and third fields of the file. Now you can
join
the files on the new composite field.
(File data courtesy of Wikipedia.)