join command is a useful tool for joining two files on a common field. It allows you to join two files, similar to the way you would join two tables in a SQL database.
The following example illustrates the power of the join command. You have two files, one containing a list of employees with their department ids and the other containing departments and their ids. You want to find out the names of the departments for each employee. You MUST first sort the files on the department id column (using the sort command) and then join them on that column.
$ cat employees.txt Jones,33 Steinberg,33 Robinson,34 Smith,34 Rafferty,31 John, $ cat departments.txt 31,Sales 33,Engineering 34,Clerical 35,Marketing $ join -a 1 -t, -1 2 -2 1 -o 1.1 2.2 <(sort -t, -k2 employees.txt) <(sort -t, -k1 departments.txt) John, Rafferty,Sales Jones,Engineering Steinberg,Engineering Robinson,Clerical Smith,ClericalJoining on multiple columns
The
join command joins on a single field. What do you do if you want to join on multiple fields? You create a composite field by combining the multiple fields together! This can be done using awk. For example:
$ cat employees2.txt
Jones,33,50
Steinberg,33,51
Robinson,34,50
Smith,34,50
Rafferty,31,51
$ awk -F, '{print $2"_"$3","$0}' employees2.txt
33_50,Jones,33,50
33_51,Steinberg,33,51
34_50,Robinson,34,50
34_50,Smith,34,50
31_51,Rafferty,31,51
As you can see, an additional field has been created by concatenating the second and third fields of the file. Now you can join the files on the new composite field.
(File data courtesy of Wikipedia.)