You're probably familiar with SQL joins, which are used to combine data from more than one table. This post shows how you can do the same in kdb.
I'll use the following two tables to demonstrate joins in kdb:
q) show age:([] name:`alice`charlie`dave`frank; age:25 26 31 33) name age ----------- alice 25 charlie 26 dave 31 frank 33 q) show work:([] name:`alice`bob`charlie`eve; dept:`ops`dev`hr`eng) name dept ------------ alice ops bob dev charlie hr eve dev
Now let's join these tables on the name
field using the different kinds of kdb joins:
Left Join (lj
):
In a left join, all the rows of the left table are preserved, and those from the right table are only combined if the values in the key columns match those in the left table. If there are values in the left table that don't exist in the right, the joined columns will contain nulls.
q) age lj `name xkey work name age dept ---------------- alice 25 ops charlie 26 hr dave 31 frank 33
Union Join (uj
):
A union join (referred to as a full outer join in SQL) returns rows from both tables, regardless of whether there is a match. If there is a match, it returns combined rows, and if there isn't, the missing side will contain nulls.
q) (`name xkey age) uj `name xkey work name | age dept -------| -------- alice | 25 ops charlie| 26 hr dave | 31 frank | 33 bob | dev eve | eng
Inner Join (ij
):
An inner join returns only those rows where there is a match in both tables.
q) age ij `name xkey work name age dept ---------------- alice 25 ops charlie 26 hr
Equi Join (ej
):
This is an inner join in which you can specify the list of columns to join on.
q) ej[`name;age;work] name age dept ---------------- alice 25 ops charlie 26 hr
Plus Join (pj
):
This is a left join, which also sums the values of common columns (other than key columns). This type of join doesn't exist in SQL.
q) t1:([] name:`alice`charlie`dave`frank; x:1 2 3 4) q) t2:([] name:`alice`bob`charlie`eve; x:10 10 10 10) q) t1 pj `name xkey t2 name x ---------- alice 11 charlie 12 dave 3 frank 4
Cross Join (cross
):
This is a "cartesian product". It joins everything to everything and isn't normally the kind of join people are looking for. Each row is a combination of the rows of the first and second table, so can be a dangerous join to run against large tables.
q) `name xasc work cross age name dept age ---------------- alice ops 25 alice dev 25 alice hr 25 alice eng 25 charlie ops 26 charlie dev 26 charlie hr 26 charlie eng 26 dave ops 31 dave dev 31 dave hr 31 dave eng 31 frank ops 33 frank dev 33 frank hr 33 frank eng 33