Saturday, April 23, 2016

kdb+/q - Joins

You're probably familiar with SQL joins, which are used to combine data from more than one table. This post shows how you can do the same in kdb.

I'll use the following two tables to demonstrate joins in kdb:

q) show age:([] name:`alice`charlie`dave`frank; age:25 26 31 33)
name    age
alice   25
charlie 26
dave    31
frank   33

q) show work:([] name:`alice`bob`charlie`eve; dept:`ops`dev`hr`eng)
name    dept
alice   ops
bob     dev
charlie hr
eve     dev

Now let's join these tables on the name field using the different kinds of kdb joins:

Left Join (lj):

In a left join, all the rows of the left table are preserved, and those from the right table are only combined if the values in the key columns match those in the left table. If there are values in the left table that don't exist in the right, the joined columns will contain nulls.

q) age lj `name xkey work
name    age dept
alice   25  ops
charlie 26  hr
dave    31
frank   33

Union Join (uj):

A union join (referred to as a full outer join in SQL) returns rows from both tables, regardless of whether there is a match. If there is a match, it returns combined rows, and if there isn't, the missing side will contain nulls.

q) (`name xkey age) uj `name xkey work
name   | age dept
-------| --------
alice  | 25  ops
charlie| 26  hr
dave   | 31
frank  | 33
bob    |     dev
eve    |     eng

Inner Join (ij):

An inner join returns only those rows where there is a match in both tables.

q) age ij `name xkey work
name    age dept
alice   25  ops
charlie 26  hr

Equi Join (ej):

This is an inner join in which you can specify the list of columns to join on.

q) ej[`name;age;work]
name    age dept
alice   25  ops
charlie 26  hr

Plus Join (pj):

This is a left join, which also sums the values of common columns (other than key columns). This type of join doesn't exist in SQL.

q) t1:([] name:`alice`charlie`dave`frank; x:1 2 3 4)
q) t2:([] name:`alice`bob`charlie`eve; x:10 10 10 10)
q) t1 pj `name xkey t2
name    x
alice   11
charlie 12
dave    3
frank   4

Cross Join (cross):

This is a "cartesian product". It joins everything to everything and isn't normally the kind of join people are looking for. Each row is a combination of the rows of the first and second table, so can be a dangerous join to run against large tables.

q) `name xasc work cross age
name    dept age
alice   ops  25
alice   dev  25
alice   hr   25
alice   eng  25
charlie ops  26
charlie dev  26
charlie hr   26
charlie eng  26
dave    ops  31
dave    dev  31
dave    hr   31
dave    eng  31
frank   ops  33
frank   dev  33
frank   hr   33
frank   eng  33

No comments:

Post a Comment