This post shows how you can convert a keyed table to a hierarchical tree format in kdb+/q. This could be useful if you want to display data as a tree widget in a front-end.
Consider the following keyed table of world populations:
continent country city | population
---------------------------------------------| ----------
North America United States New York City | 8550405
North America United States Los Angeles | 3971883
North America Mexico Mexico City | 8918653
Europe United Kingdom London | 9126366
Europe Russia Moscow | 12195221
Europe Russia Saint Petersburg| 5383890
Africa Nigeria Lagos | 14862000
Africa Egypt Cairo | 9908788
Africa Egypt Giza | 8800000
Asia China Shanghai | 22315474
Asia India Mumbai | 12691836
Asia China Beijing | 11716620
We would like to display it as a tree of continent > country > city, as shown below (similar to a pivot table in Excel):
node | population
----------------------------| ----------
Total | 128441136
Asia | 46723930
China | 34032094
Shanghai | 22315474
Beijing | 11716620
India | 12691836
Mumbai | 12691836
Africa | 33570788
Egypt | 18708788
Cairo | 9908788
Giza | 8800000
Nigeria | 14862000
Lagos | 14862000
Europe | 26705477
Russia | 17579111
Moscow | 12195221
Saint Petersburg| 5383890
United Kingdom | 9126366
London | 9126366
North America | 21440941
United States | 12522288
New York City | 8550405
Los Angeles | 3971883
Mexico | 8918653
Mexico City | 8918653
In order to achieve this, we need to aggregate the data with different groupings, then combine the resultant tables and format it into a tree.
1. Grouping the data
First, we will add a dummy Total
column to the table and then aggregate the table with the following groupings:
- Total
- Total, continent
- Total, continent, country
- Total, continent, country, city
The code for this is shown below:
// add Total column to the table. (td is a keyed table)
td:(`Total,keys[td]) xkey update Total:`Total from td;
keyCols:keys td;
// create a list of groupings
groupings:(1+til count keyCols) sublist\: keyCols;
// aggregate the table with each grouping
// this gives us a list of keyed tables (one per grouping)
tds:?[td;();;c!(sum;)each c:cols value td] each {x!x} each groupings;
// this step is optional but it's nice to sort each table on population
tds:`population xdesc'tds;
2. Joining the data
Next, we need to join the tables that were obtained as a result of the groupings. We do this by unkeying the tables and then using uj
:
td:keyCols xkey (uj/) 0!'tds;
3. Formatting the data
Now let's add a Path column by concatenating the key columns:
td:![td;();0b;enlist[`Path]!enlist(`$sv';">";(string;(each;{x except `};(flip;enlist,keyCols))))];
td:(`Path,keyCols) xkey td;
This is what our tree looks like so far:
Path Total continent country city | population
---------------------------------------------------------------------------------------------------| ----------
Total Total | 128441136
Total>Asia Total Asia | 46723930
Total>Africa Total Africa | 33570788
Total>Europe Total Europe | 26705477
Total>North America Total North America | 21440941
Total>Asia>China Total Asia China | 34032094
Total>Africa>Egypt Total Africa Egypt | 18708788
Total>Europe>Russia Total Europe Russia | 17579111
Total>Africa>Nigeria Total Africa Nigeria | 14862000
Total>Asia>India Total Asia India | 12691836
Total>North America>United States Total North America United States | 12522288
Total>Europe>United Kingdom Total Europe United Kingdom | 9126366
Total>North America>Mexico Total North America Mexico | 8918653
Total>Asia>China>Shanghai Total Asia China Shanghai | 22315474
Total>Africa>Nigeria>Lagos Total Africa Nigeria Lagos | 14862000
Total>Asia>India>Mumbai Total Asia India Mumbai | 12691836
Total>Europe>Russia>Moscow Total Europe Russia Moscow | 12195221
Total>Asia>China>Beijing Total Asia China Beijing | 11716620
Total>Africa>Egypt>Cairo Total Africa Egypt Cairo | 9908788
Total>Europe>United Kingdom>London Total Europe United Kingdom London | 9126366
Total>North America>Mexico>Mexico City Total North America Mexico Mexico City | 8918653
Total>Africa>Egypt>Giza Total Africa Egypt Giza | 8800000
Total>North America>United States>New York City Total North America United States New York City | 8550405
Total>Europe>Russia>Saint Petersburg Total Europe Russia Saint Petersburg| 5383890
Total>North America>United States>Los Angeles Total North America United States Los Angeles | 3971883
4. Reordering the rows
The tree looks okay so far and you can stop there if you want but it would look better if child nodes were directly under their parents e.g. Shanghai should appear under China. In order to do this, we cannot simply use uj
to combine our tables but we need to use the Over (/) accumulator to build the tree instead.
In order to get the row ordering correct, we add an id to each row, which will be a combination of the parent id and the row id. These id's look like this: 0, 0.0, 0.1, 0.1.1 etc. and will be used to sort the tree so that children appear under their parents.
Here is the final version of the code:
// Converts a table into a tree.
// @param td - a keyed table
// @param sortCol - the column to sort on
// @returns a table with a tree column
table2tree:{[td;sortCol]
// add Total column to the table
td:(`Total,keys[td]) xkey update Total:`Total from td;
keyCols:keys td;
// create a list of groupings
groupings:(1+til count keyCols) sublist\: keyCols;
// aggregate the table with each grouping
// this gives us a list of keyed tables (one per grouping)
tds:?[td;();;c!(sum;)each c:cols value td] each {x!x} each groupings;
// sort the tables
if[not null sortCol;tds:sortCol xdesc'tds];
// initial tree only has the Total row
tree:update id:"0",node:enlist "Total" from 0!first tds;
// build the tree using the over accumulator
tree:{[tree;td]
keyCols:keys td;
// join the parent id to the current table
td:td lj k xkey ?[tree;();0b;{x!x}(k:-1_keyCols),`id];
// update the id by concatenating the parent id to the row id
// we need to left-pad the row id so that sorting works correctly
// e.g. 1.3 should come before 1.10
td:update id:`$"."sv'flip(string id;(-1*count string count td)$string i) from td;
// add a node column which corresponds to the value of the last key column
td:![td;();0b;enlist[`node]!enlist last keyCols];
// add indentation to the node based on the depth (i.e. number of key columns)
indentation:(4*-1+count keyCols)#" ";
td:update node:(indentation,/:string node) from td;
// now add the table to tree
tree uj 0!td
}/[tree;1_tds];
// sort the tree on id
(`node,keyCols) xkey `id xasc tree}
This is what our final tree looks like:
q) data:3!("SSSI";enlist",") 0: `$"population.csv";
q) select node,population from table2tree[data;`population]
node population
-----------------------------------------
Total 128441136
Asia 46723930
China 34032094
Shanghai 22315474
Beijing 11716620
India 12691836
Mumbai 12691836
Africa 33570788
Egypt 18708788
Cairo 9908788
Giza 8800000
Nigeria 14862000
Lagos 14862000
Europe 26705477
Russia 17579111
Moscow 12195221
Saint Petersburg 5383890
United Kingdom 9126366
London 9126366
North America 21440941
United States 12522288
New York City 8550405
Los Angeles 3971883
Mexico 8918653
Mexico City 8918653
I also played around with adding lines to connect nodes of the tree but it got complicated very fast!
Can you think of a better way to do this? Let me know in the comments below!