This post shows how you can convert a keyed table to a hierarchical tree format in kdb+/q. This could be useful if you want to display data as a tree widget in a front-end.
Consider the following keyed table of world populations:
continent country city | population ---------------------------------------------| ---------- North America United States New York City | 8550405 North America United States Los Angeles | 3971883 North America Mexico Mexico City | 8918653 Europe United Kingdom London | 9126366 Europe Russia Moscow | 12195221 Europe Russia Saint Petersburg| 5383890 Africa Nigeria Lagos | 14862000 Africa Egypt Cairo | 9908788 Africa Egypt Giza | 8800000 Asia China Shanghai | 22315474 Asia India Mumbai | 12691836 Asia China Beijing | 11716620
We would like to display it as a tree of continent > country > city, as shown below (similar to a pivot table in Excel):
node | population ----------------------------| ---------- Total | 128441136 Asia | 46723930 China | 34032094 Shanghai | 22315474 Beijing | 11716620 India | 12691836 Mumbai | 12691836 Africa | 33570788 Egypt | 18708788 Cairo | 9908788 Giza | 8800000 Nigeria | 14862000 Lagos | 14862000 Europe | 26705477 Russia | 17579111 Moscow | 12195221 Saint Petersburg| 5383890 United Kingdom | 9126366 London | 9126366 North America | 21440941 United States | 12522288 New York City | 8550405 Los Angeles | 3971883 Mexico | 8918653 Mexico City | 8918653
In order to achieve this, we need to aggregate the data with different groupings, then combine the resultant tables and format it into a tree.
1. Grouping the data
First, we will add a dummy Total
column to the table and then aggregate the table with the following groupings:
- Total
- Total, continent
- Total, continent, country
- Total, continent, country, city
The code for this is shown below:
// add Total column to the table. (td is a keyed table) td:(`Total,keys[td]) xkey update Total:`Total from td; keyCols:keys td; // create a list of groupings groupings:(1+til count keyCols) sublist\: keyCols; // aggregate the table with each grouping // this gives us a list of keyed tables (one per grouping) tds:?[td;();;c!(sum;)each c:cols value td] each {x!x} each groupings; // this step is optional but it's nice to sort each table on population tds:`population xdesc'tds;
2. Joining the data
Next, we need to join the tables that were obtained as a result of the groupings. We do this by unkeying the tables and then using uj
:
td:keyCols xkey (uj/) 0!'tds;
3. Formatting the data
Now let's add a Path column by concatenating the key columns:
td:![td;();0b;enlist[`Path]!enlist(`$sv';">";(string;(each;{x except `};(flip;enlist,keyCols))))]; td:(`Path,keyCols) xkey td;
This is what our tree looks like so far:
Path Total continent country city | population ---------------------------------------------------------------------------------------------------| ---------- Total Total | 128441136 Total>Asia Total Asia | 46723930 Total>Africa Total Africa | 33570788 Total>Europe Total Europe | 26705477 Total>North America Total North America | 21440941 Total>Asia>China Total Asia China | 34032094 Total>Africa>Egypt Total Africa Egypt | 18708788 Total>Europe>Russia Total Europe Russia | 17579111 Total>Africa>Nigeria Total Africa Nigeria | 14862000 Total>Asia>India Total Asia India | 12691836 Total>North America>United States Total North America United States | 12522288 Total>Europe>United Kingdom Total Europe United Kingdom | 9126366 Total>North America>Mexico Total North America Mexico | 8918653 Total>Asia>China>Shanghai Total Asia China Shanghai | 22315474 Total>Africa>Nigeria>Lagos Total Africa Nigeria Lagos | 14862000 Total>Asia>India>Mumbai Total Asia India Mumbai | 12691836 Total>Europe>Russia>Moscow Total Europe Russia Moscow | 12195221 Total>Asia>China>Beijing Total Asia China Beijing | 11716620 Total>Africa>Egypt>Cairo Total Africa Egypt Cairo | 9908788 Total>Europe>United Kingdom>London Total Europe United Kingdom London | 9126366 Total>North America>Mexico>Mexico City Total North America Mexico Mexico City | 8918653 Total>Africa>Egypt>Giza Total Africa Egypt Giza | 8800000 Total>North America>United States>New York City Total North America United States New York City | 8550405 Total>Europe>Russia>Saint Petersburg Total Europe Russia Saint Petersburg| 5383890 Total>North America>United States>Los Angeles Total North America United States Los Angeles | 3971883
4. Reordering the rows
The tree looks okay so far and you can stop there if you want but it would look better if child nodes were directly under their parents e.g. Shanghai should appear under China. In order to do this, we cannot simply use uj
to combine our tables but we need to use the Over (/) accumulator to build the tree instead.
In order to get the row ordering correct, we add an id to each row, which will be a combination of the parent id and the row id. These id's look like this: 0, 0.0, 0.1, 0.1.1 etc. and will be used to sort the tree so that children appear under their parents.
Here is the final version of the code:
// Converts a table into a tree. // @param td - a keyed table // @param sortCol - the column to sort on // @returns a table with a tree column table2tree:{[td;sortCol] // add Total column to the table td:(`Total,keys[td]) xkey update Total:`Total from td; keyCols:keys td; // create a list of groupings groupings:(1+til count keyCols) sublist\: keyCols; // aggregate the table with each grouping // this gives us a list of keyed tables (one per grouping) tds:?[td;();;c!(sum;)each c:cols value td] each {x!x} each groupings; // sort the tables if[not null sortCol;tds:sortCol xdesc'tds]; // initial tree only has the Total row tree:update id:"0",node:enlist "Total" from 0!first tds; // build the tree using the over accumulator tree:{[tree;td] keyCols:keys td; // join the parent id to the current table td:td lj k xkey ?[tree;();0b;{x!x}(k:-1_keyCols),`id]; // update the id by concatenating the parent id to the row id // we need to left-pad the row id so that sorting works correctly // e.g. 1.3 should come before 1.10 td:update id:`$"."sv'flip(string id;(-1*count string count td)$string i) from td; // add a node column which corresponds to the value of the last key column td:![td;();0b;enlist[`node]!enlist last keyCols]; // add indentation to the node based on the depth (i.e. number of key columns) indentation:(4*-1+count keyCols)#" "; td:update node:(indentation,/:string node) from td; // now add the table to tree tree uj 0!td }/[tree;1_tds]; // sort the tree on id (`node,keyCols) xkey `id xasc tree}
This is what our final tree looks like:
q) data:3!("SSSI";enlist",") 0: `$"population.csv"; q) select node,population from table2tree[data;`population] node population ----------------------------------------- Total 128441136 Asia 46723930 China 34032094 Shanghai 22315474 Beijing 11716620 India 12691836 Mumbai 12691836 Africa 33570788 Egypt 18708788 Cairo 9908788 Giza 8800000 Nigeria 14862000 Lagos 14862000 Europe 26705477 Russia 17579111 Moscow 12195221 Saint Petersburg 5383890 United Kingdom 9126366 London 9126366 North America 21440941 United States 12522288 New York City 8550405 Los Angeles 3971883 Mexico 8918653 Mexico City 8918653
I also played around with adding lines to connect nodes of the tree but it got complicated very fast!
Can you think of a better way to do this? Let me know in the comments below!
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.