Friday, April 02, 2021

kdb+/q - Display a Table as a Tree

This post shows how you can convert a keyed table to a hierarchical tree format in kdb+/q. This could be useful if you want to display data as a tree widget in a front-end.

Consider the following keyed table of world populations:

continent     country        city            | population
---------------------------------------------| ----------
North America United States  New York City   | 8550405
North America United States  Los Angeles     | 3971883
North America Mexico         Mexico City     | 8918653
Europe        United Kingdom London          | 9126366
Europe        Russia         Moscow          | 12195221
Europe        Russia         Saint Petersburg| 5383890
Africa        Nigeria        Lagos           | 14862000
Africa        Egypt          Cairo           | 9908788
Africa        Egypt          Giza            | 8800000
Asia          China          Shanghai        | 22315474
Asia          India          Mumbai          | 12691836
Asia          China          Beijing         | 11716620

We would like to display it as a tree of continent > country > city, as shown below (similar to a pivot table in Excel):

node                        | population
----------------------------| ----------
Total                       | 128441136
    Asia                    | 46723930
        China               | 34032094
            Shanghai        | 22315474
            Beijing         | 11716620
        India               | 12691836
            Mumbai          | 12691836
    Africa                  | 33570788
        Egypt               | 18708788
            Cairo           | 9908788
            Giza            | 8800000
        Nigeria             | 14862000
            Lagos           | 14862000
    Europe                  | 26705477
        Russia              | 17579111
            Moscow          | 12195221
            Saint Petersburg| 5383890
        United Kingdom      | 9126366
            London          | 9126366
    North America           | 21440941
        United States       | 12522288
            New York City   | 8550405
            Los Angeles     | 3971883
        Mexico              | 8918653
            Mexico City     | 8918653

In order to achieve this, we need to aggregate the data with different groupings, then combine the resultant tables and format it into a tree.

1. Grouping the data

First, we will add a dummy Total column to the table and then aggregate the table with the following groupings:

  • Total
  • Total, continent
  • Total, continent, country
  • Total, continent, country, city

The code for this is shown below:

// add Total column to the table. (td is a keyed table)
td:(`Total,keys[td]) xkey update Total:`Total from td;
keyCols:keys td;

// create a list of groupings
groupings:(1+til count keyCols) sublist\: keyCols;

// aggregate the table with each grouping
// this gives us a list of keyed tables (one per grouping)
tds:?[td;();;c!(sum;)each c:cols value td] each {x!x} each groupings;

// this step is optional but it's nice to sort each table on population
tds:`population xdesc'tds;

2. Joining the data

Next, we need to join the tables that were obtained as a result of the groupings. We do this by unkeying the tables and then using uj:

td:keyCols xkey (uj/) 0!'tds;

3. Formatting the data

Now let's add a Path column by concatenating the key columns:

td:![td;();0b;enlist[`Path]!enlist(`$sv';">";(string;(each;{x except `};(flip;enlist,keyCols))))];
td:(`Path,keyCols) xkey td;

This is what our tree looks like so far:

Path                                            Total continent     country        city            | population
---------------------------------------------------------------------------------------------------| ----------
Total                                           Total                                              | 128441136
Total>Asia                                      Total Asia                                         | 46723930
Total>Africa                                    Total Africa                                       | 33570788
Total>Europe                                    Total Europe                                       | 26705477
Total>North America                             Total North America                                | 21440941
Total>Asia>China                                Total Asia          China                          | 34032094
Total>Africa>Egypt                              Total Africa        Egypt                          | 18708788
Total>Europe>Russia                             Total Europe        Russia                         | 17579111
Total>Africa>Nigeria                            Total Africa        Nigeria                        | 14862000
Total>Asia>India                                Total Asia          India                          | 12691836
Total>North America>United States               Total North America United States                  | 12522288
Total>Europe>United Kingdom                     Total Europe        United Kingdom                 | 9126366
Total>North America>Mexico                      Total North America Mexico                         | 8918653
Total>Asia>China>Shanghai                       Total Asia          China          Shanghai        | 22315474
Total>Africa>Nigeria>Lagos                      Total Africa        Nigeria        Lagos           | 14862000
Total>Asia>India>Mumbai                         Total Asia          India          Mumbai          | 12691836
Total>Europe>Russia>Moscow                      Total Europe        Russia         Moscow          | 12195221
Total>Asia>China>Beijing                        Total Asia          China          Beijing         | 11716620
Total>Africa>Egypt>Cairo                        Total Africa        Egypt          Cairo           | 9908788
Total>Europe>United Kingdom>London              Total Europe        United Kingdom London          | 9126366
Total>North America>Mexico>Mexico City          Total North America Mexico         Mexico City     | 8918653
Total>Africa>Egypt>Giza                         Total Africa        Egypt          Giza            | 8800000
Total>North America>United States>New York City Total North America United States  New York City   | 8550405
Total>Europe>Russia>Saint Petersburg            Total Europe        Russia         Saint Petersburg| 5383890
Total>North America>United States>Los Angeles   Total North America United States  Los Angeles     | 3971883

4. Reordering the rows

The tree looks okay so far and you can stop there if you want but it would look better if child nodes were directly under their parents e.g. Shanghai should appear under China. In order to do this, we cannot simply use uj to combine our tables but we need to use the Over (/) accumulator to build the tree instead.

In order to get the row ordering correct, we add an id to each row, which will be a combination of the parent id and the row id. These id's look like this: 0, 0.0, 0.1, 0.1.1 etc. and will be used to sort the tree so that children appear under their parents.

Here is the final version of the code:

// Converts a table into a tree.
// @param td - a keyed table
// @param sortCol - the column to sort on
// @returns a table with a tree column
table2tree:{[td;sortCol]
    // add Total column to the table
    td:(`Total,keys[td]) xkey update Total:`Total from td;
    keyCols:keys td;

    // create a list of groupings
    groupings:(1+til count keyCols) sublist\: keyCols;

    // aggregate the table with each grouping
    // this gives us a list of keyed tables (one per grouping)
    tds:?[td;();;c!(sum;)each c:cols value td] each {x!x} each groupings;

    // sort the tables
    if[not null sortCol;tds:sortCol xdesc'tds];

    // initial tree only has the Total row
    tree:update id:"0",node:enlist "Total" from 0!first tds;

    // build the tree using the over accumulator
    tree:{[tree;td]
        keyCols:keys td;

        // join the parent id to the current table
        td:td lj k xkey ?[tree;();0b;{x!x}(k:-1_keyCols),`id];

        // update the id by concatenating the parent id to the row id
        // we need to left-pad the row id so that sorting works correctly
        // e.g. 1.3 should come before 1.10
        td:update id:`$"."sv'flip(string id;(-1*count string count td)$string i) from td;

        // add a node column which corresponds to the value of the last key column
        td:![td;();0b;enlist[`node]!enlist last keyCols];

        // add indentation to the node based on the depth (i.e. number of key columns)
        indentation:(4*-1+count keyCols)#" ";
        td:update node:(indentation,/:string node) from td;

        // now add the table to tree
        tree uj 0!td
	}/[tree;1_tds];

    // sort the tree on id
    (`node,keyCols) xkey `id xasc tree}

This is what our final tree looks like:

q) data:3!("SSSI";enlist",") 0: `$"population.csv";
q) select node,population from table2tree[data;`population]

node                         population
-----------------------------------------
Total                        128441136
    Asia                     46723930
        China                34032094
            Shanghai         22315474
            Beijing          11716620
        India                12691836
            Mumbai           12691836
    Africa                   33570788
        Egypt                18708788
            Cairo            9908788
            Giza             8800000
        Nigeria              14862000
            Lagos            14862000
    Europe                   26705477
        Russia               17579111
            Moscow           12195221
            Saint Petersburg 5383890
        United Kingdom       9126366
            London           9126366
    North America            21440941
        United States        12522288
            New York City    8550405
            Los Angeles      3971883
        Mexico               8918653
            Mexico City      8918653

I also played around with adding lines to connect nodes of the tree but it got complicated very fast!

Can you think of a better way to do this? Let me know in the comments below!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.