Friday, November 20, 2020

Kdb+/q - File Compression

Large tables can be compressed in Kdb+ by setting .z.zd, which holds the default compression parameters used when data is written to disk. Compression reduces disk usage and, for applications that have fast CPUs but slow disks, can even improve performance.

.z.zd is a list of three integers: the logical block size (given as a power-of-2 exponent, so 17 means 128 KB blocks), the algorithm (0=none, 1=q IPC, 2=gzip, 3=snappy, 4=lz4hc) and the compression level.
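
As a rough illustration of how the three fields fit together, the sketch below sets gzip compression and then clears the setting again. It assumes gzip (zlib) is available to the q process and that level 6 is a sensible middle value in gzip's usual 0-9 range; neither choice comes from the example later in this post.

```q
// 17 is an exponent: logical blocks are 2 xexp 17 = 131072 bytes (128 KB)
// 2 selects gzip, 6 is the gzip level (assumed here for illustration)
.z.zd:17 2 6;

// inspect the current setting
.z.zd

// remove the setting so later writes are uncompressed again
system"x .z.zd";
```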

Here is an example showing how to compress a table:

// Helper function that sets .z.zd and returns the previous
// value of .z.zd (or () if it was not set).
// Passing () clears .z.zd, which turns compression off again.
.util.setZzd:{
  origZzd:$[count key `.z.zd;.z.zd;()];
  if[x~();
    system"x .z.zd";
    :origZzd;
  ];
  .z.zd:x;
  origZzd}

// create a table
td:([]a:1000000?10; b:1000000?10; c:1000000?10);

// save the table to disk without compression
`:uncompressed set td;

// save the table to disk using q IPC compression
origZzd:.util.setZzd[(17;1;0)];
`:compressed set td;
.util.setZzd[origZzd];
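
As an alternative sketch, set also accepts the compression parameters alongside the file handle, which avoids touching the global .z.zd at all. The file name compressedPerFile is a hypothetical name chosen for illustration; the parameters mirror the (17;1;0) used above.

```q
// per-file compression: (file handle; logical block size; algorithm; level)
// `:compressedPerFile is a hypothetical target name, not part of the original example
(`:compressedPerFile;17;1;0) set td;

// .z.zd is left as it was, so other writes are unaffected
```

This form may be preferable when only a few files need compressing, since there is no global state to save and restore.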

You can check compression stats by using the -21! function:

q)-21!`:compressed
compressedLength  | 5747890
uncompressedLength| 24000041
algorithm         | 1i
logicalBlockSize  | 17i
zipLevel          | 0i

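The size of the file on disk is reduced from about 22.9 MB (24,000,041 bytes) to about 5.5 MB (5,747,890 bytes) after using q IPC compression, a reduction of roughly 76%. Because the table is filled with random values, the exact compressed size will vary from run to run.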
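
The same figures can be pulled out of the -21! result programmatically. The following is a minimal sketch, assuming the compressed file and the td table from the example above still exist; the names stats, ratio and roundTripOk are hypothetical and only used for illustration.

```q
// -21! returns a dictionary keyed by symbol
stats:-21!`:compressed;

// uncompressed bytes divided by compressed bytes
ratio:stats[`uncompressedLength]%stats`compressedLength;

// reads decompress transparently, so the saved table can be compared
// directly with the in-memory original
roundTripOk:td~get`:compressed;
```

With the lengths shown above, the ratio works out to about 4.2.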
