Wednesday, May 22, 2024

Calling Python Functions from kdb+/q with PyKX

PyKX allows you to call Python functions from kdb+/q (and vice versa), enabling powerful data analysis using the rich ecosystem of libraries available in Python. In this post, I will show how you can invoke a Lasso regression function in Python by passing it a table from a q script.

1. Install PyKX

```
pip install pykx
```

2. Create a Python (.p) file

Create a Python file called `lasso.p` containing a function that takes a Pandas DataFrame, performs Lasso regression (using the scikit-learn machine learning library), and returns a vector of coefficients.

```
# lasso.p

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

def lasso_regression(df):
    # Features are all columns except the last; the target is the last column
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]

    # Perform Lasso regression
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)

    # Return the feature coefficients followed by the intercept
    coefficients = np.append(lasso.coef_, lasso.intercept_)
    return coefficients
```
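Before wiring the function up to q, it can help to exercise it directly in Python. The sketch below is not part of the original script: it redefines `lasso_regression` so that the block is self-contained, and it builds a DataFrame that mimics the q table created in step 3. The seed and the names `rng` and `sample` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso


def lasso_regression(df):
    # Same logic as lasso.p: last column is the target, the rest are features
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)
    return np.append(lasso.coef_, lasso.intercept_)


# Illustrative data: y = 2*x + uniform noise in [0, 2)
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
x = rng.uniform(0, 10, size=100)
sample = pd.DataFrame({"x": x, "y": 2 * x + rng.uniform(0, 2, size=100)})

coefficients = lasso_regression(sample)
print(coefficients)  # one slope followed by the intercept
```

With data like this, the slope should land close to 2 and the intercept close to 1 (the mean of the noise), with the slope shrunk slightly toward zero by the L1 penalty.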

3. Invoke the Python function from a q script

Next, write a q script that generates a table of random data and invokes the Python function with it.

```
// Load PyKX and the Python file
\l /path/to/python/site-packages/pykx/pykx.q
\l lasso.p

// Create a sample table with random x and y values
n:100;
x:n?10f;
y:2*x+n?2f;
data:([]x;y);

// Call the python function
qfunc:.pykx.get[`lasso_regression;<];
coefficients:qfunc data;
```

Conversion of data types between kdb+/q and Python

When transferring data between q and Python, PyKX applies "default" type conversions. For instance, tables in q are automatically converted to Pandas DataFrames, and lists are converted to NumPy arrays. You can call `.pykx.setdefault` to change the default conversion type to Pandas, NumPy, Python, or PyArrow. PyKX also provides functions that tag a q object for conversion to a specific Python type, such as `.pykx.tonp` for NumPy and `.pykx.topd` for Pandas. The following code illustrates type conversion:

```
q) .pykx.util.defaultConv
"default"

// lists are converted to NumPy arrays by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] til 10
<class 'numpy.ndarray'>

// tables are converted to Pandas DataFrames by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>

// change default conversion to NumPy
q) .pykx.setdefault["Numpy"]

// tables are now converted to NumPy record arrays
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'numpy.recarray'>

// change default conversion to Python
q) .pykx.setdefault["Python"]

// tables are converted to dict when using Python conversion
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'dict'>

// tag a q table for conversion to a Pandas DataFrame, overriding the current default
q) .pykx.print .pykx.eval["lambda x: type(x)"] .pykx.topd ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>
```
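The conversion mode matters for `lasso_regression`, because it relies on the DataFrame-only `.iloc` accessor. The following sketch does not use PyKX. It simulates with pandas the three shapes in which a q table can arrive (DataFrame, NumPy record array, dict) and shows a hypothetical helper, `as_frame`, that coerces any of them back to a DataFrame. It assumes that the dict form maps column names to lists of values.

```python
import numpy as np
import pandas as pd


def as_frame(obj):
    """Hypothetical helper: coerce a table-like object to a DataFrame."""
    if isinstance(obj, pd.DataFrame):
        return obj
    # pd.DataFrame accepts both record arrays and dicts of columns
    return pd.DataFrame(obj)


frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.5, 4.1, 6.3]})

# Stand-ins for what the Python function might receive under each mode
as_pandas = frame                          # default / "Pandas"
as_numpy = frame.to_records(index=False)   # "Numpy": a record array
as_python = frame.to_dict(orient="list")   # "Python": a plain dict

for name, obj in [("pandas", as_pandas), ("numpy", as_numpy), ("python", as_python)]:
    has_iloc = hasattr(obj, "iloc")  # only the DataFrame supports .iloc
    restored = as_frame(obj)
    print(name, type(obj).__name__, has_iloc, list(restored.columns))
```

Alternatively, tagging the table with `.pykx.topd` on the q side, as in the last q example above, leaves the Python function unchanged.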