Wednesday, May 22, 2024

Calling Python Functions from kdb+/q with PyKX

PyKX allows you to call Python functions from kdb+/q (and vice versa), enabling powerful data analysis using the rich ecosystem of libraries available in Python. In this post, I will show how you can invoke a Lasso Regression function in Python by passing a table from a q script.

1. Install PyKX

pip install pykx

2. Create a Python (.p) file

Create a Python file called lasso.p containing a function that takes a Pandas DataFrame, performs Lasso regression (using the scikit-learn machine learning library), and returns a vector of coefficients.

# lasso.p

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

def lasso_regression(df):
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]

    # Perform Lasso regression
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)

    coefficients = np.append(lasso.coef_, lasso.intercept_) 
    return coefficients

3. Invoke the Python function from a q script

Next, write a q script that generates a table of random data and invokes the Python function with it.

// Load pykx and the python file
\l /path/to/python/site-packages/pykx/pykx.q
\l lasso.p

// Create a sample table with random x and y values
n:100;
x:n?10f;
y:2*x+n?2f;
data:([]x;y);

// Call the python function
qfunc:.pykx.get[`lasso_regression;<];
coefficients:qfunc data;

Conversion of data types between kdb+/q and Python

When transferring data between q and Python, PyKX applies "default" type conversions. For instance, tables in q are automatically converted to Pandas DataFrames, and lists are converted to NumPy arrays. You can call .pykx.setdefault to change the default conversion type to Pandas, Numpy, Python, or PyArrow. PyKX also provides functions to convert q data types to specific Python types, such as .pykx.tonp which tags a q object to be converted to a NumPy object. The following code illustrates type conversion:

q) .pykx.util.defaultConv
"default"

// lists are converted to NumPy arrays by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] til 10
<class 'numpy.ndarray'>

// tables are converted to Pandas DataFrames by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>

// change default conversion to NumPy
q) .pykx.setdefault["Numpy"]

// tables are NumPy arrays now
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'numpy.recarray'>

// change default conversion to Python
q) .pykx.setdefault["Python"]

// tables are converted to dict when using Python conversion
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'dict'>

// tag a q object as a Pandas DataFrame
q) .pykx.print .pykx.eval["lambda x: type(x)"] .pykx.topd ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.