PyKX allows you to call Python functions from kdb+/q (and vice versa), enabling powerful data analysis using the rich ecosystem of libraries available in Python. In this post, I will show how you can invoke a Lasso Regression function in Python by passing a table from a q script.
1. Install PyKX
pip install pykx
2. Create a Python (.p) file
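Create a Python file called lasso.p containing a function that takes a Pandas DataFrame, performs Lasso regression (using the scikit-learn machine learning library, which is installed separately with pip install scikit-learn), and returns a vector holding the fitted coefficients followed by the intercept. The function treats the last column of the DataFrame as the target and all other columns as features.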
# lasso.p
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
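def lasso_regression(df):
    # Features are all columns except the last; the last column is the target
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    # Perform Lasso regression
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)
    # Return the feature coefficients followed by the intercept
    coefficients = np.append(lasso.coef_, lasso.intercept_)
    return coefficients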
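Before wiring the function up to q, it can help to check it from plain Python. The sketch below repeats the function so that it runs on its own, and feeds it synthetic data shaped like the table the q script builds in the next step (one feature x, and y equal to 2*x plus uniform noise). The seed and the variable names are my own choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso


def lasso_regression(df):
    # Same logic as lasso.p: last column is the target, the rest are features
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)
    return np.append(lasso.coef_, lasso.intercept_)


# Synthetic data: y = 2*x + uniform noise in [0, 2), 100 rows
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.uniform(0, 2, size=100)
df = pd.DataFrame({"x": x, "y": y})

coefficients = lasso_regression(df)
slope, intercept = coefficients  # one feature, so two values come back
print(slope, intercept)
```

The slope should land close to 2, pulled slightly towards zero by the L1 penalty, and the intercept should sit near 1, the mean of the noise term.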
3. Invoke the Python function from a q script
Next, write a q script that generates a table of random data and invokes the Python function with it.
// Load pykx and the python file
\l /path/to/python/site-packages/pykx/pykx.q
\l lasso.p
// Create a sample table with random x and y values
n:100;
x:n?10f;
y:2*x+n?2f;
data:([]x;y);
// Get the Python function as a q function (the < makes calls return q data), then call it
qfunc:.pykx.get[`lasso_regression;<];
coefficients:qfunc data;
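The vector that comes back holds one coefficient per feature column, with the intercept in the last position. As a rough illustration of how that layout can be used, here is a small Python sketch that applies such a vector to new inputs. The predict helper and the example values are hypothetical and are not output from the q session above.

```python
import numpy as np


def predict(coefficients, X):
    """Apply a linear model stored as [coef_1, ..., coef_k, intercept]."""
    coefficients = np.asarray(coefficients, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # treat a flat vector as a single feature column
    weights, intercept = coefficients[:-1], coefficients[-1]
    return X @ weights + intercept


# Illustrative values only: slope 2, intercept 1
example_coefficients = [2.0, 1.0]
predictions = predict(example_coefficients, [0.0, 1.0, 5.0])
print(predictions.tolist())  # [1.0, 3.0, 11.0]
```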
Conversion of data types between kdb+/q and Python
When transferring data between q and Python, PyKX applies "default" type conversions. For instance, tables in q are automatically converted to Pandas DataFrames, and lists are converted to NumPy arrays. You can call .pykx.setdefault to change the default conversion type to Pandas, Numpy, Python, or PyArrow. PyKX also provides functions to convert q data types to specific Python types, such as .pykx.tonp which tags a q object to be converted to a NumPy object. The following code illustrates type conversion:
q) .pykx.util.defaultConv
"default"
// lists are converted to NumPy arrays by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] til 10
<class 'numpy.ndarray'>
// tables are converted to Pandas DataFrames by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>
// change default conversion to NumPy
q) .pykx.setdefault["Numpy"]
// tables are now converted to NumPy record arrays
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'numpy.recarray'>
// change default conversion to Python
q) .pykx.setdefault["Python"]
// tables are converted to dict when using Python conversion
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'dict'>
// tag a q object as a Pandas DataFrame
q) .pykx.print .pykx.eval["lambda x: type(x)"] .pykx.topd ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>
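One practical consequence: a function written against a DataFrame, such as lasso_regression, will fail on df.iloc if the default conversion has been switched and the table arrives as a dict or a record array. Tagging the argument with .pykx.topd on the q side avoids this. Alternatively, the Python side can normalise its input, as in the sketch below. The as_frame helper is hypothetical, and the stand-in inputs assume that a dict-converted table maps column names to lists of values.

```python
import numpy as np
import pandas as pd


def as_frame(obj):
    """Coerce the table representations shown above into a pandas DataFrame."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, dict):
        # Assumed layout: column name -> list of values
        return pd.DataFrame(obj)
    if isinstance(obj, np.ndarray) and obj.dtype.names is not None:
        # Record arrays and structured arrays carry named fields
        return pd.DataFrame.from_records(obj)
    raise TypeError(f"unsupported table representation: {type(obj).__name__}")


# Stand-ins for how a two-column table might arrive under each conversion
as_dict = {"x": [1.0, 2.0], "y": [3.0, 4.0]}
as_rec = np.rec.fromarrays([[1.0, 2.0], [3.0, 4.0]], names="x,y")
frames = [as_frame(as_dict), as_frame(as_rec), as_frame(pd.DataFrame(as_dict))]

for frame in frames:
    print(list(frame.columns), frame.shape)
```

Calling as_frame at the top of lasso_regression would let the same function accept any of the three representations, at the cost of a copy when the input is not already a DataFrame.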