Wednesday, May 22, 2024

Calling Python Functions from kdb+/q with PyKX

PyKX allows you to call Python functions from kdb+/q (and vice versa), enabling powerful data analysis using the rich ecosystem of libraries available in Python. In this post, I will show how you can invoke a Lasso Regression function in Python by passing a table from a q script.

1. Install PyKX

pip install pykx

2. Create a Python (.p) file

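Create a Python file called lasso.p containing a function that takes a Pandas DataFrame, treats the last column as the target and the remaining columns as features, performs Lasso regression (using the scikit-learn machine learning library), and returns a vector of the fitted coefficients followed by the intercept.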

# lasso.p

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

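def lasso_regression(df):
    # Features are all columns except the last; the last column is the target
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]

    # Perform Lasso regression
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)

    # Return the feature coefficients followed by the intercept
    coefficients = np.append(lasso.coef_, lasso.intercept_)
    return coefficients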

3. Invoke the Python function from a q script

Next, write a q script that generates a table of random data and invokes the Python function with it.

// Load pykx and the python file
\l /path/to/python/site-packages/pykx/pykx.q
\l lasso.p

// Create a sample table with random x values and y values that depend linearly on x plus random noise
n:100;
x:n?10f;
y:2*x+n?2f;
data:([]x;y);

// Get the Python function as a q function (< converts its result back to q) and call it with the table
qfunc:.pykx.get[`lasso_regression;<];
coefficients:qfunc data;

Conversion of data types between kdb+/q and Python

When transferring data between q and Python, PyKX applies "default" type conversions. For instance, tables in q are automatically converted to Pandas DataFrames, and lists are converted to NumPy arrays. You can call .pykx.setdefault to change the default conversion type to Pandas, Numpy, Python, or PyArrow. PyKX also provides functions that tag an individual q object for conversion to a specific Python type, such as .pykx.tonp for NumPy and .pykx.topd for Pandas. The following code illustrates type conversion:

q) .pykx.util.defaultConv
"default"

// lists are converted to NumPy arrays by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] til 10
<class 'numpy.ndarray'>

// tables are converted to Pandas DataFrames by default
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>

// change default conversion to NumPy
q) .pykx.setdefault["Numpy"]

// tables are now converted to NumPy record arrays
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'numpy.recarray'>

// change default conversion to Python
q) .pykx.setdefault["Python"]

// tables are converted to dict when using Python conversion
q) .pykx.print .pykx.eval["lambda x: type(x)"] ([] foo:1 2)
<class 'dict'>

// tag a q object as a Pandas DataFrame
q) .pykx.print .pykx.eval["lambda x: type(x)"] .pykx.topd ([] foo:1 2)
<class 'pandas.core.frame.DataFrame'>

Saturday, May 18, 2024

Using MathML to Embed Mathematical Equations in Webpages

MathML (Mathematical Markup Language), a markup language developed by the World Wide Web Consortium (W3C), serves as the standard for representing mathematical notation on the web. Integrating MathML into webpages involves encapsulating mathematical expressions within <math> tags and utilising a variety of MathML elements to represent different components of equations.

For example, the following snippet represents the quadratic formula, $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$:

<math>
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo>-</mo>
        <mi>b</mi>
        <mo>±</mo>
        <msqrt>
          <mrow>
            <msup><mi>b</mi><mn>2</mn></msup>
            <mo>-</mo>
            <mn>4</mn>
            <mi>a</mi>
            <mi>c</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>

While alternatives like LaTeX exist, MathML is a strong choice for the web because modern browsers render it natively, without the need for additional libraries or plugins. Because the equations are part of the page markup, search engines can also parse them, which helps the discoverability of mathematical content on the web. LaTeX, by contrast, is not understood by browsers and needs a rendering library such as MathJax or KaTeX to display equations in webpages, which adds a dependency and potential compatibility issues.

Saturday, May 11, 2024

Java 22: Stream Gatherers

Java 22 introduces Stream Gatherers, a preview API that allows you to build complex stream pipelines using custom intermediate operations, such as grouping elements based on specific criteria, selecting elements with intricate conditions, or performing sophisticated transformations. Gatherers also work with parallel streams: a gatherer that supplies a combiner function can be evaluated in parallel.

There are built-in gatherers like fold, mapConcurrent, scan, windowFixed, and windowSliding, but you can also define your own custom gatherers.

Here is an example using the windowFixed gatherer to group elements in a stream into fixed-size, non-overlapping windows (for overlapping windows, use windowSliding instead):

IntStream.range(0,10)
  .boxed()
  .gather(Gatherers.windowFixed(2))
  .forEach(System.out::println);

// Result:
[0, 1]
[2, 3]
[4, 5]
[6, 7]
[8, 9]
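
To complement the windowFixed example, here is a minimal sketch that contrasts it with windowSliding and shows one way a custom gatherer could be written. The class GathererSketch, the everyNth gatherer, and the helper methods are hypothetical names made up for illustration; only Gatherer.ofSequential and Gatherers.windowSliding come from the JDK. It assumes a JDK where the Gatherer API is available (on Java 22 that means compiling and running with --enable-preview).

```java
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.Gatherers;
import java.util.stream.IntStream;

public class GathererSketch {

    // Hypothetical custom gatherer (not part of the JDK):
    // passes every n-th element downstream and drops the rest.
    static <T> Gatherer<T, ?, T> everyNth(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return Gatherer.<T, int[], T>ofSequential(
            () -> new int[1],                        // state: a one-slot mutable counter
            (counter, element, downstream) -> {
                counter[0]++;
                if (counter[0] % n == 0) {
                    return downstream.push(element); // false means downstream wants no more
                }
                return true;                         // keep consuming upstream elements
            });
    }

    // Overlapping windows of size 2 over 0..endExclusive-1
    static List<List<Integer>> slidingPairs(int endExclusive) {
        return IntStream.range(0, endExclusive)
            .boxed()
            .gather(Gatherers.windowSliding(2))
            .toList();
    }

    // Every third element of 0..endExclusive-1, using the custom gatherer
    static List<Integer> everyThird(int endExclusive) {
        return IntStream.range(0, endExclusive)
            .boxed()
            .gather(everyNth(3))
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(slidingPairs(5)); // [[0, 1], [1, 2], [2, 3], [3, 4]]
        System.out.println(everyThird(10));  // [2, 5, 8]
    }
}
```

The sketch uses ofSequential because the counter depends on encounter order, so this particular gatherer is not meant to be split across threads.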

Sunday, May 05, 2024

Java 22: Statements Before super(...)

Java 22 brings forth a new preview language feature: the ability to include statements before the super() call in constructors.

Traditionally, Java constructors have had a strict rule: the super() call, which invokes the superclass constructor, must always be the first statement in a subclass constructor. This rule, while ensuring proper initialisation order, sometimes led to verbose or convoluted constructor implementations, especially when additional setup was required before invoking the superclass constructor.

JDK 22 relaxes this limitation with pre-super statements, which allow you to validate and prepare arguments before the super() call. This also supports fail-fast behaviour, because you can validate arguments and throw an exception before the superclass constructor runs.

Here is an example:

class Shape {
  private final String color;

  Shape(String color) {
    this.color = color;
  }
}

class Rectangle extends Shape {
  private final double length;
  private final double width;

  Rectangle(String color, double length, double width) {
    if (length <= 0 || width <= 0) {
      throw new IllegalArgumentException("Dimensions must be positive");
    }
    super(color);
    this.length = length;
    this.width = width;
  }
}

In this example, before invoking the superclass constructor, a pre-super statement validates the dimensions of the rectangle, ensuring they are positive.
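
For comparison, here is a sketch of how the same validation is commonly written when super() has to be the first statement: the check is moved into a static helper that is called inside the super(...) argument list. The helper name validated, the accessor methods, and the wrapper class PreJava22Validation are hypothetical additions for illustration; this version does not rely on the preview feature.

```java
public class PreJava22Validation {

    static class Shape {
        private final String color;

        Shape(String color) {
            this.color = color;
        }

        String color() {
            return color;
        }
    }

    static class Rectangle extends Shape {
        private final double length;
        private final double width;

        Rectangle(String color, double length, double width) {
            // super(...) must come first, so the validation is tucked
            // into the expression that computes its argument.
            super(validated(color, length, width));
            this.length = length;
            this.width = width;
        }

        // Hypothetical helper: validates the dimensions and passes color through unchanged.
        private static String validated(String color, double length, double width) {
            if (length <= 0 || width <= 0) {
                throw new IllegalArgumentException("Dimensions must be positive");
            }
            return color;
        }

        double area() {
            return length * width;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Rectangle("red", 2, 3).area()); // 6.0
        try {
            new Rectangle("red", -1, 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Dimensions must be positive
        }
    }
}
```

The helper works, but it has to return one of the constructor arguments just so it can sit inside the super(...) call, which is the kind of indirection that pre-super statements remove.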

Saturday, May 04, 2024

Java 22: Unnamed Variables and Patterns

Java 22 finalises Unnamed Variables & Patterns, which were previewed in Java 21. Unnamed variables are placeholders denoted by the underscore character (_) that stand in for variable names in situations where the variable is never used. They can be declared in several contexts, including local variable declarations, catch clauses, lambda expressions, and more. Omitting names that serve no functional purpose makes code more succinct and signals to the reader that the value is intentionally ignored.

Here are a few examples of unnamed variables in action:

For-loop:

for (Order _ : orders) {
  doSomething();
}

Assignment statement:

Queue<Integer> q = ... // x1, y1, z1, x2, y2, z2, ...
var x = q.remove();
var y = q.remove();
var _ = q.remove();

Lambda expressions:

list.stream().mapToInt(_ -> 1).sum();

Exception handling:

String s = ...
try {
  int i = Integer.parseInt(s);
} catch (NumberFormatException _) {
  System.out.println("Invalid number: " + s);
}

Try-with-resources:

try (BufferedReader _ = new BufferedReader(...)) {
  System.out.println("File opened successfully.");
} catch (IOException _) {
  System.err.println("An error occurred while opening the file.");
}

Unnamed pattern variables:

switch (shape) {
  case Circle _ -> process(shape, 0);
  case Triangle _ -> process(shape, 3);
  case Rectangle _ -> process(shape, 4);
  default -> System.out.println("Unknown shape");
}
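
Several of the snippets above elide details with "...", so here is a small self-contained sketch that exercises unnamed variables in a for-loop, a catch clause, and a lambda. The class UnnamedVariablesDemo and its method names are hypothetical, chosen only for illustration; it assumes Java 22 or later.

```java
import java.util.List;
import java.util.Map;

public class UnnamedVariablesDemo {

    // For-loop: only the number of iterations matters, not the elements.
    static int count(Iterable<?> items) {
        int total = 0;
        for (Object _ : items) {
            total++;
        }
        return total;
    }

    // Catch clause: the exception object itself is not needed.
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException _) {
            return fallback;
        }
    }

    // Lambda: the key parameter is ignored, only the value is used.
    static int sumValues(Map<String, Integer> map) {
        int[] total = {0};
        map.forEach((_, value) -> total[0] += value);
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a", "b", "c")));       // 3
        System.out.println(parseOrDefault("42", -1));            // 42
        System.out.println(parseOrDefault("forty-two", -1));     // -1
        System.out.println(sumValues(Map.of("x", 1, "y", 2)));   // 3
    }
}
```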

Unnamed Patterns

Unnamed patterns provide an elegant solution when you need to match a pattern without extracting every component. With nested data structures, such as records within records, an unnamed pattern (written as _) lets you extract only the components you need, without cluttering your code with unused variables.

record Address(String city, String country) {}
record Person(String name, int age, Address address) {}

if (person instanceof Person(var name, _, Address(var city, _))) {
  System.out.println(name + " lives in " + city);
}
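
The same record pattern can also be used in a switch. The following is a minimal sketch, assuming Java 22 or later; the class UnnamedPatternDemo and the describe method are hypothetical names used only for illustration. It also shows that one case label may list several patterns as long as none of them declares a named variable.

```java
public class UnnamedPatternDemo {

    record Address(String city, String country) {}
    record Person(String name, int age, Address address) {}

    static String describe(Object obj) {
        return switch (obj) {
            // Bind only name and city; age and country are matched but ignored.
            case Person(var name, _, Address(var city, _)) -> name + " lives in " + city;
            // Multiple patterns in one label are allowed because no variable is bound.
            case String _, Integer _ -> "a string or a number";
            default -> "something else";
        };
    }

    public static void main(String[] args) {
        Person ada = new Person("Ada", 36, new Address("London", "UK"));
        System.out.println(describe(ada));   // Ada lives in London
        System.out.println(describe("hi"));  // a string or a number
        System.out.println(describe(3.5));   // something else
    }
}
```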

So, the next time you encounter a situation where the variable name seems inconsequential, consider using an unnamed variable to streamline your code.