Saturday, December 09, 2023

Python: Running Tasks in Parallel

The concurrent.futures module can be used to run tasks in parallel in Python. Here is an example:

from concurrent.futures import ProcessPoolExecutor

# perform_task and tasks are placeholders for your own function and inputs
with ProcessPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(perform_task, task) for task in tasks]

results = [future.result() for future in futures]

The ProcessPoolExecutor class uses a pool of processes to execute tasks asynchronously. The submit method returns a Future object immediately, and calling future.result() blocks until the task has completed.

Note that there is also a ThreadPoolExecutor class which uses a pool of threads. However, due to the Global Interpreter Lock (GIL) in CPython, only one thread can execute Python bytecode at any one time, so threads will not parallelise CPU-bound work. They remain useful for I/O-bound tasks, where threads spend most of their time waiting.
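
Threads can still help when tasks spend most of their time waiting on I/O, because blocking calls release the GIL. Here is a minimal sketch of that case; fetch and run_all are hypothetical names, and time.sleep merely stands in for a real network or disk call:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(item):
    # Hypothetical stand-in for an I/O-bound call (e.g. a network request).
    # time.sleep releases the GIL, so other threads can run in the meantime.
    time.sleep(0.1)
    return item * 2


def run_all(items, max_workers=5):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each Future back to its input so results can be matched up
        futures = {executor.submit(fetch, item): item for item in items}
        results = {}
        # as_completed yields futures as they finish, not in submission order
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


start = time.perf_counter()
results = run_all(range(5))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

With five workers the five simulated calls overlap, so the elapsed time should be much closer to one call than to five.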

Sunday, December 03, 2023

Java 21: Sequenced Collections

Introduced in Java 21, a SequencedCollection is a Collection whose elements have a defined encounter order, i.e. it has first and last elements, and the elements between them have successors and predecessors. Some examples include List, Deque, SortedSet, and LinkedHashSet.

The SequencedCollection interface provides methods to add, retrieve, and remove elements at either end of the collection. It also has a reversed() method which provides a reverse-ordered view of the original collection.

Similarly, the new SequencedMap interface is a map that has a well-defined encounter order, supports operations at both ends, and is reversible. Examples include LinkedHashMap and TreeMap.

Example usage:

var set = new LinkedHashSet<String>(Arrays.asList("a", "b", "c"));

set instanceof SequencedCollection
==> true

set.getFirst()
==> "a"

set.getLast()
==> "c"

set.reversed()
==> [c, b, a]

Saturday, December 02, 2023

Java 21: Unnamed Classes

Java 21 introduces Unnamed Classes (a preview language feature) that allow you to write small programs without having an enclosing class declaration.

Here is the classic Hello World program that we were all taught when starting to learn Java:

public class HelloWorld { 
  public static void main(String[] args) { 
    System.out.println("Hello, World!");
  }
}

There is a lot of clutter here. Using an unnamed class, this can be simplified to:

void main() { 
  System.out.println("Hello, World!");
}

Not only is the enclosing class not required, but the main method has also been relaxed so that it does not need to be public or static, or to take any arguments.

You can also add fields and methods to an unnamed class, as shown below:

private static final String GREETING = "Hello, World!";

private String getGreeting() {
  return GREETING;
}

void main() { 
  System.out.println(getGreeting());
}

Since an unnamed class cannot be instantiated or referenced by name, it is only useful as a standalone program or as an entry point to a program.

Saturday, November 25, 2023

Java 21: String Templates

In Java 21, String Templates have been introduced as a preview language feature. They allow text and expressions to be composed safely and efficiently, without using the + operator.

Here is an example:

int x = 5, y = 6;

String s = STR."\{x} plus \{y} is equal to \{x + y}";

// evaluates to: "5 plus 6 is equal to 11"

In this example, STR is a template processor. The template is \{x} plus \{y} is equal to \{x + y} and \{x} is one of the embedded expressions in the template. The STR template processor is defined in the Java Platform (and is automatically imported into every Java source file), and it performs string interpolation by evaluating the embedded expressions.

More examples:

// you can invoke methods, access fields, use ternaries
String s = STR."\{user.name}: Access \{user.hasAccess() ? "Granted" : "Denied"}";

// multi-line template expression
String xml = STR."""
<book>
  <author>\{author}</author>
  <title>\{title}</title>
</book>
""";

FMT Template Processor

FMT is like STR but it also interprets format specifiers which appear to the left of embedded expressions. For example:

double val = 4999.4567;

FormatProcessor.FMT."The value is %,.2f\{val}";

// evaluates to: "The value is 4,999.46"

It's quite easy to create your own Template Processor by implementing StringTemplate.Processor. This is useful if you want to validate inputs before composing the string. It's also possible to return an object of any type, not just String. For instance, a SQL Template processor could first sanitise the input to prevent a SQL injection attack, and then return a PreparedStatement instead of a String.

Sunday, September 03, 2023

Linear and Polynomial Regression in kdb+/q

In this post, I'll describe how you can implement linear and polynomial regression in kdb+/q to determine the equation of a line of best fit (also known as a trendline) through the data on a scatter plot.

Consider the following scatter plot:

Our aim is to estimate the equation of the line (or curve) that most closely fits the data.

The vector of estimated polynomial regression coefficients (using ordinary least squares estimation) can be obtained using the following formula (for information about how this formula is derived, see Regression analysis [Wikipedia]):

b = (X^T X)^{-1} X^T y
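
As a brief sketch of where this comes from (the standard least-squares argument, assuming X has full column rank so that the inverse exists): the coefficients minimise the sum of squared residuals S(b), and setting its gradient to zero gives the formula.

```latex
\begin{aligned}
S(b) &= \lVert y - Xb \rVert^2 = (y - Xb)^T (y - Xb) \\
\nabla_b S(b) &= -2 X^T (y - Xb) = 0 \\
X^T X \, b &= X^T y \\
b &= (X^T X)^{-1} X^T y
\end{aligned}
```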

This can be translated into q as follows:

computeRegressionCoefficients:{
    xt:flip x;
    xt_x:xt mmu x;
    xt_x_inv:inv xt_x;
    xt_y:xt mmu y;
    xt_x_inv mmu xt_y}

Linear:
In order to perform linear regression, we have to first create a matrix X with a column of 1s and a column containing the x-values. The output of linear regression will be a vector of 2 coefficients and the equation of the trendline will be of the form: y = b_1 x + b_0

computeLinearRegressionCoefficients:{
    computeRegressionCoefficients[flip (1f;x);y]}

Polynomial:
In order to fit a polynomial line, all we have to do is take the matrix X from the linear regression model and add more columns corresponding to the order of the polynomial desired. For example, for quadratic regression, we will add a column for x^2 on the right side of the matrix X. The output of quadratic regression will be a vector of 3 coefficients and the equation of the curve will be of the form: y = b_2 x^2 + b_1 x + b_0

// quadratic
computeQuadraticRegressionCoefficients:{
    computeRegressionCoefficients[flip (1f;x;x*x);y]}

// cubic
computeCubicRegressionCoefficients:{
    computeRegressionCoefficients[flip (1f;x;x*x;x*x*x);y]}

// generalisation of polynomial regression for any order
computePolynomialRegressionCoefficients:{[x;y;order]
    computeRegressionCoefficients[flip x xexp/: til order+1;y]}
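
If you want to cross-check the q results outside kdb+, the same calculation can be sketched in plain Python. This is an illustration under assumptions, not a port of the q code: all function names are hypothetical, and instead of computing the inverse explicitly it solves (X^T X) b = X^T y by Gauss-Jordan elimination, which gives the same coefficients.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    # Each entry is the dot product of a row of a and a column of b
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def polynomial_regression_coefficients(xs, ys, order):
    # Design matrix X: one row per point, columns x^0, x^1, ..., x^order
    X = [[x ** k for k in range(order + 1)] for x in xs]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(row, ys)) for row in Xt]
    # Solving (X^T X) b = X^T y is equivalent to b = (X^T X)^{-1} X^T y
    return solve(XtX, Xty)


# Made-up sample data lying exactly on y = 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
coeffs = polynomial_regression_coefficients(xs, ys, 2)
print([round(c, 6) for c in coeffs])
```

As in the q version, the coefficients come back lowest order first (b_0, b_1, b_2), so for this data they should be close to 1, 2 and 3.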

Related post:
Matrix Operations in kdb+/q

Saturday, September 02, 2023

Python: Printing the Stdout/Stderr of a Subprocess

This is how you can run a subprocess in Python and print out its stdout and stderr:

import subprocess

proc = subprocess.Popen(["/path/to/myscript", "arg1", "arg2"], 
            stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
for line in proc.stdout:
    print(line.decode().rstrip())
proc.wait()
if proc.returncode != 0:
    print("Command failed with status:", proc.returncode)

Check out the subprocess documentation for more information.
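
If you don't need to stream the output line by line and would rather keep stdout and stderr separate, subprocess.run with capture_output=True (Python 3.7+) is a simpler alternative. A minimal sketch, in which the command is a hypothetical stand-in (a Python one-liner) so that the example is self-contained:

```python
import subprocess
import sys

# Hypothetical stand-in command: writes to both streams and exits with status 3
cmd = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr); sys.exit(3)",
]

# capture_output=True collects both streams; text=True decodes them to str
completed = subprocess.run(cmd, capture_output=True, text=True)

print("stdout:", completed.stdout.rstrip())
print("stderr:", completed.stderr.rstrip())
if completed.returncode != 0:
    print("Command failed with status:", completed.returncode)
```

Note that run waits for the command to finish before returning anything, so the Popen loop above is still the better fit for long-running commands whose output you want to see as it is produced.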

Related post:
Python Cheat Sheet

Sunday, August 27, 2023

Matrix Operations in kdb+/q

In q, a matrix (an array of m x n numbers) is represented as a list of lists. For example, here is a matrix with 2 rows and 3 columns:

q)A:(1 2 3;4 5 6)
q)A
1 2 3
4 5 6

Matrix Addition and Subtraction
If A and B are matrices of the same size, then they can be added and subtracted. To find the entries of A + B, you simply add the corresponding entries of A and B. To find A - B, subtract corresponding entries. If A and B have different sizes, you will get a 'length error.

q)A:(1 2;3 4)
q)B:(5 6;7 8)
q)A+B
6  8
10 12
q)B-A
4 4
4 4
q)C:(1 1 1;2 2 2)
q)A+C
'length
  [0]  A+C
        ^

Scalar Multiplication
If A is a matrix and k is a scalar, then the matrix kA is obtained by multiplying each entry of A by k.

q)A:(1 2;3 4)
q)k:2
q)k*A
2 4
6 8

Matrix Multiplication
First, in order to multiply two matrices A and B, the number of columns of A must match the number of rows of B. If A is m x n and B is n x p, then the size of the product matrix AB will be m x p. Each entry of AB is the dot product (multiply corresponding numbers and then add them up) of a row vector in A and a column vector in B: the entry in row i and column j uses row i of A and column j of B. This can be done in q using the mmu (or $) operator, which expects floating-point arguments.

q)A:(1 2f;3 4f)
q)B:(5 6f;7 8f)
q)A
1 2
3 4
q)B
5 6
7 8
q)A mmu B
19 22
43 50
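
To make the row-by-column rule explicit, here is the same product worked out in plain Python. This is only an illustrative sketch (matrix_multiply is a hypothetical helper, not how q implements mmu):

```python
def matrix_multiply(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of A must match rows of B")
    columns = list(zip(*b))  # the columns of b, as tuples
    # Entry (i, j) is the dot product of row i of a and column j of b
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
product = matrix_multiply(A, B)
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
```

For example, the top-left entry is 1*5 + 2*7 = 19, matching the q output above.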

Identity Matrix
This is a square matrix with 1s on the diagonal and 0s everywhere else.

q)I:{`float${x=/:x}til x}
q)I 3
1 0 0
0 1 0
0 0 1

Matrix Inverse
The inverse of A is A^{-1} if A A^{-1} = A^{-1} A = I, where I is the identity matrix. Use the inv function to find the inverse of a matrix.

q)A:(1 2f;3 4f)
q)inv A
-2  1
1.5 -0.5
q)A mmu inv A
1 1.110223e-016
0 1

Matrix Transpose
The transpose A^T of a matrix A is a flipped version of the original matrix which is obtained by changing its rows into columns (or equivalently, its columns into rows). This can be done using the flip operation in q.

q)A:(1 2 3;4 5 6)
q)flip A
1 4
2 5
3 6

Saturday, August 26, 2023

Using "flock" to Prevent Multiple Instances of a Script from Running

In a previous post, I wrote about how you can use the lockfile command to ensure that only one instance of a script is running at a time. An alternative to lockfile is the flock command, which is used as follows:

flock /path/to/mylockfile cmd

By default, if the lock cannot be immediately acquired, flock will wait indefinitely until it becomes available. However, you can use the --nonblock (or -n) flag if you want flock to fail (with an exit code of 1) rather than wait if the lock cannot be immediately acquired. You can also specify how long flock should wait by passing in a --timeout in seconds.

A convenient form of flock often used within shell scripts is to use a file descriptor, as follows:

(
flock -n 9 || exit 1
# ... commands executed under lock ...
) 9>/path/to/mylockfile

If you want to prevent multiple instances of a shell script from running simultaneously, add the following boilerplate at the top of your script, which will cause the script to lock itself automatically on first run:

[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :
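
The same kind of lock is available from Python on Unix-like systems via fcntl.flock, which is handy if the script you want to protect is written in Python rather than shell. A minimal sketch, assuming a hypothetical try_lock helper and a throwaway lock file path:

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if the lock is already held."""
    lock_file = open(path, "w")
    try:
        # LOCK_NB makes this fail immediately instead of waiting, like flock -n
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


# Hypothetical lock file in a fresh temporary directory
lock_path = os.path.join(tempfile.mkdtemp(), "myscript.lock")

first = try_lock(lock_path)
second = try_lock(lock_path)  # a second open of the same file cannot get the lock
first_acquired = first is not None
second_acquired = second is not None
print("first acquired:", first_acquired)
print("second acquired:", second_acquired)

# Closing the file (or exiting the process) releases the lock
if first:
    first.close()
```

In a real script you would keep the returned file object open for the lifetime of the process and exit early if try_lock returns None.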

Related posts:
Using "lockfile" to Prevent Multiple Instances of a Script from Running
Retrying Commands in Shell Scripts
Executing a Shell Command with a Timeout

Saturday, April 08, 2023

kdb+/q - Converting a CSV String into a Table

I often find myself wanting to create quick in-memory tables in q for testing purposes. The usual way to create a table is by specifying lists of column names and values, or flipping a dictionary, as shown below:

([] name:`Alice`Bob`Charles;age:20 30 40;city:`London`Paris`Athens)

// or, using a dictionary:

flip `name`age`city!(`Alice`Bob`Charles;20 30 40;`London`Paris`Athens)

As you can see, it's quite difficult to visualise the table being created using this approach. That's why I sometimes prefer to create a table from a multi-line CSV string instead, using the 0: operator, as shown below:

("SIS";enlist",") 0:
"name,age,city
Alice,20,London
Bob,30,Paris
Charles,40,Athens
"

name    age city
------------------
Alice   20  London
Bob     30  Paris
Charles 40  Athens

Note that you can also load from a CSV file:

("SIS";enlist",") 0: `$"/path/to/file.csv"

Related post:
kdb+/q - Reading and Writing a CSV File

Sunday, April 02, 2023

Java 20: Record Patterns in For Loops

Previously, I wrote about "Record Patterns", introduced in Java 19, which allow you to "deconstruct" records and access their components directly.

In Java 20 (released a couple of weeks ago!), record patterns have been enhanced so that they can also be used in for loops.

Here is an example that uses a nested record pattern in a for loop to print out a list of records:

record Author(String firstName, String lastName) {}
record Book(String title, Author author, double price) {}

static void printBooks(List<Book> books) {
  for (Book(var title, Author(var firstName, var lastName), var price): books) {
    System.out.printf("%s by %s %s for %.2f\n", title, firstName, lastName, price);
  }
}

Related post:
Java 19: Record Patterns

Friday, January 06, 2023

fahd.blog in 2022

Happy 2023, everyone!

I'd like to wish everyone a great start to an even greater new year!

In keeping with tradition, here's one last look back at fahd.blog in 2022.

During 2022, I posted 7 new entries on fahd.blog. I am also thrilled that I have more readers from all over the world! Thanks for reading and especially for giving feedback.

Top 3 posts of 2022:

I'm going to be writing a lot more this year, so stay tuned for more great techie tips, tricks and hacks! :)
