fahd.blog: java8

Showing posts with label java8. Show all posts

Saturday, July 29, 2017

Java 8: Multimaps

A multimap is a Map which maps a single key to multiple values e.g. HashMap<String, List<String>>.

Java 8 introduces a Map.computeIfAbsent method, which makes inserting values into a multimap much simpler:

Map<String, List<String>> multimap = new HashMap<>();
multimap.computeIfAbsent(key, k -> new ArrayList<>()).add(value);

Prior to Java 8, a multimap was usually created as follows:

// create the map
Map<String, List<String>> multimap = new HashMap<>();

// put a key/value into the map
List<String> list = multimap.get(key);
if (list == null) {
  multimap.put(key, list = new ArrayList<>());
}
list.add(value);

Or with Guava's Multimap class:

ListMultimap<String, String> multimap = ArrayListMultimap.create();
multimap.put(key, value);

You can read more about Java 8 updates to the Map class in my previous blog post.

Saturday, July 22, 2017

Java: Splitting a Pipe-delimited String - Fast!

This post shows how you can efficiently split a pipe-delimited string e.g. "foo|bar|baz". There are many ways to do this - I could even write my own - but I will only use those that are available in the JDK (or commonly used libraries) and will measure the performance of each.

Remember that, since the pipe symbol (|) is a special character in regular expressions, it needs to be escaped if necessary.

1. String.split

The most obvious way to split a string on the pipe character is to use Java's String.split:

public static String[] split(String s) {
  return s.split("\\|");
}

2. String.split with Pattern.quote

Instead of escaping the pipe ourselves, we can use Pattern.quote to do it for us. (Note: Pattern.quote("|") returns "\Q|\E".)

public static String[] splitWithPatternQuote(String s) {
  return s.split(Pattern.quote("|"));
}

3. Pattern.split

Create a static Pattern and use it to split the string.

private static final Pattern SPLITTER = Pattern.compile("\\|");

public static String[] splitWithPattern(String s) {
  return SPLITTER.split(s);
}

4. StringUtils.split

Apache Commons provides StringUtils.split, which splits a string on a single character:

import org.apache.commons.lang3.StringUtils;

public static String[] splitWithStringUtils(String s) {
  return StringUtils.split(s, '|');
}

So, which one is fastest?

I ran each method on 1 million pipe-delimited strings of different lengths - RandomStringUtils.randomAlphabetic is great for generating random strings - and the table below shows how long each one took:

Method	Time (ms)
`split`	485
`splitWithStringUtils`	520
`splitWithPattern`	643
`splitWithPatternQuote`	936

An interesting observation is that splitWithPatternQuote is so much slower than split, even though they both call String.split internally! If we delve into the source code for String.split, we can see that there is an optimisation (a "fastpath") if the provided regex has two-chars and the first char is a backslash. This applies to "\\|" but, since Pattern.quote produces \Q|\E, it does not use the fastpath and instead creates a new Pattern object for every split. This also explains why it is slower than splitWithPattern, which re-uses the same Pattern object.

Saturday, March 11, 2017

Java 8: Tree Traversal Using Streams

It's quite common to traverse a tree using recursion, but with Java 8 it is now possible to lazily traverse a tree using streams.

The class below represents a tree. The stream() method streams the nodes contained in the tree. You can then do all the cool things that you can do with other streams, such as filtering, mapping and collecting!

public class TreeNode<E> {

  private final E data;
  private final List<TreeNode<E>> children;

  /**
   * Creates the tree node with the specified data and no children.
   * @param data
   */
  public TreeNode(final E data) {
    this.data = data;
    this.children = new ArrayList<>();
  }

  /**
   * @return the data contained in this node
   */
  public E getData() {
    return data;
  }

  /**
   * Adds a child to this tree node.
   * @param data the data to add
   */
  public TreeNode<E> addChild(final E data) {
    final TreeNode<E> toAdd = new TreeNode<>(data);
    children.add(toAdd);
    return toAdd;
  }

  /**
   * @return a stream of nodes in this tree in depth first order
   */
  public Stream<TreeNode<E>> stream() {
    return Stream.concat(Stream.of(this), children.stream().flatMap(TreeNode::stream));
  }
}

Example usage:

TreeNode<String> root = new TreeNode<>("Root");
TreeNode<String> a = root.addChild("A");
a.addChild("A1");
a.addChild("A2");
root.addChild("B");
TreeNode<String> c = root.addChild("C");
c.addChild("C1");

int count = root.stream().count(); // 7

String tree = root.stream().map(TreeNode::getData).collect(Collectors.joining(","));
// Root,A,A1,A2,B,C,C1

Sunday, January 22, 2017

Java 8: Top N elements from a stream

This post shows how you can sort a stream in reverse order and then select the top N elements.

This is quite common in financial systems.

For example, let's say that you have a list of currency exchange rate movements and you want to see the top 5 largest movements. You can do this using Java 8 Streams as shown below:

import static java.util.Comparator.*;
import static java.util.stream.Collectors.*;

// list of currency exchange rate percentage moves
final List<Currency> currencies = Arrays.asList(
    new Currency("EUR/USD", 0.37),
    new Currency("USD/JPY", -0.21),
    new Currency("GBP/USD", 0.27),
    new Currency("AUD/USD", -0.08),
    new Currency("USD/CAD", 0.02),
    new Currency("USD/CHF", -0.46),
    new Currency("EUR/JPY", 0.16),
    new Currency("EUR/GBP", 0.13),
    new Currency("USD/HKD", 0.0),
    new Currency("EUR/CHF", 0.05),
    new Currency("USD/KRW", -0.71)
    );

currencies.stream()
  .sorted(comparing(Currency::getMove, comparing(Math::abs)).reversed())
  .limit(5)
  .collect(toList());

The result is:

Currency [ccy=USD/KRW, move=-0.71]
Currency [ccy=USD/CHF, move=-0.46]
Currency [ccy=EUR/USD, move=0.37]
Currency [ccy=GBP/USD, move=0.27]
Currency [ccy=USD/JPY, move=-0.21]

The two argument Comparator.comparing method easily allows us to compare currency moves on absolute value.

Sunday, October 30, 2016

Java 8: Updates to Map

In Java 8, the Map interface has been updated to include several new, convenient methods. Some of the important changes are outlined below:

1. computeIfAbsent

computeIfAbsent is particularly useful when implementing the caching pattern. For example, previously, you would do this:

private final Map<String, Data> map = new HashMap<>();

public Data getData(final String key) {
  Data data = map.get(key); // look in cache
  if (data == null) { // data is not in cache
    data = computeData(key); // perform an (expensive) operation
    map.put(key, data); // put data in cache
  }
  return data; // return data
}

You can now write this method more concisely as follows:

public Data getData(final String key) {
  return map.computeIfAbsent(key, this::computeData);
}

You can also use computeIfAbsent when implementing a "multi-map" (Map<K, Collection<V>>):

 map.computeIfAbsent(key, k -> new ArrayList<V>()).add(v);

2. putIfAbsent

putIfAbsent puts a value in the map, if the key is not already present. For example, instead of this:

final Map<String, String> map = new HashMap<>();
final String v = map.get(k);
if (v == null) {
  map.put(k, newValue);
}

you can write:

final Map<String, String> map = new HashMap<>();
map.putIfAbsent(k, newValue);

3. getOrDefault

getOrDefault returns a default value, if a specified key is not found in the map. Previously, you would do this:

final Map<String, String> map = new HashMap<>();
return map.containsKey(key) ? map.get(key) : "foo";

Now you can write:

final Map<String, String> map = new HashMap<>();
return map.getOrDefault(key, "foo");

4. Performance

The implementation of HashMap has been updated so that when buckets become too big (see the "TREEIFY_THRESHOLD" setting) they are transformed from lists (with O(n) retrieval) into trees (with faster O(log(n)) retrieval). This improves performance when hashCode() methods return values that are poorly distributed, or when many keys share the same hashCode, so long as they are also Comparable.

Sunday, September 25, 2016

Java 8 Concurrency: LongAdder and LongAccumulator

Java 8 introduces LongAdder, LongAccumulator, DoubleAdder and DoubleAccumulator, which are recommended instead of the Atomic classes when multiple threads update frequently but read less frequently (for example, in the context of gathering statistics). Under high thread contention, these new classes are designed to grow dynamically and, therefore, there is higher throughput at the expense of higher space consumption.

The "Adder" classes support operations for additions, whereas the "Accumulator" clases are given a function to combine values.

To understand how these classes work, you need to take a look at the implementation of the base class, Striped64. It has a "base" field, plus a table of "cells", each of which is a padded variant of AtomicLong to reduce cache contention. It first tries to update the base field using the usual AtomicLong CAS operation. If there is no contention, CAS will succeed. However, if there is contention (i.e. the CAS update of "base" failed), it will use the "probe" value (hash value) of the thread to map to a cell in the table. If a cell does not exist, it will be created. New cells are created upon contention until the number of cells reaches the nearest power of two greater than or equal to the number of CPUs. If the cell exists, CAS will be tried to update the value of the cell. If CAS fails, the probe value will be updated using a secondary hash until an uncontended cell is found. Finally, the sum method simply adds the base field and the elements in the cells array.

The code below shows how you can use LongAdder to calculate the sum of several values:

LongAdder adder = new LongAdder();

// do some addition in different threads
adder.add(10);
adder.increment();
// ...

// get the current sum
long sum = adder.sum();

Or you can use LongAccumulator as follows:

LongAccumulator acc = new LongAccumulator(Long::sum, 0);

acc.accumulate(10);
adder.increment();
// ...

// get the current sum
long sum = acc.get();

You can also use LongAdder in conjuction with a ConcurrentHashMap to maintain a frequency map:

ConcurrentMap<String, LongAdder> freqs = new ConcurrentHashMap<>();
freqs.computeIfAbsent("foo", k -> new LongAdder()).increment();

Sunday, September 18, 2016

Power Set Algorithm

The power set of a set S is the set of all subsets of S including the empty set and S itself. For example, the power set of {1, 2, 3} is {{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.

The problem can be solved using recursion. We know that if the initial set is empty, then its power set is {{}} and if it has only one element {a} then its power set is {{}, {a}}.

Here is the algorithm:

public static <E extends Comparable<E>> List<List<E>> subsets(final List<E> list) {
  final List<List<E>> result = new ArrayList<>();

  // if the list is empty there is only one subset, which is the empty set
  if (list.isEmpty()) {
    result.add(Collections.emptyList());
    return result;
  }

  // otherwise, get the first element
  final E first = list.get(0);

  // find the subsets of the rest of the list
  final List<E> rest = list.subList(1, list.size());
  final List<List<E>> subsets = subsets(rest);
  result.addAll(subsets);

  // create new subsets by inserting the first element into each subset
  for (final List<E> subset : subsets) {
    final List<E> newSubset = new ArrayList<>(subset);
    newSubset.add(0, first);
    result.add(newSubset);
  }

  return result;
}

Sunday, September 11, 2016

Factorial Using Iteration, Recursion and Java 8 Streams

In this post, we will solve the classic school problem of calculating the factorial of a positive integer, using different Java algorithms.

Note: In all the algorithms below, we assume that the input is > 1.

1. Iterative Factorial

In this algorithm, we use a standard for-loop and increment the variables i and result at each iteration:

static int iterativeFactorial(final int n) {
  int result = 1;
  for (int i = 1; i <= n; i++) {
    result *= i;
  }
  return result;
}

2. Recursive Factorial

Here is the recursive version (the function calls itself):

static int recursiveFactorial(final int n) {
  return n == 1 ? 1 : n * recursiveFactorial(n - 1);
}

Recursion lets you get rid of variables that are updated at each iteration. Typically, making a recursive function call is more expensive than iteration because each time the function is called, a new stack frame has to be created on the call stack to hold the state of each function call. Therefore, the memory consumed by the recursive function is proportional to the size of its input. You may even get a StackOverflowError with a large input.

3. Tail-recursive Factorial

Here's a tail-recursive definition of factorial. The recursive call is the last thing that happens in the function. In contrast, in our previous definition of factorial, the last thing was n multiplied by the result of the recursive call.

static int tailRecursiveFactorial(final int n) {
  return tailRecursiveFactorialHelper(n, 1);
}
private static int tailRecursiveFactorialHelper(final int n, final int acc) {
  return n == 1 ? acc : tailRecursiveFactorialHelper(n - 1, n * acc);
}

Tail-recursive functions can be optimised by the compiler in some programming languages. Instead of storing each intermediate result of the recursion onto different stack frames, a single stack frame can be reused instead, because there's no need to keep track of intermediate results. Unfortunately, Java does not support this kind of optimisation, but you might want to take this approach anyway, in case the compiler is optimised in the future. Other JVM languages (e.g. Scala) can optimise tail-recursion.

4. Stream Factorial

Finally, Java 8 streams provide a simple, declarative way of defining factorial:

static int streamFactorial(final int n) {
  return IntStream.rangeClosed(1, n).reduce(1, (a, b) -> a * b);
}

Sunday, July 31, 2016

Java 8: Implementing a custom TemporalAdjuster

TemporalAdjusters, introduced in Java 8's new Date and Time API, allow you to perform complex date manipulations. For example, you can adjust a date to the next Friday, or to the last day of the month. There are already several pre-defined TemporalAdjusters, which can be accessed using the static factory methods in the TemporalAdjusters class, as shown below:

import java.time.DayOfWeek;
import java.time.LocalDate;
import static java.time.temporal.TemporalAdjusters.*;

LocalDate date = LocalDate.of(2016, 7, 30);
LocalDate nextFriday = date.with(nextOrSame(DayOfWeek.FRIDAY)); // 2016-08-05
LocalDate monthEnd = date.with(lastDayOfMonth()); // 2016-07-31

If you can't find a suitable TemporalAdjuster, it's quite easy to create your own, by implementing the TemporalAdjuster interface.

Here is an example of a custom TemporalAdjuster, which moves the date forward to the next working day:

public class NextWorkingDay implements TemporalAdjuster {
  @Override
  public Temporal adjustInto(Temporal temporal) {
    DayOfWeek dayOfWeek = DayOfWeek.of(temporal.get(ChronoField.DAY_OF_WEEK));
    int daysToAdd = dayOfWeek == DayOfWeek.FRIDAY ? 3 :
                   (dayOfWeek == DayOfWeek.SATURDAY ? 2 : 1);
    return temporal.plus(daysToAdd, ChronoUnit.DAYS);
  }
}

Since TemporalAdjuster is a functional interface, you could use a lambda expression, but it is very likely that you will want to use this adjuster in other parts of your code, so it is better to encapsulate the logic in a proper class, which can then be re-used.

To define a TemporalAdjuster with a lambda expression, it is more convenient to use the ofDateAdjuster static factory method because it allows you work with a LocalDate object instead of a low level Temporal.

static TemporalAdjuster NEXT_WORKING_DAY = TemporalAdjusters.ofDateAdjuster(
  date -> {
   DayOfWeek dayOfWeek = date.getDayOfWeek();
   int daysToAdd = dayOfWeek == DayOfWeek.FRIDAY ? 3 :
                  (dayOfWeek == DayOfWeek.SATURDAY ? 2 : 1);
   return date.plusDays(daysToAdd);
});

Sunday, June 26, 2016

Java 8: CompletableFuture vs Parallel Stream

This post shows how Java 8's CompletableFuture compares with parallel streams when peforming asynchronous computations.

We will use the following class to model a long-running task:

class MyTask {
  private final int duration;
  public MyTask(int duration) {
    this.duration = duration;
  }
  public int calculate() {
    System.out.println(Thread.currentThread().getName());
    try {
      Thread.sleep(duration * 1000);
    } catch (final InterruptedException e) {
      throw new RuntimeException(e);
    }
    return duration;
  }
}

Let's create ten tasks, each with a duration of 1 second:

List<MyTask> tasks = IntStream.range(0, 10)
                                    .mapToObj(i -> new MyTask(1))
                                    .collect(toList());

How can we calculate the list of tasks efficiently?

Approach 1: Sequentially

Your first thought might be to calculate the tasks sequentially, as follows:

public static void runSequentially(List<MyTask> tasks) {
  long start = System.nanoTime();
  List<Integer> result = tasks.stream()
                              .map(MyTask::calculate)
                              .collect(toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
}

As you might expect, this takes 10 seconds to run, because each task is run one after the other on the main thread.

Approach 2: Using a parallel stream

A quick improvement is to convert your code to use a parallel stream, as shown below:

public static void useParallelStream(List<MyTask> tasks) {
  long start = System.nanoTime();
  List<Integer> result = tasks.parallelStream()
                              .map(MyTask::calculate)
                              .collect(toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
}

The output is

main
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-2
main
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-1
main
Processed 10 tasks in 3043 millis

This time it took 3 seconds because 4 tasks were run in parallel (using three threads from the ForkJoinPool, plus the main thread).

Approach 3: Using CompletableFutures

Let's see if CompletableFutures perform any better:

public static void useCompletableFuture(List<MyTask> tasks) {
  long start = System.nanoTime();
  List<CompletableFuture<Integer>> futures =
      tasks.stream()
           .map(t -> CompletableFuture.supplyAsync(() -> t.calculate()))
           .collect(Collectors.toList());

  List<Integer> result =
      futures.stream()
             .map(CompletableFuture::join)
             .collect(Collectors.toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
}

In the code above, we first obtain a list of CompletableFutures and then invoke the join method on each future to wait for them to complete one by one. Note that join is the same as get, with the only difference being that the former doesn't throw any checked exception, so it's more convenient in a lambda expression.

Also, you must use two separate stream pipelines, as opposed to putting the two map operations after each other, because intermediate stream operations are lazy and you would have ended up processing your tasks sequentially! That's why you first need to collect your CompletableFutures in a list to allow them to start before waiting for their completion.

The output is

ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-1
Processed 10 tasks in 4010 millis

It took 4 seconds to process 10 tasks. You will notice that only 3 ForkJoinPool threads were used and that, unlike the parallel stream, the main thread was not used.

Approach 4: Using CompletableFutures with a custom Executor

One of the advantages of CompletableFutures over parallel streams is that they allow you to specify a different Executor to submit their tasks to. This means that you can choose a more suitable number of threads based on your application. Since my example is not very CPU-intensive, I can choose to increase the number of threads to be greater than Runtime.getRuntime().getAvailableProcessors(), as shown below:

public static void useCompletableFutureWithExecutor(List<MyTask> tasks) {
  long start = System.nanoTime();
  ExecutorService executor = Executors.newFixedThreadPool(Math.min(tasks.size(), 10));
  List<CompletableFuture<Integer>> futures =
      tasks.stream()
           .map(t -> CompletableFuture.supplyAsync(() -> t.calculate(), executor))
           .collect(Collectors.toList());

  List<Integer> result =
      futures.stream()
             .map(CompletableFuture::join)
             .collect(Collectors.toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
  executor.shutdown();
}

The output is

pool-1-thread-2
pool-1-thread-4
pool-1-thread-3
pool-1-thread-1
pool-1-thread-5
pool-1-thread-6
pool-1-thread-7
pool-1-thread-8
pool-1-thread-9
pool-1-thread-10
Processed 10 tasks in 1009 millis

After this improvement, it now takes only 1 second to process 10 tasks.

As you can see, CompletableFutures provide more control over the size of the thread pool and should be used if your tasks involve I/O. However, if you're doing CPU-intensive operations, there's no point in having more threads than processors, so go for a parallel stream, as it is easier to use.

Sunday, June 19, 2016

Java 8: Default Method Resolution Rules

With the introduction of default methods in Java 8, it is now possible for a class to inherit the same method from multiple places (such as another class or interface). The following rules can be used to determine which method is selected in such cases:

A class or superclass method declaration always takes priority over a default method
Otherwise, the method with the most specific default-providing interface is used
Finally, if the methods are equally specific, there will be a compiler error and you will be forced to explicitly override the method and specify which one your class should call

Let's look at a few examples and apply these rules.

Example 1:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B {
  default void name() {
    System.out.println("B");
  }
}

public class C implements A {
  @Override
  public void name() {
    System.out.println("C");
  }
}

public class D extends C implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: C

This is because, as stated in Rule 1, the method declaration of name() from the superclass C takes priority over the default methods declarations in A and B.

Example 2:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B extends A {
  @Override
  default void name() {
    System.out.println("B");
  }
}

public class C implements A {}

public class D extends C implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: B

Unlike the previous example, C does not override name(), but since it implements A, it has a default method from A. According to Rule 2, if there are no methods in the class or superclass, the most specific default-providing interface is selected. Since B extends A, it is more specific and, as a result, "B" is printed.

Example 3:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B {
  default void name() {
    System.out.println("B");
  }
}

public class D implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: Compiler error! Duplicate default methods named name with the parameters () and () are inherited from the types B and A

In this example, there's no more-specific default-providing interface to select, so the compiler throws an error. To resolve the error, you need to explicitly override the method in D and specify which method declaration you want D to use. For example, if you want to use B's:

class D implements A, B {
  @Override
  public void name() {
    B.super.name();
  }
}

Example 4:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B extends A {}

public interface C extends A {}

public class D implements B, C {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: A

The sub-interfaces B and C haven't overridden the method, so there is actually only the method from A to choose from. As a side note, if either B or C (but not both) had overridden the method, then Rule 2 would have applied. By the way, this is the diamond problem.

Saturday, June 18, 2016

Java 8: Debugging Stream Pipelines

I've found that stream pipelines can be difficult to debug because stack traces involving lambda expressions are quite cryptic. Consider the following contrived example:

import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class Test {
  public static void main(final String[] args) {
    final List<String> list = Arrays.asList("foo", null, "bar");
    list.stream()
        .map(Function.identity())
        .filter(x -> true)
        .map(String::length)
        .forEach(System.out::println);
  }
}

You may have already guessed that the code above will throw a NullPointerException when String.length is called on the null element in the list. I've added extra map and filter operations, which do nothing, just to make the example a bit more interesting. In the real world, you will probably have a number of different operations in your stream pipeline.

Running the code, produces the following stack trace:

Exception in thread "main" java.lang.NullPointerException
  at Test$$Lambda$3/455659002.apply(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source)
  at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
  at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
  at java.util.stream.ReferencePipeline.forEach(Unknown Source)
  at Test.main(Test.java:12)

The stack trace shows that a NullPointerException occurred but it doesn't tell you which operation in the pipeline failed. What does Test$$Lambda$3/455659002.apply(Unknown Source) mean and why is there no line number?! Since lambda expressions don't have a name, the compiler makes one up (similar to anonymous classes). In this case, it is Test$$Lambda$3 but that doesn't help us track the bug in our code.

So, what can we do? Let's go old-school and add some logging to our code! We can use peek to print out each element before it is consumed by the next operation in the pipeline.

import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class Test {
  public static void main(final String[] args) {
    final List<String> list = Arrays.asList("foo", null, "bar");
    list.stream()
        .peek(x -> System.out.println("Running identity on: " + x))
        .map(Function.identity())
        .peek(x -> System.out.println("Running filter on: " + x))
        .filter(x -> true)
        .peek(x -> System.out.println("Running string length on: " + x))
        .map(String::length)
        .peek(x -> System.out.println("Running print on: " + x))
        .forEach(System.out::println);
  }
}

Running it produces the following output:

Running identity map on: foo
Running filter on: foo
Running string length on: foo
Running print on: 3
3
Running identity map on: null
Running filter on: null
Running string length on: null
Exception in thread "main" java.lang.NullPointerException
  at Test$$Lambda$6/295530567.apply(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source)
  at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
  at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
  at java.util.stream.ReferencePipeline.forEach(Unknown Source)
  at Test.main(Test.java:16)

Great! Now we know that the NullPointerException was thrown by the string length lambda!

In general, I think stack traces involving lambdas could be improved in future versions of Java.

Sunday, June 12, 2016

Java 8: Converting Anonymous Classes to Lambda Expressions

Refactoring anonymous classes (that implement a single method) to lambda expressions, makes your code more succint and readable. For example, here's an anonymous class for a Runnable and its lambda equivalent:

// using an anonymous class
Runnable r = new Runnable() {
  @Override
  public void run() {
    System.out.println("Hello");
  }
};

// using a lambda expression
Runnable r2 = () -> System.out.println("Hello");

However, it's not always that simple!

Here are a couple of gotchas:

1. Different scoping rules

There are different scoping rules between anonymous classes and lambda expressions. For example, in lambda expressions, this and super are lexically scoped, meaning they are relative to the enclosing class, but in an anonymous class, they are relative to the anonymous class itself. Similarly, local variables declared in lambda expressions will conflict with variables declared in the enclosing class, but in anonymous classes, they are allowed to shadow variables in the enclosing class. Here is an example:

int foo = 1;
Runnable r = new Runnable() {
  @Override
  public void run() {
    // this is ok!
    int foo = 2;
  }
};

Runnable r2 = () -> {
  // compile error: Lambda expression's local variable foo cannot
  // redeclare another local variable defined in an enclosing scope.
  int foo = 2;
};

2. Overloaded methods

If you have an overloaded method, using lambda expressions can result in an ambiguous method call and will require explicit casting. Here is an example:

// Functional interface
interface Task {
  public void execute();
}

// Overloaded methods
public static void go(final Runnable r) {
  r.run();
}
public static void go(final Task t) {
  t.execute();
}

// Calling the overloaded method:

// When using an anonymous class, there is no ambiguity because
// the type of the class is explicit at instantiation
go(new Task() {
  @Override
  public void execute() {
     System.out.println("Hello");
  }
});

// When using a lambda expression, there is a compile error!
// The method go(Runnable) is ambiguous
go(() -> {
  System.out.println("Hello");
});

// This ambiguity can be solved with an explicit cast
go((Task)() -> {
  System.out.println("Hello");
});

Sunday, April 17, 2016

Java 8 Streams API: Immutable List Collector

Instead of first collecting a stream into a list and then making it unmodifiable, as shown below:

mutableList = list.stream()
                  // perform some stream operations...
                  .filter(myFilterPredicate)
                  .map(myMapperFunction)
                  // collect into a list
                  .collect(toList());

// now make the list unmodifiable
return Collections.unmodifiableList(list);

you can using collectingAndThen as follows:

return list.stream()
           // perform some stream operations...
           .filter(myFilterPredicate)
           .map(myMapperFunction)
           // collect into an unmodifiable list
           .collect(collectingAndThen(toList(), Collections::unmodifiableList));

Sunday, March 20, 2016

Java 8: Joining a Stream

In a previous post, I wrote about how you can use Guava's Joiner to easily convert an Iterable into a String, so you no longer have to iterate over it and build the String manually.

Java 8 introduces the StringJoiner, which can be used to join a Stream using Collectors.joining. For example:

Arrays.asList("apple", "banana", "cherry")
      .stream()
      .collect(Collectors.joining(", ");

To skip nulls and empty strings, you can filter the stream first:

Arrays.asList("apple", "banana", null, "", "cherry")
      .stream()
      .filter(s -> s != null && !s.isEmpty())
      .collect(Collectors.joining(", ");

I'd still use Guava's Joiner for lists though.

Monday, December 28, 2015

Java 8: Convert a String to a Stream of Characters

I find it a bit strange that the Java API does not provide a method to convert a String to a Stream<Character>, but this is how you can do it:

Stream<Character> stream = string.chars().mapToObj(i -> (char)i);

// or:

Stream<Character> stream = IntStream.range(0, string.length())
                                    .mapToObj(string::charAt);

Sunday, November 29, 2015

Java 8 Streams API: Grouping and Partitioning a Stream

This post shows how you can use the Collectors available in the Streams API to group elements of a stream with groupingBy and partition elements of a stream with partitioningBy.

Consider a stream of Employee objects, each with a name, city and number of sales, as shown in the table below:

+----------+------------+-----------------+
| Name     | City       | Number of Sales |
+----------+------------+-----------------+
| Alice    | London     | 200             |
| Bob      | London     | 150             |
| Charles  | New York   | 160             |
| Dorothy  | Hong Kong  | 190             |
+----------+------------+-----------------+

Grouping

Let's start by grouping employees by city using imperative style (pre-lamba) Java:

Map<String, List<Employee>> result = new HashMap<>();
for (Employee e : employees) {
  String city = e.getCity();
  List<Employee> empsInCity = result.get(city);
  if (empsInCity == null) {
    empsInCity = new ArrayList<>();
    result.put(city, empsInCity);
  }
  empsInCity.add(e);
}

You're probably familiar with writing code like this, and as you can see, it's a lot of code for such a simple task!

In Java 8, you can do the same thing with a single statement using a groupingBy collector, like this:

Map<String, List<Employee>> employeesByCity =
  employees.stream().collect(groupingBy(Employee::getCity));

This results in the following map:

{New York=[Charles], Hong Kong=[Dorothy], London=[Alice, Bob]}

It's also possible to count the number of employees in each city, by passing a counting collector to the groupingBy collector. The second collector performs a further reduction operation on all the elements in the stream classified into the same group.

Map<String, Long> numEmployeesByCity =
  employees.stream().collect(groupingBy(Employee::getCity, counting()));

The result is the following map:

{New York=1, Hong Kong=1, London=2}

Just as an aside, this is equivalent to the following SQL statement:

select city, count(*) from Employee group by city

Another example is calculating the average number of sales in each city, which can be done using the averagingInt collector in conjuction with the groupingBy collector:

Map<String, Double> avgSalesByCity =
  employees.stream().collect(groupingBy(Employee::getCity,
                               averagingInt(Employee::getNumSales)));

The result is the following map:

{New York=160.0, Hong Kong=190.0, London=175.0}

Partitioning

Partitioning is a special kind of grouping, in which the resultant map contains at most two different groups - one for true and one for false. For instance, if you want to find out who your best employees are, you can partition them into those who made more than N sales and those who didn't, using the partitioningBy collector:

Map<Boolean, List<Employee>> partitioned =
  employees.stream().collect(partitioningBy(e -> e.getNumSales() > 150));

This will produce the following result:

{false=[Bob], true=[Alice, Charles, Dorothy]}

You can also combine partitioning and grouping by passing a groupingBy collector to the partitioningBy collector. For example, you could count the number of employees in each city within each partition:

Map<Boolean, Map<String, Long>> result =
  employees.stream().collect(partitioningBy(e -> e.getNumSales() > 150,
                               groupingBy(Employee::getCity, counting())));

This will produce a two-level Map:

{false={London=1}, true={New York=1, Hong Kong=1, London=1}}

Saturday, October 31, 2015

Java 8 Streams API: Finding and matching

The Streams API provides some useful methods to determine whether elements in a stream match a given condition.

anyMatch
The anyMatch method can be used to check if there exists an element in the stream that matches a given predicate. For example, to find out whether a stream of random numbers has a number greater than 5:

IntStream randomStream = new Random(100, 1, 11);
if (randomStream.anyMatch(i -> i > 5)) {
 System.out.println("The stream has a number greater than 5");
}

allMatch
The allMatch method can be used to check if all elements in the stream match a given predicate. For example, to find out whether a stream of random numbers only contains positive numbers:

boolean isPositive = randomStream.allMatch(i -> i > 0);

noneMatch
noneMatch is the opposite of allMatch and can be used to check that no elements in the stream match a given predicate. The previous example could be rewritten using noneMatch as follows:

boolean isPositive = randomStream.noneMatch(i -> i <= 0);

findAny
The findAny method returns an arbitrary element of the stream. It returns an Optional because it's possible that no element might be returned by findAny. For example, to find a number greater than 5 in our random number stream:

OptionalInt number = randomStream.filter(i -> i > 5)
                                 .findAny();

findFirst
findFirst is similar to findAny but returns the first element in the stream. For example, to find the first number greater than 5 in our random number stream:

OptionalInt number = randomStream.filter(i -> i > 5)
                                 .findFirst();

The difference between findAny and findFirst arises when using parallel streams. Finding an arbitrary element in a stream is less constraining than finding the first element, when running in parallel mode, so findAny may perform better. So, if you don't care about which element is returned, use findAny.

An interesting thing to note is that the operations described above use short-circuiting i.e. they don't need to process the entire stream to produce a result. As soon as an appropriate element is found, a result is returned.

Sunday, July 26, 2015

Java 8: Creating infinite streams

There are many ways you can build a Stream in Java 8. One of the most common ways, is to get a stream from a collection using the stream method as shown below:

List<String> list = Arrays.asList("Alice", "Bob");
Stream<String> stream = list.stream();
stream.forEach(System.out::println);

You can also create a stream from values using the static method, Stream.of:

Stream<String> stream = Stream.of("Alice", "Bob");

Arrays can be converted to streams using the static method, Arrays.stream:

int[] numbers = { 1, 3, 6, 8 };
IntStream stream = Arrays.stream(numbers);

Creating infinite streams:

The Streams API provides two static methods: Stream.iterate and Stream.generate, which allow you to create an infinite stream. Here's an example:

Stream<Integer> evenNumbers = Stream.iterate(0, n -> n + 2);

The example above produces an infinite stream of even numbers. The iterate method takes a starting value and a lambda that is used to generate a new value from the previous one. In this case, the lambda returns the previous value added with 2. The stream is infinite because values are computed on demand and can be computed forever. (Note that you can use the limit method to explicitly limit the size of the stream.)

Similarly, Stream.generate also lets you produce an infinite stream, but the difference is that it takes a lamba of type Supplier<T> to provide new values, rather than a lambda that applies successively to the previous value. Here's an example:

Stream<Integer> randomNumbers = Stream.generate(Math::random);

Sunday, June 28, 2015

Java 8: Composing Comparators

In my last post, I showed how you can sort a list of Person objects by age, using the following statement:

list.sort(comparingInt(Person::getAge));

What if you want to sort the objects in order of decreasing age instead? There's no need to create a new instance of Comparator, because the Comparator interface has a handy default method reversed that reverses its ordering:

list.sort(comparingInt(Person::getAge).reversed());

Now, what if you want to sort people by name if they have the same age. The thenComparing method allows you to do just that, as shown below:

list.sort(comparingInt(Person::getAge)
    .reversed()
    .thenComparing(Person::getName));