This post shows how you can efficiently split a pipe-delimited string, e.g. "foo|bar|baz". There are many ways to do this (I could even write my own), but I will only use those that are available in the JDK or in commonly used libraries, and I will measure the performance of each.
Remember that the pipe symbol (|) is a special character in regular expressions (it means "or"), so it must be escaped whenever it is passed to a regex-based method such as String.split.
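To see why the escaping matters, here is a small sketch (the class and method names are hypothetical, for illustration only). An unescaped "|" is a regex alternation between two empty patterns, so it matches the empty string at every position; on Java 8 and later that splits the input into single characters.

```java
import java.util.Arrays;

// Compares an unescaped and an escaped pipe passed to String.split.
class PipeEscaping {

    // Unescaped: "|" is alternation between two empty patterns, so it
    // matches the empty string at every position in the input.
    static String[] splitUnescaped(String s) {
        return s.split("|");
    }

    // Escaped: "\\|" in Java source is the two-character regex \| ,
    // which matches a literal pipe.
    static String[] splitEscaped(String s) {
        return s.split("\\|");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitUnescaped("foo|bar"))); // [f, o, o, |, b, a, r] on Java 8+
        System.out.println(Arrays.toString(splitEscaped("foo|bar")));   // [foo, bar]
    }
}
```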
1. String.split
The most obvious way to split a string on the pipe character is to use Java's String.split:
public static String[] split(String s) {
    return s.split("\\|");
}
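One behaviour of String.split worth knowing about: with the default limit, interior empty strings are kept but trailing empty strings are dropped, while a negative limit keeps them all. A minimal sketch (hypothetical class and method names):

```java
import java.util.Arrays;

// Shows how String.split treats empty fields in a pipe-delimited string.
class SplitEmptyFields {

    // Default limit (0): interior empty strings are kept,
    // but trailing empty strings are discarded.
    static String[] splitDefault(String s) {
        return s.split("\\|");
    }

    // Negative limit: the pattern is applied as many times as possible
    // and trailing empty strings are kept too.
    static String[] splitKeepTrailing(String s) {
        return s.split("\\|", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("a||b||")));      // [a, , b]
        System.out.println(Arrays.toString(splitKeepTrailing("a||b||"))); // [a, , b, , ]
    }
}
```

This matters for formats where an empty last field is meaningful, since the default call silently shortens the array.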
2. String.split with Pattern.quote
Instead of escaping the pipe ourselves, we can use Pattern.quote to do it for us. (Note: Pattern.quote("|") returns "\Q|\E".)
import java.util.regex.Pattern;

public static String[] splitWithPatternQuote(String s) {
    return s.split(Pattern.quote("|"));
}
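Pattern.quote really earns its keep when the delimiter is not fixed, for example when it is read from configuration and might contain other regex metacharacters such as ".". Here is a sketch of a hypothetical splitOn helper built on it:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Splitting on a delimiter that is only known at runtime.
class QuotedSplit {

    // Hypothetical helper: splits s on a literal delimiter of any length.
    static String[] splitOn(String s, String delimiter) {
        // Pattern.quote wraps the delimiter in \Q...\E so every character,
        // including regex metacharacters, is matched literally.
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("|"));                        // \Q|\E
        System.out.println(Arrays.toString(splitOn("a.b.c", ".")));    // [a, b, c]
        System.out.println(Arrays.toString(splitOn("x||y||z", "||"))); // [x, y, z]
    }
}
```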
3. Pattern.split
Compile the pattern once into a static final Pattern and reuse it to split every string:
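import java.util.regex.Pattern;

// Compiled once and reused for every call (Pattern instances are thread-safe).
private static final Pattern SPLITTER = Pattern.compile("\\|");

public static String[] splitWithPattern(String s) {
    return SPLITTER.split(s);
}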
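If you would rather not write the backslash escape by hand, you can still precompile. Below is a sketch of two alternatives (field and method names are hypothetical): quote the pipe once and compile the result, or compile with the Pattern.LITERAL flag. Both compile only once, when the class is initialised; they have not been benchmarked as part of this post.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledLiteral {

    // Option 1: let Pattern.quote escape the pipe, but quote and compile only once.
    private static final Pattern QUOTED = Pattern.compile(Pattern.quote("|"));

    // Option 2: the LITERAL flag makes the whole pattern string match literally.
    private static final Pattern LITERAL = Pattern.compile("|", Pattern.LITERAL);

    static String[] splitQuoted(String s) {
        return QUOTED.split(s);
    }

    static String[] splitLiteral(String s) {
        return LITERAL.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitQuoted("foo|bar|baz")));  // [foo, bar, baz]
        System.out.println(Arrays.toString(splitLiteral("foo|bar|baz"))); // [foo, bar, baz]
    }
}
```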
4. StringUtils.split
Apache Commons Lang provides StringUtils.split, which splits a string on a single character:
import org.apache.commons.lang3.StringUtils;

public static String[] splitWithStringUtils(String s) {
    return StringUtils.split(s, '|');
}
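Be aware that StringUtils.split is not a drop-in replacement for String.split: its Javadoc says adjacent separators are treated as one, so it never returns empty tokens (StringUtils.splitPreserveAllTokens keeps them). To illustrate the difference without the Commons dependency, here is a hypothetical JDK-only helper, splitSkippingEmpty, that approximates that behaviour; it is not the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// JDK-only approximation of StringUtils.split(s, sep): adjacent separators
// count as one, so no empty tokens are produced. Illustration only.
class SkipEmptySplit {

    static String[] splitSkippingEmpty(String s, char sep) {
        if (s == null) {
            return null; // StringUtils.split also returns null for null input
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start <= s.length()) {
            int end = s.indexOf(sep, start);
            if (end < 0) {
                end = s.length(); // last token runs to the end of the string
            }
            if (end > start) { // skip empty tokens between adjacent separators
                tokens.add(s.substring(start, end));
            }
            start = end + 1;
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitSkippingEmpty("a||b", '|'))); // [a, b]
        System.out.println(Arrays.toString("a||b".split("\\|")));             // [a, , b]
    }
}
```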
So, which one is fastest?
I ran each method on 1 million pipe-delimited strings of different lengths (RandomStringUtils.randomAlphabetic from Apache Commons Lang is great for generating random strings). The table below shows how long each one took:
| Method | Time (ms) |
|---|---|
| split | 485 |
| splitWithStringUtils | 520 |
| splitWithPattern | 643 |
| splitWithPatternQuote | 936 |
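The post doesn't include its benchmark code, so the harness below is only a hypothetical sketch of how such a measurement could be set up with the JDK alone: random letters from java.util.Random stand in for RandomStringUtils, and StringUtils is left out to avoid the dependency. Its numbers will not match the table, and simple System.nanoTime loops are easily skewed by JIT warm-up and garbage collection; a dedicated harness such as JMH gives more trustworthy results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical timing harness: a rough sketch, not the code behind the table.
class SplitTiming {

    private static final Pattern SPLITTER = Pattern.compile("\\|");

    // Accumulates token counts so the JIT cannot treat the split calls as dead code.
    static volatile long sink;

    // Builds n strings of 1..maxFields random-letter fields, each 1..maxFieldLen
    // characters long, joined by '|'. Plain-JDK stand-in for RandomStringUtils.
    static List<String> generateInputs(int n, int maxFields, int maxFieldLen, Random rnd) {
        List<String> inputs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int fields = 1 + rnd.nextInt(maxFields);
            StringBuilder sb = new StringBuilder();
            for (int f = 0; f < fields; f++) {
                if (f > 0) {
                    sb.append('|');
                }
                int len = 1 + rnd.nextInt(maxFieldLen);
                for (int c = 0; c < len; c++) {
                    char base = rnd.nextBoolean() ? 'a' : 'A';
                    sb.append((char) (base + rnd.nextInt(26)));
                }
            }
            inputs.add(sb.toString());
        }
        return inputs;
    }

    // Splits every input once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(List<String> inputs, Function<String, String[]> splitter) {
        long tokens = 0;
        long start = System.nanoTime();
        for (String s : inputs) {
            tokens += splitter.apply(s).length;
        }
        long elapsed = System.nanoTime() - start;
        sink += tokens;
        return elapsed / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> inputs = generateInputs(1_000_000, 10, 12, new Random(42));

        Map<String, Function<String, String[]>> methods = new LinkedHashMap<>();
        methods.put("split", s -> s.split("\\|"));
        methods.put("splitWithPattern", SPLITTER::split);
        methods.put("splitWithPatternQuote", s -> s.split(Pattern.quote("|")));

        // Warm-up rounds so the JIT compiles the hot paths before measuring.
        for (int round = 0; round < 3; round++) {
            methods.values().forEach(m -> timeMillis(inputs, m));
        }
        methods.forEach((name, m) ->
                System.out.println(name + ": " + timeMillis(inputs, m) + " ms"));
    }
}
```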
An interesting observation is that splitWithPatternQuote is so much slower than split, even though both call String.split internally! If we delve into the source code for String.split, we can see that it has an optimisation (a "fastpath") that bypasses the regex engine when the regex is either a single character that is not a regex metacharacter, or two characters where the first is a backslash and the second is not an ASCII letter or digit. "\\|" (the two characters \|) qualifies but, since Pattern.quote produces \Q|\E, splitWithPatternQuote does not use the fastpath and instead compiles a new Pattern object for every split. This also explains why it is slower than splitWithPattern, which re-uses the same Pattern object, and probably why split, which skips the regex engine altogether, beats splitWithPattern too.
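The fastpath condition can be restated as a small predicate. This is a simplified, hypothetical helper paraphrasing the check in the OpenJDK source of String.split (Java 8 and later); it is not a public API, and the exact code may differ between JDK versions.

```java
// Hypothetical restatement of the check String.split uses to decide whether
// it can split with a plain character scan instead of compiling a Pattern.
class FastpathCheck {

    private static final String METACHARS = ".$|()[{^?*+\\";

    static boolean qualifiesForFastpath(String regex) {
        char ch;
        if (regex.length() == 1) {
            ch = regex.charAt(0);
            if (METACHARS.indexOf(ch) != -1) {
                return false; // a lone metacharacter such as "|" needs the regex engine
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            ch = regex.charAt(1);
            if (isAsciiLetterOrDigit(ch)) {
                return false; // escapes like \d or \w are classes, not literals
            }
        } else {
            return false; // anything longer, including \Q|\E, is compiled as a Pattern
        }
        // Surrogate characters are excluded from the fastpath as well.
        return !Character.isSurrogate(ch);
    }

    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(qualifiesForFastpath("\\|"));     // true
        System.out.println(qualifiesForFastpath("\\Q|\\E")); // false
        System.out.println(qualifiesForFastpath("|"));       // false
    }
}
```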