Saturday, February 23, 2013

Comparing CSV Data Files Using SQLite

Whenever I have to compare large datasets generated by two different environments, such as Production and QA, I tend to load the data into a SQLite database first and then run SQL queries to diff the data.

The code below shows how you can import CSV files into SQLite and then list the ids whose prices differ between the two environments:

-- create the tables to hold the data
CREATE TABLE dataProd (id text, price numeric);
CREATE TABLE dataQA   (id text, price numeric);

-- import the data
.separator ","
.import /path/prod/data.csv dataProd
.import /path/qa/data.csv dataQA

.headers ON

-- find differences between data
SELECT p.id,
       p.price as prodPrice,
       q.price as qaPrice,
       abs(p.price-q.price) as diff
FROM dataProd p, dataQA q
WHERE p.id = q.id
AND p.price <> q.price;
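
To try the comparison end to end, the same statements can be fed to the sqlite3 command-line shell from a script. The following is only a minimal sketch: it assumes sqlite3 is on the PATH, and the sample CSV contents, the temporary directory and the file names are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: only the price for id A2 differs between the files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'A1,10.5\nA2,20.25\nA3,30.5\n' > "$workdir/prod.csv"
printf 'A1,10.5\nA2,20.75\nA3,30.5\n' > "$workdir/qa.csv"

# Feed the statements to the sqlite3 shell through a here-document.
# $workdir is expanded by the shell before sqlite3 reads the input.
result=$(sqlite3 "$workdir/compare.db" <<EOF
CREATE TABLE dataProd (id text, price numeric);
CREATE TABLE dataQA   (id text, price numeric);
.separator ","
.import $workdir/prod.csv dataProd
.import $workdir/qa.csv dataQA
SELECT p.id, p.price, q.price, abs(p.price - q.price)
FROM dataProd p, dataQA q
WHERE p.id = q.id
AND p.price <> q.price;
EOF
)

# One line per id whose price differs: id, prod price, QA price, difference
echo "$result"
```

With this sample data the only row reported should be the one for A2. Note that the query joins on id, so an id that exists in only one of the two files is not reported at all.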

Sunday, February 17, 2013

Retrying Operations using Spring's RetryTemplate

Back in 2009, I blogged about Retrying Operations in Java in which I covered three different approaches to retrying operations on failure. Here is another alternative:

If your application already uses Spring, it is easier to use the RetryTemplate that ships with Spring Batch.

The example below shows how you can use a RetryTemplate to look up a remote object. If the remote call fails, it is retried with exponential backoff, up to a maximum of five attempts in total (the initial call counts as the first attempt).

// import the necessary classes
import java.rmi.Naming;
import java.rmi.Remote;
import org.springframework.batch.retry.RetryCallback;
import org.springframework.batch.retry.RetryContext;
import org.springframework.batch.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.batch.retry.policy.SimpleRetryPolicy;
import org.springframework.batch.retry.support.RetryTemplate;
...

// create the retry template
final RetryTemplate template = new RetryTemplate();
template.setRetryPolicy(new SimpleRetryPolicy(5));
final ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(1000L);
template.setBackOffPolicy(backOffPolicy);

// execute the operation using the retry template
template.execute(new RetryCallback<Remote>() {
  @Override
  public Remote doWithRetry(final RetryContext context) throws Exception {
    return (Remote) Naming.lookup("rmi://somehost:2106/MyApp");
  }
});

Related Posts:
Retrying Operations in Java

Saturday, February 16, 2013

stackoverflow - 50k rep

Five months after crossing the 40k milestone, I've now reached a reputation of 50k on stackoverflow!

The following table shows some stats about my journey so far:

                        0-10k        10-20k       20-30k       30-40k       40-50k       Total
Date achieved           01/2011      05/2011      01/2012      09/2012      02/2013
Questions answered      546          376          253          139          192          1506
Questions asked         46           1            6            0            1            54
Tags covered            609          202          83           10           42           946
Badges                  35           14           33           59           49           190
(gold, silver, bronze)  (2, 10, 23)  (0, 4, 10)   (2, 8, 23)   (3, 20, 36)  (0, 19, 30)  (7, 61, 122)

As I mentioned before, I have really enjoyed being a member of stackoverflow. For me, it has not simply been a quest for reputation, but more about learning new technologies and picking up advice from other people on the site. I like to take on challenging questions, rather than the easy ones, because it pushes me to do research into areas I have never looked at before, and I learn so much during the process.

Next stop, 60k!

Saturday, February 09, 2013

Selecting Specific Lines of a File Using Head, Tail and Sed

This post contains a few handy commands used to select specific lines from a file.

Print the first N lines

head -n N file

Print the last N lines

tail -n N file

Print all EXCEPT the first N lines

tail -n +$((N+1)) file

Print all EXCEPT the last N lines (a negative count is a GNU extension, so this may not work with other implementations of head)

head -n -N file

Print lines N to M (inclusive)

sed -n 'N,Mp' file

Print line N

sed 'Nq;d' file

Print all EXCEPT line N

sed 'Nd' file

Print multiple lines, I, J, K etc

Assuming I < J < K:

sed 'Ip;Jp;Kq;d' file

The final q tells sed to quit when it reaches line K instead of looping over the remaining lines that we are not interested in. The trailing d suppresses sed's default output for every other line, so lines I and J are printed only once by their p commands.
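
To see what each command selects, it helps to run them against a small file whose contents are just line numbers. The script below is a sketch under a few assumptions of my own: the sample file (ten lines holding 1 to 10), the values N=3 and M=5, and the choice of lines 2, 4 and 7 for the multi-line case are all arbitrary. The "all except the last N lines" command is left out because it depends on GNU head.

```shell
#!/bin/sh
# Hypothetical sample file: ten lines containing the numbers 1 to 10.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$sample"

N=3
M=5

first_n=$(head -n "$N" "$sample")             # lines 1 to 3
last_n=$(tail -n "$N" "$sample")              # lines 8 to 10
skip_first_n=$(tail -n +$((N+1)) "$sample")   # lines 4 to 10
range=$(sed -n "${N},${M}p" "$sample")        # lines 3 to 5
only_n=$(sed "${N}q;d" "$sample")             # line 3 only
without_n=$(sed "${N}d" "$sample")            # everything except line 3
picked=$(sed '2p;4p;7q;d' "$sample")          # lines 2, 4 and 7

# The variables are left unquoted on purpose so that the selected
# lines are joined by spaces and fit on one output line each.
echo "first $N lines:" $first_n
echo "last $N lines:" $last_n
echo "all except the first $N:" $skip_first_n
echo "lines $N to $M:" $range
echo "line $N:" $only_n
echo "all except line $N:" $without_n
echo "lines 2, 4 and 7:" $picked
```

Using line numbers as the file contents makes it easy to spot off-by-one mistakes, for example in the +$((N+1)) offset passed to tail.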

Saturday, February 02, 2013

Guava Table

Guava's Table<R, C, V> is a useful alternative to nested maps of the form Map<R, Map<C, V>>. For example, if you want to store a collection of Person objects keyed on both firstName and lastName, instead of using something like a Map<FirstName, Map<LastName, Person>>, it is easier to use a Table<FirstName, LastName, Person>.

Here is an example:

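import java.util.Collection;
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;
...

// rows are keyed on firstName, columns on lastName
// (Person is assumed to be a simple class with a (firstName, lastName) constructor)
final Table<String, String, Person> table = HashBasedTable.create();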
table.put("Alice", "Smith", new Person("Alice", "Smith"));
table.put("Bob", "Smith", new Person("Bob", "Smith"));
table.put("Charlie", "Jones", new Person("Charlie", "Jones"));
table.put("Bob", "Jones", new Person("Bob", "Jones"));

// get all persons with a surname of Smith
final Collection<Person> smiths = table.column("Smith").values();

// get all persons with a firstName of Bob
final Collection<Person> bobs = table.row("Bob").values();

// get a specific person
final Person alice = table.get("Alice", "Smith");