Randomized Data Pool

In my years as a Java developer, I've run a lot of simple performance tests to see how one method of processing compares to another. One issue I keep running into is the effect of database caching on the results.

If both methods read the same value from the database, the second read will almost always be faster than the first, because the data has already been cached by the database by the time the second read runs. To solve this problem, I created a simple DataPool class that retrieves random values from a list of data.


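import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.math.RandomUtils;

/**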
 * Wrapper class which allows a random entry from a list of values to be retrieved.
 * Useful in performance testing to get random values from a list of values.
 *
 * @param <T> defines the object types stored in the data pool
 * @author Brian Bos
 */
public class DataPool<T> {
  private final List<T> data;

  /**
   * Constructor.
   */
  public DataPool() {
    this.data = new ArrayList<T>();
  }

  /**
   * Constructor.
   * @param pData list of data to use for the pool.
   */
  public DataPool(List<T> pData) {
    this.data = pData;
  }

  /**
   * Adds a new entry to the data pool.
   * @param entry the value to add to the data pool
   * @return reference to this object, useful for chaining; also allows the "+"
   *         operator to be used in Groovy
   */
  public DataPool<T> plus(T entry) {
    this.data.add(entry);
    return this;
  }

  /**
   * Adds a new entry to the data pool.
   * @param entry the value to add to the data pool
   * @return reference to this object, useful for chaining; also allows the "<<"
   *         operator to be used in Groovy
   */
  public DataPool<T> leftShift(T entry) {
    return this.plus(entry);
  }

  /**
   * @return a random value from the data pool
   */
  public T nextValue() {
    return this.data.get(
      org.apache.commons.lang.math.RandomUtils.nextInt(this.data.size()));
  }

  /**
   * @param num the number of values to retrieve
   * @return a list of random values from the data pool; each value is drawn
   *         independently, so the same entry may appear more than once
   */
  public List<T> nextValues(int num) {
    final List<T> values = new ArrayList<T>();
    for (int i = 0; i < num; i++) {
      values.add(this.data.get(RandomUtils.nextInt(this.data.size())));
    }
    return values;
  }

}
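
One thing to keep in mind is that nextValues draws each value independently, so the same entry can show up twice in one batch, and the repeat may be served from cache. If that matters for a test, a variant that never reuses an entry within a batch could look roughly like the sketch below. UniqueSampler and its sample method are hypothetical names of mine, not part of the class above, and I've used java.util.Random instead of Commons Lang so the example stands alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of the DataPool class above. */
final class UniqueSampler {

  private UniqueSampler() {
  }

  /**
   * Picks entries at random so that each position in source is used at most once.
   * The source list is left untouched; a shuffled copy is used instead.
   * If num is larger than the list, every entry is returned once.
   */
  static <T> List<T> sample(List<T> source, int num, Random random) {
    if (num < 0) {
      throw new IllegalArgumentException("num must not be negative");
    }
    final List<T> copy = new ArrayList<T>(source);
    Collections.shuffle(copy, random);
    // Cap the request at the pool size rather than failing.
    final int count = Math.min(num, copy.size());
    return new ArrayList<T>(copy.subList(0, count));
  }

  public static void main(String[] args) {
    final List<String> policies = new ArrayList<String>();
    for (int i = 0; i < 100; i++) {
      policies.add("POL" + i);
    }
    final List<String> picked = sample(policies, 50, new Random());
    System.out.println(picked.size() + " policy numbers picked, none repeated");
  }
}
```

The trade-off is that shuffling copies the whole list for every batch, which is fine for a few thousand policy numbers but wasteful for a very large pool.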

I used this class in a Groovy script that compared reading data through CICS with reading it through a DB2 query. Before I randomized the data with this approach, the DB2 method looked faster, but only because the data was being cached. Using random data gave me a fairer test and showed that the CICS read was actually faster than the complex query, at least when reading policy by policy.


// Prepare the data pool: one policy number per line of the input file
def dataPool = new DataPool<String>()
new File("./performance/policyNumbers.txt").eachLine {
  dataPool << it.trim()
}

// reader, querier and performReads are defined elsewhere in the script (not shown)

// Single-policy read performance test
def cicsPolicies = ([] << dataPool.nextValue())
def db2Policies = ([] << dataPool.nextValue())
performReads(reader, cicsPolicies, querier, db2Policies)

// Multiple-policy read performance test
cicsPolicies = dataPool.nextValues(50)
db2Policies = dataPool.nextValues(50)
performReads(reader, cicsPolicies, querier, db2Policies)
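
The performReads method isn't shown here, so for anyone who wants to try something similar, here is a minimal sketch of the timing part in Java. ReadTimer, its Read interface, and the fake read in main are all hypothetical stand-ins; a real test would plug the actual CICS reader and DB2 query in where the fake read is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical timing helper; this is not the performReads used in the script. */
final class ReadTimer {

  /** Stand-in for whatever performs one read, e.g. a CICS call or a DB2 query. */
  interface Read<K> {
    void read(K key);
  }

  private ReadTimer() {
  }

  /**
   * Runs the read once per key, in list order, and returns the total
   * elapsed time in nanoseconds.
   */
  static <K> long time(List<K> keys, Read<K> read) {
    final long start = System.nanoTime();
    for (K key : keys) {
      read.read(key);
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    // In a real test these lists would come from the data pool.
    final List<String> cicsPolicies = Arrays.asList("POL1", "POL2");
    final List<String> db2Policies = Arrays.asList("POL3", "POL4");
    final List<String> seen = new ArrayList<String>();
    // Fake read that only records the key; replace with the real reads.
    final Read<String> fakeRead = new Read<String>() {
      public void read(String key) {
        seen.add(key);
      }
    };
    System.out.println("CICS ns: " + time(cicsPolicies, fakeRead));
    System.out.println("DB2 ns:  " + time(db2Policies, fakeRead));
  }
}
```

A single pass like this is only a rough measurement, since JVM warm-up and network noise can swamp small differences, so repeating the run several times with fresh random values gives a more trustworthy comparison.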
