Autowiring a bean manually with Spring

When creating a functional test that uses Spring and a resource such as DB2, it's helpful to have a common context file which defines my ConnectionFactory to the database. But I need a way to inject that factory into my DAO, and in general I try not to use XML configuration, so how do I get my DAO to autowire? Enter the ApplicationContextAware interface.

Here's a Spock test that autowires the DAO objects.

import org.springframework.context.ApplicationContext
import org.springframework.context.ApplicationContextAware
import org.springframework.test.context.ContextConfiguration

import spock.lang.Shared
import spock.lang.Specification

@ContextConfiguration(locations = ['classpath:db2ResourceContext.xml'])
public class Tester extends Specification implements ApplicationContextAware {

  @Shared
  private MyDAO dao = new MyDAO()
  
  def 'My Test'() {
    //use the dao confident the connection factory has been autowired
  }

  @Override
  public void setApplicationContext(ApplicationContext applicationContext) {
    applicationContext.autowireCapableBeanFactory.autowireBean(dao)
  }
}

Groovy Copybook Slurper


In order to call CICS from Java, we've traditionally used RDz to generate a J2C bean via an Ant script. But I don't have RDz anymore (it doesn't support Groovy and is always running older versions of Eclipse), so my only alternative was to use a JCL job to generate a bean via IBM's JZOS library.

I'm still not a big fan; it takes time to run the JCL, download the generated file from the mainframe, and tweak the object names, packages, etc. What if I could just declare my copylib structure in my Java code and call CICS? To that end, I've been working on a parser which converts a byte stream to/from a map.

Leaving the CICS call out of it for the moment, here's an idea of what the read code would look like.

def slurper = new CopybookSlurper("""\
01 TESTING-COPYBOOK.
   03 STATES OCCURS 10 TIMES .
      05 STATE PIC XX.
""")
def data = slurper.getReader('MIOHINWINVCAKYFLNCSC'.getBytes())
data.STATES.each {
  print it.STATE  //will print each two byte state
}
assert 'IN' == data.STATES[2].STATE //zero-based index into the STATES list

And here's the write code

def slurper = new CopybookSlurper("""\
  01 TOP-LEVEL.
     03 STATES OCCURS 2 TIMES.
        05 STATE-NUM        PIC X(3).
        05 STATE-NAME       PIC X(10).
        05 NUMERIC-STATE    PIC 9(2). """)
    
def writer = slurper.getWriter(new byte[slurper.length])
writer.with {
  STATES[0].with {
    STATE_NUM = '1'
    STATE_NAME = 'MICHIGAN'
    NUMERIC_STATE = 2
  }
  STATES[1].with {
    STATE_NUM = '2'
    STATE_NAME = 'OHIO'
    NUMERIC_STATE = 3
  }
}
assert new String(writer.getBytes(), 'IBM-37') ==
       '1  MICHIGAN  022  OHIO      03'

The slurper name and map structure are inspired by Groovy's XmlSlurper and JsonSlurper. My parser implementation is written in Groovy to take advantage of the dynamic nature of maps and closures. I used the JZOS library under the covers to handle parsing the individual data elements, so it can handle binary fields (COMP-3 and the like). Unfortunately JZOS didn't provide any simple hooks for getting a data type based on a PIC definition, so I had to rewrite that logic. I know my parser isn't as robust as the IBM version (it doesn't handle some of the more unusual data types, for instance), but it's good enough for the normal copybooks we'd be using from the web.
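
Just to give an idea of the kind of PIC handling involved, here's a minimal sketch of classifying a PIC clause. The picType closure is made up for this example; the real parser also has to work out byte lengths, scales, and the matching JZOS field type.

//Hypothetical sketch: classify a PIC clause into a rough category.
//The real parser maps these onto JZOS field objects and computes lengths.
def picType = { String pic ->
  pic = pic.trim().toUpperCase()
  if (pic.endsWith('COMP-3')) return 'packed-decimal'
  if (pic.startsWith('X')) return 'alphanumeric'
  if (pic.startsWith('S') || pic.startsWith('9')) return 'zoned-decimal'
  return 'unknown'
}

assert picType('X(10)') == 'alphanumeric'
assert picType('S9(7)V99') == 'zoned-decimal'
assert picType('9(9) COMP-3') == 'packed-decimal'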

I also wanted to make my map lazy so that the bytes are only parsed if requested; after all, COBOL copybooks sometimes have a ton of data in them, and I only want a small piece. So rather than parsing the fields right away, the parser creates a map of closures with instructions on how to parse the data. Likewise, the writer stores a closure for each variable with instructions on how to write data to a shared byte array. The getReader method is shown below. There's a lot of complexity involved in occurrences and groups that you can't see here, but I just wanted to show the mapping of variable names to closures.

public Map<String, ?> getReader(byte [] bytes) {
  Map readers = [:]
  fieldProcessors.each { DataVariable variable ->
    if (variable.read) {
      readers[variable.name] = { variable.read(bytes) }
    }
  }
  return new LazyProperties(readers, bytes)
}

The LazyProperties class extends HashMap and invokes the closures from the map when referenced, then caches the results for the next call. The getWriter method works in a similar fashion with closures, though it doesn't cache results because any call to the write methods should write data to the byte stream.
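
The real LazyProperties has more going on (group and OCCURS handling, for one), but a minimal sketch of the caching idea, assuming the reader closures already capture the byte array, might look like this:

//Minimal sketch of the LazyProperties idea: a value is parsed on first
//access by invoking the stored closure, then cached in the map itself.
class LazyProperties extends HashMap {
  private final Map<String, Closure> readers
  private final byte[] bytes  //kept to mirror the constructor call in getReader

  LazyProperties(Map<String, Closure> readers, byte[] bytes) {
    this.readers = readers
    this.bytes = bytes
  }

  @Override
  Object get(Object key) {
    if (!containsKey(key) && readers.containsKey(key)) {
      put(key, readers[key].call()) //parse once, cache the result
    }
    return super.get(key)
  }
}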

That's what I have working so far. The next step is to look into the javax.resource.cci.Record and javax.resource.cci.Streamable interfaces, and perhaps Spring's org.springframework.jca.cci.core.support.CciDaoSupport, to call down to CICS.

Generating Lots of Values From A DB2 Sequence Object

I was working on a Groovy script that inserts lots of data into a database, but I needed to generate a new key from a sequence object for each record and keep track of the old one. So I needed to invoke NEXTVAL for each key, but considering how many thousands of records I was inserting, I didn't want to call the database for each key individually.

There's no way to batch up SELECT statements, but I thought I could perhaps generate a big SELECT statement that includes all the keys I needed.

SELECT 1 AS OLD_KEY, MYSEQUENCE.NEXTVAL FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT 2 AS OLD_KEY, MYSEQUENCE.NEXTVAL FROM SYSIBM.SYSDUMMY1

The problem with this approach is that when you request NEXTVAL multiple times in the same SELECT statement, it returns the same value each time. So I ended up generating one new key assigned to all of my old keys. No good, so I had to try something else. I decided to insert into a GLOBAL TEMPORARY TABLE.

db.execute('''DECLARE GLOBAL TEMPORARY TABLE SESSION.GEN_IDS (
  ORIG_ID DECIMAL(12,0), NEW_ID DECIMAL(12,0)) ON COMMIT PRESERVE ROWS''')

db.withBatch('INSERT INTO SESSION.GEN_IDS VALUES (?, MYSEQ.NEXTVAL)') {
  BatchingPreparedStatementWrapper ps ->
    oldKeys.each { ps.addBatch(it) }
}

def mappings = db.rows('SELECT * FROM SESSION.GEN_IDS').collectEntries {
  Map data -> [(data.ORIG_ID) : data.NEW_ID]
}

db.execute('DROP TABLE SESSION.GEN_IDS')

Problem solved. The collectEntries method creates a map of the old and new values. Of course, I then learned about the JDBC getGeneratedKeys method, which makes this approach much less useful, but I'm still glad I learned how to use GLOBAL TEMPORARY TABLEs.
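
For reference, Groovy's Sql exposes getGeneratedKeys through executeInsert, so the same bookkeeping could look roughly like this (the table, column, and someValueFor helper are made up, and it assumes the key column is generated by the database on insert):

//Hypothetical sketch using executeInsert, which returns the generated keys.
def mappings = [:]
oldKeys.each { oldKey ->
  def keys = db.executeInsert(
      'INSERT INTO MY_TABLE (SOME_COL) VALUES (?)', [someValueFor(oldKey)])
  mappings[oldKey] = keys[0][0] //first generated key of the single inserted row
}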

Accessing a Numeric Value Somewhere in a DB2 Column

We recently had some inconsistent data in our database that required a pretty special query. The column contained what should have been a three byte marketing territory number, but depending on what application/user entered the data, the value could be stored almost anywhere in the column, and be 1 to 3 bytes. The following query gets that numeric value.

LPAD(
  TRANSLATE(
    REPLACE(
      TRANSLATE(MISC1_KEY || MISC2_KEY, ' ', X'00'),
    ' ', ''),
  '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
3, '0') AS TERRITORY,

The query has four steps, working from the inside out:
  1. Combine the two columns together and replace any null values with a space; X'00' is the hex code for null.
  2. Remove any spaces from the column. Originally I wanted to do this with the next TRANSLATE step, but for some reason spaces seem to be ignored by TRANSLATE.
  3. Remove any alpha characters; the column also sometimes contained state abbreviations. It would be nice if we could use something like a [^0-9] regex, but DB2 doesn't support that, and on z/OS it doesn't even support hex ranges.
  4. Finally, pad the number with leading zeros to make it three bytes long.
It's quite a few steps, but the end result is a consistent three byte code, which is especially useful with a CTE or a sub-select.

Groovy Overload (of Operators)

I was recently working on a pretty long class which mapped a web-service response to a set of 'activity codes' in our Java application. The class ends up mapping some 60+ different codes based on various scenarios, and appends some extra data for some of them. The code was originally written in Java and pretty well organized (okay, I wrote it, so I may be biased), but it was very wordy; there were a lot of repeated words/verbs in the class.

So I set about re-writing this class in Groovy, not just because of the wordiness (there were some additional complexities to the rules that I needed to apply), but because I wanted to make this monstrosity easier to understand. I made use of Groovy's capability to use Strings in switch statements (coincidentally, we were given the okay to use Java 7 the following week...), which helped (previously I had been converting the Strings into an enum), and I made use of Groovy's implicit calls to setters and its concise map notation. All of which made a big difference. I also used static imports for my enums so the names were shorter. So the code looked like this in the end.

details.activityCode = PAYMENT_TRANSFER_IN
details.activityProperties[REVERSAL_REAPPLIED_TO] = pay.transferAcct
details.activityProperties[ACCOUNT_NUMBER] = pay.transferAcct

But I wanted to go one step further. Every single one of our activities has an activityCode, that's the primary data we're mapping, and activityProperties was the only map in the object. So saying .activityCode and .activityProperties was redundant. What I really wanted was something more concise, like this:

details << PAYMENT_TRANSFER_IN
details[REVERSAL_REAPPLIED_TO] = pay.transferAcct
details[ACCOUNT_NUMBER] = pay.transferAcct

And thanks to Groovy's ability to add methods which override operators, that was an easy task. These are the relevant parts of my POJO to support this notation. 

private ActivityCode activityCode;
private Map<ActivityType, String> activityProperties = new HashMap<>();

//Overrides the '<<' operator in groovy code to set the account activity code.
public void leftShift(ActivityCode value) {
  this.setActivityCode(value);
}

// Used with groovy map notation to get activityProperties values.
public String getAt(Object key) {
  return this.activityProperties.get(key);
}

// Used with groovy map notation to set activityProperties values.
public String putAt(ActivityType key, Object value) {
  if (value != null) {
    return this.activityProperties.put(key, String.valueOf(value));
  }
  return null;
}

Note: I could add leftShift methods for other types too, like the amount, which is also often set on these objects. But I still want the code to be readable (that's the whole reason I was exploring this change), so since the activity code is the one that gets set repeatedly, that's the only value where I implemented leftShift.

The final effect is much less code to read through, which makes understanding this massive class considerably easier.

Groovy Tabular Data DSL

After using Spock, I became really interested in tabular input of data, not just in my tests but in my services too. The project I'm working on has a lot of transaction data with various associated properties; transaction type 1, for example, has fee type A associated with it.

Traditionally I've done that type of property mapping using enums, or maybe a switch statement. This works well, but there's still a good amount of code, and I'd like to make the mapping as readable as possible for the next guy, especially since in the particular case I'm working on, the enum has some 700+ transaction types and only a handful of them need additional properties.

Groovy made property mapping easier with its slick map syntax, and that worked really well too, but I kept thinking about Spock and that tabular data syntax, which I thought was even more readable than a map. I had this:

[Transaction.SERVICE_CHARGE :
  [feeType: FeeType.SERVICE_FEE,
   waiveTransaction : Transaction.SERVICE_CHARGE_WAIVED,
   reapplyTransaction : Transaction.SERVICE_CHARGE_REAPPLIED,
   desc: 'PAID SERVICE CHARGE'
  ],
Transaction.NSF_FEE_CHARGE :
  [feeType: FeeType.NSF_FEE,
   waiveTransaction : Transaction.NSF_WAIVED,
   reapplyTransaction : Transaction.NSF_REAPPLIED,
   desc: 'NSF CHARGE'
  ]
]

But I was really hoping to see something more like this (values omitted for formatting):

transType                  | feeType             | desc
Transaction.SERVICE_CHARGE | FeeType.SERVICE_FEE | 'PAID SERVICE CHARGE'
Transaction.NSF_FEE_CHARGE | FeeType.NSF_FEE     | 'NSF CHARGE'

During my research I ran across a blog entry by Christian Baranowski that did an excellent job of showing how to use Groovy's categories and property-override features to create a simple table DSL. I took what he did and added a few more features for my own benefit, so my version supports three different outputs:

* data - returns a list of lists, just giving the values from the table
* dataWithTitleRow - returns a list of maps, one per row, using the title row's column names as the keys
* dataWithTitleRow(def key) - returns a map of those row maps, keyed by the value of the column identified by the 'key' parameter

def data = Table.dataWithTitleRow('transType') {
transType                  | feeType             | desc
Transaction.SERVICE_CHARGE | FeeType.SERVICE_FEE | 'PAID SERVICE CHARGE'
Transaction.NSF_FEE_CHARGE | FeeType.NSF_FEE     | 'NSF CHARGE'
}
assert data[Transaction.SERVICE_CHARGE].desc == 'PAID SERVICE CHARGE'

Here's what my version looks like

import groovy.transform.Canonical

/**
 * Parses tabular data into 'rows' of just values, or mapped values (if there
 * is a title row in the data), or a map of keyed values.
 *
 * <p>
 * Based on blog entry at
 * http://tux2323.blogspot.com/2013/04/simple-table-dsl-in-groovy.html
 */
public class Table {
  /**
   * The Groovy category feature used to implement this DSL uses static
   * methods, so thread local is used to store off the parsed content as it
   * is processed.
   */
  private static ThreadLocal<List> PARSING_CONTEXT = new ThreadLocal<List>()

  /**
   * Parses a list of tabular data.
   *
   * <pre>
   * def data = Table.data {
   * Donald | Duck | Mallard
   * Mickey | Mouse | Rodent
   * }
   * assert data.first() == ['Donald', 'Duck', 'Mallard']
   * </pre>
   *
   * @param tabularData contains all the data to parse delimited by |
   * @return a list of lists of data
   */
  public static List data(Closure tabularData) {
    PARSING_CONTEXT.set([])
    use(Table) {
      tabularData.delegate = new PropertyVarConvertor()
      tabularData.resolveStrategy = Closure.DELEGATE_FIRST
      tabularData()
    }
    return PARSING_CONTEXT.get().collect { Row row -> row.values }
  }

  /**
   * Parses a list of tabular data with a title row, returns a list of
   * mapped properties.
   *
   * <pre>
   * def data = Table.dataWithTitleRow {
   * firstName | lastName | type
   * Donald | Duck | Mallard
   * Mickey | Mouse | Rodent
   * }
   * assert data.first() ==
   *   [firstName: 'Donald', lastName: 'Duck', type: 'Mallard']
   * </pre>
   *
   * @param tabularData contains all the data to parse delimited by |
   * @return a list of maps, one per data row
   */
  public static List dataWithTitleRow(Closure tabularData) {
    List rows = data(tabularData)
    def titleRow = rows.first()

    return rows[1..<rows.size()].collect { List row ->
      def mappedRows = [:]
      row.eachWithIndex { it, index ->
        mappedRows[titleRow[index]] = it
      }
      return mappedRows
    }
  }

  /**
   * Parses a list of tabular data with a title row and specifies the column
   * that should be used as a key in the output map.
   *
   * <pre>
   * def data = Table.dataWithTitleRow('firstName') {
   * firstName | lastName | type
   * Donald | Duck | Mallard
   * Mickey | Mouse | Rodent
   * }
   * assert data['Donald'] ==
   *   [firstName: 'Donald', lastName: 'Duck', type: 'Mallard']
   * </pre>
   *
   * @param key the name of the column that should be used as the key in
   * the output map
   * @param tabularData contains all the data to parse delimited by |
   * @return a map of row maps, keyed by the value in the given column
   */
  public static Map dataWithTitleRow(def key, Closure tabularData) {
    Map keyed = [:]
    dataWithTitleRow(tabularData).each {
      keyed[it[key]] = it
    }
    return keyed
  }

  /**
   * Groovy treats a new line as the end of a statement (with some exceptions),
   * so each new line will invoke this method, which creates a new table row
   * that is then used to 'or' each value in the row together.
   *
   * @param self the left argument in the OR operator, the current value
   * @param arg the right argument in the OR operator, the next value
   *
   * @return a reference to the row, so that the next value can be 'or-ed' onto it.
   */
  public static Row or(self, arg) {
    def row = new Row([self])
    PARSING_CONTEXT.get().add(row)
    return row.or(arg)
  }

  /**
   * Implements the 'or' operator to append each value in a row to a list.
   * Returns a reference to itself so the next or operation can append the
   * next value.
   */
  @Canonical
  static class Row {
    List values = []
    def Row or(arg) {
      values << arg
      return this
    }
  }

  /**
   * Handler to treat any properties that cannot be found as strings.
   */
  static class PropertyVarConvertor {
    def getProperty(String property) {
      return property
    }
  }
}

Generate COBOL Copybook Output From DB2 Using JCL

Several of my last few posts have involved data manipulation, and for the most part I've been using Groovy. However, what I really like about Groovy is writing less code, and there are times when other tools, JCL in this case, can help me write even less code.

IBM provides two programs, IKJEFT01 and DSNTIAUL, which allow you to pipe the results of a SQL statement into an output stream (SYSPRINT or a flat file). So, assume that I have a simple COBOL copybook like this.

01 MY-NAME-FILE.
   03 FIRST  PIC X(10).
   03 MIDDLE PIC X(10).
   03 LAST   PIC X(10).

I could easily pipe the first, middle, and last name from a DB2 table into a flat file using a JCL job that looked something like this (fill in your own DBNAME and PLANNAME values).

//TESTJOB JOB PROD,'USER',CLASS=A,MSGCLASS=V,REGION=0M,
// NOTIFY=&SYSUID                                        
//COLUMNS  EXEC PGM=IKJEFT01                             
//SYSTSPRT DD   SYSOUT=*                                 
//SYSPRINT DD   SYSOUT=*                                 
//SYSUDUMP DD   SYSOUT=*                                 
//SYSREC00 DD   DSN=T01.MYDATASET,DISP=OLD
//SYSPUNCH DD DUMMY                                      
//SYSTSIN  DD *                                          
DSN SYSTEM(DBNAME)                                         
RUN PROGRAM(DSNTIAUL) PLAN(PLANNAME) PARMS('SQL') -     
    LIB('DBNAME.RUNLIB.LOAD')                           
//SYSIN    DD *                                          
SELECT  LPAD(FIRST_NAME, 10, ' ')
     || LPAD(MIDDLE_NAME, 10, ' ') 
     || LPAD(LAST_NAME, 10, ' ')
FROM MY_NAME_TABLE; 
/*                                                      

Okay, it's not the most intuitive, but it is a simple script that can be run as part of a job stream and creates a file compatible with the copybook I provided above. The SQL concatenates the three columns together and pads the fields with spaces. The padding may be unnecessary depending on how the columns are defined, but the point is that the records written to the file will be the 30 bytes that match the copybook.

But what happens with a slightly more complex copybook, specifically a copybook involving binary data?

01 MY-NAME-FILE.
   03 FIRST  PIC X(10).
   03 MIDDLE PIC X(10).
   03 LAST   PIC X(10).
   03 ZIP    PIC 9(9) COMP-3.

Yikes! Binary data. How can you format a column in a SELECT statement to produce binary data that could be written to a file? Well, there is the HEX function in DB2, so you could potentially create a binary string and figure out how to convert it, but the IBM programs make it much easier than that. Using the same JCL above, change the SQL to this:

//SYSIN    DD *                                          
SELECT FIRST_NAME,
       MIDDLE_NAME,
       LAST_NAME,
       ZIP_CODE
FROM MY_NAME_TABLE; 
/*    

I'm making an assumption here that the ZIP_CODE column is defined in the database as DECIMAL(9, 0), because if it is, the default formatting of DSNTIAUL will write the ZIP_CODE field out as packed decimal data that fits perfectly into a PIC 9(9) COMP-3 variable (an INTEGER column would unload as fullword binary instead, matching COMP rather than COMP-3). This is different from the default formatting used by tools on a Windows platform, like, say, Data Studio, but because DSNTIAUL is written to run on z/OS, it assumes that COBOL datatypes will be used.

If your data isn't defined as a packed or binary (COMP) type, then it actually becomes trickier, and you might have to do something like REPLACE(CHAR(ZIP_CODE), '.', '') to get rid of the decimal point. I didn't get into that very far, because the dataset I needed to produce was using COMP-3 fields for its numeric data.

COBOL Data Type File Formatting

Since a lot of existing COBOL jobs take flat files as input and produce flat files as output, I've been doing some experimenting with data manipulation in COBOL-friendly formats. Much of the formatting is pretty straightforward: if you have a variable defined as PIC X(10), it's going to take up 10 bytes in the flat file and likely be right-padded with spaces. And, since my company runs on a z/OS platform, it's going to use the EBCDIC character encoding.

However, where it gets interesting is in the formatting of numbers using signs, decimal points, and binary data. I wrote a little script that uses JZOS to format data into a copybook-equivalent byte stream and tried a couple of simple test cases to see what happened.

['99V99', 'S99V99', 'S99V99 COMP-3'].each { format ->
  def slurper = new CopybookSlurper("""
 01 TESTING.
    03 BEGIN   PIC X(6).
    03 MIDDLE  PIC $format.
    03 END     PIC X(4). """)

  def writer = slurper.getWriter(new byte[slurper.length])
  writer.BEGIN = 'BEGIN|'
  writer.MIDDLE = 43.86
  writer.END = '|END'
  println new String(writer.getBytes(), 'IBM-37')
}

After the script ran, I had the following outputs.

BEGIN|4386|END
BEGIN|438F|END
BEGIN|??%|END

So, the decimal point is assumed; it doesn't show up in the output at all. When I add the 'S' to have a sign, it changes the last digit of the number: half of that byte is used to hold the sign, so the '6' gets combined with the sign nibble into a new byte value, which in this case happens to print as an 'F'. And of course binary data doesn't really look like anything if you try to print it as a String; it doesn't even reliably take up the same number of printable characters.
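
A quick way to convince yourself of the sign-nibble behavior (assuming the EBCDIC code page is available to your JVM; 'IBM037' is the canonical Java name for the code page the snippets above reference as 'IBM-37'):

//0xC6 is the zoned-decimal byte for a positive 6: sign nibble C, digit nibble 6.
//In EBCDIC that byte is the letter 'F', which is why it shows up in the output.
assert new String([0xC6] as byte[], 'IBM037') == 'F'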

I probably knew this at one point, back when I was in COBOL training, but it was helpful to have a refresher course.

Groovy Method References

I was listening to a podcast recently about Eclipse tooling for Java 8 and heard them talking about method references, and one of my co-workers mentioned it as a feature in Groovy. I recently ran across a problem which seemed particularly well suited for this feature.

I had a list with different implementations of the Field class, each of which had some form of a get... method. For instance, ExternalDecimalAsInt has a getInt method, while StringField has a getString method. Both methods parse a byte stream and return the data. Since I'm writing Groovy, I don't care about the data type; I just wanted a simple way to invoke the get... method, regardless of the class type.

So I created a method reference in my object, and used a switch statement to load the method reference according to the type of object I'm dealing with.

import org.codehaus.groovy.runtime.MethodClosure

class MyObject {
  def MethodClosure read //probably would omit the data type normally
} 

private void generateReader(MyObject myObject, Field field) {
  switch (field) {
    case StringField:
      myObject.read = field.&getString
      break
    case IntAccessor:
      myObject.read = field.&getInt
      break
    case BigDecimalAccessor:
      myObject.read = field.&getBigDecimal
      break
    case LongAccessor:
      myObject.read = field.&getLong
      break
    case BigIntegerAccessor:
      myObject.read = field.&getBigInteger
      break
  }
}

//invokes getString, getInt, etc... depending on the object type
def value = myObject.read(bytes) 



Of course, you can always do the same thing with closures, but there's a little less syntax with the method references, and it looks cooler. Once I can use Java 8, I hear that method references are also more efficient than lambda expressions, the Java equivalent of Groovy closures.

case StringField:
  myObject.read = { field.getString(it) }

Groovy LazyMap - Caching Map

I was working on a Groovy class recently which involved putting a series of closures into a map to do some processing when invoked, but I realized that I didn't really want to run the closures each time they were accessed; once they ran the first time, I would prefer that the data was cached. So the code would look something like this.

Map myProcesses = [coolId : { retriever.somehow() }]
myProcesses.coolId() // <-- the first time this invokes a method call
myProcesses.coolId() // <-- should return value stored from the last call

The use of the closures worked nicely; I just had to figure out how to handle caching. At first I looked into the memoize method on my closures, but there seemed to be some disconnect between how I was trying to use them and what Groovy wanted. When I used coolId() it complained that the method doCall was not available, so I'm not exactly sure what was happening there.

Map myProcesses = [coolId : { retriever.somehow() }.memoize()]

Then I realized I should be able to do something simpler anyway. Really I wanted a LazyMap; I didn't need caching on each individual closure. I had used the LazyMap class provided in the Apache Commons library to do the following.

return org.apache.commons.collections4.map.LazyMap.lazyMap(
  new HashMap<String, MyObject>(),
  new org.apache.commons.collections4.Transformer<String, MyObject>() {
    @Override
    public MyObject transform(String input) {
      return retriever.goGetIt(input);
    }
  });

I also remembered coming across some LazyMap references in Groovy, so I did some looking, but the only class I found was in the groovy.json.internal package. Probably not a class I should be using. Was it possible that Groovy didn't have an implementation of a LazyMap? That seemed unlikely to me; almost everything in the commons libraries seems to be built into Groovy. So I started looking at the Groovy docs and ran across the withDefault method that Groovy adds to Maps. I realized that an empty map whose default closure would invoke my closures and store the results was exactly what I wanted.

Map myInnerProcesses = [coolId : { retriever.somehow() }]
Map myProcesses = [:].withDefault { myInnerProcesses[it]() }
myProcesses.coolId // <-- the first access invokes the closure
myProcesses.coolId // <-- returns the value cached from the first access

So, by using withDefault I was able to add caching in a single line of code.
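
A quick way to verify that the caching actually happens (the counter and the canned result are just stand-ins for this example):

int calls = 0
Map inner = [coolId : { calls++; 'expensive result' }]
Map cached = [:].withDefault { inner[it]() }

assert cached.coolId == 'expensive result'
assert cached.coolId == 'expensive result'
assert calls == 1 //the closure only ran the first time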

FixedLengthSlurper for Groovy Fixed Length File Parsing

Since I work in a company that relies heavily on COBOL processing, we have a lot of fixed length files in our batch environment and occasionally I end up interacting with these files in Java. Traditionally we've used an IBM proprietary product to generate Java objects from COBOL copybooks, but there are times that I don't want to go through all the effort of generating these objects if I'm just performing some simple test or analysis of a file.

Since I love Groovy's XmlSlurper and JsonSlurper, I created a FixedLengthSlurper, which is created with the format specification of your file and returns a map of named properties with the parsed data. The arguments to the FixedLengthSlurper's constructor are the size of the field, the name to assign to the field, and an optional closure to format the object. Any number of these formatting parameters can be passed. If no formatting closure is provided, the data is just returned as a String.

def dateTime = new SimpleDateFormat('yyyyMMddahhmm')
def date = new SimpleDateFormat('yyyyMMdd')
def parser = new FixedLengthSlurper(
  13, 'dateTime', { dateTime.parse(it) },
  4, 'type',
  8, 'processDate', { date.parse(it) },
  9, 'numberRecords', { new Integer(it) },
  11, 'amount', { String str -> new BigDecimal(str[0..str.size() - 3] +
                              '.' + str[str.size() - 2..str.size() - 1]) },
  1, 'typeCode')

List values = []
new File('./data/ppld.txt').eachLine { String line ->
  if (!line.startsWith('0000') && !line.startsWith('9999'))
    values << parser.parseText(line)
}

In this example, the 'dateTime' and 'processDate' properties are stored as Date objects, 'numberRecords' as an Integer, and the 'type' and 'typeCode' properties as Strings. The 'amount' property gets a little extra parsing: the file stores '10044', but that needs to translate into '100.44', so the closure breaks the string apart into its integer and decimal parts before creating the BigDecimal object.

The source code for the class follows:

class FixedLengthSlurper {

  List formats = []

  /**
   * Constructor.
   * @param vars the formats that should be used when parsing the file
   */
  FixedLengthSlurper(Object ... vars) {
    int varsIndex = 0
    while (varsIndex < vars.size()) {
      //the size and column name must be provided in pairs
      def format = [size: vars[varsIndex], name: vars[varsIndex + 1]]
      varsIndex += 2

      //check next argument to see if a formatting closure was provided
      if (varsIndex < vars.size() && vars[varsIndex] instanceof Closure) {
        format << [formatter: vars[varsIndex]]
        varsIndex++
      }
      formats << format
    }
  }

  /**
   * Reads through the text and applies all formats to break apart the data
   * into mapped properties
   * @param data the fixed length text to parse
   */
  def parseText = { String data ->
    def values = [:]
    int currentIndex = 0

    formats.each { format ->
      //extract the data
      values[format.'name'] =
        data[currentIndex .. (currentIndex + format.'size' - 1)]

      //if a formatting closure was provided, apply its formatting
      if (format.'formatter') {
        values[format.'name'] = format.'formatter'(values[format.'name'])
      }

      //increment the indexing for the next format
      currentIndex += format.'size'
    }

    return values
  }
}

While doing some research to build this class, I also ran across a neat article talking about how to override the Date class to add a "fromString" method to it. I didn't use it in my example since I only used my date parsers in one place, but it's a neat concept for situations, like unit tests, where there is a lot of date parsing going on.
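
I didn't keep the article's code around, but the idea is roughly this (a sketch that adds a static method through the metaClass; the pattern argument is my own addition):

import java.text.SimpleDateFormat

//Sketch: bolt a static fromString method onto java.util.Date via the metaClass.
Date.metaClass.static.fromString = { String text, String pattern = 'yyyy-MM-dd' ->
  new SimpleDateFormat(pattern).parse(text)
}

assert Date.fromString('2014-07-04').format('yyyyMMdd') == '20140704'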

Pivot Database Rows Into A Single Column

There are times when a single result record is desired instead of multiple rows, for instance when joining a main table against multiple records in a child table. Say that I have an emails table and a recipients table as follows:

EMAILS
--------------------------------------------------
ID | SUBJECT      | EMAIL_TEXT   | SENDER
1  | Great Prices | See circular | AwesomeRetailer 

RECIPIENTS
------------------------
EMAIL_ID | RECIPIENT
1        | bob@aol.com
1        | al@aol.com



Using a JOIN, you'd get back the following records.

ID | SUBJECT      | EMAIL_TEXT   | SENDER          | RECIPIENT
1  | Great Prices | See circular | AwesomeRetailer | bob@aol.com
1  | Great Prices | See circular | AwesomeRetailer | al@aol.com

This gives you all the information you need, but the id, subject, text, and sender values are repeated, which means that as you're parsing through the results in Java, you have to 'look back' at the previous row to see if you're processing the same email or a new one.
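
That bookkeeping looks roughly like this with Groovy's Sql (db is assumed to be a groovy.sql.Sql instance; the column names follow the tables above):

//Rough sketch of the bookkeeping the joined result forces on the caller:
//group rows by email id so recipients of the same email end up together.
def emails = [:]
db.eachRow('''SELECT E.ID, E.SUBJECT, R.RECIPIENT
                FROM EMAILS E
                JOIN RECIPIENTS R ON R.EMAIL_ID = E.ID''') { row ->
  def email = emails[row.ID] //same email as a previous row?
  if (email == null) {
    email = [subject: row.SUBJECT, recipients: []]
    emails[row.ID] = email
  }
  email.recipients << row.RECIPIENT
}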

Alternatively, you can use the XMLAGG aggregate function to combine the records into a single row. This function combines the data together; then you just have to unpack it into a string using the XMLSERIALIZE function. This approach requires a sub-select, because the XMLAGG function operates on a result set.


SELECT ID,
       SUBJECT,
       EMAIL_TEXT,
       SENDER,
       RTRIM((SELECT CAST(XMLSERIALIZE(XMLAGG(XMLTEXT(RECIPIENT || '; ')) AS CLOB)
         AS VARCHAR(500)) FROM RECIPIENTS WHERE ID = EMAIL_ID)) AS RECIPIENTS
  FROM EMAILS

results in

ID | SUBJECT      | EMAIL_TEXT   | SENDER          | RECIPIENTS
1  | Great Prices | See circular | AwesomeRetailer | bob@aol.com; al@aol.com

Printing Tabular Data

Sometimes when I'm running tests, or pulling information from a database, I want to see the results in a table so that it's easy to parse through the information and see what's there. For instance, let's say that I wanted to see the following information pulled from a bank application.

Account    | Owner           | Balance |
2131545422 | John Smith      | 10.00   |
23409834   | Alexander Banks | 230.03  |
987545453  | Smithy Joe      | 150.22  |
3445       | Katie Parks     | 3021.53 |

So, I wrote a little Groovy script to query the database, and for each result that's pulled back, I add the mapped data to the printer.

def printer = new TablePrinter()
new Sql(.....).eachRow(....) {
  printer << ['Account#': it.'accountNum',
              'Owner': it.'userId',
              'Balance': it.'balance']
}
printer.printTable()

The keys from the map are used as the title row and the data is placed into the table. The TablePrinter Groovy class will find the largest value in each column and use that as the column width when printing each record, to give it the nice formatting. Here's the source for the TablePrinter.

class TablePrinter {

  def out = System.out
  Collection rowCalcs = []
  Collection<Collection> rows = []

  /**
   * Cycles through the list of entries in the table to assess the largest
   * value for each'column', then prints all the results in a table.
   *
   * @param pRows the data to display
   */
  static void printTable(Collection<Collection> pRows) {
    if (pRows) {
      def printer = new TablePrinter()
      pRows.each { printer << it }
      printer.printTable()
    }
  }

  /**
   * Adds a row of values to the data being printed.
   * @see #plus(Collection)
   */
  TablePrinter leftShift(Collection pRow) {
    return plus(pRow)
  }

  /**
   * Adds a map entry to the data being printed.
   * @see #plus(Map)
   */
  TablePrinter leftShift(Map pRow) {
    return plus(pRow)
  }

  /**
   * Adds a new row to the values that will be printed and recalculates column
   * sizes. If no row calculations have yet been added, then the key set from
   * the map will be assumed to be the title row of the table and added to the rows
   * that will be printed.
   *
   * @param pRow the data being added to the table
   * @return reference to this updated object for chaining
   */
  TablePrinter plus(Map pRow) {
    if (rowCalcs.size() == 0) {
      this.plus(pRow.keySet())
    }
    return this.plus(pRow.values())
  }

  /**
   * Adds a new row to the values that will be printed and recalculates
   * column sizes.
   * @param pRow the data being added to the table
   * @return reference to this updated object for chaining
   */
  TablePrinter plus(Collection pRow) {
    rows << pRow
    calculateColumnSize(pRow)
    return this
  }

  /**
   * Creates a format string (for example: "%-10s | %-6s") using column
   * sizes calculated earlier and prints the values from each row in
   * tabular format.
   */
  void printTable() {
    def formatString = rowCalcs.sum { "%-${it}s | " }
    rows.each { row ->
      out.println sprintf(formatString, row.toList())
    }
  }

  /**
   * Adjusts the size of each 'column' by comparing previous calculations
   * against the size of each value in this row. The larger value is stored
   * in the {@link #rowCalcs} variable to be used later when printing.
   *
   * @param row values being added to the print display that need to be
   * sized to possibly adjust the size of each column
   */
  void calculateColumnSize(Collection row) {
    row.eachWithIndex  { cell, cellNum ->
      //ensure the storage for the row size calculations is large enough
      while (rowCalcs.size() < cellNum + 1) {
        rowCalcs << 0
      }

      //default null into an empty string, checks explicitly for null to
      // prevent 0 -> ''.
      if (row[cellNum] == null) {
        row[cellNum] = ''
      }

      //store the larger of either the previous calculation or the length of
      //current value
      rowCalcs[cellNum] = Math.max(rowCalcs[cellNum],
                                   String.valueOf(row[cellNum]).length())
    }
  }

}