FixedLengthSlurper for Groovy Fixed Length File Parsing

I work at a company that relies heavily on COBOL processing, so we have a lot of fixed-length files in our batch environment, and occasionally I end up working with these files in Java. Traditionally we've used a proprietary IBM product to generate Java objects from COBOL copybooks, but there are times when I don't want to go through the effort of generating these objects just to run a simple test or analysis on a file.

Since I love Groovy's XmlSlurper and JsonSlurper, I created a FixedLengthSlurper. It is constructed with the format specification of your file, and its parseText call returns a map of named properties holding the parsed data. The constructor takes one group of arguments per field: the size of the field, the name to assign to the field, and an optional closure to format the value. Any number of these groups can be passed. If no formatting closure is provided, the data is returned as a String.

import java.text.SimpleDateFormat

def dateTime = new SimpleDateFormat('yyyyMMddahhmm')
def date = new SimpleDateFormat('yyyyMMdd')
def parser = new FixedLengthSlurper(
  13, 'dateTime', { dateTime.parse(it) },
  4, 'type',
  8, 'processDate', { date.parse(it) },
  9, 'numberRecords', { it as Integer },
  //the last two digits are the decimal part: '10044' becomes 100.44
  11, 'amount', { String str -> new BigDecimal(str[0..-3] + '.' + str[-2..-1]) },
  1, 'typeCode')

List values = []
new File('./data/ppld.txt').eachLine { String line ->
  //skip records that start with '0000' or '9999'
  if (!line.startsWith('0000') && !line.startsWith('9999')) {
    values << parser.parseText(line)
  }
}

In this example, the 'dateTime' and 'processDate' properties are stored as Date objects, 'numberRecords' is an Integer, and the 'type' and 'typeCode' properties are Strings. The 'amount' property gets a little extra parsing: the file stores '10044', which needs to translate into '100.44', so the closure splits the string into its integer and decimal parts before creating the BigDecimal object.
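
To make the implied-decimal handling concrete, here is a minimal sketch of that conversion on its own. The sample value is made up (zero-padded to the 11-character field width), and the movePointLeft variant is an alternative shown for comparison, not something the parser example uses.

```groovy
// Hypothetical raw field value, zero-padded to the 11-character width
String raw = '00000010044'

// Split approach: everything but the last two digits, a '.', then the last two digits
BigDecimal viaSplit = new BigDecimal(raw[0..-3] + '.' + raw[-2..-1])

// Alternative: parse all the digits, then shift the decimal point two places left
BigDecimal viaShift = new BigDecimal(raw).movePointLeft(2)

assert viaSplit == 100.44
assert viaShift == 100.44
```

Both forms assume the field contains only digits; a blank or signed field would need extra handling before it reaches BigDecimal.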

The source code for the class follows:

class FixedLengthSlurper {

  List formats = []

  /**
   * Constructor.
   * @param vars the formats that should be used when parsing the file
   */
  FixedLengthSlurper(Object ... vars) {
    int varsIndex = 0
    while (varsIndex < vars.size()) {
      //the size and column name must be provided in pairs
      def format = [size: vars[varsIndex], name: vars[varsIndex + 1]]
      varsIndex += 2

      //check next argument to see if a formatting closure was provided
      if (varsIndex < vars.size() && vars[varsIndex] instanceof Closure) {
        format << [formatter: vars[varsIndex]]
        varsIndex++
      }
      formats << format
    }
  }

  /**
   * Reads through the text and applies all formats to break apart the data
   * into mapped properties
   * @param data the fixed length text to parse
   */
  Map parseText(String data) {
    def values = [:]
    int currentIndex = 0

    formats.each { format ->
      //extract the data
      values[format.name] =
        data[currentIndex .. (currentIndex + format.size - 1)]

      //if a formatting closure was provided, apply its formatting
      if (format.formatter) {
        values[format.name] = format.formatter(values[format.name])
      }

      //increment the index for the next format
      currentIndex += format.size
    }

    return values
  }
}

While doing some research to build this class, I also ran across a neat article about extending the Date class with a "fromString" method. I didn't use it in my example since my date parsers are only used in one place, but it's a neat concept for situations, like unit tests, where there is a lot of date parsing going on.
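
As a rough sketch of the idea, one way to add such a method in Groovy is through the metaclass. The two-argument signature (text plus pattern) is my assumption for illustration and may not match what the article does.

```groovy
import java.text.SimpleDateFormat

// Assumption: fromString(text, pattern) is an illustrative signature,
// not necessarily the one used in the article mentioned above.
Date.metaClass.static.fromString = { String text, String pattern ->
    new SimpleDateFormat(pattern).parse(text)
}

// Once registered, the method can be called as if Date had always had it
Date processDate = Date.fromString('20130115', 'yyyyMMdd')
println new SimpleDateFormat('yyyy-MM-dd').format(processDate)
```

The metaclass change is global to the running JVM, which is convenient in a test suite but worth keeping in mind in larger applications.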
