Simple XML Data Scrubbing

I worked on a project recently where the team had some XML data that they wanted 'scrubbed' before passing it along to other teams. Since the data was XML, Groovy seemed like a perfect fit for a quick and simple scrubbing program. To demonstrate, I created a small XML file containing the kind of data you know you don't want to pass around.

<Policy>
  <Party>
    <FirstName>Jason</FirstName>
    <LastName>Borne</LastName>
    <SSN>1234567890</SSN>
    <Address1>4994 Road</Address1>
    <State>MI</State>
  </Party>
  <Driver>
    <State>MI</State>
    <License>B20049595091</License>
  </Driver>
  <Risk>
    <Coverage type="A">20000</Coverage>
  </Risk>
</Policy>

Then I wrote a really simple script to show how easily Groovy can replace the sensitive values. It reads every file in the 'files-to-filter' folder, scrubs the names, SSN, address, driver's state and license number, and writes the updated XML to a file with the same name in the 'filtered-files' folder. Easy.


import groovy.io.FileType
import groovy.util.slurpersupport.NodeChild
import groovy.xml.XmlUtil

def outputDirectory = new File('./filtered-files')
outputDirectory.mkdirs() // create the output folder if it doesn't exist yet

new File('./files-to-filter').eachFile(FileType.FILES) { File input ->
  def output = new File(outputDirectory, input.name)

  // generate the replacement values once per file, so every tag in a
  // document gets values from the same fake identity
  def names = generateNewName()
  def values = generateAdditionalNewValues()

  def xml = new XmlSlurper().parseText(input.text)
  // '**' walks every element in the document, depth-first
  xml.'**'.each { NodeChild tag ->
    switch (tag.name()) {
      case 'FirstName':
        tag.replaceBody(names.'FirstName')
        break
      case 'LastName':
        tag.replaceBody(names.'LastName')
        break
      case 'SSN':
        tag.replaceBody(values.'SSN')
        break
      case 'Address1':
        tag.replaceBody(values.'Address1')
        break
      case 'State':
        // only the driver's state is scrubbed; the party's state is kept
        if (tag.parent().name() == 'Driver') {
          tag.replaceBody(values.'State')
        }
        break
      case 'License':
        tag.replaceBody(values.'License')
        break
    }
  }

  // overwrite rather than append (<<), so re-running the script is safe
  output.text = XmlUtil.serialize(xml)
}

/**
 * @return a new name; could read from a dataset and use 'random' to pick one
 */
def generateNewName() {
  return ['FirstName': 'Johnny', 'LastName': 'Five']
}

/**
 * @return newly generated values; could come from a database or something
 */
def generateAdditionalNewValues() {
  return ['SSN'     : '0987654321',
          'Address1': '9848 SomewhereElse',
          'State'   : 'IA',
          'License' : 'I390398349834']
}
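As the list of tags grows, the switch gets long. One possible variation, sketched here under my own assumptions (the scrubXml helper and the format of its rules map are hypothetical, not part of the script above), drives the replacements from a map instead: a key like 'SSN' matches that element anywhere, while a key like 'Driver/State' only matches a State whose parent is Driver. It works on a string rather than on files, which makes it easy to try out, and it assumes Groovy 3 or later, where XmlSlurper lives in the groovy.xml package.

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

/**
 * Hypothetical helper: scrubs an XML string using a rules map.
 * A key such as 'SSN' matches that element anywhere in the document;
 * a key such as 'Driver/State' matches only a State whose parent is Driver.
 * A parent-qualified rule wins over a plain one for the same element.
 */
String scrubXml(String xmlText, Map<String, String> rules) {
  def xml = new XmlSlurper().parseText(xmlText)
  // '**' visits every element (root included), depth-first
  xml.'**'.each { tag ->
    String name = tag.name()
    // toString() so the map lookup uses a String key, not a GString
    String qualified = "${tag.parent().name()}/${name}".toString()
    String replacement = rules.containsKey(qualified) ? rules[qualified] : rules[name]
    if (replacement != null) {
      tag.replaceBody(replacement)
    }
  }
  // the serialized output includes the replaceBody() changes, pretty-printed
  XmlUtil.serialize(xml)
}
```

Called on the sample policy with a map such as ['FirstName': 'Johnny', 'SSN': '0987654321', 'Driver/State': 'IA'], it should leave the Party's State as MI and change only the Driver's State, the same behavior as the switch above. Adding a new field to scrub then becomes a one-line change to the map.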