Simple XML Data Scrubbing

Worked on a project recently where the team had some XML data that they wanted to be 'scrubbed' before passing it along to other teams. Being that it was in XML, Groovy seemed like a perfect place to create a very fast and simple scrubbing program. To demonstrate, I created a simple XML file with some data that you know you don't want to pass around.

<Policy>
  <Party>
    <FirstName>Jason</FirstName>
    <LastName>Borne</LastName>
    <SSN>1234567890</SSN>
    <Address1>4994 Road</Address1>
    <State>MI</State>
  </Party>
  <Driver>
    <State>MI</State>
    <License>B20049595091</License>
  </Driver>
  <Risk>
    <Coverage type="A">20000</Coverage>
   </Risk>
</Policy>

Then I wrote a really simple script to show Groovy could easily update the important values. It pulls all the files from the 'files-to-filter' folder, filters them and writes the updated XML to the 'filtered-files' folder. Easy.


import groovy.util.slurpersupport.NodeChild
import groovy.xml.XmlUtil

def outputDirectory = new File('./filtered-files')
new File('./files-to-filter').eachFile { File input ->
  def output = new File(outputDirectory, input.name)
 
  def xml = new XmlSlurper().parseText(input.text)
  xml.'**'.each { NodeChild tag ->
    def names = generateNewName()
    def values = generateAdditionalNewValues()
  
    switch (tag.name()) {
      case 'FirstName':
        tag.replaceBody(names.'FirstName')
        break
      case 'LastName':
        tag.replaceBody(names.'LastName')
        break
      case 'SSN':
        tag.replaceBody(values.'SSN')
        break
      case 'Address1':
        tag.replaceBody(values.'Address1')
        break
      case 'State':
        if (tag.parent().name() == 'Driver') {
          tag.replaceBody(values.'State')
        }
        break
      case 'License':
        tag.replaceBody(values.'License')
        break
} output << XmlUtil.serialize(xml) } /** * @return generate a new name, maybe reading for dataset and using 'random' to pick one */ def generateNewName() { return ['FirstName': 'Johnny', 'LastName': 'Five'] } /** * @return newly generated values, could be accessing a database or something */ def generateAdditionalNewValues() { return ['SSN': '0987654321', 'Address1': '9848 SomewhereElse', 'State': 'IA',
          'License': 'I390398349834']
}

2 comments

Hey, I like this article so much. This is amazing and interested to read. Want to know more about Data Scrubbing , Visit Us Data Scrubbing Company India .

Reply

AcmeMinds believes in 100% client satisfaction. We work round the clock to understand the customer need and deliver the best software compatible with the latest technology that sustains for a long period of time. Data Scrubbing Company India

Reply

Post a Comment