2017 | HackingAway

Simple XML Data Scrubbing

I recently worked on a project where the team had some XML data that needed to be 'scrubbed' before it was passed along to other teams. Because the data was XML, Groovy seemed like a good fit for a fast, simple scrubbing program. To demonstrate, I created a small XML file containing the kind of data you don't want to pass around.

<Policy>
  <Party>
    <FirstName>Jason</FirstName>
    <LastName>Borne</LastName>
    <SSN>1234567890</SSN>
    <Address1>4994 Road</Address1>
    <State>MI</State>
  </Party>
  <Driver>
    <State>MI</State>
    <License>B20049595091</License>
  </Driver>
  <Risk>
    <Coverage type="A">20000</Coverage>
  </Risk>
</Policy>

Then I wrote a simple script to show that Groovy can easily update the sensitive values. It reads every file in the 'files-to-filter' folder, scrubs it, and writes the updated XML to the 'filtered-files' folder. Easy.

import groovy.util.slurpersupport.NodeChild
import groovy.xml.XmlUtil

def outputDirectory = new File('./filtered-files')
new File('./files-to-filter').eachFile { File input ->
  def output = new File(outputDirectory, input.name)
 
  def xml = new XmlSlurper().parseText(input.text)
  xml.'**'.each { NodeChild tag ->
    def names = generateNewName()
    def values = generateAdditionalNewValues()
  
    switch (tag.name()) {
      case 'FirstName':
        tag.replaceBody(names.'FirstName')
        break
      case 'LastName':
        tag.replaceBody(names.'LastName')
        break
      case 'SSN':
        tag.replaceBody(values.'SSN')
        break
      case 'Address1':
        tag.replaceBody(values.'Address1')
        break
      case 'State':
        if (tag.parent().name() == 'Driver') {
          tag.replaceBody(values.'State')
        }
        break
      case 'License':
        tag.replaceBody(values.'License')
        break
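    }
  }
  // Overwrite rather than append, so re-running the script does not stack a second document in the file
  output.text = XmlUtil.serialize(xml)
}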

/**
 * @return a newly generated name; could read from a dataset and use 'random' to pick one
 */
def generateNewName() {
  return ['FirstName': 'Johnny',
          'LastName': 'Five']
}

/**
 * @return newly generated values, could be accessing a database or something
 */
def generateAdditionalNewValues() {
  return ['SSN': '0987654321',
          'Address1': '9848 SomewhereElse',
          'State': 'IA',
          'License': 'I390398349834']
}
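
To try the same idea without touching the file system, here is a minimal sketch that scrubs an in-memory string. The sample document and replacement values are made up for illustration, and it assumes the same Groovy 2.x/3.x setup as the script above (on Groovy 4, XmlSlurper lives in groovy.xml and needs an explicit import).

```groovy
import groovy.xml.XmlUtil

// Trimmed-down copy of the sample policy, held in memory instead of a file
def source = '''<Policy>
  <Party>
    <SSN>1234567890</SSN>
    <State>MI</State>
  </Party>
  <Driver>
    <State>MI</State>
    <License>B20049595091</License>
  </Driver>
</Policy>'''

def xml = new XmlSlurper().parseText(source)

// Walk every element depth-first and replace the body of the sensitive ones
xml.'**'.each { tag ->
  switch (tag.name()) {
    case 'SSN':
      tag.replaceBody('0987654321')
      break
    case 'State':
      // Only the driver's state is scrubbed; the party's state is left alone
      if (tag.parent().name() == 'Driver') {
        tag.replaceBody('IA')
      }
      break
    case 'License':
      tag.replaceBody('I390398349834')
      break
  }
}

// Re-parse the serialized output so any checks run against what would be written to disk
def scrubbed = new XmlSlurper().parseText(XmlUtil.serialize(xml))
println "Party state:  ${scrubbed.Party.State.text()}"
println "Driver state: ${scrubbed.Driver.State.text()}"
println "SSN:          ${scrubbed.Party.SSN.text()}"
```

Checking the parent's name is what lets two tags with the same name be treated differently, which a plain find-and-replace on the text could not do safely.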

Converting A Variable To Fixed Length File

Going to start off the new year right with some JCL DFSORT processing!

I want to process a mainframe file on a Windows machine so I can use some Java tools (like the Copybook Slurper) on the file contents. The file contains COMP data, so it can't be downloaded as a text file. The binary data upsets the 'new line' handling and you end up with weirdly formatted data.

As a binary file, I can just read a set number of bytes per record, but in a variable length file the number of bytes varies. So I have to convert it to a fixed length file first. Once converted, I can download it to a Windows machine in binary format and know I can always read the same number of bytes.

In the following example, TST.CHNGFILE.TVAR is variable length and its longest record is 1575 bytes long. TST.CHNGFILE.TFIXED will be generated with the last 1571 bytes of each record, since the first four bytes (the record descriptor word) contain the record length.
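
The arithmetic can be sketched in Groovy. This assumes the usual layout of the four-byte prefix on a variable length record: a two-byte big-endian length that counts the prefix itself, followed by two bytes that are zero for unspanned records. The helper name and the sample bytes are hypothetical.

```groovy
// Hypothetical helper: total record length stored in the first two bytes of the prefix
int recordLengthFromPrefix(byte[] prefix) {
  // Mask with 0xFF because Java/Groovy bytes are signed
  ((prefix[0] & 0xFF) << 8) | (prefix[1] & 0xFF)
}

// 0x0627 is 1575, the longest record in the example
byte[] prefix = [0x06, 0x27, 0x00, 0x00] as byte[]

int totalLength = recordLengthFromPrefix(prefix)
int dataLength = totalLength - 4   // drop the four-byte prefix

println "total=${totalLength}, data=${dataLength}"
```

That data length is where the LRECL of 1571 and the starting position of 5 in the sort statement come from.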

//TESTJOB JOB TEST,'BOS',CLASS=L,REGION=0M,              
//         MSGCLASS=V,NOTIFY=&SYSUID                      
//****************************************************    
//XXDEL  EXEC PGM=IEFBR14                                 
//DD1    DD   UNIT=DISK,DSN=TST.CHNGFILE.TFIXED,        
//            SPACE=(TRK,(0)),DISP=(MOD,DELETE,DELETE)    
//****************************************************    
//* CONVERT VB TO FB                                      
//****************************************************    
//SA03FRMT EXEC PGM=SORT                                  
//SYSOUT   DD SYSOUT=*                                    
//SYSPRINT DD SYSOUT=*                                    
//SORTIN   DD UNIT=DISK,DSN=TST.CHNGFILE.TVAR,DISP=SHR
//FBOUT    DD UNIT=DISK,DSN=TST.CHNGFILE.TFIXED,        
//            DCB=(RECFM=FB,LRECL=1571),                  
//            DISP=(,CATLG),SPACE=(TRK,(150,15),RLSE)     
//* VTOF (CONVERT FROM ONE RECORD FORMAT TO ANOTHER)      
//* BUILD(FROM,LENGTH,ETC....)                            
//* OUTREC CAN REPLACE BUILD                              
//SYSIN    DD *                                           
  OPTION COPY                                             
  SORT FIELDS=COPY                                        
  OUTFIL FNAMES=FBOUT,VTOF,BUILD=(5,1571)

Check out IBM's sorttrck document and search for "VB to FB" for more official documentation and options, such as the VFILL parameter.
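
Once the fixed length file has been downloaded in binary mode, reading it back is a matter of taking the same number of bytes each time. A minimal Groovy sketch, assuming the record length from the JCL above; the helper name and the file name in the comment are hypothetical, and the demo uses a tiny in-memory stream instead of a real download.

```groovy
// Hypothetical helper: split a binary stream into records of recordLength bytes each
List<byte[]> readRecords(InputStream input, int recordLength) {
  List<byte[]> records = []
  byte[] buffer = new byte[recordLength]
  int filled = 0
  int read
  // read() may return fewer bytes than requested, so keep filling until a record is complete
  while ((read = input.read(buffer, filled, recordLength - filled)) != -1) {
    filled += read
    if (filled == recordLength) {
      records << buffer
      buffer = new byte[recordLength]
      filled = 0
    }
  }
  if (filled != 0) {
    throw new IOException("Trailing partial record of ${filled} bytes")
  }
  records
}

// Demo: eight bytes split into two four-byte records
def sample = new ByteArrayInputStream([1, 2, 3, 4, 5, 6, 7, 8] as byte[])
def records = readRecords(sample, 4)
println "Read ${records.size()} records"

// For the real file it might look like this (file name is made up):
// def policyRecords = new File('TFIXED.bin').withInputStream { readRecords(it, 1571) }
```

The bytes come back exactly as they were on the mainframe, so text fields are still EBCDIC and COMP fields still need decoding according to the copybook.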
