XML to YML conversion

Posted by blackrat on October 13, 2009

Having switched from XML to YML as a data language for most of my code, I recently needed to process some legacy data (for a Chrononauts game) that was still in XML. Rather than use the data as it was, I wanted to see how tricky it would be to convert it to a simple YML structure. The original data was of the form:

<chrononauts>
  <missions>
    <name>Mona Lisa Triptych</name>
    <artifact>Mona Lisa (The Real Thing)</artifact>
    <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    <artifact>Mona Lisa (An Obvious Forgery)</artifact>
  </missions>
  <ids>
    <name>Squa Tront</name>
    <year>1933</year>
    <year>1950'</year>
    <year>1962'</year>
  </ids>
</chrononauts>

and I wanted it to be more like

---
chrononauts:
  missions:
    - artifact:
      - Mona Lisa (The Real Thing)
      - Mona Lisa (An Excellent Forgery)
      - Mona Lisa (An Obvious Forgery)
      name: Mona Lisa Triptych
  ids:
    - name: Squa Tront
      year:
      - "1933"
      - 1950'
      - 1962'
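
To make the target concrete, here is a quick sketch (not part of the converter itself) that loads the YML above with Ruby's standard YAML library and inspects the resulting structure. The variable names are only for illustration.

```ruby
require 'yaml'

# The target document from above, inlined so the sketch is self-contained
target = <<~YML
  ---
  chrononauts:
    missions:
      - artifact:
        - Mona Lisa (The Real Thing)
        - Mona Lisa (An Excellent Forgery)
        - Mona Lisa (An Obvious Forgery)
        name: Mona Lisa Triptych
    ids:
      - name: Squa Tront
        year:
        - "1933"
        - 1950'
        - 1962'
YML

data = YAML.load(target)

# Each mission is a hash; repeated tags become arrays, single tags strings
mission = data['chrononauts']['missions'].first
puts mission.class
puts mission['artifact'].class
puts mission['name'].class

# The years stay strings because of the quotes and the trailing apostrophes
years = data['chrononauts']['ids'].first['year']
p years
```

So the goal is nested hashes, with arrays wherever a tag repeats and plain strings at the leaves.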

One thing in my favour was that I didn’t have any attributes to process, only data inside tags, so I figured it would be very straightforward to do using REXML and YAML. I also wanted to have hashes, arrays and simple strings where appropriate from the source data. Sprinkle in a little recursion, and the result is a pretty simple XML to YML converter. There are obvious areas for improvement, but this has worked with the data sets I have used so far, so I haven’t had any need to modify it.

#!/usr/bin/env ruby
class CardsXmlYaml
  require 'rubygems'
  require 'yaml'
  require 'rexml/document'
  YMLFile='cards.yml'
  XMLFile='cards.xml'

  def self.xml_process(root)
    head={}
    begin
      key=root.expanded_name
      root.children.each do |el|
        value=xml_process(el)
        if value.is_a?(String)
          next if value.gsub(/[\n\t\s]/,'').empty?
        end
        if head[key].nil?
          head[key]=value
        else
          if head[key].keys.include?(value.keys[0])
            old_value=head[key][value.keys[0]]
            head[key][value.keys[0]]=[]
            head[key][value.keys[0]]<<old_value
            head[key][value.keys[0]]<<value.values[0]
            head[key][value.keys[0]].flatten!
          else
            head[key][value.keys[0]]=value.values[0]
          end
        end
      end
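    rescue
      # Text nodes do not respond to expanded_name, so they end up here
      # (as does any other error raised above); return the node's string
      # value, or nil if it has none
      begin
        return root.value
      rescue
      end
      return nil
    end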
    head
  end

  def self.to_yml_file(infile=XMLFile,outfile=YMLFile,conversion_type=:file)
    # The block form closes the file even if the conversion raises
    File.open(outfile,'w') do |output|
      output.puts(YAML.dump(self.to_yml(infile,conversion_type)))
    end
  end

  def self.to_yml(doc=XMLFile,conversion_type=:file)
    begin
      doc=REXML::Document.new(File.new(doc)) if conversion_type==:file
    rescue StandardError=>e
      print("Error reading xml data from file.\n")
      doc=nil
    end
    return nil if doc.nil?
    base=[]
    doc.children.each do |el|
      base << xml_process(el)
    end
    base
  end
end
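
One of the obvious areas for improvement is that the recursion above leans on rescue to tell text nodes from elements. As a rough sketch of an alternative (not the code I actually use), the same folding can be done by looking only at child elements. The helper name element_to_data is hypothetical, and like the original it assumes there are no attributes and no mixed content.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical helper, not part of CardsXmlYaml: turns an element into a
# String (leaf), or a Hash of child name => value with repeats as Arrays.
def element_to_data(element)
  children = element.elements.to_a
  # An element with no child elements carries only text
  return element.text.to_s.strip if children.empty?

  children.each_with_object({}) do |child, hash|
    name = child.expanded_name
    value = element_to_data(child)
    if hash.key?(name)
      # A repeated tag name turns the entry into an array
      hash[name] = [hash[name]] unless hash[name].is_a?(Array)
      hash[name] << value
    else
      hash[name] = value
    end
  end
end

xml = <<~XML
  <chrononauts>
    <ids>
      <name>Squa Tront</name>
      <year>1933</year>
      <year>1950'</year>
    </ids>
  </chrononauts>
XML

doc = REXML::Document.new(xml)
# Keep the root tag as the top-level key, as the converter above does
data = { doc.root.expanded_name => element_to_data(doc.root) }
puts YAML.dump(data)
```

Unlike the original, this version skips whitespace-only text by never visiting text nodes directly, so it would silently drop text that sits alongside child elements.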