XML to YML conversion

Posted by blackrat on October 13, 2009

Having switched from XML to YML as a data language for most of my code, I recently needed to process some old legacy data (for the Chrononauts card game) which was still in XML format. Rather than use this data as it was, I wanted to see how tricky it would be to convert it to a simple YML structure. The original data was of the form:

<chrononauts>
  <missions>
    <name>Mona Lisa Triptych</name>
    <artifact>Mona Lisa (The Real Thing)</artifact>
    <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    <artifact>Mona Lisa (An Obvious Forgery)</artifact>
  </missions>
  <ids>
    <name>Squa Tront</name>
    <year>1933</year>
    <year>1950'</year>
    <year>1962'</year>
  </ids>
</chrononauts>

and I wanted it to be more like

---
chrononauts:
  missions:
    - artifact:
      - Mona Lisa (The Real Thing)
      - Mona Lisa (An Excellent Forgery)
      - Mona Lisa (An Obvious Forgery)
      name: Mona Lisa Triptych
  ids:
    - name: Squa Tront
      year:
      - "1933"
      - 1950'
      - 1962'

One thing in my favour was that I didn’t have any attributes to process, only data inside tags, so I figured it would be very straightforward to do using REXML and YAML. I also wanted the output to use hashes, arrays and simple strings wherever the source data called for them. Sprinkle in a little recursion, and the result is a pretty simple XML to YML converter. There are obvious areas for improvement, but it has worked with the data sets I have used so far, so I haven’t needed to modify it.

#!/usr/bin/env ruby
require 'yaml'
require 'rexml/document'

class CardsXmlYaml
  YMLFile = 'cards.yml'
  XMLFile = 'cards.xml'

  # Recursively turn an REXML node into nested Hashes, Arrays and Strings.
  # An element becomes { name => contents }, a text node becomes its String,
  # and repeated child elements with the same name are gathered into an Array.
  def self.xml_process(root)
    head = {}
    begin
      key = root.expanded_name   # text nodes have no expanded_name, so this raises for them
      root.children.each do |el|
        value = xml_process(el)
        # skip the whitespace-only text between tags
        next if value.is_a?(String) && value.strip.empty?
        if head[key].nil?
          head[key] = value
        else
          child = value.keys[0]
          if head[key].key?(child)
            # tag seen before: promote its value to an Array and append
            head[key][child] = [head[key][child], value.values[0]].flatten
          else
            head[key][child] = value.values[0]
          end
        end
      end
    rescue StandardError
      # not an element: return its text if it has any, otherwise nil
      return root.respond_to?(:value) ? root.value : nil
    end
    head
  end

  def self.to_yml_file(infile = XMLFile, outfile = YMLFile, conversion_type = :file)
    File.open(outfile, 'w') do |output|
      output.puts(YAML.dump(to_yml(infile, conversion_type)))
    end
  end

  # With conversion_type == :file, doc is a filename; otherwise it is
  # expected to be an already-parsed REXML::Document.
  def self.to_yml(doc = XMLFile, conversion_type = :file)
    if conversion_type == :file
      begin
        doc = File.open(doc) { |f| REXML::Document.new(f) }
      rescue StandardError => e
        warn("Error reading xml data from file: #{e.message}")
        return nil
      end
    end
    return nil if doc.nil?
    doc.children.map { |el| xml_process(el) }
  end
end
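For comparison, here is a minimal sketch of the same recursive idea that checks for child elements explicitly instead of relying on an exception to spot text nodes. The method names element_to_data, xml_string_to_data and xml_file_to_yaml are hypothetical, not part of the converter above. Like the original it ignores attributes, and it returns a single hash for the root rather than an array of top-level nodes. Note that a single <missions> element comes out as a hash, not the one-item list shown in the target YAML.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical helper: convert an REXML element into nested Hashes,
# Arrays and Strings. Attributes are ignored.
def element_to_data(element)
  children = element.elements.to_a
  # Leaf element: return its text content, trimmed of surrounding whitespace
  return element.text.to_s.strip if children.empty?

  result = {}
  children.each do |child|
    value = element_to_data(child)
    name = child.name
    if result.key?(name)
      # Repeated tag: collect the values into an Array
      # (values are only ever Hashes or Strings, so this check is safe)
      result[name] = [result[name]] unless result[name].is_a?(Array)
      result[name] << value
    else
      result[name] = value
    end
  end
  result
end

# Parse an XML string and wrap the converted root under its own name
def xml_string_to_data(xml)
  root = REXML::Document.new(xml).root
  { root.name => element_to_data(root) }
end

# Convenience wrapper: read an XML file and return YAML text
def xml_file_to_yaml(path)
  YAML.dump(xml_string_to_data(File.read(path)))
end
```

Calling puts xml_file_to_yaml('cards.xml') prints the converted structure; whitespace between tags never reaches the output because only child elements and leaf text are visited.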

Recovering from Subversion checksum error corruption

Posted by blackrat on October 09, 2009

I still use Subversion for most of my projects, despite the fact that most of the (Ruby) world seems to be moving to git. I’m still at the “I’ll try it” stage (see Softies on Rails for the 4 stages of Rubyist experimentation), and will move over once I’ve tested that I can use Capistrano, CruiseControl, etc. with it, and know enough to support my developers/testers if and when things get sticky.
This is a log of my experiments trying to recover the files from a broken Subversion repository. You may lose some data if you use this technique, and if you do, don’t blame me; but it did let me recover the files so that I could rebuild another repository as a fresh start. Note that all of the history will be lost, but it is a way to get your files back from a repository that you can’t otherwise get to because of a corruption issue.
My corrupted repository barfs at revision 51, which is a 221MB check-in containing zip files and other binaries. Running svnadmin verify gives:

svnadmin verify /svn

Output

* Verified revision 0.
.
.
* Verified revision 50.
svnadmin: Checksum mismatch while reading representation:
   expected:  589cf19ceac143315e0d61f9873ed7fb
     actual:  b0b0bf50ec4b0b089730796f3355c649

This means that something has gone wrong in revision 51. The revision history isn’t that important, since this was a scratchpad project and most of the notes are of the “trying this”, “seeing what happens when” type, so I wanted to see if I could at least pull all of the content back out of it.
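Spotting the broken revision is just a matter of reading the last “Verified revision” line. On a repository with many revisions, a small Ruby sketch like the one below can do that. The method names are hypothetical, and it assumes svnadmin stops at the first failure (its default behaviour) and that the progress lines keep the format shown above.

```ruby
require 'open3'

# Hypothetical helper: given the output of a failed `svnadmin verify`,
# return the revision that failed. svnadmin reports each good revision
# as "* Verified revision N." and stops at the first bad one, so the
# broken revision is one past the last verified number.
def first_broken_revision(output)
  verified = output.scan(/^\* Verified revision (\d+)\./).flatten.map(&:to_i)
  verified.empty? ? 0 : verified.max + 1
end

# Run `svnadmin verify` on a repository path (stdout and stderr combined).
# Returns nil when verification succeeds, otherwise the first broken revision.
def check_repository(repo_path)
  output, status = Open3.capture2e('svnadmin', 'verify', repo_path)
  return nil if status.success?
  first_broken_revision(output)
end
```

Fed the output shown above, first_broken_revision returns 51.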
I’d had a look at fsfsverify.py, but no dice. Even running it several times didn’t fix the problem with the broken revision, and doing an svn co broke in the iconex directory. However, running

svn co file:///svn/trunk /tmp/trunk

did check out a large proportion of the files, up to the part where the file in the iconex directory was corrupted.
So I started to think. What would happen if I took a copy of the corrupted directory, ran svn del on the original broken directory and checked the deletion in? Would that let me access the files in later revisions? I knew that none of the files in that directory had changed since revision 51, since it was purely a zip-file check-in made as a backup of the originals.
So

cp -r iconex iconex_backup
svn del iconex
svn ci -m 'removed broken files'
svn up

A brief wait, and the rest of the files started to check out. There were a couple of externals that needed to be changed using svn propedit svn:externals, since the connection to these files was no longer available as this repository wasn’t being served. Another svn up command, and all of the files were successfully checked out.
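The manual steps above can also be replayed from Ruby, which is handy if more than one directory is affected. This is a sketch only: remove_broken_directory and its dry_run option are hypothetical, it must be run from the root of the partial working copy, and it does not handle the svn:externals fix-up.

```ruby
require 'fileutils'

# Hypothetical helper replaying the manual recovery steps from this post.
# Run it from the root of the partial working copy. With dry_run: true it
# only returns the commands it would run, without touching anything.
def remove_broken_directory(dir, message: 'removed broken files', dry_run: false)
  backup = "#{dir}_backup"
  svn_commands = [
    ['svn', 'del', dir],
    ['svn', 'ci', '-m', message],
    ['svn', 'up']
  ]
  return [['cp', '-r', dir, backup]] + svn_commands if dry_run

  # Keep a local copy of the files before svn deletes them
  FileUtils.cp_r(dir, backup)
  svn_commands.each do |cmd|
    # system returns false or nil on failure; stop at the first one
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
  svn_commands
end
```

Passing arguments to system as separate strings avoids shell quoting problems with the commit message.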
Hopefully this will give someone else a starting point if all attempts at recovery of the database fail, and at least you will get your non-corrupted files back.

Acts as state machine with legacy database - (rubyist aasm)

Posted by blackrat on October 09, 2009

One of the (many!) projects I’ve been working on required a Ruby interface to a legacy database which contains existing state information in numeric form. I’ve been using aasm for other projects, and since the states are well defined, this seemed like a good opportunity to see if aasm would handle legacy data. I’ve already started to use ActiveRecord with it (this isn’t a Rails app, but pure Ruby), so I thought I’d have a quick spike to see what would happen.

A look at the aasm source quickly revealed that it only stores state information as strings in the database, so a translation layer was required.

Since the column containing the state information wasn’t called state, I formulated a slightly cunning plan: tell aasm that the column was called state, and create state and state= methods that would translate to and from the numeric database value.
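The translation idea itself can be tried without a database or aasm. In the sketch below, FakeLegacyRow is a hypothetical stand-in for an ActiveRecord model whose read_attribute and write_attribute are backed by a Hash; the mapping and the ErrorState column name come from this post, while LEGACY_VALUE (the precomputed inverse mapping) is an illustrative addition. It also accepts the state as a String, as an assumption in case a caller passes one.

```ruby
# Hypothetical stand-in for an ActiveRecord row: attributes live in a Hash,
# and read_attribute/write_attribute mimic the ActiveRecord methods of the
# same name closely enough for this illustration.
class FakeLegacyRow
  LEGACY_STATE = { 0 => :ok, 1 => :fail }.freeze
  LEGACY_VALUE = LEGACY_STATE.invert.freeze # :ok => 0, :fail => 1
  LEGACY_STATE_COLUMN = 'ErrorState'

  def initialize(attributes = {})
    @attributes = attributes
  end

  def read_attribute(name)
    @attributes[name.to_s]
  end

  def write_attribute(name, value)
    @attributes[name.to_s] = value
  end

  # Numeric column value -> symbolic state name
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # Symbolic (or String) state name -> numeric column value
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN, LEGACY_VALUE[value.to_s.to_sym])
  end
end
```

A row loaded with ErrorState 1 reports state :fail, and assigning :ok writes 0 back to the column.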


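class TestStatus < ActiveRecord::Base
  include AASM        # rubyist-aasm has to be mixed in explicitly
  aasm_column :state  # required to force aasm to call my state method rather than use its own internal column definition
  aasm_state :ok
  aasm_state :fail

  aasm_event :fail do
    transitions :to => :fail, :from => [:ok]
  end

  aasm_event :ok do
    transitions :to => :ok, :from => [:fail]
  end

  # Numeric values stored in the legacy column, mapped to aasm state names
  LEGACY_STATE = {
    0 => :ok,
    1 => :fail
  }
  LEGACY_STATE_COLUMN = 'ErrorState'

  # Translate the numeric legacy value into a state name for aasm
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # Translate a state name (Symbol or String) back into the numeric legacy value
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN, LEGACY_STATE.invert[value.to_s.to_sym])
  end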