XML to YML conversion
Having switched from XML to YML as a data language for most of my code, I recently needed to process some legacy data (for a Chrononauts game) that was still in XML format. Rather than using this data as it was, I wanted to see how tricky it would be to convert it to a simple YML structure. The original data was of the form:
<chrononauts>
  <missions>
    <name>Mona Lisa Triptych</name>
    <artifact>Mona Lisa (The Real Thing)</artifact>
    <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    <artifact>Mona Lisa (An Obvious Forgery)</artifact>
  </missions>
  <ids>
    <name>Squa Tront</name>
    <year>1933</year>
    <year>1950'</year>
    <year>1962'</year>
  </ids>
</chrononauts>
and I wanted it to be more like
---
chrononauts:
  missions:
  - artifact:
    - Mona Lisa (The Real Thing)
    - Mona Lisa (An Excellent Forgery)
    - Mona Lisa (An Obvious Forgery)
    name: Mona Lisa Triptych
  ids:
  - name: Squa Tront
    year:
    - "1933"
    - 1950'
    - 1962'
One thing in my favour was that I didn’t have any attributes to process, only data inside tags, so I figured it would be very straightforward to do using REXML and YAML. I also wanted to have hashes, arrays and simple strings where appropriate from the source data. Sprinkle in a little recursion, and the result is a pretty simple XML to YML converter. There are obvious areas for improvement, but this has worked with the data sets I have used so far, so I haven’t had any need to modify it.
#!/usr/bin/env ruby
require 'yaml'
require 'rexml/document'

class CardsXmlYaml
  YMLFile = 'cards.yml'
  XMLFile = 'cards.xml'

  # Recursively convert an REXML node into nested hashes, arrays and strings.
  def self.xml_process(root)
    head = {}
    begin
      # Text nodes have no expanded_name, so they drop into the rescue below.
      key = root.expanded_name
      root.children.each do |el|
        value = xml_process(el)
        if value.is_a?(String)
          # Skip whitespace-only text between tags.
          next if value.gsub(/[\n\t\s]/, '').empty?
        end
        if head[key].nil?
          head[key] = value
        else
          if head[key].keys.include?(value.keys[0])
            # Repeated child tag: gather the old and new values into an array.
            old_value = head[key][value.keys[0]]
            head[key][value.keys[0]] = []
            head[key][value.keys[0]] << old_value
            head[key][value.keys[0]] << value.values[0]
            head[key][value.keys[0]].flatten!
          else
            head[key][value.keys[0]] = value.values[0]
          end
        end
      end
    rescue
      # Not an element: return the node's text value if it has one.
      begin
        return root.value
      rescue
      end
      return nil
    end
    head
  end

  # Convert the XML file and write the result out as YML.
  def self.to_yml_file(infile = XMLFile, outfile = YMLFile, conversion_type = :file)
    File.open(outfile, 'w') do |output|
      output.puts(YAML.dump(to_yml(infile, conversion_type)))
    end
  end

  # Returns an array with one converted entry per top-level node, or nil on error.
  def self.to_yml(doc = XMLFile, conversion_type = :file)
    begin
      if conversion_type == :file
        doc = File.open(doc) { |file| REXML::Document.new(file) }
      end
    rescue StandardError
      print("Error reading xml data from file.\n")
      doc = nil
    end
    return nil if doc.nil?
    base = []
    doc.children.each do |el|
      base << xml_process(el)
    end
    base
  end
end
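For comparison, here is a minimal sketch of the same recursive idea that walks REXML's elements collection instead of leaning on rescue to spot text nodes. It is not the class above: element_to_data and SAMPLE_XML are hypothetical names made up for this illustration, and it assumes (as with my data) that there are no attributes and no mixed text-and-tag content.

```ruby
require 'yaml'
require 'rexml/document'

# Hypothetical helper (not part of CardsXmlYaml): turn an element into a
# String (leaf), or a Hash whose repeated child tags become Arrays.
def element_to_data(element)
  children = element.elements.to_a
  # Leaf element: return its text content as a plain string.
  return element.text.to_s.strip if children.empty?

  children.each_with_object({}) do |child, result|
    value = element_to_data(child)
    if result.key?(child.name)
      # Repeated tag: promote the existing value to an array, then append.
      existing = result[child.name]
      result[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      result[child.name] = value
    end
  end
end

# Cut-down sample in the same shape as the data shown earlier.
SAMPLE_XML = <<~XML
  <chrononauts>
    <missions>
      <name>Mona Lisa Triptych</name>
      <artifact>Mona Lisa (The Real Thing)</artifact>
      <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    </missions>
  </chrononauts>
XML

doc = REXML::Document.new(SAMPLE_XML)
data = { doc.root.name => element_to_data(doc.root) }
puts YAML.dump(data)
```

Because a single child stays a plain value and only repeated tags become arrays, the shape of the output depends on the data, which is the same trade-off the class above makes.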
Recovering from Subversion checksum error corruption
I still use subversion for most of my projects, despite the fact that most of the (Ruby) world seems to be moving to git as its main repository. I’m still at the “I’ll try it” stage (see Softies on Rails for the 4 stages of Rubyist experimentation), and will be moving over once I’ve tested that I can use Capistrano, CruiseControl, etc. with it, and know enough to support my developers/testers if and when things get sticky.
This is a log of my experiments in recovering the files from a broken Subversion repository. You may lose some data if you use this technique, and if you do, don’t blame me, but it let me recover the files so that I could rebuild another repository as a fresh start. Note that all of the history will be lost, but this can help if you need the files from a repository that you can’t get to because of a corruption issue.
My corrupted repository barfs at revision 51, which is a 221Mb checkin containing zip files and other binaries. Running svnadmin verify gives:
svnadmin verify /svn
Output
* Verified revision 0.
.
.
* Verified revision 50.
svnadmin: Checksum mismatch while reading representation:
expected: 589cf19ceac143315e0d61f9873ed7fb
actual: b0b0bf50ec4b0b089730796f3355c649
This means that something has gone wrong in revision 51. As the revision history of this isn’t that important, since this was a scratchpad project and most of the notes are of the “trying this”, “seeing what happens when” type, I wanted to see if I could pull all of the content back from it.
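The reasoning here (the last revision reported as verified is 50, so the suspect is 51) can be sketched in a few lines of Ruby. This is only an illustration: first_unverified_revision is a hypothetical helper, it works on text you have already captured from svnadmin verify, and the answer only means something when the verify run stopped with an error.

```ruby
# Hypothetical helper: given the text printed by `svnadmin verify`,
# return the revision after the last one reported as verified.
def first_unverified_revision(verify_output)
  verified = verify_output.scan(/^\* Verified revision (\d+)\./).flatten.map(&:to_i)
  # Nothing verified at all: revision 0 is the first suspect.
  return 0 if verified.empty?

  verified.max + 1
end

# Abbreviated copy of the output shown above.
sample = <<~OUT
  * Verified revision 0.
  * Verified revision 50.
  svnadmin: Checksum mismatch while reading representation:
OUT

suspect = first_unverified_revision(sample)
puts "First unverified revision: #{suspect}"
```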
I’d had a look at fsfsverify.py, but no dice. Even running it several times didn’t fix the problem with the broken revision, and doing an svn co broke in the iconex directory. However, running
svn co file:///svn/trunk /tmp/trunk
did check out a large proportion of the files, up to the part where the file in the iconex directory was corrupted.
So I started to think. What would happen if I took a copy of the corrupted directory, did an svn del on the original broken directory and checked that change in? Would that let me access the files in later revisions? I knew that none of the files in that directory had changed since revision 51, since it was purely a zip file checkin as a backup to the originals.
So
cp -r iconex iconex_backup
svn del iconex
svn ci -m 'removed broken files'
svn up
A brief wait, and the rest of the files started to check out. There were a couple of externals that needed to be changed using svn propedit svn:externals, since the connection to these files was no longer available as this repository wasn’t being served. Another svn up command, and all of the files were successfully checked out.
Hopefully this will give someone else a starting point if all attempts at recovery of the database fail, and at least you will get your non-corrupted files back.
Acts as state machine with legacy database - (rubyist aasm)
One of the (many!) projects I’ve been working on required a Ruby interface to a legacy database which contains existing state information in numeric form. I’ve been using aasm for other projects, and since the state is well defined, this seemed like a good opportunity to see if aasm would handle legacy data. I’ve already started to use ActiveRecord with it (this isn’t a Rails app, but pure Ruby), so I thought I’d have a quick spike to see what would happen.
A look at the source for aasm quickly revealed that it only uses strings for state information in the database, so a translation layer was required.
Since the column name containing the state information wasn’t state, I formulated a slightly cunning plan. Tell aasm that the column was called state, and create state and state= methods that would perform the database translation.
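The translation idea can be tried on its own, without aasm or a database. A minimal sketch: FakeRecord is a hypothetical stand-in that keeps attributes in a Hash and mimics read_attribute and write_attribute, and the to_sym call (so that either a string or a symbol is accepted) is my assumption for the sketch rather than something aasm requires.

```ruby
# Hypothetical stand-in for an ActiveRecord model: attributes live in a Hash.
class FakeRecord
  LEGACY_STATE = { 0 => :ok, 1 => :fail }.freeze
  LEGACY_STATE_COLUMN = 'ErrorState'

  def initialize(attributes)
    @attributes = attributes
  end

  def read_attribute(name)
    @attributes[name]
  end

  def write_attribute(name, value)
    @attributes[name] = value
  end

  # Numeric legacy value -> symbolic state.
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # Symbolic (or string) state -> numeric legacy value.
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN, LEGACY_STATE.invert[value.to_sym])
  end
end

record = FakeRecord.new({ 'ErrorState' => 0 })
current = record.state
record.state = 'fail'
stored = record.read_attribute('ErrorState')
puts "was #{current.inspect}, column now holds #{stored.inspect}"
```

The real class, using the same 0 and 1 mapping, follows.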
class TestStatus < ActiveRecord::Base
  aasm_column :state # required to force aasm to call my state method rather than use its own internal column definition
  aasm_state :ok
  aasm_state :fail

  aasm_event :fail do
    transitions :to => :fail, :from => [:ok]
  end

  aasm_event :ok do
    transitions :to => :ok, :from => [:fail]
  end

  LEGACY_STATE = {
    0 => :ok,
    1 => :fail
  }
  LEGACY_STATE_COLUMN = 'ErrorState'

  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN, LEGACY_STATE.invert[value])
  end