XML to YML conversion
Having switched from XML to YML as a data language for most of my code, I recently had a need to process some old legacy data (for a Chrononauts game) which was still in XML format. Rather than using this data as it was, I wanted to see how tricky it would be to convert it to a simple YML structure. The original data was of the form:
<chrononauts>
  <missions>
    <name>Mona Lisa Triptych</name>
    <artifact>Mona Lisa (The Real Thing)</artifact>
    <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    <artifact>Mona Lisa (An Obvious Forgery)</artifact>
  </missions>
  <ids>
    <name>Squa Tront</name>
    <year>1933</year>
    <year>1950'</year>
    <year>1962'</year>
  </ids>
</chrononauts>
and I wanted it to be more like
---
chrononauts:
  missions:
  - artifact:
    - Mona Lisa (The Real Thing)
    - Mona Lisa (An Excellent Forgery)
    - Mona Lisa (An Obvious Forgery)
    name: Mona Lisa Triptych
  ids:
  - name: Squa Tront
    year:
    - "1933"
    - 1950'
    - 1962'
One thing in my favour was that I didn’t have any attributes to process, only data inside tags, so I figured it would be very straightforward to do using REXML and YAML. I also wanted to have hashes, arrays and simple strings where appropriate from the source data. Sprinkle in a little recursion, and the result is a pretty simple XML to YML converter. There are obvious areas for improvement, but this has worked with the data sets I have used so far, so I haven’t had any need to modify it.
#!/usr/bin/env ruby
class CardsXmlYaml
  require 'rubygems'
  require 'yaml'
  require 'rexml/document'
  YMLFile='cards.yml'
  XMLFile='cards.xml'

  # Recursively convert an element into nested hashes, arrays and strings.
  def self.xml_process(root)
    head={}
    begin
      key=root.expanded_name
      root.children.each do |el|
        value=xml_process(el)
        if value.is_a?(String)
          # skip whitespace-only text nodes
          next if value.gsub(/[\n\t\s]/,'').empty?
        end
        if head[key].nil?
          head[key]=value
        else
          if head[key].keys.include?(value.keys[0])
            # repeated tag: collect the values into an array
            old_value=head[key][value.keys[0]]
            head[key][value.keys[0]]=[]
            head[key][value.keys[0]]<<old_value
            head[key][value.keys[0]]<<value.values[0]
            head[key][value.keys[0]].flatten!
          else
            head[key][value.keys[0]]=value.values[0]
          end
        end
      end
    rescue
      # reached for nodes without expanded_name (e.g. text nodes): return their value
      begin
        return root.value
      rescue
      end
      return nil
    end
    head
  end

  def self.to_yml_file(infile=XMLFile,outfile=YMLFile,conversion_type=:file)
    output=File.new(outfile,'w')
    output.puts(YAML.dump(self.to_yml(infile,conversion_type)))
    output.close
  end

  def self.to_yml(doc=XMLFile,conversion_type=:file)
    begin
      doc=REXML::Document.new(File.new(doc)) if conversion_type==:file
    rescue Exception=>e
      print("Error reading xml data from file.\n")
      doc=nil
    end
    return nil if doc.nil?
    base=[]
    doc.children.each do |el|
      base << xml_process(el)
    end
    base
  end
end
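For readers who want to try the idea without a cards.xml file, here is a minimal sketch of the same recursive approach working on an in-memory string. It is a simplified variant rather than the converter above: the helper name element_to_data and the SAMPLE_XML constant are made up for illustration, it only looks at child elements (not attributes or mixed text), and the hash it builds is not guaranteed to match the converter's output shape exactly.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical helper (not part of the converter above): turn an element into
# a String when it has no child elements, otherwise into a Hash keyed by the
# child tag names. Repeated tags are collected into an Array.
def element_to_data(element)
  children = element.elements.to_a
  return element.text.to_s.strip if children.empty?

  children.each_with_object({}) do |child, hash|
    value = element_to_data(child)
    if hash.key?(child.name)
      existing = hash[child.name]
      hash[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      hash[child.name] = value
    end
  end
end

# Assumed sample data, cut down from the example at the top of the post.
SAMPLE_XML = <<~XML
  <chrononauts>
    <ids>
      <name>Squa Tront</name>
      <year>1933</year>
      <year>1950'</year>
    </ids>
  </chrononauts>
XML

doc = REXML::Document.new(SAMPLE_XML)
data = { doc.root.name => element_to_data(doc.root) }
puts YAML.dump(data)
```

Building the hash from element.elements sidesteps the whitespace-only text nodes that the converter has to filter out by hand.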
Recovering from Subversion checksum error corruption
I still use subversion for most of my projects, despite the fact that most of the (Ruby) world seems to be moving to git as its main repository. I’m still at the “I’ll try it” stage (see Softies on Rails for the 4 stages of Rubyist experimentation), and will be moving over once I’ve tested that I can use Capistrano, CruiseControl, etc. with it, and know enough to support my developers/testers if and when things get sticky.
This is a log of my experiments trying to recover the files from a broken subversion repository. You may lose some data if you use this technique, and if you do, don’t blame me, but this let me recover the files, so that I could rebuild another repository as a fresh start. Note that all of the history will be lost, but that may be an acceptable price if you need the files from a repository that you can’t otherwise get to because of a corruption issue.
My corrupted repository barfs at revision 51, which is a 221 MB check-in containing zip files and other binaries. Running svnadmin verify gives:
svnadmin verify /svn
Output
* Verified revision 0.
.
.
* Verified revision 50.
svnadmin: Checksum mismatch while reading representation:
expected: 589cf19ceac143315e0d61f9873ed7fb
actual: b0b0bf50ec4b0b089730796f3355c649
This means that something has gone wrong in revision 51. As the revision history of this isn’t that important, since this was a scratchpad project and most of the notes are of the “trying this”, “seeing what happens when” type, I wanted to see if I could pull all of the content back from it.
I’d had a look at fsfsverify.py, but no dice. Even running it several times didn’t fix the problem with the broken revision, and doing an svn co broke in the iconex directory. However, running
svn co file:///svn/trunk /tmp/trunk
did check out a large proportion of the files, up to the part where the file in the iconex directory was corrupted.
So I started to think. What would happen if I took a copy of the corrupted directory, did an svn del on the original broken directory and checked it back in? Would that let me access the files in later revisions? I knew that none of the files in that directory had changed since revision 51, since it was purely a zip file check-in as a backup to the originals.
So
cp -r iconex iconex_backup
svn del iconex
svn ci -m 'removed broken files'
svn up
A brief wait, and the rest of the files started to check out. There were a couple of externals that needed to be changed using svn propedit svn:externals, since the connection to these files was no longer available as this repository wasn’t being served. Another svn up command, and all of the files were successfully checked out.
Hopefully this will give someone else a starting point if all attempts at recovery of the database fail, and at least you will get your non-corrupted files back.
Acts as state machine with legacy database - (rubyist aasm)
One of the (many!) projects I’ve been working on required a ruby interface to a legacy database which contains existing state information in numeric form. I’ve been using aasm for other projects, and since the state is well defined, this seemed like a good opportunity to see if aasm would handle legacy data. I’ve already started to use ActiveRecord with it (this isn’t a rails app, but pure ruby), so I thought I’d have a quick spike to see what would happen.
A quick look at the source for aasm quickly revealed that it only uses strings for state information in the database, so a translation layer was required.
Since the column name containing the state information wasn’t state, I formulated a slightly cunning plan. Tell aasm that the column was called state, and create state and state= methods that would perform the database translation.
class TestStatus < ActiveRecord::Base
  include AASM
  aasm_column :state #required to force aasm to call my state method rather than use its own internal column definition
  aasm_state :ok
  aasm_state :fail
  aasm_event :fail do
    transitions :to => :fail, :from=>[:ok]
  end
  aasm_event :ok do
    transitions :to => :ok, :from=>[:fail]
  end
  # legacy numeric codes and the aasm states they stand for
  LEGACY_STATE={
    0=>:ok,
    1=>:fail
  }
  LEGACY_STATE_COLUMN='ErrorState'
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN,LEGACY_STATE.invert[value])
  end
end
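The translation layer itself can be tried out without a database or aasm. The following is a rough sketch under that assumption: FakeLegacyRow is a made-up stand-in that only mimics read_attribute and write_attribute, and the to_sym conversion in the setter is my own addition, on the guess that a state may arrive as a string as well as a symbol.

```ruby
# Hypothetical stand-in for an ActiveRecord row. It only mimics
# read_attribute/write_attribute so the translation can run without a database.
class FakeLegacyRow
  LEGACY_STATE = { 0 => :ok, 1 => :fail }.freeze
  LEGACY_STATE_COLUMN = 'ErrorState'

  def initialize(attributes = {})
    @attributes = attributes
  end

  def read_attribute(name)
    @attributes[name]
  end

  def write_attribute(name, value)
    @attributes[name] = value
  end

  # numeric legacy code -> state symbol
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # state (symbol or string) -> numeric legacy code; fetch raises KeyError
  # on an unknown state instead of silently writing nil
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN, LEGACY_STATE.invert.fetch(value.to_sym))
  end
end

row = FakeLegacyRow.new('ErrorState' => 0)
puts row.state.inspect
row.state = 'fail'
puts row.read_attribute('ErrorState')
```

Using fetch rather than a plain lookup is a design choice: an unknown state raises immediately rather than storing nil in the legacy column.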
Executing the Unix file command to determine real file types from ruby
I recently needed to make sure all my files were named according to their content rather than to an arbitrary extension that had been added to them. This resulted in extending the ruby FileUtils to use the Unix file command to return the filename and type as array elements so with a script argument of:
/var/testfiles/*
you get an output of:
[["/var/testfiles/movie_quiz_for_wiggy.doc", "Microsoft Office Document"], ["/var/testfiles/movie_quiz_for_wiggy.odt", "OpenDocument Text"], ["/var/testfiles/movie_quiz_for_wiggy.pdf", "PDF document, version 1.4"]]
#!/usr/bin/env ruby
require 'fileutils'
module FileUtils
  unless RUBY_PLATFORM=~/win[36]/
    # run the Unix file command and return [filename, type] pairs
    def self.file(src)
      `file #{src}`.split("\n").collect {|x| x.split(":",2).collect {|y| y.strip}}
    end
  end
end
p(FileUtils.file(ARGV[0]))
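As a possible variation, the parsing can be split from the command execution, which makes it easy to check without touching the filesystem. This is only a sketch: parse_file_output and file_types are hypothetical names, and file_types uses Open3.capture2 with separate arguments so that odd filenames are not interpreted by a shell. One consequence is that wildcards are no longer expanded for you, so a pattern would need Dir.glob first.

```ruby
require 'open3'

# Hypothetical helper: turn the text printed by the file command into
# [filename, type] pairs, splitting each line on the first colon only.
def parse_file_output(output)
  output.split("\n").map { |line| line.split(':', 2).map(&:strip) }
end

# Hypothetical helper: run the file command without going through a shell.
# Returns an empty array if the command reports failure.
def file_types(*paths)
  output, status = Open3.capture2('file', *paths)
  status.success? ? parse_file_output(output) : []
end

# Assumed sample output, shaped like the listing earlier in this post.
sample = "/var/testfiles/a.doc: Microsoft Office Document\n" \
         "/var/testfiles/a.pdf: PDF document, version 1.4\n"
p parse_file_output(sample)
```

A call such as file_types(*Dir.glob('/var/testfiles/*')) would then cover the wildcard case.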
Automatically creating . for Ruby Hashes
I recently had to do some testing of an in-memory OLE object, which also allowed persistence to an XML file. The structures of the two (in-memory and in-file) were similar enough for me to look at XmlSimple, which creates a Hash, and since I only like writing code once, I figured that using the same code for both would be cool.
In memory I needed to do
entry.timing.connect
and with the file (via entry=XmlSimple.xml_in(infile))
entry[:timing][:connect]
Those were similar enough for me to want to change how XmlSimple held its Hash internally, but I figured that there may be a more generic way to do this without breaking Hash. That led me to try out the following code
class Hash
  def method_missing(sym,*args,&blk)
    return self[sym] if self.key?(sym)
    return self[sym.to_s] if self.key?(sym.to_s)
    super
  end
end
which just appears to work. Enjoy.
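One refinement worth considering, sketched here as my own assumption rather than anything XmlSimple needs: pairing method_missing with respond_to_missing?, so that respond_to? agrees with what the Hash will actually answer. The sample entry hash is invented for the demonstration.

```ruby
class Hash
  # Look the method name up as a symbol key, then as a string key.
  def method_missing(sym, *args, &blk)
    return self[sym] if key?(sym)
    return self[sym.to_s] if key?(sym.to_s)
    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(sym, include_private = false)
    key?(sym) || key?(sym.to_s) || super
  end
end

# Invented sample data mixing string and symbol keys.
entry = { 'timing' => { connect: 42 } }
puts entry.timing.connect
puts entry.respond_to?(:timing)
```

Keys that share a name with an existing Hash method (size, keys, count and so on) never reach method_missing, so those still need the bracket syntax.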
Enhancing Streamlined Enumerations
Recently, I’ve been looking at the Streamlined framework. For those of you who don’t know, Streamlined is an Ajaxified Scaffold currently under development. The edge version shows promise and is stable enough for my personal use as an administration tool.
One area which is particularly interesting is the way that they handle enumerations and the fact that they are called late in the process rather than being instantiated once and then used. This may appear as an inefficiency at first glance, but in tracing through the call progress, I realized that you could make them more dynamic and allow for dynamic changes to the enumeration on a per item basis.
This means that if you have an exclusive list, you can restrict the choices to only those items that haven’t yet been assigned to other rows in the database.
For example, in one of my projects, you can assign a unique number to each row, and my desire was to restrict the view so that only the numbers that are available can be chosen.
So if you have possible numbers of
[1,2,3,4,5,6,7,8]
Assign 1 to the first row and for new items, [2,3,4,5,6,7,8] should be available, but [1,2,3,4,5,6,7,8] would be available for editing the first row.
Assign 5 to the second row and for new items, [2,3,4,6,7,8] should be available with [1,2,3,4,6,7,8] available for editing the first row and [2,3,4,5,6,7,8] available for editing the second row.
Coding this for the model is fairly straightforward:
class DynamicTest < ActiveRecord::Base
  def available_nodes
    node_list=[1,2,3,4,5,6,7,8]
    nodes=DynamicTest.find(:all)
    nodes.each do |n|
      node_list-=[n.number] unless n.number==number
    end
    node_list
  end
end
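The rule behind this method can be checked without a database. Here is a small sketch using plain arrays, where available_numbers and ALL_NUMBERS are illustrative names of my own and the list of assigned numbers stands in for the rows that DynamicTest.find(:all) would return.

```ruby
ALL_NUMBERS = [1, 2, 3, 4, 5, 6, 7, 8].freeze

# Hypothetical helper: numbers that may be offered for a row.
# assigned - numbers already used by rows in the table
# current  - number held by the row being edited (nil for a new row)
def available_numbers(assigned, current = nil)
  taken_by_others = assigned - [current]
  ALL_NUMBERS - taken_by_others
end

# Rows one and two hold 1 and 5, as in the worked example above.
p available_numbers([1, 5])     # choices for a new row
p available_numbers([1, 5], 1)  # choices when editing the first row
p available_numbers([1, 5], 5)  # choices when editing the second row
```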
The unfortunate thing is that Streamlined doesn’t support this call. You could call a class-level DynamicTest.available_nodes instead, but that wouldn’t know what the current item is, so you wouldn’t be able to see the item’s own value in the list or edit views. Not very useful. What is needed is a way to call this directly from the row rendering code when you have the item in scope.
Since this is new functionality for Streamlined, the guys who maintain the codebase may adopt it, but for those of you who want to monkeypatch your own version or just see my take on it, you can download this sample project.
The monkeypatch (in app/streamlined/dynamic_tests_ui.rb) overrides four of the streamlined functions and adds two more for handling dynamic enumerations. This means that in addition to the original
Streamlined.ui_for(DynamicTest) do
  user_columns :number, {:enumeration => Numbers::TYPES}
end
class Numbers
  TYPES = [1,2,3,4,5,6,7,8]
end
and its Hash and 2d array counterparts, you can now have:
Streamlined.ui_for(DynamicTest) do
  user_columns :number, {:enumeration => {:action=>:available_nodes}}
end
which will perform a late call to the DynamicTest#available_nodes scoped for the current row.
For those of you who just want to look at the code without downloading a full rails project, the relevant monkeypatched pieces are:
#Note: There is a bug in _enumeration.html that prevents non-Fixnum numeric
#indices. This should be updated in the template version
#<% value = item.send(relationship.name) -%>
#<% key_value_pair = relationship.enumeration_key_for(value) -%>
#<%= key_value_pair ? key_value_pair.first : relationship.unassigned_value %>
module Streamlined::Controller::EnumerationMethods
  def dynamic_enumeration
    dynamic_enumeration_method=nil
    @enumeration_name=params[:enumeration]
    rel_type=model_ui.scalars[@enumeration_name.to_sym]
    rel_type.enumeration.each { |k,v|
      dynamic_enumeration_method=v if k==:action
    }
    dynamic_enumeration_method.nil? ? rel_type.enumeration : instance.send(dynamic_enumeration_method).to_2d_array
  end
  # Shows the enumeration's configured +Edit+ view, as defined in streamlined_ui
  # and Streamlined::Column.
  def edit_enumeration
    self.instance = model.find(params[:id])
    @enumeration_name = params[:enumeration]
    rel_type = model_ui.scalars[@enumeration_name.to_sym]
    @all_items=dynamic_enumeration
    @selected_item = instance.send(@enumeration_name)
    render(:partial => rel_type.edit_view.partial, :locals => {:item => instance, :relationship => rel_type})
  end
  # Shows the enumeration's configured +Show+ view,
  # as defined in streamlined_ui and Streamlined::Column.
  def show_enumeration
    self.instance = model.find(params[:id])
    rel_type = model_ui.scalars[params[:enumeration].to_sym]
    rel_type.enumeration=dynamic_enumeration
    render(:partial => rel_type.show_view.partial, :locals => {:item => instance, :relationship => rel_type})
  end
end
class Streamlined::Column::ActiveRecord < Streamlined::Column::Base
  def dynamic_enumeration(item)
    dynamic_enumeration_method=nil
    @enumeration.each { |k,v|
      dynamic_enumeration_method=v if k==:action
    }
    dynamic_enumeration_method.nil? ? @enumeration : item.send(dynamic_enumeration_method)
  end
  def render_td_show(view, item)
    if enumeration
      content = item.send(self.name)
      @enumeration=dynamic_enumeration(item)
      key_value_pair = enumeration_key_for(content) # call wraps enumeration to 2d array, so check unnecessary
      content = key_value_pair.first if key_value_pair
      content = content && !content.blank? ? content : self.unassigned_value
      content = wrap_with_link(content, view, item)
    else
      render_content(view, item)
    end
  end
  def render_enumeration_select(view, item)
    id = relationship_div_id(name, item)
    @enumeration=dynamic_enumeration(item)
    choices = enumeration #enumeration call wraps to 2d array so extra call is redundant
    choices.unshift(unassigned_option) if column_can_be_unassigned?(parent_model, name.to_sym)
    args = [model_underscore, name, choices]
    args << {} << html_options unless html_options.empty?
    view.select(*args)
  end
end
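Stripped of the Streamlined plumbing, the core of the patch is a late dispatch: if the configured enumeration is a Hash with an :action key, call that method on the current row, otherwise use the enumeration as given. A standalone sketch of just that decision follows. Both resolve_enumeration and the Row struct are invented for illustration and are not part of Streamlined.

```ruby
# Hypothetical helper: pick the choices for one row.
# enumeration - a static list, or a Hash such as {:action => :available_nodes}
# item        - the row currently being rendered
def resolve_enumeration(enumeration, item)
  if enumeration.is_a?(Hash) && enumeration.key?(:action)
    item.public_send(enumeration[:action])
  else
    enumeration
  end
end

# Invented stand-in for a model row with a per-row list of choices.
Row = Struct.new(:number) do
  def available_nodes
    [number, 7, 8]
  end
end

p resolve_enumeration([1, 2, 3], Row.new(3))
p resolve_enumeration({ :action => :available_nodes }, Row.new(3))
```

Because the method is looked up at render time, each row can offer a different list, which is what the exclusive-number example relies on.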
Microsoft Surface Parody
Hot on the heels of the Microsoft Surface announcement, and the statement by Tim Berners-Lee that various devices and integrated software packages will be talking to each other seamlessly over the internet, and that everything (electronic) will be more tightly integrated in the future (Does anyone want to say Web 4.0 before they are shot down for confusing the internet with the web?), came this spoof voiceover of the advertising that Microsoft has produced to try to show that they are still innovating in hardware.
Complete Floors
I have another website on my portfolio. You can check out their work (and mine) at http://completefloorsltd.co.uk
Tree Surgery
New website development. Not in Ruby this time, but they are happy with the results. Check it out at http://butlerandbrown.com.
Realplayer streaming BBC to mp3 files
The BBC listen again facility allows you to play back audio broadcasts up to seven days after they originally air. That’s fine, unless you listen to most of your radio in the car, or away from your computer.
OK. So you can listen live, as long as you are in the UK, but sometimes I would like to listen to the last 3 episodes of Perelandra, and possibly find out what’s going on in Lionel Nimrod’s Inexplicable World, while on a plane, or driving around Seattle.
This is where Mplayer, and the title of this blog entry come in.
The basic premise is to use mplayer to stream a programme to the hard drive in PCM format (wav), then convert it from wav to mp3 and write it out as an mp3 file.
The following snippet from a larger script demonstrates the basic principle.
#Input: $1 url
# $2 name of file to record to (excluding extension)
#
mplayer -prefer-ipv4 -bandwidth 99999999 -vc null -vo null -ao pcm:fast:file="$2.wav" "$1"
lame "$2.wav" "$2.mp3"
rm "$2.wav"
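As a starting point for automating this from Ruby, the same two steps can be built as argument lists and run without a shell. This is a rough sketch only: capture_commands and capture are hypothetical names, the mplayer and lame options are the ones from the snippet above with the two pcm suboptions joined into a single -ao argument (an assumption on my part), and nothing runs unless the script is given a URL and a base filename.

```ruby
# Hypothetical helper: build the mplayer and lame invocations as argument
# lists, so that URLs and filenames are never interpreted by a shell.
def capture_commands(url, basename)
  wav = "#{basename}.wav"
  mp3 = "#{basename}.mp3"
  [
    ['mplayer', '-prefer-ipv4', '-bandwidth', '99999999',
     '-vc', 'null', '-vo', 'null', '-ao', "pcm:fast:file=#{wav}", url],
    ['lame', wav, mp3]
  ]
end

# Hypothetical helper: run both steps, then remove the intermediate wav.
def capture(url, basename)
  capture_commands(url, basename).each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.first}"
  end
  File.delete("#{basename}.wav")
end

# Only runs when invoked directly with a URL and a base filename.
capture(ARGV[0], ARGV[1]) if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
```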
With a high bandwidth, it takes roughly a minute to download and encode a programme. Naming or renaming the files is pretty tedious, however, so I started to look at programme listings, such as bleb and the BBC’s own backstage listings, in order to automate the process.