XML to YML conversion

Posted by blackrat on October 13, 2009

Having switched from XML to YML as a data language for most of my code, I recently had a need to process some old legacy data (a chrononauts game) which was still in XML format. Rather than using this data as it was, I wanted to see how tricky it would be to convert it to a simple YML format structure. The original code was of the form:

<chrononauts>
  <missions>
    <name>Mona Lisa Triptych</name>
    <artifact>Mona Lisa (The Real Thing)</artifact>
    <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    <artifact>Mona Lisa (An Obvious Forgery)</artifact>
  </missions>
  <ids>
    <name>Squa Tront</name>
    <year>1933</year>
    <year>1950'</year>
    <year>1962'</year>
  </ids>
</chrononauts>

and I wanted it to be more like

---
chrononauts:
  missions:
    - artifact:
      - Mona Lisa (The Real Thing)
      - Mona Lisa (An Excellent Forgery)
      - Mona Lisa (An Obvious Forgery)
      name: Mona Lisa Triptych
  ids:
    - name: Squa Tront
      year:
      - "1933"
      - 1950'
      - 1962'

One thing in my favour was that I didn’t have any attributes to process, only data inside tags, so I figured it would be very straightforward to do using REXML and YAML. I also wanted to have hashes, arrays and simple strings where appropriate from the source data. Sprinkle in a little recursion, and the result is a pretty simple XML to YML converter. There are obvious areas for improvement, but this has worked with the data sets I have used so far, so I haven’t had any need to modify it.

#!/usr/bin/env ruby
class CardsXmlYaml
  require 'rubygems'
  require 'yaml'
  require 'rexml/document'
  YMLFile='cards.yml'
  XMLFile='cards.xml'

  def self.xml_process(root)
    head={}
    begin
      key=root.expanded_name
      root.children.each do |el|
        value=xml_process(el)
        if value.is_a?(String)
          next if value.gsub(/[\n\t\s]/,'').empty?
        end
        if head[key].nil?
          head[key]=value
        else
          if head[key].keys.include?(value.keys[0])
            old_value=head[key][value.keys[0]]
            head[key][value.keys[0]]=[]
            head[key][value.keys[0]]<<old_value
            head[key][value.keys[0]]<<value.values[0]
            head[key][value.keys[0]].flatten!
          else
            head[key][value.keys[0]]=value.values[0]
          end
        end
      end
    rescue
      begin
        return root.value
      rescue
      end
      return nil
    end
    head
  end

  def self.to_yml_file(infile=XMLFile,outfile=YMLFile,conversion_type=:file)
    output=File.new(outfile,'w')
    output.puts(YAML.dump(self.to_yml(infile,conversion_type)))
    output.close
  end

  def self.to_yml(doc=XMLFile,conversion_type=:file)
    begin
      doc=REXML::Document.new(File.new(doc)) if conversion_type==:file
    rescue Exception=>e
      print("Error reading xml data from file.\n")
      doc=nil
    end
    return nil if doc.nil?
    base=[]
    doc.children.each do |el|
      base << xml_process(el)
    end
    base
  end
end
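
For comparison, here is a minimal sketch of the same recursive idea that works on a string rather than a file. It is not the converter above: element_to_data is a hypothetical helper name, it assumes there are no attributes and no mixed content, and it nests hashes directly, so the YAML it produces is shaped slightly differently from the output shown earlier.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical helper (not the converter above): turns an element into a
# String when it has no child elements, otherwise into a Hash keyed by
# child tag name. Assumes no attributes and no mixed content.
def element_to_data(element)
  children = element.elements.to_a
  return element.text.to_s.strip if children.empty?

  children.each_with_object({}) do |child, hash|
    value = element_to_data(child)
    name = child.name
    if hash.key?(name)
      # Repeated tag: collect the values into an array.
      hash[name] = [hash[name]] unless hash[name].is_a?(Array)
      hash[name] << value
    else
      hash[name] = value
    end
  end
end

xml = <<~XML
  <chrononauts>
    <ids>
      <name>Squa Tront</name>
      <year>1933</year>
      <year>1950'</year>
    </ids>
  </chrononauts>
XML

root = REXML::Document.new(xml).root
data = { root.name => element_to_data(root) }
puts YAML.dump(data)
```

Repeated tags become arrays and single tags stay as plain strings, which is the behaviour I was after in the first place.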

Recovering from Subversion checksum error corruption

Posted by blackrat on October 09, 2009

I still use subversion for most of my projects, despite the fact that most of the (Ruby) world seems to be moving to git as its main repository. I’m still at the “I’ll try it” stage (see Softies on Rails for the 4 stages of Rubyist experimentation), and will be moving over once I’ve tested that I can use Capistrano, CruiseControl, etc. with it, and know enough to support my developers/testers if and when things get sticky.
This is a log of my experiments trying to recover the files from a broken subversion repository. You may lose some data if you use this technique, and if you do, don’t blame me, but it let me recover the files so that I could rebuild another repository as a fresh start. Note that all of the history will be lost, but this may help if you need the files from a repository that you can’t get to because of a corruption issue.
My corrupted repository barfs at revision 51, which is a 221MB checkin containing zip files and other binaries. Running svnadmin verify gives:

svnadmin verify /svn

Output

* Verified revision 0.
.
.
* Verified revision 50.
svnadmin: Checksum mismatch while reading representation:
   expected:  589cf19ceac143315e0d61f9873ed7fb
     actual:  b0b0bf50ec4b0b089730796f3355c649

This means that something has gone wrong in revision 51. As the revision history of this isn’t that important, since this was a scratchpad project and most of the notes are of the “trying this”, “seeing what happens when” type, I wanted to see if I could pull all of the content back from it.
I’d had a look at fsfsverify.py, but no dice. Even running it several times didn’t fix the problem with the broken revision, and doing an svn co broke in the iconex directory. However, running

svn co file:///svn/trunk /tmp/trunk

did check out a large proportion of the files, up to the part where the file in the iconex directory was corrupted.
So I started to think. What would happen if I took a copy of the corrupted directory, did an svn del on the original broken directory, and checked that change in? Would that let me access the files in later revisions? I knew that none of the files in that directory had changed since revision 51, since it was purely a zip file checkin as a backup to the originals.
So

cp -r iconex iconex_backup
svn del iconex
svn ci -m 'removed broken files'
svn up

A brief wait, and the rest of the files started to check out. There were a couple of externals that needed to be changed using svn propedit svn:externals, since the connection to these files was no longer available as this repository wasn’t being served. Another svn up command, and all of the files were successfully checked out.
Hopefully this will give someone else a starting point if all attempts at recovery of the database fail, and at least you will get your non-corrupted files back.

Acts as state machine with legacy database - (rubyist aasm)

Posted by blackrat on October 09, 2009

One of the (many!) projects I’ve been working on required a ruby interface to a legacy database which contains existing state information in numeric form. I’ve been using aasm for other projects, and since the state is well defined, this seemed like a good opportunity to see if aasm would handle legacy data. I’ve already started to use ActiveRecord with it (this isn’t a rails app, but pure ruby), so I thought I’d have a quick spike to see what would happen.

A quick look at the source for aasm revealed that it only uses strings for state information in the database, so a translation layer was required.

Since the column name containing the state information wasn’t state, I formulated a slightly cunning plan. Tell aasm that the column was called state, and create state and state= methods that would perform the database translation.


class TestStatus < ActiveRecord::Base
  aasm_column :state  #required to force aasm to call my state method rather than use its own internal column definition
  aasm_state :ok
  aasm_state :fail

  aasm_event :fail do
    transitions :to => :fail, :from=>[:ok]
  end

  aasm_event :ok do
     transitions :to => :ok, :from=>[:fail]
  end

  LEGACY_STATE={
    0=>:ok,
    1=>:fail
  }
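  LEGACY_STATE_COLUMN='ErrorState'

  # Translate the numeric legacy value into the symbol aasm expects
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # Translate the aasm state back into the numeric legacy value
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN,LEGACY_STATE.invert[value])
  end
end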
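
To show the translation layer on its own, here is a minimal sketch that runs without ActiveRecord or aasm. LegacyRecord is a made-up stand-in class, and its read_attribute and write_attribute only imitate the ActiveRecord methods of the same name. It also accepts the state as a string, on the assumption that aasm may hand one over.

```ruby
# Plain-Ruby stand-in for the model (hypothetical, for illustration only).
class LegacyRecord
  LEGACY_STATE = { 0 => :ok, 1 => :fail }.freeze
  LEGACY_STATE_COLUMN = 'ErrorState'

  def initialize(attributes)
    @attributes = attributes
  end

  # Imitates ActiveRecord's read_attribute with a simple hash lookup.
  def read_attribute(name)
    @attributes[name]
  end

  # Imitates ActiveRecord's write_attribute with a simple hash store.
  def write_attribute(name, value)
    @attributes[name] = value
  end

  # Numeric database value -> symbolic state.
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # Symbolic (or string) state -> numeric database value.
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN, LEGACY_STATE.key(value.to_sym))
  end
end

record = LegacyRecord.new('ErrorState' => 1)
p record.state                         # :fail
record.state = 'ok'
p record.read_attribute('ErrorState')  # 0
```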

Executing the Unix file command to determine real file types from ruby

Posted by blackrat on July 20, 2009

I recently needed to make sure all my files were named according to their content rather than to an arbitrary extension that had been added to them. This resulted in extending the ruby FileUtils to use the Unix file command to return the filename and type as array elements. With a script argument of:

/var/testfiles/*

you get an output of:

[["/var/testfiles/movie_quiz_for_wiggy.doc", "Microsoft Office Document"], ["/var/testfiles/movie_quiz_for_wiggy.odt", "OpenDocument Text"], ["/var/testfiles/movie_quiz_for_wiggy.pdf", "PDF document, version 1.4"]]


#!/usr/bin/env ruby
require 'fileutils'

module FileUtils
  unless RUBY_PLATFORM=~/win[36]/
    def self.file(src)
      `file #{src}`.split("\n").collect {|x| x.split(":",2).collect {|y| y.strip}}
    end
  end
end

p(FileUtils.file(ARGV[0]))
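
A variation on the same idea, as a rough sketch rather than a replacement: the parsing is split out so it can be tried on canned output, and the glob is expanded in Ruby so the paths never pass through a shell. The names parse_file_output and file_types are mine, and the sketch assumes a file binary on the PATH and filenames that contain no colon.

```ruby
# Turns the text printed by `file` into [name, type] pairs.
# Assumes filenames contain no ':' since that is the separator.
def parse_file_output(output)
  output.split("\n").map { |line| line.split(':', 2).map(&:strip) }
end

# Hypothetical wrapper: expands the glob in Ruby and passes each path as a
# separate argument, so no shell interpolation is involved.
def file_types(pattern)
  paths = Dir.glob(pattern)
  return [] if paths.empty?

  parse_file_output(IO.popen(['file', '--', *paths], &:read))
end

sample = "/var/testfiles/a.doc: Microsoft Office Document\n" \
         "/var/testfiles/b.pdf: PDF document, version 1.4\n"
p parse_file_output(sample)
```

Calling file_types('/var/testfiles/*') should then give the same kind of array as the script above.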

Automatically creating . for Ruby Hashes

Posted by blackrat on April 21, 2009

I recently had to do some testing of an in-memory OLE object, which also allowed persistence to an XML file. The structures of the two (in-memory and in-file) were similar enough for me to look at XmlSimple, which creates a Hash, and since I only like writing code once, I figured that using the same code for both would be cool.

In memory I needed to do
entry.timing.connect

and with the file (via entry=XmlSimple.xml_in(infile))
entry[:timing][:connect]

Those were similar enough for me to want to change how XmlSimple held its Hash internally, but I figured that there may be a more generic way to do this without breaking Hash. That led me to try out the following code

  class Hash
    def method_missing(sym,*args,&blk)
     return self[sym] if self.key?(sym)
     return self[sym.to_s] if self.key?(sym.to_s)
     super
    end
  end

which just appears to work. Enjoy.
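
Here is a small usage sketch with made-up data. The respond_to_missing? method is my addition, not part of the snippet above. One thing to watch for is that keys which share a name with a real Hash method, such as size, never reach method_missing.

```ruby
class Hash
  def method_missing(sym, *args, &blk)
    return self[sym] if key?(sym)
    return self[sym.to_s] if key?(sym.to_s)
    super
  end

  # Addition for this sketch: keeps respond_to? consistent with the keys
  # that method_missing answers.
  def respond_to_missing?(sym, include_private = false)
    key?(sym) || key?(sym.to_s) || super
  end
end

entry = { 'timing' => { 'connect' => 5 }, :size => 99 }

p entry.timing.connect  # 5, found through the string keys
p entry.size            # 2, the built-in Hash#size wins over the :size key
p entry[:size]          # 99, clashing keys still need the bracket form
```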

Enhancing Streamlined Enumerations

Posted by blackrat on March 26, 2008

Recently, I’ve been looking at the Streamlined framework. For those of you who don’t know, Streamlined is an Ajaxified Scaffold currently under development. The edge version shows promise and is stable enough for my personal use as an administration tool.
One area which is particularly interesting is the way that they handle enumerations and the fact that they are called late in the process rather than being instantiated once and then used. This may appear as an inefficiency at first glance, but in tracing through the call progress, I realized that you could make them more dynamic and allow for dynamic changes to the enumeration on a per item basis.
This means that if you have an exclusive list, you can restrict the choices to only those items that haven’t yet been assigned to other rows in the database.
For example, in one of my projects, you can assign a unique number to each row, and my desire was to restrict the view so that only the numbers that are available can be chosen.
So if you have possible numbers of
[1,2,3,4,5,6,7,8]

Assign 1 to the first row and for new items, [2,3,4,5,6,7,8] should be available, but [1,2,3,4,5,6,7,8] would be available for editing the first row.

Assign 5 to the second row and for new items, [2,3,4,6,7,8] should be available with [1,2,3,4,6,7,8] available for editing the first row and [2,3,4,5,6,7,8] available for editing the second row.

Coding this for the model is fairly straightforward:

class DynamicTest < ActiveRecord::Base
  def available_nodes
    node_list=[1,2,3,4,5,6,7,8]
    nodes=DynamicTest.find(:all)
    nodes.each do |n|
      node_list-=[n.number] unless n.number==number
    end
    node_list
  end
end
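
The list arithmetic can be checked without a database. This is a plain-Ruby sketch of the same rule, with the assigned numbers passed in as an array instead of being read from DynamicTest rows:

```ruby
ALL_NODES = (1..8).to_a

# assigned: numbers already taken by rows in the table.
# current:  the number held by the row being edited (nil for a new row).
def available_nodes(assigned, current = nil)
  ALL_NODES - (assigned - [current])
end

assigned = [1, 5]
p available_nodes(assigned)     # new item:        [2, 3, 4, 6, 7, 8]
p available_nodes(assigned, 1)  # editing row one: [1, 2, 3, 4, 6, 7, 8]
p available_nodes(assigned, 5)  # editing row two: [2, 3, 4, 5, 6, 7, 8]
```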

The unfortunate thing is that Streamlined doesn’t support this call. You can perform a call to DynamicTest.available_nodes, but that wouldn’t let you know what the current item is, and you wouldn’t be able to see it in the list or edit views. Not very useful. What is needed is a way to call this directly from the row rendering code when you have the item in scope.
Since this is new functionality for Streamlined, the guys who maintain the codebase may adopt it, but for those of you who want to monkeypatch your own version or just see my take on it, you can download this sample project.

The monkeypatch (in app/streamlined/dynamic_tests_ui.rb) overrides four of the streamlined functions and adds two more for handling dynamic enumerations. This means that in addition to the original

Streamlined.ui_for(DynamicTest) do
  user_columns :number, {:enumeration => Numbers::TYPES}
end

class Numbers
  TYPES = [1,2,3,4,5,6,7,8]
end

and its Hash and 2d array counterparts, you can now have:

Streamlined.ui_for(DynamicTest) do
  user_columns :number, {:enumeration => {:action=>:available_nodes}}
end

which will perform a late call to the DynamicTest#available_nodes scoped for the current row.
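
Stripped of the Streamlined plumbing, the late call boils down to something like the sketch below. Row and resolve_enumeration are hypothetical names used only for illustration, and nothing here touches Streamlined itself.

```ruby
# Stand-in for a model row with a per-row list of choices.
Row = Struct.new(:number) do
  def available_nodes
    [number, 7, 8].compact
  end
end

# A static enumeration is returned as-is; a Hash carrying :action is
# resolved by calling the named method on the current row.
def resolve_enumeration(enumeration, item)
  if enumeration.is_a?(Hash) && enumeration.key?(:action)
    item.public_send(enumeration[:action])
  else
    enumeration
  end
end

row = Row.new(2)
p resolve_enumeration([1, 2, 3], row)                        # [1, 2, 3]
p resolve_enumeration({ :action => :available_nodes }, row)  # [2, 7, 8]
```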

For those of you who just want to look at the code without downloading a full rails project, the relevant monkeypatched pieces are:


#Note: There is a bug in _enumeration.html that prevents non-Fixnum numeric
#indices. This should be updated in the template version
#<% value = item.send(relationship.name) -%>
#<% key_value_pair = relationship.enumeration_key_for(value) -%>
#<%= key_value_pair ? key_value_pair.first : relationship.unassigned_value %>

module Streamlined::Controller::EnumerationMethods
  def dynamic_enumeration
    dynamic_enumeration_method=nil
    @enumeration_name=params[:enumeration]
    rel_type=model_ui.scalars[@enumeration_name.to_sym]
    rel_type.enumeration.each { |k,v|
      dynamic_enumeration_method=v if k==:action
    }
    dynamic_enumeration_method.nil? ? rel_type.enumeration : instance.send(dynamic_enumeration_method).to_2d_array
  end

  # Shows the enumeration’s configured +Edit+ view, as defined in streamlined_ui
  # and Streamlined::Column.
  def edit_enumeration
    self.instance = model.find(params[:id])
    @enumeration_name = params[:enumeration]
    rel_type = model_ui.scalars[@enumeration_name.to_sym]
    @all_items=dynamic_enumeration
    @selected_item = instance.send(@enumeration_name)
    render(:partial => rel_type.edit_view.partial, :locals => {:item => instance, :relationship => rel_type})
  end

  # Shows the enumeration’s configured +Show+ view,
  # as defined in streamlined_ui and Streamlined::Column.
  def show_enumeration
    self.instance = model.find(params[:id])
    rel_type = model_ui.scalars[params[:enumeration].to_sym]
    rel_type.enumeration=dynamic_enumeration
    render(:partial => rel_type.show_view.partial, :locals => {:item => instance, :relationship => rel_type})
  end
end

class Streamlined::Column::ActiveRecord < Streamlined::Column::Base
  def dynamic_enumeration(item)
    dynamic_enumeration_method=nil
    @enumeration.each { |k,v|
      dynamic_enumeration_method=v if k==:action
    }
    dynamic_enumeration_method.nil? ? @enumeration : item.send(dynamic_enumeration_method)
  end

  def render_td_show(view, item)
    if enumeration
      content = item.send(self.name)
      @enumeration=dynamic_enumeration(item)
      key_value_pair = enumeration_key_for(content) # call wraps enumeration to 2d array, so check unnecessary
      content = key_value_pair.first if key_value_pair
      content = content && !content.blank? ? content : self.unassigned_value
      content = wrap_with_link(content, view, item)
    else
      render_content(view, item)
    end
  end

  def render_enumeration_select(view, item)
    id = relationship_div_id(name, item)
    @enumeration=dynamic_enumeration(item)
    choices = enumeration  #enumeration call wraps to 2d array so extra call is redundant
    choices.unshift(unassigned_option) if column_can_be_unassigned?(parent_model, name.to_sym)
    args = [model_underscore, name, choices]
    args << {} << html_options unless html_options.empty?
    view.select(*args)
  end
end

Microsoft Surface Parody

Posted by blackrat on March 17, 2008

Hot on the heels of the Microsoft Surface announcement, and the statement by Tim Berners-Lee that various devices and integrated software packages will be talking to each other seamlessly over the internet, and that everything (electronic) will be more tightly integrated in the future (Does anyone want to say Web 4.0 before they are shot down for confusing the internet with the web?), came this spoof voiceover of the advertising that Microsoft has produced to try to show that they are still innovating in hardware.

Complete Floors

Posted by blackrat on August 20, 2007

I have another website on my portfolio. You can check out their work (and mine) at http://completefloorsltd.co.uk

Tree Surgery

Posted by blackrat on August 20, 2007

New website development. Not in Ruby this time, but they are happy with the results. Check it out at http://butlerandbrown.com.

Realplayer streaming BBC to mp3 files

Posted by blackrat on August 02, 2007

The BBC listen again facility allows you to play back audio broadcasts up to seven days after they originally air. That’s fine, unless you listen to most of your radio in the car, or away from your computer.

OK. So you can listen live, as long as you are in the UK, but sometimes I would like to listen to the last 3 episodes of Perelandra, and possibly find out what’s going on in Lionel Nimrod’s Inexplicable World, while on a plane, or driving around Seattle.

This is where Mplayer, and the title of this blog entry come in.

The basic premise is to use mplayer to stream an entry to the hard drive in PCM format (wav), then convert it from wav to mp3 and drop it into an mp3 file.

The following snippet from a larger script demonstrates the basic principle.


# Input: $1 url
#        $2 name of file to record to (excluding extension)
#
# Arguments are quoted so that URLs and names with special characters survive
mplayer -prefer-ipv4 -bandwidth 99999999 -vc null -vo null -ao pcm:fast -ao pcm:file="$2.wav" "$1"
lame "$2.wav" "$2.mp3"
rm "$2.wav"

With a high bandwidth, it takes roughly a minute to download and encode a programme. Naming or renaming the files is pretty tedious, however, so I started to look at programme listings, such as bleb and the BBC’s own backstage listings, in order to automate the process.