XML to YML conversion
Having switched from XML to YML as a data language for most of my code, I recently needed to process some old legacy data (a Chrononauts game) that was still in XML. Rather than use the data as it was, I wanted to see how tricky it would be to convert it to a simple YML structure. The original data was of the form:
<chrononauts>
  <missions>
    <name>Mona Lisa Triptych</name>
    <artifact>Mona Lisa (The Real Thing)</artifact>
    <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    <artifact>Mona Lisa (An Obvious Forgery)</artifact>
  </missions>
  <ids>
    <name>Squa Tront</name>
    <year>1933</year>
    <year>1950'</year>
    <year>1962'</year>
  </ids>
</chrononauts>
and I wanted it to be more like
---
chrononauts:
  missions:
  - artifact:
    - Mona Lisa (The Real Thing)
    - Mona Lisa (An Excellent Forgery)
    - Mona Lisa (An Obvious Forgery)
    name: Mona Lisa Triptych
  ids:
  - name: Squa Tront
    year:
    - "1933"
    - 1950'
    - 1962'
One thing in my favour was that I didn't have any attributes to process, only data inside tags, so I figured it would be straightforward to do with REXML and YAML. I also wanted hashes, arrays and simple strings where appropriate to the source data. Sprinkle in a little recursion, and the result is a pretty simple XML to YML converter. There are obvious areas for improvement, but it has worked with every data set I have used so far, so I haven't needed to modify it.
#!/usr/bin/env ruby
require 'rubygems'
require 'yaml'
require 'rexml/document'

class CardsXmlYaml
  YMLFile = 'cards.yml'
  XMLFile = 'cards.xml'

  # Recursively convert a node into {tag => value}. Text-only elements give
  # Strings, elements with children give Hashes, and repeated child tags are
  # collected into Arrays. Attributes are ignored.
  def self.xml_process(root)
    head = {}
    begin
      key = root.expanded_name # text nodes have no expanded_name, so they raise
      root.children.each do |el|
        value = xml_process(el)
        # skip the whitespace-only text between tags
        next if value.is_a?(String) && value.gsub(/[\n\t\s]/, '').empty?
        if head[key].nil?
          head[key] = value
        elsif head[key].keys.include?(value.keys[0])
          # repeated tag: turn the existing entry into an Array
          old_value = head[key][value.keys[0]]
          head[key][value.keys[0]] = [old_value, value.values[0]].flatten
        else
          head[key][value.keys[0]] = value.values[0]
        end
      end
    rescue
      # not an element (e.g. a text node): return its text, or nil
      begin
        return root.value
      rescue
      end
      return nil
    end
    head
  end

  # Convert infile and write the YAML to outfile
  def self.to_yml_file(infile = XMLFile, outfile = YMLFile, conversion_type = :file)
    File.open(outfile, 'w') do |output|
      output.puts(YAML.dump(to_yml(infile, conversion_type)))
    end
  end

  # With conversion_type :file, doc is a file name; otherwise doc is assumed
  # to be a REXML::Document already
  def self.to_yml(doc = XMLFile, conversion_type = :file)
    begin
      doc = REXML::Document.new(File.read(doc)) if conversion_type == :file
    rescue StandardError
      print("Error reading xml data from file.\n")
      doc = nil
    end
    return nil if doc.nil?
    base = []
    doc.children.each do |el|
      base << xml_process(el)
    end
    base
  end
end
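For comparison, here is a compact sketch of the same recursion working on an XML string instead of a file. It is not the original class: `element_to_value` and `xml_to_hash` are hypothetical names, it uses only the standard-library REXML and YAML, it ignores attributes just like the original, and it collects repeated tags into an Array.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical helper: text-only elements become Strings, elements with
# children become Hashes keyed by tag name, and repeated tags become Arrays.
# Attributes are ignored.
def element_to_value(element)
  children = element.elements.to_a
  return element.text.to_s.strip if children.empty?

  children.each_with_object({}) do |child, hash|
    value = element_to_value(child)
    if hash.key?(child.name)
      # second or later occurrence of this tag: switch to an Array
      hash[child.name] = [hash[child.name]] unless hash[child.name].is_a?(Array)
      hash[child.name] << value
    else
      hash[child.name] = value
    end
  end
end

# Hypothetical entry point: parse a string and wrap the root element's value.
def xml_to_hash(xml)
  root = REXML::Document.new(xml).root
  { root.name => element_to_value(root) }
end

xml = <<~XML
  <chrononauts>
    <missions>
      <name>Mona Lisa Triptych</name>
      <artifact>Mona Lisa (The Real Thing)</artifact>
      <artifact>Mona Lisa (An Excellent Forgery)</artifact>
    </missions>
  </chrononauts>
XML

puts YAML.dump(xml_to_hash(xml))
```

Walking `element.elements` rather than `children` sidesteps the whitespace text nodes, so no rescue is needed to tell text from elements.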
Acts as state machine with legacy database - (rubyist aasm)
One of the (many!) projects I've been working on required a Ruby interface to a legacy database that stores existing state information in numeric form. I've been using aasm for other projects, and since the states are well defined, this seemed like a good opportunity to see whether aasm would handle legacy data. I'd already started using ActiveRecord with it (this isn't a Rails app, but pure Ruby), so I thought I'd have a quick spike to see what would happen.
A look at the aasm source quickly revealed that it only stores state information as strings in the database, so a translation layer was required.
Since the column containing the state information wasn't called state, I formulated a slightly cunning plan: tell aasm that the column was called state, and create state and state= methods that perform the translation to and from the database.
class TestStatus < ActiveRecord::Base
  # Required to force aasm to call my state method rather than use its own
  # internal column definition
  aasm_column :state
  aasm_state :ok
  aasm_state :fail

  aasm_event :fail do
    transitions :to => :fail, :from => [:ok]
  end

  aasm_event :ok do
    transitions :to => :ok, :from => [:fail]
  end

  # Numeric codes stored in the legacy table, mapped to aasm state names
  LEGACY_STATE = {
    0 => :ok,
    1 => :fail
  }
  LEGACY_STATE_COLUMN = 'ErrorState'

  # Read the numeric code and return the matching state symbol
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # Store the numeric code for a state (accepts a Symbol or a String)
  def state=(value)
    write_attribute(LEGACY_STATE_COLUMN, LEGACY_STATE.invert[value.to_sym])
  end
end
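The translation layer itself doesn't depend on aasm or a database, so it can be tried in plain Ruby. In this sketch `LegacyRow` is a hypothetical stand-in for the ActiveRecord model: a Hash plays the database row, and its `read_attribute`/`write_attribute` only mimic ActiveRecord's accessors.

```ruby
# Hypothetical stand-in for an ActiveRecord model: a Hash plays the database
# row, and read_attribute/write_attribute mimic ActiveRecord's accessors.
class LegacyRow
  LEGACY_STATE = { 0 => :ok, 1 => :fail }.freeze
  LEGACY_STATE_COLUMN = 'ErrorState'

  def initialize(attributes = {})
    @attributes = attributes
  end

  def read_attribute(name)
    @attributes[name]
  end

  def write_attribute(name, value)
    @attributes[name] = value
  end

  # Numeric code from the legacy column -> state symbol (nil if unknown)
  def state
    LEGACY_STATE[read_attribute(LEGACY_STATE_COLUMN)]
  end

  # State symbol or string -> numeric code; unknown states are rejected
  def state=(value)
    code = LEGACY_STATE.key(value.to_sym)
    raise ArgumentError, "unknown state: #{value.inspect}" if code.nil?
    write_attribute(LEGACY_STATE_COLUMN, code)
  end
end

row = LegacyRow.new('ErrorState' => 0)
p row.state                          # => :ok
row.state = 'fail'
p row.read_attribute('ErrorState')   # => 1
```

`Hash#key` does the reverse lookup without building an inverted Hash on every write, and raising on an unknown state avoids silently writing nil into the legacy column.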
Executing the Unix file command to determine real file types from Ruby
I recently needed to make sure all my files were named according to their content rather than an arbitrary extension that had been added to them. The result was an extension to Ruby's FileUtils that uses the Unix file command to return each filename and its type as a two-element array, so with a script argument of (quoted, so that the wildcard is expanded when the script runs file rather than by the shell you launch it from):
'/var/testfiles/*'
you get an output of:
[["/var/testfiles/movie_quiz_for_wiggy.doc", "Microsoft Office Document"], ["/var/testfiles/movie_quiz_for_wiggy.odt", "OpenDocument Text"], ["/var/testfiles/movie_quiz_for_wiggy.pdf", "PDF document, version 1.4"]]
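#!/usr/bin/env ruby
require 'fileutils'

module FileUtils
  # The file command is Unix-only, so skip this on Windows builds of Ruby
  unless RUBY_PLATFORM =~ /win[36]/
    # Run `file` on src (a path or glob, expanded by the shell) and
    # return [[filename, type], ...]
    def self.file(src)
      `file #{src}`.split("\n").collect { |x| x.split(":", 2).collect { |y| y.strip } }
    end
  end
end

p(FileUtils.file(ARGV[0]))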
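Because the backtick version hands the argument to a shell, filenames containing spaces or quotes can break it. A sketch of an alternative, assuming the standard-library Open3 and a `file` binary on the PATH: Ruby expands the glob itself and passes the paths to `file` as separate arguments, with no shell involved. `parse_file_output` and `file_types` are hypothetical names, and, like the original, the split on the first colon will misread names that themselves contain a colon.

```ruby
require 'open3'

# Split `file` output ("name: description", one per line) into pairs.
def parse_file_output(text)
  text.each_line.map { |line| line.chomp.split(':', 2).map(&:strip) }
end

# Hypothetical helper: expand the glob in Ruby and run `file` without a shell,
# so names containing spaces or quotes reach it unchanged.
def file_types(pattern)
  paths = Dir.glob(pattern)
  return [] if paths.empty?

  output, status = Open3.capture2('file', *paths)
  raise "file exited with status #{status.exitstatus}" unless status.success?
  parse_file_output(output)
end

p file_types(ARGV[0]) if ARGV[0]
```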
Automatically creating . for Ruby Hashes
I recently had to do some testing of an in-memory OLE object, which also allowed persistence to an XML file. The in-memory and in-file structures were similar enough for me to look at XmlSimple, which creates a Hash, and since I only like writing code once, I figured that using the same code for both would be cool.
In memory I needed to do
entry.timing.connect
and with the file (via entry=XmlSimple.xml_in(infile))
entry[:timing][:connect]
Those were similar enough for me to want to change how XmlSimple held its Hash internally, but I figured that there may be a more generic way to do this without breaking Hash. That led me to try out the following code
class Hash
  # Look the missing method name up as a Symbol key, then as a String key
  def method_missing(sym, *args, &blk)
    return self[sym] if self.key?(sym)
    return self[sym.to_s] if self.key?(sym.to_s)
    super
  end
end
which just appears to work. Enjoy.
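A usage sketch of the same patch with one addition that is not in the original: `respond_to_missing?`, so that `respond_to?` agrees with what `method_missing` will accept. It also shows the main caveat of this approach: a key whose name matches an existing Hash method (count, keys, size and so on) is shadowed by that method.

```ruby
class Hash
  # The lookup from the post: try the name as a Symbol key, then as a String key
  def method_missing(sym, *args, &blk)
    return self[sym] if key?(sym)
    return self[sym.to_s] if key?(sym.to_s)
    super
  end

  # Addition: keep respond_to? consistent with method_missing
  def respond_to_missing?(sym, include_private = false)
    key?(sym) || key?(sym.to_s) || super
  end
end

entry = { 'timing' => { 'connect' => 0.25 } }
p entry.timing.connect          # => 0.25
p entry.respond_to?(:timing)    # => true
p({ 'count' => 3 }.count)       # => 1 (Hash#count wins over the 'count' key)
```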
Enhancing Streamlined Enumerations
Recently, I’ve been looking at the Streamlined framework. For those of you who don’t know, Streamlined is an Ajaxified Scaffold currently under development. The edge version shows promise and is stable enough for my personal use as an administration tool.
One particularly interesting area is the way they handle enumerations: they are evaluated late in the process rather than being instantiated once and reused. That may look inefficient at first glance, but in tracing through the call sequence, I realized that it lets you make enumerations dynamic, changing them on a per-item basis.
This means that if you have an exclusive list, you can restrict the choices to only those items that haven’t yet been assigned to other rows in the database.
For example, in one of my projects, you can assign a unique number to each row, and my desire was to restrict the view so that only the numbers that are available can be chosen.
So if you have possible numbers of
[1,2,3,4,5,6,7,8]
Assign 1 to the first row and for new items, [2,3,4,5,6,7,8] should be available, but [1,2,3,4,5,6,7,8] would be available for editing the first row.
Assign 5 to the second row and for new items, [2,3,4,6,7,8] should be available with [1,2,3,4,6,7,8] available for editing the first row and [2,3,4,5,6,7,8] available for editing the second row.
Coding this for the model is fairly straightforward:
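class DynamicTest < ActiveRecord::Base
  # Every number not already held by another row; this row's own number
  # stays in the list so it can still be selected when editing
  def available_nodes
    node_list = [1,2,3,4,5,6,7,8]
    nodes = DynamicTest.find(:all)
    nodes.each do |n|
      node_list -= [n.number] unless n.number == number
    end
    node_list
  end
end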
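The same rule can be checked without a database. In this sketch `available_numbers` is a hypothetical helper that reduces `available_nodes` to a single array difference: remove the numbers held by other rows, keeping back the current row's own number (nil for a new row). It reproduces the worked example above.

```ruby
ALL_NUMBERS = (1..8).to_a

# Hypothetical helper mirroring DynamicTest#available_nodes without a database:
# remove every number held by another row, but keep the row's own number
# (current is nil for a new row, so nothing is kept back).
def available_numbers(assigned, current = nil)
  ALL_NUMBERS - (assigned - [current])
end

assigned = [1, 5]                  # the first row holds 1, the second holds 5
p available_numbers(assigned)      # new row    => [2, 3, 4, 6, 7, 8]
p available_numbers(assigned, 1)   # first row  => [1, 2, 3, 4, 6, 7, 8]
p available_numbers(assigned, 5)   # second row => [2, 3, 4, 5, 6, 7, 8]
```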
Unfortunately, Streamlined doesn't support this call. You could expose a class-level DynamicTest.available_nodes, but that wouldn't know which item is current, and you wouldn't be able to see it in the list or edit views. Not very useful. What is needed is a way to call the method directly from the row rendering code, while the item is in scope.
Since this is new functionality for Streamlined, the guys who maintain the codebase may adopt it, but for those of you who want to monkeypatch your own version or just see my take on it, you can download this sample project.
The monkeypatch (in app/streamlined/dynamic_tests_ui.rb) overrides four of the Streamlined methods and adds two more for handling dynamic enumerations. This means that in addition to the original
# Numbers is defined first so that Numbers::TYPES exists when the block runs
class Numbers
  TYPES = [1,2,3,4,5,6,7,8]
end

Streamlined.ui_for(DynamicTest) do
  user_columns :number, {:enumeration => Numbers::TYPES}
end
and its Hash and 2d array counterparts, you can now have:
Streamlined.ui_for(DynamicTest) do
  user_columns :number, {:enumeration => {:action => :available_nodes}}
end
which will perform a late call to the DynamicTest#available_nodes scoped for the current row.
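The dispatch rule behind that setting can be sketched in plain Ruby. This is not Streamlined's API: `resolve_enumeration` and the `Row` struct are hypothetical. A Hash with an :action key names a method to call on the current row, and anything else is treated as a static list. The setting itself is never modified, so each row gets its own answer.

```ruby
# Hypothetical helper (not Streamlined's API): a Hash with an :action key
# names a method to call on the row; anything else is a static list.
def resolve_enumeration(setting, item)
  if setting.is_a?(Hash) && setting.key?(:action)
    item.public_send(setting[:action])
  else
    setting
  end
end

# Hypothetical row type standing in for a DynamicTest record
Row = Struct.new(:number, :taken) do
  def available_nodes
    (1..8).to_a - (taken - [number])
  end
end

static  = [1, 2, 3, 4, 5, 6, 7, 8]
dynamic = { :action => :available_nodes }
row = Row.new(5, [1, 5])
p resolve_enumeration(static, row)    # => [1, 2, 3, 4, 5, 6, 7, 8]
p resolve_enumeration(dynamic, row)   # => [2, 3, 4, 5, 6, 7, 8]
```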
For those of you who just want to look at the code without downloading a full rails project, the relevant monkeypatched pieces are:
# Note: There is a bug in _enumeration.html that prevents non-Fixnum numeric
# indices. The template version should be updated to:
# <% value = item.send(relationship.name) -%>
# <% key_value_pair = relationship.enumeration_key_for(value) -%>
# <%= key_value_pair ? key_value_pair.first : relationship.unassigned_value %>
module Streamlined::Controller::EnumerationMethods
  # The static enumeration, or the result of calling the method named by
  # {:action => ...} on the current instance
  def dynamic_enumeration
    dynamic_enumeration_method = nil
    @enumeration_name = params[:enumeration]
    rel_type = model_ui.scalars[@enumeration_name.to_sym]
    rel_type.enumeration.each { |k, v|
      dynamic_enumeration_method = v if k == :action
    }
    dynamic_enumeration_method.nil? ? rel_type.enumeration : instance.send(dynamic_enumeration_method).to_2d_array
  end

  # Shows the enumeration's configured +Edit+ view, as defined in streamlined_ui
  # and Streamlined::Column.
  def edit_enumeration
    self.instance = model.find(params[:id])
    @enumeration_name = params[:enumeration]
    rel_type = model_ui.scalars[@enumeration_name.to_sym]
    @all_items = dynamic_enumeration
    @selected_item = instance.send(@enumeration_name)
    render(:partial => rel_type.edit_view.partial, :locals => {:item => instance, :relationship => rel_type})
  end

  # Shows the enumeration's configured +Show+ view,
  # as defined in streamlined_ui and Streamlined::Column.
  def show_enumeration
    self.instance = model.find(params[:id])
    rel_type = model_ui.scalars[params[:enumeration].to_sym]
    config = rel_type.enumeration
    # Swap in this item's list for rendering, then restore the setting so
    # later requests still resolve {:action => ...} dynamically
    rel_type.enumeration = dynamic_enumeration
    begin
      render(:partial => rel_type.show_view.partial, :locals => {:item => instance, :relationship => rel_type})
    ensure
      rel_type.enumeration = config
    end
  end
end

class Streamlined::Column::ActiveRecord < Streamlined::Column::Base
  # Column-side version: call the {:action => ...} method on item, or return
  # the static enumeration unchanged
  def dynamic_enumeration(item)
    dynamic_enumeration_method = nil
    @enumeration.each { |k, v|
      dynamic_enumeration_method = v if k == :action
    }
    dynamic_enumeration_method.nil? ? @enumeration : item.send(dynamic_enumeration_method)
  end

  def render_td_show(view, item)
    if enumeration
      content = item.send(self.name)
      config = @enumeration
      @enumeration = dynamic_enumeration(item)
      begin
        key_value_pair = enumeration_key_for(content) # call wraps enumeration to 2d array, so check unnecessary
      ensure
        @enumeration = config # restore the setting for the next row
      end
      content = key_value_pair.first if key_value_pair
      content = content && !content.blank? ? content : self.unassigned_value
      content = wrap_with_link(content, view, item)
    else
      render_content(view, item)
    end
  end

  def render_enumeration_select(view, item)
    id = relationship_div_id(name, item)
    config = @enumeration
    @enumeration = dynamic_enumeration(item)
    begin
      choices = enumeration # enumeration call wraps to 2d array so extra call is redundant
    ensure
      @enumeration = config # restore the setting for the next row
    end
    choices.unshift(unassigned_option) if column_can_be_unassigned?(parent_model, name.to_sym)
    args = [model_underscore, name, choices]
    args << {} << html_options unless html_options.empty?
    view.select(*args)
  end
end
Rails IDE - Komodo 4.1
I'm a great believer in free software; most of my systems run on Apache, MySQL, Linux and Ruby, as I'm sure quite a lot of yours do too. I'm also a great believer in the right tool for the job, even if that isn't a free tool. After using several of the free offerings, I downloaded the 4.0 beta of Komodo IDE to see if it was the right tool for developing RubyOnRails apps.
It was a little clunky. The editor and syntax highlighting were fine, and the approach to extending language support was also great (I use Haml and Sass rather than RHTML and CSS). The debugger, however, took up to a minute to hit breakpoints, and although it was usable, it was more difficult than I would have liked.
4.1 has changed all of that for me. Currently in beta, it is fast and shows the great strides the developers have made. The Pro version is simply the best IDE I have used for Rails, bar none. I purchased mine and now use Komodo almost exclusively on all of my projects. Version 4.1 is available as a trial from here.
Update: 4.1 has become the official release version, and is no longer in Beta. I feel an upgrade coming on.
Faster approach to identifying duplicates in a Ruby array
Not mine, this one. Came from bshow here
a = [1,1,5,5,2,3,99,54,54,3,7,54,54,3,19]
a.inject({}) {|h,v| h[v]=h[v].to_i+1; h}.reject{|k,v| v==1}.keys.inspect # => "[1, 5, 3, 54]" (first-occurrence order on Ruby 1.9+)
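On newer Rubies the counting step is built in. A sketch of two equivalent one-liners, assuming Ruby 2.7+ for `Enumerable#tally` and Ruby 2.2+ for `Object#itself`; both keep the first-occurrence order of the original array.

```ruby
a = [1,1,5,5,2,3,99,54,54,3,7,54,54,3,19]

# Ruby 2.7+: count each value, keep those seen more than once
dups_tally = a.tally.select { |_value, count| count > 1 }.keys
p dups_tally   # => [1, 5, 3, 54]

# Ruby 2.2+: the same result by grouping equal values together
dups_group = a.group_by(&:itself).select { |_value, group| group.size > 1 }.keys
p dups_group   # => [1, 5, 3, 54]
```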
Ruby Fnord Generator - Part Two
In Part 1, I took you through the beginnings of the Fnord generator up to the point where we could create Fnords using random words and optional parts of speech. This gave us a Fnord class containing the following methods.
require "fnord_words.rb" #Arrays of NOUNS, ADJECTIVES, PLACES etc.
class Fnord
  def self.true?(chance) (chance==0 or rand(chance)<1) end
  def self.build(string_array,chance=0) true?(chance) ? string_array[rand(string_array.length)] : "" end
  def self.in_place(chance=0) true?(chance) ? "in #{build(PLACES)}" : "" end
  def self.adjective(chance=0) build(ADJECTIVES, chance) end
  def self.name(chance=0) build(NAMES, chance) end
  def self.place(chance=0) build(PLACES, chance) end
  def self.preposition(chance=0) build(PREPOSITIONS, chance) end
  def self.action(chance=0) build(ACTIONS, chance) end
  def self.pronoun(chance=0) build(PRONOUNS, chance) end
  def self.intro(chance=0) build(INTROS, chance) end
  def self.noun(chance=0) build(NOUNS, chance) end
end
I moved the word lists into their own file, “fnord_words.rb”. Since these are generated and updated separately by SJ Games, it made sense to keep them in a separate file. I thought of writing a quick Perl-to-Ruby conversion so the files could simply be dropped in, but since a manual update needs only small changes to the file, I left that as an exercise for a later date.
I was using a normal case statement:
msg = case rand(14) # Return generated Fnord as a string
      when 0 then "The #{adjective(2)} #{noun} #{in_place(5)} is #{adjective}."
      when 1 then "#{name} #{action} the #{adjective} #{noun} and the #{adjective} #{noun}."
      when 2 then "The #{noun} from #{place} will go to #{place}."
      when 3 then "#{name} must take the #{adjective} #{noun} from #{place}."
      when 4 then "#{place} is #{adjective} and the #{noun} is #{adjective}."
      # ...
      when 13 then "A #{noun} from #{place} #{action} the #{adjective(2)} #{adjective(5)} #{noun}."
      end
and I figured I could move all of the data, including the SENTENCES, into a template to make it even easier to update. This required two things:
- All of the parts of speech required are embedded in the string.
- The parts of speech can only be evaluated at runtime when the method is called.
We'd already achieved the first, and to get the second, all that was needed was to put the SENTENCES strings in single quotes, so that Ruby does not interpolate them when they are defined, such as
'#{intro(5)} the #{adjective(2)} #{adjective(2)} #{noun} #{action} the #{adjective(2)} #{adjective(2)} #{noun} #{in_place(2)}.'
To execute this later, you use eval to perform the substitutions. Ignore normalize for now; it only removes extra spaces and fixes up capitalization.
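def self.sentence(chance=0)
  # Wrap the single-quoted template in double quotes and eval it, so the
  # embedded #{...} calls run now, inside the Fnord class
  normalize(eval('"' + build(SENTENCES, chance) + '"'))
end

def self.normalize(msg)
  while msg.include?("  ")   # collapse repeated spaces left by empty parts of speech
    msg.gsub!(/  /, " ")
  end
  msg.gsub!(/^ /, "")        # drop a leading space
  msg.gsub!(/ \./, ".")      # no space before a full stop
  msg.gsub!(/(^|\s)([aA])\s([aeiouhy])/, '\1\2n \3')   # "a apple" -> "an apple"
  msg[0] = msg[0, 1].upcase  # capitalize the first letter
  while msg[/([^A-Z][.!?:])\s+([a-z])/]   # capitalize after sentence punctuation
    msg[/([^A-Z][.!?:])\s+([a-z])/] = "#{$1} #{$2.upcase}"
  end
  msg
end
# `private` has no effect on methods defined with self., so hide normalize explicitly
private_class_method :normalize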
and voila. Sentences can be constructed by calling
print Fnord.sentence
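eval will run any Ruby code that ends up in a SENTENCES entry, which is fine for your own word lists but worth knowing about. If the templates ever come from somewhere less trusted, one swapped-in alternative (not the approach used above) is to expand the placeholders with gsub and call only whitelisted methods. `MiniFnord`, `ALLOWED` and `expand` are hypothetical names, and the one-word lists just keep the output predictable.

```ruby
class MiniFnord
  # One word per list so the output is predictable
  NOUNS = ['lightbulb'].freeze
  ADJECTIVES = ['coherent'].freeze
  # Only these methods may be called from a template
  ALLOWED = %w[noun adjective].freeze

  def self.true?(chance)
    chance == 0 || rand(chance) < 1
  end

  def self.build(words, chance = 0)
    true?(chance) ? words[rand(words.length)] : ''
  end

  def self.noun(chance = 0)
    build(NOUNS, chance)
  end

  def self.adjective(chance = 0)
    build(ADJECTIVES, chance)
  end

  # Replace '#{name}' or '#{name(n)}' with the result of calling name on this class
  def self.expand(template)
    template.gsub(/#\{(\w+)(?:\((\d+)\))?\}/) do
      name, chance = $1, $2
      raise ArgumentError, "unknown part of speech: #{name}" unless ALLOWED.include?(name)
      chance ? send(name, chance.to_i) : send(name)
    end
  end
end

puts MiniFnord.expand('The #{adjective} #{noun}.')   # The coherent lightbulb.
```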
We are still missing the word lists. You can grab the latest updated Perl version from SJ Games, or, if you are feeling really lazy, download my updated Ruby code from my website and grab my personalized Rubyised word lists from here. They aren’t strictly the Fnords word list, so you may prefer the SJ Games ones, but they are in the correct format.
If you've gone down the Perl word list route, you won't have found the SENTENCES array, which you need for my code to work. You can either modify my original code or use the expanded versions below.
# Note: Parts of speech are coded in the main application and MUST NOT be expanded here. Hence only ' rather than " can be used.
SENTENCES = [
  '#{intro(5)} #{name} #{action} #{name} and #{pronoun} #{adjective(2)} #{adjective(2)} #{noun}.',
  '#{intro(5)} #{name} #{action} the #{adjective(2)} #{adjective(2)} #{noun} and the #{adjective(2)} #{adjective(2)} #{noun} #{in_place(2)}.',
  '#{intro(5)} #{name} #{action} the #{adjective(2)} #{adjective(2)} #{noun} of #{place}.',
  '#{intro(5)} #{name} #{preposition} #{place} and #{action} the #{adjective(2)} #{adjective(2)} #{noun}.',
  '#{intro(5)} #{name} #{preposition} #{place} for the #{adjective(2)} #{adjective} #{noun}.',
  '#{intro(5)} #{name} is the #{adjective(2)} #{adjective(2)} #{noun}; #{name} #{preposition} #{place}.',
  '#{intro(5)} #{name} must take the #{adjective(2)} #{adjective(2)} #{noun} from #{place}.',
  '#{intro(5)} #{name} takes #{pronoun} #{adjective(2)} #{adjective(2)} #{noun} and #{preposition} #{place}.',
  '#{intro(5)} #{place} is #{adjective} and the #{noun} is #{adjective}.',
  '#{intro(5)} a #{adjective(2)} #{adjective(2)} #{noun} from #{place} #{action} the #{adjective(2)} #{adjective(2)} #{noun}.',
  '#{intro(5)} the #{adjective(2)} #{adjective(2)} #{noun} #{action} the #{adjective(2)} #{adjective(2)} #{noun} #{in_place(2)}.',
  '#{intro(5)} the #{adjective(2)} #{adjective(2)} #{noun} #{in_place(2)} is #{adjective}.',
  '#{intro(5)} the #{adjective(2)} #{adjective(2)} #{noun} from #{place} will go to #{place}.',
  '#{intro(5)} you must meet #{name} at #{place} and get the #{adjective(2)} #{adjective(2)} #{noun}.'
]
Enjoy
Ruby Fnord Generator - Part One
I recently came across the Steve Jackson Games Fnords program and thought that there should be a really cool and easy way to generate this type of sentence using Ruby. The basic principle is that you take arrays of NOUNS, PLACES, VERBS and other parts of speech and generate syntactically correct but nonsensical sentences. A little bit like a truncated form of Mad Libs.
String interpolation of embedded Ruby code was something I'd played around with, and I thought it should be relatively easy to build sentences from methods embedded in the strings. So basically
"#{name} is a #{adjective} #{noun}."
could be run through the generator to create sentences like “Mac is a brown dog” or “Sigmund Freud is a coherent lightbulb”. Fairly basic stuff, with NAME, ADJECTIVE and NOUN being arrays of appropriate words and, for example, the self.name method defined as
def self.name; NAME[rand(NAME.length)]; end
If you’ve taken a look at the other Fnords programs at SJ Games (or even my early one which is now downloadable from there), you will have noticed that the sentences themselves have a random chance of having some parts of speech included. It was this random element that first attracted me to solving the problem using Ruby.
Most code on the site uses logic of the following form.
msg = "The"                  # The
if rand(2) == 0              # adjective - (50% chance)
  msg += " #{adjective}"
end
msg += " #{noun}"            # noun
if rand(5) == 0              # in place - (20% chance)
  msg += " in #{place}"
end
msg += " is #{adjective}."   # is adjective.
That wasn't DRY enough for me, and it was too much code when a little would do the same thing. Moving the rand() call into the method that defines each part of speech, and making its return value optional, would do almost the same thing. Worst case, I would have to fix up spaces afterwards. (I actually decided to do the space fixup as a final pass, since it meant I could type the sentences with normal spacing, which makes them easier to maintain.)
OK, so with the noun method now defined as
def self.noun(chance=0)
  if (rand(chance)<1) then
    NOUNS[rand(NOUNS.length)]
  else
    ""
  end
end
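The way Ruby's rand works, passing it 0 returns a random float from 0.0 up to (but not including) 1.0, while passing a positive integer n returns a random integer from 0 to n-1. So for n greater than 0, rand(n)<1 is true only when rand returns 0, a 1-in-n chance, and rand(0)<1 is always true, which happens to be what a chance of 0 should mean. The next iteration made that case explicit: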
if (chance==0 or rand(chance)<1)
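A quick way to convince yourself that a chance of n means roughly a 1-in-n probability is to sample the condition many times. In this sketch `chance_true?` is a hypothetical standalone version of the test, taking a seeded Random so runs are repeatable. Unlike Kernel#rand, Random#rand raises ArgumentError when given 0, which the explicit chance == 0 check also avoids.

```ruby
# Standalone version of the test; rng is a Random so a seed makes runs
# repeatable. The chance == 0 check short-circuits before rng.rand(0).
def chance_true?(chance, rng = Random.new)
  chance == 0 || rng.rand(chance) < 1
end

rng = Random.new(42)
trials = 100_000
hits = trials.times.count { chance_true?(4, rng) }
puts hits.fdiv(trials)   # close to 0.25
```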
Finally, to DRY it up even more, I moved the randomness into two separate methods and switched to the ternary “? :” form of “if then”.
def self.true?(chance)
  (chance==0 or rand(chance)<1)
end

def self.build(string_array,chance=0)
  true?(chance) ? string_array[rand(string_array.length)] : ""
end
which meant that my noun method could now simplify again to
def self.noun(chance=0)
  build(NOUNS, chance)
end
One special case was required to allow for “in place” as optional rather than just “place”.
def self.in_place(chance=0)
  true?(chance) ? "in #{build(PLACES)}" : ""
end
Now I could create strings using my desired form.
msg="The #{adjective(2)} #{noun} #{in_place(5)} is #{adjective}."
Part 2 will examine how to take this and create the fully functional fnords program.
Fast extraction of duplicate array items into a new array
Whilst creating a natural language parser, I was presented with multiple merged dictionaries that needed some processing. I was asked to supply a list of the duplicated words, and after trawling the web and finding only slow code, I decided to go for speed at the expense of memory.
The objective is to return an array of just the duplicated elements of a 50,000-word list in as little time as possible. One sort and one lookup per element was what I came up with:
array = ["long","array","with","lots","and","lots","and","lots","of","array","duplicates","long","and","array"]
arr2 = array.sort   # equal words are now adjacent
arr3 = []           # one copy of every word
arr4 = []           # one copy of every duplicated word
arr2.each { |a| (arr3[-1] == a) ? ((arr4[-1] != a) ? arr4 << a : "") : arr3 << a }
# arr3 => ["and", "array", "duplicates", "long", "lots", "of", "with"]
# arr4 => ["and", "array", "long", "lots"]
If you expect few words to appear more than twice, you can drop the arr4[-1] check in favour of a single uniq! pass at the end, i.e.
array = ["long","array","with","lots","and","lots","and","lots","of","array","duplicates","long","and","array"]
arr2 = array.sort
arr3 = []
arr4 = []
arr2.each { |a| (arr3[-1] == a) ? arr4 << a : arr3 << a }
arr4.uniq!          # remove repeats of words seen three or more times
# arr3 => ["and", "array", "duplicates", "long", "lots", "of", "with"]
# arr4 => ["and", "array", "long", "lots"]
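The same sort-then-scan idea can be written more compactly with `Enumerable#chunk_while` (Ruby 2.3 or later), which groups runs of equal neighbours after the sort. `duplicated_words` is a hypothetical name; the cost is still one sort plus one pass.

```ruby
# Sort so equal words are adjacent, group each run of equal words, and keep
# one copy of every run longer than one
def duplicated_words(words)
  words.sort
       .chunk_while { |a, b| a == b }
       .select { |run| run.size > 1 }
       .map(&:first)
end

array = ["long","array","with","lots","and","lots","and","lots","of","array","duplicates","long","and","array"]
p duplicated_words(array)   # => ["and", "array", "long", "lots"]
```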