Archive for the ‘Programming’ Category
ActiveRecord-JDBC plugin for working with MySQL master-slave configurations
By craig mcmillan on March 20th, 2009
here’s a little plugin for ActiveRecord-JDBC which enables simple use of MySQL master-slave configurations
Has Many + non default primary key loads incorrect data in Rails 2.2.2
By Emma Persky on January 29th, 2009
I found an interesting bug in Rails 2.2.2 yesterday. I couldn’t find a similar bug on the rails lighthouse so created a new ticket. What was most interesting, though, was how quickly the rails core team picked up the bug and assigned it to someone.
It turns out that the bug had already been fixed in the current master branch of the rails git repo, though apparently no one had noticed its existence, because I can’t find any references to it anywhere. I guess the fix in activerecord, which is almost identical to my fix below, will form part of the next release, whenever that is.
I assume this is probably also the case for other has_* relationships, but have not verified.
I have a has_many association from class Foo to class Bar, where, for this specific relationship, the primary key on Foo is not id, nor is the foreign key on Bar id.
class Foo < ActiveRecord::Base
  has_many :bars, :primary_key => 'a_non_standard_key_name', :foreign_key => 'another_non_standard_key_name'
end
The relationship is one way, I have no need to navigate from Bar back to Foo, but only call a_foo.bars.
This works fine when working with a single object, but breaks down when you want to do eager association preloading to avoid the n+1 query problem of loading bars for many foos.
When performing the following, you find that the preloaded bars are not the ones you expect:
foos = Foo.find :all, :include => :bars
foos.first.bars # => [SOMETHING_UNEXPECTED]
The reason is that ActiveRecord creates the preloading query based on the default primary key of Foo (normally id).
It queries for Bar.another_non_standard_key_name matching Foo.id not Foo.a_non_standard_key_name
This causes seriously unexpected behaviour, and could easily go unnoticed since no errors are thrown.
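To make the failure mode concrete, here is a minimal plain-Ruby sketch of what a preload step does. It is not ActiveRecord's code: the Struct classes, the preload_bars helper and the sample values are all assumptions made up for illustration. It only shows how bucketing by the foreign key and then looking up by the wrong primary key hands back the wrong rows without raising anything.

```ruby
# Hypothetical stand-ins for the two models; only the key columns matter here.
Foo = Struct.new(:id, :a_non_standard_key_name, :bars)
Bar = Struct.new(:id, :another_non_standard_key_name)

# Simplified stand-in for eager loading: bucket all bars by their foreign key
# once, then give each foo the bucket found under the chosen primary key.
def preload_bars(foos, bars, primary_key)
  by_foreign_key = bars.group_by { |bar| bar.another_non_standard_key_name }
  foos.each { |foo| foo.bars = by_foreign_key.fetch(foo[primary_key], []) }
end

foos = [Foo.new(1, 20), Foo.new(2, 10)]
bars = [Bar.new(100, 10), Bar.new(101, 20), Bar.new(102, 1)]

# Looking up by id (the behaviour described above) silently picks the wrong bars.
preload_bars(foos, bars, :id)
buggy = foos.map { |foo| foo.bars.map { |bar| bar.id } }

# Looking up by the configured primary key gives the intended result.
preload_bars(foos, bars, :a_non_standard_key_name)
fixed = foos.map { |foo| foo.bars.map { |bar| bar.id } }

puts "by id: #{buggy.inspect}"
puts "by a_non_standard_key_name: #{fixed.inspect}"
```

In this made-up data the first foo ends up with bar 102 when matched by id, and with bar 101 when matched by its real key. No exception is raised in either case, which is why the bug is easy to miss.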
I have found the hook in ActiveRecord where this functionality should be included and monkey patched it for my system, because I need it now. I can’t vouch for its correctness, but we have many, many specs for our product and none of them have broken because of this.
I’m running frozen rails 2.2.2
vendor/activerecord/lib/active_record/association_preload.rb, line 221
Change
primary_key_name = reflection.through_reflection_primary_key_name
to
primary_key_name = reflection.through_reflection_primary_key_name || reflection.options[:primary_key]
Hope this helps someone!
JRuby + Clojure’s Immutable Data Structures = Easy to maintain, application data-model.
By Daniel Kwiecinski on January 22nd, 2009
Implementing an application with a rich data model that can be updated by multiple UI controls and many concurrent threads, with undo/redo functionality, can be somewhat cumbersome. To ease this task, the functional programming paradigm with immutable data structures turned out to be useful.
Because all good developers are lazy, one should seek reuse rather than reinventing the required tools, especially when there is a good existing one. I tried to follow that path. Since we are using JRuby as our language of choice here at Trampoline, I decided to look more closely at Clojure’s immutable data structures. It is straightforward to use Java classes from JRuby, which is described in many places on the web already. What I didn’t know was how to use Clojure’s objects from JRuby. Apparently Clojure data structures are delivered as pre-compiled Java classes, and no runtime interpretation/compilation of Clojure scripts is needed. The task turned out to be very easy.
The simple implementation of graph data structure with no deletion functionality looks as simple as:
In order to have Clojure collections look more like Ruby ones one can define aliases for their methods:
Unfortunately (or fortunately, due to the different contract) we cannot do it with all the methods, particularly the mutating ones. That’s because the semantics of Ruby’s = (assignment operator) is to return the value being assigned. The same applies to the []= method. So even if we redefine the []=(key, val) method so that it returns the updated version of the collection, the Ruby interpreter will step in and make the whole assignment expression evaluate to val. Anyway, whether this is good or bad is a topic for a whole other post.
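The assignment behaviour can be seen without Clojure at all. The sketch below uses a hypothetical FrozenMap class (a made-up wrapper around a frozen Hash, not one of Clojure's collections) whose []= returns a new map. An ordinary index assignment still evaluates to the assigned value, so the new map is only reachable through an explicit call such as send or a differently named method.

```ruby
# Hypothetical immutable wrapper, for illustration only.
class FrozenMap
  def initialize(data = {})
    @data = data.dup.freeze
  end

  def [](key)
    @data[key]
  end

  # Returns a NEW map containing the extra entry; the receiver is unchanged.
  def []=(key, val)
    FrozenMap.new(@data.merge(key => val))
  end

  # Same operation under a name that Ruby does not treat as an assignment.
  def assoc(key, val)
    FrozenMap.new(@data.merge(key => val))
  end
end

m = FrozenMap.new

from_assignment = (m[:a] = 1)        # the assignment expression yields 1, not a map
from_send       = m.send(:[]=, :a, 1) # calling the method directly keeps its return value
from_assoc      = m.assoc(:a, 1)

puts from_assignment.inspect
puts from_send.class
puts from_assoc[:a].inspect
```

This is why an alias for a non-mutating "put" has to live under an ordinary method name rather than []=.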
Pearsons in the database, part 2
By David MacIver on December 9th, 2008
Before I explain what this is about, the following tweets provide useful context for how I feel about this:
http://twitter.com/DRMacIver/status/1047320819
http://twitter.com/DRMacIver/status/1047321174
We’ve been discovering that the SQL query I wrote for calculating pearsons works fine for small datasets (say a few hundred thousand ratings), but becomes unusably slow once you get to around the million rating mark, even for a nightly job. Basically, if you look at the query plan, it ends up doing a filesort. This is not nice, as it involves an awful lot of data paging to and from disk. Having tried to optimise it directly and failed, I’ve spent the last two weeks writing nasty hacks to try to make it fast, and was going to follow up the blog post with my final solution (which involves a temporary table and a stored procedure; it’s pretty grim).
But today I was doing something else which involved a similar sort of query and got really pissed off that I was having exactly the same problems. I boiled it down to a very simple example which illustrated it and logged onto freenode’s #mysql to see if anyone could help me figure out what’s going on. Last time I tried this I was not successful, but one lives in hope.
Well, as it turns out, someone could. Here’s a snippet of conversation:
16:20 < DRMacIver> I'm finding this pattern comes up a lot in what I'm doing at
the moment, and I simply can't figure out a sensible way to
optimise it. Any suggestions? http://pastebin.com/m5be98ace
16:21 < DRMacIver> The self join on the same column followed by a group by seems
to produce a filesort no matter what I do. As per example,
basically all indices that could exist on the table do.
16:22 < jbalint> DRMacIver: i think you can order by null
16:22 * DRMacIver boggles
16:22 < DRMacIver> That works. Thank you.
So, by updating one line in the SQL I’ve posted previously the query becomes dramatically faster. I’ve also updated it to use an update join (which is MySQL specific) instead of the dependent subquery in the set (which is not, but which MySQL runs appallingly slowly) in order to get the standard deviation calculations to run in reasonable time.
Why is this happening? Well, http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html contains the answer. It turns out that if you have a group by, MySQL implicitly orders the result by the grouped columns, even if the group clause has no index on it (which it can’t in this case, because it’s a join across two tables). Ordering a large dataset by a non-indexed key causes a filesort. If you add the clause “order by null” then it will not order by the group by clause, so the filesort goes away and the query becomes much faster.
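As a rough illustration of the one-line change, the sketch below appends the clause to a grouped self join from Ruby. The ratings table, its columns and the without_implicit_group_sort helper are hypothetical names chosen for this example; the query from the pastebin is not reproduced here.

```ruby
# Appends "order by null" to a query unless it already contains an order by.
# The check is a naive text match, good enough for a sketch but not for
# queries with an order by buried in a subquery.
def without_implicit_group_sort(sql)
  query = sql.strip.sub(/;\s*\z/, "")
  return query if query =~ /\border\s+by\b/i
  "#{query}\norder by null"
end

# Hypothetical self join on the same column followed by a group by.
COOCCURRENCE_SQL = <<-SQL
  select r1.item_id, r2.item_id, count(*)
  from ratings r1
  join ratings r2 on r1.user_id = r2.user_id
  group by r1.item_id, r2.item_id
SQL

puts without_implicit_group_sort(COOCCURRENCE_SQL)
```

Running explain on the query before and after the change is the easiest way to check whether the filesort has actually gone from the plan.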
We are not amused.
As another general tip, we’ve also found that this rapidly starts to generate an unacceptably large amount of data, but that if you set some sensible lower bounds (such as only generating the statistics for things which cooccur more than once and have a pearsons > 0.1) it can easily be reduced to sensible levels.
rails 2.2 + jruby + jetty = win
By Jan Berkel on November 27th, 2008
In case you missed it, rails 2.2 recently got released, finally promising thread safety among some other things. Thread safety has always been neglected by the rails core team. The standard way to scale up in rails (pre 2.2) is to run multiple processes, which makes deployment a lot harder (I think there’re at least 10 different ways to deploy rails apps at the moment, and people still come up with new solutions: apache+fcgi, mongrel, mongrel_cluster, thin, phusion, rack…).
Why has thread safety become a priority all of a sudden? I suspect one of the drivers is JRuby, which is now a viable alternative to MRI Ruby, and which also has the nice property of mapping Ruby threads to native threads. Another factor might be the arrival of merb, the new kid on the ‘ruby web framework’ block. Merb has been designed with thread safety in mind, and is now starting to get a lot of attention (1.0 has just been released).
Now, with a thread safe rails JRuby might become the platform of choice for deploying rails apps, especially given the performance progress the JRuby team is making. Having real threads does make a huge difference, reducing the memory footprint and making better use of multi core cpus.
There’re a couple of ways to deploy a rails application in JRuby; glassfish seems to be the recommended choice at the moment. However, glassfish is anything but easily embeddable, so I tried jetty as an option. Compared to glassfish, jetty is solid and proven (version 7 will be released soon), small and easily embeddable.
I didn’t want to use warbler (no web.xml please!), instead I used a combination of JRuby-Rack + Jetty7 and tied everything together with a simple JRuby script.
require 'java' # the Jetty and JRuby-Rack jars are assumed to be on the classpath

# RAILS_DIR is assumed to be defined elsewhere as the path to the rails app
server = org.mortbay.jetty.Server.new

thread_pool = org.mortbay.thread.QueuedThreadPool.new
thread_pool.min_threads = 5 # adjust as needed
thread_pool.max_threads = 50
server.set_thread_pool(thread_pool)

connector = org.mortbay.jetty.nio.SelectChannelConnector.new
connector.port = 3000
server.add_connector(connector) # register the connector so the server listens on the port

context = org.mortbay.jetty.servlet.Context.new(nil, "/",
  org.mortbay.jetty.servlet.Context::NO_SESSIONS)
# dynamic requests go through the JRuby-Rack filter
context.add_filter("org.jruby.rack.RackFilter", "/*",
  org.mortbay.jetty.Handler::DEFAULT)
context.set_resource_base(RAILS_DIR)
context.add_event_listener(org.jruby.rack.rails.RailsServletContextListener.new)
context.set_init_params(java.util.HashMap.new(
  'rails.root' => '.', 'public.root' => 'public',
  'org.mortbay.jetty.servlet.Default.relativeResourceBase' => '/public',
  'jruby.max.runtimes' => '1'))
# static content is served by jetty's default servlet
context.add_servlet(org.mortbay.jetty.servlet.ServletHolder.new(
  org.mortbay.jetty.servlet.DefaultServlet.new), "/")

server.set_handler(context)
server.start
This will run jetty on port 3000, dispatching all requests for dynamic content to a single JRuby instance. It is important to set ‘jruby.max.runtimes’ to 1, so that it creates a single shared application runtime for you; otherwise you’ll get the old one-runtime-per-thread model.
On the rails side you need “config.threadsafe!” in the configuration file. Autoloading of classes will then be disabled, be sure to load all your dependencies upfront in environment.rb. We haven’t actually used this in production, but some initial tests look very promising (mongrel: 23req/s, jetty: 50 req/s). Also, deployment will be a lot easier, because static and dynamic content can be served by one single process.
Computing connected graph components via SQL
By David MacIver on November 19th, 2008
Hi, I don’t post here much. I’m one of the devs working on SONAR, focusing mostly on theme extraction.
As with many applications, SONAR’s data crunching is basically relational database driven. We keep thinking about experimenting with graph DB based approaches, but never manage to find quite a compelling enough reason - there’s no way we’d give up the relational approach entirely, so it needs to be a really big win to be worth the annoyance of having to maintain two different types of database in sync with each other.
Unfortunately this sometimes leaves us in the unenviable position of having to do graph algorithms in SQL. This is about as much fun as you might imagine it to be. Most recent challenge: Computing the connected components of a graph in SQL.
There’s always the option of loading it into memory and doing it there of course. But the graph in question is rather large. With our little demo data sets it would be fine to do that, but any larger (e.g. on a real live sonar deployment) and this starts to sound like a really bad idea.
It turns out this is surprisingly simple to do once you have the key insight. I couldn’t find anything on the web explaining this though, so thought I’d write a post about it in case anyone else needs to do the same. It’s not rocket science, but hopefully this will save someone some time.
Consider the following setup:
create table if not exists items(
  id int primary key,
  component_id int
);
create table if not exists links(
  first int references items (id),
  second int references items (id)
);
We consider entries in links as undirected edges in a graph and we want to update items so that all items in the same component have the same component_id and distinct components have distinct component ids.
We’ll do this incrementally and merge. In order to do this we need a new table which we’ll use as scratch space (this should be a temporary table, but MySQL has irritating restrictions on temporary tables which make this not work):
create table if not exists components_to_merge( component1 int, component2 int);
(Side note: All SQL here is tested only on MySQL. It shouldn’t be hard to make it work on any other database though).
The idea is that at each step we’ll merge components, using components_to_merge to map components to the component they’ll be merged with.
So we start with a set of candidate components. That’s simple enough: We take each node as a potential starting component, for example by setting each item’s component_id to its own id.
Now at each stage we merge components by finding links between them. For every potential component we look at all other potential components it’s connected to via some link. We insert all component-component links into components_to_merge. This is straightforward enough:
insert into components_to_merge
select distinct t1.component_id c1, t2.component_id c2
from links
join items t1
  on links.first = t1.id
join items t2
  on links.second = t2.id
where t1.component_id != t2.component_id;
insert into components_to_merge
select component2, component1 from components_to_merge; -- ensure symmetry
So, now we have a list of groups to merge in this table. If the table is empty then we’re done - all the groups are maximal (and because of the way we constructed them they’re connected - at each point they were built by joining together two connected sets which were then connected to each other), so the component_ids currently in items describe the actual components. If not, we now reassign components:
update items
join (
  select component1 source, min(component2) target
  from components_to_merge
  group by source
) new_components
  on new_components.source = component_id
set items.component_id = least(items.component_id, target)
This step merges each component with another one (although “merging” conveys a slightly inaccurate sense of what happens. Consider a graph 1-2-3. The first step would result in 1 and 2 being assigned component_id 1, and 3 being assigned component_id 2. So the {3} component took the place of the {2} component).
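For anyone who wants to trace the loop by hand, here is an in-memory Ruby sketch of the same steps. It is only an illustration of the logic, under the assumption that every item starts with component_id equal to its own id. It is not the SQL above and not the linked sample code, and the connected_components name is made up for this example.

```ruby
# items: array of node ids; links: array of [first, second] pairs (undirected).
# Returns a hash mapping each node id to its final component id.
def connected_components(items, links)
  # Every node starts as its own candidate component.
  component_id = {}
  items.each { |id| component_id[id] = id }

  loop do
    # Equivalent of components_to_merge: pairs of distinct components joined
    # by some link, stored in both directions.
    to_merge = []
    links.each do |first, second|
      c1 = component_id[first]
      c2 = component_id[second]
      next if c1 == c2
      to_merge << [c1, c2] << [c2, c1]
    end
    break if to_merge.empty? # nothing left to merge, all groups are maximal

    # select component1 source, min(component2) target ... group by source
    target = {}
    to_merge.each do |source, other|
      target[source] = other if target[source].nil? || other < target[source]
    end

    # set component_id = least(component_id, target)
    updated = {}
    component_id.each do |id, current|
      candidate = target[current]
      updated[id] = (candidate && candidate < current) ? candidate : current
    end
    component_id = updated
  end

  component_id
end

chain = connected_components([1, 2, 3], [[1, 2], [2, 3]])
split = connected_components([1, 2, 3, 4, 5], [[1, 2], [4, 5]])
puts chain.inspect
puts split.inspect
```

On the 1-2-3 graph the first pass leaves the ids as 1, 1, 2 and the second pass brings node 3 down to 1, matching the description above.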
What’s the complexity of this code? Well, it’s not amazing, but it’s not terrible either. The complexity of each step in the loop is probably somewhere around O(n log(n)) depending on exactly what the database does to it. The worst case number of queries is the size of the largest component: it’s obviously an upper bound, as the number of groups decreases by at least one each time. In order to see that it’s achieved, consider an extension of the 1-2-3 example where we have a graph 1-2-…-n. Then at each stage what happens is that we end up with components [1], [2], …, [n] -> [1, 2], [3], …, [n] -> [1, 2, 3], [4], …, [n], etc, taking n steps to terminate. On the other hand, a complete graph terminates in one step.
I suspect that the expected run time is O(log(n)), with each part of the component chosen approximately doubling in size each time, but I confess to not actually having bothered to run the maths: For our particular use case at the moment this is fine - it turns out the graph we’re considering is fairly sparse and tends to have small components, so for the moment this is more than fast enough. On the other hand, it would be nice to have a better guaranteed time, so if anyone has a smarter approach I’d love to hear it.
Anyway, here’s some sample code that ties all of this together: http://code.trampolinesystems.com/components.rb
removing global fixtures for ruby tests
By craig mcmillan on July 2nd, 2008
global fixtures are evil, but we’ve got a bunch of unit tests depending on them, so we still need them around
here’s a neat [and generally fast, though a degenerate O(#tables^2) case is possible] way of deleting all fixtures without invoking db dependent ways of ignoring foreign-key constraints, and without loading all the objects into memory :
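# find the model class for every table, skipping tables with no matching ActiveRecord class
classes = ActiveRecord::Base.connection.tables.map {|t|
  t.singularize.camelize.constantize rescue nil
}.compact.reject{|cls| !cls.ancestors.include?(ActiveRecord::Base)}

# keep retrying until every delete_all has succeeded
while classes.size > 0
  classes = classes.select{|c|
    begin
      c.delete_all
      false # deleted, so drop it from the retry list
    rescue
      true # delete failed (e.g. a foreign-key constraint), retry on the next pass
    end
  }.reverse!
end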
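To see why the loop terminates and what the reverse buys you, here is a small simulation that needs no database. FakeTable and delete_everything are hypothetical names invented for this sketch: delete_all raises while any referencing table still has rows, which is a crude stand-in for a foreign-key constraint.

```ruby
# Hypothetical stand-in for a model class.
class FakeTable
  attr_reader :name, :rows
  attr_accessor :referenced_by

  def initialize(name, rows)
    @name = name
    @rows = rows
    @referenced_by = []
  end

  # Fails while another table that references this one still has rows.
  def delete_all
    raise "foreign key constraint on #{name}" if referenced_by.any? { |t| t.rows > 0 }
    @rows = 0
  end
end

# Same retry loop as above, returning the number of passes it needed.
def delete_everything(tables)
  passes = 0
  until tables.empty?
    passes += 1
    tables = tables.select { |t|
      begin
        t.delete_all
        false
      rescue
        true
      end
    }.reverse
  end
  passes
end

users    = FakeTable.new("users", 3)
posts    = FakeTable.new("posts", 5)
comments = FakeTable.new("comments", 8)
users.referenced_by = [posts]    # posts rows point at users
posts.referenced_by = [comments] # comments rows point at posts

passes = delete_everything([users, posts, comments])
puts "emptied all tables in #{passes} passes"
```

With this ordering the first pass only manages to empty comments. Reversing the leftovers puts posts ahead of users, so the second pass clears both; without the reverse the same input would need a third pass.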
Open Visualisation Workshop at the Trampery
By Jan Berkel on May 21st, 2008
Trampoline Systems is hosting a visualisation workshop this coming Saturday, 24th May, organised by the Open Knowledge Foundation. Come along if you’re interested in open source visualisation technologies (Prefuse, Flare etc.). The goal is to have a very informal setting to talk about various aspects of visualising data. Find out more in the official announcement.
Hope to see you here!
@media Ajax 2007
By Mike Stenhouse on September 5th, 2007
I have the honour and terror of presenting at @media Ajax on home turf this November. It’s a privilege to be speaking alongside the likes of Brendan Eich (creator of JavaScript), Douglas Crockford (inventor of JSON), John Resig (jQuery lead) and about a dozen other top dogs.
In a lineup like that I clearly can’t talk about nuts and bolts Javascript. Instead I’m taking a slightly unusual tack for me: revelations. Since Ajax came along my job has changed in ways I wouldn’t have predicted. Technically I’m a flavour of designer yet after many years of specialising I’ve found myself having to skill up again.
- To keep a handle on what the rest of the team produce I’ve become a testing fanatic;
- I’ve had to go back and relearn how to program - not to necessarily produce back-end code but to understand what the real implications of my design decisions are;
- I’ve been converted to Agile practices as a means of effective collaboration.
None of these things are traditionally within the remit of ‘design’ but they all feed into producing a successful app. To try and describe these changes and what I’ve done about them I will be presenting But I’m a Bloody Designer! on the first day, straight after the keynote by the Ajaxians.
So, the lineup’s great, it’s in London. @media Ajax: coming soon. Say hello if you decide to come…
Springy 0.3 released
By Jan Berkel on August 2nd, 2007
No big changes this time, mainly compatibility fixes for JRuby 1.0. It is now also possible to build the project using Maven, for those too afraid to use rake. Documentation and code for springy are available here.
I’m also happy to announce that Craig Walls, the author of “Spring in Action”, is going to talk about Springy as part of his “Spring Cleaning: Tips for Managing XML Clutter” talk at this year’s No Fluff Just Stuff series of events as well as the Spring Experience 2007 in Florida.


