<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Computing connected graph components via SQL</title>
	<atom:link href="http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/</link>
	<description>Ideas, thoughts and observations from Trampoline's technical brains</description>
	<pubDate>Fri, 12 Mar 2010 23:37:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<item>
		<title>By: david</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-498</link>
		<dc:creator>david</dc:creator>
		<pubDate>Mon, 24 Nov 2008 21:36:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-498</guid>
		<description>Well, there are graph databases. We're just not using one for various reasons. :-) (Although most of the existing graph databases don't look like they'd make operations like "find all connected components" particularly easy.)</description>
		<content:encoded><![CDATA[<p>Well, there are graph databases. We&#8217;re just not using one for various reasons. :-) (Although most of the existing graph databases don&#8217;t look like they&#8217;d make operations like &#8220;find all connected components&#8221; particularly easy.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jherber</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-497</link>
		<dc:creator>jherber</dc:creator>
		<pubDate>Mon, 24 Nov 2008 18:30:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-497</guid>
		<description>Sorry, in my haste I mixed up "explain" and "example" ;)

I foresee ACID-able subgraphs in our 64-bit, large-memory future, or at the very least a graph-specific form of STM (software transactional memory).</description>
		<content:encoded><![CDATA[<p>Sorry, in my haste I mixed up &#8220;explain&#8221; and &#8220;example&#8221; ;)</p>
<p>I foresee ACID-able subgraphs in our 64-bit, large-memory future, or at the very least a graph-specific form of STM (software transactional memory).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-496</link>
		<dc:creator>david</dc:creator>
		<pubDate>Mon, 24 Nov 2008 13:42:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-496</guid>
		<description>Yes, I know it will. That's why I said I'd have to take a look at the explain. :-) I just wasn't in front of a database at the time. Having checked now, explain confirms that the query uses the index rather than a table scan.

We could probably keep the graph in memory, but I'm not really convinced it's a good idea, particularly given that we can run this as a once-a-day process. Maybe at some point that will change, but for now the SQL approach seems preferable.</description>
		<content:encoded><![CDATA[<p>Yes, I know it will. That&#8217;s why I said I&#8217;d have to take a look at the explain. :-) I just wasn&#8217;t in front of a database at the time. Having checked now, explain confirms that the query uses the index rather than a table scan.</p>
<p>We could probably keep the graph in memory, but I&#8217;m not really convinced it&#8217;s a good idea, particularly given that we can run this as a once-a-day process. Maybe at some point that will change, but for now the SQL approach seems preferable.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jherber</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-495</link>
		<dc:creator>jherber</dc:creator>
		<pubDate>Mon, 24 Nov 2008 13:34:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-495</guid>
		<description>Yah, that's a fairly large set of objects to represent in Ruby, especially if you are on a fixed slice. I'm pretty sure the LinkedIn guys keep their graph in memory, though of course they are on the JVM. The "explain" command will tell you how MySQL plans to process that query: http://dev.mysql.com/doc/refman/5.0/en/explain.html</description>
		<content:encoded><![CDATA[<p>Yah, that&#8217;s a fairly large set of objects to represent in Ruby, especially if you are on a fixed slice. I&#8217;m pretty sure the LinkedIn guys keep their graph in memory, though of course they are on the JVM. The &#8220;explain&#8221; command will tell you how MySQL plans to process that query: <a href="http://dev.mysql.com/doc/refman/5.0/en/explain.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.0/en/explain.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-493</link>
		<dc:creator>david</dc:creator>
		<pubDate>Mon, 24 Nov 2008 09:34:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-493</guid>
		<description>Well, the problem isn't the performance of the in-memory solution so much as the fact that it's in memory. The graph in question is really on the largish side (I mean sure, it's only a few tens of thousands of nodes at most in the data sets I'm running with at the moment, but that could easily grow by an order of magnitude or three in a production system). Loading all of that into memory is asking for trouble.

I don't think the select min(component2) causes a full table scan as long as component2 is sensibly indexed, though I'd have to take a look at the explain to be sure. I didn't talk about indices in this article, but the example code I linked to should have the right ones.</description>
		<content:encoded><![CDATA[<p>Well, the problem isn&#8217;t the performance of the in-memory solution so much as the fact that it&#8217;s in memory. The graph in question is really on the largish side (I mean sure, it&#8217;s only a few tens of thousands of nodes at most in the data sets I&#8217;m running with at the moment, but that could easily grow by an order of magnitude or three in a production system). Loading all of that into memory is asking for trouble.</p>
<p>I don&#8217;t think the select min(component2) causes a full table scan as long as component2 is sensibly indexed, though I&#8217;d have to take a look at the explain to be sure. I didn&#8217;t talk about indices in this article, but the example code I linked to should have the right ones.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jherber</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-492</link>
		<dc:creator>jherber</dc:creator>
		<pubDate>Mon, 24 Nov 2008 05:47:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-492</guid>
		<description>The underlying concern of storing the graph and the merge results (i.e. the communicating groups) in the database can still be satisfied while performing the actual merge (transitive entailment) outside the db with an in-memory solution. It should be blazing fast (groups * log(&lt;n)) too. BTW, doesn't your select on min(component2) create a table scan?</description>
		<content:encoded><![CDATA[<p>The underlying concern of storing the graph and the merge results (i.e. the communicating groups) in the database can still be satisfied while performing the actual merge (transitive entailment) outside the db with an in-memory solution. It should be blazing fast (groups * log(&lt;n)) too. BTW, doesn&#8217;t your select on min(component2) create a table scan?</p>
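<p>jherber does not name a particular data structure for the in-memory merge. A disjoint-set (union-find) forest is one standard choice, swapped in here as an illustration rather than as anything either commenter proposed; the function name <code>connected_components</code> is hypothetical. The minimal Python sketch below uses path halving, keeps the smaller id as each group&#8217;s root, and runs on David&#8217;s A/B/C, D/E example from his reply to Slow.</p>
<pre>
```python
def connected_components(edges, nodes=()):
    """Label every node with a representative of its connected component.

    edges: iterable of (a, b) pairs, treated as undirected.
    nodes: optional extra nodes, e.g. people with no edges at all.
    Returns a dict mapping each node to its component's root.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one, so each group
            # ends up labelled by its smallest member.
            parent[max(ra, rb)] = min(ra, rb)

    return {n: find(n) for n in parent}


labels = connected_components([("A", "B"), ("B", "C"), ("D", "E")])

groups = {}
for node, root in labels.items():
    groups.setdefault(root, set()).add(node)

print(sorted(sorted(g) for g in groups.values()))
# [['A', 'B', 'C'], ['D', 'E']]
```
</pre>
<p>Only the final node-to-label mapping would need to be written back to the database, which fits the split jherber describes: graph and results stored in the db, merge done in memory.</p>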
]]></content:encoded>
	</item>
	<item>
		<title>By: david</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-490</link>
		<dc:creator>david</dc:creator>
		<pubDate>Sun, 23 Nov 2008 16:05:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-490</guid>
		<description>No, the idea is that we want to find communicating groups.

For example, suppose A talks to B and B talks to C, but none of them talk to D, even though D talks to E (this isn't actually how we use this, but it illustrates the point). We want to find the two groups

{A, B, C} and {D, E}.

That is, even though A never talks to C directly, C is part of the group of people with whom A communicates. However, no chain of communication reaches from A to D or E, so D and E are in a different group.

What this algorithm does is assign each person a number such that two people get the same number exactly when they are in the same communicating group, so each group ends up with its own unique number.</description>
		<content:encoded><![CDATA[<p>No, the idea is that we want to find communicating groups.</p>
<p>For example, suppose A talks to B and B talks to C, but none of them talk to D, even though D talks to E (this isn&#8217;t actually how we use this, but it illustrates the point). We want to find the two groups</p>
<p>{A, B, C} and {D, E}.</p>
<p>That is, even though A never talks to C directly, C is part of the group of people with whom A communicates. However, no chain of communication reaches from A to D or E, so D and E are in a different group.</p>
<p>What this algorithm does is assign each person a number such that two people get the same number exactly when they are in the same communicating group, so each group ends up with its own unique number.</p>
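<p>The original article&#8217;s code is not reproduced in this feed, so the following is only a minimal sketch of one way to get this labelling in SQL, not necessarily the method the article uses. It runs in Python&#8217;s built-in sqlite3 module with hypothetical tables <code>edges</code>, <code>labels</code> and <code>candidate</code>: every person starts labelled with their own id, and each pass lowers every label to the smallest label among that person&#8217;s neighbours, stopping when a pass changes nothing. On the A/B/C, D/E example above, the people in each group end up sharing the group&#8217;s smallest id.</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (a TEXT, b TEXT);
    CREATE TABLE labels (node TEXT PRIMARY KEY, label TEXT);
""")

# The example from the comment: A-B, B-C and D-E.
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("A", "B"), ("B", "C"), ("D", "E")])
# Talking is symmetric here, so store each edge in both directions.
conn.execute("INSERT INTO edges SELECT b, a FROM edges")

# Every person starts out labelled with their own id.
conn.execute("INSERT INTO labels SELECT a, a FROM edges GROUP BY a")

while True:
    # Smallest label among each person's neighbours.
    conn.execute("DROP TABLE IF EXISTS candidate")
    conn.execute("""
        CREATE TEMP TABLE candidate AS
        SELECT e.a AS node, MIN(l.label) AS best
        FROM edges e JOIN labels l ON l.node = e.b
        GROUP BY e.a
    """)
    # Lower any label that a neighbour can beat; count how many changed.
    changed = conn.execute("""
        UPDATE labels
        SET label = (SELECT best FROM candidate WHERE candidate.node = labels.node)
        WHERE label > (SELECT best FROM candidate WHERE candidate.node = labels.node)
    """).rowcount
    if changed == 0:
        break

result = dict(conn.execute("SELECT node, label FROM labels ORDER BY node"))
print(result)
# {'A': 'A', 'B': 'A', 'C': 'A', 'D': 'D', 'E': 'D'}
```
</pre>
<p>Labels only ever decrease and move one hop per pass, so the number of passes grows with how far apart the people in a group are; here two passes change something and a third confirms nothing more moves. People with no edges at all never enter <code>labels</code> in this sketch.</p>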
]]></content:encoded>
	</item>
	<item>
		<title>By: Slow</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-489</link>
		<dc:creator>Slow</dc:creator>
		<pubDate>Fri, 21 Nov 2008 19:22:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-489</guid>
		<description>I am a bit slow and am not 100% sure of the goal. Is the idea to consider all nodes with the same external connections (i.e. to everyone but each other) as the same node?</description>
		<content:encoded><![CDATA[<p>I am a bit slow and am not 100% sure of the goal. Is the idea to consider all nodes with the same external connections (i.e. to everyone but each other) as the same node?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david</title>
		<link>http://www.trampolinesystems.com/blog/machines/2008/11/19/computing-connected-graph-components-via-sql/comment-page-1/#comment-488</link>
		<dc:creator>david</dc:creator>
		<pubDate>Thu, 20 Nov 2008 08:34:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.trampolinesystems.com/blog/machines/?p=29#comment-488</guid>
		<description>Thanks, Phillip. I'm aware of the book and keep meaning to read it, but I don't think it's actually relevant to this particular problem: it focuses primarily on directed acyclic graphs, whereas this is an undirected cyclic one.</description>
		<content:encoded><![CDATA[<p>Thanks, Phillip. I&#8217;m aware of the book and keep meaning to read it, but I don&#8217;t think it&#8217;s actually relevant to this particular problem: it focuses primarily on directed acyclic graphs, whereas this is an undirected cyclic one.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
