Blog | Archive for August, 2009

Open source term extraction

By david | Monday, August 17th, 2009

This is just a quick announcement to let people know that we’ve open sourced our JRuby library for term extraction. You can get the code from my GitHub page.

Unlike a lot of term extraction libraries, this doesn’t take any stance as to the “significance” of the terms it extracts. It’s purely about looking at the syntax and determining where good boundaries for terms are. There are a couple reasons for this, but basically we’ve found that it’s more effective to separate the two steps, and that doing so makes it easier to tinker around with them independently. The criteria for “interestingness” of terms seem to be largely distinct from those for terms which simply make sense linguistically. So we have a two stage pipeline: one stage which extracts semantically meaningful terms and one which determines what terms are actually interesting in the context of the document. The second step is much more complicated, and we’re not open sourcing that yet, and probably won’t any time soon, if ever. Even if we wanted to, it relies on a lot more global information across the document corpus and so is very tied in with how SONAR operates, making it much harder to isolate.

So, how does it work? Black magic and voodoo!

Actually, no. It’s pretty straightforward. It builds on top of the excellent OpenNLP library, using its tools for part of speech tagging, sentence splitting (a much harder problem than you’d imagine) and phrase chunking. On top of that sits what is currently a rules based system: while you’re still figuring things out, it makes much more sense to stick with something that is so easy to fine tune. Our expectation is that we’ll gradually start replacing bits of it with machine learning based techniques as we start to hit the limitations of a rules based system, but for now it’s working pretty well.
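To give a feel for what a rules based layer over part of speech tags can look like, here is a minimal sketch in plain Ruby. It is not the library’s code or API: the method name extract_terms, the single rule it applies (take each maximal run of modifiers and nouns, trimmed so that it ends on a noun) and the hand-written Penn Treebank style tags are all assumptions made for illustration. In a real setup the tags would come from a tagger such as OpenNLP’s rather than being typed in by hand.

```ruby
# Toy rules based term extractor (illustrative only, not the library's code).
# Input: an array of [word, tag] pairs. The tags here are written by hand
# in Penn Treebank style; a real system would get them from a POS tagger.
#
# Hypothetical rule: a term is a maximal run of modifiers and nouns,
# trimmed so that it ends on a noun. The tag lists are keyword arguments
# so the rule can be tuned without touching the method body.
def extract_terms(tagged_tokens,
                  noun_tags: %w[NN NNS NNP NNPS],
                  modifier_tags: %w[JJ JJR JJS CD])
  terms = []
  run = []

  # Close the current run: drop trailing modifiers, then record the term.
  flush = lambda do
    run.pop while !run.empty? && !noun_tags.include?(run.last[1])
    terms << run.map(&:first).join(" ") unless run.empty?
    run = []
  end

  tagged_tokens.each do |word, tag|
    if noun_tags.include?(tag) || modifier_tags.include?(tag)
      run << [word, tag]
    else
      flush.call # any other tag ends the current candidate term
    end
  end
  flush.call

  terms.uniq
end

# One sentence from this post, tagged by hand (the tags are an assumption).
sentence = [
  ["It", "PRP"], ["'s", "VBZ"], ["purely", "RB"], ["about", "IN"],
  ["looking", "VBG"], ["at", "IN"], ["the", "DT"], ["syntax", "NN"],
  ["and", "CC"], ["determining", "VBG"], ["where", "WRB"],
  ["good", "JJ"], ["boundaries", "NNS"], ["for", "IN"],
  ["terms", "NNS"], ["are", "VBP"]
]

p extract_terms(sentence)
# prints ["syntax", "good boundaries", "terms"]
```

A single rule like this is easy to adjust by changing the tag lists, which is the appeal of staying rules based while things are still in flux. It says nothing about which terms are interesting; that would be the job of a separate second stage.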

Let’s have an example. If we feed the second paragraph of this post into the term extractor, we get the following terms back:

term extraction libraries
stance
terms
syntax
good boundaries
couple reasons
two steps
steps
criteria
interestingness
sense
two stage pipeline
stage pipeline
semantically meaningful terms
context
context of the document
document
second step
open sourcing
time
document corpus
SONAR

Hope you find this useful. Let us know if you build anything cool with it!


Techcrunch article on Trampoline crowdfunding

By Charles Armstrong | Wednesday, August 12th, 2009

Last night, as the Perseid meteors streaked through Earth’s atmosphere, Techcrunch.com published a provocative article on Trampoline’s decision to “jump the VC ship” and raise finance via crowdfunding. Techcrunch has established itself as the journal of record for tech startups and the venture capital industry, so this was bound to create a bit of a stir.

In the immediate aftermath of the Techcrunch article, Trampoline’s crowdfunding website was visited by 1,500 people from 70 countries, and 200 people posted messages to their networks via Twitter. Many of the comments came from entrepreneurs applauding the example Trampoline is providing of an alternative to VC.


Sunday Telegraph feature on Trampoline Crowdfunding

By Charles Armstrong | Sunday, August 9th, 2009

Today’s Sunday Telegraph includes a feature on Trampoline’s crowdfunding initiative written by the paper’s Enterprise Editor, Richard Tyler. The article discusses how conventional venture capital financing can lead businesses to raise progressively larger sums of money regardless of whether that’s what they actually need. As Trampoline proceeds down the crowdfunding path, it will be interesting to see if there are other areas of strategy where we begin to innovate, having previously followed the norms of the venture capital industry.


Financial Times on Trampoline & alternative finance

By Charles Armstrong | Saturday, August 8th, 2009

Hot on the heels of last Wednesday’s feature, today’s Financial Times includes an article discussing Trampoline’s crowdfunding initiative as an example of alternative sources of finance being sought by SMEs.

