Blog | Archive for August, 2009
Open source term extraction
By david | Monday, August 17th, 2009
This is just a quick announcement to let people know that we’ve open sourced our JRuby library for term extraction. You can get the code from my GitHub page.
Unlike a lot of term extraction libraries, this doesn’t take any stance as to the “significance” of the terms it extracts. It’s purely about looking at the syntax and determining where good boundaries for terms are. There are a couple reasons for this, but basically we’ve found that it’s more effective to separate the two steps and makes it easier to tinker around with them independently. The criteria for “interestingness” of terms seem to be largely distinct from those for terms which simply make sense linguistically. So we have a two stage pipeline, one which extracts semantically meaningful terms and one which determines what terms are actually interesting in the context of the document. The second step is much more complicated, and we’re not open sourcing that (yet? probably not any time soon, if ever. Even if we wanted to, it relies on a lot more global information across the document corpus and so is very tied in with how SONAR operates, making it much harder to isolate).
So, how does it work? Black magic and voodoo!
Actually, no. It’s pretty straightforward. It builds on top of the excellent OpenNLP library, using its tools for part of speech tagging, sentence splitting (a much harder problem than you’d imagine) and phrase chunking. It’s currently a rules based system built on top of those tools, because while you’re still figuring things out it makes much more sense to stick with something that is easy to fine tune. Our expectation is that we’ll gradually start replacing bits of it with machine learning based techniques as we start to hit the limitations of a rules based system, but for now it’s working pretty well.
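To give a flavour of what “rules based” means here, below is a minimal sketch in plain Ruby. It is not the library’s actual code: the method name `extract_terms`, the tag lists and the single rule (keep maximal runs of adjectives and nouns that end in a noun) are all assumptions made for illustration, and the part of speech tags are written out by hand rather than produced by OpenNLP.

```ruby
# Toy rules based term extractor over tokens that are already POS tagged.
# All names and rules here are hypothetical illustrations, not the library's
# real implementation; a real pipeline would obtain the tags from OpenNLP.

# Penn Treebank style tags that are allowed inside a candidate term.
ADJECTIVE_TAGS = %w[JJ JJR JJS].freeze
NOUN_TAGS = %w[NN NNS NNP NNPS].freeze

# tagged: array of [word, tag] pairs for one sentence.
# Returns maximal runs of adjectives and nouns, trimmed to end on a noun.
def extract_terms(tagged)
  terms = []
  run = []
  flush = lambda do
    # Drop trailing adjectives so that every term ends on a noun.
    run.pop while run.any? && !NOUN_TAGS.include?(run.last[1])
    terms << run.map(&:first).join(" ") unless run.empty?
    run = []
  end
  tagged.each do |word, tag|
    if ADJECTIVE_TAGS.include?(tag) || NOUN_TAGS.include?(tag)
      run << [word, tag]
    else
      # Any other tag (verb, preposition, determiner...) closes the run.
      flush.call
    end
  end
  flush.call
  terms
end

# Hand tagged fragment of a sentence from this post (tags are assumed).
sentence = [
  ["It", "PRP"], ["builds", "VBZ"], ["on", "IN"], ["top", "NN"],
  ["of", "IN"], ["the", "DT"], ["excellent", "JJ"],
  ["OpenNLP", "NNP"], ["library", "NN"]
]
p extract_terms(sentence) # prints ["top", "excellent OpenNLP library"]
```

A rule like this is easy to tweak by hand (add a tag, change where a run may end), which is the main appeal of starting with rules before moving to anything learned.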
Let’s have an example. If we feed the second paragraph of this post into the term extractor, we get the following terms back:
term extraction libraries, stance, terms, syntax, good boundaries, couple reasons, two steps, steps, criteria, interestingness, sense, two stage pipeline, stage pipeline, semantically meaningful terms, context, context of the document, document, second step, open sourcing, time, document corpus, SONAR
Hope you find this useful. Let us know if you build anything cool with it!
Posted in Coding | 2 Comments »
Techcrunch article on Trampoline crowdfunding
By Charles Armstrong | Wednesday, August 12th, 2009
Last night, as the Perseid meteors streaked through Earth’s atmosphere, Techcrunch.com published a provocative article on Trampoline’s decision to “jump the VC ship” and raise finance via crowdfunding. Techcrunch has established itself as the journal of record for tech startups and the venture capital industry so this was bound to create a bit of a stir.
In the immediate aftermath of the Techcrunch article Trampoline’s crowdfunding website was visited by 1,500 people from 70 countries around the world and 200 people posted messages to their networks via Twitter. Many of the comments came from entrepreneurs applauding the example Trampoline is providing of an alternative to VC.
Posted in Crowdfunding, Media | No Comments »
Sunday Telegraph feature on Trampoline Crowdfunding
By Charles Armstrong | Sunday, August 9th, 2009
Today’s Sunday Telegraph includes a feature on Trampoline’s Crowdfunding initiative written by the paper’s Enterprise Editor, Richard Tyler. The article discusses how conventional venture capital financing can lead businesses to raise progressively larger sums of money regardless of whether that’s what they actually need. As Trampoline proceeds down the crowdfunding path it will be interesting to see if there are other areas of strategy where we begin to innovate having previously followed the norms of the venture capital industry.
Posted in Crowdfunding, Media | No Comments »
Financial Times on Trampoline & alternative finance
By Charles Armstrong | Saturday, August 8th, 2009
Hot on the heels of last Wednesday’s feature, today’s Financial Times includes an article discussing Trampoline’s crowdfunding initiative as an example of alternative sources of finance being sought by SMEs.
Posted in Crowdfunding, Media | No Comments »