But does it have to be the clownish conservative Boris Johnson? In France they have a serious politician blogger, “Dominique Strauss Kahn”:http://www.blogdsk.net/
Archive for September, 2004
The “Watchcow”:http://www.watchcow.net/ creates an RSS feed that tells you when any product’s price changes on Amazon in the US, in the UK or in Germany.
Thanks to searchenginewatch for the link.
We Are What We Do is a book with accompanying website that offers 50 suggestions for small things you could do to help others and/or the planet in your daily life.
Something a little odder but in the same vein is “Join Me”:http://www.join-me.co.uk/ – an international movement started by a British comedian, Danny Wallace, who simply asks its members to do RAoKs (random acts of kindness) on Fridays (hence Good Fridays). You can buy his book and listen to a radio interview with Danny recorded in Wisconsin (of all places!) “here”:http://wpr.org/book/040328a.html. What I find truly heartening is that, thanks to something Danny started as a joke, over 100,000 good deeds have been inspired. I must get around to posting him a photo and signing up…
I tend to assume that for all its flaws The Economist gets its facts right – at least on technical issues. But a recent article in their technology section, How Google Works, repeats a popular misconception about search. The article says, ‘Google is thought to have several complete copies of the web distributed across servers in California and Virginia’, but whatever they do have, it is nothing close to a complete copy of the web. Even if they had a complete index of the text of the first 100Kb of each page on the publicly spidered web (the most they would even claim), this would still miss the huge volume of available information that is stored in web-accessible databases (like the “British Telecom phone book”:http://www2.bt.com/edq_busnamesearch).
I believe that a search engine that managed to do a good job of searching this ‘invisible web’ alongside the ‘surface web’ would have a good shot at the number one spot.
P.S. While on the subject of search, here’s a tip – to get a (small) discount on your next Amazon purchase, check out their new A9 search engine.
Search engine guru “Greg Notess”:http://notess.com/ has produced a Search Engine Overview featuring in-depth “search feature comparisons”:http://searchengineshowdown.com/features/ and frequently updated reviews of individual services. It covers directories and news search engines as well as the major search engines and ones that are now defunct. Learn all the advanced features of each search engine without having to click around the ‘advanced search help’ pages.
He doesn’t yet include “jux2”:http://jux2.com/index.php, which lets you search any two of the major search engines simultaneously, or most of the other metasearch sites. Mind you, none of the other metasearch sites I just checked seemed to deliver all Yahoo, Google and Teoma results correctly. “Dogpile”:http://www.dogpile.com/, for example, searches Yahoo and Google at once but seems to truncate the results, and it didn’t find any Ask or Teoma results for my name although I know they are there.
If you are thinking about analysing group behaviour by looking at links (particularly web-mediated group behaviour), you must check out the imaginatively-named Link Analysis by “Mike Thelwall”:http://www.scit.wlv.ac.uk/~cm1993/, which contains lots of relevant links from the book he is writing on the subject. Also check out “SocSciBot”:http://socscibot.wlv.ac.uk/, a free Windows link crawler created by his group for social scientists to use. It’s nice to see a fellow academic being so generous in sharing his resources with others.
Search Engine Watch publishes a good roundup of the latest coverage of flaws and bias in the way Google News’s automated news gathering works in practice. They link to a New Scientist article revealing “Google China has suppressed links to ‘forbidden’ news”:http://www.newscientist.com/news/news.jsp?id=ns99996426 on the grounds that:
“In order to create the best possible news search experience for our users, we sometimes decide not to include some sites, for a variety of reasons. These sources were not included because their sites are inaccessible.”
It’s an explanation but not really a justification…
For those who are interested – the mystery person in charge of the “Atrios”:http://www.atrios.blogspot.com/ weblog (a leading left wing political weblog in the US) has been revealed as Duncan Black who works at “Media Matters”:http://mediamatters.org/, the new David Brock media watchdog group.
Turns out I’m probably not many degrees of separation from him as according to his profile, “he has held teaching and research positions at the London School of Economics”.
It seems odd that people on the right like “Instapundit”:http://instapundit.com/archives/016837.php are trying to make something about the fact he gets paid to do the same work that he does on his blog as his day job. It’s not as if Atrios claims to be non-partisan…
Update: I learned about this at AoIR and assumed it was fresh news as it was news to me, but it seems the news has been around since late July. It shows how easy it is to miss even ‘big news’ in the blogosphere, even when your RSS reader has “126 feeds”:http://www.bloglines.com/public/derb/, if those feeds are not covering that topic. Shades of “Cass Sunstein”:http://bostonreview.mit.edu/BR26.3/sunstein.html…
There are good free applications for most tasks available for Windows (there’s a good “directory of Windows free and open source software”:http://www.jairlie.com/oss/). The bad news is that in the case of OCR the Windows options I have found are pretty poor. SimpleOCR is the best of a bad lot – it is free but doesn’t work too well, or at least it didn’t on the page I tried it on. There’s also a GNU option called “GOCR”:http://sourceforge.net/projects/jocr/ but I didn’t try it, as it appears to be a DOS program with a text-only interface and I am skeptical that a half-Mb application could really do much!
Fortunately, there is another option. You can get the “US National Library of Medicine”:http://www.nlm.nih.gov/ to do the work for you. They have put an “experimental application online”:http://docmorph.nlm.nih.gov/docmorph/default.htm which allows you to upload files of a variety of different formats to their server and get back PDF, TIFF, text, or synthesized speech. This can be slow since a typical A3 scan of two pages at 300dpi is around 3Mb which takes a while to upload but may be a good alternative if you have no other way to get OCR done. They have a ‘MyMorph’ application which automates the upload and conversion process for multiple files but it only converts them to Adobe Acrobat files and does not OCR the text.
If anyone knows of an OCR program that is available as non-time-limited shareware or freeware and works reasonably well under Windows, please let me know. “ABBYY FineReader 7”:http://buy.abbyy.com/content/frpro/default.aspx works quite well, I found, but it costs 81 pounds and stops working after a 15-day trial period unless you buy it.
P.S. I’m still at AoIR but I haven’t had time to craft a blog entry so this is one I did earlier. Pics etc will probably have to wait until Thursday.
The first day of the “AoIR conference”:http://www.aoir.org/2004/ didn’t start until the afternoon but already I’ve met several stimulating people and am really looking forward to the next few days. It’s so nice to be surrounded by smart people who care about the social implications of the Internet and think in academic terms. The LSE has a fair number of these as well of course but it’s nice to meet new faces to bounce new ideas off of and to meet face to face the people whose work I have admired.
Today’s keynote speaker was “Ted Nelson”:http://xanadu.com.au/ted/, who certainly dreams big dreams (but maybe tries to dream too many at once)! I had hoped to give you a picture of him in full flow but discovered that my camera’s batteries are flat. Maybe tomorrow…