Weblog on the Internet and public policy, journalism, virtual community, and more from David Brake, a Canadian academic, consultant and journalist

Archive for the 'Useful web resources' Category

19 October 2004

At last someone has produced a free-to-download User Guide to Using the Linux Desktop (there may be others but this is the first general-purpose one I’ve heard about). You might also check out “the O’Reilly site”:http://linux.oreilly.com/ for a few free chapters from some of their many Linux books or take a look at “Learning Debian GNU/Linux”:http://www.oreilly.com/catalog/debian/chapter/book/index.html which is completely free – one of O’Reilly’s “Open Books”:http://www.oreilly.com/openbook/.

Thanks to “Slashdot”:http://linux.slashdot.org/linux/04/08/22/1955204.shtml for the link

15 October 2004

“Google Desktop”:http://www.desktop.google.com/index.html has arrived and two more desktop searching products are on their way. AOL is reportedly developing AOL Desktop Search, and a new search engine, “Exalead”:http://beta.exalead.com/search, also plans a “desktop search product”:http://beta.exalead.com/search/C=0MlQAMAA%3d/2p=5.

BBC World’s Click Online just did a “short report”:http://www.bbcworld.com/content/template_clickonline.asp?pageid=666&co_pageid=2 about hard disk indexing programs which covers some of the same ground I have covered earlier (for example “here”:https://blog.org/archives/cat_search_engines.html#001238).

14 October 2004

The Wordcount site is an interesting art project and trivia goldmine in one. Did you know that ‘internet’ is the 30525th most used word in the written English language? On the other hand it is the 66th most popular word searched for on Wordcount according to its companion site, “Querycount”:http://www.wordcount.org/querycount.php

Thanks to Yahoo’s “Pick of the Week”:http://picks.yahoo.com/picks/ feature for the tip.

P.S. I just noticed that the right-hand column of my weblog sometimes gets shoved to the bottom of the page when using Internet Explorer (though it displays properly in Mozilla). Can anyone suggest why?

11 October 2004

In the interests of better understanding of Arabs by the West, the (American) “National Institute for Technology and Liberal Education”:http://www.nitle.org/ has produced a useful overview of Arab culture – the Arab World Project. Of course I can’t say much about its accuracy but it seems fair. I would be interested to hear if anyone who knows Arab culture well finds the site lacking.

7 October 2004

The “Diskmeta”:http://diskmeta.com/ search engine ‘works on all Windows platforms (98 or higher)’ and ‘is fast, intuitive and unfussy. You can also view the raw text in a special preview window but doesn’t have a preview facility like X1, dtSearch or the new Copernic Desktop Search’. Unlike some other desktop search engines it supports a variety of “boolean operators”:http://diskmeta.com/en/doc/request.asp.

The free version (for non-commercial use) only indexes .txt, .doc and .html files, however – for indexing PDFs you need to pay, and Diskmeta doesn’t index Outlook email.

Thanks to Jeremy Wagstaff for the heads up.

Also see “here”:https://blog.org/archives/cat_search_engines.html#001230 and “here”:https://blog.org/archives/cat_search_engines.html#001202 for earlier coverage of hard disk searching programs.

29 September 2004
Filed under: E-commerce, Useful web resources, Weblogs at 12:42 pm

The “Watchcow”:http://www.watchcow.net/ creates an RSS feed that tells you when any product’s price changes on Amazon in the US, in the UK or in Germany.

Thanks to searchenginewatch for the link.

27 September 2004

I tend to assume that for all its flaws The Economist gets its facts right – at least on technical issues. But a recent article in its technology section, How Google Works, repeats a popular misconception about search. The article says, ‘Google is thought to have several complete copies of the web distributed across servers in California and Virginia’ – whatever they do have, it is nothing close to a complete copy of the web. Even if they had a complete index of the text of the first 100Kb of each page on the publicly spidered web (the most they would even claim) this would still miss the huge volume of available information that is stored in web-accessible databases (like the “British Telecom phone book”:http://www2.bt.com/edq_busnamesearch).

I believe that a search engine that managed to do a good job of searching this ‘invisible web’ alongside the ‘surface web’ would have a good shot at the number one spot.

P.S. While on the subject of search, here’s a tip – to get a (small) discount on your next Amazon purchase, check out their new A9 search engine.

26 September 2004
Filed under: Search Engines, Useful web resources at 10:24 am

Search engine guru “Greg Notess”:http://notess.com/ has produced a Search Engine Overview featuring in-depth “search feature comparisons”:http://searchengineshowdown.com/features/ and frequently updated reviews of individual services. It covers directories and news search engines as well as the major search engines and ones that are now defunct. Learn all the advanced features of each search engine without having to click around the ‘advanced search help’ pages.

He doesn’t yet include “jux2”:http://jux2.com/index.php (which lets you search any two of the major search engines simultaneously) or most of the other metasearch sites. Mind you, none of the other metasearch sites I just checked seemed to correctly deliver all Yahoo, Google and Teoma results. “Dogpile”:http://www.dogpile.com/ for example finds Yahoo and Google at once but seems to truncate the results and didn’t find any Ask or Teoma results for my name although I know they are there.

25 September 2004

If you are thinking about analysing group behaviour by looking at links (particularly web-mediated group behaviour), you must check out the imaginatively-named Link Analysis by “Mike Thelwall”:http://www.scit.wlv.ac.uk/~cm1993/ which contains lots of relevant links from the book he is writing on the subject. Also check out “SocSciBot”:http://socscibot.wlv.ac.uk/, a free Windows link crawler created by his group for social scientists to use. It’s nice to see a fellow academic being so generous in sharing his resources with others.

20 September 2004

There are good free applications for most tasks available for Windows (there’s a good “directory of Windows free and open source software”:http://www.jairlie.com/oss/). The bad news is that in the case of OCR the Windows options I have found are pretty poor. SimpleOCR is the best of a bad lot – it is free but doesn’t work too well – at least it didn’t on the page I tried it on. There’s also a GNU option called “GOCR”:http://sourceforge.net/projects/jocr/ but I didn’t try it as it appears to be a DOS program with a text-only interface and I am skeptical that a half-Mb application could really do much!

Fortunately, there is another option. You can get the “US National Library of Medicine”:http://www.nlm.nih.gov/ to do the work for you. They have put an “experimental application online”:http://docmorph.nlm.nih.gov/docmorph/default.htm which allows you to upload files in a variety of formats to their server and get back PDF, TIFF, text, or synthesized speech. This can be slow, since a typical A3 scan of two pages at 300dpi is around 3Mb, which takes a while to upload, but it may be a good alternative if you have no other way to get OCR done. They have a ‘MyMorph’ application which automates the upload and conversion process for multiple files but it only converts them to Adobe Acrobat files and does not OCR the text.
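To put a rough number on "takes a while to upload", here is a minimal back-of-the-envelope sketch. It assumes the 3Mb figure above means about 3 megabytes, and the two link speeds are purely illustrative assumptions (they are not figures from the DocMorph site); protocol overhead and server processing time are ignored.

```python
def upload_seconds(size_megabytes: float, link_kbps: float) -> float:
    """Estimate the seconds needed to upload a file, ignoring protocol overhead.

    size_megabytes: file size in megabytes (1 MB taken as 1,000,000 bytes)
    link_kbps: upstream link speed in kilobits per second (assumed value)
    """
    size_bits = size_megabytes * 1_000_000 * 8
    return size_bits / (link_kbps * 1_000)


if __name__ == "__main__":
    # Hypothetical upstream speeds, chosen only for illustration.
    for label, kbps in [("56k dial-up", 56), ("256k upstream", 256)]:
        secs = upload_seconds(3, kbps)
        print(f"{label}: about {secs / 60:.1f} minutes per 3 MB scan")
```

On those assumed speeds a single scan takes somewhere between a minute or two and several minutes, so a batch of pages adds up quickly.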

If anyone knows of an OCR program that is available as non-time-limited shareware or freeware and works reasonably well under Windows please let me know. “ABBYY FineReader 7”:http://buy.abbyy.com/content/frpro/default.aspx works quite well, I found, but it costs 81 pounds and, after a 15-day trial period, it no longer lets you use it unless you buy it.

P.S. I’m still at AoIR but I haven’t had time to craft a blog entry so this is one I did earlier. Pics etc will probably have to wait until Thursday.
