Weblog on the Internet and public policy, journalism, virtual community, and more from David Brake, a Canadian academic, consultant and journalist


4 December 2003

Like any self-respecting academic, I now have a hard disk full of journal articles and other files in PDF form (as well as Word files etc). Microsoft provides a rather rudimentary ‘search for text in your files’ option in Windows 2000, which is what I use, under ‘search’ on the start menu, but it doesn’t index Acrobat files as standard and I have been unable to get the (well-hidden) downloadable Adobe patch to work.

As it happens I was putting together a course on “Internet Search Techniques”:http://www.nmk.co.uk/search/view.cfm?ItemID=4926 (now finished but I’ll do it again for any organization that wants it) so I had an excuse to do some investigation. Thanks to Jeremy Wagstaff’s excellent technology weblog, “Loose Wire”:http://loosewire.blogspot.com/ (and “this posting”:http://loosewire.blogspot.com/archives/2003_07_10_loosewire_archive.html in particular) I tried out:

  • “DTSearch”:http://www.dtsearch.com/ (sophisticated, complex, expensive – aimed at corporate networks)
  • “Enfish”:http://www.enfish.com/ (seemed to suck up my computer’s resources and slow it down – not as powerful and still quite clunky; $50+)
  • “X1”:http://www.x1.com/ – fast and easy to use (though it doesn’t yet support phrase search). They were going to offer a free version, which put it at the top of my list, but now it costs $50. You can download a free trial if you want to try it for yourself. Its ability to highlight where in a given file your search terms appear is very handy. It indexes email (several types including Eudora) and email attachments as well as documents. [obdisclaimer: they kindly gave me a full unlimited license to try it out]
  • “SearchWithin”:http://www.searchwithin.com/ – its interface is pretty rudimentary (see below) and several weird Visual Basic-related error messages appeared during installation (though I ignored them and there seem to be no ill effects). The search results page is also basic – it doesn’t sort what’s found by relevance and it just gives you the first few words of the document rather than showing you where in the document your search terms appear. However, it does have a big advantage over the others – it’s free (it’s ad-supported – popping up sponsor ads in your web browser when you launch it and every so often when you use it). It also handles more powerful boolean search queries than X1.
    [Image: searchwithin.gif – SearchWithin’s rather basic interface]

  • “Acrobat 6”:http://www.adobe.com/products/acrobat/readermain.html – The latest version of Acrobat Reader has a way to search across multiple Acrobat files built in – but it is slow, and if you are not sure whether the document you want is a Word file or an Acrobat one you’d have to search twice.

So it looks like I will stick to using X1 for the moment – but I can’t help thinking Google or some other search engine provider should really put out something free and more professional (Compaq’s Altavista had a primitive product back in the late ’90s you could download).