Like any self-respecting academic, I now have a hard disk full of journal articles and other files in PDF form (as well as Word files, etc.). Windows 2000, which is what I use, offers a rather rudimentary ‘search for text in your files’ option under ‘Search’ on the Start menu, but it doesn’t index Acrobat files as standard, and I have been unable to get Adobe’s (well-hidden) downloadable patch to work.
As it happens I was putting together a course on “Internet Search Techniques”:http://www.nmk.co.uk/search/view.cfm?ItemID=4926 (now finished but I’ll do it again for any organization that wants it) so I had an excuse to do some investigation. Thanks to Jeremy Wagstaff’s excellent technology weblog, “Loose Wire”:http://loosewire.blogspot.com/ (and “this posting”:http://loosewire.blogspot.com/archives/2003_07_10_loosewire_archive.html in particular) I tried out:
- “DTSearch”:http://www.dtsearch.com/ (sophisticated, complex, expensive – aimed at corporate networks)
- “Enfish”:http://www.enfish.com/ (seemed to hog my computer’s resources and slow it down; not as powerful, still quite clunky, and $50+)
- “X1”:http://www.x1.com/ – fast and easy to use (though it doesn’t yet support phrase search). They were going to offer a free version, which put it at the top of my list, but now it costs $50. You can download a free trial if you want to try it for yourself. Its ability to highlight where in a given file your search terms appear is very handy. It indexes email (several formats, including Eudora) and email attachments as well as documents. [obdisclaimer: they kindly gave me a full unlimited license to try it out]
- “SearchWithin”:http://www.searchwithin.com/ – its interface is pretty rudimentary (see below), and several weird Visual Basic-related error messages appeared during installation (though I ignored them and there seem to have been no ill effects). The search results page is also basic: it doesn’t sort results by relevance, and it just shows the first few words of each document rather than where in the document your search terms appear. However, it has one big advantage over the others: it’s free. (It’s ad-supported, popping up sponsor ads in your web browser when you launch it and every so often while you use it.) It also handles more powerful Boolean search queries than X1.
(Screenshot: SearchWithin’s rather basic interface)
- “Acrobat 6”:http://www.adobe.com/products/acrobat/readermain.html – the latest version of Acrobat Reader has built-in search across multiple Acrobat files, but it is slow, and if you are not sure whether the document you want is a Word file or an Acrobat one, you have to search twice.
So it looks like I will stick with X1 for the moment, but I can’t help thinking Google or some other search engine provider should really put out something free and more professional (back in the late ’90s, Compaq’s AltaVista offered a primitive downloadable product).
I believe Longhorn will provide a solution.
I use X1 as well, but it still needs some work. Right now, I am looking at Vivisimo and wondering if “managed clusters” would be the way to go for personal and professional data. Who knows?
http://vivisimo.com/
Jason
Comment by Jason Newcomb — 11 January 2004 @ 6:18 pm
You’re a patient man, Jason. Longhorn seems likely to be a while in coming (and I wouldn’t want to switch to it for at least a few months while they get the bugs out anyway). Meanwhile, I know there are some corporate solutions for data sharing, but precious few for individuals. I looked around Vivisimo’s site and couldn’t find any software for personal use that I could just download and try.
Comment by David Brake — 11 January 2004 @ 6:22 pm
You could give 80-20 Retriever a try … free download from http://www.80-20.com/products/retriever.asp
It won PC Magazine’s Editors’ Choice in a shootout with the products you have already tried.
Cheers
David.
Comment by David Gillespie — 18 February 2004 @ 12:48 am