Weblog on the Internet and public policy, journalism, virtual community, and more from David Brake, a Canadian academic, consultant and journalist

Archive for the 'Search Engines' Category

4 December 2003

Like that of any self-respecting academic, my hard disk is now full of journal articles and other files in PDF form (as well as Word files etc). Microsoft provides a rather rudimentary ‘search for text in your files’ option in Windows 2000 (which is what I use) under ‘search’ on the start menu, but it doesn’t index Acrobat files as standard and I have been unable to get the (well-hidden) downloadable Adobe patch to work.

As it happens I was putting together a course on “Internet Search Techniques”:http://www.nmk.co.uk/search/view.cfm?ItemID=4926 (now finished but I’ll do it again for any organization that wants it) so I had an excuse to do some investigation. Thanks to Jeremy Wagstaff’s excellent technology weblog, “Loose Wire”:http://loosewire.blogspot.com/ (and “this posting”:http://loosewire.blogspot.com/archives/2003_07_10_loosewire_archive.html in particular) I tried out:

  • “DTSearch”:http://www.dtsearch.com/ (sophisticated, complex, expensive – aimed at corporate networks)
  • “Enfish”:http://www.enfish.com/ (seemed to suck up my computer’s resources and slow it down – not as powerful and still quite clunky; $50+)
  • “X1”:http://www.x1.com/ – fast and easy to use (though it doesn’t yet support phrase search). They were going to offer a free version, which put it at the top of my list, but now it costs $50. You can download a free trial if you want to try it for yourself. Its ability to highlight where in a given file your search terms appear is very handy. It indexes email (several types including Eudora) and email attachments as well as documents. [obdisclaimer: they kindly gave me a full unlimited license to try it out]
  • “SearchWithin”:http://www.searchwithin.com/ – its interface is pretty rudimentary (see below) and in the course of its installation several weird Visual Basic-related error messages appeared (though I ignored them and there seem to be no ill effects). The search results page is also basic – it doesn’t sort what’s found by relevance and it just gives you the first few words of the document rather than showing you where in the document your search terms appear. However, it does have a big advantage over the others – it’s free (it’s ad supported – popping up sponsor ads in your web browser when you launch it and every so often when you use it). It also handles more powerful boolean search queries than X1.
    searchwithin.gif
    SearchWithin’s rather basic interface

  • “Acrobat 6”:http://www.adobe.com/products/acrobat/readermain.html – The latest version of Acrobat Reader has a way to search across multiple Acrobat files built in – but it is slow, and if you are not sure whether the document you want is a Word file or an Acrobat one you’d have to search twice.

So it looks like I will stick to using X1 for the moment – but I can’t help thinking Google or some other search engine provider should really put out something free and more professional (Compaq’s Altavista had a primitive product back in the late ’90s you could download).

3 December 2003

Perhaps inevitably, the attempts by people like “NZ Bear” to rank weblogs by popularity have spurred some to try to ‘game’ the system and get to the top of the list. This practice has prompted some discussion by Clay Shirky (an A-list blogger) and others. As Shirky points out, if you get into ‘A List’ rankings you will probably get more curious readers and your ranking may perpetuate itself.

2 December 2003

Visual Poetry is an entertaining use of Google’s image search engine. Enter a phrase and it will return pictures based on what Google associates with each word. Try it for yourself, and when you’re done why not try “a musical equivalent”:https://blog.org/archives/cat_humour_entertainment.html#000939 I found earlier?

18 November 2003
Filed under: Search Engines, Weblogs at 3:30 pm

It has been suggested that because weblogs are highly linked to one another, weblog postings are likely to “dominate Google search results”:http://www.robertkbrown.com/2002/07/16/blogging_killed_the_google_star.html. In July “Microdoc News”:http://microdoc-news.info/ decided to test this and found that for a selection of typical searches weblogs seemed to have little effect. What this didn’t test, however, was whether weblogs dominated the subject areas webloggers were writing about – after all, the discourse of webloggers tends to be concentrated in certain specific areas. I imagine if you searched for the stuff the most prolific webloggers tend to publish about – US politics, for example, or computing – you might still find a lot of weblog entries. Then again, why shouldn’t you?

9 November 2003

As most of you will know by now, Amazon has started enabling people to search for text within 120,000 of its titles and view selected pages from the books – a feature that has inspired some interesting thoughts about where search could go next.

Steven Johnson in Slate suggests you should be able to tell Amazon which books you own and do a search just on those. Amazon would learn what you already have, which it could use to sell you new books, and you would get a search engine covering your paper library.

“Gary Wolf in Wired”:http://www.wired.com/news/print/0,1294,60948,00.html uses the news of the new service to delve into the politics of copyright protection and puts the service into context alongside on-demand book publishing and attempts, like Project Gutenberg, to publish out-of-copyright works for free on the web.

Amazon, in an attempt to calm nervous publishers, “has announced”:http://www.internetnews.com/IAR/article.php/3102731 that sales growth for searchable titles has already outpaced that of non-searchable titles by 9 percent – though “one blogger”:http://scrivenerserror.blogspot.com/2003_10_01_scrivenerserror_archive.html#106764958373017865 has pointed out this could be a one-off novelty effect.

“Steven Kaye”:http://vheissu.typepad.com/about.html has been tracking the Amazon book deal on “his weblog”:http://vheissu.typepad.com/blog/ in more detail. P.S. I had refrained from commenting on this so far because for the moment I am unable to use Amazon’s book search. It turns out (in my case at least) that since I haven’t bought books from the US operation recently they can’t verify my credit card, even though it is valid, and therefore won’t let me see the pages. Frustrating!

Following on from that news, it turns out Google has its own book search plans covering 60,000 titles and is also going to incorporate links to library catalogues. Some two million of the most popular books will be indexed, and readers in North America (and only there for the moment, it seems) will be directed to the nearest library that stocks the book when they enter their postcode.

All of this is very welcome news – there is a lot more “quality” information around in paper form than the Internet alone provides so people should be encouraged to broaden their searches to include books.

3 November 2003

I got talking recently to a guy who runs Internet hotel booking services for a living and naturally I had to ask him how to get cheap hotel rooms online. He suggested two services – Travelaxe (a downloadable app) and “SideStep”:http://www.sidestep.com/ which installs inside your browser (IE or Netscape only, apparently).

I tested these using London as an example over the weekend of Nov 15 to Nov 16, trying to find the cheapest possible rooms. SideStep failed to find the “Atlantic Paddington”:http://www.newatlantic.co.uk/ hotel (really a hostel), which has some of the lowest prices, and neither service lists many hostels or B&Bs – primarily because these are generally not yet ‘plugged in’ to the major reservation systems. As a result you may be better off going to (for example) “ase.net”:http://ase.net/servlet/HotelList?type=8%2C4&dist=2 and contacting the B&Bs and hostels listed there individually.

Still, either may be worth a try if you travel on business and need/want to stay in hotels instead of B&Bs.

P.S. both apps are free of charge and the creators make their money via the commission they get from sending traffic to the hotels when you book through them. As far as I can tell this commission doesn’t raise the price to you…

22 October 2003

If you are in London tomorrow, have a half day free and have £150 (£120 concessions) I encourage you to attend a “half-day workshop”:http://nmk.prismix.com/courses/course.cfm?ItemID=4926 I am running on Internet research methods. It’s not too late to “book”:http://nmk.prismix.com/courses/register.cfm?CourseDateID=120! Assuming all goes well, I hope to do it again – possibly for a full day. Meanwhile take a look at my “search engine category”:https://blog.org/archives/cat_search_engines.html for some of the latest news and my thoughts on the subject.

16 September 2003
Filed under: Search Engines, Software reviews, Weblogs at 12:15 am

I know I am coming late to this but I have finally gotten around to using an RSS reader myself and I have been tweaking my template settings now that I can see what my weblog looks like in that format. You may note I now have a link to “FeedDemon”:http://www.feeddemon.com/ which is the best RSS reader I have found so far and I now have two feeds – one “RSS 1.0 compliant”:https://blog.org/index.rdf and one “RSS 2.0 compliant”:https://blog.org/index.xml. Enjoy!

If you’re wondering what I am talking about, RSS is, “An XML-based format for headline syndication, in which headlines and links to the actual content are made available to other Web sites” (TechEncyclopedia). Interestingly I couldn’t find a definition in the “Foldoc”:http://foldoc.doc.ic.ac.uk/foldoc/index.html tech dictionary or “Whatis.com”:http://whatis.techtarget.com/ which suggests to me this stuff is still not mainstream (though you’d think everyone was using it if you read some weblogs)….
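To make that definition a little more concrete, here is a minimal sketch in Python of how a program might pull the headlines and links out of an RSS 2.0 feed. The feed text, the example.org addresses and the `headlines` function name are all invented for illustration; this is not the feed from this weblog, nor a claim about how FeedDemon or any other reader works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, made up for this example (not a real weblog's feed).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.org/</link>
    <description>A made-up feed</description>
    <item>
      <title>First headline</title>
      <link>http://example.org/archives/1.html</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/archives/2.html</link>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        # findtext returns the default when an item lacks the child element
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# Print each headline followed by the address of the full posting.
for title, link in headlines(SAMPLE_FEED):
    print(title, "->", link)
```

This only handles the un-namespaced RSS 2.0 layout; an RSS 1.0 (RDF) feed puts its elements in XML namespaces, so the same lookups would need namespace-qualified tag names.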

10 September 2003

The “Internet Archive”:http://www.archive.org/ which has an index of 11bn web pages – snapshots of the web at various stages of its development – now has a “search engine”:http://www.archive.org/iathreads/post-view.php?id=8569 covering at least part of the archive. So you don’t need to know the precise address of the web page you had given up for lost (though that function still works). And you can see how the web saw things over time: the supplementary graphs it provides show, for example, when a topic became “hot”.

Thanks to “BoingBoing”:http://boingboing.net/2003_09_01_archive.html#106280030381534395 for the link

5 September 2003

Now that my new book “Managing E-mail”:http://www.amazon.co.uk/exec/obidos/ASIN/1405300264/qid%253D1044801476/davidbrakeswe-21 is out I shall be monitoring its sales progress with interest. I looked around for sites that could help me do this and found three. “Jungle Scan”:http://www.junglescan.com/ lets you keep track of your book’s Amazon rank. “GoogleAlert”:http://www.googlealert.com/ emails you at regular intervals to tell you what has changed in a Google search for a given term (like a book title) so you can see newly-indexed pages about your subject (useful for lots of things besides books!). The third service from “Books & Writers”:http://www.booksandwriters.com/ lets you track both Amazon’s and Barnes and Noble’s sales rankings, but the very week I started to use it they announced they were introducing a charge.
