Library search/discovery apps : intro

There’s a lot of talk in the Library world about ‘next generation catalogues’, library search tools and ‘discovery’. There’s good reason for this talk: in this domain the world has been turned on its head.

History in a nutshell:

  • The card catalogue became the online catalogue, which let users search for physical items within the Library.
  • Journals became online journals. Libraries needed to let users find the online journals they subscribed to through various large and small publishers and hosts. They built simple in-house databases (containing journal titles and links to their homepages), or added them to the catalogue, or used a third party web based tool. As the number of e-journals grew, most ended up using the last option, a third party tool (which could offer other services, such as link resolving, and do much of the heavy lifting with managing a knowledge base).
  • But users wanted one place to search. Quite understandable. If you are after a journal, why should you have to look in one place to see if there is a physical copy, and in another to see if the Library has access to it online? Same with books/ebooks.
  • So libraries started to try to find ways to add online content in bulk to catalogue systems (which weren’t really designed for this).

Image: Aquabrowser, Uni Sussex beta catalogue

The online catalogues (OPAC) were simple web interfaces supplied with the much larger Library management system (ILS or LMS) which ran the backend the public never saw. These were nearly always slow, ugly, unloved and not very useful.

A couple of years ago(ish), we saw the birth of the next generation catalogue, or search and discovery tools. I could list them, but the Disruptive Technology Library Jester does an excellent job here. I strongly suggest you take a look.

Personally, I think the first one I heard about was Aquabrowser. At the time it was a new OPAC which was miles ahead of those supplied with Library systems and was (I think) unique as a web catalogue interface not associated with a particular system and, shock, not from an established Library company. The second system I heard about was probably Primo from Ex Libris. At first I didn’t understand what it was: it sounded like Metalib (another product from the same company, which cross-searches various e-resources), so was Primo replacing it? Or replacing the OPAC? It took a while to appreciate that this was something that sat on top of the rest. After that came VuFind, LibraryFind and more.

While some were traditional commercial products (Primo, Encore, Aquabrowser), many more were open source solutions, a number of which were developed at American libraries. They were often built on common (and modern) technology stacks such as Apache Solr/Lucene, Drupal, PHP/Java, MySQL/Postgres etc.

Image: Primo, British Library

In the last year or so a number of major Libraries have started to use one of these ‘Discovery Systems’. For example, the BL and Oxford are using Primo, the National Libraries of Scotland & Wales and Harvard have purchased Aquabrowser, and the LSE is trying VuFind. At Sussex (where I work) we have purchased and implemented Aquabrowser. We’ve added data enrichments such as tables of contents (searchable and visible on records), book covers and the ability to tag and review items (tagging/reviewing has since been removed for various reasons).

It would be a mistake to put all of these into one basket. Some focus on being an OPAC replacement, others on being a unified search tool, searching both local and online items. Some focus on social tools, tagging & reviewing. Some work out of the box, others are just a set of components which a Library can sew together, and some are ‘SaaS’.

It’s an area that is fast changing. Just recently an established Library web app Company announced a forthcoming product called ‘Summon’, which takes searching a library’s online content a step further.

So what do libraries go for? The risk is not just backing the wrong horse, but backing the wrong horse when everyone else has moved on to dog racing!

And within all this it is important to remember ‘what do users actually want’. From the conversations and articles I’ve read, they want a Google search box, but one which returns results from trusted sources and academic content. Whether they are looking for a specific book, specific journal, a reference/citation, or one/many keywords. And not just one which searches the metadata, but one which brings back results based on the full text of items as well. There are some that worry that too many results are confusing. As Google proves, an intelligent ranking system makes the number of results irrelevant.
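To make the ranking point a little more concrete, here is a minimal Python sketch of a single search box that scores matches in both the metadata and the full text. The field names, the weights and the sample records are all invented for illustration; this is not how any of the products mentioned here actually rank results.

```python
import re
from collections import Counter

# Hypothetical weights: a hit in the title counts for more than a hit
# buried in the full text. Real systems tune this far more carefully.
FIELD_WEIGHTS = {"title": 5.0, "author": 3.0, "full_text": 1.0}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record, query):
    """Sum the weighted number of query-term occurrences in each field."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(record.get(field, "")))
        total += weight * sum(counts[term] for term in terms)
    return total


def search(records, query, limit=10):
    """Return matching records, best score first."""
    scored = [(score(record, query), record) for record in records]
    hits = [(s, record) for s, record in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in hits[:limit]]


# Made-up sample records.
RECORDS = [
    {"id": "b1", "title": "Introduction to Climate Change",
     "author": "A. Smith",
     "full_text": "An overview of weather and the climate."},
    {"id": "b2", "title": "Ocean Currents",
     "author": "B. Jones",
     "full_text": "Currents respond to climate change, and climate change alters salinity."},
    {"id": "b3", "title": "Medieval Poetry",
     "author": "C. Brown",
     "full_text": "Verse forms of the twelfth century."},
]

if __name__ == "__main__":
    for record in search(RECORDS, "climate change"):
        print(record["id"], record["title"])
```

In this toy data the second record is only found because its full text is searched, and the ordering means the user sees the most likely item first however many results come back.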

Setting up (and even reviewing) most of these systems takes time, and if users start to add data (tags, reviews) to one system, then changing could cause problems (so should we be using third party tag/rating/review systems?).

You may be interested in some other articles I’ve written around this:

There’s a lot of talk about discovery tools, but what sort to go for, who to back? And many issues have yet to be resolved. I’ll come on to those next…

01
May 27th, 2009 5:16 pm

Wow. Very useful for me. I’m doing a part-time PhD on quantifying the value of OPACs using stated preference methods, e.g. the value patrons place on enhanced content (covers, TOCs), faceted browsing, relevance ranking, tagging etc. Might blog about this sometime.

Your posts are interesting enough with the examples. Also they remind me that while to the user Encore (we use this)/Primo etc are replacing the OPAC, technically they are sitting on top.

Don’t quite understand Summon. It’s not a federated search, so does that mean it indexes resources like Google’s crawlers?

02
May 27th, 2009 9:50 pm

Great summary, Chris!

Aaron — Summon is sort of like Google in that it is a large index of just about everything, but it differs in the way it gets the information to index. As you noted, Google spiders the web for content. Summon is more like OCLC or Ebsco or ProQuest — information vendors give Serials Solutions a dump of information and SerSol adds it to their Summon index.
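The difference between a federated search and a central index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vendor names and records are invented, the "remote" sources are just in-memory lists, and it says nothing about how Summon is actually built.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return the set of lower-cased word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical data dumps handed over by two information vendors.
VENDOR_DUMPS = {
    "vendor_a": [
        {"id": "a1", "title": "Library discovery tools"},
        {"id": "a2", "title": "Cataloguing rules"},
    ],
    "vendor_b": [
        {"id": "b1", "title": "Discovery layers in academic libraries"},
    ],
}


def federated_search(sources, query):
    """Federated approach: ask every source at search time, then merge.

    In a real system each inner loop would be a slow remote call.
    """
    terms = tokenize(query)
    results = []
    for records in sources.values():
        for record in records:
            if terms <= tokenize(record["title"]):
                results.append(record["id"])
    return sorted(results)


def build_central_index(sources):
    """Central approach: load all dumps once into one inverted index."""
    index = defaultdict(set)
    for records in sources.values():
        for record in records:
            for term in tokenize(record["title"]):
                index[term].add(record["id"])
    return index


def central_search(index, query):
    """Search the prebuilt index; no source is contacted at search time."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    index = build_central_index(VENDOR_DUMPS)
    print(federated_search(VENDOR_DUMPS, "discovery"))
    print(central_search(index, "discovery"))
```

Both functions find the same records here; the difference is when the work is done. The central index pays the cost up front when the dumps are loaded, so a search only touches one local structure.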

03
May 27th, 2009 10:54 pm

Chris, nice overview. See also http://cccu-lib-tech.blogspot.com/2009/04/primo-alternatives.html for a list of tools (by Andy Ekins).
Couple of issues:
- All these discovery tools are not only OPAC replacements, as you rightly remark.
- These tools use harvesting and create large local indexes, even from your own local catalogue (a number of the tools in the DLTJ list are not discovery tools); so basically no distributed metasearch (although they may offer metasearch “addons” like Primo)
- I personally think discovery tools are best implemented in a consortia environment or some other cooperation, not by a single library (not sure how you implemented Aquabrowser at Sussex)
- Obviously, the need to create large “local” indexes of everything is an indication of the fact that the administrative backends of our OPACs and other Library Systems are not fit for the new digital “cloud” area; we need NOT ONLY sexy Front Ends, but first efficient back ends
- The important thing here is the changing nature of the “collection” concept: no longer local physical collections, but global digital (see my post “Collection 2.0” ( http://commonplace.net/2009/02/collection-20/ ))

04

[...] of next generation catalogues is a blog post by Chris Keene from the University of Sussex called Library search / discovery apps: intro, which is very up to date, having been written a couple of days [...]

05
November 24th, 2009 5:53 pm

[...] The OPACs of the future (even though it was in May), trends in digital reference and some examples of [...]

06
May 26th, 2010 8:03 am

@lukask; Harvesting is an easy word for the processes of data integration certainly necessary for a thing like Summon or Primo Central (we do similar things with data we get from many different sources (like publishers), and it is really dirty work). You know for sure how ugly mapping of metasearch results is, now think you don’t only want to display some data, but want to normalize, consolidate and enrich it for indexing…
I guess, that’s why this isn’t done locally in Summon and Primo Central. With both products you get a search index that is being maintained and run centrally by the vendor.
The main difference between Summon and Primo Central in my view and understanding: With Summon they sell the central search index, give you a default user interface for free and an API to use this search index wherever you want (e.g. in other discovery frontends like VuFind, which has a Summon connector). Primo Central is tightly bound to Primo as frontend; they sell Primo and give you Primo Central (free at least for the first year as far as I understand). But there is no option to use another frontend with Primo Central.
The consortia solution is obvious, of course. Our consortium is doing central data aggregation from many sources already for a long time, but we haven’t got good user frontends to that central data pool…
Let me dream a bit: A consortium index having all local holdings of member libraries, add the data there is in Summon or Primo Central (that’s the hard part), provide individual, flexible views for each library on that index, and add all the needed services like local loan, ILL and access to electronic resources. And all that available through one single customizable (for each library) but usable frontend (and maybe add an API for those that want to build their own UIs). All technology to do that is available (even as OSS), it’s just a question of doing it. The ugly parts are data integration and interfaces to existing technical infrastructure (ILSes, ILL, …), but that’s feasible with some effort…
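As a rough illustration of the data integration step described above, the Python sketch below maps records from two differently shaped sources onto one schema, consolidates them by ISBN and then gives each library its own view of the shared index. Every field name, source format, ISBN and library code here is hypothetical; real feeds are far messier and matching on ISBN alone is a simplification.

```python
import re


def normalize_isbn(raw):
    """Strip hyphens and spaces so the same ISBN always compares equal."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()


def from_publisher_feed(row):
    """Map a (hypothetical) publisher row onto the common schema."""
    return {
        "isbn": normalize_isbn(row["ISBN"]),
        "title": row["Title"].strip(),
        "subjects": set(row.get("Keywords", "").split(";")) - {""},
        "holdings": set(),
    }


def from_library_catalogue(row):
    """Map a (hypothetical) library catalogue row onto the common schema."""
    return {
        "isbn": normalize_isbn(row["isbn"]),
        "title": row["title_statement"].rstrip(" /").strip(),
        "subjects": set(),
        "holdings": {row["library"]},
    }


def consolidate(records):
    """Merge records sharing an ISBN, pooling subjects and holdings."""
    merged = {}
    for record in records:
        existing = merged.get(record["isbn"])
        if existing is None:
            merged[record["isbn"]] = {
                **record,
                "subjects": set(record["subjects"]),
                "holdings": set(record["holdings"]),
            }
        else:
            existing["subjects"] |= record["subjects"]
            existing["holdings"] |= record["holdings"]
    return merged


def library_view(merged, library):
    """An individual library's view: only the items it holds."""
    return [r for r in merged.values() if library in r["holdings"]]


# Made-up input rows from the two sources.
PUBLISHER_ROWS = [
    {"ISBN": "978-0-00-000000-2", "Title": " Discovery Systems ",
     "Keywords": "search;discovery"},
]
CATALOGUE_ROWS = [
    {"isbn": "9780000000002", "title_statement": "Discovery systems /",
     "library": "sussex"},
    {"isbn": "9780000000002", "title_statement": "Discovery systems /",
     "library": "lse"},
    {"isbn": "9780000000019", "title_statement": "Card catalogues /",
     "library": "lse"},
]

if __name__ == "__main__":
    records = [from_publisher_feed(r) for r in PUBLISHER_ROWS]
    records += [from_library_catalogue(r) for r in CATALOGUE_ROWS]
    merged = consolidate(records)
    for record in library_view(merged, "sussex"):
        print(record["isbn"], record["title"], sorted(record["subjects"]))
```

The consolidated record carries the publisher's subject keywords as enrichment and the holdings of every member library, so one index can serve both the shared pool and each library's own view.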