“Sitting on a gold mine” – improving provision and services for learners by aggregating and using ‘learner behaviour data’

I’m at a workshop today called “Sitting on a gold mine” – improving provision and services for learners by aggregating and using ‘learner behaviour data’ (it rolls off the tongue!), which is part of a wider JISC TILE project looking at, in a nutshell, how we can use data collected from users and user activity to provide useful services, and the issues and challenges involved (and some Library 2.0 concepts as well). As ever, these are just my notes. At some points I took more notes than others, there will be mistakes and I will badly misquote the speakers, so please keep this in mind.

There’s quite a bit of ‘workshop’ discussion coming up, which I’m a little tentative about as I can rant on about many things for hours, but not sure I have a lot of views on this other than ‘this is good stuff’!

Pain Points & Vision – David Kay (TILE)

David gave an overview of the TILE project. Really interesting stuff, lots covered and good use of slides, but quite difficult to get everything down here.

TILE has three objectives

  • Capture scope/scale of Library 2.0
  • Identify significant challenges facing library system developments
  • Propose high level ‘library domain model’ positioning these challenges in the context of library ‘business processes’

You can get context from click streams; this is done by the likes of Amazon and e-music providers.

E.g. First year students searching for Napoleon also borrowed… they downloaded… they rated this resource… etc.

David referred to an idea of Lorcan Dempsey: we get too bogged down by the mechanics of journals and provision without looking at the wider business processes in the new ‘web’ environment.

Four ‘systems’ in the TILE architecture: Library systems (LMS, cross search, ERM), VLE, Repositories and associated content services. We looked at a model of how these systems interact with the user in the middle.

Mark Tool (University of Stirling)

Mark (who used to be based down the road at the University of Brighton) talking about the different systems Stirling (and the other Universities he has worked at) use and how we all don’t really know how users use them. Not just now, but historical trends, e.g. are users using e-books more now than in the past?

These questions are important to lecturers as they point students to resources and systems, but what do users actually use, and how do they use them? It is also a quality issue: are we pointing them to the right resources? Are we getting good value for money, e.g. on licence and staff costs for a VLE?

If we were to look at how different students look at different resources, would we see that ‘high achievers’ use different resources to weaker students? Could/should we point the weaker students to the resources that the former use? Obvious privacy implications.

Also could be of use when looking at new courses and programmes and how to resource them. Nationally, it might help guide us to which resources we should be negotiating for.


  • small crowd -> small dataset  -> can be misleading (one or two people can look like a trend)
  • HEIs are very different from each other.

Thinks we should run some smallish pilots and then validate the data collected by some other means.

Joy Palmer – MIMAS

Will mainly be talking about COPAC, which has done some really interesting stuff recently in opening up their data and APIs (see the COPAC blog).

What COPAC are working on:

  • Googlisation of records (will be available on Google soon)
  • Links to Digital content
  • Service coherency with zetoc and suncat
  • Personalisation tools / APIs
    • ‘My Bibliography’
    • Tagging facilities
    • Recommender functions
    • ummm other stuff I didn’t have time to note
  • Generally moving from a ‘Walled garden’ to something that can be mashed up [good!]

One example of a service from COPAC is ‘My Bibliography’ (or ‘marked list’), which can be exported in the Atom format (which allows it to be used anywhere that takes an Atom feed). These lists will be private by default but could be made public.

Talked about the general direction and ethos of COPAC development with lots of good examples, and the issues involved. One of the slides was titled: From ‘service’ to ‘gravitational hub’, which I liked. She then moved on to her (and MIMAS/COPAC’s) perspective on the issue of using user generated data.

Workshop 1.

[Random notes from the group I was in, mainly the stuff that I agreed with(!); there were three groups.] We talked about whether we should do this, the threats, and which groups of people are affected by those threats. Good discussion. We talked about how these things could be useful, and why some may be averse to or cautious about it (including privacy, and encroaching on others’ areas: IT/library telling academics that what they are recommending to students is not being used, i.e. telling them they are doing it wrong, creates friction). Should we do this? It is a blunt tool, and we may see the wrong trends. But we need to give it a go and see what happens. Is it ‘anti-HE’ to be offering such services (i.e. recommending books)? No no no! Should we leave it to the likes of Google/Amazon? No, this is where the web is going. But there is real world experience of things to be aware of, e.g. a catalogue ranking an edition of a book high due to high usage led to a newer edition being further down the list. [lots more discussion, I forget]

Dave Pattern – Huddersfield.

[Dave is the systems librarian at Huddersfield, who has better ideas than me, then implements them better than I ever could, in a fraction of the time. He’s also a great speaker. I hate him. Check out his annoyingly fantastic blog]

Lots of data is generated just by doing what we and users need to do, and we can dig into this. Dave starts off talking about supermarket loyalty cards. Supermarkets were doing ‘people who bought this also bought’ 10 or more years ago. We can learn from them; we could do this.

We’ve been collecting circ data for years, so why haven’t we done anything (bar real basic stuff) with it?

Borrowing suggestions (people who borrowed this also borrowed): working at Huddersfield, librarians report it working well and suggesting the same books as they would.
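To make the ‘also borrowed’ idea concrete for myself, here is a rough Python sketch. To be clear, this is not Dave’s code or how Huddersfield actually does it, just my guess at the simplest version: count how many borrowers two items have in common. The function name, the `min_borrowers` threshold and the sample item names are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def also_borrowed(loans, min_borrowers=2):
    """Map each item to the other items its borrowers also took out.

    `loans` is an iterable of (borrower_id, item_id) pairs. Item pairs shared
    by fewer than `min_borrowers` distinct people are dropped, so one or two
    borrowers cannot look like a trend. (Hypothetical helper, not a real API.)
    """
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)

    # Count, for every pair of items, how many borrowers took out both.
    pair_counts = Counter()
    for items in items_by_borrower.values():
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    suggestions = defaultdict(list)
    for (a, b), n in pair_counts.items():
        if n >= min_borrowers:
            suggestions[a].append((b, n))
            suggestions[b].append((a, n))

    # Most commonly co-borrowed first, ties broken alphabetically.
    for item in suggestions:
        suggestions[item].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(suggestions)


# Made-up sample data: three borrowers, three items.
sample_loans = [
    ("u1", "napoleon-biography"), ("u1", "napoleonic-wars"),
    ("u2", "napoleon-biography"), ("u2", "napoleonic-wars"),
    ("u3", "napoleon-biography"), ("u3", "french-revolution"),
]
borrowing_suggestions = also_borrowed(sample_loans)
print(borrowing_suggestions["napoleon-biography"])
```

The threshold is there because of the ‘small crowd’ worry raised earlier in the day: with it set to 2, the item only one person borrowed alongside the biography never shows up as a suggestion.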

Personalised suggestions: if you log in, the system looks at what you have borrowed and then at what other items those who borrowed the same things also borrowed.

Lending paths: paths which join books together, potentially to predict what people will borrow and when particular books will be in high demand.
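My own reading of lending paths, sketched in Python: for each borrower, put their loans in date order and count how often one item is followed by another. I’m assuming loans come as (borrower, ISO date, item) tuples; the function names and sample data are invented, and I have no idea whether Dave’s version works like this.

```python
from collections import Counter, defaultdict


def lending_paths(loans):
    """Count how often item B was the next loan after item A for a borrower.

    `loans` is an iterable of (borrower_id, loan_date, item_id) tuples, with
    dates as ISO strings (YYYY-MM-DD) so that sorting them is chronological.
    """
    history = defaultdict(list)
    for borrower, date, item in loans:
        history[borrower].append((date, item))

    transitions = Counter()
    for events in history.values():
        events.sort()  # oldest loan first
        ordered = [item for _, item in events]
        for current, following in zip(ordered, ordered[1:]):
            if current != following:  # ignore renewals of the same item
                transitions[(current, following)] += 1
    return transitions


def likely_next(transitions, item):
    """Return the item most often borrowed straight after `item`, or None."""
    candidates = [(nxt, n) for (cur, nxt), n in transitions.items() if cur == item]
    if not candidates:
        return None
    candidates.sort(key=lambda c: (-c[1], c[0]))
    return candidates[0][0]


# Made-up sample data.
sample_history = [
    ("u1", "2008-10-01", "intro-stats"),
    ("u1", "2008-10-20", "regression"),
    ("u1", "2008-11-15", "time-series"),
    ("u2", "2008-10-03", "intro-stats"),
    ("u2", "2008-10-25", "regression"),
    ("u3", "2008-10-05", "intro-stats"),
    ("u3", "2008-11-01", "time-series"),
]
paths = lending_paths(sample_history)
print(likely_next(paths, "intro-stats"))
```

Predicting when a book will be in demand would need the dates kept in the counts as well; this sketch only covers the ‘what comes next’ half.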

Library catalogue shows some book usage stats when used from a library staff PC (brilliant idea!); this can be broken down by different criteria (e.g. the courses borrowers are on).

Other functionality: Keyword suggestions, Common zero results keywords (eg, newspapermen, asbo, disneyfication). Huddersfield have found digging useful.
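The zero results list looks like the easiest one to try at home. A minimal Python sketch, assuming the catalogue can give you a log of (query, number of hits) pairs; that log format and the function name are my assumptions, not anything Dave described.

```python
from collections import Counter


def zero_result_keywords(search_log, top=10):
    """Return the most common searches that found nothing.

    `search_log` is an iterable of (query, hit_count) pairs. Queries are
    trimmed and lower-cased so 'ASBO ' and 'asbo' count as the same search.
    """
    misses = Counter(
        query.strip().lower() for query, hits in search_log if hits == 0
    )
    return misses.most_common(top)


# Made-up log, using the keywords Dave mentioned.
sample_log = [
    ("asbo", 0),
    ("ASBO ", 0),
    ("disneyfication", 0),
    ("napoleon", 42),
    ("newspapermen", 0),
    ("asbo", 0),
]
common_misses = zero_result_keywords(sample_log)
print(common_misses)
```

A list like this tells you either what to buy or which keywords need a ‘did you mean’ suggestion.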

He’s released XML data of anonymised  circulation data, with approval of the library, for others to play with and hopes other libraries will do the same. (This is a stupidly big announcement, it feels insulting to put it just as one sentence like this, perhaps I should enclose it in the <blink> tag!?) See his blog post.

(note to self, don’t try to download 50mb file via 3g network usb stick – bad things happen to macbook)

Mark van Harmelen

Due to bad things was slightly distracted during part of this talk. Being a man completely failed to multi-task.

This was an excellent talk (at a good level) about how the TILE project is building prototype/real system(s). Some really good models of how this will/could work. So far they have developed harvesting of data from institutions (and COPAC/similar services) and adding ‘group use’ to their database. A searcher known to be a ‘chemistry student’ and ‘third year’ can then get relevant recommendations based on data from the groups they belong to. [I’m not doing this justice, but there were some really good models and examples of this working]
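As far as I followed it, the ‘group use’ idea boils down to something like the Python sketch below: keep usage counts per group, then add up the counts for every group the searcher belongs to. This is only my simplification; the data layout, function name and numbers are all invented, and the real TILE prototype is surely more involved.

```python
from collections import Counter


def group_recommendations(group_usage, searcher_groups, already_borrowed=(), top=3):
    """Rank items by combined usage across the groups a searcher belongs to.

    `group_usage` maps a group name to a {item_id: loan_count} dict.
    Items the searcher has already borrowed are left out.
    """
    scores = Counter()
    for group in searcher_groups:
        for item, count in group_usage.get(group, {}).items():
            if item not in already_borrowed:
                scores[item] += count
    # Highest combined usage first, ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top]]


# Made-up usage data for three groups.
sample_usage = {
    "chemistry": {"organic-chemistry": 40, "physical-chemistry": 25, "lab-safety": 10},
    "third-year": {"dissertation-guide": 30, "physical-chemistry": 20},
    "history": {"napoleon-biography": 50},
}
recommended = group_recommendations(
    sample_usage,
    searcher_groups=["chemistry", "third-year"],
    already_borrowed={"organic-chemistry"},
)
print(recommended)
```

Because only group totals are stored, nothing in here needs to know what any individual borrowed, which seems relevant to the privacy worries that kept coming up.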

David Jennings – Music Recommender systems

First off refers to the Googlezon film (never heard of this before) and the idea of big brother in the private sector, and moves on and talks about (concept of) ipods which predict the music you want to hear next based on your mood and even matchmaking based on how you react to music.

Discovery: We search, we browse, we wait for things to come along, we follow others, we avoid things everyone else listens to, etc.

Talking about Flickr’s (not published) popularity ranking as a way to bring things to the front based on views, comments, tags etc.

Workshop 2:

Some random comments and notes from the second discussion session (from all groups)

One University’s experience was that just ‘putting it out there’ didn’t work: no one added tags to the catalogue. The conclusion was the need for a community.

Cold start problem: new content does not surface with the sort of things being discussed here, as it has no usage data yet.

Is a Subject Librarian’s (or researcher’s) recommendation of the same value as an undergrad’s?

Will Library Directors agree to library data being released in the same way as Huddersfield, even though it is anonymised? They may fear the risks and issues that it could result in, even if we/they are not sure what those risks are (will an academic take issue with a certain aspect of the released data?).

At a national level, if academics used these services to create reading lists, it may result in homogenisation of teaching across the UK. There is also a risk of students’ reading focusing on a small group of items/books; we could end up with four books per subject!


This was an excellent event, and clearly some good and exciting work is taking place. What are my personal thoughts?…

This is one of those things that once you get discussing it you’re never quite sure why it hasn’t already been done, especially with circulation data. There’s a wide scope, from local library services (book recommendation) to national systems which use data from VLEs, registry systems and library systems. A lot of potential functionality, both in terms of direct user services and informing HE (and others) to help them make decisions and tailor services for users.

Challenges include: privacy, copyright, resourcing (money) and the uncertainty of (and aversion to) change. The last one includes a multitude of issues: will making data available to others lead to a budget reduction for a particular department, will it create friction between different groups (e.g. between academics and central services such as Libraries and IT)?

Perhaps the biggest fear is not knowing what demons this will release. If you are a Library Director, and you authorise your organisation’s data to be made available – or the introduction of a service such as the ones discussed today – how will it come back to haunt you in the future? Will it lead to your institution making (negative) headlines? Will a system/service supplier sue you for giving away ‘their’ data?  Will academics turn on you in Senate for releasing data that puts them in a bad light? ‘Data’ always has more complex issues than ‘services’.

In HE (and I say this more after talking to various people at different institutions over the last few years) we are sometimes too fearful of the 20% instead of thinking about the 80% (or is that more like 5%/95%?). We will always get complaints about new services and especially about changes. No one contacts you when you are doing well (how many people contact Tesco to tell them they have allocated the perfect amount of shelf space to bacon?!). We must not let complaints dictate how we do things or how we allocate time (though of course we should not ignore them; relevant points can often be found).

Large organisations – both public and private – can be well known for being inflexible. But for initiatives like this (and those in the future) to have a better chance of succeeding we need to look at how we can bring down the barriers to change. This is too big an issue to get into here, and the reasons are both big and many, from too many stakeholders requiring approval to a ‘wait until the summer vacation’ philosophy, from long term budget planning to knock-on effects across the organisation (change in department A means training/documentation/website of Department B needs to be changed first). Hmmmm, I seem to have moved away from TILE and on to a general rant offending the entire UK HE sector!

Thinking about Dave Pattern’s announcement, what will it take for other libraries to follow? First, techy stuff, he has (I think) created his own XML schema (is that the right term?) and will be working on an API to access the data. The bad thing would be for a committee to take this and spend years to finally ‘approve’ it. The Good thing would be for a few metadata/XML type people to suggest minor changes (if any) and endorse it as quickly as possible (which is no disrespect to Dave). Example: will the use of UCAS codes be a barrier for international adoption (can’t see why, just thinking out loud). There was concern at the event that some Library Directors would be cautious in approving such things. This is perhaps understandable. However, I have to say I don’t even know who the Director of Huddersfield Information Services is, but my respect for the institution and the person in that role goes about as high as it will go when they do things like this. They have taken a risk, taken the initiative and been the first to do something (to the best of my knowledge) worldwide. I will buy them a beer should I ever meet them!

I’ll be watching any developments (and chatter) that result from this announcement, and thinking about how we can support/implement such an initiative here. In theory once (programming) scripts have been written for a library system, it should be fairly trivial to port them to other customers of the same software (work will probably include mapping departments to UCAS codes, and the way user affiliation to departments is stored may vary between Universities). Perhaps Universities could club together to work on creating the code required? I’m writing this a few hours after Dave made his announcement and already his blog article has many trackbacks and comments.

So in final, final conclusion. A good day, with good speakers and a good group of attendees from mixed backgrounds. Will watch developments with interest.

[First blog post using WordPress 2.7, other blogs covering are Phil’s CETIS blog, and Dave Pattern has another blog entry on his talk. If you have written anything on this event then please let me know!]