nostuff.org

…living up to its name

A little run

I’ll try and keep this short. Since leaving University 13 years ago I’ve been getting fatter, and generally not very healthy. It would be fair to say I started running to reduce the beer belly.

Getting the runkeeper app was a key development; my endeavours went from occasionally trotting aimlessly for a little while (then feeling very smug for the next month or so before doing it again) – to actually trying to improve times, distance and improve on routes taken. Stats and Maps. Pure crack.

I did a couple of runs (10k) in 2011, and a half marathon in early 2012. This felt good, and it followed my simple logic that the further I run the more beer belly is burnt off. The day after the 2012 Brighton Marathon I signed up for 2013. Stupid Boy.

It turns out a Marathon is more than just two half marathons. Once you’ve got used to the distance of a half marathon, you can pretty much do it as required.

Marathons are different. Unless you are a freak (or ‘athlete’ as some people call them) your body cannot store enough energy (carbs, glycogen) for a whole marathon, even with carb loading beforehand. Running out is what they call hitting the wall. Walking or resting will make no difference: like a battery, if you are out of energy you’re stuffed.

As I’ve trained for the marathon, I’ve noticed this a lot: things get tougher soon after 14 or 15 miles. Every step is painful; a slight change in step (a curb, twisting your head to see if it is clear to cross a road) is painful. Stopping is painful. Starting again is almost impossible.

With the bad weather this year I’ve done a number of long runs in the dark, cold, rain and – above all – wind. It’s oddly lonely, you can start running in rush hour and finish when people are going to bed. A number of times I’ve not reached my planned distance.

Most training plans suggest doing a few runs of about 20 miles up until two or three weeks before the marathon. This I have done. The last and furthest in particular was hard: I hit 20 miles some way from home but had to instantly stop, and then take tiny slow steps back to my flat (it was very cold, very wet, very windy, woe is me). How could I do 26 more?

Tomorrow I find out. I haven’t been perfect, I haven’t been out 6 times a week like many training plans suggest, and haven’t worked out the exact amount of carbs I should be eating each day or anything like that. And I’m afraid right up until the last week I was eating and (plenty) drinking.

I want to finish this. I have no idea how I will do. If I can do 4 hours 40 mins I will be happy (for reference I can do a half marathon in under 2 hours). These last two weeks have been odd: I feel like I’ve lost all I could do two weeks ago, and I’m pretty sure tomorrow I will run a few miles and then want to stop with a stitch.

If you fancy it, and only if you do (no pressure), please do sponsor me a small amount. Everything is very much appreciated, and I’ve found it very touching to see all the people who have done so so far.
http://uk.virginmoneygiving.com/ChrisKeene

Dev8D

Tomorrow, Tuesday 14th February, is the start of Dev8D 2012, a three day event for developers working in UK Higher Education. I am not really one, but don’t tell anybody and we should be ok. It’s an event that is on steroids: last time I went, my brain was pounded with more information to digest in 30 minutes than I would normally expect to receive in an entire day. It’s also bloody fantastic. I’m chuffed I can go this year.

The organisers have once again predicted the sessions I would like to attend and deliberately made sure they all clash. The Government should put a stop to this. Last time I tried to counter this by walking between the sessions of interest hoping to benefit from both. This did not work.

Had I mentioned I’m not a developer? Yes. But I’m very aware that whereas developers a few years ago were talking LAMP (Linux, Apache, MySQL, Perl/PHP/Python), they are now talking almost anything but. The OS is of almost zero importance, so too is the web server (unless it is node.js it seems). MySQL is the uncool one in the corner while hipsters MongoDB, CouchDB, Solr and Redis are on the dance floor. Meanwhile never-cool PHP has become uncooler, Perl has been forgotten, and Groovy, Ruby (on Rails, of course), Scala, Erlang, Haskell, Clojure and R are where it’s at. This article from Simon Willison sums it up nicely. If I walk away with half a clue as to what some of these technologies are useful for, and how I might use them, then job well done.

Anyway… My Plan:

  • Tuesday morning: Python. But I also want to attend the Git thing, and HTML5 looks to be interesting
  • Noon: Getting Ready to build location aware apps, AND/OR CKAN (a data portal)
  • PM: Consuming Linked Data, but also really want to hear about Redis.
  • Wed AM: JavaScript and jQuery (I confess I now can’t really do the former without the latter), but Moodle plugins are also calling my name.
  • data.*.ac.uk panel could be good (as I run data.lib.sussex.ac.uk)
  • PM: CoffeeScript.
  • Thurs AM: Ebook/epub technologies, but also XCRI (a standard for course descriptions) could be useful
  • PM: Visualising your Data Surgery, and then Indexing and Elastic Search.

I am also expecting to consume: (a) Beer (b) Gin. And maybe even interact with people in some sort of attempt at (a) being social (b) networking. If you see me there, say hi.

PS it is against the rules to quiz me after the event about any of the items above.

Meta : blogging

There’s a long tradition of bloggers blogging about blogging. This post follows in the self-obsessed insular tradition.

This blog post is somewhat unusual in that I’m posting it to nostuff, my blog. I’ve been using posterous a bit of late and like it a lot. Posterous is also set up to post to a WP instance on nostuff (http://www.nostuff.org/posterous/). It was set up as a way to archive an externally hosted service, but the end result is very usable (and seems to rank quite highly in Google).

Why am I using Posterous more, and this blog less? A number of Posterous posts have started off as ‘I can’t quite fit this in to 140 characters on twitter so I’ll use Posterous’, and I normally write much, much more than I intend.

I also find using gmail as a post creator rather nice to use, it makes me focus on writing as it doesn’t support any fancy formatting or blog specific features.

In fact the composition window of WordPress has always been its weak point. Does this put me off using it? Even though the WP developers have put a lot in to the interface, it is at the end of the day, a TinyMCE (or similar) WYSIWYG editor. The text box for which always seems a little small to me. And it feels a little like editing a form. I wish it looked more like a Google Doc, taking up nearly all the screen with the text editor, large text, and excellent ‘constant save’ / view changes support.

What’s not helpful is that my blog is hosted in the States, on a server that isn’t always as responsive as it should be, so the experience feels slower than using gmail. Finally, the categories, tags, permalink etc all make blogging feel like a ‘heavy’ experience, even though – of course – I’m free to ignore them. The simplicity of posterous was liberating.

So, for this post, and others recently to this blog, I’ve used a client called ecto to compose it, and at the end post it to WordPress. There is something ironic in doing just about everything in a browser except composing something that is so at its heart a web-based thing, a blog. You could argue that writing (relatively) long bits of text is better suited to local apps than web apps, but this doesn’t explain why I head for Google Docs and Google spreadsheets by default rather than Word or Excel. I selected ecto many years ago after trying it and MarsEdit. I can’t help thinking I made a bad choice as ecto has not since had a single update and MarsEdit has gone from strength to strength. Still, while it works I shall resist paying the £28 for MarsEdit. This is a rare area where the Microsoft alternative is better and free.

One to(o) Many

When I started out with this blog many years ago, I thought I was in the same boat as many of my peers. Over time I’ve noticed that I’m quite rare in that nostuff.org/words is an ‘anything goes’ blog – most are either work/professional related or of a particular interest. In fact very few of the blogs I follow are of a general style such as this (Dave Pattern and Tom Roper are two examples I can think of which buck the trend). Many people seem to have a professional blog, perhaps a specialist blog (cooking, running, knitting, etc) and increasingly perhaps a tumblr for random stuff.

I’m rather keen to keep the general feel. I know I’m guilty in so many ways of letting work and personal intermingle, but you can categorise blog posts (and probably twitter too) as one of: I’ve done something I want to share; I’ve got a view/opinion on something; and the third, related to the second, I want to reflect on something (which this post probably falls under). I like the idea of this blog reflecting those things no matter what subject they are about. This space is a dump of my thoughts and things worth sharing. I prefer to let categories and tags provide a way of filtering should people only be interested in one subject. Of course, there’s an argument that you may want to stop people in a professional network (I hate that phrase) from seeing your thoughts and rantings outside of work. It’s a very good argument, but so far I’ve resisted changing what I do because of it.

Of course, the idea that this is the place for the thoughts and outputs of Chris Keene is nonsense. As mentioned above, I also dump stuff on posterous. I’m using Google+ more, there’s flickr, youtube, comments on other blogs and most of all twitter. There’s no easy answer to this, I’ve gone away from the route of adding a lot of plugins from other sites to the sides of this blog, it makes things look messy, but it does leave a hole through my general ‘this is my dumping ground’ philosophy.

Blog TLC

I haven’t done much with this blog over the last year though I do have some things on my mind.

  • I have regular plans to move to a new theme. I like many of the very stripped down themes now out there including this one and the default Twenty Ten theme. However, somewhat cynically, I like the fact that the current theme gives it a somewhat unique feel. I like to feel that someone somewhere comes back after a year and thinks ‘oh yes, the green one, I’ve been here before’. And while it has failings (the left hand side is far from perfect, and it relies too much on images for the background shading) I have put effort in to it over the years, including a first stab at converting it to html5 last year. All in all, I’m going to hold on to it a little longer.
  • I’ve just added some social media buttons, ‘plus ones’ for Facebook/Google+ and a tweet button. They need some customising and are currently a little large. I’ve found I use these a bit on other people’s sites as a simple tip of my hat that I have read and enjoyed the post. A simple ‘hit’ within web stats just doesn’t convey this. So while they are clutter, and somewhat ugly clutter at that, I hope people might take the time to hit them, using whichever network they like to use, if they enjoy reading something.
  • I’m going to re-do Categories. I’m going to base them on potential categories of reader. Something like this: Library Technology, Technology, Libraries (general), Politics, Brighton (and Sussex), UK. I may also include some meta categories: essay, short, interesting (for those I think worth highlighting) and me. Of course each post can belong to many categories. Some of the current categories date back to when this was the place to share links.
  • Consider setting up my posterous to post to this blog, either automatically or when I flag it to do so. Some of the stuff I have posted there deserves to be here (which I see more as a permanent record).
  • Re-think the front page of the blog. Is the most recent blog post, in full, the best thing to see?
  • I find that on landing on to someone else’s blog (via Google or twitter) I want to know a bit about them, for example what they are saying may have a different meaning based on where they live, or what they claim to specialise in. So I may beef up the brief blurb at the top left of the page.

I have one final idea but it will take more than a bullet point to explain.

For a long time I have felt that Blog comments leave me wanting, on my own and other sites. I only get to see those who commented before me, I probably won’t see those left after mine, people who already have commented will not see mine etc. If I write a long comment with some good points, I have no way of recording that comment, i.e. there’s no way to see a list of all the comments I’ve left on other sites. Wanting to refer to an old comment of mine relies on me remembering which blog and post it was connected to. Finally, managing comments on your own blog can be hard work, even with the impressive – free – WordPress spam plugin.

What’s more, I see an increasing number of blog sites either just keep comments turned off (unless you’re very popular, commenting is rare) or use an external commenting solution such as Disqus.

I’d like to use Google+ as my comment system. And I don’t think this is currently possible.

I think Google+ would make an excellent platform for commenting (so would friendfeed, which it is almost identical to). Everyone can see a public post. All comments are listed together under the link to the post on Google+, and everyone can see every comment even if they don’t follow the other person (unlike twitter and, probably (it’s too hard to understand) Facebook). You can come back days later and easily see new comments. If you didn’t see the post you can glance at the comments to see if it is of interest. If you are not interested you can just skip past the post in your stream (the comments will be wrapped up so it won’t take much space). Facebook is too much centred around a closed set of people you follow. Twitter is very much for the here and now, miss it and it’s gone.

As much as I’m a fan of twitter, it does have flaws (by design not implementation). As noted above, it’s easy to miss things. I sometimes go back in the timeline and come across a really useful conversation that I could have missed. What’s more, I know that I may see person A’s original tweet, and the conversation that persons B and C (who I also follow) have with A about it. What I miss (and remember I’m lucky I bumped in to this) is persons D and E commenting on it with A, which I don’t see as I don’t follow them. Nor do I see person A and person B replying back, and hence I might miss the bulk of the conversation. What’s more, I miss F, who follows C, joining in, which starts a whole new track with others! I miss all of this; in fact no one is guaranteed to see all of it. Meanwhile those who are totally uninterested are getting bored of seeing these tweets about a specific (and probably quite anal) topic.
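The missed-conversation problem above can be sketched in a few lines of Python. This is only a rough model: the visibility rule (a reply shows up in your timeline only if you follow both the person who wrote it and the person it is addressed to) is an assumption made for illustration, as are the names and the follow graph. It is not a description of how twitter actually decides what to show.

```python
# Hypothetical model of the conversation described above.
# The visibility rule, the names and the follow graph are all
# illustrative assumptions, not twitter's real behaviour or API.

def visible_in_timeline(viewer, tweet, follows):
    """Return True if `viewer` would see `tweet` in their timeline."""
    author, reply_to = tweet
    following = follows.get(viewer, set())
    # You only see tweets written by yourself or by people you follow.
    if author != viewer and author not in following:
        return False
    # An original tweet (no addressee) is seen by all of the author's followers.
    if reply_to is None:
        return True
    # Assumed rule: a reply is shown only if you also follow the addressee.
    return reply_to == viewer or reply_to in following


# "me" follows A, B and C; F follows only C.
follows = {
    "me": {"A", "B", "C"},
    "F": {"C"},
}

# Each tweet is (author, person replied to or None).
conversation = [
    ("A", None),  # A's original tweet
    ("B", "A"),   # B and C discuss it with A
    ("C", "A"),
    ("D", "A"),   # D and E comment, but I don't follow them
    ("E", "A"),
    ("A", "D"),   # A and B reply back to D and E
    ("B", "E"),
    ("F", "C"),   # F joins in via C and starts a new track
]

seen = [t for t in conversation if visible_in_timeline("me", t, follows)]
missed = [t for t in conversation if t not in seen]

print(f"seen {len(seen)} of {len(conversation)} tweets")
for author, reply_to in missed:
    print(f"missed: {author} -> {reply_to}")
```

Under these assumptions "me" sees only the original tweet and the two replies from B and C, and misses the other five, which is the "bulk of the conversation" complaint in miniature.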

So I would love for my blog to autopost to Google+ and then use the Google+ post as the place to comment, ideally showing it below the blog post. In a similar way to Techcrunch showing Facebook messages under its posts (but why a tech site uses Facebook, hated by much of the geek community, is beyond me… but then it is Techcrunch).

I still feel having a blog is useful. I’ve never attempted to update it regularly – nor have I ever understood the idea that there is pressure to post on a regular basis.

To an extent, I don’t see a difference between ‘maintaining a personal website’ (which was what we did before blogging) and keeping a blog. Occasionally you have an idea to create a page about something, blog software just makes the process easier. Posts are just (the new) pages.

In my mind an online presence – a website and domain – is essential to those who spend much time on the web. Simply having a series of profiles on popular websites just doesn’t seem the same. And words on nostuff.org continues to be the main part of the nostuff.org content-free experience, living up to its name.

Update: Feel free to leave comments here :)

hifi

I’ve got the other place (here or here) for random ill-thought musings. But today I decided to put one here. Hi-Fi

<vaguely interesting background story with slight element of personal touch>

I left University in 1999. Having been in full time work before I had even finished my last term, and living in cheap shared accommodation, I could splash out. But being somewhat conservative (small ‘c’, you see that? SMALL ‘C’. I mean ‘c’. Just want us to be clear on this ok) I waited a year, before buying the hi-fi I had seen and desperately wanted for a massive £280.

Now, should you have been looking at purchasing a stereo/hi-fi (what do we call them nowadays?) around this time, which I was, clearly, then you will be aware of just how vile they all were. Bulging like their biceps were about to explode (pedants will argue about their lack of biceps). The one pictured below is quite a modest example, laziness stopped me from finding a more accurate example.

The issue was not that these existed, I can understand there is a market for them, just like there is one for JD Sports. But that they dominated the market so. Walk in to a Comet or Currys and aisle after aisle was full of them. If they had 30 models on display then only one would be purchasable by sane persons.

[an aside: it looks like with the demise of (a) vinyl (b) tapes (c) club culture, design has moved on and most hi-fis on sale today have completely different dimensions. A good example is this from Onkyo, which is excellent and recommended]

But I had found one that was above all this, minimalistic in design, always good in my book, and stunning on the glass and wood stand it was displayed on where I first saw it (which was Dixons, yes I know). It was the Pioneer NS-9.

It consisted of one small unit, which was basically the hifi, a separate display, two small speakers and a woofer. It looked and sounded great. The front of the display – with all the buttons – could actually come off and act as a remote but this was more or less unworkable. The UI was awful, trying to set sleep mode or retune a preset radio setting still requires the manual.

The two small speakers and a bass combination was unusual and worked well. My flat mates and near neighbours through the years will attest to the kick arse bass it produced (literally as I type this, Open Up from Leftfield is playing).

It was the first, and still inexcusably very rare, system with FM/RDS I saw that showed not just radio station name but extended information, such as the show, DJ, and maybe even the song that was playing, depending what the station sent out. This was way before DAB radio, and at the time the most you would normally see was the station name. I must have been one of the few reading the text that was being transmitted. I remember the Radio 1 Top 40 would show the song name and artist (and position in the chart), which sometimes was displayed before the song was announced on air. It was as if I was tapping in to some secret message no one else had access to.

Anyway, apart from the woofer, separate display and extra programme information, it was just a hifi, I still can’t quite believe I’m writing a post about it.

<finally getting around to the point of the post>

But I’m not writing about it.

For years I’ve been glancing an eye at a Hi-Fi separates system. Now isn’t the time for me to be buying one having just bought a property, but I’ve been keeping a look out for something that can replace my stereo. The CD player isn’t what it once was, and a few other niggles remain.

It started when I was a kid. I had a Matsui (Dixons’ home brand, made in Wales I think, but with a reassuring foreign name) cheap hifi. Why would anyone pay any more than this? I can turn it up and the treble sounds high and the bass is low. It all sounds clear. What more could you want? One day, on a “I need to get out the house on a Saturday but I don’t know what to do so I’ll go to HMV and look at all the albums I could buy, again” I was in, well, HMV and Blue Monday came on. This was a time when HMV was a music shop that sold a few movies in the corner. Their sound system was amazing. I didn’t know what they all did but the black boxes went from floor to ceiling.

The song blew me away. Consumed me. Took me over. And I had it at home! I walked back to my parents’ house and put it straight on. It didn’t give me the same feeling. That was when I realised what a good sound system could do. And though the sound my system made sounded ‘ok’ it just didn’t have that magic. It sounded flat.

But how often do I use the CD player now? Not enough. Spotify more than anything dominates now. This isn’t always healthy. I can never settle on just playing an album, and spend forever playing favourite tracks. Anything that doesn’t grab my attention is skipped. And I know this is bad.

With audiophiles preaching about the importance of cable, good source, good amp, etc, this all goes to pot when your source is a live streaming free Spotify (subscribers get higher bit rate) going through your Macbook Pro headphone socket. So here’s the question. In this age of Spotify and similar, is the hi-fi stack still valid? I use my Hi-fi more for radio than I do CDs, yet the radio tuner is often seen as an afterthought. And what about all in one compact systems, from good names such as Onkyo, Cambridge Audio, Denon, Marantz: are separates really that much better than these?

One issue was the tuner. CD players and Amps start at around £100 each (though it’s made clear these are bottom of the range). The cheapest tuner is £150, and that has bad reviews, and can’t do DAB+ (DAB+ is a major improvement over DAB; not many realise that the iplayer has a higher quality stream than DAB, with the possible exception of Radio 3). This is an odd situation. You can buy a decent all-in-one unit for say £200, yet a semi-decent tuner separate costs about the same, but with probably fewer features. With the tuner on top of the CD and Amp you’re looking at £350 for the most basic of basic systems, extra for the speakers. Is it worth it? How much of that expense goes on the physical boxes and extra overheads of a separates system? The reviews all make somewhat patronising comments: ‘excellent sound – for an all-in-one system’, ‘good if you need a basic set in another room’ etc. But how much is snobbishness and how much is fact? Is it worth it?

So I can’t afford a new system. And I’m not even sure I need one, given most of my listening is of Spotify, iplayer and Radio 4. But I can’t decide whether, if I were buying a new system, which I’m not, I would buy an all-in-one system (such as this or this) or separates.

I’ll continue to debate what I won’t be buying, but might buy if I could afford it, and update you if need be. But I still am not sure such things are needed, especially as we move towards the internet/computer being the key source of music, the main output of which is currently a crappy tiny headphone socket. If we want to hear good quality music via our computer a different route may need to be developed.

Top posts

For reasons that escape me, here are nostuff.org/words’ top hits

Top Posts for 90 days ending 2010-10-05 (Summarized)

2010-07-07 to Today

Title Views
Top 20 UK Universities 905
Fun with Lisa 797
Home page 359
Top UK Universities : Combined Rankings 310
Summon @ Huddersfield 285
Library search/discovery apps : intro 181
University league tables combined data 102
JISC Library Management System Review 91
webpad : a web based text editor 80
SQL update: doing a find/replace on part of a field 80
Adverts that follow you 71
Library catalogues, search systems and data 66
short urls, perl and base64 62
Library Catalogues need to cater for light-weight discovery clients 55
Talis Aspire, checking if a course has a list 52
Look! I can post links straight from B3ta 43
Google Books API 40
Zoho and WordPress themes 36
Katie Price and Peter Andre : A whole new world (reviews) 34
Twitter clients 32
BBC Three 27
IMAP5 : or developments that should happen in email 25
Amazon AWS EC2 and vufind 24
Nick Clegg’s Fault. Beware the REAL Nasty. 18
PubSubHubbub instant RSS and Atom 17
blip.fm 17
VAT ‘offset’? No just a tax rise via the backdoor 15
to do list software 14
html5 and nostuff 13
SQL grouping mulitple values in to one SELECT field 13
Free e-books online via University of Pittsburgh Press 12
Linked data & RDF : draft notes for comment 11

2010-07-07 to Today

Referrer Views
tweetalondoncab.co.uk/ALittleAboutTwi… 20
library.hud.ac.uk/blogs/summon4hn/?p=… 18
daveyp.com/blog/archives/310 17
google.co.in/ 14
photoshopdisasters.blogspot.com/2010/… 14
google.co.uk/ 13
library.hud.ac.uk/blogs/summon4hn/?p=… 12
twitter.com/ 8
library.hud.ac.uk/blogs/summon4hn/ 7
library20.org/profiles/blogs/librarie… 7
commonplace.net/2009/06/linked-data-f… 6
philbradley.typepad.com/phil_bradleys… 5
nostuff.org/ 5

And search terms

Today

Search Views
example of summon opac 3
integrated search engine “metalib” 2
primo encore summon ebsco 1
zoho is bad for business 1
chmod 400 and amazon 1
british library opac began 1
perl short base64 encode 1
top 20 universities of uk 1
impressions of joomla 1
top 20 uk universitys 1

Yesterday

Search Views
top 10 marketing universities in the uk 2
sql update portion of field 2
summon discovery primo 1
top universities in uk for marketing 1
top 10 u.k universities for masters in r 1
good universities for marketing 1
wikepedia ranking of universities offeri 1
imap5 1
warwick not in thes university rankings 1
base 64 shorten 1

ircount development

I’ve finally got around to spending a bit of time on the ircount code.

This post goes through some of the techy stuff behind it. If you’re just interested in features, I’m afraid there are none yet, except that you can now compare more than 4 repositories, and that’s as far as you’ll want to read.

Google Reader – shared stuff

I’ve been using Google Reader for a while having jumped ship from Bloglines. One of its features is to share stuff. This is potentially a good thing as it avoids me bombarding my twitter followers with endless links to stuff I find interesting.

At the moment it is useless as I don’t really follow anyone on Google Reader, and they (probably through good sense, and a firm value of their own time) don’t follow me.

So, people, here is a link to my shared stuff.

http:

So feel free to add me as a contact in Google Reader, and I’ll do the same. And read interesting stuff. Because twitter, failblog, blogs and the web don’t already waste enough of my time.

The Data Imperative: Libraries and Research Data : comment

I put this in to a separate post. It continues on from my previous post, but I didn’t want my notes of the day to be taken over by my ill thought views.

Personal Thoughts

Reluctant to give some thoughts as I know so little about the service. However… (!)

There seem to be two clear areas here: Data formatting and Data storing. There is some linkage (Preserving surely covers both, formats can become obsolete, Servers die), yet the two seem to be somewhat separate.

Both require IT skills, but IT is a broad church. The former is technical metadata (and is very much IT and library) and in the general area that I see covered in the Eduserv efoundations blog.

The latter in its simplest form is hard core infrastructure: disks, SANs, servers, security. But it also has elements at the application level (how do we access it, using what software, repositories? CRIS? Fedora?).

On another issue, while it is easy to say that libraries should take the lead, I think we need to be cautious. With the current climate of frozen or decreasing budgets nationally, and journal subscription pressure, how wise is it to go to the University’s executive and demand funding for resources/staff for data management? We know it’s important and could make the process of research more efficient, but there are other things higher up a University’s list of priorities (NSS/attracting good students, REF, research funding). Even at a library level, journals help researchers do research (which brings funding), and keep students happy because we have the stuff they need (NSS). How many journals should we cancel to focus on Research Data? Why? The recent JISC call will help with providing a business case.

The problem at the moment is that there are not enough clear benefits for most Universities to steam ahead with this. Let’s clarify this: not enough benefits for the institution itself. The benefits are for the UK as a whole (actually, the whole world). It’s the UK-wide economy and research that will benefit. So maybe it needs UK-wide funding. It’s easier to convince someone (or something) to spend money when the benefits for them are clear. In this case the benefits are for the UK so it should be the UK which sets aside explicit cash (via HEFCE, JISC, and so on).

And this is happening, with the JISC call (talked about today), amongst other things it will help build examples.

But I’m not sure if the institutional level is the best one. Australia has been successful with a centralised approach. We have a number of small Universities, and those which only have one or two departments which are research active. Yet the resources/knowledge required of them will be similar to that of a large institution. Will this leave them at a disadvantage?

On another note, it seems the range of data is vast. When discussing this, I always – incorrectly – picture text based data, of varying size, perhaps using XML. Of course this is blinkered. For audio, images and similar should a data service just provide a method to download, or a method to browse and view/listen? When it comes to storage and delivery, should we just treat all data as ‘blobs’ – things to be downloaded as a file, and we do nothing more with it? This makes it easy and repository software applications (eprints/dspace/fedora) are well placed to cater to this need. But I get the impression that this is somewhat simplistic. Perhaps this means a data service needs a clear scope, otherwise we could end up building front end applications which mimic flickr, youtube and last.fm all in one. A costly path to go down.

[all views are my own. are wrong, badly worded, ill thought, why are you reading this?, just think the opposite and it will be right, etc]

Library catalogues, search systems and data

Below is an email I sent to UK Library e-resources mailing list (lis-e-resources@jiscmail.ac.uk). I’m putting it here for the same reason that I sent the original email: I think there are questions relating to the changing role of the library catalogue, and new models are developing in how and where metadata exists for the items libraries provide access to.

My points in a nutshell:

  • The way we work with Library system (LMS) catalogues is changing, with the need to import and export large batches of records from other systems increasing, especially with online materials such as journals and e-books. This is quite a different need to those before the web, when item = physical thing in the library building, records were added one at a time, with
  • Library systems have not adapted to this new need, and though this is technically possible it is often fiddly and can feel like a hack.
  • While it is possible to import batches of records, there are issues regarding keeping everything in sync. For example, Libraries often subscribe to publisher (online) ‘journal bundles’. The titles included in these bundles can change over time, but how do you easily update/sync the catalogue to reflect this? One option is to regularly delete the imported records and reimport from the data source completely, though, if I understand correctly, Library Systems often do not delete records, but instead simply ‘suppress’ them from the public view. So doing this for twenty thousand e-journal records each month would leave 240,000 hidden records on the system each year!
  • Why do we want them on the catalogue? Because users search the web interface to the catalogue to look for journals, books, theses etc. So we need to ensure our e-journals, e-books etc can be discovered via the catalogue interface.
  • A Library System (LMS) will typically have a catalogue, which cataloguers and other library staff maintain, and a public web front end for users to search (and locate) items.
  • However, a ‘next generation’ of web search systems is now on the market; these allow the user to search the LMS catalogue and other data sources simultaneously in one modern interface.
  • Setting up these systems to search other data sources (in addition to the library system catalogue) so that they include records for online journals and e-books (and more) is a much neater solution than trying to add/sync complete cataloguing records into the library system catalogue.
  • This to me (and I’m no scholar on the subject) has changed the model. The Library System catalogue was one and the same as the public web catalogue. What was on one was on the other. Librarians would discuss ‘should we put X on to the catalogue?’. But now these two entities are separate. Do we want something to be discoverable on our web catalogue search system? Do we want a record on our back-end Library System for something? These are two separate questions, and it is possible to have an item on one and not the other. It would be easy to say that if you just want users to be able to discover a set of items, just make them available on your next generation search systems, if it wasn’t for the fact that…
  • Third party systems can cross search or import records from multiple library catalogues, getting data from their library system. This was a simple thing to consider: do we want to allow these systems to have access to our records/holdings, and if so they would search our catalogue. These are examples and not the only things to consider, for example Endnote allows you to search a library catalogue within the application, of course this is the Library System catalogue.
  • This creates questions: Which items do we want to make available to our users searching our web catalogue? Which items do we want to expose to other systems out there? Which items do we want to keep in our Library system back-end catalogue for administrative/inventory purposes? With the old simpler model these questions did not need to be asked.
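The sync problem in the list above can be sketched in a few lines. This is only an illustration, not how any particular library system works: it assumes each e-journal record carries a stable identifier (the ISSN-like strings below are made up), and it compares what is already loaded with the current bundle so that only the differences are added or suppressed, rather than deleting and reimporting everything each month.

```python
def plan_sync(catalogue_ids, bundle_ids):
    """Work out which records to add and which to suppress.

    Both arguments are iterables of record identifiers. Returns two
    sorted lists: (to_add, to_suppress).
    """
    catalogue = set(catalogue_ids)
    bundle = set(bundle_ids)
    to_add = sorted(bundle - catalogue)        # in the bundle, not yet loaded
    to_suppress = sorted(catalogue - bundle)   # loaded, but dropped from the bundle
    return to_add, to_suppress


# Hypothetical identifiers, for illustration only.
loaded = ["1111-1111", "2222-2222", "3333-3333"]
this_month = ["2222-2222", "3333-3333", "4444-4444"]
to_add, to_suppress = plan_sync(loaded, this_month)

# The delete-and-reimport arithmetic from the text:
# 20,000 records suppressed each month, twelve months a year.
records_per_import = 20_000
imports_per_year = 12
hidden_per_year = records_per_import * imports_per_year

print(to_add, to_suppress, hidden_per_year)
```

With this approach the number of suppressed records grows only with the titles that actually leave a bundle, not with the size of the whole collection.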

I’ve drawn some rather embarrassingly bad diagrams to try and illustrate the point:

Original Library catalogue model

New Library catalogue model

So after that rather lengthy nutshell (sorry) here is the original email, which does ramble and lack a point in parts, sorry:

Over the last few years the need to add e-resources (journals/books) to our library catalogue has grown. The primary reason is that users expect (understandably) to find books and journals in the catalogue, and that includes online copies.

This has seen the way we use our catalogue change, from primarily adding individual records as we purchase items, to trying to add records in bulk from various third party systems.

These third party systems include the link resolver (for journal records), e-book suppliers and (experimentally) repository software (for theses).

I imagine many are in the same boat as us: we want to do this in a scalable way, and we don’t want to be editing individual records by hand when we could be looking at a very large number of records both for journals and – as/if usage takes off – e-books.

For this to work, it requires high quality (MARC) records from suppliers, and LMS (ILS) vendors adapting their systems for this change in behaviour. For example, it may have been reasonable in the past for an LMS supplier to presume that large numbers of records would not need to be regularly suppressed/dropped, though with ever changing journal bundles this may be normal practice in the future.

Furthermore, just to add confusion, next generation web catalogues can search multiple sources. The assumption that ‘public web catalogue’ reflects the ‘LMS catalogue’ (i.e. what is in one is in the other) may no longer apply. Should e-content be kept out of the LMS but made seamlessly available to users using new web interfaces (Primo, Aquabrowser, etc etc)?

This seems like quite a big area, and a change in direction, with questions, and yet I haven’t seen much discussion (of course, this may well be due to a bad choice of mailing lists/blogs/articles).

Are others grappling with this sort of thing?

Anyone else wishing they could import their entire online journal collection with a few clicks but find dodgy records (which we pay for!) and fussy library systems turn it into a very slow process?
And not quite sure how to keep them all in sync?

Would love to hear from you.
Who else has all their e-journals on the catalogue? Was it quick? Do you exclude free journals etc?

I also added this in a follow up email:

We already have Aquabrowser and this does seem to offer a nice solution to some of this. It looks like you just need to drop a MARC file in place and the records will be included. (See http://www.sussex.ac.uk/library/)

But this presumes that ‘keep the records out of the LMS’ is the right approach, and it is not for all.

Our (LMS) catalogue is exposed elsewhere, such as the M25 search and Suncat. And others will add COPAC and Worldcat to the list. Plus other local union-ish services.

By simply adding these records to a next gen catalogue system they will not be available to these types of systems. This may be desirable (does someone searching Suncat want to know that your users have online access to a journal?) but the opposite may also be true.

Let’s take a thesis, born digital in a repository. It would seem desirable to add the thesis to the main LMS catalogue (especially as a printed/bound thesis would appear there), and make it available to third party union/cross-search systems.

Next gen catalogues are – I think – certainly part of the solution, but only when you just want to make the records available via your local web interface.

Owen Stephens has replied with some excellent points and thoughts on the matter which are worth reading.

Finally, I’m not a Librarian, cataloguer, or expert, so these are just my thoughts. There is stuff to think about in this area; I’m not suggesting I have the answers or even have articulated what I think the issues are with any success.

Update: Just come across a blog post from Lorcan Dempsey, which as ever articulates some of this very well.

2008 : nostuff.org under review

It takes someone with a grossly over inflated ego, who thinks their website is a trillion times more important than it actually is, to try and write a review of the previous year. What sort of idiot does that, as if anyone will read it!

Hello.

Think of it as a school report, annual appraisal, or cheap channel 4 air time filler around the new year.

nostuff.org has grown in the last year, and so has the readership. Some of the posts were even read by humans.

nostuff.org/words started out like many a blog, rambling on my oh-so-important thoughts about the latest news story, gadget, or (I confess) the software I installed on my laptop (in my defence, as no one read it, I was using the blog as a personal notepad as I reinstalled said laptop).

In 2008 (well late 2007 if truth be told, but that would ruin the whole thing, is that what you want? do you?) something happened: some original content appeared on nostuff. I didn’t even copy it from somewhere else.

Part of this was due to me getting a little more geeky than I had in a while. The web had become a read-only resource for me: I consumed (the entire) wikipedia, other blogs and news sites, often becoming an expert on how to apply for a parking permit in some random US state. I was also watching too much crap TV (why was quizcall so addictive?). The telly went off, Radio 4 went on, and I decided to actually do something (perhaps somewhat belatedly taking the advice of Why Don’t You).

ircount / Repository Statistics

One of these things I had been working on for well over a year. I had written a simple script which connected to ROAR each week and collected the count of records for each UK repository. I finally got around to writing a simple web interface to show all of this, which utilised the amazingly simple to use Google Charts. I announced this in March 2008, which was in no way set to coincide with the Open Repositories conference I was attending a few days later.

This went down well, and was probably the first time that well known websites had linked to me (not that I’m obsessed with hits and being linked to or anything). Peter Suber’s highly regarded Open Access news linked here, and it felt good (I’m embarrassed to admit this, but I actually have a delicious.com ‘ego’ tag for this). In October I released an updated version of the site, now on my own website, which included stats for all Institutional Repositories, not just those in the UK.

A future development will be to report on ‘fulltext’ items, not just the number of records, though this will be a departure from just using ROAR as a datasource and will involve me connecting to individual repositories myself. In November I carried out a bit of research by playing with Tim Brody’s Perl library for connecting to OAI-PMH repositories.
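As a rough idea of what connecting to an individual repository might involve, here is a minimal sketch in Python (not the Perl library mentioned above). It parses a single OAI-PMH ListIdentifiers response and reads the record count from it. The sample XML, the repository address and the token value are invented for illustration, and completeListSize is an optional attribute that not every repository supplies. Note that this only counts records; working out which ones have full text would mean looking at the records themselves.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# A made-up response, shaped like a ListIdentifiers reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-11-20T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.org:1</identifier>
      <datestamp>2008-01-10</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:repository.example.org:2</identifier>
      <datestamp>2008-02-11</datestamp>
    </header>
    <header>
      <identifier>oai:repository.example.org:3</identifier>
      <datestamp>2008-03-12</datestamp>
    </header>
    <resumptionToken completeListSize="1200" cursor="0">token-page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def summarise_page(xml_text):
    """Summarise one page of a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    headers = root.findall("oai:ListIdentifiers/oai:header", OAI_NS)
    # Deleted records are flagged with status="deleted" on the header.
    live = [h for h in headers if h.get("status") != "deleted"]

    token = root.find("oai:ListIdentifiers/oai:resumptionToken", OAI_NS)
    total = None
    next_token = None
    if token is not None:
        size = token.get("completeListSize")  # optional in the protocol
        total = int(size) if size is not None else None
        next_token = (token.text or "").strip() or None

    return {
        "on_page": len(headers),
        "live_on_page": len(live),
        "total": total,
        "next_token": next_token,
    }


summary = summarise_page(SAMPLE_RESPONSE)
print(summary)
```

A real harvester would keep requesting pages with the returned token until none is given, adding up the counts as it goes.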

Book catalogue using Talis Platform (and mashedlibraries)

The Talis Panlibus Blog and ‘Talis Technical Development’ have a lot of posts discussing APIs and functionality, all of which was hard to visualise at the time (probably because a lot of the infrastructure was in active development). The Talis Platform is a place to store RDF data accessible via the web (like Amazon S3 is a place to store files accessible via the web). It has separate stores for different users/applications. I first explored this in the summer of 2007, and my steps at the time now look somewhat simple and naive!

In February 2008 I released the first version of my (simple) interface (use this link for the current version). This searches the ‘ukbib’ store, which is a Talis Platform store holding RDF data of book records. A separate store has holdings information for many libraries. The two can be linked via the ISBN (used a bit like a primary key in a relational database). The design of the Platform is such that you can merge multiple sources of data (from across the web) and it will bring back a single response with the data from the various sources.
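The ‘ISBN as primary key’ idea can be shown with plain Python dictionaries. This is not RDF and not the Talis Platform API, just a sketch of the join itself; the ISBNs, titles and library names are all made up.

```python
# Book records, keyed by ISBN (standing in for the 'ukbib' store).
bib_records = {
    "9780000000001": {"title": "Example Book One"},
    "9780000000002": {"title": "Example Book Two"},
}

# Holdings, one entry per library copy (standing in for the holdings store).
holdings = [
    {"isbn": "9780000000001", "library": "Library A"},
    {"isbn": "9780000000001", "library": "Library B"},
    {"isbn": "9780000000002", "library": "Library A"},
    {"isbn": "9780000000009", "library": "Library C"},  # no matching book record
]


def merge_by_isbn(bib, holding_rows):
    """Attach a list of holding libraries to each book record."""
    merged = {isbn: {**record, "held_by": []} for isbn, record in bib.items()}
    for row in holding_rows:
        record = merged.get(row["isbn"])
        if record is not None:  # skip holdings with no book record
            record["held_by"].append(row["library"])
    return merged


merged = merge_by_isbn(bib_records, holdings)
print(merged)
```

The result is one response per book, combining the description from one source with the holdings from the other.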

In March I added the just-released Google book API, both the static and dynamic versions. However the dynamic version (which should also show a book cover) only seems to work for a small number of books.

Mashedlibraries: In November I attended the excellent Mashedlibraries day. During the afternoon we had an opportunity to work on various things. I decided to provide a holdings page, with a Google map showing the location of libraries which hold the item. With no experience of the Google Maps API, nor javascript, nor how to get the information out of the Platform (and Talis Silkworm), this was no small task. Luckily Rob Styles worked with me and provided a huge amount of patient help.

I’ve also added ‘Google Friend Connect’ to the site, though it currently doesn’t really work, as comments appear for all items, rather than just for the item they were added to.

UK Universities

I’ve always (well, the last few years, not so much when I was 5) taken an interest in the University league tables published by the national press. I often make the mistake of thinking that older and Russell group Universities are ‘better’ than others, but these league tables often show otherwise. The problem is they often come up with different results. But what if you combined all the scores from these tables to get an overall average, hopefully ironing out the oddities of particular methodologies?
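Combining tables in this way is simple enough to sketch. The example below averages each university’s position across whichever tables include it; the table names, universities and positions are invented, and averaging positions (rather than the underlying scores) is an assumption made to keep the sketch short.

```python
from statistics import mean

# Hypothetical league tables: university -> position (1 is best).
tables = {
    "Table A": {"Uni X": 1, "Uni Y": 2, "Uni Z": 3},
    "Table B": {"Uni X": 3, "Uni Y": 1, "Uni Z": 2},
    "Table C": {"Uni X": 1, "Uni Y": 3},  # Uni Z is missing from this table
}


def combined_ranking(league_tables):
    """Return (university, average position) pairs, best first."""
    positions = {}
    for ranking in league_tables.values():
        for university, position in ranking.items():
            positions.setdefault(university, []).append(position)
    averages = {uni: mean(values) for uni, values in positions.items()}
    # Sort by average position, then by name to break ties predictably.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))


ranking = combined_ranking(tables)
for university, average in ranking:
    print(f"{university}: {average:.2f}")
```

A university missing from one table is simply averaged over the tables it does appear in, which is one of several possible choices and can flatter or penalise it.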

I spent a bit of time adding the league tables of various papers (and international rankings) to a spreadsheet, and sticking it on the web – along with some comments – as both an excel file and a Google spreadsheet. Before I published it, I also asked readers to suggest their top 20, to compare those we perceive to be ‘top’ with those that come top in these league tables.

This clearly hit a nerve. Or, more to the point, hit a popular Google search term!

list of search terms for nostuff.org (most about Universities)

Annoyingly I was getting most hits to the post which asked readers to submit their own guesses for the top 20, and not to the post which listed the carefully compiled top 50 based on collecting data from various sources (which is presumably of more use to most people carrying out such searches). This seemed to be due to the titles of the two posts, so, in my first ever attempt at SEO, I changed the title of the latter, and it then increased the number of hits from search engines (I also added some bold links from the former ‘reader guesses’ post).

These posts have brought in a lot of readers who are searching Google to find out about Universities in the UK, and had a massive effect on the number of hits the blog has received:

nostuff blog hits 2008 by month

A number of people (mainly outside the UK) have asked for recommendations, especially for particular subjects. I’ve avoided answering this directly, as I know I have no knowledge or experience to be able to answer this. Though I have suggested the HERO and UCAS websites as well as the Guardian and Times Higher Education sections.

I hadn’t predicted this interest from potential students, though it seems so obvious now. Still, it is good to see that something I did for my own interest may have been of some use to others.

Open Repositories 2008

As mentioned above, I attended the Open Repositories 2008 conference. In previous years it had been held across the globe, though this time it was held just along the south coast in Southampton. My Uni kindly funded my attendance, so long as I went via the cheapest rail ticket, for two days, with no hotel or expenses. Still, it meant I got to hear the shipping forecast for the first time ever as I rose early both mornings.

I had only just started using Twitter, and it was the first time I had blogged (and tweeted?) about a conference. In fact probably the first time I had really blogged about work stuff. It sounds very cynical (and I feel cheap saying it), but it helped to attract the attention of repository/library tech people to the blog.

Conferences

I also attended an event about a project – RIOJA – to look at how an ‘overlay’ journal could be implemented, i.e. if repositories provide a way to access content (including that not published in a traditional journal), then an overlay journal could provide a way to peer review and categorise a subset of the items published in repositories, the ‘overlay journal’ linking to the articles already available online, but by doing so, showing an essential peer review (quality) process has taken place. The day also had speakers talking about other novel journal publishing concepts, and the REF.

It was the first time I had used coveritlive, which Andy Powell has used (very successfully) many times. It was also the first time I had used a t-mobile usb 3G network stick, which I, and others, had persuaded our library to purchase. I set it up on the Mac on the train to Cambridge, and I was glad I did: it meant I had a network connection throughout (power, as ever, was a different issue). My live blog is here, and also on the even less popular Sussex repository blog.

In December I attended an event called ‘Sitting on a Goldmine’ (based on the JISC TILE project), held in London. This was a fantastic day with great speakers and attendees looking at how we can make use of usage data and user generated data to create new services. My write up is here.

And as previously mentioned, I also attended the Mashed Libraries event.

Mobile phones

I posted three articles about mobile phones. About why they are badly named, about the phones I have owned (not that you care) and my musings about getting a new smart phone (I’ve now got an iPhone in front of me, it may be common, but my god it is good).

Nostuff, web hosting and wordpress

Jisc Library Management System Review

I found the report with the above name via the Talis blog, found some time to read it, and made some notes, which I randomly decided to store on this blog. Turns out this was quite popular and quite a few people accessed it via Google (and via Tom Roper’s blog).

Other bits

Reviews

The shorts

Templates and look

In November I looked at some new plug-ins and themes for WordPress. The theme I currently use is Greening; I’ve modified it a bit to increase the font size, show tags used for each post, and add ads. WordPress’ excellent widget system also comes in very handy.

Ads: I’ve had ads since around 2005. So far I have made (on paper) around $30 (I only get the money once I reach $100, which at this rate could take many years!). An added element to this is the large fluctuation between the pound and dollar exchange rate: a few months ago $100 was worth £50, now it is almost £100, so when that cheque gets sent makes a difference!

The ads have always been more a bit of an experiment than hard and fast capitalism (but the extra cash is still appealing). I’ve tried to place them with a balance between not being too annoying (a small one at the top right, some at the bottom, and a few on the left), and hope no one objects too much.

The continuous increase in visits over the last year has seen an increase in click-thrus (which generate the revenue), especially in the last few months with the postings about University rankings in the UK.

Stats

I mentioned stats above. Over the last year visits/hits have been going up month on month.

2008 stats for nostuff.org blog

I collect stats via the WordPress Stats Plugin, via Google Analytics, and some rather basic web server reports. Of course they all report different numbers, but they more or less show the same thing. The table above is from the WordPress stats plugin.

So 2008 was, relatively speaking, quite a good year, just don’t expect the same for 2009!