Apple Automator

Automator on OS X is one of those things I use about once a year, but it always impresses.

Many attempts to use drag and drop to replace programming lead to confusing designs. Examples are Yahoo Pipes, Business Objects, and most of all MS Access.

What's impressive about Automator is that it always seems to have been designed with the very problem you want to solve in mind.

My Problem (apart from the drink)

I use command+shift+4 to get screen grabs a lot: for Twitter, for blogs (in fact for this very post), for documents, etc. However, I end up with lots of images on my desktop called Picture 1, Picture 2, and so on.

I want to keep these, for future use, but want a clutter free desktop.

The problem is that if I try to drag these into a folder, there will already be a Picture 1 and a Picture 2 from previous occasions, which means annoyingly renaming every file, or creating a new sub-directory each time.

Now I have a simple Automator script.

[Screenshot: Picture 1.png]

Each time I run it, it moves everything into a folder, with the creation date in front of each name. It was easy to search for 'actions' ('move', 'rename') and browse ('Files' -> 'Find Files').
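For anyone who prefers scripting, the same logic the workflow performs can be sketched in Python (the 'Picture' prefix and the folder paths are assumptions based on my setup, not anything Automator-specific):

```python
import os
import shutil
import time

def tidy_screenshots(desktop, archive):
    """Move 'Picture N' screen grabs into an archive folder,
    prefixing each name with its creation date to avoid clashes."""
    os.makedirs(archive, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(desktop)):
        if not name.startswith("Picture"):
            continue  # leave everything else on the desktop alone
        src = os.path.join(desktop, name)
        # e.g. "Picture 1.png" becomes "2009-11-12 Picture 1.png"
        stamp = time.strftime("%Y-%m-%d", time.localtime(os.stat(src).st_ctime))
        shutil.move(src, os.path.join(archive, "%s %s" % (stamp, name)))
        moved.append(name)
    return moved
```

(Automator does all of this without writing code, of course; that is rather the point.)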

People power: Twitter is highlighting & affecting important issues.

A few weeks ago (why, the 12th Oct in fact) I was sitting at my laptop during the evening, doing this and that with Twitter ticking away on the right. I glanced at the newest tweets to pop in and noticed one from secretlondon.

[Screenshot of secretlondon's tweet]

Curious, I read the Guardian article it linked to. A gagging order to stop a paper reporting the proceedings of Parliament. This is not very good. I muttered and got back on with the this and that. A few minutes later another tweet from secretlondon came in:

[Screenshot of secretlondon's second tweet]

Now we had seeds of information! To Hansard, to Wikipedia, to Google. Who were Trafigura? Who were Carter-Ruck?

Soon other tweets were coming along about this, and I was adding my two pence too, re-tweeting the news and adding my own little links to what I was finding.

Hansard provided the details the Guardian couldn’t report, and it quickly became clear what they were trying to hide.

By now Twitter was alight. Hashtags came into use. Following these produced more information: once someone found something, they shared it not just with their followers but with everyone following those tags. Previous Guardian articles (amongst others) were brought to our collective attention.

Before this I had not heard of Trafigura or Carter-Ruck. I suspect many were the same, yet now we were angry about what we were reading of their questionable activities (one apparently dumps nasty stuff in Africa, the other boasts about suppressing the press, regardless of truth). A storm was brewing and I felt it had yet to peak. But it was late and sleep beckoned.

The next morning I was curious whether there had been any developments overnight.

The first thing I came across was a Spectator online article (a publication on the other side of the political spectrum from the Guardian). It quoted the Guardian article heavily, but then went on to quote the part of Hansard containing the question (and company name) that the Guardian could not, and provided links. I tip my hat to them. #Trafigura was now trending, celebrity twitterers (including our Lord Stephen Fry) were highlighting it, and more.

It felt like it was everywhere: on the news, and over in the coffee room colleagues were talking about it. The Streisand effect had truly kicked in. Before noon on the 13th the case had been dropped. The Guardian was no longer prevented from reporting on the story.

Five days later

Five days later a Daily Mail columnist, Jan Moir, wrote a homophobic piece (since edited) about the very recent death of singer Stephen Gately. A similar thing happened. A storm brewed up. Not organised, but many distributed little efforts at raising attention, mainly through Twitter, which led to coverage in mainstream media and to changes to the article and headline.

Black Out and Breaking News

And back in February there was a 'black out' campaign against a proposed repressive internet law in New Zealand. Again, partly due to the coverage on sites like Twitter, the section in question was scrapped.

It's not just campaigns and activism. Breaking news spreads rapidly via Twitter, such as the plane crash in New York (Twitter broke the story, and the first images of the plane in the water came from Twitter), and Michael Jackson's death.

But Twitter doesn't always have its own way. There was a green-avatar campaign for democracy in Iran, which sadly never saw success.

We’re seeing two things here…

1 – That information is now able to spread much faster than ever before. This has always been true of the Internet, and has increased each year with new technologies (blogs, social networking), but especially with Twitter.

2 – That people spreading this information leads to the mainstream press reporting on it, and those under pressure back-tracking. (I wonder how essential that middle step is?)

Twitter is such a good tool for the first point. It is instantaneous, not just on the web but in desktop Twitter clients and on phones, and messages are public by default (unlike many other social networking sites, where they are restricted to a specific group or trusted circle of friends). Having an open technical platform (which allows any other website or application to access tweets) also helps.

So…

My instant reaction is that this must be a good thing. When something bad is happening in the world (sorry, that sounds very simplistic) twitter, and other websites, can spread the information quickly and widely, even to those who don’t follow the news each day. This can lead to positive change.

The Trafigura/Carter-Ruck case is a good example of this. Imagine if it had happened ten years ago. People (well, only Guardian readers) would have read the Guardian front page but not had a clue what it was about. In fact the Guardian may well not have run it as a front-page story (or at all), as it would simply have confused/frustrated their readership. The Guardian took a gamble putting this on the front page, knowing (hoping) it would then become a story in its own right. It did, probably more than they ever hoped.

Noteworthy information is a virus, once it is in the wild it is unstoppable.

But all is not rosy. It is a slippery slope. I'm reminded of the 1995 film 'The Last Supper'. In the film they start off killing the worst people in society, but as time goes on things become more complex, more grey and less clear cut. The Trafigura case was clear cut. Those trying to stop the BBC putting Nick Griffin on Question Time, less so. There's a thin line between people power righting wrongs and mob rule.

One final example – baby and bump

Last Christmas I came across a news story about a 'Lapland in the New Forest'. Long story short: it was a con. It promised a lot, but was little more than muddy fields, a few (pay-to-use) funfair rides, two Santas (queue for hours, no photos allowed) and the odd tree with fairy lights, with staff who were untrained and the worst possible people to be interacting with kids.

For some reason I looked it up to find out more, and came across a thread on a web-based forum called Baby and Bump (you can guess what it's for). The thread was the top result on Google, so it became one of the main exchanges on the web for those affected by this.

The thread starts off with a few excited people discussing going to the Lapland attraction, how excited the kids are, and how much they have splashed out (money they couldn't afford to throw away). Then those who visited in the first few days after opening reported back, while others were in denial that it could be that bad. Then it really started: more reported back, and others joined the forum simply to add their experience.

Then the fact-finding starts: the owner's name and address, other business addresses, legal rights, who in the council to complain to, who in the press to contact, how to file a small claim... the owner is even related to the leader of Brighton (my) Council!

I like this example. It isn't the twitterati or tech-savvy web 2.0 types, just families on a simple web forum. No one organised anything, but many added bits of information, supported others, or shared their experiences. I would say it very much played its part in the early closure of this cruel con. After returning from a horrible day, cold, with upset kids, having paid quite a sum upfront, people must have felt frustrated and helpless; I think even finding others who have been through the same must be of some help. The Internet can really help in such situations. But it's not the power of the internet, it's the power of people. The Internet just acts as an enabling tool.

So Twitter is allowing us to share information and become aware of facts and situations in a way not thinkable until now, and at great speed.

PubSubHubbub: instant RSS and Atom

I have just come across PubSubHubbub via Joss Winn’s Learning Lab blog at the University of Lincoln.

It's a way for RSS/Atom feed consumers (feed readers etc.) to be instantly notified when a feed is updated.

In a nutshell, the RSS publisher notifies a specific hub when it has a new item. The hub then notifies – instantly – any subscribers who have requested the hub to contact them when there is an update.

This is all automatic and requires no special setup by users. Once the feed producer has set up PubSubHubbub and specified a particular hub, the RSS feed gains an extra entry telling subscribing clients that they can use that hub for this feed. Clients which do not understand this line will just ignore it and carry on as normal. Those that are compatible with PubSubHubbub can then contact the hub and ask to be notified when there are updates.
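Under the hood the publisher's notification is just an HTTP POST of two form fields to the hub, as the PubSubHubbub spec describes. A sketch in Python (the request is built but not actually sent here):

```python
from urllib.parse import urlencode
from urllib.request import Request

HUB = "http://pubsubhubbub.appspot.com"  # the hub this blog's feed advertises

def publish_ping(feed_url):
    """Build the notification a publisher POSTs to its hub after posting
    new content; the hub then fetches the feed and pushes it to subscribers."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url})
    return Request(HUB, data=body.encode("ascii"))

req = publish_ping("http://www.nostuff.org/words/feed/")
# urllib.request.urlopen(req) would actually send it
```

Subscribing works the same way: a POST with hub.mode=subscribe, a hub.topic (the feed URL) and a hub.callback URL that the hub should push updates to.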

It has been developed by Google, and they've implemented it in various Google services such as Google Reader and Blogger. This should help give it momentum (which is crucial for these sorts of things). In a video on Joss' post (linked to above) the developers demonstrate posting an article and Google Reader instantly updating the article count for that feed (in fact, before the blog software has even finished loading the page after the user hits 'publish'). It reminds me of the speed of FriendFeed; I will often see my FriendFeed stream webpage update with my latest tweet before I see it sent from Twhirl.

I've installed a PubSubHubbub WordPress plugin for this blog. Let's hope it takes off.

UPDATE: I’ve just looked at the source of my feed ( http://www.nostuff.org/words/feed/ ) and saw the following line:

<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>
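That line is how subscribing clients discover the hub. A minimal sketch of that discovery step in Python (the feed snippet is inlined rather than fetched over the network):

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def find_hub(feed_xml):
    """Return the hub URL advertised in a feed, or None if there isn't one."""
    root = ET.fromstring(feed_xml)
    # the rel="hub" link may sit on the channel (RSS) or feed (Atom) element
    for link in root.iter("{%s}link" % ATOM):
        if link.get("rel") == "hub":
            return link.get("href")
    return None

FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>nostuff.org/words</title>
    <atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>
  </channel>
</rss>"""
```

A client that finds no hub link simply falls back to polling the feed as normal.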

Amazon AWS EC2 and vufind

Today I saw a tweet from juliancheal mentioning he was setting up his virtual server on Slicehost. I hadn't heard of this company, but their offering looks interesting. It got me thinking about cloud hosting, and I decided it was time to actually try out Amazon's AWS EC2. This allows you to run a virtual server (or multiple servers) in the Amazon cloud; servers can be created and destroyed at the click of a button.

First thing is to get a server 'image' to run in the cloud. Thankfully many have already been created. I went for a recent Ubuntu server image by Eric Hammond. This is basically a vanilla Ubuntu server install, but with a few tweaks to work nicely as an EC2 virtual server. Perfect!

Signing up is quick and easy; it just uses your Amazon (the shop) credentials. Once created, you are taken back to the main control panel where you can see your new instance, including details like the all-important public DNS name. Just save a private key to your computer and use it to ssh in to your new server.

e.g.: ssh -i key1.pem root@ec2-174-129-145-xx.compute.amazonaws.com

(you may need to chmod 400 the key file, but all this is documented)

Once in, well it’s a new server, what do you want to do with it?

I installed a LAMP stack (very easy in Ubuntu: apt-get update and then tasksel install lamp-server). I initially couldn't connect to Apache (though I could from the server itself using 'telnet localhost 80'). I presumed it was an Ubuntu firewall issue, but it turns out you also control these things from the AWS control panel. The solution was to go to 'security groups', modify the group I had created when setting things up, and add HTTP to 'Allowed Connections'. This couldn't have been easier. And then success: I could point my browser at the DNS name of the host and see my test index page from the web server.

Amazon aws control panel, modify to allow http connections
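Incidentally, the 'telnet localhost 80' test can be done in a few lines of Python too, which is handy for telling 'Apache is down' apart from 'the security group is blocking me' (the host and port are whatever you happen to be debugging):

```python
import socket

def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# port_open("localhost", 80) was True from the EC2 server itself;
# from outside it stayed False until the security group allowed HTTP.
```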

So now what? I pondered this out loud via Twitter, and got this reply:

[Screenshot of the tweet suggesting VuFind]

Excellent idea!

Good news: VuFind has some good – and simple – documentation for installing on Ubuntu:

http://vufind.org/wiki/installation_ubuntu

Following the instructions (and editing them as I went – they specified an earlier release and lacked a couple of steps if you weren't also reading the more general install instructions) I quickly had a VuFind installation up and running. It took around 20-25 minutes in all.

Now to add some catalogue data to the installation. I grabbed a MARC file with some journal records from one of our servers at work and copied it across as a test (just using an scp command while logged in to my EC2 server). After running the import script I had the following:

[Screenshot: VuFind search results]

If the server is still running when you read this then you can access it here:

http://ec2-174-129-145-75.compute-1.amazonaws.com/vufind/

EC2 is charged by the hour, and while cheap, I can't afford to leave it running forever. :)

So, a successful evening. Mainly due to the ease of both Amazon EC2 and Vufind.

A final note that if you are interested in EC2 you may want to look at some notes made by Joss Winn as part of the jiscpress project: http://code.google.com/p/jiscpress/wiki/AmazonWebServices

Both EC2 and VuFind are worth further investigation.

Linked data & RDF : draft notes for comment

I started trying to write an email about Linked Data, which I was planning to send to some staff in the Library I work in.

I felt it was a topic that will greatly impact the library / information-management world and wanted to sow the seeds until I could do something more. However, after defining linked data I felt I should also mention RDF and the Semantic Web, and try to define those too; and then, in a nutshell, explain what these were all about and why they are good; and then add some links.

Quickly it became the ultimate email that no one would read, would confuse the hell out of most people and was just a bit too random to come out of the blue.

So I think I will turn it in to a talk instead.

This will be quite a challenge. I've read a bit about all this and get the general gist, though I don't think I have a firm foundation of what RDF should be used for, how exactly it should be structured, or which bits of all this fall under the term 'Semantic Web'. There's a big difference between hearing/reading various introductions to a subject and being able to talk about it with confidence.

Anyway, below is the draft of the email, with various links and explanations, which turns into just a list of links and notes as the realisation pops in that this will never be coherent enough to send.

If you know about this stuff (and good chance you know more than me), please comment on anything you think I should add, change, or perhaps where I have got things wrong.

I will probably regret putting a quick draft email, mistakes and all, on the web, but nevertheless, here is the email:

You have probably heard the phrase 'linked data', and if not, you will do in the future.

It’s used in conjunction with RDF (a data format) and the Semantic Web.

“Linked Data is a term used to describe a method of exposing, sharing, and connecting data on the Web via dereferenceable URIs.”

en.wikipedia.org/wiki/Linked_Data

It's the idea of web pages (URLs) describing specific things. E.g. a page on our Repository describes (is about) a particular article. A page on the BBC describes the program 'Dr Who'; another describes the Today program. A page on our new reading-list system describes a list.

I know what these pages contain because I can look at them. But what if this was formalised so that systems (computers) could make use of these things, and extract information and links from within them?

Two companies prominent in this area are Talis and the BBC. The University of Southampton is also doing work and research around the semantic web (no coincidence Tim Berners-Lee worked there).

The following link is a diagram which is being used a lot in presentations and articles (and there’s a good chance you will see it crop up in the future):

http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05_colored.png

Why should we care about all this?

Look at the image: Pubmed, IEEE, RAE, eprints, BBC and more. These are services we – and our users – use. This isn't some distant technology thing, it's happening in our domain.

Why should we care? (part 2)

Because when I searched for RDF I came across a result on the Department for Innovation, Universities and Skills (the one that funds us)… It was a page called 'library2.0' (that sounds relevant)… The first link was to *Sussex's* Resource list system… we're already part of this.

http://sn.im/glr45

(for those confused: the DIUS site simply shows things bookmarked on Delicious matching certain tags; someone from Eduserv at UKSG had bookmarked our Resource List system, as it was being used as an example of RDF/linked data, and as UKSG was quite recent this link is at the top)

As I mentioned our Resource lists…

For each of our lists on Talis Aspire, there is the human-readable version (HTML) and computer-readable versions, e.g.

In HTML:

http://liblists.sussex.ac.uk/lists/p3030.html

In RDF/XML:

http://liblists.sussex.ac.uk/lists/p3030.rdf

In JSON (don’t worry what these mean, they are just computer readable versions of the same info):

http://liblists.sussex.ac.uk/lists/p3030.json

And in the same way, for each BBC program/episode/series/person there is a webpage, and also a computer-readable – RDF – version of the page, e.g. the Today program:

http://www.bbc.co.uk/programmes/b006qj9z

http://www.bbc.co.uk/programmes/b006qj9z.rdf

There is tons of stuff I could point to about the BBC effort; here are some good introductions:

http://www.bbc.co.uk/blogs/radiolabs/2009/04/brands_series_categories_and_t.shtml

http://blogs.talis.com/nodalities/2009/01/building-coherence-at-bbccouk.php

I saw Tom Scott give this presentation at the  ‘confused’ Open Knowledge Conference:

http://derivadow.com/2009/03/31/linking-bbccouk-to-the-linked-data-cloud/

http://www.bbc.co.uk/blogs/radiolabs/2008/07/music_beta_and_linked_data.shtml

A paper on BBC/DBpedia with a good introduction to the background of /programmes:

http://www.georgikobilarov.com/publications/2009/eswc2009-bbcdbpedia.pdf

RDF

===

RDF is just a way of asserting facts (knowledge)

“RDF is designed to represent knowledge in a distributed world.”

(from http://rdfabout.com/quickintro.xpd)

“RDF is very simple. It is no more than a way to express and process a series of simple assertions. For example: This article is authored by Uche Ogbuji. This is called a statement in RDF and has three structural parts: a subject (“this article”), a predicate (“is authored by”), and an object (“Uche Ogbuji”). “

(from http://www.ibm.com/developerworks/library/w-rdf/)

Example:

Imagine a book (referred to by its ISBN).

Our catalogue asserts who wrote it, and also asserts what copies we have.

Amazon asserts what price they offer for that book.

OCLC asserts which other ISBNs are related to that item.

LibraryThing asserts what tags have been given to that item.

All these assertions are distributed across the web.

But what if one system could use them all to display relevant information on one page, and create appropriate links to other pages?
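Those distributed assertions can be sketched as plain (subject, predicate, object) triples; a toy merge in Python (the sources are real services but every value here is invented for illustration):

```python
# Each source publishes its own (subject, predicate, object) assertions.
catalogue    = [("isbn:0123", "author", "A. N. Author"), ("isbn:0123", "copies", "3")]
amazon       = [("isbn:0123", "price", "£12.99")]
oclc         = [("isbn:0123", "relatedIsbn", "isbn:0456")]
librarything = [("isbn:0123", "tag", "linked-data")]

def merge(*sources):
    """Combine assertions from many sources into one record per subject."""
    record = {}
    for triples in sources:
        for subject, predicate, obj in triples:
            record.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return record

# One page could now show author, holdings, price, related ISBNs and tags together.
book = merge(catalogue, amazon, oclc, librarything)["isbn:0123"]
```

Real RDF adds URIs for the predicates and a proper data model, but the shape of the idea is just this.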

#######################

Things to include…

http://en.wikipedia.org/wiki/Ontology_(computer_science)

http://www.slideshare.net/iandavis/30-minute-guide-to-rdf-and-linked-data

http://www.slideshare.net/ostephens/the-semantic-web-1336258

http://vocab.org/aiiso/schema

(created for talis aspire – i think)

Tower and the cloud (not really linked data):

http://www.worldcat.org/oclc/265381796

Semantic web for the working ontologist (we have a copy):

http://beta.lib.sussex.ac.uk/ABL/?itemid=|library/marc/talis|991152

Freebase http://www.vimeo.com/1513562

Fantastic YouTube video from Davos by Tom Ilube:

http://www.youtube.com/watch?v=k_zoEeWOBuo

Any pointers to good descriptions/explanations of RDF? (I think this is the most difficult area.) Clearly this is mainly a set of links, and not a talk, but I will probably use this as a basis of what I will try and say.
All comments welcome.

Academic discovery and library catalogues

A slightly disjointed post. The useful Librarytechnology.org website by Marshall Breeding announced that the eXtensible Catalog project has just released a number of webcasts in preparation for their software release later this year.

eXtensible Catalog webcast screenshot

I've come across this project before; put a little simply, it is in the same field as next-generation catalogues such as Primo, AquaBrowser and VuFind.

However, where those are discrete packages, this seems to be a more flexible set of tools and modules, and a framework which libraries can build on. I didn't manage to watch all the screencasts, but the 30 minutes or so that I did watch were informative.

As an aside: while the screen consisted of a PowerPoint presentation, the presenter appeared in a small box at the bottom, and watching him speak oddly made listening to what was being said more easily digestible (or perhaps just gave my eyes something to focus on!).

This looks really interesting, and it will be good to see how it compares to other offerings. They are certainly taking a different angle, and perhaps the biggest question will be how much time it takes to configure such a flexible and powerful setup (especially with the small number of technical staff found in most UK HE libraries). Anyway, worth checking out: it uses various metadata standards and – amongst others – Solr and Drupal as a base.

While on the eXtensible Catalog website I came across a link to this blog post from Alex Golub (Rex), an 'adjunct assistant professor of anthropology at the University of Hawai'i Manoa'. It talks about a typical day as he discovers and evaluates research and learns about others in the same academic discipline. Again, well worth a read.

It starts off with an email from Amazon.com recommending a particular book. He notes:

In exchange for giving Amazon.com too much of my money, I’ve trained it (or its trained me?) to tell me how to make it make me give it more money in exchange for books.

It doesn't take a genius to see that the library catalogue could potentially offer a similar service. A library catalogue would be well placed to build up a history of what you have borrowed and produce a list of recommended items. But would it only suggest items your library holds, and would it be limited by the relatively small user base? If there are only a few academics/researchers with a similar interest, then it will be of limited use in surfacing books you may be interested in (i.e. serendipity).
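A naive version of such a recommender, counting co-borrowers, can be sketched in Python (the readers and borrowing histories here are entirely invented):

```python
from collections import Counter

def recommend(histories, user, top=3):
    """Recommend items borrowed by readers who share items with `user`,
    ranked by how many co-borrowers each candidate item has."""
    mine = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user or not mine & items:
            continue  # no overlap with this reader
        for item in items - mine:
            scores[item] += 1
    return [item for item, _ in scores.most_common(top)]

histories = {
    "alice": {"rdf-primer", "semantic-web"},
    "bob":   {"rdf-primer", "sparql-book", "ontology-101"},
    "carol": {"semantic-web", "sparql-book"},
    "dave":  {"knitting"},
}
```

The small-user-base problem shows up immediately: with only a handful of overlapping readers the scores are tiny and most items are never recommended, which is exactly the argument for pooling borrowing data across institutions.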

This is where the JISC TILE project comes in (I blogged about an event I attended about TILE a few months ago). If we could share this data at a national level (for example), we could create far more useful services; in this case it could draw on the borrowing habits of many researchers in the same field, and could – if you wish – recommend books not yet in your own library. As well as the TILE project, Ex Libris have announced a new product called bX which sounds like it will do a similar thing with journals.

Another nugget from the blog post mentioned above is that he uses the recommendations & reviews on Amazon as a way to evaluate a book and its author:

So I click on the amazon.com link and read more reviews, from authors whose work I know and respect.

I've been discussing with colleagues the merits of, and issues with, allowing user reviews in an academic library catalogue. I hadn't considered a use such as this. Local reviews would be of limited use, as other authors in the same field that a researcher respects (as he describes in the quote) are likely to be based at other institutions (and we would be naive to expect such a flood of reviews to a local system that every book had a number of good ones). Again, maybe a more centralised review system is needed for academic libraries, though preferably not one which requires licensing from a third party at some expense!

And briefly, while we are talking about library catalogues: I see that the British Library's 'beta catalogue' (running on Primo) has tag functionality out of the box, and I'm pleased to see they have made this quite a central feature, with a 'tag' link right above the main search box. This link takes you to a list of the most frequently used and most recently added tags, creating a new way to browse and discover items. What I love about the folksonomy approach is that so often users find ways of using tags you would never expect. For example, would a cataloguer think to record an item in a museum as 'over engineered'? (I think the answer would be no, but it occurs to me I know nothing about museum cataloguing standards.) Could finding examples of over-engineered items be useful for someone? Of course! (From the Brooklyn Museum online collections, found via Mike Ellis' excellent Electronic Museum blog.) The Library of Congress on Flickr pilot springs to mind as well.

So I guess to conclude all this, the quest continues in how we can ensure libraries (and their online catalogues and other systems) provide researchers and users with what they want, and use technology to enable them to discover items that in the past they might have missed.

short urls, perl and base64

One of my many many many faults is coming up with (in my blinkered eyes, good) ideas, thinking about them non-stop for 24 hours, developing every little detail and aspect, then spending a few hours doing some of the first things required, then getting bored and moving on to something else. Repeat ad nauseam.

Today’s brilliant plan (to take over the world)

Over the weekend it was 'tinyurl.com'-style services, and specifically creating my own.

I had been using is.gd almost non-stop all week; various things at work had meant sending out URLs to other people, both formally and on services like Twitter. Due to laziness it was nearly always easier to just make another short URL for the real URL in question than to find the one I made earlier. It seemed a waste: one more short code used up when it was not really needed. The more slap-dash we are in needlessly creating short URLs, the quicker they become not-so-short URLs.

Creating my own seemed like a fairly easy thing to do: a short domain name, a bit of PHP or Perl and a MySQL database, a bookmarklet button, etc.

Developing the idea

But why would anyone use mine and not someone elses?

My mind went along the route of doing more with the data collected (compared to tinyurl.com and is.gd). I noticed that when a popular news item / website / viral comes out, many people will be creating the same short URL (especially on Twitter).

What if the service said how many – and who – had already shortened that URL? What if it made the list of all shortened URLs public (like the twitter homepage)? Think of the stats and information that could be produced with data about the URLs being shortened, number of click-throughs, etc., maybe even tags. Almost by accident I'm creating a bookmarking social networking site.

This would require the user to log in (whereas most services do not); not so good, but it would give it a slightly different edge to the others and help fight spam, and it's not so much of a problem if users only have to log in once.

I like getting all wrapped up in an idea, as it allows me to bump into things I would not otherwise. Like? Like…

  • This article runs through some of the current short URL services
  • The last one it mentions is snurl.com. I had come across the name on Twitter, but had no idea it offered so much more, with click-through stats and a record of the links you have shortened. It also has the domain name sn.im (.im being the Isle of Man). Looks excellent (but they stole some of my ideas!)

    snurl.com
  • Even though domains like is.gd clearly exist, it seems – from the domain registrars I tried – that you cannot buy two-digit .gd domains, though three-letter ones seem to start from $25 a year.
  • The .im domain looked like it could be good. But what to call any potential service??? Hang on… what about tr.im! What a brilliant idea. Fits. Genius. Someone had, again, stolen my idea. Besides, when I saw it could be several hundred pounds, other top-level domains started to look more attractive.
  • tr.im, mentioned above, is a little like snurl.com. Looks good, though mainly designed to work with Twitter. Includes lots of stats. Both have a nice UI. Damn these people who steal my ideas and implement them far better than I ever could. :)
  • Meanwhile… Shortly is an app you can download to run your own short-URL service.
  • Oh, and in terms of user authentication, the 'php user class' seemed worth playing with.
  • Writing the code seemed fairly easy, but how would I handle creating the short codes (the random-looking digits after the domain name)? They need to increment while staying as short as possible.
  • Meanwhile I remembered an old friend and colleague from Canterbury had written something like this years ago, and look! He had put the source code up as well.
  • This was good simple Perl, but I discovered that it just used hexadecimal numbers as the short codes, which are simply the hex version of the DB auto-increment id. Nice and simple, but it means the codes become longer more quickly than with other schemes.
  • I downloaded the script above and quickly got it working.
  • I asked on twitter and got lots of help from bencc (who wrote the script above) and lescarr.
  • Basically the path to go down was base 64 (i.e. 64 digits in a number system, instead of the usual 10), which was explained to me with the help of an awk script in a tweet. I got confused for a while, as the most obvious base64 Perl lib actually encodes text/binary for MIME email, and created longer, not shorter, codes than the original (decimal) id numbers created by the database.
  • I did find a CPAN Perl module to convert decimal numbers to base 64, called Math::BaseCnv, which I was able to get working with ease.
  • It didn't take long to edit the script from Ben's spod.cx site and add the Math::BaseCnv code, so that it produced short codes using all lower-case letters, upper-case letters and numbers.
  • you can see it yourself – if I haven’t broken it again – at http://u.nostuff.org/
  • You can even add a bookmarklet button using this code
  • Finally, something I should have done years ago: set up mod_rewrite to make the links look nice, e.g. http://u.nostuff.org/3
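For the curious, the conversion at the heart of all this (auto-increment id to short code and back, over lower case, upper case and numbers) is only a few lines. A sketch in Python rather than the Perl and Math::BaseCnv I actually used:

```python
# Strictly base 62: 'base64' minus the two characters that are ugly in URLs.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = len(ALPHABET)

def encode(n):
    """Convert a non-negative database id to a short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        digits.append(ALPHABET[n % BASE])
        n //= BASE
    return "".join(reversed(digits))

def decode(code):
    """Convert a short code back to the database id."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Each extra character multiplies the code space by 62, so codes grow much more slowly than with the hexadecimal scheme: id 12345 is 'dnh' rather than '3039'.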

So I haven't built my (ahem, brilliant) idea. Of course the very things that would have made it different (openly showing which URLs have been bookmarked, by whom, how many click-throughs, and tags) were the very things that would make it time-consuming. And sites like snurl.com and tr.im have already done such a good job.

So while I'm not ruling out creating my own really simple service (and in fact u.nostuff.org already exists), and I learned about mod_rewrite, base 64 on CPAN, and a bunch of other stuff, the world is spared yet another short-URL service for the time being.

IMAP5 : or developments that should happen in email

When I first started work I was responsible for a UNIX server which acted as a departmental mail server. I remember, in my inexperience, randomly deciding to upgrade from IMAP2 to IMAP4, with zero testing or consultation with the organisation’s email system administrator. Luckily it worked!

Since then IMAP has stood still. In fact most internet protocols and RFCs have stood still, but that’s a different blog post.

It seems to me that it is in need of updating. I confess I am no longer an email administrator, nor do I keep up with the ’email standards’ world.

I think any new additions to the IMAP standard should be optional, and email clients and servers should be able to handle talking the current/earlier versions of the protocol.

I also think any new version should handle an efficient negotiation of what the server AND client can handle, that is to say: what features they want to (or can) use.

What should a new version contain?

  • A ‘sync’ option, for working offline and then syncing with the server, with the server handling what needs updating.
  • Address book: optional feature. Allows contacts to be stored on the server – specifically the same server (and settings) as the IMAP email server. There are already address book servers, but this would be the same as the IMAP server: same protocol, same ssl/security. A user can log in from anywhere and get the same address book. (Of course the server is free to act as a proxy, reading/writing the address book details from a specific address book server, but the important thing is that the client works with one server.)
  • Filters. Yes, many MTA (email delivery) agents, such as exim, can have filters. But you normally have to log in to the server and know their syntax to set these up. The other current option is to keep a client like Thunderbird open with filters set; I keep my work PC running 24/7 to filter mailing-list emails and move stuff to ‘Junk’. But I would like to see an optional addition to the IMAP protocol which allows clients (like Thunderbird) to set the filtering at the server. The user uses the client interface, but the filtering happens at the server end. I see this as one of the most pressing needs of an email protocol.
  • As mentioned above, an IMAP5 (or whatever) should negotiate which of the above it can handle; if it doesn’t do ‘addresses/contacts’, for example, fine – the client reverts to storing these locally. And of course, it is up to the server software and administrator whether the server stores filters/addresses locally or acts as a proxy for another application.
  • Email sending: this may be the most controversial suggestion. At the moment you need to set up two protocols within a mail client, incoming mail (IMAP/POP) and outgoing (SMTP), both with ports, security, encryption etc. Other internet protocols, like usenet, can handle both – why shouldn’t IMAP? Accept mail the same way as SMTP, and simply pass it on to a defined SMTP server. It means the user only has to set up and authenticate against one server.
  • Handle POP. This may sound odd, but what if a client said ‘Hello IMAP server, do you speak POP?’, and if it gets a positive answer a POP exchange takes place. Why? Some users/clients may prefer it. Yet by letting IMAP handle it, the email server only needs to listen on one port and run one server. Ideally the IMAP server would recognise when a client is starting a POP session and automatically revert to POP mode, i.e. you could use an old POP client which has been told to connect to the IMAP server port, and the IMAP server recognises the client only talks ‘POP’ and reverts to acting like a POP server.
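To make the filters idea concrete, here is a small Python sketch of the kind of rule set a client might upload for the server to apply on delivery. The rule format is invented purely for illustration (a real standard would have to define one); the header names and folder names are assumptions:

```python
# Sketch of server-side filtering: the client would upload rules like
# these, and the server would apply them as each message arrives.
# The (header, substring, folder) rule format is invented for illustration.
from email.message import EmailMessage

RULES = [
    ("List-Id", "lis-link", "Lists/lis-link"),  # file mailing-list traffic
    ("Subject", "[SPAM]", "Junk"),              # move tagged spam
]

def target_folder(msg: EmailMessage, rules=RULES, default="INBOX") -> str:
    """Return the folder a message should be filed into, server-side."""
    for header, needle, folder in rules:
        if needle in msg.get(header, ""):
            return folder
    return default

msg = EmailMessage()
msg["Subject"] = "[SPAM] cheap watches"
# target_folder(msg) → "Junk"
```

The point is that this logic runs on the server, so my work PC could stay switched off and the mailing-list traffic would still end up in the right folders.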

Finally I would like to see an ‘auto configure’ option for email. Think about it. If I tell my email client that my email address is cjk@company.co.uk then it should be possible for my email client to query company.co.uk for the server settings for user ‘cjk’, in the same way that a client can get DNS or MX information for a domain. I cannot see (though I am fairly ignorant on the matter) why specifying server details (IMAP, security level, port, etc.) via a protocol for a user account should be a security risk (though it may by definition show that an email address exists), and it would mean that a user just has to provide their email address and password to set up Outlook Express or Thunderbird.
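As a sketch of the idea, a client only needs the address itself to work out where to ask. The SRV label and the well-known hostname below are illustrative assumptions of what such a convention might look like, not an existing standard:

```python
# Sketch of auto-configure: derive discovery lookups from just an
# email address. The "_imaps._tcp" SRV label and the autoconfig URL
# are assumed conventions for illustration only.
def discovery_candidates(address: str) -> dict:
    """Given an email address, return where a client might look for settings."""
    user, _, domain = address.partition("@")
    return {
        "user": user,
        "dns_srv": f"_imaps._tcp.{domain}",  # SRV-style DNS lookup
        "config_url": f"https://autoconfig.{domain}/mail/config",
    }

# discovery_candidates("cjk@company.co.uk")["dns_srv"]
# → "_imaps._tcp.company.co.uk"
```

Everything after that (port, security level, server name) would come back in the lookup response, so the user never types it.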

Thinking out loud here. If this is stupid, or if I have missed something, let me know in the comments.

webpad : a web based text editor

So I have WordPress (and in fact Drupal, Joomla, mediawiki, Moodle, damn those Dreamhost 1-click installs) as a way of running my website.

But there are still many pages which are outside of a content management system. Especially simple web app projects (such as ircount and stalisfield) and old html files.

It can be a pain to constantly ftp in to the server, or use ssh. Editing via ssh can be a pain, especially over a dodgy wireless connection, or when you want to close the lid of your macbook.

But trying to find something to fit this need didn’t come up with many results. Most hits were tinyMCE clones – WYSIWYG html editors that convert input into html – no good for coding.

Webpad screenshot

Until I came across Webpad. It not only suited my needs perfectly, but it is well designed and implemented.

After a quick install (more or less simply copying the files), you enter a specified username and password, and once authenticated you are presented with a line of icons at the top. Simply select the ‘open’ icon to browse to the file you wish to edit on your web server and you’re away!

It’s simple, yet well written and serves its purpose well. If there was one thing I would suggest for future development it would be improved file management functionality. You can create directories and delete files from the file open dialog box, but I can’t see a way to delete directories, or move/copy files. Deleting directories is of use, as many web apps (wikis, blogs, cms) require you to upgrade the software, edit a config file, and then delete the install directory, or similar.

Oh, and it’s free!

Check out webpad by Beau Lebens on dentedreality.com.au

Mashed Libraries

Exactly a week ago I was coming home from Mashed Libraries in London (Birkbeck).

I won’t bore you with details of the day (or more to the point, I’m lazy and others have already done it better than I could (of course, I should have made each one of those words a link to a different blog but I’m laz… or never-mind)).

Thanks to Owen Stephens for organising, UKOLN for sponsoring and Dave Flanders (and Birkbeck) for the room.

During the afternoon we all got to hacking with various sites and services.

I had previously played around with the Talis Platform (see long winded commentary here, though it seems weird that at the time I really didn’t have a clue what I was playing with, and it was only a year ago!).

I built a basic catalogue search based on the ukbib store. I called it Stalisfield (which is a small village in Kent).

But one area I had never got working was the Holdings. So I decided to set to work on that. Progress was slow, but then Rob Styles sat down next to me and things started to move. Rob helped create Talis Cenote (which I nicked most of the code from) and generally falls into that (somewhat large) group of ‘people much smarter than me’.

We (well I) wanted to show which libraries had the book in question, and plot them on a Google Map. So once we had a list of libraries we needed to connect to another service to get the location for each of them. The service which fitted this need was the Talis Directory (Silkworm). This raised a point with me: it was a good job there was a Talis service which used the same underlying ID codes for the libraries, i.e. the holdings service and the directory both used the same ID number. It could have been a problem if we needed to get the geo/location data from something like OCLC or Librarytechnology.org – what would we have searched on? A library’s name? Hardly a reliable term to use (e.g. the University of Sussex Library is called ‘UNIV OF SUSSEX LIBR’ in OCLC!). Do libraries need a code which can be used to cross-reference them between different web services (a little like ISBNs for books)?

Using the Talis Silkworm Directory was a little more challenging than first thought, and the end result was a very long URL which used SPARQL (which looks like a steep learning curve to me!).

In the mean time, I signed up for Google Maps, and gave myself a crash course in setting it up (I’m quite slow to pick these things up). So we had the longitude and latitude co-ordinates for each library, and we had a Google Map on the page; we just needed to connect the two.

Four people trying to debug the last little bit of code for my little project
Four people at Mashedlibrary trying to debug the last little bit of my code.

Time was running short, so I was glad to take a back seat and watch (and learn) while Rob went into speed-JavaScript mode. This last part proved elusive. The PHP code which was generating the JavaScript was just not quite working. In the end the (final) problem was related to the order I was outputting the code in, but we were out of time, and this required more than five minutes.

Back home, I fixed this (though I never would have known I needed to do this without help).

You can see an example here, and here and here (click on the link at the top to go back to the bib record for the item, which, by the way, should show a Google Book cover at the bottom, though this only works for a few books).

You can click on a marker to see the name of library, and the balloon also has a link which should take you straight to item in question on the library’s catalogue.

It is a little slow, partly due to my bad code and partly due to what it is doing:

  1. Connecting to the Talis Platform to get a list of libraries which have the book in question (quick)
  2. For each library, connect to the Talis Silkworm Directory and perform a SPARQL query to get back some XML which includes the geo co-ordinates (geo details are not available for all libraries).
  3. Finally generate some javascript code to plot each library on to a Google map.
  4. As this last point needs to be done in the <head> of the page, it is only at this point that we can push the page out to the browser.
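The last step – turning the library list into map markers – is the bit that tripped us up, so here is a rough sketch of it. The real code was PHP emitting Google Maps v2 JavaScript; this Python version, with made-up library data, just shows the shape of what gets generated:

```python
# Sketch of step 3: turn the list of libraries (name, lat, lng) gathered
# from the SPARQL results into the JavaScript that plots each marker.
# Library names and co-ordinates here are made up for illustration.
libraries = [
    ("University of Sussex Library", 50.8663, -0.0879),
    ("Birkbeck Library", 51.5218, -0.1305),
]

def marker_js(libs) -> str:
    """Generate one map.addOverlay(...) line per library."""
    lines = []
    for name, lat, lng in libs:
        lines.append(
            f'map.addOverlay(new GMarker(new GLatLng({lat}, {lng}), '
            f'{{title: "{name}"}}));'
        )
    return "\n".join(lines)
```

Because this generated script has to sit in the `<head>` of the page, all the lookups must finish before any of the page can be sent to the browser – which is exactly why the whole thing feels slow.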

I added one last little feature.

It is all well and good to see which libraries have the item you are after, but you are probably interested in libraries near you. So I used the Maxmind GeoLite City code-library to get the user’s rough location, and then centred the map on this (which is clearly not good for those trying to use it outside the UK!). This seems to work most of the time, but it depends on your ISP; some seem more friendly in their design towards this sort of thing. Does the map centre on your location?
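The centring logic itself is simple enough to sketch: use the GeoIP result if the lookup gave one, otherwise fall back to a whole-UK view. The lookup result shape and the default co-ordinates here are my own assumptions, not Maxmind’s actual API:

```python
# Sketch of centring the map on the visitor's rough location, with a
# UK-wide fallback when the GeoIP lookup fails. The dict shape of the
# lookup result and the default view are assumptions for illustration.
UK_DEFAULT = (54.0, -2.0, 5)  # lat, lng, zoom covering most of the UK

def map_centre(geo):
    """geo: dict from a GeoIP city lookup, or None if the lookup failed."""
    if geo and geo.get("latitude") is not None:
        return (geo["latitude"], geo["longitude"], 9)  # zoom in locally
    return UK_DEFAULT
```

Whether the lookup succeeds at all is down to how your ISP allocates addresses, which matches the hit-and-miss behaviour described above.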