Research in the Open: How Mandates Work in Practice

Today I’m at the RSP/RIN event Research in the Open: How Mandates Work in Practice, at the impressive RIBA, 66 Portland Place.

Slides can be found here (they were not available when I made this post, which is a semi-excuse for why my notes miss so much). These are rough notes, which I’m making available in case others are interested. Apologies for any mistakes, and don’t take it as gospel!

After an introduction by Stéphane Goldstein, we kicked off with Robert Kiley from the Wellcome Trust.

The Wellcome Trust has had a mandate since 2006: anyone receiving funding from the Wellcome Trust must deposit in PubMed, now UK PubMed Central. SHERPA Juliet lists 48 funder policies/mandates.

There are two routes to complying with their mandate. Route 1 (preferred): publish in an open access / hybrid journal; Wellcome will normally pay any associated fees. However, when paying the publisher, they expect a certain level of service in return (deposit on behalf of the author, final version available at time of publication, a certain level of re-use). Route 2: the author self-archives the author’s final version within 6 months of publication. It was stressed that the first option is very much preferred.

“Publication costs are legitimate research costs”. To fund Open Access fees for ALL research they fund would, they estimate, take up 1-2% of their budget.

Risk of ‘Double payment’ (author fees and subscriptions). OUP have a good model here.

Still to do:

  • Improve compliance (roughly 33%, significant increase after letters to VCs),
  • improve mechanisms (Elsevier introduced OA workflow which resulted in significant increase in deposits, but funders/institutions/publishers all need to play a part here),
  • clarify publishers’ OA policies (and re-use rights; I didn’t catch this).

Research Councils UK – Astrid Wissenburg, ESRC

Starts off by talking about drivers for OA in the RCs: value for money, ensuring research is used, infrastructure and more.

Principles: Accessible, Quality (peer review), preservation (she’s moving through the slides fast)

An April 2009 study into OA impact provides options for the RCs to consider. Findings:

  • Significant shift in favour of OA over last decade
  • Knowledge/awareness still limited. Confusion
  • Engagement with OA varies by subject area.
  • Too early to assess impact of RCs’ policies.
  • Drivers
    • Not speed of dissemination
    • principles of free access
    • co-authors’ views are a big influence (mandates less so!)
    • some evidence that OA increases citation just after publication
    • limited compliance monitoring by funders
    • concern about impact on learned societies (but no evidence of libraries cancelling journals)
    • little evidence of use by non-researchers (CJK comment: interesting, I would imagine this may grow, wish newspapers would link/cite journal articles)

Both models (oa journals/repositories) supported by RCs, level playing field.

Pay to publish findings: limited use, barriers, costs, awareness, not RAE. Would lead to redistribution of costs from non-academic to academic areas.

OA Deposit (repositories): applies to grant applications from 1 Oct 2006, so a three-year project starting then will only finish in Autumn 2009. Acknowledges embargoes, but deposit should be ‘at earliest opportunity’.

75% of researchers were not aware of the mandate. Diversity across subjects. ‘In general, no active deposit’.

A slide showing % of awareness broken down by RC, interesting.

From the highest level RCs are committed to supporting OA (this will increase). But change takes time.

Some issues: what to do with embargo periods, difficult for funders to manage (are there incentives we could use?), depends on existence of repositories, multiple deposit options confusing to researchers, awareness/understanding.

UKPubMed Central – Paul Davey, Engagement Manager, UKPubMed Central

Aims to become the information resource of choice for biomedical sector.

Principles: freely available, added to UK pubmed central, freely copied and reused.

The Department of Health have a clear policy to make research freely available.

95% of papers submitted are taken care of (deposited?) by the authors. Only 0.5% submitted by academics (PIs/colleagues).

1.6 million papers in UK PubMed Central. 366 thousand downloads last month.

Core benefits: transparency, cutting down duplication, greater visibility.

Text mining, grabbing key terms from an article (a little like OpenCalais does).

Mentions EBI’s CiteXplore, encouraging academics to link to other research.

UK PubMed Central includes funding/grant search facilities. Can link articles to funding grants.

In short, backing from key funders, will make researchers more efficient, researcher’s visibility will increase.

Beta out in the Autumn, new site in Jan 2012.

Questions:

Worried about text mining, need for humans to moderate this. Response: limited funding in this area, so human intervention is also limited. Really need a specialist to answer this fully.

Question about increasing visibility of UK pubmed central, referring to Google, response: getting indexed by Google very much part of increasing visibility.

Question about Canadian ‘pubmed central’, response confirms this and mentions talk of a European pubmed central. Potential of European funders using UK pubmed central as a place to deposit research (like everything here, not sure if I’ve noted this right).

PEER – Pioneering collaboration between publishers, repositories and researchers – Julia Wallace

Funded by EC, not a ‘publisher project’.

Three key stages of publication: NISO Author’s original, NISO Accepted Manuscript, NISO version of record.

Starts off talking about the project, interesting stuff but I failed to take notes.

From the website:

PEER (Publishing and the Ecology of European Research), supported by the EC eContentplus programme, will investigate the effects of the large-scale, systematic depositing of authors’ final peer-reviewed manuscripts (so called Green Open Access or stage-two research output) on reader access, author visibility, and journal viability, as well as on the broader ecology of European research. The project is a collaboration between publishers, repositories and researchers and will last from 2008 to 2011.

Seven members: including a publisher group, university, funders etc. Various publishers involved, big and small and about six European repositories taking part.

Approach / content:

  • Publishers contribute 300 journals, plus control
  • Maximises deposit and access in participating repositories
  • 50% publisher submitted 50% author submitted.
  • Good quality, range of impact factors. Publishers set embargo periods, up to 36 months.

Publishers will deposit articles in to the repositories via a central depot for their 50% of articles submitted (50% fulltext, metadata for the remaining 50%). Publishers will invite authors to deposit for the ‘author’ 50%.

Technical: using PDF/A-1 (where possible) and SWORD.

Three strands: Behaviour, Usage (looking at raw log files), Economic. Also looking at Model Development (the three strands will look in to this).

Question about why they chose PDF (not very good for text mining). A: the wide range of subjects and publishers means that PDF is the best fit.

Economic Implications of Alternative Scholarly Publishing Models, also Loughborough University’s Institutional Mandate – Charles Oppenheim, Loughborough University

‘Houghton report’ looks at costs and benefits of scholarly publishing.

Link to report http://hdl.handle.net/2134/4137

Link to main website and models http://www.cfses.com/EI-ASPM/

  • Massive savings by using OA, UK would benefit from this.
  • Savings include: quicker searching, less negotiations, savings not just in library budgets
  • 2,300 activity items costed.
  • This report currently final word in economics of OA.
  • Charles talks about the various methods and work involved in producing this report.
  • A 5% increase in accessibility would lead to savings (or extra money to spend) in research/HE/RCs.
  • Hard to compare UK toll/open access publishing costs as one pays for UK access to content from across the world, the other pays for UK content to be world wide accessible.
  • Keen to roll this out to other countries.
  • Publishers’ response to report: furious!

Now for something completely different: Loughborough approved a mandate a few months ago, to come into effect in Oct 09. It is an integral part of academic personal research plans (only those research items in the IR will be considered at the review). They now have over 4,000 items.

Lunch and audioboo

During lunch I did an experiment using audioboo. Would I be able to summarise the morning, on the fly with no planning, in a brief audio recording? The answer, as you can discover, is ‘no’, but it was fun to try, and made me think about what I had taken in during the morning. Link to audioboo recording, or try the embedded version below.

Institutional Mandates – Paul Ayris, University College London

Paul starts off by showing a number of Venn diagrams, for example: 90% of its research is available online, 40% available to an NHS hospital.

What do UCL academics want

  • as authors: visibility / impact
  • as readers: access
  • delivery 24×7 anywhere

UCL mandate, a case study:

Looking global is an important part of UCL (for PIs rankings etc). There are a number of systems in their publication setup: Symplectic, IRIS, eprints, data mart (and portico, FIS, HR). Symplectic (or a similar tool) and IRIS seem central in this model. They plan to automatically extract metadata from other external places (publication repositories).

How did they get the mandate? Paul spoke at UCL’s senate (Academic Board), and they agreed: all academic staff should record their own publications on a UCL publication system, and teaching materials should all be deposited in their eprints system.

UCL are going to set up a publication board to oversee the OA rollout: to advise, monitor, oversee presentation and more.

Next steps: market/exploit, set standards for online publication, to advise on ongoing resource issues in this area. Also, establish processes, Statistics and management information, advise on multimedia, copyright issues.

‘Open Access is the natural way for a global university to achieve its objectives’

Question about blurring the line between dissemination and publication, and that some of UCL’s aims seem more fitting of ‘publication’. Paul agrees, still trying to figure this out.

HEFCE – Paul Hubbard, Head of Research Policy, HEFCE

Policy: Research is a process which leads to insights for sharing. So Scholarly Communication matters to HEFCE. Prompt and accessible publishing is essential for a world class research system.

Supporting research: JISC, RIN, Programmes to support national research libraries (UKRR), UKRDS. Mentions Boston Spa (BL) document centre as an example of our world class sharing.

Internet opens up new ways of scholarly communication and sharing.

What do HEFCE want to see:

  • Widest and earliest dissemination of public research.
  • IP shared effectively with the people best placed to exploit it (CJK comment, i don’t think it is publishers!)

Committed to: UK maintaining world leading research, funding that fosters autonomy and dynamism, research quality assessment regime that supports rather than inhibits new developments.

As we move forward, things may be unclear, but those HEIs with repositories will be at an advantage.

Paul finishes up with a personal view of scholarly communications in 2030. He sees two forms of communication: discussion (building up ideas), and writing up a formal firm idea/conclusion based on these. HEFCE supports, through the likes of JISC, a range of tools and systems to enable this. (Sorry, that was an awful summary, he said much more than that!)

Answered a question as to why IRs, HEIs are the places to administrate/manage. Websites people go to see research for a particular subject need to be overlay systems harvesting from IRs.

[hmm, does ‘university requirement’ sound better than mandate?]

Institutional Policies and Processes for Mandate Compliance – Bill Hubbard, SHERPA, University of Nottingham

99.9% of academics do not object to Open Access, but need to show it will not change how they work. Librarians going to be much more part of the research process. Most people (including most publishers) are in favour of Open Access.

Other pressures on the system: lack of peer reviewers, rising prices of journals, growing need for different forms of scholarly communications (e-lab books, multimedia), and public demand for highest value for money (‘public should get what they pay for’).

Not if we change, but how we change. Research has to change seamlessly. Mandates have a value-added basis with fast delivery of benefits. Need integrated processes, need integrated support (we don’t want researchers to hear different messages from their Uni, funder, publisher, etc).

Authors need to know ‘what do I need to do’. Need to make it less confusing, need to make it clear when they can get help.

First step compliance: how can funders improve compliance, how can authors be supported?

All 1994 Group and Russell Group universities now have an IR (Reading, I think, is just setting one up now).

Compliance for mandates makes it better for us admin/support staff, and for the University generally.

Institutions need a compliance officer (perhaps the repository manager). Funders need to ensure these people have the information they need. Share compliance information.

I’ve missed so much of Bill’s talk here, he moves fast (and passionately) and lots of points.

After Bill’s talk there was a panel session.

Twitter

Finally check out some of the useful tweets from the day. (Twitter search only goes back about a month or so, so this link may not work after a certain date). Jim Richardson also created a permanent copy with the (new to me) webcitation website.

Conclusions

With such dodgy note taking I feel some concise summary is in order!

  • Mandates are happening, by Universities and by Funders.
  • HEFCE want research to be accessible to as many as possible as quickly as possible.  Coming from HEFCE, this holds a lot of weight.
  • Funders (Research Councils / Wellcome) put mandates in place several years ago. They have not sat back and said ‘job done’. They are building on this foundation. How can they check? How can they enforce/encourage? How can they assist? How can they automate? How can they work with publishers and HE to share this information? Expect more to come in this area.
  • Wellcome Trust prefers submission to Open Access Journals rather than author depositing in to a repository at a later date.
  • HE Mandates are coming, we already have a few in the UK. Making them an integral part of an academic’s review seems like a good idea. My opinion is that this is reasonable, even if there are those who disagree: surely an employer can (and does in every other sector) ask for a record of what an employee has been working on, and a copy of the end output, i.e. the full text in an IR.
  • The report ‘Economic implications of alternative scholarly publishing models : exploring the costs and benefits. JISC EI-ASPM Project’ is a thorough, comprehensive look at the economic costs of Open Access and new forms of Scholarly Communications.
  • I think we are starting to see the larger Universities developing sophisticated networks of systems to manage research/publications/OA/research-funding. See slide 10 of Paul Ayris’s presentation, and this article about Imperial’s setup as two examples.
  • It makes sense to share information (between IT systems) between funders, HE and publishers. Examples: Funders sharing (bibliographic) information to a University about publications from its researchers, Universities (or publishers) passing information to funders linking publications to funding (or even the other way round?).
  • This is an area which is still developing, fast, and will of course involve a culture change. Publishers seem unsure how to handle this new world.

Library search/discovery apps : intro

There’s a lot of talk in the Library world about ‘next generation catalogues’, library search tools and ‘discovery’. There’s good reason for this talk: in this domain the world has been turned on its head.

History in a nutshell:

  • The card catalogue became the online catalogue, the online catalogue let users search for physical items within the Library.
  • Journals became online journals. Libraries needed to let users find the online journals they subscribed to through various large and small publishers and hosts. They built simple in-house databases (containing journal titles and links to their homepages), or added them to the catalogue, or used a third party web based tool. As the number of e-journals grew, most ended up using the last option, a third party tool (which could offer other services, such as link resolving, and do much of the heavy lifting with managing a knowledge base).
  • But users wanted one place to search. Quite understandable. If you are after a journal, why should you look in one place to see if there is a physical copy, and another place if they had access to it online. Same with books/ebooks.
  • So libraries started to try and find ways to add online content to catalogue systems in bulk (which weren’t really designed for this).

[Image: Aquabrowser : Uni Sussex beta catalogue]

The online catalogues (OPAC) were simple web interfaces supplied with the much larger Library management system (ILS or LMS) which ran the backend the public never saw. These were nearly always slow, ugly, unloved and not very useful.

A couple of years ago (ish), we saw the birth of the next generation catalogue, or search and discovery tools. I could list them, but the Disruptive Technology Library Jester does an excellent job here. I strongly suggest you take a look.

Personally, I think I first heard about Aquabrowser. At the time it was a new OPAC which was miles ahead of those supplied with Library systems and was (I think) unique as a web catalogue interface not associated with a particular system and, shock, not from an established Library Company. The second system I heard about was probably Primo from Ex Libris. At first I did not understand what it was: it sounded like Metalib (another product from the same company, which cross-searches various e-resources), so was Primo replacing it? Or replacing the OPAC? It took a while to appreciate that this was something that sat on top of the rest. From then, VuFind, LibraryFind and more.

While some were traditional commercial products (Primo, Encore, Aquabrowser), many more were open source solutions, a number of which were developed at American Libraries. They were often built on common (and modern) technology stacks such as Apache Solr/Lucene, Drupal, PHP/Java, MySQL/Postgres etc.

[Image: Primo : British Library]

In the last year or so a number of major Libraries have started to use one of these ‘Discovery Systems’, for example: the BL and Oxford are using Primo, the National Libraries of Scotland & Wales and Harvard have purchased Aquabrowser, and the LSE is trying VuFind. At Sussex (where I work) we have purchased and implemented Aquabrowser. We’ve added data enrichments such as table of contents (searchable and visible on records), book covers and the ability to tag and review items (tagging/reviewing has since been removed for various reasons).

It would be a mistake to put all of these in to one basket. Some focus on being an OPAC replacement, others on being a unified search tool, searching both local and online items. Some focus on social tools, tagging & reviewing. Some work out of the box, others are just a set of components which a Library can sew together, and some are ‘SaaS’.

It’s an area that is fast changing. Just recently an established Library web app Company announced a forthcoming product called ‘Summon’, which takes searching a library’s online content a step further.

So what do libraries go for? It’s not just potentially backing the wrong horse, but backing the wrong horse when everyone else has moved on to dog racing!

And within all this it is important to remember ‘what do users actually want’. From the conversations and articles I’ve read, they want a Google search box, but one which returns results from trusted sources and academic content. Whether they are looking for a specific book, specific journal, a reference/citation, or one/many keywords. And not just one which searches the metadata, but one which brings back results based on the full text of items as well. There are some that worry that too many results are confusing. As Google proves, an intelligent ranking system makes the number of results irrelevant.

Setting up (and even reviewing) most of these systems takes time, and if users start to add data (tags, reviews) to one system, then changing could cause problems (so should we be using third party tag/rating/review systems?).

You may be interested in some other articles I’ve written around this:

There’s a lot of talk about discovery tools, but what sort to go for, who to back? And many issues have yet to be resolved. I’ll come on to those next…

Free e-books online via University of Pittsburgh Press

The University of Pittsburgh Press has put nearly 500 out of print books online and Open Access. You can access them via their Digital Editions website.  This is excellent news, making work which could be lost openly available to all.

University of Pittsburgh Press Digital Editions - Open Access free ebooks

For years there has been a movement towards making Journal articles Open Access, i.e. publicly available. However some subjects (especially in the Humanities) publish much of their research in books, not journals. Letting the world gain from the (normally publicly funded) research contained within books is more complex, and it’s not an area I fully understand. The author normally receives royalties from book sales. However I understand these are very small 99% of the time, and normally tail off to tiny amounts after a few years. What if funders and Universities demanded that any book written with their money (or during their employment) must be made publicly available after x number of years (let’s say 10 years)? Academics and Publishers would not welcome the move, but it would still allow a window in which they can gain revenue, and if this became the norm it would be something they just have to accept. Meanwhile, once open access, the book becomes much easier to archive and preserve, ensuring the knowledge is available to all in the long term. Just a thought.

ircount : Repository Record Statistics

I’ve updated Repository Record Statistics (which I refer to as ircount).

Some key points:

  • Repository names with non-standard characters were not displaying properly. They now should, though there are some parts of the site where I have not updated the code yet. Many Russian IRs are also not showing the correct name, even though it is correct in the database, which I will look in to.
  • If a repository changes its name (in ROAR) then the new name should be shown in ircount.
  • When looking at the details for one Repository, you can now compare it with any other repository, not just those from the same country. You’ll see two drop down boxes, one for those from the same country (for convenience) and one listing all repositories.
  • I’ve removed the Full Text numbers, which only appeared for some repositories. They were inaccurate and not very useful.
  • There’s now a link on the homepage to ircount news (which is actually just posts on this blog which have been tagged ‘ircount’, like this one).

The code is now in a subversion (svn) repository, my first time using such a system, which should help me keep track of changes. I can make it available to others if anyone is interested.

The future

There are other changes planned… At the moment this is all in one big database table. Each week it collects lots of info about repositories from ROAR, including their name etc, and saves one row per repository (for each week). This means lots of information (such as a repository’s name) is duplicated each week. It also means that when selecting the name to be displayed you have to be careful to select the latest entry for the repository in question (something which hit me badly when trying to fix the name problem, and something I still haven’t got around to fixing for all SQL queries). I’m working on improving the back-end design.
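
To illustrate the "select the latest entry" problem, here is a minimal sketch of one way to pick the most recent name per repository from a one-row-per-week table. It is not the actual ircount code: the table name `weekly_stats`, its columns, and the sample rows are all made up for the example, and it uses Python's built-in sqlite3 rather than MySQL so that it runs anywhere.

```python
import sqlite3

# Hypothetical schema: one row per repository per weekly harvest.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_stats ("
    " repo_id INTEGER, name TEXT, records INTEGER, harvested TEXT)"
)
conn.executemany(
    "INSERT INTO weekly_stats VALUES (?, ?, ?, ?)",
    [
        (1, "VÃ¤xjÃ¶ University", 100, "2009-03-01"),  # old, garbled name
        (1, "Växjö University", 120, "2009-06-01"),    # newer, correct name
        (2, "Old Repo Name", 50, "2009-03-01"),
        (2, "New Repo Name", 55, "2009-06-01"),
    ],
)

# Find the newest harvest date per repository, then join back to the
# full table to fetch the name stored in that newest row.
LATEST_NAME_SQL = """
SELECT w.repo_id, w.name
FROM weekly_stats AS w
JOIN (
    SELECT repo_id, MAX(harvested) AS latest
    FROM weekly_stats
    GROUP BY repo_id
) AS m ON m.repo_id = w.repo_id AND m.latest = w.harvested
ORDER BY w.repo_id
"""

latest_names = dict(conn.execute(LATEST_NAME_SQL).fetchall())
print(latest_names)
```

Moving the name into a separate one-row-per-repository table would remove the need for this join, which is roughly what an improved back-end design would aim for.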

I’m also thinking about periodically connecting to the OAI-PMH interface for each IR to collect certain details directly. This would be quite a change of direction, though: at the moment, ircount’s philosophy has been simply collecting from ROAR and reporting on what it gets back. Do I want to go down that road and lose this simple model?
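
As a rough idea of what collecting details directly might involve, the sketch below builds an OAI-PMH `Identify` request URL and pulls the repository name out of a response. The `Identify` verb, the `repositoryName` element and the namespace come from the OAI-PMH 2.0 specification; the base URL and the sample response are invented, and no network call is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """Build the URL for an OAI-PMH Identify request."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def repository_name(identify_xml):
    """Return the repositoryName from an Identify response, or None."""
    root = ET.fromstring(identify_xml)
    node = root.find("oai:Identify/oai:repositoryName", OAI_NS)
    return node.text if node is not None else None


# Invented sample response, trimmed to the elements used above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2009-06-01T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

print(identify_url("http://repository.example.org/oai"))
print(repository_name(SAMPLE_RESPONSE))
```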

I’m also pondering ways to keep track of the number of full text items in each IR (you can see some initial thoughts on this here), though this will open a big can of worms.

The stats table which shows growth of repositories (based on number of records) over time for a given country (this one), could do with some improvements. RSS feeds for various things are also on the to do list.

Technical details

How did I fix these things, and add the new features? I made these changes a few months ago (on a test area) and the exact details have already slipped my mind.

Funny characters in Repository names

Any name containing a non-standard character (anything other than a-z, A-Z, 0-9) has always displayed as garbage; this became more of an issue when I expanded ircount to include repositories world-wide. Below you can see an example from a Swedish repository: Växjö University.

[Image: Example of incorrect name]

The script which collects the data each week outputs its collected data in to a text file as well as writing it all to the database (really just as a backup and for debugging). The names displayed in the text file were in the same messed up format as on the website. As the text file had nothing to do with the MySQL database or PHP front-end, I concluded the problem was with actually grabbing the data from the ROAR website, which uses Perl’s LWP::Simple.

This was a red herring. In the end I knocked up a script which just collected the file and output it to a text file, and all worked fine. I gradually added code from the main script and it all was still working fine. So why did the main script not work?

In the end, I can’t remember the details but I think starting a new log file, or using a different filename (stupid I know) and other random things made the funny character problem go away.  Which meant the problem was now with the DB after all, which made more sense.

In the end I found this excellent page http://www.gyford.com/phil/writing/2008/04/25/utf8_mysql_perl.php

Converting the database tables/fields to utf8_general_ci, and adding a couple of lines of code to the Perl script to ensure the connection to the db was in UTF-8 (both outlined in the webpage linked to above) sorted this out. The final step was ensuring that the front end user interface selected the most recent repository name for a given IR, as older entries in the db would still have the incorrect name.
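
I can't say for certain this was the exact cause here, but garbage like this very often comes from UTF-8 bytes being read back as a single-byte encoding such as Latin-1 somewhere between the script and the database. A small sketch of that effect, using the repository name from the example above:

```python
name = "Växjö University"

# UTF-8 bytes read back as Latin-1: each accented letter turns into
# two odd-looking characters.
garbled = name.encode("utf-8").decode("latin-1")

# If the wrong decoding is the only thing that went wrong, reversing
# it recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired)
```

Making every layer (tables, connection, script output) agree on UTF-8 avoids the mismatch in the first place, which is what the fix described above does.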

Country names

When I started collecting data for all repositories around the world I needed a way for users to be able to select a particular country. ROAR provided two-letter codes, but how to display proper country names? I found a solution using a simple to use PHP library detailed here (note it’s on two pages).
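
The idea is just a lookup from code to name with a sensible fallback. The sketch below is a Python stand-in for illustration, not the PHP library mentioned above; the three entries are sample data, and ROAR's codes may not match ISO 3166-1 exactly (for example ‘uk’ rather than ‘gb’), so a real table would need checking against what ROAR actually returns.

```python
# Illustrative subset only; a real table would cover every code in use.
COUNTRY_NAMES = {
    "gb": "United Kingdom",
    "se": "Sweden",
    "ru": "Russian Federation",
}


def country_name(code):
    """Return the display name for a country code, or the code itself
    (upper-cased) when it is not in the table."""
    return COUNTRY_NAMES.get(code.strip().lower(), code.strip().upper())


print(country_name("SE"))
print(country_name("zz"))
```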

Known Bugs

  • Some Repository names are still displayed wrongly, e.g. Russian names. Anyone know why?!
  • The table may show old names, and sometimes shows two separate rows if a repository has changed its name in ROAR.
  • When comparing repositories, sometimes the graph does not display, especially when comparing four. This is because the amount of data being passed to Google Charts exceeds the maximum size of a URL (can’t remember off the top of my head but it is about 2,000 characters). This should be fixable as there is no need to pass so much data to Google Charts; I just need to be a little more intelligent in preparing the URL.
  • Some HTML tables are not displaying correctly, some border lines are missing, almost at random.
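
For the graph bug in the list above, one possible way of being "more intelligent" is to thin out the data points until the URL fits. This is only a sketch under assumptions: the endpoint is a placeholder, the 2,000 character limit is the approximate figure quoted above, and the `cht`/`chd=t:` parameters follow my recollection of Google Charts' text data format, so check them against the documentation before relying on them.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # approximate limit quoted above (assumption)
BASE_URL = "https://chart.example.org/chart"  # placeholder endpoint


def thin(series, step):
    """Keep every `step`-th value, always including the last one."""
    kept = list(series[::step])
    if series and (len(series) - 1) % step != 0:
        kept.append(series[-1])
    return kept


def build_chart_url(all_series):
    """Halve the resolution repeatedly until the URL is short enough."""
    step = 1
    while True:
        data = "|".join(
            ",".join(str(v) for v in thin(s, step)) for s in all_series
        )
        # Leave , | and : unescaped so they cost one character each.
        query = urlencode({"cht": "lc", "chd": "t:" + data}, safe=",|:")
        url = BASE_URL + "?" + query
        longest = max((len(s) for s in all_series), default=0)
        if len(url) <= MAX_URL_LENGTH or step >= longest:
            return url
        step *= 2


# Four repositories with 300 weekly record counts each.
example = [[12345 + week for week in range(300)] for _ in range(4)]
url = build_chart_url(example)
print(len(url))
```

Dropping intermediate weeks loses some detail, but on a small chart the overall growth curve should look much the same.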