The Data Imperative: Libraries and Research Data : comment

I have put this into a separate post. It continues on from my previous post, but I didn’t want my notes of the day to be taken over by my ill-thought-out views.

Personal Thoughts

Reluctant to give some thoughts as I know so little about the service. However… (!)

There seem to be two clear areas here: data formatting and data storage. There is some linkage (preservation surely covers both: formats can become obsolete, servers die), yet the two seem somewhat separate.

Both require IT skills, but IT is a broad church. The former is technical metadata (and is very much both IT and library), in the general area that I see covered in the Eduserv efoundations blog.

The latter, in its simplest form, is hard-core infrastructure: disks, SANs, servers, security. But it also has elements at the application level (how do we access it, using what software: repositories? CRIS? Fedora?).

On another issue, while it is easy to say that libraries should take the lead, I think we need to be cautious. With the current climate of frozen or decreasing budgets nationally, and journal subscription pressure, how wise is it to go to the University’s executive and demand funding for resources/staff for data management? We know it’s important and could make the process of research more efficient, but there are other things higher up a University’s list of priorities (NSS/attracting good students, REF, research funding). Even at a library level, journals help researchers do research (which brings funding), and keep students happy because we have the stuff they need (NSS). How many journals should we cancel to focus on Research Data? Why? The recent JISC call will help with providing a business case.

The problem at the moment is that there are not enough clear benefits for most Universities to steam ahead with this. Let’s clarify this: not enough benefits for the institution itself. The benefits are for the UK as a whole (actually, the whole world). It’s the UK-wide economy and research that will benefit. So maybe it needs UK-wide funding. It’s easier to convince someone (or something) to spend money when the benefits for them are clear. In this case the benefits are for the UK, so it should be the UK which sets aside explicit cash (via HEFCE, JISC, and so on).

And this is happening: the JISC call (talked about today) will, amongst other things, help build examples.

But I’m not sure if the institutional level is the best one. Australia has been successful with a centralised approach. We have a number of small Universities, and those which only have one or two departments which are research active. Yet the resources/knowledge required of them will be similar to that of a large institution. Will this leave them at a disadvantage?

On another note, it seems the range of data is vast. When discussing this, I always – incorrectly – picture text based data, of varying size, perhaps using XML. Of course this is blinkered. For audio, images and similar, should a data service just provide a method to download, or a method to browse and view/listen? When it comes to storage and delivery, should we just treat all data as ‘blobs’, things to be downloaded as a file, and do nothing more with it? This makes it easy, and repository software applications (eprints/dspace/fedora) are well placed to cater to this need. But I get the impression that this is somewhat simplistic. Perhaps this means a data service needs a clear scope, otherwise we could end up building front end applications which mimic Flickr, YouTube and the like, all in one. A costly path to go down.

[all views are my own. are wrong, badly worded, ill thought, why are you reading this?, just think the opposite and it will be right, etc]

Research in the Open: How Mandates Work in Practice

Today I’m at the RSP/RIN event Research in the Open: How Mandates Work in Practice, at the impressive RIBA, 66 Portland Place.

Slides can be found here (not available when I made this post, as a semi-excuse for why my notes miss so much). These are rough notes, which I’m making available in case others are interested; apologies for mistakes, and don’t take them as gospel!

After an introduction by Stéphane Goldstein, the day kicked off with Robert Kiley from the Wellcome Trust.

The Wellcome Trust has had a mandate since 2006: anyone receiving funding from the Wellcome Trust must deposit in PubMed, now UK PubMed Central. SHERPA Juliet lists 48 funder policies/mandates.

Two routes to complying with their mandate. Route 1 (preferred): publish in an open access / hybrid journal; Wellcome will normally pay any associated fees. However, when paying the publisher, they expect a certain level of service in return (deposited on behalf of the author, final version available at time of publication, a certain level of re-use). Route 2: the author self-archives the author’s final version within 6 months of publication. It was stressed that the first option is very much preferred.

“Publication costs are legitimate research costs”. To fund Open Access fees for ALL research they fund would, they estimate, take up 1-2% of their budget.

Risk of ‘Double payment’ (author fees and subscriptions). OUP have a good model here.

Still to do:

  • Improve compliance (roughly 33%, significant increase after letters to VCs),
  • Improve mechanisms (Elsevier introduced an OA workflow which resulted in a significant increase in deposits, but funders/institutions/publishers all need to play a part here),
  • Clarify publishers’ OA policies (and re-use rights; I didn’t catch this).

Research Councils UK – Astrid Wissenburg, ESRC

Starts off by talking about drivers for OA in the RCs. Value for money, ensuring research is used, infrastructure and more.

Principles: Accessible, Quality (peer review), preservation (she’s moving through the slides fast)

An April 2009 study into OA impact provides options for the RCs to consider. Findings:

  • Significant shift in favour of OA over last decade
  • Knowledge/awareness still limited. Confusion
  • Engagement with OA varies by subject area.
  • Too early to assess impact of RCs’ policies.
  • Drivers
    • Not speed of dissemination
    • principles of free access
    • co-authors’ views are a big influence (mandates less so!)
    • some evidence that OA increases citation just after publication
    • limited compliance monitoring by funders
    • concern about impact on learned societies (but no evidence of libraries cancelling journals)
    • little evidence of use by non-researchers (CJK comment: interesting, I would imagine this may grow, wish newspapers would link/cite journal articles)

Both models (oa journals/repositories) supported by RCs, level playing field.

Pay to publish findings: limited use, barriers, costs, awareness, not RAE. Would lead to redistribution of costs from non-academic to academic areas.

OA Deposit (repositories): applies to grant applications from 1 Oct 2006, so a three year project starting then will only be finished in Autumn 2009. Acknowledges embargos but ‘at earliest opportunity’.

75% of researchers were not aware of the mandate. Diversity across subjects. ‘In general, no active deposit’.

A slide showing % of awareness broken down by RC, interesting.

From the highest level RCs are committed to supporting OA (this will increase). But change takes time.

Some issues: what to do with embargo periods, difficult for funders to manage (are there incentives we could use?), depends on existence of repositories, multiple deposit options confusing to researchers, awareness/understanding.

UK PubMed Central – Paul Davey, Engagement Manager, UK PubMed Central

Aims to become the information resource of choice for biomedical sector.

Principles: freely available, added to UK PubMed Central, freely copied and reused.

The Department of Health have a clear policy to make research freely available.

95% of papers submitted are taken care of (deposited?) by the authors. Only 0.5% submitted by academics (PIs/colleagues).

1.6 million papers in UK PubMed Central. 366 thousand downloads last month.

Core benefits: transparency, cutting down duplication, greater visibility.

Text mining, grabbing key terms from an article (a little like OpenCalais does).

Mentions EBI’s CiteXplore, encouraging academics to link to other research.

UK PubMed Central includes a funding/grant search facility. Can link articles to funding grants.

In short: backing from key funders, will make researchers more efficient, researchers’ visibility will increase.

Beta out in the Autumn, new site in Jan 2012.


Question from someone worried about text mining and the need for humans to moderate it. Response: limited funding in this area, so human intervention is also limited; it really needs a specialist to answer this fully.

Question about increasing visibility of UK PubMed Central, referring to Google. Response: getting indexed by Google is very much part of increasing visibility.

Question about a Canadian ‘PubMed Central’. Response confirms this and mentions talk of a European PubMed Central. Potential for European funders to use UK PubMed Central as a place to deposit research (like everything here, not sure if I’ve noted this right).

PEER – Pioneering collaboration between publishers, repositories and researchers – Julia Wallace

Funded by EC, not a ‘publisher project’.

Three key stages of publication: NISO Author’s Original, NISO Accepted Manuscript, NISO Version of Record.

Starts off talking about the project; interesting stuff, but I failed to take notes.

From the website:

PEER (Publishing and the Ecology of European Research), supported by the EC eContentplus programme, will investigate the effects of the large-scale, systematic depositing of authors’ final peer-reviewed manuscripts (so called Green Open Access or stage-two research output) on reader access, author visibility, and journal viability, as well as on the broader ecology of European research. The project is a collaboration between publishers, repositories and researchers and will last from 2008 to 2011.

Seven members: including a publisher group, university, funders etc. Various publishers involved, big and small and about six European repositories taking part.

Approach / content:

  • Publishers contribute 300 journals, plus control
  • Maximises deposit and access in participating repositories
  • 50% publisher submitted 50% author submitted.
  • Good quality, range of impact factors. Publishers set embargo periods, up to 36 months.

Publishers will deposit articles into the repositories via a central depot for their 50% of articles submitted (50% fulltext, metadata for the remaining 50%). Publishers will invite authors to deposit for the ‘author’ 50%.

Technical: using PDF/A-1 (where possible) and SWORD.

Three strands: Behaviour, Usage (looking at raw log files), Economic. Also looking at Model Development (the three strands will look in to this).

Question about why they chose PDF (not very good for text mining). Answer: the wide range of subjects and publishers means that PDF is the best fit.

Economic Implications of Alternative Scholarly Publishing Models, also Loughborough University’s Institutional Mandate – Charles Oppenheim, Loughborough University

‘Houghton report’ looks at costs and benefits of scholarly publishing.

Link to report

Link to main website and models

  • Massive savings by using OA, UK would benefit from this.
  • Savings include: quicker searching, less negotiations, savings not just in library budgets
  • 2,300 activity items costed.
  • This report is currently the final word in the economics of OA.
  • Charles talks about the various methods and work involved in producing this report.
  • A 5% increase in accessibility would lead to savings (or extra money to spend) in research/HE/RCs.
  • Hard to compare UK toll/open access publishing costs, as one pays for UK access to content from across the world, the other pays for UK content to be accessible world wide.
  • Keen to roll this out to other countries.
  • Publishers’ response to report: furious!

Now for something completely different: Loughborough approved a mandate a few months ago, to come into effect Oct 09. It is an integral part of academic personal research plans (only those research items in the IR will be considered at the review). They now have over 4,000 items.

Lunch and audioboo

During lunch I did an experiment using audioboo. Would I be able to summarise the morning, on the fly with no planning, in a brief audio recording? The answer, as you can discover, is ‘no’, but it was fun to try, and it made me think about what I had taken in during the morning. Link to audioboo recording, or try the embedded version below.

Institutional Mandates – Paul Ayris, University College London

Paul starts off by showing a number of Venn diagrams, for example: 90% of its research is available online, 40% available to an NHS hospital.

What do UCL academics want

  • as authors: visibility / impact
  • as readers: access
  • delivery 24×7 anywhere

UCL mandate, a case study:

Looking global is an important part of UCL (for PIs, rankings etc). A number of systems make up their publication system: Symplectic, IRIS, eprints, data mart (and portico, FIS, HR). Symplectic (or a similar tool) and IRIS seem central in this model. They plan to automatically extract metadata from other external places (publication repositories).

How did they get the mandate? Paul spoke at UCL’s senate (Academic Board), and they agreed: all academic staff should record their own publications on a UCL publication system, and teaching materials should all be deposited in their eprints system.

UCL are going to set up a publication board to oversee the OA rollout: to advise, monitor, oversee presentation and more.

Next steps: market/exploit, set standards for online publication, to advise on ongoing resource issues in this area. Also, establish processes, Statistics and management information, advise on multimedia, copyright issues.

‘Open Access is the natural way for a global university to achieve its objectives’

Question about blurring the line between dissemination and publication, and that some of UCL’s aims seem more fitting of ‘publication’. Paul agrees, still trying to figure this out.

HEFCE – Paul Hubbard, Head of Research Policy, HEFCE

Policy: Research is a process which leads to insights for sharing. So Scholarly Communication matters to HEFCE. Prompt and accessible publishing is essential for a world class research system.

Supporting research: JISC, RIN, Programmes to support national research libraries (UKRR), UKRDS. Mentions Boston Spa (BL) document centre as an example of our world class sharing.

Internet opens up new ways of scholarly communication and sharing.

What do HEFCE want to see:

  • Widest and earliest dissemination of public research.
  • IP shared effectively with the people best placed to exploit it (CJK comment, i don’t think it is publishers!)

Committed to: UK maintaining world leading research, funding that fosters autonomy and dynamism, research quality assessment regime that supports rather than inhibits new developments.

As we move forward things may be unclear, but those HEIs with repositories will be at an advantage.

Paul finishes up with a personal view of scholarly communications in 2030. He sees two forms of communication: discussion (building up ideas), and writing up a formal, firm idea/conclusion based on these. HEFCE supports, through the likes of JISC, a range of tools and systems to enable this. (Sorry, that was an awful summary, he said much more than that!)

Answered a question as to why IRs/HEIs are the places to administer/manage this. Websites people go to in order to see research for a particular subject need to be overlay systems harvesting from IRs.

[hmm, does ‘university requirement’ sound better than mandate?]

Institutional Policies and Processes for Mandate Compliance – Bill Hubbard, SHERPA, University of Nottingham

99.9% of academics do not object to Open Access, but we need to show them it will not change how they work. Librarians are going to be much more a part of the research process. Most people (including most publishers) are in favour of Open Access.

Other pressures on the system: lack of peer reviewers, rising prices of journals, growing need for different forms of scholarly communications (e-lab books, multimedia), public demand for highest value for money (‘public should get what they pay for’).

Not if we change, but how we change. Research has to change seamlessly. Mandates have a value-added basis with fast delivery of benefits. Need integrated processes, need integrated support (we don’t want researchers to hear different messages from their Uni, funder, publisher, etc).

Authors need to know ‘what do I need to do’. Need to make it less confusing, need to make it clear when they can get help.

First step compliance: how can funders improve compliance, how can authors be supported?

All 1994 Group and Russell Group universities now have an IR (Reading, I think, is just setting one up now).

Compliance for mandates makes it better for us admin/support staff, and for the University generally.

Institutions need a compliance officer (perhaps the repository manager). Funders need to ensure these people have the information they need. Share compliance information.

I’ve missed so much of Bill’s talk here, he moves fast (and passionately) and lots of points.

After Bill’s talk there was a panel session.


Finally check out some of the useful tweets from the day. (Twitter search only goes back about a month or so, so this link may not work after a certain date). Jim Richardson also created a permanent copy with the (new to me) webcitation website.


With such dodgy note taking I feel some concise summary is in order!

  • Mandates are happening, by Universities and by Funders.
  • HEFCE want research to be accessible to as many as possible as quickly as possible.  Coming from HEFCE, this holds a lot of weight.
  • Funders (Research Councils / Wellcome) put mandates in place several years ago. They have not sat back and said ‘job done’. They are building on this foundation. How can they check? How can they enforce/encourage? How can they assist? How can they automate? How can they work with publishers and HE to share this information? Expect more to come in this area.
  • Wellcome Trust prefers submission to Open Access Journals rather than author depositing in to a repository at a later date.
  • HE Mandates are coming, we already have a few in the UK. Making them an integral part of an academic’s review seems like a good idea. My opinion is that this is reasonable, even if there are those who disagree: surely an employer can (and does in every other sector) ask for a record of what an employee has been working on, and a copy of the end output, i.e. the full text in an IR.
  • The report ‘Economic implications of alternative scholarly publishing models: exploring the costs and benefits. JISC EI-ASPM Project’ is a thorough, comprehensive look at the economic costs of Open Access and new forms of Scholarly Communications.
  • I think we are starting to see the larger Universities developing sophisticated networks of systems to manage research/publications/OA/research-funding. See slide 10 of Paul Ayris’s presentation, and this article about Imperial’s setup, as two examples.
  • It makes sense to share information (between IT systems) between funders, HE and publishers. Examples: Funders sharing (bibliographic) information to a University about publications from its researchers, Universities (or publishers) passing information to funders linking publications to funding (or even the other way round?).
  • This is an area which is still developing, fast, and will of course involve a culture change. Publishers seem unsure how to handle this new world.

Free e-books online via University of Pittsburgh Press

The University of Pittsburgh Press has put nearly 500 out of print books online and Open Access. You can access them via their Digital Editions website.  This is excellent news, making work which could be lost openly available to all.

University of Pittsburgh Press Digital Editions - Open Access free ebooks

For years there has been a movement towards making Journal articles Open Access, i.e. publicly available. However some subjects (especially in the Humanities) publish much of their research in books, not journals. Letting the world gain from the (normally publicly funded) research contained within books is more complex, and it’s not an area I fully understand. The author normally receives royalties from book sales. However, I understand these are very small 99% of the time, and normally tail off to tiny amounts after a few years. What if funders and Universities demanded that any book written with their money (or during their employment) must be made publicly available after x number of years (let’s say 10 years)? Academics and Publishers would not welcome the move, but it would still allow a window where they can gain revenue, and if this became the norm it would be something they just have to accept. Meanwhile, once open access, the book becomes much easier to archive and preserve, ensuring the knowledge is available to all in the long term. Just a thought.