<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nostuff.org &#187; ircount</title>
	<atom:link href="http://www.nostuff.org/words/tag/ircount/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nostuff.org/words</link>
	<description>living up to its name</description>
	<lastBuildDate>Sat, 13 Apr 2013 21:35:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>ircount : update</title>
		<link>http://www.nostuff.org/words/2010/ircount-update/</link>
		<comments>http://www.nostuff.org/words/2010/ircount-update/#comments</comments>
		<pubDate>Sun, 30 May 2010 17:45:42 +0000</pubDate>
		<dc:creator>Chris Keene</dc:creator>
				<category><![CDATA[libraries, library technologies & open data]]></category>
		<category><![CDATA[ircount]]></category>
		<category><![CDATA[repositories]]></category>
		<category><![CDATA[roar]]></category>

		<guid isPermaLink="false">http://www.nostuff.org/words/?p=584</guid>
		<description><![CDATA[One Sunday morning in January this year I got an email sent automatically from the webhosting company. It contained the output of the script that ran weekly. When all ran fine the script produced no output; when something went wrong the error messages were emailed to me. Judging by the length of the email, something big had gone wrong. [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-repository-record-statistics/' rel='bookmark' title='ircount : Repository Record Statistics'>ircount : Repository Record Statistics</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-development/' rel='bookmark' title='ircount development'>ircount development</a></li>
</ol>
</div>
]]></description>
				<content:encoded><![CDATA[<p>One Sunday morning in January this year I got an email sent automatically from the webhosting company. It contained the output of the script that ran weekly. When all ran fine the script produced no output; when something went wrong the error messages were emailed to me. Judging by the length of the email, something big had gone wrong.</p>
<p>The script collected data from http://roar.eprints.org/ &#8211; to be used as this week&#8217;s &#8216;number of records&#8217; for each repository.</p>
<p>The reason became clear quickly. A major revamp of ROAR had just been launched, showing off a new interface which used the Eprints software as a platform (essentially a repository of repositories). This was a great leap forward, but unfortunately it removed the simple text file I used to collect the data and, what was more, the IDs for each IR had changed.</p>
<p>I finally got around to fixing this in May. The most fiddly bit was linking the data I collected now with the data I already had. This involved matching URLs and repository names.</p>
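<p>As a rough illustration of that matching step, here is a minimal sketch in Python (not the language ircount is written in). The record shape, the function names and the normalisation rules are all assumptions made for the example; the real script may well do this differently.</p>

```python
from urllib.parse import urlsplit


def normalise_url(url):
    """Reduce a URL to host + path so trivial differences do not block a match."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def normalise_name(name):
    """Lower-case a repository name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_repositories(old_records, new_records):
    """Map each old ID to a new ID, trying the URL first and the name second.

    Both arguments are lists of dicts with 'id', 'url' and 'name' keys
    (a hypothetical shape, not the real ircount schema).
    """
    by_url = {normalise_url(r["url"]): r["id"] for r in new_records}
    by_name = {normalise_name(r["name"]): r["id"] for r in new_records}
    matches, unmatched = {}, []
    for old in old_records:
        new_id = by_url.get(normalise_url(old["url"]))
        if new_id is None:
            new_id = by_name.get(normalise_name(old["name"]))
        if new_id is None:
            unmatched.append(old["id"])
        else:
            matches[old["id"]] = new_id
    return matches, unmatched
```

<p>Anything left in the unmatched list would still need checking by hand, which is probably where most of the fiddliness lies.</p>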
<p>Anyways. Things should be more or less as they were. A few little tweaks have been added. A few bugs still remain.</p>
<p>As ever you can view the code and changes here: <a href="http://trac.nostuff.org/ircount/browser/trunk">http://trac.nostuff.org/ircount/browser/trunk</a></p>
<p>And check out the svn repository here: <a href="http://svn.nostuff.org/ircount/">http://svn.nostuff.org/ircount/</a></p>
<p>ircount can be found here: <a href="http://www.nostuff.org/ircount/">http://www.nostuff.org/ircount/</a></p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-repository-record-statistics/' rel='bookmark' title='ircount : Repository Record Statistics'>ircount : Repository Record Statistics</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-development/' rel='bookmark' title='ircount development'>ircount development</a></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.nostuff.org/words/2010/ircount-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ircount development</title>
		<link>http://www.nostuff.org/words/2009/ircount-development/</link>
		<comments>http://www.nostuff.org/words/2009/ircount-development/#comments</comments>
		<pubDate>Thu, 26 Nov 2009 20:59:18 +0000</pubDate>
		<dc:creator>Chris Keene</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[libraries, library technologies & open data]]></category>
		<category><![CDATA[me me me]]></category>
		<category><![CDATA[ircount]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://www.nostuff.org/words/?p=483</guid>
		<description><![CDATA[I've finally got around to spending a bit of time on the ircount code.

This post goes through some of the techy stuff behind it. If you're just interested in features, I'm afraid there are none yet, apart from being able to compare more than four repositories, so that's as far as you'll want to read.<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-repository-record-statistics/' rel='bookmark' title='ircount : Repository Record Statistics'>ircount : Repository Record Statistics</a></li>
<li><a href='http://www.nostuff.org/words/2010/ircount-update/' rel='bookmark' title='ircount : update'>ircount : update</a></li>
</ol>
</div>
]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve finally got around to spending a bit of time on the <em><a href="http://www.nostuff.org/ircount/">ircount</a></em> code.</p>
<p>This post goes through some of the techy stuff behind it. If you&#8217;re just interested in features, I&#8217;m afraid there are none yet, apart from being able to compare more than four repositories, so that&#8217;s as far as you&#8217;ll want to read.<span id="more-483"></span></p>
<p><strong>Subversion and Trac</strong></p>
<p>The first thing I did a few months ago was create a Subversion repository. I seem to have timed this quite well, learning how to use Subversion just as every man and his dog gets excited by Git.</p>
<p>I also installed <a href="http://trac.edgewall.org/">Trac</a> (using <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://www.nostuff.org/words/dreamhost-hosting/">Dreamhost</a>&#8217;s useful one-click install), which I haven&#8217;t really used other than to browse the code and view changes (time-line).</p>
<p>The repository is publicly available from here: <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://svn.nostuff.org/ircount/">http://svn.nostuff.org/ircount/</a></p>
<p>The trac site (which can be used to browse the code) is found at: <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://trac.nostuff.org/ircount/">http://trac.nostuff.org/ircount/</a></p>
<p>It took me a while to get svn working well. Originally I would edit the files in a local working copy on my MacBook, and then use Subversion to load these to the server to test. Of course, this meant every little change had to be checked in manually before I could test it. I then got Apache and PHP working on the MacBook, and set up the MySQL db (on the Dreamhost server) to allow remote connections, which let me test files in my local copy. This seems to work well.</p>
<p>I use <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://code.google.com/p/svnx/">svnx</a> as a GUI Mac client. It&#8217;s free and is easy to use.</p>
<p><strong>The code. The rewrite</strong></p>
<p>I don&#8217;t do much code writing or developing. Anyone glancing at my work will take that as plain obvious.</p>
<p>I&#8217;ve realised when writing code I have a tendency to be very linear and unconsciously put efficiency before good design. Anything more than one database call per page was unthinkable, loading in other pages a sin, which often led to large unwieldy while loops processing the results from a massive database call. Database calls are mixed with html output mixed with logic.</p>
<p>This is just about ok when showing information about one Institutional Repository. But when comparing several, it becomes unreadable, and is not in any state to be reused. The key aspect of comparing a number of IRs is that you need to ascertain a number of facts for the page as a whole (for example the earliest data collection date for the page, or the highest record count for the page, needed for the chart, which could come from any of the IRs).</p>
<p>My aim was that the page files should be little more than calls to a few discrete functions.</p>
<p>I&#8217;m not quite there yet, but it&#8217;s a start. <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://trac.nostuff.org/ircount/browser/trunk/archive.php">archive.php</a> is mainly a set of function calls, but there is still too much in there, and random bits of html code dotted around. There are also too many arrays holding information that the repository (php) objects can provide. <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://trac.nostuff.org/ircount/browser/trunk/include.php">include.php</a> holds the functions, but is now a little bit unwieldy itself, with functions ordered randomly. The third file of note is <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://trac.nostuff.org/ircount/browser/trunk/class.archive.php">class.archive.php</a>. This is a repository object, which can grab data from the db for the repository, and return it to the calling page in various ways.</p>
<p>My original plan was to merge the code for showing one repository, with the code to show more than one, to make it easier to implement changes (not having to update two files). By the end of it, I&#8217;m now wondering if it would be easier to have two files again, for the little changes, which both call the same core functions.</p>
<p><strong>Google Chart</strong></p>
<p><a href="http://code.google.com/apis/chart/">Google Chart</a> is a great API and I recommend anyone to play with it.</p>
<p>However, one of the problems with the last version of ircount was that the URLs for the chart images often became too long (more than 2,000 characters), resulting in no chart being shown. The chart URL includes each data point separated by a comma, so with four repositories multiplied by 100 weeks (for example), multiplied by 5 characters per data point (a 4 digit number plus a comma), it soon adds up: about 2,000 characters for the data alone.</p>
<p>One solution was to pass data only per month rather than per week (reducing the number of data points to roughly a quarter of the original). Another, which I will probably adopt in the future, is to make use of Google Chart&#8217;s extended encoding, made easy using <a href="http://bendodson.com/2008/02/28/google-extended-encoding-made-easy/">these helpful functions</a>.</p>
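<p>To make the sums concrete, here is a small Python sketch (ircount itself is PHP, so this is not the site&#8217;s code) comparing the plain text data format with the extended encoding, plus the keep-one-in-four thinning. The function names are made up for the example, and the formats are written as I understand them from the Chart API documentation: extended encoding scales each value to the range 0 to 4095 and writes it as two characters from a 64 character alphabet.</p>

```python
EXTENDED_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-."
)
EXTENDED_MAX = 64 * 64 - 1  # 4095, the largest value two characters can hold


def every_nth(values, step=4):
    """Keep one data point in every `step` (roughly weekly to monthly)."""
    return values[::step]


def text_encode(series_list):
    """Plain text format: comma-separated values, series separated by '|'."""
    return "t:" + "|".join(",".join(str(v) for v in s) for s in series_list)


def extended_encode(series_list, highest):
    """Extended format: each value scaled to 0..4095 and written as two characters."""
    encoded = []
    for series in series_list:
        chars = []
        for value in series:
            scaled = round(value * EXTENDED_MAX / highest) if highest else 0
            scaled = max(0, min(EXTENDED_MAX, scaled))
            first, second = divmod(scaled, 64)
            chars.append(EXTENDED_ALPHABET[first] + EXTENDED_ALPHABET[second])
        encoded.append("".join(chars))
    return "e:" + ",".join(encoded)
```

<p>For four series of 100 four-digit values, the text form comes out at roughly 2,000 characters and the extended form at roughly 800. Thinning to every fourth point brings the text form down to about 500.</p>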
<p>Overcoming this in an efficient way was a challenge. Originally I had a Google Chart PHP object. IR data would be passed to it in one method, and another method would return the URL.</p>
<p>This seemed sensible at the time, but deciding which php object did what became confusing. For example, the first thing the chart object needed to do was decide if the URL would be too long for the repositories in question, taking into account the data we had for each repository. Should the chart object loop around each data point for each repository to first decide how many there are? Or should the repository object handle this by telling the chart object how much data it has? How do you avoid looping through the same data several times? Does it matter which object does the work? It&#8217;s for the chart, so the chart object should do it, yet other parts of the page may want this info about the repositories, so the repository objects should provide it for all.</p>
<p>In the end, I did away with the chart object and used a function instead, which is passed an array of repository objects, which in turn handle a lot of work.</p>
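<p>The shape of that design, where each repository object answers questions about its own data and a plain function combines the answers for the page, might look something like this in Python. It is only a sketch: the class, its methods and the summary keys are hypothetical stand-ins, not the contents of class.archive.php or include.php.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical stand-in for the repository object in class.archive.php."""
    name: str
    counts: dict = field(default_factory=dict)  # collection date -> record count

    def earliest_date(self):
        return min(self.counts)

    def highest_count(self):
        return max(self.counts.values())

    def point_count(self):
        return len(self.counts)


def page_summary(repositories):
    """Work out the facts the whole page needs from the individual repositories."""
    return {
        "earliest_date": min(r.earliest_date() for r in repositories),
        "highest_count": max(r.highest_count() for r in repositories),
        "total_points": sum(r.point_count() for r in repositories),
    }
```

<p>A chart function can then look at the total number of points to decide whether to thin the data before building the URL, without looping over the raw data itself.</p>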
<p><strong>Future</strong></p>
<p>The foundations are about there. For any page I (or anyone else) wish to create, a couple of lines are all that is needed to take one or more repository IDs passed in the URL and load in all the data for them, ready to be used as needed. We can then easily call a table or chart to display for these repositories (or a subset of them).</p>
<p>As I mentioned above, the only real improvements are the ability to show more than 4 repositories at once (the chart stops showing once you get to about nine repositories), and the chart is more robust and will now show when it would have failed to do so in the past.</p>
<p>Google Chart does have an encoding which allows far more to be passed in a condensed way, and <a class="libx-autolink" style="border-bottom: 1px dotted;" href="http://bendodson.com/2008/02/28/google-extended-encoding-made-easy/">this php function</a> looks very useful for using it.</p>
<p>If I was starting again today I would look to use a framework such as <a href="http://framework.zend.com/">Zend Framework</a> or <a href="http://cakephp.org/">CakePHP</a>, or maybe even have a go at <a href="http://rubyonrails.org/">Ruby on Rails</a>. But perhaps a third rewrite is a little over the top for now.</p>
<p>I need to tidy up the <a href="http://www.nostuff.org/ircount/table.php?country=uk">table view</a> a bit (some <a href="http://trac.nostuff.org/ircount/browser/tags/09nov/table.php">nasty code there</a>) and then look to a few new features, maybe collecting more data, and exposing it in some computer friendly formats such as Atom.</p>
<p>So <a href="http://www.nostuff.org/ircount/">ircount</a> is really no more than a plaything for a bad coder to make mistakes and learn a little bit along the way, slowly. But if you have any ideas or thoughts I would love to hear them.</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-repository-record-statistics/' rel='bookmark' title='ircount : Repository Record Statistics'>ircount : Repository Record Statistics</a></li>
<li><a href='http://www.nostuff.org/words/2010/ircount-update/' rel='bookmark' title='ircount : update'>ircount : update</a></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.nostuff.org/words/2009/ircount-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ircount : Repository Record Statistics</title>
		<link>http://www.nostuff.org/words/2009/ircount-repository-record-statistics/</link>
		<comments>http://www.nostuff.org/words/2009/ircount-repository-record-statistics/#comments</comments>
		<pubDate>Thu, 14 May 2009 19:34:53 +0000</pubDate>
		<dc:creator>Chris Keene</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[libraries, library technologies & open data]]></category>
		<category><![CDATA[ircount]]></category>

		<guid isPermaLink="false">http://www.nostuff.org/words/?p=335</guid>
		<description><![CDATA[I&#8217;ve updated Repository Record  Statistics (which I refer to as ircount). Some key points: Repository names with non-standard characters were not displaying properly. They now should, though there are some parts of the site where I have not updated the code yet. Many Russian IRs are also not showing the correct name, even though it [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2010/ircount-update/' rel='bookmark' title='ircount : update'>ircount : update</a></li>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-development/' rel='bookmark' title='ircount development'>ircount development</a></li>
</ol>
</div>
]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve updated <em>Repository Record  Statistics</em> (which I refer to as <em>ircount</em>).</p>
<p>Some key points:</p>
<ul>
<li>Repository names with non-standard characters were not displaying properly. They now should, though there are some parts of the site where I have not updated the code yet. Many <a href="http://www.nostuff.org/ircount/index.php?country=ru">Russian</a> IRs are also not showing the correct name, even though it is correct in the database, which I will look in to.</li>
<li>If a repository changes its name (in ROAR) then the new name should be shown in ircount</li>
<li>When looking at the details for one Repository, you can now compare it with any other repository, not just those from the same country. You&#8217;ll see two drop down boxes, one for those from the same country (for convenience) and one listing all repositories.</li>
<li>I&#8217;ve removed the Full Text numbers, which only appeared for some repositories. They were inaccurate and not very useful.</li>
<li>There&#8217;s now a link on the homepage to ircount news (which is actually just posts on this blog which have been <a href="http://www.nostuff.org/words/tag/ircount/">tagged &#8216;ircount&#8217;,</a> like this one).</li>
</ul>
<p>The code is now in a subversion (svn) repository, my first time using such a system, which should help me keep track of changes. I can make it available to others if anyone is interested.</p>
<h3>The future</h3>
<p>There are other changes planned&#8230; At the moment this is all in one big database table. Each week it collects lots of info about repositories from ROAR, including their name etc, and saves one row per repository (for each week). This means lots of information (such as a repository&#8217;s name) is duplicated each week. It also means that when selecting the name to be displayed you have to be careful to select the latest entry for the repository in question (something which hit me badly when trying to fix the name problem, and something I still haven&#8217;t got around to fixing for all SQL queries). I&#8217;m working on improving the back-end design.</p>
<p>I&#8217;m also thinking about periodically connecting to the OAI-PMH interface for each IR to collect certain details directly. This would be quite a change of direction, though: so far ircount&#8217;s philosophy has been simply to collect from ROAR and report on what it gets back. Do I want to go down that road and lose this simple model?</p>
<p>I&#8217;m also pondering ways to keep track of the number of full text items in each IR (you can see some initial <a href="http://www.nostuff.org/words/2008/playing-with-oai-pmh-with-simple-dc/">thoughts on this here</a>), though this will open a big can of worms.</p>
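<p>For the &#8216;latest entry&#8217; problem, one common pattern is to join the table against a subquery that finds the most recent collection date per repository. The sketch below uses Python and an in-memory SQLite database purely so it can be run anywhere; the table and column names are invented for the example and are not the real ircount schema, though the same query should work much the same in MySQL.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly (repo_id TEXT, collected TEXT, name TEXT, records INTEGER)"
)
conn.executemany(
    "INSERT INTO weekly VALUES (?, ?, ?, ?)",
    [
        ("r1", "2009-04-05", "V?xj? University", 1200),
        ("r1", "2009-04-12", "Växjö University", 1215),
        ("r2", "2009-04-05", "Example Eprints", 800),
        ("r2", "2009-04-12", "Example Research Online", 810),
    ],
)

# For each repository, keep only the row from its most recent collection date.
LATEST_NAMES = """
    SELECT w.repo_id, w.name
    FROM weekly AS w
    JOIN (
        SELECT repo_id, MAX(collected) AS latest
        FROM weekly
        GROUP BY repo_id
    ) AS newest
      ON newest.repo_id = w.repo_id AND newest.latest = w.collected
    ORDER BY w.repo_id
"""

latest_names = dict(conn.execute(LATEST_NAMES).fetchall())
```

<p>Moving the name out into its own one-row-per-repository table would avoid the need for this join altogether.</p>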
<p>The stats table which shows growth of repositories (based on number of records) over time for a given country (<a href="http://www.nostuff.org/ircount/table.php?country=uk">this one</a>), could do with some improvements. RSS feeds for various things are also on the to do list.</p>
<h3>Technical details</h3>
<p>How did I fix these things, and add the new features? I made these changes a few months ago (on a test area) and the exact details have already slipped my mind.</p>
<p><strong>Funny characters in Repository names</strong></p>
<p>Any name containing a character outside the standard set (a-z, A-Z, 0-9) has always displayed as garbage. This became more of an issue when I expanded ircount to include repositories world-wide. Below you can see an example from a <a href="http://www.nostuff.org/ircount/archive.php?id=http%3A%2F%2Fwww.diva-portal.org%2Fvxu%2F20051208131516&amp;country=se">Swedish repository</a>: Växjö University.<a href="http://d1xh8d5g4rj6lx.cloudfront.net/words/wp-content//2009/04/picture-2.png"><img class="alignright size-medium wp-image-336" title="Example of incorrect name" src="http://d1xh8d5g4rj6lx.cloudfront.net/words/wp-content//2009/04/picture-2-300x13.png" alt="Example of incorrect name" width="300" height="13" /></a></p>
<p>The script which collects the data each week outputs its collected data to a text file as well as writing it all to the database (really just as a backup and for debugging). The names in the text file were in the same messed up format as on the website. As the text file had nothing to do with the MySQL database or PHP front-end, I concluded the problem was with actually grabbing the data from the ROAR website, for which the script uses Perl&#8217;s LWP::Simple.</p>
<p>This was a red herring. In the end I knocked up a script which just collected the file and output it to a text file, and all worked fine. I gradually added code from the main script and it all still worked fine. So why did the main script not work?</p>
<p>I can&#8217;t remember the details, but I think starting a new log file, or using a different filename (stupid I know), and other random things made the funny character problem go away in the text file. Which meant the problem was now with the DB after all, which made more sense.</p>
<p>In the end I found this excellent page <a href="http://www.gyford.com/phil/writing/2008/04/25/utf8_mysql_perl.php">http://www.gyford.com/phil/writing/2008/04/25/utf8_mysql_perl.php</a></p>
<p>Converting the database tables/fields to <dfn title="Unicode (multilingual), case-insensitive">utf8_general_ci</dfn>, and adding a couple of lines of code to the Perl script to ensure the connection to the db was in UTF-8 (both outlined in the webpage linked to above), sorted this out. The final step was ensuring that the front end user interface selected the most recent repository name for a given IR, as older entries in the db would still have the incorrect name.</p>
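<p>I don&#8217;t know exactly which mismatch was at work here, but a very common cause of this kind of garbage is UTF-8 bytes being read as Latin-1 somewhere between the script and the database. The few lines of Python below (again, not part of ircount) show what that does to a name like Växjö, under that assumption.</p>

```python
name = "Växjö University"

# The bytes that a UTF-8 source sends over the wire.
utf8_bytes = name.encode("utf-8")

# Reading those bytes as Latin-1 turns each accented letter into two characters.
garbled = utf8_bytes.decode("latin-1")

# Reversing the mistake recovers the original, provided nothing was lost on the way.
repaired = garbled.encode("latin-1").decode("utf-8")
```

<p>Each accented letter becomes a pair of odd characters in the garbled version, which is the usual tell-tale sign of this sort of encoding mix-up.</p>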
<p><strong>Country names</strong></p>
<p>When I started collecting data for all repositories around the world I needed a way for users to be able to select a particular country. ROAR provided two-letter codes, but how to display proper country names? I found a solution <a href="http://www.devx.com/webdev/Article/38732/1763/">using a simple PHP library detailed here</a> (note it&#8217;s on two pages).</p>
<p><strong>Known Bugs</strong></p>
<ul>
<li>Some Repository names are still displayed wrongly, e.g. <a href="http://www.nostuff.org/ircount/archive.php?id=http%3A%2F%2Femelya.socionet.ru%2Foai%2Fdbjsqw_1%2Foai.xml20071016112534&amp;country=ru">Russian names</a>. Anyone know why?!</li>
<li>Table may show old names, and sometimes show two separate rows if a repository has changed its name in ROAR</li>
<li>When comparing repositories, sometimes the graph does not display, especially when comparing four. This is because the amount of data being passed to <a href="http://code.google.com/apis/chart/">Google Charts</a> exceeds the maximum size of a URL (I can&#8217;t remember off the top of my head, but it is about 2,000 characters). This should be fixable as there is no need to pass so much data to Google Charts; I just need to be a little more intelligent in preparing the URL.</li>
<li>Some HTML tables are not displaying correctly, some border lines are missing, almost at random.</li>
</ul>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2010/ircount-update/' rel='bookmark' title='ircount : update'>ircount : update</a></li>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-development/' rel='bookmark' title='ircount development'>ircount development</a></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.nostuff.org/words/2009/ircount-repository-record-statistics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Playing with OAI-PMH with Simple DC</title>
		<link>http://www.nostuff.org/words/2008/playing-with-oai-pmh-with-simple-dc/</link>
		<comments>http://www.nostuff.org/words/2008/playing-with-oai-pmh-with-simple-dc/#comments</comments>
		<pubDate>Thu, 13 Nov 2008 14:15:14 +0000</pubDate>
		<dc:creator>Chris Keene</dc:creator>
				<category><![CDATA[interesting]]></category>
		<category><![CDATA[libraries, library technologies & open data]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[web and blogs]]></category>
		<category><![CDATA[ircount]]></category>
		<category><![CDATA[oai]]></category>
		<category><![CDATA[oai-pmh]]></category>
		<category><![CDATA[repositories]]></category>

		<guid isPermaLink="false">http://www.nostuff.org/words/2008/playing-with-oai-pmh-with-simple-dc/</guid>
		<description><![CDATA[Setting up ircount has got me quite interested in OAI-PMH, so I thought I would have a little play. I was particularly interested in seeing if there was a way to count the number of full text items in a repository, as ROAR does not generally provide this information. Perl script I decided to use [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
</ol>
</div>
]]></description>
				<content:encoded><![CDATA[<p>Setting up <a title="ircount - count of records in repositories" href="http://www.nostuff.org/ircount/">ircount</a> has got me quite interested in <a href="http://en.wikipedia.org/wiki/OAI-PMH">OAI-PMH</a>, so I thought I would have a little play. I was particularly interested in seeing if there was a way to count the number of full text items in a repository, as <a href="http://roar.eprints.org/?action=home&amp;q=&amp;country=uk&amp;version=&amp;type=institutional&amp;order=recordcount&amp;submit=Filter">ROAR</a> does not generally provide this information.</p>
<p><strong>Perl script</strong></p>
<p>I decided to use the <a href="http://search.cpan.org/dist/HTTP-OAI/">HTTP::OAI Perl module</a> by Tim Brody (who not-so-coincidentally is also responsible for ROAR, which ircount gets its data from).</p>
<p>A couple of hours later I have a very basic script which will roughly report on the number of records and the number of full text items within a repository. You just need to pass it a URL for the OAI-PMH interface.</p>
<p>To show the outcome of my efforts, <a href="http://d1xh8d5g4rj6lx.cloudfront.net/words/wp-content/sussexout.txt">here is the verbose output of the script</a> when pointed at the University of Sussex repository (<a href="http://eprints.sussex.ac.uk/">Sussex Research Online</a>).</p>
<p>Here is the output for a sample record (<a href="http://eprints.sussex.ac.uk/perl/oai2?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:eprints.sussex.ac.uk:67">see here for the actual OAI output</a> for this record; you may want to &#8216;view source&#8217; to see the XML):</p>
<blockquote>
<pre>oai:eprints.sussex.ac.uk:67 2006-09-19</pre>
<pre>Retreat of chalk cliffs in the eastern English Channel during the last century</pre>
<pre>relation: http://eprints.sussex.ac.uk/67/01/Dornbusch_coast_1124460539.pdf</pre>
<pre>MATCH http://eprints.sussex.ac.uk/67/01/Dornbusch_coast_1124460539.pdf</pre>
<pre>relation: http://www.journalofmaps.com/article_depository/europe/Dornbusch_coast_1124460539.pdf</pre>
<pre>dc.identifier: http://eprints.sussex.ac.uk/67/</pre>
<pre>full text found for id oai:eprints.sussex.ac.uk:67, current total of items with fulltext 6</pre>
<pre>id oai:eprints.sussex.ac.uk:67 is the 29 record we have seen</pre>
</blockquote>
<p>It first lists the identifier and date, and the next line shows the title. It then shows a <em>dc.relation</em> field which points to a full text item on the eprints server. Because the URL looks like a full text item and is on the same server, the next line shows that it MATCHed the criteria, which means we add this record to the count of items with full text attached.</p>
<p>The next line is another <em>dc.relation</em>, again pointing to a fulltext URL for this item. However, this time it is on a different server (i.e. the publisher&#8217;s), so this line is not treated as a fulltext item and does not show a MATCH (i.e. had the first relation line not existed, this record would not be considered one with a fulltext item).</p>
<p>Finally a <em>dc.identifier</em> is shown, then a summary generated by the script concluding that this item does have fulltext, is the sixth record seen with fulltext, and is the 29th record we have seen.</p>
<p>The script, as we will now see, has to use various &#8216;hacky&#8217; methods to try and guess the number of fulltext items within a repository, as different systems populate simple Dublin Core in different ways.</p>
<p><strong>Repositories and OAI-PMH/Simple Dublin Core.</strong></p>
<p>It quickly became clear on experimenting with different repositories that the different repository software packages populate Simple Dublin Core in different ways. Here are some examples:</p>
<p><strong><a href="http://www.eprints.org/software/">Eprints</a>2:</strong> As you can see above in the Sussex example, fulltext items are added as a <em>dc.relation</em> field, but so too are any publisher/official URLs, which we don&#8217;t want to count. The only way to differentiate between the two is to check the domain name within the <em>dc.relation</em> URL and see if it matches that of the OAI interface we are working with. This is by no means solid: it is quite possible for a system to have more than one hostname, and what the user gives as the OAI URL may not match what the system gives as the URLs for fulltext items.</p>
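<p>As a rough illustration of the hostname check described above, here is a minimal Python sketch (not the original Perl script). The function name <em>is_local_fulltext</em> and the list of file extensions are assumptions made for this example.</p>

```python
from urllib.parse import urlparse

# Hypothetical list of file extensions treated as "looks like full text".
FULLTEXT_EXTENSIONS = (".pdf", ".doc", ".ps", ".rtf")


def is_local_fulltext(relation_url, oai_base_url):
    """Return True if a dc.relation URL looks like a file on the repository's own host."""
    relation = urlparse(relation_url)
    oai = urlparse(oai_base_url)
    if not relation.hostname or not oai.hostname:
        return False
    # Same host as the OAI interface, and the path ends in a known file extension.
    same_host = relation.hostname.lower() == oai.hostname.lower()
    looks_like_file = relation.path.lower().endswith(FULLTEXT_EXTENSIONS)
    return same_host and looks_like_file


oai_url = "http://eprints.sussex.ac.uk/perl/oai2"
relations = [
    "http://eprints.sussex.ac.uk/67/01/Dornbusch_coast_1124460539.pdf",
    "http://www.journalofmaps.com/article_depository/europe/Dornbusch_coast_1124460539.pdf",
]
for url in relations:
    print("MATCH" if is_local_fulltext(url, oai_url) else "skip", url)
```

<p>A repository that serves files from a second hostname would be missed by this comparison, which is exactly the weakness noted above.</p>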
<p><strong>Eprints3</strong>: I&#8217;ll use the Warwick repository for this, see the <a href="http://wrap.warwick.ac.uk/46/">HTML</a> and <a href="http://wrap.warwick.ac.uk/cgi/oai2?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:wrap.warwick.ac.uk:46">OAI-PMH</a> for the record used in this example.</p>
<blockquote>
<pre>&lt;dc:format&gt;application/pdf&lt;/dc:format&gt;</pre>
<pre>&lt;dc:identifier&gt;http://wrap.warwick.ac.uk/46/1/WRAP_Slade_jel_paper_may07.pdf&lt;/dc:identifier&gt;</pre>
<pre>&lt;dc:relation&gt;http://dx.doi.org/10.1257/jel.45.3.629&lt;/dc:relation&gt;</pre>
<pre>&lt;dc:identifier&gt;Lafontaine, Francine and Slade, Margaret (2007) Vertical integration and firm boundaries: the evidence. Journal of Economic Literature, Vol.45 (No.3). pp. 631-687. ISSN 0022-0515&lt;/dc:identifier&gt;</pre>
<pre>&lt;dc:relation&gt;http://wrap.warwick.ac.uk/46/&lt;/dc:relation&gt;</pre>
</blockquote>
<p>Unlike Eprints2, the fulltext item is now in a <em>dc.identifier</em> field, while the official/publisher URL is still a <em>dc.relation</em> field, which makes it easier to count the former without the latter. EP3 also seems to provide a citation of the item, which is in a <em>dc.identifier</em> as well. (As an aside: EPrints 3.0.3-rc-1, as used by Birkbeck and Royal Holloway, seems to act differently, missing out any reference to the fulltext.)</p>
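<p>Because Eprints3 puts both the file URL and the citation in <em>dc.identifier</em>, a harvester has to guess which is which. The Python sketch below shows one possible guess. The regular expressions and the labels it returns are assumptions for illustration, not part of any repository software.</p>

```python
import re

# Assumption: a full text file is any http(s) URL ending in one of these extensions.
FILE_URL = re.compile(r"^https?://\S+\.(pdf|doc|ps|rtf)$", re.IGNORECASE)
ANY_URL = re.compile(r"^https?://\S+$", re.IGNORECASE)


def classify_identifier(value):
    """Label one dc.identifier value as 'fulltext', 'url' or 'citation'."""
    value = value.strip()
    if FILE_URL.match(value):
        return "fulltext"
    if ANY_URL.match(value):
        return "url"
    # Anything that is not a URL is assumed to be a free text citation.
    return "citation"


identifiers = [
    "http://wrap.warwick.ac.uk/46/1/WRAP_Slade_jel_paper_may07.pdf",
    "Lafontaine, Francine and Slade, Margaret (2007) Vertical integration "
    "and firm boundaries: the evidence. Journal of Economic Literature, "
    "Vol.45 (No.3). pp. 631-687. ISSN 0022-0515",
]
for value in identifiers:
    print(classify_identifier(value), "-", value[:60])
```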
<p><strong><a href="http://www.dspace.org/">Dspace</a>:</strong> I&#8217;ll use Leicester&#8217;s repository, see the <a href="https://lra.le.ac.uk/handle/2381/1835">HTML</a> and <a href="http://lra.le.ac.uk/dspace-oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:lra.le.ac.uk:2381/12">OAI-PMH</a> for the record used. (I was going to use Bath&#8217;s but looks like they have just moved to Eprints!)</p>
<blockquote>
<pre>&lt;dc:identifier&gt;http://hdl.handle.net/2381/12&lt;/dc:identifier&gt;</pre>
<pre>&lt;dc:format&gt;350229 bytes&lt;/dc:format&gt;</pre>
<pre>&lt;dc:format&gt;application/pdf&lt;/dc:format&gt;</pre>
</blockquote>
<p>This is very different to Eprints. <em>dc.identifier</em> is used for a link to the HTML page for this item (like Eprints2, but unlike Eprints3, which uses <em>dc.relation</em> for this). However, it does not mention either the fulltext item or the official/publisher URL at all (this record has both). The only clue that this has a full text item is the <em>dc.format</em> (&#8216;application/pdf&#8217;), and so my hacked up little script looks out for this as well.</p>
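<p>The <em>dc.format</em> clue can be expressed in a few lines of Python. This is only a sketch: the set of MIME types is an assumption, and a record could list a format without the file actually being open to download.</p>

```python
# Assumption: these MIME types count as full text; the list is illustrative only.
FULLTEXT_MIME_TYPES = {
    "application/pdf",
    "application/msword",
    "application/postscript",
}


def has_fulltext_format(dc_formats):
    """Return True if any dc.format value names a full text MIME type."""
    return any(value.strip().lower() in FULLTEXT_MIME_TYPES for value in dc_formats)


# dc.format values taken from the Leicester record shown above.
print(has_fulltext_format(["350229 bytes", "application/pdf"]))
# A record that only reports a size gives no clue about full text.
print(has_fulltext_format(["350229 bytes"]))
```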
<p>I looked at a few other Dspace based repositories (Brunel <a href="http://bura.brunel.ac.uk/handle/2438/1099">HTML</a> / <a href="http://bura.brunel.ac.uk/dspace-oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:bura.brunel.ac.uk:2438/1099">OAI</a> ; MIT <a href="http://dspace.mit.edu/handle/1721.1/5451?show=full">HTML</a> / <a href="http://dspace.mit.edu/oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:dspace.mit.edu:1721.1/5451">OAI</a>) and they seemed to produce the same sort of output, though not being familiar with Dspace I don&#8217;t know if this is because they were all the same version or if the OAI-PMH interface has stayed consistent between versions.</p>
<p>I haven&#8217;t even checked out Fedora, bepress Digital Commons or DigiTool yet (all this is actually quite time consuming).</p>
<p><strong>Commentary</strong></p>
<p>I&#8217;m reluctant to come up with any conclusions because I know the people who developed all this are so damn smart. When I read the articles and posts produced by those (who were) on the OAI-PMH working group, or were in some way involved, it is clear they have a vast understanding of standards, protocols, metadata, and more. Much of what I have read is clear and well written and yet I still struggle to understand it due to my own mental shortcomings!</p>
<p>Yet what I have found above seems to suggest we still have a way to go in getting this right.</p>
<p>Imagine a service which will use data from repositories: &#8216;Geography papers archive&#8217;, &#8216;UK Working papers online&#8217;, &#8216;Open Academic Books search&#8217; (all fictional web sites/services which could be created which harvest data from repositories, based on a subject/type subset).</p>
<p>Repositories are all about open access to the full text of research, and it seems to me that harvesters need to be able to presume that the fulltext item, and other key elements, will be in a particular field. And perhaps it isn&#8217;t too wild to suggest that one field should be used for one purpose. For example, both Dspace and Eprints provide a full citation of the item in the DC metadata, which an external system may find useful in some way. However, it is in the dc.identifier field, and various other bits of information are also in the very same field, so anyone wishing to extract citations would need to run some sort of messy test to try and ascertain which identifier field, if any, contains the citation they wish to use.</p>
<p>To some extent things can be improved by getting repository developers, harvester developers and OAI/DC experts round a table to agree a common way of using the format. Hmm, but does that ring any bells? I&#8217;ve always thought that the existence of the <a href="http://www.ukoln.ac.uk/interop-focus/bath/">Bath profile</a> was probably a sign of underlying problems with Z39.50 (though I am almost totally ignorant of Z39.50). Even this will only solve some problems: the issue of multiple &#8216;real world&#8217; elements being put in to the same field (both identifier and relation are used for multiple purposes), as mentioned above, is still a problem.</p>
<p>I know nothing about metadata nor web protocols (left with me, we would all revert to tab delimited files!), so am reluctant to suggest or declare what should happen. But there must be a better fit for our needs than Simple DC. Qualified DC being a candidate (I think, again, I know <em>nuffing</em>). See <a href="http://www.ukoln.ac.uk/repositories/digirep/index/Issues_with_current_use_of_simple_DC">this page</a> highlighting some of the issues with simple DC.</p>
<p>I guess one problem is that it is easy to fall in to the trap of presuming <em>repository item</em> = <em>article/paper</em>, when of course it could be almost anything. The former would be easy to narrowly define, but the latter &#8211; which is the reality &#8211; is much harder to give a clear schema for. Perhaps we need &#8216;profiles&#8217; for the common different item types (articles/theses/images). I think this is the point where people will point out that (a) this has been discussed a thousand times already and (b) it has probably already been done! So I&#8217;ll shut up and move on (<a href="http://efoundations.typepad.com/efoundations/2008/05/swap-and-ore.html">here&#8217;s one example</a> of what has already been said).</p>
<p>Other notes:</p>
<ul>
<li>I wish OAI-PMH had a machine readable way of telling clients if they can harvest items, reuse the data, or even access it at all (apologies if it does allow this already). The human text of an IR policy may forbid me sucking up the data and making it searchable elsewhere, but how will I know this?</li>
<li>Peter Millington of RSP/SHERPA recently <a href="http://www.opendoar.org/demos/psh_prototype">floated the idea of an OAI-PMH verb/command to report the total number of items</a>. His point is that it should be simple for OAI servers to report such a number (probably a simple SQL COUNT(*)), but at the moment OAI-PMH clients &#8211; like mine &#8211; have to manually count each item, parsing thousands of lines of data, which can take minutes and creates processing requirements for both server and client, just to answer the simple question of how many items there are. I echo and support Peter&#8217;s idea of creating a count verb to resolve this.</li>
<li>Would be very handy if OAI-PMH servers could give an application name and version number as part of the response to the &#8216;Identify&#8217; verb. Would be very useful when trying to work around the differences between applications and software versions.</li>
</ul>
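<p>To show why counting is costly for a client, here is a minimal Python sketch of the manual approach: it walks every page of a ListIdentifiers response, following each resumptionToken. The function names and the injectable <em>fetch_page</em> parameter are assumptions for this example. OAI-PMH 2.0 does allow a server to put an optional completeListSize attribute on the resumptionToken; the sketch uses it when present, though not every server supplies it or keeps it accurate.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    """Download one OAI-PMH response as bytes (a real network call)."""
    with urlopen(url) as response:
        return response.read()


def count_records(base_url, fetch_page=fetch):
    """Count record headers (deleted ones included) across all ListIdentifiers pages."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
    total = 0
    while True:
        root = ET.fromstring(fetch_page(base_url + "?" + urlencode(params)))
        total += len(root.findall(f".//{OAI_NS}header"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None:
            return total
        # completeListSize is optional; when a server sends it we can stop early.
        size = token.get("completeListSize")
        if size is not None and size.isdigit():
            return int(size)
        # An empty resumptionToken marks the last page of the list.
        if not (token.text or "").strip():
            return total
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
```

<p>Calling <em>count_records</em> with the OAI-PMH base URL of a repository would make one HTTP request per page, which is the load on both client and server that a dedicated count verb would avoid.</p>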
<p><strong>Back to the script</strong></p>
<p>Finally, I&#8217;m trying to judge how good the little script is: does it report an accurate number of full text items? If you run an IR and would be happy for me to run the script against your repository (I don&#8217;t think it creates a high load on the server), then please reply to this post. Ideally include your OAI-PMH URL and how many full text items you think you have, though neither is essential. I&#8217;ll attach the results to a comment on this post.</p>
<p>Food for thought: I&#8217;m pondering the need to check the dc.type of an item, and only count items of certain types. For example, should we include images? One image of a piece of research sounds fine, but 10,000 images suddenly distort the numbers. Should the count include all items, or just those of certain types (article, thesis etc)?</p>
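<p>One way the dc.type idea might look in practice is sketched below in Python. The record structure (a dict with &#8216;types&#8217; and &#8216;has_fulltext&#8217; keys) and the set of counted types are invented for this example; real repositories use varied dc.type vocabularies.</p>

```python
# Assumption: only these dc.type values should count towards the total.
COUNTED_TYPES = {"article", "thesis"}


def count_fulltext_by_type(records, counted_types=COUNTED_TYPES):
    """Count records that have full text and at least one dc.type in counted_types.

    Each record is a dict with 'types' (list of dc.type strings) and
    'has_fulltext' (bool), a made-up structure for this sketch.
    """
    total = 0
    for record in records:
        types = {t.strip().lower() for t in record["types"]}
        if record["has_fulltext"] and types.intersection(counted_types):
            total += 1
    return total


sample_records = [
    {"types": ["Article", "PeerReviewed"], "has_fulltext": True},
    {"types": ["Image"], "has_fulltext": True},
    {"types": ["Thesis"], "has_fulltext": False},
]
print(count_fulltext_by_type(sample_records))
```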
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/' rel='bookmark' title='ircount : new location, new functionality'>ircount : new location, new functionality</a></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.nostuff.org/words/2008/playing-with-oai-pmh-with-simple-dc/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>ircount : new location, new functionality</title>
		<link>http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/</link>
		<comments>http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 20:18:08 +0000</pubDate>
		<dc:creator>Chris Keene</dc:creator>
				<category><![CDATA[libraries, library technologies & open data]]></category>
		<category><![CDATA[me me me]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[you need this website]]></category>
		<category><![CDATA[ircount]]></category>
		<category><![CDATA[oai]]></category>
		<category><![CDATA[oai-pmh]]></category>
		<category><![CDATA[open acces]]></category>
		<category><![CDATA[repositories]]></category>

		<guid isPermaLink="false">http://www.nostuff.org/words/?p=209</guid>
		<description><![CDATA[A while ago, I released a simple website which reported on the number of items in UK repositories over time. It collected its data from ROAR but by collecting it on a weekly basis could provide a table showing growth week by week. First it has a new home: http://www.nostuff.org/ircount/ Secondly, it now collects [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2010/ircount-update/' rel='bookmark' title='ircount : update'>ircount : update</a></li>
<li><a href='http://www.nostuff.org/words/2008/playing-with-oai-pmh-with-simple-dc/' rel='bookmark' title='Playing with OAI-PMH with Simple DC'>Playing with OAI-PMH with Simple DC</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-development/' rel='bookmark' title='ircount development'>ircount development</a></li>
</ol>
</div>
]]></description>
				<content:encoded><![CDATA[<p>A while ago, I released a simple website which reported on the number of items in UK repositories over time. It collected its data from <a href="http://roar.eprints.org/">ROAR</a> but by collecting it on a weekly basis could provide a table showing growth week by week.</p>
<p>First it has a new home: <strong><a href="http://www.nostuff.org/ircount/">http://www.nostuff.org/ircount/</a></strong></p>
<p>Secondly, it now collects data for every institutional (and departmental) repository registered in ROAR across the world, not just the UK. It has been collecting the data since July.</p>
<p>The country integration isn&#8217;t perfect: you have to select a country, and then you are more or less restricted to that country (though you can hack it, see the &#8216;info&amp;help&#8217;), and there is a lot of potential for improving this. There are also a couple of bugs. For example, when comparing four repositories it seems to (a) forget which country you were dealing with, and (b) stop showing the graph/chart.</p>
<p>I&#8217;m currently looking at trying to make an educated guess at how many fulltext items are in a given repository. This is proving to be a steep learning curve in the joys of OAI-PMH, and how the different repository systems (and the different versions of these systems) have allocated information about the fulltext to different Dublin Core (DC) elements. But this is for another post.</p>
<p>In the meantime, I hope the worldwide coverage is of some use, and feel free to leave any comments.</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2010/ircount-update/' rel='bookmark' title='ircount : update'>ircount : update</a></li>
<li><a href='http://www.nostuff.org/words/2008/playing-with-oai-pmh-with-simple-dc/' rel='bookmark' title='Playing with OAI-PMH with Simple DC'>Playing with OAI-PMH with Simple DC</a></li>
<li><a href='http://www.nostuff.org/words/2009/ircount-development/' rel='bookmark' title='ircount development'>ircount development</a></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.nostuff.org/words/2008/ircount-new-location-new-functionality/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>UK repositories : growth of records</title>
		<link>http://www.nostuff.org/words/2008/uk-repositories-growth-of-records/</link>
		<comments>http://www.nostuff.org/words/2008/uk-repositories-growth-of-records/#comments</comments>
		<pubDate>Sun, 30 Mar 2008 19:06:36 +0000</pubDate>
		<dc:creator>CJK</dc:creator>
				<category><![CDATA[libraries, library technologies & open data]]></category>
		<category><![CDATA[web and blogs]]></category>
		<category><![CDATA[you need this website]]></category>
		<category><![CDATA[ircount]]></category>

		<guid isPermaLink="false">http://www.nostuff.org/words/2008/uk-repositories-growth-of-records/</guid>
		<description><![CDATA[For a while now I&#8217;ve been running a weekly script which connects to ROAR and grabs the number of records for each UK based Institutional Repository. I&#8217;ve finally got around to writing a web front end to this, which you can see here. All quite basic at the moment, and I have lots of ideas [...]<div class='yarpp-related-rss'>

Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/or08-open-repositories-2008/' rel='bookmark' title='or08: Open Repositories 2008'>or08: Open Repositories 2008</a></li>
</ol>
</div>
]]></description>
				<content:encoded><![CDATA[<p>For a while now I&#8217;ve been running a weekly script which connects to <a href="http://roar.eprints.org">ROAR</a> and grabs the number of records for each UK based Institutional Repository. I&#8217;ve finally got around to writing a web front end to this, which <a href="http://researchonline.lib.sussex.ac.uk/ir_stat/index.php">you can see here</a>. All quite basic at the moment, and I have lots of ideas of what I could do to improve this (one idea is to compare the average number of deposits per repository). Have a look and let me know what you think, and let me know of any bugs.</p>
<p><strong><a href="http://researchonline.lib.sussex.ac.uk/ir_stat/index.php">UK Repository Records Statistics</a> </strong>(the name sucks!)</p>
<div class='yarpp-related-rss'>
<p>Related posts:<ol>
<li><a href='http://www.nostuff.org/words/2008/or08-open-repositories-2008/' rel='bookmark' title='or08: Open Repositories 2008'>or08: Open Repositories 2008</a></li>
</ol></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.nostuff.org/words/2008/uk-repositories-growth-of-records/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
