DINI e.V.

Project Description

The effort of producing and publishing a text is in vain if it is not read and noticed. This holds for scientific publications just as it does for fiction.

The ease of access offered by Open Access publications, which require no authentication, financial transaction or personal identification, makes it much easier to achieve a satisfying level of reception in a scientific community. This and similar hypotheses can be investigated by empirical analysis.

Requests are easy to measure, because web servers store most of the necessary information for internal management purposes. (Any approach that tries to go beyond this level of insight has to face, among other problems, the fact that HTTP is a one-way protocol. At this layer it is therefore impossible to learn anything definitive about dependent variables such as whether a file transfer succeeded or how long a user stays on a document before moving on.)
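As an illustration of what a web server log offers, the following minimal sketch parses one access log line into named fields. It assumes the Apache "combined" log format; the host name, path and sample line are invented for the example, and a given repository server may log differently.

```python
import re
from datetime import datetime

# Apache "combined" log format is assumed here; other servers may differ.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of request fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Invented sample line for illustration only.
sample = (
    '192.0.2.17 - - [01/Jul/2008:12:00:05 +0200] '
    '"GET /docs/1234/thesis.pdf HTTP/1.1" 200 482113 '
    '"http://repository.example.org/docs/1234" "Mozilla/5.0"'
)
event = parse_log_line(sample)
print(event["host"], event["path"], event["status"], event["size"])
```

The parsed fields record what the server answered and how many bytes it reported sending. They do not show whether the file arrived completely or how long it was read, which is the limitation described above.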

Scientific publications span a wide variety of publishers, hosts, business models, usage models, publication stages, and logical and technical presentations. It is therefore important to learn which portions of the publication space can be included in the sampling and which agents want to be included. For those willing to participate, only two aspects are relevant:

  1. What data needs to be gathered?
  2. How can it be transferred to the statistics provider?

Open-Access-Statistics (OA-S) is a joint project addressing these questions. Starting in July 2008, an infrastructure for the standardised accumulation of heterogeneous web log data, with an emphasis on institutional repositories, will be built. In close cooperation with the Network of Open Access Repositories (OA-N), various added value services will be made available to users.

The German Research Foundation funds OA-S in the start-up phase. Later on, the project partners will supply it as a routine service. The initial idea was formulated by DINI (Deutsche Initiative für Netzwerkinformation / German Initiative for Network Information), more precisely by its workgroup Electronic Publishing. OA-Statistics can be seen as part of DINI's initiative to build a network of certified repositories across Germany and is one of three proposals addressing related themes.

The project partners of OA-S are Georg-August Universitaet Goettingen (State- and University Library), Humboldt-Universitaet zu Berlin (Computer- and Mediaservice), Saarland University (Saarland University and State Library), and the University of Stuttgart (University Library).

The actions undertaken are linked with national and international collaborations, among others with Digital Repository Infrastructure Vision for European Research (DRIVER), Ligue des Bibliothèques Européennes de Recherche (LIBER), and Joint Information Systems Committee (JISC).

From the perspective of the central service/statistics provider, various data providers are sources of access data. In OA-Statistics, these will be the participating repositories (Berlin, Goettingen, Saarbruecken and Stuttgart) and, in the next stage of expansion, all DINI-certified repositories (http://www.dini.de/no_cache/service/dini-zertifikat/zertifizierte-server/).

In the long run, connections to other repositories are to be expected. The infrastructure is planned to be open for national and international repository providers to join in and benefit from the data aggregating and processing services provided by the central service provider.

Strictly speaking, access events will be collectable only at different levels of granularity. Not only do different repositories use different software solutions (DSpace, OPUS, edoc), but there are also qualitative differences between the information gathered on a server that actually hosts documents and the information horizon of a link-resolving (SFX, Ovid) or license-controlling (HAN) server.

The aggregates derived by the statistics provider from the locally generated access data will be hosted on a central server, referred to as the Service Provider. Local repositories will be able to create added value services by integrating statistics into the documents' front doors. Another currently popular example would be a recommendation system based on clickstream analysis. In the beginning, repositories will probably concentrate on the portion of the data that describes their digital objects, such as reliable usage frequencies that distinguish between local and international visitors.

To propagate the results of this project, the DINI-Certificate will be extended by proposals that concern and support data collection, the presentation of statistical information, and the integration of a repository into the OA-N networking infrastructure. Guidelines, technical documentation, and software implementations will be provided to interested repositories. With reasonably small effort, they can thus realise and concentrate on services that support users in their quest for knowledge.

OA-Statistics was presented and discussed at various workshops and conferences such as "Supporting the scientific information landscape in Germany" (dini.de/veranstaltungen/workshops/oa-netzwerk-2008/) and the 10th InetBib Conference (ub.uni-dortmund.de/inetbib2008/).

OA-Statistics received several letters of support, among others from the LIBER Access Division.

Workflows

The OAS infrastructure has two tiers. First, the data providers generate logs of document usage and pseudonymize user information (e.g. IP addresses). Next, they process the usage information (adding a unique document ID, transforming the data into OpenURL ContextObjects, etc.) and finally offer it via OAI-PMH.
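The first data provider steps might look roughly like the following sketch. It is not the project's implementation: the salt, the path-to-identifier mapping and the record layout are hypothetical, and a plain dictionary stands in for the OpenURL ContextObject that a real data provider would serialise and expose via OAI-PMH.

```python
import hashlib
import hmac

# Hypothetical secret salt; a real deployment would keep it private.
SALT = b"example-salt"

# Hypothetical mapping from request paths to stable document identifiers.
DOCUMENT_IDS = {
    "/docs/1234/thesis.pdf": "oai:repository.example.org:1234",
}


def pseudonymize_ip(ip_address, salt=SALT):
    """Replace an IP address with a keyed hash, so that events from the
    same address stay linkable without storing the address itself."""
    digest = hmac.new(salt, ip_address.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def to_usage_event(ip_address, path, timestamp):
    """Build a simplified usage record; return None for unknown documents."""
    document_id = DOCUMENT_IDS.get(path)
    if document_id is None:
        return None
    return {
        "requester": pseudonymize_ip(ip_address),
        "document_id": document_id,
        "timestamp": timestamp,
    }


event = to_usage_event(
    "192.0.2.17", "/docs/1234/thesis.pdf", "2008-07-01T12:00:05+02:00"
)
print(event)
```

A keyed hash is used here rather than a plain hash because the IPv4 address space is small enough to enumerate. Whether this satisfies a given data protection requirement is a separate question that the sketch does not answer.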

Workflow of the OAS data provider

Second, the central service provider collects the usage events from each data provider and processes them. It deduplicates documents (e.g. it sums the hits on files with the same content on different servers) and also deduplicates users, which makes it possible to create download graphs or to conduct clickstream analysis. It also processes the data according to the three standards mentioned beforehand (including the removal of non-human access and the consideration of standard-specific parameters such as double-click spans). After the calculation, the usage data is transferred back to the distributed services (the data providers) and to the Open Access Network service.
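The counting step at the service provider can be sketched as follows. The 30-second double-click span, the robot markers and the identifier mapping are assumptions chosen for illustration; each usage standard defines its own values, and real robot detection relies on maintained lists rather than three substrings.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed values for illustration; the applicable standard sets the real ones.
DOUBLE_CLICK_SPAN = timedelta(seconds=30)
ROBOT_MARKERS = ("bot", "crawler", "spider")

# Hypothetical mapping of per-server document IDs to one canonical ID.
CANONICAL_IDS = {
    "berlin:42": "doc-A",
    "stuttgart:7": "doc-A",
    "goettingen:99": "doc-B",
}


def is_robot(user_agent):
    """Crude non-human check based on substrings of the user agent."""
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def count_usage(events):
    """Count hits per canonical document, skipping robots and double clicks.

    Each event is a (user, document_id, time, user_agent) tuple. A request
    is treated as a double click if the same user already had a counted hit
    on the same canonical document within DOUBLE_CLICK_SPAN.
    """
    counts = Counter()
    last_counted = {}
    for user, document_id, time, user_agent in sorted(events, key=lambda e: e[2]):
        if is_robot(user_agent):
            continue
        canonical = CANONICAL_IDS.get(document_id, document_id)
        key = (user, canonical)
        previous = last_counted.get(key)
        if previous is not None and time - previous <= DOUBLE_CLICK_SPAN:
            continue
        last_counted[key] = time
        counts[canonical] += 1
    return counts


t0 = datetime(2008, 7, 1, 12, 0, 0)
events = [
    ("u1", "berlin:42", t0, "Mozilla/5.0"),
    ("u1", "berlin:42", t0 + timedelta(seconds=10), "Mozilla/5.0"),  # double click
    ("u2", "stuttgart:7", t0 + timedelta(minutes=1), "Mozilla/5.0"),  # same content
    ("u9", "goettingen:99", t0, "ExampleBot/1.0"),  # robot
    ("u3", "goettingen:99", t0 + timedelta(minutes=2), "Mozilla/5.0"),
]
counts = count_usage(events)
print(dict(counts))
```

In this example the two copies of the same document in Berlin and Stuttgart are summed under one canonical ID, while the repeated request and the robot request are not counted.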

Workflow of the OAS service provider
