OLNet - Draft Paper

=DRAFT=

=Tracking Downloadable OER Content=
One of the great promises of Open Educational Resources is their ability to be reused, thanks to the rights that open licenses grant and the minimal costs of replicating and sharing electronic media in general. In the realm of informal sharing, anecdotal stories abound about the reuse of freely and openly licensed shared resources, often backed up with numbers from individually controlled, commodity web analytics services. These help foster the “[|power of positive narcissism],” providing additional evidence to individual content owners of the value of sharing.

Such evidence has been much harder to find within formal OER projects. Very few of the systems deployed to share OER provide view and reuse metrics back to the content owners. In the case of OER “repository” models, where content is removed from its original context of use and shared in such a way that it can be downloaded and used in totally new contexts (instead of merely linked in place), very little is known of what becomes of downloaded resources and how often they are reused. This lack of data is a critical failing in providing evidence of actual benefit. It also greatly undercuts individual motivations to share, which are often buttressed by evidence of actual reuse.

The goal of my work as an OLNet Fellow was threefold: to examine the possible ways in which tracking data could be generated for resources that are downloaded out of their original context; to implement and document a solution in a specific context, initially for content being shared through BCcampus' Shareable Online Learning Resource (http://solr.bccampus.ca) service; and finally to report back on initial findings about the user experience and the overall value of the resulting data in a follow-up report.

=Requirements=
There are three high-level requirements any solution must fulfill in order to be successful: the first technical, the second two more social in nature.
 * 1) The solution should return data on where and how often resources are used after they are downloaded from their original context.
 * 2) The solution must work as much as possible within existing workflows, and must not require effort on the part of content owners that is not commensurate with the reward they will receive; nor should it place the onus on content owners to solve institutional reporting requirements.
 * 3) The solution should be palatable to the OER community, both content authors and re-users. Ideally it should preserve the privacy of the end users reusing the content, or have their permission before sending any data back to a tracking server.

=Evaluating Possible Approaches=
The lack of data on the reuse of digital resources is not specific to OER alone, but it is exacerbated in cases where content is downloaded out of its original context and reused elsewhere. The solution we sought should, in addition to providing data on how often and from where a resource was viewed after it was downloaded, also be simple for content owners to implement. We sought an approach that would integrate directly into existing workflows and systems and not require onerous additional steps or account creation. We also sought an approach that could provide data not only to the content owners but also to other users, as well as aggregate data across the service. Finally, to the extent to which it was possible, we hoped to find a solution that was inexpensive or free, allowed us to keep full ownership of the tracking data, and avoided hosting it on US-based servers (for fear of conflicts between local privacy policies and the US Patriot Act).

We began by assessing a [|very useful draft of possible ways to track resources] (cf. http://wiki.cetis.ac.uk/Web_search) created by Phil Barker of CETIS. To this list of potential approaches we added the possibility of using one of the emerging commercial services such as [|Tynt], [|Fairshare] or [|ImageStamper] as other possible solutions.

Barker identifies five approaches to tracking OER reuse (though really only four are distinct):
 * 1) Web usage stats
 * 2) Google and other online Analytics
 * 3) URL redirects
 * 4) Web bugs
 * 5) Web search

The first, Web Usage Stats, is based on analyzing HTTP log files. While this may give you some data on the use of resources on their original server, it will not tell you anything about the resource once it has been moved somewhere else.

Similarly, the third approach, URL redirects (linking not to the resource directly but to a redirection-service intermediary that also collects usage data), suffers from the same issue: it would work only for content that was consistently hosted in the same location (or else would require an unlikely workflow in which links within the resource point back to the URL redirection service).

The fifth approach, Web Search, is a possibility and deserves some additional investigation. However, its major shortcoming is that it will only ever return results that are visible on the public web. Within the context of reuse of resources in formal education, where content will very often be shared with students within an authenticated environment, or in the context of individual users viewing content on their desktop, this approach would fail to register these as part of the data. Furthermore, this approach is unlikely to yield clean, easily interpretable results of how many times resources are actually used, and instead simply returns the new locations in which they are posted publicly.

In addition to these five approaches, we considered the possibility of using a service like [|Tynt], or reproducing its functionality: in essence, a javascript library that inserts attribution information and links into the browser's clipboard any time content is copied off a page where it is active. This approach was ruled out not only because it does not directly solve the problem we are trying to address, but also because it runs the risk of greatly alienating end users: it essentially hijacks the standard browser function of copy/paste in a way that seems harmful to the spirit in which OER are shared. Similarly, while potentially of some interest, services like [|Fairshare] are very similar to the Web Search approach described above and will ultimately only report back on public reuses of content, while a service like [|ImageStamper], also of interest, is both specific to image tracking and possibly a bit too similar to more conventional DRM watermarking approaches to be palatable for Open Education content.

Thus we were left with Web bugs/Analytics (with Google Analytics being a specific case of this) as the remaining choice to consider. Nick Freear of the Open University/OLNet project wrote up a good initial description of how this could be implemented in RSS feeds and what some of the potential issues are. Upon further examination and initial testing it was found that indeed, the same technique employed to retrieve tracking data for websites can be used to retrieve data for web content that is downloaded and used elsewhere.

Why not just use Google Analytics?
In the approach tried here and described below, we used the open source web analytics package Piwik to generate the tracking code and report the tracking data. The obvious question this raises is why not use the dominant free web analytics package, Google Analytics; and indeed, how is what is being proposed here any different from what other OER repositories like Connexions are already doing to track OER use with Google Analytics (cf. http://www.edtechpost.ca/wordpress/2010/07/12/olnet-tracking-oer-first-stab/)?


In brief, a self-hosted Piwik instance was preferred because, compared with Google Analytics:
 * results are shareable
 * Google Analytics results aren't aggregate-able across an organizational repository
 * Google Analytics depends too much on the lone individual
 * with Google Analytics it is harder to take advantage of an open source API to build tracking into existing business processes

=Constraints=

||~ Approach Name ||~ Pros ||~ Cons ||
|| Web Search || no additional work required by content owners; content is found by searching for certain strings in the original content || works for the open web only; difficult to get a picture of unique uses (jumbled together in search results); only reports copies of the original content, not actual views of it ||
|| Forced Citation (cf. Tynt) || ensures that content which is copied and pasted into a new location includes a reference to the original source || hijacks a standard browser process; intrusive to the normal workflow of content re-users ||
|| Web bug || provides data on both views and new locations where content is being reused; works on both public and private web locations as well as from the user's desktop || only tracks "web" content like HTML pages; may employ javascript, which can be a problem in certain reuse environments (e.g. VLEs) ||
|| Google Analytics || provides data on both views and new locations where content is being reused; works on both public and private web locations as well as from the user's desktop || stats are not easily shared / made public; places responsibility on the content owner even though others (e.g. the funder) may be equally motivated; employs javascript, which some services will strip out when the content is re-uploaded ||

=Current Context=
The author is also the manager of a repository service for the province of British Columbia, Canada. This service has been put into place primarily to facilitate the sharing of content produced through BC's Online Program Development Fund (http://www.bccampus.ca/online-program-development-fund-opdf-2/), administered by BCcampus, a collaborative online learning agency that supports BC’s public post-secondary institutions.

As part of receiving this provincial funding, recipients agree to share the resulting content under one of two licenses: either a Creative Commons Attribution-ShareAlike 2.0 Canada License, or a BC Commons license, a regional license introduced by BCcampus to facilitate sharing among the 25 public post-secondary institutions in BC. As part of the orientation process, content developers are pointed at the BC Commons License Generator (http://solr.bccampus.ca/bcc/BCcommons/publish/publish.html) and encouraged to choose the appropriate license and generate a small html snippet to insert into their content template.

The repository, SOL*R (http://solr.bccampus.ca/), currently uses the Equella repository software from The Learning Edge. The content is predominantly LMS-focused, instructor-facilitated learning materials: typically multi-page html sites, IMS Content Packages, proprietary export formats from various LMSs, and rich media like Flash and video content. For content which can be previewed (typically native web content) the software offers the ability to view it in place (and potentially link to it), though in practice few do. The more common use case is for instructors to use the 'Preview' function to quickly assess the usefulness of a resource, but then to download it from the repository, either for remixing or for upload to their own LMS. While the repository software can currently produce statistics about the number of times a resource //record// is viewed, and the number of times a resource //license// is manually agreed to, it obviously does not produce any statistics on the viewing and reuse of the content once it has left the repository; this gap is the key motivator behind this research.

=Proposed Solution=
The proposed solution in the case of BCcampus is to build on the existing capabilities of the open source web analytics program Piwik (http://piwik.org/) and link it to the //existing// workflow of content contributors to SOL*R. Content owners are already encouraged to use the BC Commons license generator to insert a license declaration in their content. The license generator will be updated to include an additional question asking content owners if they wish to receive tracking data about their content after it is downloaded from SOL*R. If they agree, a small placeholder will be inserted into the content directly following the license text.

The reason for only inserting a placeholder at this point is that the content does not yet have a permanent URL or identifier associated with it, and will not get one until it is uploaded to SOL*R. Once content has been contributed and this URL/UUID assigned, a script, run as an hourly cron job at the server OS level, will crawl the filestore where the content is deposited. The script will be looking for instances of the placeholder that was inserted as part of the license.


Later on, the content contributor deposits this content in SOL*R. Based on the unique ID assigned by SOL*R (e.g. 44055c53-85ca-c32f-c67d-5356660a361a) in the contribution process, a script on the server will:
 * search recently added content in the file store and identify files that contain the distinct tracking comment
 * look up the system-generated UUID for that content (in the case of SOL*R this is contained in the name of the directory holding the content)
 * via the API, send a query to the Piwik analytics package using the UUID to generate a new tracking code, and retrieve the code to be inserted into the content
 * find and replace the distinct tracking comment with the unique tracking code from Piwik

Once the content has been uploaded, both views “in place” (e.g. within the repository itself, in ‘preview’ mode) as well as views of the content after it has been downloaded will generate the following tracking data:


 * How many times the content has been viewed
 * The referring location of the URL that led to the content
 * The IP address of the user viewing the content
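Assuming each resource has been registered as its own Piwik "site", this data can be pulled back over Piwik's HTTP Reporting API. The method names used below (`VisitsSummary.getVisits`, `Referrers.getWebsites`, `Live.getLastVisitsDetails`) are real Piwik API methods; the server URL and auth token are hypothetical placeholders for this sketch.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical deployment details for this sketch.
PIWIK_BASE = "https://analytics.example.ca/piwik"
TOKEN_AUTH = "anonymous"  # a real deployment would use a Piwik token_auth


def report_url(method, idsite, period="month", date="today"):
    """Build a Piwik Reporting API request URL for one resource/site."""
    params = urllib.parse.urlencode({
        "module": "API", "method": method, "idSite": idsite,
        "period": period, "date": date, "format": "json",
        "token_auth": TOKEN_AUTH})
    return PIWIK_BASE + "/index.php?" + params


def report(method, idsite, **kwargs):
    """Call the Reporting API and decode the JSON response."""
    with urllib.request.urlopen(report_url(method, idsite, **kwargs)) as resp:
        return json.loads(resp.read())

# For the resource registered as Piwik site 7, one could then fetch:
#   report("VisitsSummary.getVisits", 7)     - number of views
#   report("Referrers.getWebsites", 7)       - referring locations
#   report("Live.getLastVisitsDetails", 7)   - per-visit detail, incl. IPs
```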

=Considerations, Issues and Constraints=

 * javascript limitations, and possible workarounds: "Some web content management systems limit what JavaScript can be embedded into resources. Also, this approach is a complete non-starter if the resources are hosted on a third-party web2.0 site."
 * only html is tracked: "The tracking code only works for HTML documents; if your OERs are in another format then you need another approach" (http://wiki.cetis.ac.uk/Online_analytics)
 * the approach won't work if the tracking code/image is removed or blocked
 * data can only be gathered on what is being tracked: if no tracking code has been placed on a page, no data will be returned. The recommendation will therefore be to insert the license/placeholder as part of a template that will then show up on ALL subsequent pages.
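One partial mitigation for the javascript issues: Piwik also supports an image-based tracker, in which a plain request to its piwik.php endpoint with `idsite` and `rec=1` parameters records a visit with no javascript at all (this is how the noscript portion of the standard tracking code works). A minimal sketch of building such a beacon tag, with a hypothetical Piwik URL:

```python
import urllib.parse

# Hypothetical Piwik location for this sketch.
PIWIK_BASE = "https://analytics.example.ca/piwik"


def image_beacon(idsite, action_name="OER view"):
    """Build a 1x1 tracking-image tag.  A hit on piwik.php with idsite
    and rec=1 is recorded as a visit even when javascript is stripped,
    so this tag can accompany the javascript snippet as a fallback."""
    query = urllib.parse.urlencode(
        {"idsite": idsite, "rec": 1, "action_name": action_name})
    return ('<img src="%s/piwik.php?%s" style="border:0" alt="" '
            'width="1" height="1" />' % (PIWIK_BASE, query))
```

An image beacon reports fewer details than the javascript tracker (no screen size, plugins, etc.), but it survives environments that strip script tags.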

=Next Steps / Questions=