welcome to theresourcedepot
|This document provides an introductory overview of theresourcedepot,
a transactional archiving solution hosted in the cloud.
Transactional Archiving consists of selectively capturing and storing transactions that take place
between a web client (browser) and a web server.
Most existing web archives recurrently send out bots to crawl the content of web servers. This results in observations of a server's content at the time of crawling. Since the crawling frequency is generally not aligned with the change rate of a server's resources, this approach is typically not able to capture all versions of a server's resource. The resulting archive may provide an acceptable overview of a server's evolution over time, but it will not provide an accurate representation of the server's entire history.
A Transactional Archive, however, captures every version of a resource as it is being requested by a browser. The resulting archive is effectively representative of a server's entire history, although versions of resources that are never requested by a browser will also never be archived.
Existing web archives, like the Internet Archive, may benefit from Transactional Archives by importing the detailed archived content in bulk.
theresourcedepot is a Transactional Archive solution,
that archives content from web servers in the cloud. Currently, only Apache servers are supported.
In order to activate transactional archiving in theresourcedepot,
a filter, called |
theresourcedepot incorporates the elegant functionality of the Memento protocol to easily discover and retrieve archived content. The Content Server can make TimeGates that theresourcedepot makes available for its resources discoverable via HTTP Link Headers and robots.txt files, so that Memento clients can easily access the archived versions.
theresourcedepot archives content from various Content Servers. Each Content Server is given a unique ID and is archived in a separate database. To ensure that submitted content originates from a legitimate Content Server, theresourcedepot registers its IP address, and only accepts content from registered IP addresses.
The installation instructions for
mod_ta Installation Instructions
Installing mod_ta Apache Extension:
To install the mod_ta filter that will allow activation of transactional archiving, run the following command:
sudo /usr/sbin/apxs -c -i –a mod_ta.c
This will install the mod_ta filter in your Apache Web Server. The installation is successful if the file
Add the following lines to the apache configuration file:
<IfModule ta_module>And configure as follows:
To let Memento clients, access archived content in theresourcedepot,
the following services are provided. Please refer to the Memento protocol for additional information.
To access the TimeGate at theresourcedepot for the resource with URI
curl -D headers.txt -H Accept-Datetime:'Wed, 29 Sep 2008 12:00:04 GMT’ \
To retrieve a Memento (archived version of a resource) from theresourcedepot for the resource with URI
curl -D headers.txt \In this URI,
To retrieve a link-value formatted TimeMap from theresourcedepot for the resource with URI
Pushing content into theresourcedepot:
Every time the server responds to a HTTP GET request for the resource with URI
and uses the HTTP PUT method to submit the following data related to the client's request: