CIS TECHNICAL REPORTS
Home People Undergraduate Graduate Research Contact
 

TECHNICAL REPORT ARCHIVE


Compressing File Collections with a TSP-Based Approach

TR-CIS-2004-02 (04/29/2004)
Dimitre Trendafilov, Nasir Memon, Torsten Suel

pdf version of this paper

Abstract
Delta compression techniques solve the problem of encoding a given target file with respect to one or more reference files. Recent work has demonstrated the benefits of using such techniques in the context of file collection compression. In these scenarios, files are often better compressed by computing deltas with respect to other similar files from the same collection, as opposed to compressing each file by itself. It is known that the optimal set of such delta encodings, assuming that only a single reference file is used for each target file, can be found by computing an optimal branching on a directed graph.

In this paper we propose two techniques for improving the compression of file collections. The first one utilizes deltas computed with respect to more than one file, while the second one improves the compressibility of batched file collections, such as tar archives, using standard compression tools. Both techniques are based on a reduction to the Traveling Sales Person problem on directed weighted graphs. We present experiments demonstrating the benefits of our methods.

Back to previous page

 
  poly thinking