Deduplication is a specialized data compression technique designed to eliminate duplicate copies of repeating data. There are three main methods, each with its own rules and trade-offs. Deduplication ratios vary with the nature of your data, but most software achieves 15 percent deduplication or better, depending on the method used and the data itself.
File-level deduplication generates a hash over the entire content of a file, then combines that hash with other metrics, such as time stamps and file size, to identify duplicates. Duplicate files are then replaced with links to a single stored copy.
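A minimal sketch of the idea in Python, assuming a SHA-256 content hash paired with file size as the duplicate test and an in-memory index (the function names are illustrative, not from any particular product):

```python
import hashlib
import os

def file_fingerprint(path):
    """Hash the full file content and pair it with the file size."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return (h.hexdigest(), os.path.getsize(path))

def deduplicate(paths):
    """Map each unique fingerprint to its first path; later matches
    are recorded as links (duplicate path -> canonical path)."""
    seen = {}
    links = {}
    for p in paths:
        fp = file_fingerprint(p)
        if fp in seen:
            links[p] = seen[fp]
        else:
            seen[fp] = p
    return links
```

In a real system the `links` table would be persisted and the duplicate files replaced with hard links or pointer records rather than kept as full copies.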
Block-level deduplication slices file contents into fixed-size blocks and creates a hash from each block's data. Blocks with identical hashes are stored only once. These blocks are often quite small, 8 KB or 32 KB in many cases.
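The fixed-size scheme can be sketched as follows, assuming an 8 KB block size and a dictionary standing in for the block store; the recipe of hashes is enough to reassemble the original data:

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # fixed 8 KB blocks, one common choice

def dedupe_blocks(data, store=None):
    """Split data into fixed-size blocks, storing each unique block once.
    Returns the block store and a recipe (ordered list of hashes)."""
    store = {} if store is None else store
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # identical blocks stored only once
        recipe.append(digest)
    return store, recipe

def reassemble(store, recipe):
    """Rebuild the original data from its recipe of block hashes."""
    return b"".join(store[d] for d in recipe)
```

For example, three identical 8 KB blocks followed by one distinct block produce a store with only two entries, while the four-entry recipe preserves the original layout.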
Array-level deduplication also operates on blocks, but uses variable-size blocks. It applies various criteria to each block to determine whether and where duplicates exist.
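Variable-size blocks are typically produced by content-defined chunking: a boundary is cut wherever a rolling checksum of recent bytes matches a bit pattern, so block boundaries follow the content rather than fixed offsets. A toy sketch, assuming illustrative mask and size parameters and a deliberately simple rolling hash (real systems use algorithms such as Rabin fingerprinting):

```python
def content_defined_chunks(data, mask=0x3FF, min_size=512, max_size=8192):
    """Cut a chunk wherever the low bits of a toy rolling checksum
    are all ones, subject to minimum and maximum chunk sizes."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF  # toy rolling hash
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because boundaries depend on content, inserting a few bytes near the start of a file shifts only the affected chunks; with fixed-size blocks the same edit would change every block that follows.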
Advantages of Deduplication
The primary advantage of each deduplication method is the reduction of instances where networks must process the same data. This improves productivity, saves bandwidth and storage, and significantly shortens the time it takes to do a thorough backup. The specific advantages include:
• Superior Flexibility
Deduplication allows data that does not deduplicate well to be left in a non-deduplicated state. This increases efficiency, reduces processing time, and places a lighter load on system resources. Restores are faster, and users can provision data housed on an existing storage system, saving as much as 90 percent of the cost associated with appliance-based storage solutions.
• Shortened Backup Windows
Admins can schedule deduplication outside backup windows, which can significantly reduce the time, budget, and staff required.
Disadvantages of Deduplication
There are, however, specific costs to deduplication that are important to understand. The metadata that links deduplicated chunks can grow quite large. Backing up and restoring also become more complex and less efficient once backups have been running for a while, and purging these large metadata stores becomes less efficient as well.
On the whole, deduplication is a good practice. It is a reliable backup technique that saves bandwidth and storage, while also helping minimize the costs involved in implementing and executing data backup solutions.