What is Duplicacy?
Duplicacy is a new-generation, cross-platform cloud backup tool built on the idea of lock-free deduplication. To understand what lock-free deduplication means, we first look at some general terms.
Deduplication is the identification and elimination of duplicate blocks within a dataset. It is similar to compression, except that compression only identifies redundancy within a single file, whereas deduplication can find redundant blocks across files in different directories, of different data types, and even on different servers in different locations.
How Does Deduplication Work?
Deduplication usually works by chopping the data into chunks, where a chunk is one or more contiguous blocks of data. Each chunk is then compared against all chunks previously seen by the deduplication system. To make the comparison, each chunk is run through a deterministic cryptographic hash function, such as SHA-1 or SHA-256 (a member of the SHA-2 family), which produces a fixed-size value called a hash. For example, entering “Data” into a SHA-256 hash calculator gives the following hash value:
cec3a9b89b2e391393d0f68e4bc12a9fa6cf358b3cdf79496dc442d52b8dd528
If the hashes of two chunks match, the chunks are considered identical, because even the smallest change to a chunk changes its hash. For example, if you change “D” to “d” in “Data” above, the hash becomes:
3a6eb0790f39ac87c94f3856b2dd2c5d110e6811602261a9a923d3bb23adc8b7
A SHA-256 hash is 256 bits (32 bytes). If a 256-bit hash identifies an 8 MB chunk that is already stored, you save almost 8 MB every time you back up that same chunk. This is why deduplication saves so much space.
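The chunk-and-hash idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the chunks are tiny and fixed-size (real backup tools use far larger chunks and often variable-size chunking), the chunk store is an in-memory dictionary, and the names split_into_chunks, deduplicate, and restore are hypothetical helpers, not part of any tool.

```python
import hashlib

# Assumption: tiny fixed-size chunks, chosen only to keep the example readable.
CHUNK_SIZE = 4


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Chop the data into contiguous chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes) -> tuple[list[str], dict[str, bytes]]:
    """Return the ordered list of chunk hashes and the unique chunks."""
    store: dict[str, bytes] = {}   # hash -> chunk, each unique chunk kept once
    recipe: list[str] = []         # hashes in order, enough to rebuild the data
    for chunk in split_into_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only stores a chunk not seen before
        recipe.append(digest)
    return recipe, store


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data from the hash list and the chunk store."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCDABCDABCDWXYZ"
recipe, store = deduplicate(data)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
# prints: 4 chunks referenced, 2 chunks stored
```

The three repeated "ABCD" chunks are stored once and referenced three times, and restore(recipe, store) returns the original bytes.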
What is Lock-Free Deduplication?
In lock-free deduplication, the backup system does not use a centralized indexing database to track all existing chunks. Instead, to check whether a chunk has already been uploaded, it performs a file lookup through the file storage API, using a file name derived from the hash of the chunk. This effectively turns a cloud storage service that offers only a very limited set of basic file operations into a powerful modern backup backend capable of both block-level and file-level deduplication. More importantly, the absence of a centralized indexing database means that there is no need to implement a distributed locking mechanism on top of the file storage.
By eliminating the chunk indexing database, lock-free deduplication not only reduces code complexity but also makes deduplication less error-prone. Each chunk is saved individually in its own file, and once saved it never needs to be modified. Because chunk files are immutable, data corruption is less likely to occur. Another benefit that comes naturally from lock-free deduplication is that when one client creates a new chunk, other clients that happen to have the same original file will notice that the chunk already exists and will not upload it again. This pushes deduplication to its highest level: clients without knowledge of each other can share identical chunks with no extra effort.
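The lookup-by-file-name idea can be sketched as follows. This is a rough illustration, not Duplicacy's actual implementation: a local directory stands in for the cloud storage API, the file name is simply the hex SHA-256 of the chunk (the real naming and directory layout may differ), upload_chunk and chunk_path are hypothetical helpers, and writing to a temporary file before renaming is an assumption added here so that a half-written chunk is never visible under its final name.

```python
import hashlib
import os
import tempfile


def chunk_path(storage_dir: str, chunk: bytes) -> str:
    """Derive the chunk's file name from its hash (assumed naming scheme)."""
    digest = hashlib.sha256(chunk).hexdigest()
    return os.path.join(storage_dir, digest)


def upload_chunk(storage_dir: str, chunk: bytes) -> bool:
    """Store the chunk unless it already exists.

    Returns True if the chunk was written, False if it was already there.
    No index database and no lock are involved: the existence check is a
    plain file lookup, and identical content always maps to the same name.
    """
    path = chunk_path(storage_dir, chunk)
    if os.path.exists(path):
        return False
    # Write under a temporary name, then rename into place.
    fd, tmp_path = tempfile.mkstemp(dir=storage_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(chunk)
    os.replace(tmp_path, path)
    return True


# Two clients that know nothing about each other share one storage location.
with tempfile.TemporaryDirectory() as storage:
    client_a = [upload_chunk(storage, c) for c in (b"alpha", b"beta")]
    client_b = [upload_chunk(storage, c) for c in (b"beta", b"gamma")]

print(client_a)  # [True, True]
print(client_b)  # [False, True]
```

The second client skips "beta" because a file with that hash-derived name already exists. If two clients raced to upload the same chunk, both would write identical bytes to the same name, so the outcome would be the same either way.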
- Cross Computer Deduplication
- No Centralized Chunk Database (Lock-Free Deduplication)
- Faster Performance
- Supports a wide variety of storage backends, such as S3, Backblaze, Dropbox, SFTP, Microsoft OneDrive, and Google Drive
- Feature Comparison with others
Note: Read more at https://github.com/gilbertchen/duplicacy
Backup Servers with Duplicacy GUI
Note: Duplicacy GUI is not free (it comes with a 15-day trial). For home users, Duplicacy costs $20 for the first year and $5 for each subsequent year.
Author Akash Rajvanshi
LastMod Wednesday, February 3, 2021
License Akash Rajvanshi