My `dd` system image had grown way too large. This post explains what happened and how to solve it.
## `dd` takes anything

As described in this post, I use `dd` to take system snapshots, in case I mess up my installation. What `dd` does is a bit-for-bit copy of the input device, no matter the content. It does not care about the file system, the format, or whether there is any real information at a given location.
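As a sketch of that snapshot workflow, here is the `dd`-pipes-into-`gzip` pattern, demonstrated on a small image file standing in for a real device (on a real system the input would be something like `/dev/sda`; all file names here are illustrative):

```shell
# Create a mostly-empty 4 MB "disc" to stand in for a real device.
dd if=/dev/zero of=disk.img bs=1M count=4 2>/dev/null
# Bit-for-bit snapshot, compressed on the fly.
dd if=disk.img bs=1M 2>/dev/null | gzip > snapshot.img.gz
# Restoring is the same pipeline in reverse.
gunzip -c snapshot.img.gz | dd of=restored.img bs=1M 2>/dev/null
cmp disk.img restored.img && echo "restored image is identical"
```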
## System disc full

I had a little mishap while backing up my data with `rsync`. A mistake in the destination disc resulted in all my data (500 GB) being copied to the system disc (120 GB). Since there was not enough room for the whole data set, `rsync` copied what it could, then stopped when the system disc was full.
## Backup size

The next time I took a system snapshot with `dd`, the resulting image file was 57 GB. Since the original clean image was 14 GB, this size was unexpected.
## Deleted files

When files are deleted from a disc, the area they occupy is marked as free. Any program that wants to store data may use this freed area and overwrite its content.

The act of deleting does not discard the data, though. The bit content is still in place, merely marked as overwritable. This is why deleting the content of large folders is much faster than copying it (only about five seconds for 200 GB of data, as I accidentally got to experience on my machine): marking areas as free is much faster than overwriting all the bits.

After filling up my system disc with backup data, all bits in the free areas had been written to. Deleting this data did mark these areas as free, but the bit content was left in place.
As mentioned before, `dd` takes all bits no matter what they mean, including those in the free areas. My `dd` system backup therefore included all the data backup I had mistakenly put on the disc and then erased.
My system backups also run through `gzip`. The areas of my system disc that had never been written to contained homogeneous data (all ones or all zeroes), which `gzip` compresses very effectively. With real data in these areas, `gzip` could not compress as much.
In short:

- the system disc initially had all 0 or all 1 in the unused areas,
- the erroneous data backup wrote data to these formerly unused locations,
- the data backup was deleted, freeing the areas again but leaving the bit values as they were,
- the next `dd` backed up these freed areas into the image file,
- `gzip` could not compress the image as well as when it contained large chunks of homogeneous data.
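The compression difference is easy to reproduce: compress a run of zeroes and a run of random bytes of the same size and compare (file names are illustrative):

```shell
# 4 MB of zeroes vs 4 MB of random bytes, same gzip treatment.
dd if=/dev/zero    of=zeros.bin bs=1M count=4 2>/dev/null
dd if=/dev/urandom of=rand.bin  bs=1M count=4 2>/dev/null
gzip -c zeros.bin > zeros.bin.gz
gzip -c rand.bin  > rand.bin.gz
ls -l zeros.bin.gz rand.bin.gz
# zeros.bin.gz shrinks to a few kilobytes;
# rand.bin.gz stays close to 4 MB, since random data is incompressible.
```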
## Clearing the free areas

Filling up the free areas is easy: just create a big file full of zeroes, and it will be placed wherever there is free space. Delete this file full of zeroes, and the areas will be marked as free again, now containing only zeroes.
To do this, `dd` can be used again, with `/dev/zero` as input. `/dev/zero` is an infinite stream that produces zeroes for as long as requested. With no limit given on the command line, `dd` goes on until the disc is full.
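The commands are along these lines (`zero.bin` is an illustrative name; a `count=` bound is added here so the demonstration does not actually fill the disc, whereas on the real disc you would omit it and let `dd` stop on its own with a "disk full" error):

```shell
# On the real disc:  dd if=/dev/zero of=zero.bin bs=1M   (runs until the disc is full)
# Bounded with count= here so the demonstration stays small.
dd if=/dev/zero of=zero.bin bs=1M count=4 2>/dev/null
sync            # make sure the zeroes actually reach the disc
rm zero.bin     # mark the zeroed areas as free again
```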
## Next system backup

... was only 19 GB instead of 57. No actual data was removed; only the free areas were replaced with zeroes, to give `gzip` a little push.
This can also serve as a reminder: deleted files are easy to recover until the areas where they were stored are overwritten. Zeroing the free areas as described here is not sufficient either: there are analysis methods that can recreate data even after it has been overwritten. If you really want to make sure that data is destroyed, overwrite it with random data from `/dev/random`, and do that several times. Or bring your disc to a grinder.
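A sketch of one such overwrite pass, demonstrated on a file (on a real disc you would target the device itself; on Linux, `/dev/urandom` is the non-blocking variant of `/dev/random` and is the practical choice for large writes):

```shell
# A 1 MB file standing in for sensitive data.
dd if=/dev/zero of=secret.bin bs=1M count=1 2>/dev/null
# Overwrite it in place with random bytes (one pass of several).
# conv=notrunc keeps dd from truncating the file before writing.
dd if=/dev/urandom of=secret.bin bs=1M count=1 conv=notrunc 2>/dev/null
```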