Overclockers Australia Forums

Old 8th November 2016, 1:44 AM   #1
demiurge3141 Thread Starter
Member
 
Join Date: Aug 2005
Location: Melbourne 3073
Posts: 807
Default A parity/ecc program for photo storage?

I have about 2TB of personal photos/videos. I keep one copy on a NAS with mirrored disks and another on a 2.5" external USB drive. Yesterday I ran FreeFileSync to check file contents; it took 10 hours, but at the end I found that bitrot had hit three of the files on the external drive (it's very easy to see once you open the image). Fortunately I was able to recover them from my NAS copies.

Now this has me thinking: is there a parity program that automatically generates a parity/recovery file for each file on a drive that doesn't have one, and checks every file that does have one against its parity file? Even a command-line program would work, as I can write a batch file. Pure checksum programs like MD5/SHA1 unfortunately won't do recovery.

QuickPar/PAR2 would generate parity files for the entire collection at once, but I don't want to regenerate the whole parity set every time I add some photos. Plus, if every file has its own parity/recovery file, moving files around is much easier.

Now, before you go all ZFS on me, remember this is a removable drive and I need to be able to connect it to a PC and download images from a camera.
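
Roughly what I have in mind, as a sketch only - it assumes par2cmdline ("par2") is on the PATH, and the drive letter and 10% redundancy figure are just placeholders:

Code:
# Per-file parity: create a .par2 beside any file that lacks one,
# otherwise verify the file against its existing .par2.
import subprocess
from pathlib import Path

PHOTO_ROOT = Path("E:/photos")   # hypothetical external drive path

for f in PHOTO_ROOT.rglob("*"):
    if not f.is_file() or f.suffix.lower() == ".par2":
        continue
    par = f.with_name(f.name + ".par2")
    if not par.exists():
        # No recovery data yet - generate ~10% per-file redundancy.
        subprocess.run(["par2", "create", "-r10", str(par), str(f)], check=True)
    else:
        # Recovery data exists - check the file against it.
        if subprocess.run(["par2", "verify", str(par)]).returncode != 0:
            print(f"DAMAGED: {f} - try 'par2 repair {par}'")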
demiurge3141 is offline   Reply With Quote

Old 8th November 2016, 2:08 PM   #2
HobartTas
Member
 
Join Date: Jun 2006
Posts: 490
Default

Sadly ZFS is the only thing that manages bitrot and recovery automatically. To partially answer your question, though: assuming you have the data on your PC (and one or more backup copies), what I used to do was WinRAR everything up with zero compression, since most of the stuff wasn't compressible anyway. The advantage was that each file was checksummed; you couldn't repair a damaged file, but you could at least detect it easily and replace it with a good copy. There may be other software around that does this sort of thing, although I don't know of any.

Unfortunately, the filesystem you are using (NTFS) typically doesn't notify you of bad blocks and, more importantly, just replaces them with blank data, which gives you the problems you are currently experiencing. You need to build in redundancy of some sort to repair the data: either on a per-file basis, like the PAR2 system you mentioned, or with disk mirrors or RAID 5/6 if you want to do it for the entire filesystem.

The only way I could conceive of doing this on a single external drive would be to install a VM on your PC running a ZFS-capable OS such as Solaris, FreeBSD or Linux with ZoL, and format the external drive with ZFS. You then set the copies=n property to something higher than the default of 1: setting it to 2 means two copies of your data are written, and since both are independently checksummed, an invalid block can be repaired from the good copy. You do incur substantial overhead, though: a 6TB drive would only hold 3TB of data (2TB with copies=3, and so on). But if you need this done in a transparent, hassle-free manner without having to run parity-checking programs all the time, such an arrangement might be useful for you.
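
To make that concrete, this is roughly the setup I mean (a sketch only - the pool name and device node are made up, and it assumes the VM's OS has the ZFS tools installed):

Code:
# Single-disk ZFS pool on the external drive, with two copies of every block.
import subprocess

POOL, DEVICE = "extphotos", "/dev/sdX"   # placeholder names

subprocess.run(["zpool", "create", POOL, DEVICE], check=True)
subprocess.run(["zfs", "set", "copies=2", POOL], check=True)  # halves usable space
# A scrub re-reads every block; a copy that fails its checksum is rewritten
# from the good one.
subprocess.run(["zpool", "scrub", POOL], check=True)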
HobartTas is offline   Reply With Quote
Old 8th November 2016, 7:14 PM   #3
demiurge3141 Thread Starter
Member
 
Join Date: Aug 2005
Location: Melbourne 3073
Posts: 807
Default

Quote:
Originally Posted by HobartTas View Post
Sadly ZFS is the only thing that manages bitrot and recovery automatically. ...
I don't need the filesystem to do checksums in the background. The disk is seldom connected, and I can see the files have transferred correctly when I quickly scroll through the new thumbnails. All I need is a slightly more sophisticated checksum algorithm that can also correct single-bit errors. Surely something like that exists?
demiurge3141 is offline   Reply With Quote
Old 16th November 2016, 9:17 AM   #4
AManEatingDuck
Member
 
AManEatingDuck's Avatar
 
Join Date: Feb 2002
Location: Auckland, NZ
Posts: 284
Default

Could something like SnapRaid work?
__________________
Who are you and how did you get in here?
I'm a locksmith, I'm a locksmith
AManEatingDuck is offline   Reply With Quote
Old 16th November 2016, 11:04 AM   #5
NSanity
Member
 
NSanity's Avatar
 
Join Date: Mar 2002
Location: Canberra
Posts: 15,952
Default

ReFS will do it (although I can't quite find how it goes about working with a single drive - I'm not sure if you get more than just checksumming in/out)

https://technet.microsoft.com/en-us/...or=-2147217396

Reading this, it seems you get *some* protection/correction, but if a file gets binned you can't just pick up an alternate copy stream to carry on.


With ReFS, forget OSX or Linux support. So long as you stay on Windows 8/10, you'll be fine. All that said, you're not replacing a backup at all - you're simply ensuring that your backup is correct.
NSanity is online now   Reply With Quote
Old 16th November 2016, 11:16 AM   #6
elvis
Old school old fool
 
elvis's Avatar
 
Join Date: Jun 2001
Location: Brisbane
Posts: 28,509
Default

Quote:
Originally Posted by HobartTas View Post
Sadly ZFS is the only thing that manages bitrot and recovery automatically
Incorrect. There are many filesystems that do:
http://forums.overclockers.com.au/sh....php?t=1195906

At the application level, I use PAR2 via the "parchive" tool to do Reed–Solomon / erasure code error correction on files (which I personally store on top of a BtrFS Linux filesystem):
https://en.wikipedia.org/wiki/Parchive

If files get lost, renamed, deleted or whatever, I can rebuild them with a given percentage level of parity (which means extra disk space, but that's fine). I store parchive data alongside anything I back up to optical media or to cloud backups as well.

[edit] Actually read the OP - you already use PAR2. Yeah, it's a pain in the bum to regenerate the files all the time, but you're out of options unless it's done at the filesystem level.

I tend to store photos by date, in folders named in ISO 8601 format (YYYY-MM-DD), so I don't often go back and add photos to an old date, which means I don't need to do a whole lot of PAR2 updates. Instead I generate the PAR2 data for the previous day's folder in one hit, and that way the PAR2 information per folder stays relevant. That's a workflow workaround for the technology limit, and it may or may not work for you if your workflow is different.

But given we're talking long-term storage/archival here, maybe you need to consider not adding new photos to old folders, and instead creating new folders every time, to reduce regeneration of existing PAR2 data.
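
As a rough sketch of that daily step (assuming par2cmdline rather than a GUI, and a made-up archive path):

Code:
# Generate one PAR2 set for yesterday's ISO 8601 (YYYY-MM-DD) folder only;
# older folders never change, so their PAR2 data is never touched again.
import subprocess
from datetime import date, timedelta
from pathlib import Path

ARCHIVE = Path("/photos")                                    # placeholder root
folder = ARCHIVE / (date.today() - timedelta(days=1)).isoformat()
files = [str(p) for p in folder.iterdir() if p.is_file()]

subprocess.run(["par2", "create", "-r10", str(folder / "recovery.par2")] + files,
               check=True)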
__________________
Play old games with me!

Last edited by elvis; 16th November 2016 at 11:24 AM.
elvis is offline   Reply With Quote
Old 16th November 2016, 11:24 AM   #7
NSanity
Member
 
NSanity's Avatar
 
Join Date: Mar 2002
Location: Canberra
Posts: 15,952
Default

Quote:
Originally Posted by elvis View Post
[edit] Actually read the OP,
A+ same boat.

I would be converting my NAS to BTRFS.

And probably buying a second one, and periodically mirroring them.

Failing that (and honestly this is probably the better option), look to a cloud backup provider that has versioning/checksumming behind their infrastructure and integrates with your NAS, and give them $. Backblaze uses Reed–Solomon; not sure about others.

Last edited by NSanity; 16th November 2016 at 11:30 AM.
NSanity is online now   Reply With Quote
Old 16th November 2016, 11:54 AM   #8
cvidler
Member
 
cvidler's Avatar
 
Join Date: Jun 2001
Location: Canberra
Posts: 10,378
Default

Still think PAR2 is the solution here.

You don't have to do an entire drive at a time - you could do it by folder or by file (and easily script it).

e.g. my photo archive is organised into per-day folders for raw images; once a day is done those files are never going to change, so a PAR2 set over the whole folder is a workable solution. Finished post-processed photos of an event or whatever also end up as a folder that's not likely to change. Videos too: GoPro footage organised by day, post-processed work by event.


Someone above mentioned RAR. RAR supports recovery data too. The downside is that it ties your files up inside an archive, which PAR doesn't have to. That may not be a problem if your organisation keeps your folders (and thus archives) small/manageable/stagnant enough.
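
For the RAR route, something like this per folder (a sketch only - the store-only and recovery-record switches are from memory, so check your RAR version's docs; the names are placeholders):

Code:
# Store-only RAR archive of one day's folder, with a recovery record added.
import subprocess

subprocess.run(["rar", "a",
                "-m0",              # store only - photos/videos barely compress
                "-rr5p",            # ~5% recovery record (syntax varies by version)
                "2016-11-15.rar",   # one archive per day/event folder
                "2016-11-15/"],
               check=True)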
__________________
We might eviscerate your arguments, but we won't hurt you. Honest! - Lucifers Mentor
⠠⠵
[#]
cvidler is offline   Reply With Quote
Old 16th November 2016, 11:56 AM   #9
elvis
Old school old fool
 
elvis's Avatar
 
Join Date: Jun 2001
Location: Brisbane
Posts: 28,509
Default

And in case anyone doesn't read the "Next gen filesystems" thread I linked to above, DO NOT USE BTRFS IN RAID5/6. It's unstable, and eats your data.

BtrFS in RAID1 or RAID10 is stable; I recommend RAID1 for long-term storage. Despite the name, it's not traditional RAID1 - it merely ensures your data lives on two physically separate devices. Under this model you can mix and match any weird combination of drives (I have 5 disks of all different sizes in my home array), and you get access to N/2 of the total space, as every block is written to two different devices.

My home BtrFS RAID1 array has suffered two total disk failures at different times in the last year. Both disks were replaced with larger ones without data loss, which also let the array grow in space each time while keeping the N/2 ratio.

More details in the thread I linked above.
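
For reference, creating that kind of array is only a couple of commands (sketch only - device nodes and mount point are placeholders):

Code:
# BtrFS "RAID1": data and metadata each written to two different devices,
# regardless of how mismatched the drive sizes are.
import subprocess

DEVICES = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]   # any mix of sizes
subprocess.run(["mkfs.btrfs", "-d", "raid1", "-m", "raid1"] + DEVICES, check=True)
subprocess.run(["mount", DEVICES[0], "/mnt/photos"], check=True)
# A scrub re-reads every block against its checksum and repairs from the
# second copy when one fails.
subprocess.run(["btrfs", "scrub", "start", "-B", "/mnt/photos"], check=True)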

Quote:
Originally Posted by cvidler View Post
Still think PAR2 is the solution here.

You don't have to do an entire drive at a time, you could do it by folder or by file (and easily script it).
This is what I do. Easy to script and log total drive verification too.
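
The verification/logging side is just as simple to script (a sketch, with a made-up drive path and log file):

Code:
# Walk the drive, verify every PAR2 set found, and log the result.
import subprocess
from pathlib import Path

with open("verify.log", "w") as log:
    for par in Path("E:/photos").rglob("*.par2"):
        if ".vol" in par.name:       # skip the extra recovery volumes
            continue
        ok = subprocess.run(["par2", "verify", str(par)],
                            capture_output=True).returncode == 0
        log.write(("OK   " if ok else "FAIL ") + str(par) + "\n")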
__________________
Play old games with me!
elvis is offline   Reply With Quote
Old 14th December 2016, 11:43 AM   #10
flain
Member
 
Join Date: Oct 2005
Posts: 1,973
Default

I'll just add that MultiPar (https://multipar.eu/) is an option - consider it a more up-to-date QuickPar (since QuickPar is no longer maintained).

It's got a few nifty features, namely multicore support and GPU acceleration making larger par sets a lot quicker to produce.
flain is offline   Reply With Quote
Old 14th December 2016, 12:20 PM   #11
ae00711
Member
 
ae00711's Avatar
 
Join Date: Apr 2013
Posts: 981
Default

Quote:
Originally Posted by flain View Post
I'll just add that MultiPar (https://multipar.eu/) is an option - consider it a more up-to-date QuickPar (since QuickPar is no longer maintained).

It's got a few nifty features, namely multicore support and GPU acceleration making larger par sets a lot quicker to produce.
this looks interesting; how do I use it?
__________________
I LOVE CHING LIU
SMOKING IS UN-AUSTRALIAN
I prefer email to PM!

ae00711 is offline   Reply With Quote
Old 14th December 2016, 5:27 PM   #12
flain
Member
 
Join Date: Oct 2005
Posts: 1,973
Default

Quote:
Originally Posted by ae00711 View Post
this looks interesting; how do I use it?
Much the same as QuickPar. You simply load the GUI up, choose the directory you want to create PARs for, set the amount of redundancy you want and hit create. It leaves .par2 files in the original folder. From that point on you can check the integrity and repair files, as long as the data loss is less than the redundancy amount.

It's pretty handy: if you have, say, 500MB of photos at 10MB each and you create one 10MB PAR file, you could lose any single photo from the collection and it would be recoverable from the .par2 file, like magic.
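
The arithmetic behind that, if it helps (figures as above):

Code:
# To survive losing any one photo, the recovery data only needs to be at
# least as large as the biggest single file in the set.
total_mb, largest_mb = 500, 10
redundancy_pct = 100 * largest_mb / total_mb
print(f"minimum redundancy: {redundancy_pct:.0f}%")   # -> 2%, e.g. par2 create -r2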
flain is offline   Reply With Quote
Old 14th December 2016, 6:01 PM   #13
ae00711
Member
 
ae00711's Avatar
 
Join Date: Apr 2013
Posts: 981
Default

Quote:
Originally Posted by flain View Post
Much the same as QuickPar. You simply load the GUI up, choose the directory you want to create PARs for, set the amount of redundancy you want and hit create. It leaves .par2 files in the original folder. From that point on you can check the integrity and repair files, as long as the data loss is less than the redundancy amount.

It's pretty handy: if you have, say, 500MB of photos at 10MB each and you create one 10MB PAR file, you could lose any single photo from the collection and it would be recoverable from the .par2 file, like magic.
f**k me! It's brought my rig to its knees!
Rig: 8C/16T @ 3.6GHz
48GB RAM
Both at 80-90% usage
What does it prefer - cores or RAM?
__________________
I LOVE CHING LIU
SMOKING IS UN-AUSTRALIAN
I prefer email to PM!

ae00711 is offline   Reply With Quote
Old 14th December 2016, 7:38 PM   #14
wwwww
Member
 
wwwww's Avatar
 
Join Date: Aug 2005
Location: Melbourne
Posts: 4,141
Default

Quote:
Originally Posted by demiurge3141 View Post
Pure checksum programs like MD5/SHA1 unfortunately won't do recovery.
You can recover single-bit errors from the damaged file plus a checksum with ease. Just cycle through each bit, flip it, run the checksum again and look for a match. This shouldn't take more than a day on a reasonably fast PC using a multi-threaded method for comparing.

Considering bit error rates are something like 1 in 10^14, it's only around a one-in-a-million chance that a 10MB file which already has an error will pick up a second one.

Even if you have two bit errors they can still be found with the same method, but you're unfortunately limited to files under 100MB - anything bigger and the time taken to complete the task will exceed the expected lifetime of the sun at present processor speeds.
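
For what it's worth, the single-bit version is only a few lines (a sketch using MD5 via hashlib; single-threaded, so slower than the multi-threaded estimate above):

Code:
# Brute-force a single-bit error: flip each bit in turn and re-hash until
# the stored checksum matches again.
import hashlib

def fix_single_bit_error(data, good_md5):
    buf = bytearray(data)
    for i in range(len(buf)):
        for bit in range(8):
            buf[i] ^= 1 << bit                        # flip one bit
            if hashlib.md5(buf).hexdigest() == good_md5:
                return bytes(buf)                     # found and fixed it
            buf[i] ^= 1 << bit                        # flip it back
    return None                                       # more than one bit is bad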
__________________
wPrime 2.10 | Super PI 1.9

Quote:
Originally Posted by NSanity View Post
This is literally the worst advice on the internet, ever.

Last edited by wwwww; 14th December 2016 at 7:41 PM.
wwwww is offline   Reply With Quote
Old 14th December 2016, 7:46 PM   #15
elvis
Old school old fool
 
elvis's Avatar
 
Join Date: Jun 2001
Location: Brisbane
Posts: 28,509
Default

Quote:
Originally Posted by ae00711 View Post
f**k me! It's brought my rig to its knees!
Rig: 8C/16T @ 3.6GHz
48GB RAM
Both at 80-90% usage
What does it prefer - cores or RAM?
Reed–Solomon error correction is all CPU calculation, and the data blocks are quite small. This is similar to how RAID5/6 works, which is often done on controllers with very small amounts of RAM. Likewise QR codes use it, and they need to be processed on embedded systems with very little RAM.

If you're looking to speed it up, GPU might be an option - reading the MultiPar website, it says it can do the calculation on the GPU.

But if you're doing it multi-threaded on lots of data, then yeah, it's going to consume all of your CPU resources to finish as quickly as it can. Use your standard process priority management tools to lower its priority to "idle" if you plan to create PAR files for terabytes of data while using your system for other things.
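
On Windows that's a one-liner from a script too (a sketch - needs Python 3.7+, and the par2 command line is just an example):

Code:
# Launch the PAR2 job at idle priority so it only uses otherwise-idle CPU time.
import subprocess

subprocess.run(["par2", "create", "-r10", "recovery.par2", "IMG_0001.jpg"],
               creationflags=subprocess.IDLE_PRIORITY_CLASS)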

Quote:
Originally Posted by wwwww View Post
You can recover single-bit errors from the damaged file plus a checksum with ease. Just cycle through each bit, flip it, run the checksum again and look for a match. This shouldn't take more than a day on a reasonably fast PC using a multi-threaded method for comparing.
You can, but that's terribly impractical compared to things like PAR that are designed to (a) recover a lot more than just a bit flip, and (b) have simple tools provided for multiple operating systems to automate the process.
__________________
Play old games with me!
elvis is offline   Reply With Quote