A parity/ecc program for photo storage?

Discussion in 'Storage & Backup' started by demiurge3141, Nov 8, 2016.

  1. demiurge3141

    demiurge3141 Member

    Joined:
    Aug 19, 2005
    Messages:
    882
    Location:
    Melbourne 3073
    I have about 2TB of personal photos/videos. I keep a copy on a NAS with mirrored disks, and a copy on a 2.5" external USB drive. Yesterday I ran FreeFileSync to check file contents; it took 10 hours, but at the end I found that bitrot has hit three of the files on the external drive (it's very easy to see once you open the image file). Fortunately I was able to recover them using my NAS copies.

    Now this has me thinking: is there a parity program that automatically generates a parity/recovery file for each file on a drive that lacks one, and checks each file that has one against its parity file? Even a command-line program would work, as I can write a batch file. Pure checksum programs like MD5/SHA1 unfortunately won't do recovery.

    QuickPar/par2 would generate parity files for the entire collection at once, but I don't want to regenerate the whole parity set every time I add some photos. Plus, if every file has its own parity/recovery file, it makes moving files around much easier.
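
    Roughly, what I have in mind is something like this - just a Python sketch wrapping the par2 command-line tool (par2cmdline), assuming it's installed; the drive path and extensions are placeholders:

    # Sketch only: create a .par2 for each photo that lacks one, and verify each
    # photo that already has one. Assumes the "par2" tool is on the PATH.
    import subprocess
    from pathlib import Path

    PHOTO_ROOT = Path("E:/photos")                      # placeholder drive/path
    EXTENSIONS = {".jpg", ".nef", ".cr2", ".mp4"}       # placeholder extensions

    for photo in PHOTO_ROOT.rglob("*"):
        if photo.suffix.lower() not in EXTENSIONS:
            continue
        par2_file = photo.with_name(photo.name + ".par2")
        if not par2_file.exists():
            # generate ~10% recovery data for this single file
            subprocess.run(["par2", "create", "-r10", par2_file.name, photo.name],
                           cwd=photo.parent, check=True)
        else:
            result = subprocess.run(["par2", "verify", par2_file.name],
                                    cwd=photo.parent, capture_output=True)
            if result.returncode != 0:
                print(f"DAMAGED: {photo} - try 'par2 repair {par2_file}'")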

    Now before you go all ZFS on me, remember this is a removable drive, and I need to be able to connect it to a PC and download images from a camera.
     
  2. HobartTas

    HobartTas Member

    Joined:
    Jun 22, 2006
    Messages:
    556
    Sadly, ZFS is the only thing that manages bitrot and recovery automatically. To partially answer your question, assuming you have the data on your PC (and one or more backup copies): what I used to do was WinRAR everything up with zero compression, because most of the time the stuff wasn't compressible anyway, but the advantage was that each file was checksummed. You couldn't repair a damaged file, but at least you could detect it easily and replace it with a good copy. I don't know of any, but there may be other software around that does this sort of thing.

    Unfortunately, the filesystem you are using (NTFS) typically doesn't notify you of bad blocks and, more importantly, just replaces them with blank data, which gives you the problems you are currently experiencing. You need to build in redundancy of some sort to repair the data: either on a per-file basis, like the PAR2 system you mentioned, or for the entire filesystem with disk mirrors or RAID 5/6.

    The only way I could conceive of doing this on a single external would be to run a VM on your PC with a ZFS-capable OS (Solaris, FreeBSD, or Linux with ZoL) and format the external with ZFS. You then set the copies=n property to something higher than the default of 1: setting it to 2 means two copies of your data are written, and since both are independently checksummed, an invalid block can be repaired from the good copy. However, you incur substantial overhead - a 6TB drive would only hold 3TB worth of data (2TB with copies=3, and so on). But if you need this done in a transparent and hassle-free manner, without having to run programs all the time to check parity, then such an arrangement might be useful for you.
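
    As a rough illustration only (the pool name and device node are placeholders, and it assumes the zpool/zfs tools are available inside such a VM):

    # Sketch: format the external as a single-disk ZFS pool and keep two
    # independently checksummed copies of every block, so a scrub can repair a
    # bad block from the surviving copy. Pool/device names are placeholders.
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["zpool", "create", "photos", "/dev/sdX"])   # single external drive
    run(["zfs", "set", "copies=2", "photos"])        # 2 copies = half the usable space
    run(["zpool", "scrub", "photos"])                # run periodically to detect/repair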
     
  3. demiurge3141

    demiurge3141 Member

    Joined:
    Aug 19, 2005
    Messages:
    882
    Location:
    Melbourne 3073
    I don't need the file system to do checksums in the background. The disk is seldom connected. I can see the files are correctly transferred when I quickly scroll through the new thumbnails. All I need is a slightly more sophisticated checksum algorithm that can also correct single bit errors. I'm sure something like that exists?
     
  4. AManEatingDuck

    AManEatingDuck Member

    Joined:
    Feb 7, 2002
    Messages:
    284
    Location:
    Auckland, NZ
    Could something like SnapRaid work?
     
  5. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    16,190
    Location:
    Canberra
    ReFS will do it (although I can't quite find how it goes about working with a single drive - I'm not sure if you get more than just checksumming in/out).

    https://technet.microsoft.com/en-us/library/hh831724(v=ws.11).aspx?f=255&MSPPError=-2147217396

    Reading this, it seems that you get *some* protection/correction, but if a file gets binned you can't just pick up an alternate copy stream to carry on.


    With ReFS, forget OS X or Linux support. So long as you stay within Windows 8/10, you'll be fine. All this said, you're not replacing a backup at all; you're simply ensuring that your backup is correct.
     
  6. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    29,975
    Location:
    Brisbane
    Incorrect. There are many filesystems that do:
    http://forums.overclockers.com.au/showthread.php?t=1195906

    At the application level, I use PAR2 via the "parchive" application to do Reed–Solomon / erasure code error correction on files (which I personally store on top of a BtrFS Linux filesystem):
    https://en.wikipedia.org/wiki/Parchive

    If files get lost, renamed, deleted or whatever, I can rebuild them with a given percentage level of parity (which means extra disk space, but that's fine). I store parchive data alongside anything I back up to optical media or to cloud backups as well.

    [edit] Actually read the OP - you already use PAR2. Yeah, it's a pain in the bum to regenerate files all the time, but you're out of options without it being done at the filesystem level.

    I tend to store photos by date inside folders named in ISO 8601 format (YYYY-MM-DD), so I don't often go back and add photos to an old date, which means I don't need to do a whole lot of PAR2 updates. Instead I generate the PAR2 data for the previous day's folder in one hit, and that way the per-folder PAR2 information stays relevant. That's a workflow workaround for the technology limit, and may or may not work for you if your workflow is different.

    But given we're talking long-term storage/archive here, maybe you need to consider not adding new photos to old folders, and instead creating new folders every time, to reduce PAR2 overwrites of existing data.
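
    If it helps, the gist of that workflow in script form is something like this - a rough Python sketch rather than my actual setup, assuming par2cmdline and per-day YYYY-MM-DD folders (the archive path is a placeholder):

    # Sketch: walk the date folders and create one PAR2 set for any folder that
    # doesn't have one yet; already-protected folders are left untouched.
    # Assumes the "par2" command-line tool; the archive path is a placeholder.
    import subprocess
    from pathlib import Path

    ARCHIVE = Path("/data/photos")                   # placeholder

    for folder in sorted(p for p in ARCHIVE.iterdir() if p.is_dir()):
        par2_index = folder / (folder.name + ".par2")
        if par2_index.exists():
            continue                                 # old folder, already protected
        files = [f.name for f in folder.iterdir()
                 if f.is_file() and not f.name.endswith(".par2")]
        if files:
            subprocess.run(["par2", "create", "-r10", par2_index.name] + files,
                           cwd=folder, check=True)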
     
    Last edited: Nov 16, 2016
  7. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    16,190
    Location:
    Canberra
    A+ same boat.

    I would be converting my NAS to BTRFS.

    And probably buying a second one, and periodically mirroring them.

    Failing that (and honestly this is probably the better option), look to a cloud backup provider that has versioning/checksumming behind their infrastructure and integrates with your NAS, and give them $. Backblaze uses Reed-Solomon; not sure about others.
     
    Last edited: Nov 16, 2016
  8. cvidler

    cvidler Member

    Joined:
    Jun 29, 2001
    Messages:
    10,656
    Location:
    Canberra
    Still think PAR2 is the solution here.

    You don't have to do an entire drive at a time; you could do it by folder or by file (and easily script it).
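
    A verification pass is just as easy to script - something like this rough Python sketch (assumes the par2 command-line tool; the root path is a placeholder):

    # Sketch: verify every existing PAR2 set under the archive and log the result,
    # so a scheduled run tells you which folders need a "par2 repair".
    import subprocess
    from pathlib import Path

    ARCHIVE = Path("E:/photos")                      # placeholder

    for par2_index in sorted(ARCHIVE.rglob("*.par2")):
        if ".vol" in par2_index.name:
            continue                                 # skip recovery volumes, check index files only
        result = subprocess.run(["par2", "verify", par2_index.name],
                                cwd=par2_index.parent, capture_output=True)
        status = "OK" if result.returncode == 0 else "NEEDS REPAIR"
        print(f"{status}: {par2_index}")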

    e.g. my photo archive is organised into per-day folders for raw images; once done, they're never going to change, so a PAR2 set over a whole folder of files is a workable solution. Finished post-processed photos of an event or whatever also end up in a folder that's not likely to change. Videos too: GoPro footage organised by day, post-processed work by event.


    Someone above mentioned RAR. RAR supports recovery records too. The downside is it ties your files up in an archive, which PAR doesn't have to. That may not be a problem if your organisation keeps your folders (and thus archives) small/manageable/stagnant enough.
     
  9. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    29,975
    Location:
    Brisbane
    And in case anyone doesn't read the "Next gen filesystems" thread I linked to above, DO NOT USE BTRFS IN RAID5/6. It's unstable, and eats your data.

    BtrFS in RAID1 or RAID10 is stable. I recommend RAID1 for long-term storage. Despite the name, it's not actually traditional RAID1; it merely insists your data lives on two physically separate devices. Under this model you can mix and match any weird combination of drives (I have 5 disks of all different sizes in my home array), and you get access to N/2 of the space, as every block is written to two different devices.

    My home BtrFS RAID1 array has suffered two total disk failures at different times in the last year, both of which were replaced without data loss with larger disks, which also let my array grow in space each time, keeping the N/2 space ratio.

    More details in the thread I linked above.

    This is what I do. Easy to script and log total drive verification too.
     
  10. flain

    flain Member

    Joined:
    Oct 5, 2005
    Messages:
    1,980
    I'll just add that MultiPar (https://multipar.eu/) is an option; consider it a more up-to-date QuickPar (since QuickPar is no longer maintained).

    It's got a few nifty features, namely multicore support and GPU acceleration, making larger PAR sets a lot quicker to produce.
     
  11. ae00711

    ae00711 Member

    Joined:
    Apr 9, 2013
    Messages:
    1,069
    this looks interesting; how do I use it?
     
  12. flain

    flain Member

    Joined:
    Oct 5, 2005
    Messages:
    1,980
    Much the same as QuickPar. You simply load the GUI up, choose the directory you want to create PARs for, set the amount of redundancy you want, and hit create. It leaves .par2 files in the original folder. From that point on you can check the integrity and repair files as long as the data loss is less than the redundancy amount.

    It's pretty handy: if you have, say, 500MB of photos at 10MB each and you create 10MB of recovery data, you could lose any single photo from the collection and it would be recoverable from the .par2 files, like magic.
     
  13. ae00711

    ae00711 Member

    Joined:
    Apr 9, 2013
    Messages:
    1,069
    f**k me! it's brought my rig to its knees! :upset:
    rig: 8C/16T @ 3.6
    48GB ram
    both usage @ 80-90%
    what does it prefer? cores or ram?
     
  14. wwwww

    wwwww Member

    Joined:
    Aug 22, 2005
    Messages:
    4,302
    Location:
    Melbourne
    You can recover single-bit errors from the damaged file + a checksum with ease. Just cycle through each bit, flip it, run the checksum again and look for a match. This shouldn't take more than a day to correct on a reasonably fast PC using a multi-threaded method for comparing.
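
    In code the idea is roughly this - a sketch only, assuming you kept an MD5 (or similar) of the good file; the filenames are made up:

    # Sketch of the brute-force idea: flip each bit in turn, re-hash, and stop
    # when the stored checksum matches again. Slow (one full hash per candidate
    # bit), but it works for a genuine single-bit flip.
    import hashlib

    def recover_single_bit_flip(data, good_md5):
        buf = bytearray(data)
        for i in range(len(buf)):
            for bit in range(8):
                buf[i] ^= 1 << bit                   # flip one bit
                if hashlib.md5(buf).hexdigest() == good_md5:
                    return bytes(buf)                # found the flipped bit
                buf[i] ^= 1 << bit                   # flip it back, keep searching
        return None                                  # more than one bit is wrong

    # usage (made-up names):
    # fixed = recover_single_bit_flip(open("IMG_0001.jpg", "rb").read(), stored_md5)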

    Considering bit errors are like 1 in 10^14 or so, it's like a 1 in 10,000,000 chance that any 10MB file which already has an error will have another bit error.

    Even if you have two bit errors, they can still be solved using the same method, but you're unfortunately limited to files under 100MB: anything bigger and the time taken to complete the task will exceed the expected lifetime of the sun at present processor speeds.
     
    Last edited: Dec 14, 2016
  15. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    29,975
    Location:
    Brisbane
    Reed-Solomon error correction is all CPU calc. Data blocks are quite small. This is similar to how RAID5/6 works, which is often done on controllers with very small amounts of RAM. Likewise QR codes use it, and they need to be processed on embedded systems with very little RAM.

    If you're looking to speed it up, GPU might be an option. Just reading the MultiPar website, it says it can do the calculations on GPU, so that's worth a try.

    But if you're doing it multi threaded on lots of data, then yeah, it's going to consume all of your CPU resources to do this as quickly as it can. Use your standard process priority management tools to lower the priority of this to "idle" if you plan to create PAR files for terabytes of data while you use your system for other things.

    You can, but that's terribly impractical compared to things like PAR that are designed to (a) recover a lot more than just a bit flip, and (b) have simple tools provided for multiple operating systems to automate the process.
     
  16. ae00711

    ae00711 Member

    Joined:
    Apr 9, 2013
    Messages:
    1,069
    shortly after making that post I fired up AIDA64 to check CPU temps........BSOD. First BSOD I've had in aaaaaaages. Not happy.

    Any idea which GPU it needs? (NV/AMD?)
     
  17. ae00711

    ae00711 Member

    Joined:
    Apr 9, 2013
    Messages:
    1,069

    to answer my own question:

    AMD definitely works: a) I can enable GPU use in the app, and b) running the app with GPU-Z open shows GPU load @ 99% :paranoid:
     
  18. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    29,975
    Location:
    Brisbane
    I'm not sure why you're reacting the way you are to something that is calculation bound using up all of your resources.

    This would be the same if you were encoding video, 3D rendering, or calculating some large/infinite mathematical series (Pi calcs, etc). All of those would use 100% of your resources for hours on end too.

    Fairly standard stuff.
     
  19. demiurge3141

    demiurge3141 Member

    Joined:
    Aug 19, 2005
    Messages:
    882
    Location:
    Melbourne 3073
    That's what you want, right?
     
  20. ae00711

    ae00711 Member

    Joined:
    Apr 9, 2013
    Messages:
    1,069
    didn't know anything about the app! :p

    ya! :thumbup:
     
