
Large volumes of personal data for backup

Discussion in 'Storage & Backup' started by akashra, Sep 13, 2017.

  1. akashra

    akashra Member

    Joined:
    Apr 25, 2003
    Messages:
    3,988
    Location:
    Melbourne, AU
    Hi all,

    Going through the hardware my father has left behind, we're finding a LOT of data on various storage. A lot of this is timing data from motorsports (we did all the timekeeping for about 50% of all motorsports across Australia from the early 80s until the early 00s), most of which I have preserved. The other bulk of data is genealogy research - including photos and thousands of scanned documents (birth certificates, documentation of migrations, that kind of thing).

    I also have the unenviable task of potentially having to syphon off data from hundreds of 1.2MB floppies for the Commodore 64 (old timing data) - some of this has already been done though.

    Going through receipts of things he's bought, I've found he bought at least six 3TB drives in the last 4 years, which will be around somewhere, PLUS there's a Qnap 569L with 5x3TB Reds, and a HP Microserver with 4x2TB drives.

    I'm open to suggestions as to what's the best way to consolidate this data and store it - preferably cloud-based so we don't have to actually run the infrastructure, so I can then just have say a couple of 8TB archive drives powered off with the data locally. Maybe rsync to these periodically. Upstream bandwidth isn't a huge issue (quotas might be) - I have a fast link at home and an even faster link at work.

    Right now, I don't know the total volume - but I could hazard a guess there's between 5 and 10TB of data that needs consolidating and keeping short-term, and 2-5TB of data that actually needs keeping long-term.

    So... open to any idea here that keeps costs down.
     
  2. maldotcom2

    maldotcom2 Member

    Joined:
    Feb 18, 2006
    Messages:
    2,040
    Well obviously it's data that is both irreplaceable and represents a lot of work, both in gathering of initial information and you consolidating it. So you will want one onsite copy and one offsite copy at minimum. You're on the right track, whether you want to spin up a small footprint NAS for the onsite copy or just use external drives is your choice. Perhaps the question you need to ask is regarding affordable cloud storage.
     
  3. EvilGenius

    EvilGenius Member

    Joined:
    Apr 26, 2005
    Messages:
    10,934
    Location:
    elsewhere
  4. g00nster

    g00nster Member

    Joined:
    Sep 10, 2004
    Messages:
    353
    Location:
    Melbourne
    If you go that way, buy some bulk internal storage, as Backblaze requires USB portable drives be connected (at least) once every 30 days to retain data

    https://help.backblaze.com/hc/en-us/articles/217665398-Backing-up-External-Hard-Drives

     
  5. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    46,578
    Location:
    Brisbane
    How long do you want to keep this for, and what's the daily delta likely to be? Those two answers have more impact on what my suggestion would be.
     
  6. ^catalyst

    ^catalyst Member

    Joined:
    Jun 27, 2001
    Messages:
    12,013
    Location:
    melbourne
    sidebar: how are you going to deal with the floppies? real c64 or emulation etc?
     
  7. OP
    akashra

    akashra Member

    Joined:
    Apr 25, 2003
    Messages:
    3,988
    Location:
    Melbourne, AU
    Indefinitely. Family history/genealogy research may be used by someone one, maybe two, generations down the track. The motorsport results are pretty much the original record of every event in Australia for a good 20 year or so period - it's not data I want to see thrown away.

    I have a number of C64/C128 drives there. Some of them are original and unmodified. Others have been modified to allow them to be used with an IBM/PC. I've not yet figured that out, but my gut feeling is the simplest way might be to write a small application that reads the disk sector by sector and writes it to the serial port to image them that way, then write something to read the data on the PC. And I'm sure someone's already done exactly that so I won't have to write it - or there's a better way of doing it. But that'd be my fallback. MR-SCATS (Motor Racing Scoring and Timing System - the software) also has code in there to read from those PC-enabled C64/C128 drives directly, so that might be another option. And yes, I have all the source code for that already... but it's written in Clipper.
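For the "write something to read the data on the PC" half of that idea, here is a minimal sketch of how sectors arriving over a serial link could be placed into a flat disk image. It assumes the disks use the standard single-sided 1541 layout (35 tracks, 256-byte sectors, 17 to 21 sectors per track), which nothing in this thread confirms. The function names are made up for illustration, and the serial transfer itself is not shown.

```python
SECTOR_SIZE = 256  # bytes per sector on a standard 1541-format disk (assumption)


def sectors_in_track(track: int) -> int:
    """Sectors per track for the standard 35-track 1541 layout."""
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 35:
        return 17
    raise ValueError(f"track {track} is outside 1..35")


def image_offset(track: int, sector: int) -> int:
    """Byte offset of (track, sector) in a flat image stored in track order."""
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError(f"sector {sector} is not valid on track {track}")
    preceding = sum(sectors_in_track(t) for t in range(1, track))
    return (preceding + sector) * SECTOR_SIZE


# Size of a complete image: 683 sectors of 256 bytes each.
IMAGE_SIZE = sum(sectors_in_track(t) for t in range(1, 36)) * SECTOR_SIZE


def place_sector(image: bytearray, track: int, sector: int, data: bytes) -> None:
    """Copy one received sector into its slot, whatever order sectors arrive in."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("a sector must be exactly 256 bytes")
    offset = image_offset(track, sector)
    image[offset:offset + SECTOR_SIZE] = data


if __name__ == "__main__":
    image = bytearray(IMAGE_SIZE)
    place_sector(image, 18, 0, bytes([0x12]) * SECTOR_SIZE)
    print(IMAGE_SIZE, hex(image_offset(18, 0)))
```

Keying each sector by track and sector number means a failed read can be retried later without re-sending the whole disk. Disks in any other format would need a different geometry table.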

    As luck would have it one of the other things dad left behind was a shelf full of books on programming/ASM for the C64. Fortunately - again - he also left behind shelves of books on Clipper, Blinker and Funcky. I just tried to google Funcky (with some clipper keywords) and came up blank. Yikes!



    On the whole I expect that no matter what online solution I use, I'll also be buying some 8TB archive drives and sticking the data on those, unpowered. That would be the second backup.
     
    Last edited: Sep 15, 2017
  8. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    46,578
    Location:
    Brisbane
    That's the "time" part. What about the "delta" part? Are you looking to get this data stored, and that's the end of it? Or is it likely to be added to over time?

    If the answer to the time part is "forever", I recommend tape. There simply is no other mechanism as reliable for putting data on a shelf and having it there for a long time. There's nothing stopping you doing other things as well (also shove the active portion of the data in the cloud if you like), but for longevity tape wins every battle.

    Cost of entry can be annoying, but thankfully the manufacturers of the LTO standard were kind enough to ensure multi-generational support. LTO can write generation N and N-1, and read generation N-2. So an LTO6 drive can write to 6 and 5 media, and read from gen 4 media.
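As a quick illustration of the N / N-1 / N-2 rule described above, here is a small sketch. It simply encodes the rule as stated in this post. Whether a particular drive generation follows it is an assumption worth checking against the drive's datasheet, and the function name is made up for the example.

```python
def lto_compatibility(drive_gen: int) -> dict:
    """Media generations a drive can write and read under the N / N-1 / N-2 rule."""
    if drive_gen < 1:
        raise ValueError("LTO generations start at 1")
    # Writes the drive's own generation and one back; reads two back.
    writable = [g for g in (drive_gen, drive_gen - 1) if g >= 1]
    readable = [g for g in (drive_gen, drive_gen - 1, drive_gen - 2) if g >= 1]
    return {"write": writable, "read": readable}


print(lto_compatibility(6))  # {'write': [6, 5], 'read': [6, 5, 4]}
```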

    https://en.wikipedia.org/wiki/Linear_Tape-Open

    Right now, LTO5 is readily available, and fairly cost effective. Expect $600-$700 for a tape drive (you'll probably need a SAS card to go with it too), and tapes are around $50 a pop for 1.5TB of raw space, with approximately 2:1 hardware compression (I recommend compressing in a lossless format in application-specific software if you have the luxury of time).
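To put rough numbers on that, here is a back-of-envelope sketch using the figures above ($50 per 1.5TB tape, raw capacity only, no hardware compression) against the 2-5TB of long-term data mentioned earlier in the thread. The two copies and the 10% allowance for parity data are assumptions chosen for illustration, and the drive and SAS card are not included.

```python
import math


def tapes_needed(data_tb: float, tape_tb: float = 1.5, copies: int = 2,
                 parity_overhead: float = 0.10) -> int:
    """Whole tapes required, counting raw capacity only (no compression)."""
    per_copy = math.ceil(data_tb * (1 + parity_overhead) / tape_tb)
    return per_copy * copies


def media_cost(data_tb: float, tape_price: float = 50, **kwargs) -> float:
    """Total media cost at a flat price per tape."""
    return tapes_needed(data_tb, **kwargs) * tape_price


for data_tb in (2, 5):
    print(f"{data_tb}TB: {tapes_needed(data_tb)} tapes, ${media_cost(data_tb)} in media")
```

On those assumptions, 2TB comes to 4 tapes ($200) and 5TB to 8 tapes ($400), so the drive itself dominates the cost of entry.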

    In general for things you really, really care about, I also recommend PAR2 information on top, with whatever space sacrifice you can tolerate.

    https://en.wikipedia.org/wiki/Parchive

    When archiving to tape, I make at least 2 copies of everything (again, adding PAR2 information if I can), and store them somewhere that doesn't have wild temperature and humidity swings. (Preferably with the different copies in different physical sites to avoid loss from fire).

    LTO tape, under correct storage conditions, should last at least 30 years. I would factor in a plan to do a repeat restore, verify and re-archive to new media every 5-10 years at least (by which stage newer media should be available and offer more storage per tape). With any luck, by then cloud storage should also have come down in price, as should bandwidth (although I have my doubts with the NBN).
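For the "restore and verify" step, one option is a checksum manifest written at archive time and checked again after each restore. The sketch below uses only Python's standard library. The function names are made up for illustration, and unlike PAR2 it can only detect damage, not repair it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root, manifest):
    """Compare a restored tree against a manifest made at archive time."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(name for name in manifest
                          if name in current and current[name] != manifest[name]),
        "extra": sorted(set(current) - set(manifest)),
    }
```

Store the manifest alongside each copy (as JSON, say). An empty "missing" and "changed" list after a restore means every archived file came back bit-for-bit.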
     
  9. jajjj

    jajjj Member

    Joined:
    May 31, 2005
    Messages:
    491
    Location:
    Brisbane
  10. mooboyj

    mooboyj Member

    Joined:
    Sep 13, 2005
    Messages:
    1,070
    I'd get in touch with CAMS in regards to the timing data.
     
  11. power

    power Member

    Joined:
    Apr 20, 2002
    Messages:
    68,299
    Location:
    brisbane
    this too, spreading the data through the community is a good way to ensure it will be kept.

    For the genealogy, look at giving copies to the guys that run Ancestry.
     
  12. OP
    akashra

    akashra Member

    Joined:
    Apr 25, 2003
    Messages:
    3,988
    Location:
    Melbourne, AU
    Right now, just the data, no changes. This is really now a case of if *I* get hit by a bus - then it would fall to whoever it's passed on to from me to process. Stephen already made sure to get as much of the data as possible in one place so I could work on it one day in the future. Me being killed suddenly and with no warning is a very real possibility due to driver attitudes towards cyclists.

    Oh. Yes, it is.

    Yeah, right now the data is in its raw format - it wouldn't be of any use to CAMS. Someone was given tons of data for archival and records purposes - but I'm not sure what level of processing was done on that data and what he got. It might just be something like results, fastest laps - I very much doubt it would include every time line crossing, every speed trap speed, sector splits etc. This is the raw data - the direct entry data from the manual days, to the raw data from the timing hardware, and everything in between (ie, processed for the system's internal use). But that would require exporting for it to be usable by average Joe.

    I don't plan to drop what he's worked on altogether - and in the recent weeks/months/years we'd had discussions that I'll bring back what he started, when the time is right. It'll come back one day bigger and better. I've already had discussions with some in the pointy end of the motorsport community letting them know that I'm thoroughly disappointed by the lack of progress since I left the sport about 10 years ago and what I left them with in the early 00s, and plan to come back showing them what can actually be done.

    If I'm going to release the data in its entirety, I'll need to do more work on processing it so it's useful. Right now that's not in the plan.
     
