C# - Reading first 1MB of file for MD5 hashing

Discussion in 'Programming & Software Development' started by CaveDog, Jun 2, 2008.

  1. CaveDog

    Guys, I have a problem I have been trying to work through with no luck so far.

    I'll preface this by saying that I only know as much C# as I have taught myself from online tutorials and code snippets. If it's a complex answer, please try to dumb it down for me.

    I would like to compute an MD5 sum of the first 1MB (1,073,741,824 bytes) of a file. The reason is that I have some media files that are unique within the first MB, and they range in size from 16MB to 2,873MB. I have worked out how to hash the entire contents, but this takes too long for the large files. The files' contents get appended to systematically, and I would like to identify each of these files uniquely.

    Can anyone please provide me with an example of how I would read x bytes into a variable and perform the hash on it? If possible, I would like to salt the data with a string.

    Cheers for any help.
     
  2. STIK79

    You've got a GB there (2^30 bytes). 1 MB = 2^20 = 1,048,576 bytes.
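To make the arithmetic concrete, here is a tiny C# sketch of the two sizes as powers of two. The `SizeConstants` class and its constant names are just illustrative, not from any library:

```csharp
using System;

static class SizeConstants
{
    // 1 MB (strictly, 1 MiB) = 2^20 bytes: what the original question wants
    public const int OneMB = 1 << 20;

    // 1 GB (strictly, 1 GiB) = 2^30 bytes: the figure quoted in the question
    public const int OneGB = 1 << 30;

    static void Main()
    {
        Console.WriteLine(OneMB);          // prints 1048576
        Console.WriteLine(OneGB);          // prints 1073741824
        Console.WriteLine(OneGB / OneMB);  // prints 1024
    }
}
```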
     
  3. jezza323

    Create a FileStream and read 1 MB from it into a buffer, either with a single Read call or with a for loop that reads chunks until you have 1 MB, then close the FileStream.

    Then do whatever you like with the buffer contents.
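A minimal C# sketch of the chunked read described above, using a hypothetical helper named `ReadPrefix`. The loop matters because `Stream.Read` is allowed to return fewer bytes than requested, so a single call is not guaranteed to fill the buffer:

```csharp
using System;
using System.IO;

static class PrefixReader
{
    // Reads up to byteCount bytes from the start of the file.
    // Keeps calling Read until the buffer is full or Read returns 0 (end of file).
    public static byte[] ReadPrefix(string path, int byteCount)
    {
        byte[] buffer = new byte[byteCount];
        int total = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            while (total < byteCount)
            {
                int n = fs.Read(buffer, total, byteCount - total);
                if (n == 0) break; // end of file reached before byteCount bytes
                total += n;
            }
        } // the stream is closed here

        // Trim the buffer if the file was shorter than byteCount
        if (total < byteCount)
            Array.Resize(ref buffer, total);
        return buffer;
    }

    static void Main(string[] args)
    {
        byte[] first = ReadPrefix(args[0], 1 << 20); // 1 MB = 2^20 bytes
        Console.WriteLine("Read " + first.Length + " bytes");
    }
}
```

Run it with the file path as the first command-line argument. If the file is shorter than the requested count, the returned array only contains the bytes actually read.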
     
  4. z2177199

    I think you have to do it in 2 stages.

    Stage 1 - get the first 1 MB
    You can open the file as binary and read exactly 1 MB of data (or the whole file, if it is smaller) into a byte array.

    example
    '---read the first 1 MB of a binary file into a byte array
    '   (needs Imports System.IO at the top of the file)
    Dim s1 As New FileStream("FILENAME", FileMode.Open, FileAccess.Read)
    Dim br As New BinaryReader(s1)

    ' read 1 MB or the whole file, whichever is smaller
    Dim iLen As Integer = CInt(Math.Min(1048576L, br.BaseStream.Length))
    Dim buffer(iLen - 1) As Byte
    Dim j As Integer
    For j = 0 To iLen - 1
        buffer(j) = br.ReadByte()
    Next
    br.Close() ' also closes s1
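For comparison, stage 1 could be sketched in C# with `BinaryReader.ReadBytes`, which keeps reading until it has the requested count or reaches the end of the stream, so there is no need to call `ReadByte` once per byte. The `Stage1` class and `ReadFirstMB` method names are made up for this example:

```csharp
using System;
using System.IO;

static class Stage1
{
    // Returns the first 1 MB (2^20 bytes) of the file, or the whole file if it is smaller.
    public static byte[] ReadFirstMB(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (BinaryReader br = new BinaryReader(fs))
        {
            // ReadBytes returns fewer than count bytes only if the stream ends first
            return br.ReadBytes(1 << 20);
        }
    }

    static void Main(string[] args)
    {
        Console.WriteLine(ReadFirstMB(args[0]).Length + " bytes read");
    }
}
```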


    Stage 2 - compute the MD5 of the data
    Read these:
    http://blog.brezovsky.net/en-text-2.html
    http://www.codersource.net/csharp_sha_md5_encryption.html
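A possible C# sketch of stage 2, working on the byte array from stage 1 rather than on a string (pushing arbitrary binary data through a text encoding can alter it). `Stage2` and `Md5Hex` are illustrative names; `MD5.Create` and `ComputeHash` are the standard `System.Security.Cryptography` calls:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class Stage2
{
    // Computes the MD5 of a byte array and formats it as 32 lowercase hex digits.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(data);
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2")); // "x2" already gives lowercase hex
            return sb.ToString();
        }
    }

    static void Main()
    {
        // Standard MD5 test vector for "abc"
        Console.WriteLine(Md5Hex(Encoding.ASCII.GetBytes("abc")));
        // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

The expected output is the well-known MD5 test vector for "abc" from RFC 1321, which makes a handy sanity check.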
     
  5. STIK79

    seems to work for me... haven't tested more than a couple of hashes though lol

    Code:
    using System;
    using System.IO;
    using System.Security.Cryptography;
    using System.Text;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string fileName = "c:\\somebigfile";
                // 1 MB = 2^20 bytes
                Console.WriteLine("hash is - " + MD5Sum(fileName, "salted", 1 << 20));
            }


            public static string MD5Sum(string fileName, string salt, int byteCount)
            {
                byte[] saltBuf = Encoding.UTF8.GetBytes(salt);
                byte[] buffer = new byte[byteCount];
                int bytesRead = 0;

                using (Stream stream = File.OpenRead(fileName))
                {
                    // Read() may return fewer bytes than requested, so keep
                    // reading until the buffer is full or the file ends
                    while (bytesRead < byteCount)
                    {
                        int n = stream.Read(buffer, bytesRead, byteCount - bytesRead);
                        if (n == 0) break;
                        bytesRead += n;
                    }
                }

                if (bytesRead == 0)
                {
                    //Better to throw an exception but...
                    return null;
                }

                using (MemoryStream ms = new MemoryStream())
                using (MD5 md5 = MD5.Create())
                {
                    // hash the salt followed by only the bytes actually read
                    ms.Write(saltBuf, 0, saltBuf.Length);
                    ms.Write(buffer, 0, bytesRead);
                    // ToArray() copies just the written bytes; GetBuffer() can
                    // include unused capacity, which would change the hash
                    byte[] hash = md5.ComputeHash(ms.ToArray());
                    StringBuilder sb = new StringBuilder();
                    foreach (byte b in hash)
                    {
                        sb.Append(b.ToString("x2"));
                    }
                    return sb.ToString();
                }
            }
        }
    }
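One possible variation on the code above: instead of copying the salt and the data into a `MemoryStream`, feed them to MD5 incrementally with `TransformBlock`/`TransformFinalBlock`, reading the file in chunks. This is only a sketch; the 64 KB chunk size and the `SaltedPrefixHash`/`Md5OfPrefix` names are arbitrary choices for illustration. The result should match hashing salt + the first `byteCount` bytes in one go:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class SaltedPrefixHash
{
    // MD5 of (UTF-8 salt + first byteCount bytes of the file), computed chunk by chunk.
    public static string Md5OfPrefix(string fileName, string salt, int byteCount)
    {
        byte[] saltBuf = Encoding.UTF8.GetBytes(salt);
        byte[] chunk = new byte[64 * 1024];
        using (MD5 md5 = MD5.Create())
        using (FileStream fs = File.OpenRead(fileName))
        {
            // Feed the salt first; outputBuffer may be null when the copy is not needed
            md5.TransformBlock(saltBuf, 0, saltBuf.Length, null, 0);

            int remaining = byteCount;
            while (remaining > 0)
            {
                int n = fs.Read(chunk, 0, Math.Min(chunk.Length, remaining));
                if (n == 0) break; // end of file
                md5.TransformBlock(chunk, 0, n, null, 0);
                remaining -= n;
            }

            // Finish the hash; md5.Hash is only valid after TransformFinalBlock
            md5.TransformFinalBlock(new byte[0], 0, 0);
            StringBuilder sb = new StringBuilder();
            foreach (byte b in md5.Hash)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    static void Main(string[] args)
    {
        Console.WriteLine("hash is - " + Md5OfPrefix(args[0], "salted", 1 << 20));
    }
}
```

Unlike the version above, this returns the hash of the salt alone for an empty file instead of `null`; add a length check if that distinction matters.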
    
     
  6. cybathug

    Just as an aside, IIRC, this is how Kazaa worked back in the day: it only hashed the first n bytes of a file. This was great for finding multiple sources for the one file, named differently on each system (and let's face it, collisions would be so rare, and it may even have taken the file size into account). It was not so great for avoiding corruption: if a few bits were flipped or whatnot more than n bytes into the file, you'd have no idea, and you'd then be sending out corrupt data to others down the track.

    IIRC, some anti-P2P groups eventually took advantage of this to pollute the system with junk.

    The moral of the story: don't use hash algorithms on part of a file if you don't trust someone, or some procedure (e.g. an unreliable transport mechanism), to get the other parts of the file right. If this is just for personal use (e.g. dupe checking), then you should be fine.
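For the dupe-checking case, here is a hedged C# sketch of the "file size as a factor" idea: a key built from the file length plus the MD5 of the first n bytes. This is not a claim about how Kazaa actually worked, and `DupeKey.Make` is a hypothetical name. Note that for files that keep getting appended to (as in the original question), including the length gives a different key after every append, so this only suits files that don't change:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class DupeKey
{
    // Key = "<file length>:<md5 of first prefixBytes bytes>".
    // Files with different lengths never share a key, even if their first MB matches.
    // Files of equal length that differ only after the prefix still collide,
    // which is the weakness described above.
    public static string Make(string path, int prefixBytes)
    {
        using (FileStream fs = File.OpenRead(path))
        using (BinaryReader br = new BinaryReader(fs))
        using (MD5 md5 = MD5.Create())
        {
            long length = fs.Length;
            byte[] prefix = br.ReadBytes(prefixBytes);
            string hex = BitConverter.ToString(md5.ComputeHash(prefix))
                                     .Replace("-", "")
                                     .ToLowerInvariant();
            return length + ":" + hex;
        }
    }

    static void Main(string[] args)
    {
        foreach (string p in args)
            Console.WriteLine(Make(p, 1 << 20) + "  " + p);
    }
}
```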
     
  7. CaveDog (OP)

    Wow, thanks! That's great! I'll implement your code tonight, and away I go!

    Thanks all for your help.
     
  8. STIK79

    no warranties/blah/blah lol
     
  9. Elyzion

    Ok um....

    WTF
     
  10. dalek

    or you could just use shell commands:

    Code:
    dd if=filename bs=1M count=1 | md5sum
    
    (plus Cygwin if you are on Windows)
     
  11. Oblong Cheese

    There is always someone who has to post an irrelevant solution. :D:thumbup:
     
  12. Luke212

    pretty damned elegant tho i have no idea how that works ;)
     
  13. Foliage

    What's wrong with that?
     
  14. vanjastar

    It was VB code in a C# thread. Pretty odd.
     
  15. Foliage

    Not that hard to translate though.
     
  16. Luke212

    was thinking the same thing :p
     
  17. Bradzac

    You're an expert on everything and you obviously know more than me, therefore you should know this.

    I'll give you a hint... it's in the for loop.

    Edit: Compare it to Stik79's code if that isn't a big enough hint!
     
    Last edited: Jun 6, 2008
  18. Bradzac

    But is there really such a thing as an "Irrelevant solution" ? ;)

    I declare oxymoron on that! :p
     
