Guys, I have a problem I have been trying to work through with no luck so far. I'll preface this by saying that I only know C# as far as I have taught it to myself using online tutorials and code snippets. If it's a complex answer, please try and dumb it down for me. I would like to hash the first 1MB (1,048,576 bytes) of a file as an MD5 sum. The reason is, I have some media files that are unique in the first MB, and can be anywhere in size from 16MB to 2,873MB. I have discovered how to hash the entire contents, but this takes too long for the large files. The files' contents get appended to systematically, and I would like to identify these files uniquely. Can anyone please provide me with an example of how I would read x bytes into a variable and perform the hash on it? If possible, I would like to salt the variable with a string. Cheers for any help.
Create a FileStream and read 1MB from it into a buffer, either in one read or with a loop that reads chunks until you get to 1MB (or hit the end of the file). Close the FileStream and then do whatever you like with the buffer contents.
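A rough sketch of the chunked read described above, in case it helps. The class name `PrefixReader` and the method `ReadPrefix` are made up for illustration; only `File.OpenRead`, `Stream.Read` and `Array.Resize` are real framework calls. The loop is there because `Read` is allowed to return fewer bytes than you ask for.

```csharp
using System;
using System.IO;

static class PrefixReader
{
    // Hypothetical helper: returns up to maxBytes from the start of the file.
    public static byte[] ReadPrefix(string path, int maxBytes)
    {
        byte[] buffer = new byte[maxBytes];
        int total = 0;

        using (FileStream fs = File.OpenRead(path))
        {
            int n;
            // Keep reading until the buffer is full or Read returns 0 (end of file).
            while (total < maxBytes &&
                   (n = fs.Read(buffer, total, maxBytes - total)) > 0)
            {
                total += n;
            }
        } // the using block closes the stream here

        // If the file was shorter than maxBytes, trim the unused tail.
        if (total < maxBytes)
        {
            Array.Resize(ref buffer, total);
        }
        return buffer;
    }

    static void Main(string[] args)
    {
        // 1 << 20 = 1,048,576 bytes = 1MB
        byte[] first = ReadPrefix(args[0], 1 << 20);
        Console.WriteLine("read " + first.Length + " bytes");
    }
}
```

The returned array can then be handed to whatever hash routine you like.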
I think you have to do it in 2 stages.

Stage 1 - get the first 1MB
You can open the file as binary and read exactly 1MB of data into a string (or an array of bytes). Example:

Code:
'---read from a binary file
'sbuffer as string (or array of byte)
s1 = New FileStream("FILENAME", FileMode.Open, FileAccess.Read)
br = New BinaryReader(s1)
Dim byteRead As Byte
Dim j As Integer
Dim iLen As Long
iLen = 1Meg or (br.BaseStream.Length() - 1), whichever is smaller
For j = 0 To iLen - 1
    byteRead = br.ReadByte
    sbuffer = sbuffer + byteRead
Next
br.Close()

Stage 2 - do the MD5 on the string
Read these:
http://blog.brezovsky.net/en-text-2.html
http://www.codersource.net/csharp_sha_md5_encryption.html
seems to work for me... haven't tested more than a couple of hashes though lol

Code:
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string fileName = "c:\\somebigfile";
            // 1 << 20 = 1,048,576 bytes = 1MB
            Console.WriteLine("hash is - " + MD5Sum(fileName, "salted", 1 << 20));
        }

        public static string MD5Sum(string fileName, string salt, int byteCount)
        {
            byte[] saltBuf = Encoding.UTF8.GetBytes(salt);
            byte[] buffer = new byte[byteCount];
            int bytesRead = 0;

            using (Stream stream = File.OpenRead(fileName))
            {
                // Read may return fewer bytes than asked for, so loop until full or EOF
                int n;
                while (bytesRead < byteCount &&
                       (n = stream.Read(buffer, bytesRead, byteCount - bytesRead)) > 0)
                {
                    bytesRead += n;
                }
            }

            if (bytesRead == 0)
            {
                //Better to throw an exception but...
                return null;
            }

            using (MemoryStream ms = new MemoryStream())
            using (MD5 md5 = MD5.Create())
            {
                ms.Write(saltBuf, 0, saltBuf.Length);
                // Only write the bytes actually read, not the whole buffer
                ms.Write(buffer, 0, bytesRead);
                // ToArray returns just the written bytes; GetBuffer also returns unused capacity
                byte[] hash = md5.ComputeHash(ms.ToArray());

                StringBuilder sb = new StringBuilder();
                foreach (byte b in hash)
                {
                    sb.Append(b.ToString("x2"));
                }
                return sb.ToString();
            }
        }
    }
}
Just as an aside, IIRC, this is how Kazaa worked back in the day. It only hashed the first n bytes of a file. This was great for finding multiple sources for the one file, named differently on each system (and let's face it, collisions would be so rare, and it may have even taken the file size as a factor). This was not so great for avoiding corruption: if a few bits were flipped or whatnot more than n bytes into the file, you'd have no idea, and you'd then be sending out corrupt data to others down the track. IIRC, some anti-P2P groups eventually took advantage of this to pollute the system with junk. The moral of the story: don't use hash algorithms on part of a file if you don't trust someone, or some procedure (e.g. an unreliable transport mechanism), to get the other parts of the file right. If this is just for personal use (e.g. dupe checking) then you should be fine.
or you could just use shell commands:

Code:
dd if=filename bs=1M count=1 | md5sum

(+cygwin if you are on windows)
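If you want something in C# to check against that pipeline, here is a minimal sketch that hashes the first 1MB with no salt, feeding the hash in 64KB chunks instead of holding the whole 1MB in memory. The class name `PrefixMd5` and the chunk size are my own choices, not anything from the posts above. For a regular file of at least 1MB it should print the same hex digest as the md5sum output, but I haven't verified that on every platform.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class PrefixMd5
{
    static void Main(string[] args)
    {
        const int Limit = 1 << 20;            // 1,048,576 bytes, same as bs=1M count=1
        byte[] chunk = new byte[64 * 1024];   // arbitrary chunk size (assumption)

        using (FileStream fs = File.OpenRead(args[0]))
        using (MD5 md5 = MD5.Create())
        {
            int remaining = Limit;
            int n;
            // Feed the hash piece by piece until 1MB is consumed or the file ends.
            while (remaining > 0 &&
                   (n = fs.Read(chunk, 0, Math.Min(chunk.Length, remaining))) > 0)
            {
                md5.TransformBlock(chunk, 0, n, null, 0);
                remaining -= n;
            }
            // An empty final block finishes the hash computation.
            md5.TransformFinalBlock(chunk, 0, 0);

            string hex = BitConverter.ToString(md5.Hash)
                                     .Replace("-", "")
                                     .ToLowerInvariant();
            Console.WriteLine(hex);
        }
    }
}
```

To add a salt, one option is a `TransformBlock` call on the salt bytes before the loop, though the result will then no longer match the plain md5sum output.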
You're an expert on everything and you obviously know more than me, therefore you should know this. I'll give you a hint... it's in the for loop. Edit: Compare it to Stik79's code if that isn't a big enough hint!