Process methods

Unit: PJMD5

Applies to: ~>1.0

procedure Process(const X: TBytes; const StartIdx, Count: Cardinal);
  overload;
procedure Process(const X: TBytes; const Count: Cardinal); overload;
procedure Process(const X: TBytes); overload;
procedure Process(const Buf; const Count: Cardinal); overload;
procedure Process(const S: RawByteString); overload;
procedure Process(const S: ShortString); overload;
procedure Process(const S: WideString); overload;
procedure Process(const S: UnicodeString; const Encoding: TEncoding); overload;
procedure Process(const S: UnicodeString); overload;
procedure Process(const Stream: TStream; const Count: Int64); overload;
procedure Process(const Stream: TStream); overload;

Description

There are several different overloaded versions of the Process method. Each one adds the data passed to it via its parameters to the current MD5 hash.

The advantages of these methods over the similar Calculate methods are:

Process methods can be called more than once: for example, they can be called in a loop that adds more data to the hash on each iteration.
The methods enable data of different types to be added to the same hash.
In the case of the TStream variants, the size of the buffer used to read the stream can be changed from the default before calling the method.

The disadvantage of Process is that an instance of TPJMD5 must be created before the method can be used.
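As a sketch of the first two advantages, the following function feeds the same three bytes to two TPJMD5 instances. One gets them in a single call; the other gets them split across two calls that use different overloads (a string, then a byte array). Because MD5 is computed over the concatenation of everything passed to Process, the two digests should be identical. The function name SplitHashMatchesWhole is purely illustrative, and the sketch assumes UTF-8 is used for both string calls so that the bytes match.

```pascal
uses
  SysUtils, PJMD5;

// Illustrative function: hashes 'abc' once in a single call and once split
// across two calls using different overloads, then compares the digests.
function SplitHashMatchesWhole: Boolean;
var
  Whole, Split: TPJMD5;
  Data, Head, Tail: string;
  WholeDigest, SplitDigest: string;
begin
  Data := 'abc';
  Head := Copy(Data, 1, 1);      // 'a'
  Tail := Copy(Data, 2, MaxInt); // 'bc'
  Split := nil;
  Whole := TPJMD5.Create;
  try
    Split := TPJMD5.Create;
    // One call: all of the text, encoded as UTF-8
    Whole.Process(Data, TEncoding.UTF8);
    // Two calls: the head as a string, the tail as a byte array
    Split.Process(Head, TEncoding.UTF8);
    Split.Process(TEncoding.UTF8.GetBytes(Tail));
    // Digests are implicitly converted to strings for comparison
    WholeDigest := Whole.Digest;
    SplitDigest := Split.Digest;
    Result := WholeDigest = SplitDigest;
  finally
    Split.Free;
    Whole.Free;
  end;
end;
```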

The groups of related overloads are described below:

Byte array versions
Untyped buffer version
ANSI string version
ShortString version
WideString version
Unicode string versions
TStream versions

Byte array versions

procedure Process(const X: TBytes; const StartIdx, Count: Cardinal);
  overload;
procedure Process(const X: TBytes; const Count: Cardinal); overload;
procedure Process(const X: TBytes); overload;

These methods add bytes from a TBytes array to the current hash.

The first version adds Count bytes to the hash, starting from index StartIdx in byte array X. If there are fewer than Count bytes in the array counting from StartIdx, an EPJMD5 exception is raised. If StartIdx is beyond the end of the array, or if Count is zero, no data is processed.
The second version adds Count bytes from the beginning of byte array X to the hash. X must have at least Count elements otherwise an EPJMD5 exception is raised. If Count is zero then no data is processed.
The last version adds all the content of byte array X to the hash. If the array is empty then no data is processed.
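The following sketch wraps the first overload and relies on the documented EPJMD5 exception to detect a range that runs past the end of the array. The function name MD5OfSlice, and the choice to return an empty string when the range is invalid, are illustrative assumptions rather than part of the library.

```pascal
uses
  SysUtils, PJMD5;

// Illustrative function: returns the MD5 digest, as a string, of Count bytes
// of X starting at index StartIdx, or '' if X has fewer than Count bytes
// from StartIdx onwards (in which case Process raises EPJMD5).
function MD5OfSlice(const X: TBytes; const StartIdx, Count: Cardinal): string;
var
  MD5: TPJMD5;
begin
  MD5 := TPJMD5.Create;
  try
    try
      MD5.Process(X, StartIdx, Count);
      Result := MD5.Digest; // implicit conversion of digest to string
    except
      on EPJMD5 do
        Result := ''; // requested range ran past the end of X
    end;
  finally
    MD5.Free;
  end;
end;
```

For example, MD5OfSlice(TEncoding.ASCII.GetBytes('xabcy'), 1, 3) should return the standard MD5 digest of the three bytes 'abc'.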

Byte array example

Suppose you have read a file into a byte array and want its MD5 hash. However, to save processing time, if the array is longer than 32Kb you just take the hash of the first and last 16Kb of data from the array. Here’s a function to do that:

function MD5OfArray(const A: TBytes): TPJMD5Digest;
var
  MD5: TPJMD5;
const
  ChunkSize = 16 * 1024;
  MaxSize = 2 * ChunkSize;
begin
  MD5 := TPJMD5.Create;
  try
    if Length(A) > MaxSize then
    begin
      MD5.Process(A, ChunkSize); // 1st 16Kb
      MD5.Process(A, Length(A) - ChunkSize, ChunkSize);  // last 16Kb
    end
    else
      MD5.Process(A); // array <= 32Kb, process it all
    Result := MD5.Digest;
  finally
    MD5.Free;
  end;
end;

Untyped buffer version

procedure Process(const Buf; const Count: Cardinal); overload;

This method adds Count bytes from untyped buffer Buf to the current hash. Buf must contain at least Count bytes.
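The untyped version is handy for data that has no overload of its own, such as a dynamic array of Word. A common pitfall is passing the dynamic array variable itself: that compiles, because Buf is untyped, but it hashes the memory holding the array’s internal pointer rather than the array’s contents. The sketch below, with the illustrative name MD5OfWords, passes the first element instead. The bytes hashed are the words in the machine’s native byte order.

```pascal
uses
  SysUtils, PJMD5;

// Illustrative function: returns the MD5 digest of the in-memory bytes of a
// dynamic array of Word. Words[0] (not Words) is passed as the untyped Buf.
function MD5OfWords(const Words: TArray<Word>): string;
var
  MD5: TPJMD5;
begin
  MD5 := TPJMD5.Create;
  try
    // Words[0] does not exist for an empty array, so skip Process entirely
    if Length(Words) > 0 then
      MD5.Process(Words[0], Length(Words) * SizeOf(Word));
    Result := MD5.Digest; // implicit conversion of digest to string
  finally
    MD5.Free;
  end;
end;
```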

Untyped buffer example

Suppose you have two variables, Foo of type Byte and Bar of type Int64, and you need the MD5 hash of both of them. Here’s the code to do it:

procedure ShowFooBarHash; // illustrative name
var
  Foo: Byte;
  Bar: Int64;
  MD5: TPJMD5;
begin
  Foo := 42;
  Bar := -56;
  MD5 := TPJMD5.Create;
  try
    MD5.Process(Foo, SizeOf(Foo));
    MD5.Process(Bar, SizeOf(Bar));
    MD5.Finalize; // optional
    ShowMessage(MD5.Digest); // implicitly casts Digest to string
  finally
    MD5.Free;
  end;
end;

ANSI string version

procedure Process(const S: RawByteString); overload;

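Adds the ordinal values of all the characters in the ANSI string S to the current hash. S can have any code page: because the parameter is a RawByteString, the string’s bytes are hashed as stored, without code page conversion.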
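Because of the RawByteString parameter, any AnsiString type, such as AnsiString or UTF8String, can be passed directly. A minimal wrapper, with the illustrative name MD5OfAnsi, might look like this:

```pascal
uses
  SysUtils, PJMD5;

// Illustrative function: returns the MD5 digest of the raw bytes of any
// AnsiString type. Passing an AnsiString, UTF8String etc. to a RawByteString
// parameter involves no code page conversion, so the stored bytes are hashed.
function MD5OfAnsi(const S: RawByteString): string;
var
  MD5: TPJMD5;
begin
  MD5 := TPJMD5.Create;
  try
    MD5.Process(S);
    Result := MD5.Digest; // implicit conversion of digest to string
  finally
    MD5.Free;
  end;
end;
```

For pure ASCII text every single-byte code page stores the same bytes, so MD5OfAnsi(UTF8String('abc')) and MD5OfAnsi(AnsiString('abc')) should agree. For non-ASCII characters the stored bytes, and therefore the digests, generally differ between code pages.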

ShortString version

procedure Process(const S: ShortString); overload;

Adds the ordinal values of all the characters in the ShortString S to the current hash.
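A minimal sketch, using the illustrative name MD5OfShort. Since the description says the characters are what get hashed, the digest of the ShortString 'abc' should equal the standard MD5 digest of the three bytes 'abc'; this sketch assumes the length byte is not included.

```pascal
uses
  SysUtils, PJMD5;

// Illustrative function: returns the MD5 digest of the characters held in
// a ShortString (assumed to exclude the ShortString's length byte).
function MD5OfShort(const S: ShortString): string;
var
  MD5: TPJMD5;
begin
  MD5 := TPJMD5.Create;
  try
    MD5.Process(S);
    Result := MD5.Digest; // implicit conversion of digest to string
  finally
    MD5.Free;
  end;
end;
```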

WideString version

procedure Process(const S: WideString); overload;

Adds the ordinal values of all the WideChar characters in the WideString S to the current hash.
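Each WideChar occupies two bytes, so this overload should produce a different digest from hashing the same text as ANSI or UTF-8. The sketch below tests an assumption: that the WideString overload hashes each WideChar in its in-memory, little-endian form, so that it agrees with hashing the text through TEncoding.Unicode (UTF-16LE, no byte order mark). The function name WideMatchesUTF16LE is illustrative.

```pascal
uses
  SysUtils, PJMD5;

// Illustrative function: returns True if the WideString overload gives the
// same digest as the UTF-16LE bytes of the same text. This rests on the
// assumption that each WideChar is hashed as its 2 in-memory bytes.
function WideMatchesUTF16LE(const W: WideString): Boolean;
var
  A, B: TPJMD5;
  DigestA, DigestB: string;
begin
  B := nil;
  A := TPJMD5.Create;
  try
    B := TPJMD5.Create;
    A.Process(W);                                   // WideString overload
    B.Process(UnicodeString(W), TEncoding.Unicode); // UTF-16LE bytes
    DigestA := A.Digest;
    DigestB := B.Digest;
    Result := DigestA = DigestB;
  finally
    B.Free;
    A.Free;
  end;
end;
```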

Unicode string versions

procedure Process(const S: UnicodeString; const Encoding: TEncoding); overload;
procedure Process(const S: UnicodeString); overload;

Each of these methods adds data from a Unicode string S to the current hash. Before adding to the hash the string is converted to a sequence of bytes. The first version uses the encoding passed in the Encoding parameter to perform the conversion, while the second version uses the TEncoding.Default encoding.

Unicode string examples

Suppose you have two text files that have the same text but may have different amounts of white space or different kinds of line endings. You want the MD5 hash to depend only on the words and not the white space.

One solution is to read all the words from a file into a string list, ignoring intervening white space, and then build the MD5 hash from the words. Assuming the words are in a string list, you can get the MD5 hash using the following function:

function MD5OfStrings(const Words: TStrings): TPJMD5Digest;
var
  MD5: TPJMD5;
  Word: string;
begin
  MD5 := TPJMD5.Create;
  try
    for Word in Words do
      MD5.Process(Word);
    Result := MD5.Digest;
  finally
    MD5.Free;
  end;
end;

This code converts each word to bytes using the system default encoding, which could mean that different hashes are returned on systems running in different locales. To get round this, use a fixed encoding such as UTF-8 (or Unicode). Here’s an example using UTF-8:

function MD5OfStrings(const Words: TStrings): TPJMD5Digest;
var
  MD5: TPJMD5;
  Word: string;
begin
  MD5 := TPJMD5.Create;
  try
    for Word in Words do
      MD5.Process(Word, TEncoding.UTF8);
    Result := MD5.Digest;
  finally
    MD5.Free;
  end;
end;

TStream versions

procedure Process(const Stream: TStream; const Count: Int64); overload;
procedure Process(const Stream: TStream); overload;

Each of these methods adds bytes from the stream Stream to the current hash. The stream is read from its current position, so to read from the start of the stream, set its Position property to 0 first. Both methods modify the stream’s Position property.

The first version reads Count bytes from the stream. If Count is greater than the number of bytes available, an EPJMD5 exception is raised. The second version reads to the end of the stream, processing Stream.Size - Stream.Position bytes.
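The sketch below shows the position handling. It sets the position explicitly, then lets the second overload hash everything from there to the end of the stream, after which Position should equal Size. The function name MD5OfStreamFrom is illustrative.

```pascal
uses
  SysUtils, Classes, PJMD5;

// Illustrative function: returns the MD5 digest of Stream's content from
// StartPos to the end. Process reads from the current position and leaves
// the stream positioned at its end.
function MD5OfStreamFrom(const Stream: TStream; const StartPos: Int64): string;
var
  MD5: TPJMD5;
begin
  MD5 := TPJMD5.Create;
  try
    Stream.Position := StartPos;
    MD5.Process(Stream);  // reads Stream.Size - StartPos bytes
    Result := MD5.Digest; // implicit conversion of digest to string
  finally
    MD5.Free;
  end;
end;
```

For example, for a stream holding the bytes 'xyabc', MD5OfStreamFrom(Stream, 2) should return the digest of 'abc'.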

The stream is read into an internal buffer before adding the data to the hash. The buffer’s size is given by the ReadBufferSize property and can be changed by assigning a new value to the property.
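For example, when hashing large files you might enlarge the buffer before calling Process. In the following sketch the function name MD5OfFile is illustrative and the 64Kb buffer size is an arbitrary choice, not a recommended value.

```pascal
uses
  SysUtils, Classes, PJMD5;

// Illustrative function: returns the MD5 digest of a whole file, reading it
// through a larger-than-default buffer. A newly opened file stream is
// already positioned at 0, so the whole file is hashed.
function MD5OfFile(const FileName: string): string;
var
  MD5: TPJMD5;
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    MD5 := TPJMD5.Create;
    try
      MD5.ReadBufferSize := 64 * 1024; // must be set before calling Process
      MD5.Process(Stream);
      Result := MD5.Digest; // implicit conversion of digest to string
    finally
      MD5.Free;
    end;
  finally
    Stream.Free;
  end;
end;
```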

TStream example

Suppose you have a file containing multiple streams or “storages” and you have opened a TStream onto each storage in the file. You want to get an MD5 hash of all of them. However, some can be very large, so to save processing time you only take the hash of the first 32Kb of each stream.

The following function will do the job: it is passed an array of TStream objects and returns the MD5 digest:

function GetStreamHashes(const Streams: array of TStream): TPJMD5Digest;
var
  MD5: TPJMD5;
  Stream: TStream;
const
  MaxSize = Int64(32 * 1024);
begin
  MD5 := TPJMD5.Create;
  try
    for Stream in Streams do
    begin
      Stream.Position := 0;
      if Stream.Size > MaxSize then
        MD5.Process(Stream, MaxSize) // process first 32Kb of stream
      else
        MD5.Process(Stream); // stream <= 32Kb - process it all
    end;
    Result := MD5.Digest;
  finally
    MD5.Free;
  end;
end;
