MD5 How-to: How To Get the MD5 Hash of a String List

Applies to: ~>1.0

This how-to involves getting the MD5 hash of Unicode strings, which is explained in a separate how-to.

It also assumes you understand the difference between the TPJMD5.Calculate and TPJMD5.Process methods, which is also explained elsewhere in this documentation.

There are two possible ways to create a hash of a TStringList object:

1. Convert the string list to text and take the hash of the result.
2. Take each string from the list and add it to the hash, one at a time.

The two approaches give different hashes for the same list, so you need to decide on one approach and stick to it if you want repeatable results.

To see the differences, start a new Delphi VCL forms application and drop two edit controls on the form. Create the following FormCreate event handler:

procedure TForm1.FormCreate(Sender: TObject);
var
  D: TPJMD5Digest;
  MD5: TPJMD5;
  Strings: TStrings;
  S: string;
begin
  Strings := TStringList.Create;
  try
    Strings.Add('The');
    Strings.Add('cat');
    Strings.Add('sat');
    Strings.Add('on');
    Strings.Add('the');
    Strings.Add('mat');

    // 1st approach: hash the whole list as text, with a fixed CRLF line break
    Strings.LineBreak := #13#10;
    D := TPJMD5.Calculate(Strings.Text, TEncoding.UTF8);
    Edit1.Text := D;

    // 2nd approach: add each string in turn to the same hash
    MD5 := TPJMD5.Create;
    try
      for S in Strings do
        MD5.Process(S, TEncoding.UTF8);
      Edit2.Text := MD5.Digest;
    finally
      MD5.Free;
    end;
  finally
    // Free the string list even if hashing raises an exception
    Strings.Free;
  end;
end;

The reason a TPJMD5Digest record can be assigned directly to the string-typed Text property of the edit controls in the above code is explained in the How To Get a Digest String how-to.

Running this program displays the following values in the edit controls:

c8b029b7698b23a5962e7cc21a75653a (MD5 of Strings.Text)
780c94281a0b1e10395098c690a91d26 (MD5 of each string in Strings)

The first approach converts the string list to text, with the lines separated by the string stored in the TStrings.LineBreak property. It then uses one of the Unicode overloads of TPJMD5.Calculate to get the required digest. The line break characters are therefore part of the data that is hashed.

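The second approach adds each string from the string list to the same hash in turn, using one of the Unicode overloads of TPJMD5.Process. Nothing is added between the strings, so the result is the same as the hash of all the strings joined together: for example, a list containing “ca” and “t” gives the same hash as one containing “c” and “at”.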

Each approach has advantages and disadvantages:

- The second approach gives the same MD5 hash if you insert empty lines into the string list. This is because adding an empty string to a hash makes no difference to it. (Try adding one or more Strings.Add(''); statements to the above code to check this.) The first approach gives a different hash because of the extra line breaks included in the text (provided that the TStrings.LineBreak property is not the empty string).
- With the first approach, changing the TStrings.LineBreak property changes the hash of the same string list. Therefore, you must be careful to ensure that the line break is always the same. (Try changing the line Strings.LineBreak := #13#10; to Strings.LineBreak := #10; in the above code to confirm this.)
- The first approach introduces additional data (the line breaks) into the hashed data, so the hash does not relate only to the list contents.
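The two experiments suggested in the list above can also be run without a form. The following is a minimal console sketch, assuming Delphi 2009 or later (for TEncoding) and that TPJMD5 and TPJMD5Digest are declared in a unit named PJMD5; adjust the uses clause if your copy of the library differs. The program name and the HashOfText and HashOfItems functions are hypothetical names used only for illustration. The program reports whether inserting an empty line, or switching from CRLF to LF line breaks, changes the hash produced by each approach.

```pascal
program StringListHashExperiment; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes,
  PJMD5; // assumed name of the unit that declares TPJMD5

// 1st approach (hypothetical helper): MD5 of the list's Text using LineBreak
function HashOfText(Strings: TStrings; const LineBreak: string): string;
begin
  Strings.LineBreak := LineBreak;
  Result := TPJMD5.Calculate(Strings.Text, TEncoding.UTF8);
end;

// 2nd approach (hypothetical helper): MD5 built up one string at a time
function HashOfItems(Strings: TStrings): string;
var
  MD5: TPJMD5;
  S: string;
begin
  MD5 := TPJMD5.Create;
  try
    for S in Strings do
      MD5.Process(S, TEncoding.UTF8);
    Result := MD5.Digest;
  finally
    MD5.Free;
  end;
end;

var
  Strings: TStrings;
  TextHash, ItemsHash: string;
begin
  Strings := TStringList.Create;
  try
    Strings.CommaText := 'The,cat,sat,on,the,mat';
    TextHash := HashOfText(Strings, #13#10);
    ItemsHash := HashOfItems(Strings);

    // Experiment 1: insert an empty line, compare, then remove it again
    Strings.Insert(2, '');
    WriteLn('Empty line changes 1st approach hash: ',
      BoolToStr(HashOfText(Strings, #13#10) <> TextHash, True));
    WriteLn('Empty line changes 2nd approach hash: ',
      BoolToStr(HashOfItems(Strings) <> ItemsHash, True));
    Strings.Delete(2);

    // Experiment 2: same list, LF instead of CRLF line breaks
    WriteLn('Line break changes 1st approach hash: ',
      BoolToStr(HashOfText(Strings, #10) <> TextHash, True));
  finally
    Strings.Free;
  end;
end.
```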

You must decide which of the approaches to use. If empty lines are not significant, I would opt for the second approach as being more “pure”. However, if empty lines are significant, I would use the first approach.
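One way to keep a project consistent is to make the choice in a single place. The unit below is a sketch, not part of the library: the unit name StringListHash, the function StringListMD5 and its EmptyLinesSignificant parameter are hypothetical, and the PJMD5 unit name is an assumption. When empty lines are significant it uses the first approach with a fixed CRLF line break, restoring the list's own LineBreak afterwards; otherwise it uses the second approach.

```pascal
unit StringListHash; // hypothetical unit name

interface

uses
  Classes;

// Hypothetical helper: returns the MD5 digest of Strings as a string.
// EmptyLinesSignificant selects which of the two approaches is used.
function StringListMD5(Strings: TStrings;
  EmptyLinesSignificant: Boolean): string;

implementation

uses
  SysUtils,
  PJMD5; // assumed name of the unit that declares TPJMD5

function StringListMD5(Strings: TStrings;
  EmptyLinesSignificant: Boolean): string;
var
  SavedLineBreak: string;
  MD5: TPJMD5;
  S: string;
begin
  if EmptyLinesSignificant then
  begin
    // 1st approach: hash the list's text with a fixed CRLF line break so
    // the result does not depend on the caller's LineBreak setting
    SavedLineBreak := Strings.LineBreak;
    try
      Strings.LineBreak := #13#10;
      Result := TPJMD5.Calculate(Strings.Text, TEncoding.UTF8);
    finally
      // Restore the caller's line break setting
      Strings.LineBreak := SavedLineBreak;
    end;
  end
  else
  begin
    // 2nd approach: add each string to the hash in turn; empty strings
    // leave the hash unchanged
    MD5 := TPJMD5.Create;
    try
      for S in Strings do
        MD5.Process(S, TEncoding.UTF8);
      Result := MD5.Digest;
    finally
      MD5.Free;
    end;
  end;
end;

end.
```

Fixing the line break inside the helper means the hash no longer depends on whatever LineBreak value a caller happens to have set, which removes one of the risks of the first approach noted above.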

Welcome to the new DelphiDabbler Code Library Documentation.
