Monday, 6 August 2018

Simple Base64 encoder in C#

Just read a bit about Base64 encoding and decided to try out writing my own in Linqpad for strings! The way you Base64 encode is to treat each char in the input string as a byte value with groups of six bytes (this byte value is zero padded left) and then mapping the 2^6 values into a Base64 table and outputting the corresponding values into a resulting string. Note that you also pad the Base64 string with '=' char to ensure the entire bit length is divisible with three. That is why Base64 strings in .NET always have 0,1 or 2 '=' chars at the end. The characters used in the Base64 encoding is in .NET the chars [A-z] and [0-9] plus + and slash /.

void Main()
{
 string wordToBase64Encode = "Hello world from base64 encoder. This is a sample input string, does it work?"; 
 string wordBase64EncodedAccordingToNet = Convert.ToBase64String(Encoding.ASCII.GetBytes(wordToBase64Encode)).Dump();
 string wordBase64EncodedAccordingToCustomBase64 = wordToBase64Encode.ToBase64String().Dump();
 (("Are the two strings equal?: ") + string.Equals(wordBase64EncodedAccordingToNet, wordBase64EncodedAccordingToCustomBase64)).Dump();
 
}

public static class LowLevelBase64Extensions {

    private static readonly char[] _base64Table =
 {
  'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 
  'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
  'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 
  'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
  '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/',
 };

 public static string ToBase64String(this string toBeEncoded){
     byte[] toBeEncodedByteArray = Encoding.ASCII.GetBytes(toBeEncoded);
  string toBeEncodedBinaryString = 
  string.Join("", 
   toBeEncodedByteArray.Select(b => (Convert.ToString(b, 2).PadLeft(8, '0'))));
   
  //toBeEncodedBinaryString.Length.Dump();
   
  int padCharacters = toBeEncodedBinaryString.Length % 3;
  //toBeEncoded.Dump();
       
  StringWriter sw = new StringWriter();

  for (var i = 0; i < toBeEncodedBinaryString.Length; i = i + 6)
  {
   string encodingToMap =
    toBeEncodedBinaryString.Substring(i, Math.Min(toBeEncodedBinaryString.Length - i, 6))
    .PadRight(6, '0');
   //encodingToMap.Dump();
   byte encodingToMapByte = Convert.ToByte(encodingToMap, 2);
   char encodingMapped = _base64Table[(int)encodingToMapByte];
   sw.Write(encodingMapped);
  }
  
  sw.Write(new String('=', padCharacters));
  
  //Check 24 bits / 3 byte padding 
  return sw.ToString();
 }
}

Note - the casting to int of the encodingToMapByte also works if you cast the byte representation to short or byte. I have compared this custom Base64Encoder with .NET's own Convert.ToBase64String(), and they give equal strings in the use cases I have tried out. Make note that this is just playing around, .NET's own method is optimized for speed. This is more to show what happens with the input string to generate a Base64 encoded string. You can use this code as a basis for custom encoding shemes like Base32. I have used Encoding.ASCII.GetBytes to look up the ASCII code. So this implementation primarily supports input strings in languages using ASCII and Extended ASCII characters, at least that is what I have tested it with. The output in Linqpad shows the following corresponding results:

No comments:

Post a Comment