Saturday, 4 January 2014

Compressing a byte array in C# with GZipStream

In .NET 4.0 or later, you can compress a byte array with GZipStream, which implements the GZip algorithm. The compressed output can be written to a byte array or to a file. The code below shows a wrapper class that compresses and decompresses a byte array, followed by a unit test that reads all the bytes of a text file, compresses them, decompresses them, and checks that the decompressed byte array has the same byte values as the bytes read from the file. The compression and decompression code comes first:

using System;
using System.IO;
using System.IO.Compression;

namespace TestCompression
{
    
    /// <summary>
    /// Compresses or decompresses byte arrays using GZipStream
    /// </summary>
    public static class ByteArrayCompressionUtility
    {

        private const int BUFFER_SIZE = 64 * 1024; // 64 KB buffer for the BufferedStream wrappers

        public static byte[] Compress(byte[] inputData)
        {
            if (inputData == null)
                throw new ArgumentNullException("inputData", "inputData must be non-null");

            using (var compressIntoMs = new MemoryStream())
            {
                using (var gzs = new BufferedStream(new GZipStream(compressIntoMs, 
                 CompressionMode.Compress), BUFFER_SIZE))
                {
                    gzs.Write(inputData, 0, inputData.Length);
                }
                return compressIntoMs.ToArray(); 
            }
        }

        public static byte[] Decompress(byte[] inputData)
        {
            if (inputData == null)
                throw new ArgumentNullException("inputData", "inputData must be non-null");

            using (var compressedMs = new MemoryStream(inputData))
            {
                using (var decompressedMs = new MemoryStream())
                {
                    using (var gzs = new BufferedStream(new GZipStream(compressedMs, 
                     CompressionMode.Decompress), BUFFER_SIZE))
                    {
                        gzs.CopyTo(decompressedMs);
                    }
                    return decompressedMs.ToArray(); 
                }
            }
        }

        //private static void Pump(Stream input, Stream output)
        //{
        //    byte[] bytes = new byte[4096];
        //    int n;
        //    while ((n = input.Read(bytes, 0, bytes.Length)) != 0)
        //    {
        //        output.Write(bytes, 0, n); 
        //    }
        //}
        


    }

}


The code uses MemoryStream instances, and their ToArray() method produces the resulting byte arrays. A GZipStream is created with a compression mode of either Compress or Decompress. In both the Compress and Decompress methods, the GZipStream is wrapped in a BufferedStream with a buffer size of 64 KB, with the aim of handling larger files. I have tested this code in a unit test with a generated lorem ipsum text file of about 5.5 MB. The unit test is shown next:

using System;
using NUnit.Framework;
using System.Text;
using System.IO;
using System.Linq;


namespace TestCompression.Test
{
    [TestFixture]
    public class UnitTest1
    {

        [Test]
        public void CompressAndUncompressString()
        {
            byte[] inputData = File.ReadAllBytes("Lorem1.txt");
            byte[] compressedData = ByteArrayCompressionUtility.Compress(inputData);
            byte[] decompressedData = ByteArrayCompressionUtility.Decompress(compressedData);

            Assert.IsNotEmpty(inputData);
            Assert.IsNotEmpty(decompressedData);
            Assert.IsTrue(inputData.SequenceEqual(decompressedData));

            Console.WriteLine("Compressed size: {0:F2}%", 
             100 * ((double)compressedData.Length / (double)decompressedData.Length));

            //string outputString = Encoding.UTF8.GetString(decompressedData);

        }

    }
}


Output of this unit test is shown next:

------ Test started: Assembly: TestCompression.Test.dll ------

Compressed size: 28,74%

1 passed, 0 failed, 0 skipped, took 18,87 seconds (NUnit 2.6.2).
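
The commented-out line in the test hints at turning the decompressed bytes back into text. Below is a minimal sketch of such a string round trip, assuming the text is UTF-8. It does not use the wrapper class above, so that it can be compiled on its own; the class and method names (StringRoundTripSketch, CompressText, DecompressText) are made up for this illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical sketch, not part of the wrapper class above.
public static class StringRoundTripSketch
{
    // Encodes the text as UTF-8 and compresses the bytes with GZipStream.
    public static byte[] CompressText(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // The GZipStream must be disposed before reading the result,
            // otherwise the last compressed block may be missing.
            return output.ToArray();
        }
    }

    // Decompresses the bytes and decodes them as UTF-8 text.
    public static string DecompressText(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    public static void Main()
    {
        // Build a repetitive sample text; repetitive input compresses well.
        var builder = new StringBuilder();
        for (int i = 0; i < 200; i++)
        {
            builder.Append("Lorem ipsum dolor sit amet. ");
        }
        string original = builder.ToString();

        byte[] compressed = CompressText(original);
        string restored = DecompressText(compressed);

        Console.WriteLine("Original bytes:   {0}", Encoding.UTF8.GetByteCount(original));
        Console.WriteLine("Compressed bytes: {0}", compressed.Length);
        Console.WriteLine("Round trip ok:    {0}", restored == original);
    }
}
```

The important detail is that the encoding used for GetBytes and GetString must match. If the text file was saved in another encoding, decoding it as UTF-8 would give different characters even though the bytes round-trip correctly.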



To generate a lorem ipsum text file, you can use the lorem ipsum generator at http://loripsum.net
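
As mentioned at the start, the compressed output can go to a file instead of a byte array. The following is a minimal sketch of that variant, where the GZipStream wraps a FileStream rather than a MemoryStream. The class and method names (GZipFileSketch, CompressToFile, DecompressFromFile) and the temporary file are assumptions made for this illustration only.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical sketch, not part of the wrapper class above.
public static class GZipFileSketch
{
    // Compresses the data and writes it to the given path, overwriting any existing file.
    public static void CompressToFile(byte[] data, string path)
    {
        using (var file = File.Create(path))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        {
            gzip.Write(data, 0, data.Length);
        }
    }

    // Reads a GZip-compressed file and returns the decompressed bytes.
    public static byte[] DecompressFromFile(string path)
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }

    public static void Main()
    {
        // Sample data with a repeating pattern, standing in for a real input file.
        byte[] data = new byte[100000];
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = (byte)(i % 16);
        }

        // A temporary file is used here so the sketch cleans up after itself.
        string path = Path.GetTempFileName();
        try
        {
            CompressToFile(data, path);
            byte[] restored = DecompressFromFile(path);

            Console.WriteLine("Input bytes:     {0}", data.Length);
            Console.WriteLine("File size bytes: {0}", new FileInfo(path).Length);
            Console.WriteLine("Round trip ok:   {0}", restored.SequenceEqual(data));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Writing straight to a FileStream avoids holding the whole compressed result in memory, which may matter more as the input grows.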
