Add Huffman in C#. #82

Merged 8 commits on May 29, 2018
Changes from 3 commits
181 changes: 181 additions & 0 deletions chapters/data_compression/huffman/code/cs/HuffmanCoding.cs
@@ -0,0 +1,181 @@
// submitted by Julian Schacher (jspp) with help by gustorn.
using System;
using System.Collections.Generic;
using System.Linq;

namespace HuffmanCoding
{
public class EncodingResult
{
public List<bool> BitString { get; set; }
Contributor:

This isn't much more efficient than a regular String and it's much more annoying to debug and print. If you seriously want to produce a packed binary result then I'd go with System.Collections.BitArray but I think a regular String works better for educational purposes.

Member Author (@june128, Apr 22, 2018):

I will go for the BitArray then. I don't think a string is fitting in this case, since no real compression would be achieved.

Contributor:

The point isn't compression, it's showcasing the algorithm. And there was no compression achieved with List<bool> either. I'd just go with String (as almost all other implementations in the AAA do).

Member Author:

You mean since we usually save whole bytes anyway?

Contributor:

Strings in C# are UTF-16 encoded, which is 2 bytes per character. If you save the bitstring as a List<bool>, that's N bytes, where N is the length of the bitstring. That means you literally only have any compression if the bitstring is a single bit long.

Contributor:

I still think just having this be a string would make things a lot easier
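
To put rough numbers on the storage argument in this thread, here is a minimal sketch. It compares payload sizes only (object headers and list capacity slack are ignored), and the class and method names are hypothetical, not part of the PR. It also shows one way to pack bits with `System.Collections.BitArray`, as mentioned earlier in the thread.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class BitStorageSketch
{
    // Payload size of a "0"/"1" string: one UTF-16 char (2 bytes) per bit.
    public static int StringBytes(int bitCount) => bitCount * sizeof(char);

    // Payload size of a List<bool>: one bool (1 byte) per bit.
    public static int BoolListBytes(int bitCount) => bitCount * sizeof(bool);

    // Payload size when bits are packed eight to a byte.
    public static int PackedBytes(int bitCount) => (bitCount + 7) / 8;

    // Packs the bits into bytes; bit 0 ends up in the least significant bit of byte 0.
    public static byte[] Pack(List<bool> bits)
    {
        var bitArray = new BitArray(bits.ToArray());
        var bytes = new byte[PackedBytes(bits.Count)];
        bitArray.CopyTo(bytes, 0);
        return bytes;
    }
}
```

For a 19-bit result, this gives 38 bytes as a string, 19 bytes as a `List<bool>`, and 3 bytes packed. Whether that difference matters for an educational implementation is the judgment call being discussed above.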

public Dictionary<char, List<bool>> Dictionary { get; set; }
public HuffmanCoding.Node Tree { get; set; }

public EncodingResult(List<bool> bitString, Dictionary<char, List<bool>> dictionary, HuffmanCoding.Node tree)
{
this.BitString = bitString;
this.Dictionary = dictionary;
this.Tree = tree;
}
}

public static class HuffmanCoding
{
// The Node class used for the Huffman Tree.
public class Node : IComparable<Node>
{
public Node LeftChild { get; set; }
public Node RightChild { get; set; }
public List<bool> BitString { get; set; } = new List<bool>();
public int Weight { get; set; }
public string Key { get; set; }
Contributor:

char would work here

Member Author:

For the key?

Contributor (@zsparal, Apr 22, 2018):

Yes. Maybe char? so you can set it to null in the branches

Member Author:

The key of a node is a string, so that a parent node's/branch's key is a combination of all its children's keys.


// Creates a leaf. So just a node is created with the given values.
Contributor:

These comments are totally superfluous

Member Author (@june128, May 28, 2018):

Why? They are explaining what is done pretty well.

public static Node CreateLeaf(char key, int weight) => new Node(key.ToString(), weight, null, null);
// Creates a branch. Here a node is created by adding the keys and weights of both children together.
public static Node CreateBranch(Node leftChild, Node rightChild) => new Node(leftChild.Key + rightChild.Key, leftChild.Weight + rightChild.Weight, leftChild, rightChild);
private Node(string key, int weight, Node leftChild, Node rightChild)
{
this.Key = key;
this.Weight = weight;
this.LeftChild = leftChild;
this.RightChild = rightChild;
}

public int CompareTo(Node other) => this.Weight - other.Weight;
}

// Nodes are kept sorted by weight; Pop removes the first node in the list.
class NodePriorityList
{
public int Count => nodes.Count;

private List<Node> nodes = new List<Node>();

public NodePriorityList() { }
public NodePriorityList(List<Node> givenNodes)
{
this.nodes = givenNodes.ToList();
this.nodes.Sort();
}

public void Add(Node newNode)
{
var index = ~this.nodes.BinarySearch(newNode);
Contributor:

This doesn't seem correct. I think it should be along the lines of:

var index = this.nodes.BinarySearch(newNode);
if (index < 0)
    this.nodes.Insert(~index, newNode);
else
    this.nodes.Insert(index, newNode);
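
The suggestion above relies on the documented return value of `List<T>.BinarySearch`: the zero-based index when the item is found, otherwise a negative number that is the bitwise complement of the index where the item would be inserted. A minimal sketch of a sorted insert built on that, using `int` instead of `Node` to stay self-contained (the helper name is hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SortedInsertSketch
{
    // Inserts value into an already sorted list and keeps the list sorted.
    public static void InsertSorted(List<int> sorted, int value)
    {
        var index = sorted.BinarySearch(value);

        // Negative result: value is not present, and ~index is the insertion point.
        // That point may equal sorted.Count, which List<T>.Insert accepts (it appends).
        if (index < 0)
            index = ~index;

        sorted.Insert(index, value);
    }
}
```

Because `Insert` accepts an index equal to `Count`, no separate append branch is needed in this version.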

if (index == this.nodes.Count)
{
this.nodes.Add(newNode);
return;
}
this.nodes.Insert(~index, newNode);
}

public Node Pop()
{
var first = this.nodes.First();
Contributor:

First throws an exception if the source collection is empty, so this is a bad idea. I would just do:

var result = this.nodes[0];
this.nodes.RemoveAt(0);
return result;
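
As a sketch of the point being made: `Enumerable.First` throws `InvalidOperationException` on an empty sequence, so a null check on its result never helps in the empty case. The helper names below are hypothetical, and the explicit empty check in `PopFirst` is an assumption about the desired behaviour, not something the PR specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, for illustration only.
public static class PopSketch
{
    // Removes and returns the first element, failing with a clear message when empty.
    public static T PopFirst<T>(List<T> items)
    {
        if (items.Count == 0)
            throw new InvalidOperationException("Cannot pop from an empty list.");

        var result = items[0];
        items.RemoveAt(0);
        return result;
    }

    // Reports whether Enumerable.First throws for the given list.
    public static bool FirstThrows<T>(List<T> items)
    {
        try
        {
            items.First();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

If a null result for the empty case is what is wanted, `FirstOrDefault` returns `default(T)` instead of throwing.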

if (first != null)
this.nodes.Remove(first);
return first;
}
}

public static EncodingResult Encode(string input)
{
var root = CreateTree(input);
var dictionary = CreateDictionary(root);
var bitString = CreateBitString(input, dictionary);

return new EncodingResult(bitString, dictionary, root);
}

public static string Decode(EncodingResult result)
{
var output = "";
Node currentNode = result.Tree;
foreach (var boolean in result.BitString)
Contributor:

You should probably name this `bit`.

{
// Go down the tree.
if (!boolean)
currentNode = currentNode.LeftChild;
else
currentNode = currentNode.RightChild;

// Check if it's a leaf node.
if (currentNode.Key.Count() == 1)
Contributor:

Just check if all of the children are null. That's how you determine the leaves in a binary tree

Contributor (@zsparal, May 28, 2018):

You should really not check for leaf nodes this way. It only works when the alphabet of whatever you want to encode consists of single characters. You can also run Huffman coding on words, in which case this would fail miserably. The nice solution is to add something like this to Node:

public bool IsLeaf => LeftChild == null && RightChild == null;

// Then you can just do
if (currentNode.IsLeaf)
{
    // ...
}
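
To make the word-alphabet case concrete, here is a minimal, self-contained sketch of a decode loop that uses an `IsLeaf` property instead of the key length. `SketchNode` and `DecodeSketch` are hypothetical stand-ins for the PR's `Node` and `Decode`, simplified for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified node type; not the PR's Node class.
public class SketchNode
{
    public SketchNode LeftChild { get; }
    public SketchNode RightChild { get; }
    public string Key { get; }

    public SketchNode(string key, SketchNode leftChild = null, SketchNode rightChild = null)
    {
        Key = key;
        LeftChild = leftChild;
        RightChild = rightChild;
    }

    // A node is a leaf when it has no children, regardless of how long its key is.
    public bool IsLeaf => LeftChild == null && RightChild == null;
}

public static class DecodeSketch
{
    // Walks the tree bit by bit: false goes left, true goes right.
    // Assumes the root is a branch and the bits form a valid encoding.
    public static string Decode(SketchNode root, IEnumerable<bool> bits)
    {
        var output = new StringBuilder();
        var current = root;
        foreach (var bit in bits)
        {
            current = bit ? current.RightChild : current.LeftChild;
            if (current.IsLeaf)
            {
                output.Append(current.Key);
                current = root;
            }
        }
        return output.ToString();
    }
}
```

With leaves keyed `"the "`, `"cat "` and `"sat "`, a check such as `Key.Count() == 1` would never fire, while `IsLeaf` still identifies the leaves correctly.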

{
output += currentNode.Key;
currentNode = result.Tree;
}
}
return output;
}

private static Node CreateTree(string input)
{
// Create a List of all characters and their count in input by putting them into nodes.
var nodes = input
.GroupBy(c => c)
.Select(n => Node.CreateLeaf(n.Key, n.Count()))
.ToList();

// Convert list of nodes to a NodePriorityList.
var nodePriorityList = new NodePriorityList(nodes);

// Create Tree.
while (nodePriorityList.Count > 1)
{
// Pop the two nodes with the smallest weights from the nodePriorityList and create a parentNode with the CreateBranch method. (This method adds the keys and weights of the children together.)
var leftChild = nodePriorityList.Pop();
var rightChild = nodePriorityList.Pop();
var parentNode = Node.CreateBranch(leftChild, rightChild);

nodePriorityList.Add(parentNode);
}

return nodePriorityList.Pop();
}

private static Dictionary<char, List<bool>> CreateDictionary(Node root)
Contributor:

I know I'm usually a big proponent for performance, but seriously, just use the recursive method here. With C# 7's local functions you can even make a nice, local function for the recursive part

Member Author:

Hmm, I'll take a look, but I found the non-recursive version easy to understand. I haven't tried the recursive one, though.
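
For comparison, a recursive version with a C# 7 local function might look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `TreeNode` is a hypothetical simplified node, and codes are stored as strings of `'0'` and `'1'`, as suggested earlier in this review.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified node type; not the PR's Node class.
public class TreeNode
{
    public TreeNode Left { get; set; }
    public TreeNode Right { get; set; }
    public char Symbol { get; set; }
    public bool IsLeaf => Left == null && Right == null;
}

public static class CodebookSketch
{
    public static Dictionary<char, string> CreateDictionary(TreeNode root)
    {
        var dictionary = new Dictionary<char, string>();

        // Local function: carries the path from the root down as a prefix.
        void Visit(TreeNode node, string prefix)
        {
            if (node == null)
                return;

            if (node.IsLeaf)
            {
                dictionary.Add(node.Symbol, prefix);
                return;
            }

            Visit(node.Left, prefix + "0");   // left edge appends a 0
            Visit(node.Right, prefix + "1");  // right edge appends a 1
        }

        Visit(root, "");
        return dictionary;
    }
}
```

Note that a tree consisting of a single leaf yields an empty code in this sketch. The prefix is passed down as an argument, so nodes do not need to store their own bit string.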

{
var dictionary = new Dictionary<char, List<bool>>();

var stack = new Stack<Node>();
stack.Push(root);
Node temp;

while (stack.Count != 0)
{
temp = stack.Pop();

if (temp.Key.Count() == 1)
dictionary.Add(temp.Key[0], temp.BitString);
else
{
if (temp.LeftChild != null)
{
temp.LeftChild.BitString.AddRange(temp.BitString);
temp.LeftChild.BitString.Add(false);
stack.Push(temp.LeftChild);
}
if (temp.RightChild != null)
{
temp.RightChild.BitString.AddRange(temp.BitString);
temp.RightChild.BitString.Add(true);
stack.Push(temp.RightChild);
}
}
}

return dictionary;
}

private static List<bool> CreateBitString(string input, Dictionary<char, List<bool>> dictionary)
{
var bitString = new List<bool>();
foreach (var character in input)
bitString.AddRange(dictionary[character]);

return bitString;
}
}
}
40 changes: 40 additions & 0 deletions chapters/data_compression/huffman/code/cs/Program.cs
@@ -0,0 +1,40 @@
// submitted by Julian Schacher (jspp) with help by gustorn.
using System.Collections;
using System.Collections.Generic;

namespace HuffmanCoding
{
class Program
{
static void Main(string[] args)
{
var result = HuffmanCoding.Encode("aaaabbbccd");
// Print dictionary.
foreach (var entry in result.Dictionary)
{
var bitString = "";
Contributor (@zsparal, May 28, 2018):

This is just: string.Join("", entry.Value.Select(bit => bit ? '1' : '0')), but if you take the advice of just using strings, this whole thing is unnecessary.
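
Spelled out as a small helper, the one-liner above could be wrapped like this. The class and method names are hypothetical, and `using System.Linq;` is needed for `Select`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class BitFormatSketch
{
    // Maps each bit to '1' or '0' and joins the characters without a separator.
    public static string ToBitString(IEnumerable<bool> bits) =>
        string.Join("", bits.Select(bit => bit ? '1' : '0'));
}
```

The same helper would cover both loops in `Main`: once per dictionary entry and once for the full bit string.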

foreach (var value in entry.Value)
{
if (value)
bitString += "1";
else
bitString += "0";
}
System.Console.WriteLine(entry.Key + " " + bitString);
}
// Print bitString.
var readableBitString = "";
foreach (var boolean in result.BitString)
{
if (boolean)
readableBitString += "1";
else
readableBitString += "0";
}
System.Console.WriteLine($"{readableBitString} count: {readableBitString.Length}");

var originalString = HuffmanCoding.Decode(result);
System.Console.WriteLine(originalString);
}
}
}
6 changes: 6 additions & 0 deletions chapters/data_compression/huffman/huffman.md
@@ -86,4 +86,10 @@ Whether you use a stack or straight-up recursion also depends on the language, b
{% sample lang="hs" %}
### Haskell
[import, lang:"haskell"](code/haskell/huffman.hs)
{% sample lang="cs" %}
### C# #
HuffmanCoding.cs
[import, lang:"csharp"](code/cs/HuffmanCoding.cs)
Program.cs
[import, lang:"csharp"](code/cs/Program.cs)
{% endmethod %}