-
-
Notifications
You must be signed in to change notification settings - Fork 359
Add Huffman in C#. #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Huffman in C#. #82
Conversation
{ | ||
public class EncodeResult | ||
{ | ||
public List<bool> BitString { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't much more efficient than a regular String
and it's much more annoying to debug and print. If you seriously want to produce a packed binary result then I'd go with System.Collections.BitArray
but I think a regular String
works better for educational purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will go for the BitArray then. I don't think a string is fitting in this case, since no real compression would be achieved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point isn't compression, it's showcasing thr algorithm. And there was no compression achieved with List<bool>
either. I'd just go with String
(as almost all other implementations in the AAA do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean since we usually save whole bytes anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strings in C# are UTF16 encoded which is 2 bytes. If you save the bitstring as a List<bool>
that's N
bytes, where N
is the length of the bitstring. That means you literally only have any compression if the bitstring is a single bit long.
// The Node class used for the Huffman Tree. | ||
public class Node | ||
{ | ||
public Node[] Children { get; set; } = new Node[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have two children, Left
and Right
. This only makes it easier to misuse the class
public Node[] Children { get; set; } = new Node[2]; | ||
public List<bool> BitString { get; set; } = new List<bool>(); | ||
public int Weight { get; set; } | ||
public string Key { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char
would work here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Maybe char?
so you can set it to null
in the branches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key of a node is string, so that a parent node's/branch's key is a combination of all it's children's keys.
|
||
public void AddNode(Node newNode) | ||
{ | ||
if (Nodes.Count == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have Node
implement IComparable<Node>
then use something like:
var insertAt = Math.Max(Nodes.BinarySearch(newNode), 0);
Nodes.Insert(insertAt, newNode)
public List<Node> Nodes { get; private set; } = new List<Node>(); | ||
|
||
public NodePriorityList() { } | ||
public NodePriorityList(List<Node> nodes) => Nodes = nodes.OrderByDescending(n => n.Weight).ToList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have Node
implement IComparable
and just do:
public NodePriorityList(List<Node> nodes)
{
Nodes = nodes.ToList();
Nodes.Sort();
}
|
||
namespace HuffmanCoding | ||
{ | ||
public class EncodeResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better name would be EncodingResult
currentNode = currentNode.Children[1]; | ||
|
||
// Check if it's a leaf node. | ||
if (currentNode.Key.Count() == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just check if all of the children are null. That's how you determine the leaves in a binary tree
{ | ||
// Create a List of all characters and their count in input by putting them into nodes. | ||
var nodes = new List<Node>(); | ||
foreach (var character in input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a nicer way to do this. Pseudo-ish code with a toy example:
var input = "Hello, World";
var nodes = input
.GroupBy(c => c)
.Select(n => new Node { Key = n.Key, Weight = n.Count() })
.ToList();
// Create Tree. | ||
while (nodes.Count > 1) | ||
{ | ||
var parentNode = new Node("", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should have a default constructor so you don't need to pass dummy parameters. I would actually have something like:
public class Node
{
public static Node CreateLeaf(char key, int weight) { return new Node(...); }
public static Node CreateBranch(Node left, Node right) { return new Node(...); }
private Node(...) { /* regular Node constructor with every parameter */ }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could also create two constructors for the Node
class. One that takes key
and weight
and another one that takes a leftChild
and rightChild
. Your approach seems more declarative tho. What do you think?
nodes.RemoveAt(nodes.Count - 1); | ||
}; | ||
nodePriorityList.AddNode(parentNode); | ||
if (parentNode.Weight > 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here, the weights are not percentages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's left from debugging and I forgot to remove it :S
// Add the two nodes with the smallest weight to the parent node and remove them from the tree. | ||
for (int i = 0; i < 2; i++) | ||
{ | ||
parentNode.Children[i] = nodes.Last(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a nice method for this on the priority queue (and don't use the regular list at this point). Something like:
var left = nodeQueue.Pop();
var right = nodeQueue.Pop();
nodeQueue.AddNode(Node.CreateBranch(left, right)); // the weight will be calculated in `CreateBranch` as `left.Weight + right.Weight`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the improvements with using a queue? If I use it BinarySearch()
wouldn't be available. I mean, I can re-implement that probably, but if the queue doesn't provide improvements, I see no reason to do so.
} | ||
// Convert list of nodes to a NodePriorityList. | ||
var nodePriorityList = new NodePriorityList(nodes); | ||
nodes = nodePriorityList.Nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do this, just use the queue. In fact, just make Nodes
private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the Node
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the Nodes
property on NodePriorityList
return nodePriorityList.Nodes[0]; | ||
} | ||
|
||
private static Dictionary<char, List<bool>> CreateDictionary(Node root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I'm usually a big proponent for performance, but seriously, just use the recursive method here. With C# 7's local functions you can even make a nice, local function for the recursive part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'll take a look, but I found the non-recursive version easy to understand. Didn't tried the recursive one tho.
Thanks for the review @Gustorn ! I will enhance the code in the following days. |
…bugging. Add mention.
…the readableBitString to console. Have Node implement IComparable<Node> and improve the code according to that. Improve the creation of the initial list of nodes. Refactor NodePriorityList. Improve Node creation.
{ | ||
var output = ""; | ||
Node currentNode = result.Tree; | ||
foreach (var boolean in result.BitString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably name this bit
public int Weight { get; set; } | ||
public string Key { get; set; } | ||
|
||
// Creates a leaf. So just a node is created with the given values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are totally superfluous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? They are explaining what is done pretty well.
{ | ||
public class EncodingResult | ||
{ | ||
public List<bool> BitString { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think just having this be a string
would make things a lot easier
// Print dictionary. | ||
foreach (var entry in result.Dictionary) | ||
{ | ||
var bitString = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just: string.Join("", entry.Value.Select(bit => bit ? '1' : '0'))
, but if you take the advice of just using strings, this whole thing is unnecessary.
|
||
public void Add(Node newNode) | ||
{ | ||
var index = ~this.nodes.BinarySearch(newNode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem correct. I think it should be along the lines of:
var index = this.nodes.BinarySearch(newNode);
if (index < 0)
this.nodes.Insert(~index, newNode);
else
this.nodes.Insert(index, newNode);
|
||
public Node Pop() | ||
{ | ||
var first = this.nodes.First(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First
throws an exception if the source collection is empty, this is a bad idea. I would just do:
var result = this.nodes[0];
this.nodes.RemoveAt(0);
return result;
currentNode = currentNode.RightChild; | ||
|
||
// Check if it's a leaf node. | ||
if (currentNode.Key.Count() == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should really not check for leaf nodes this way. It works when the alphabet of whatever you want to encode consists of single characters. You can also run Huffman coding on words, in which case this would faily miserably. The nice solution is to add something like this to Node
:
public IsLeaf => LeftChild == null && RightChild == null;
// Then you can just do
if (currentNode.IsLeaf)
{
// ...
}
…y we're not using something like a BitArray. Change the name of the enumerator "boolean" to "bit". Fix the Pop method by using "this.nodes[0]" instead of "this.nodes.First()". Improve the way to find out, if a Node is a Leaf.
…ty bobbity". Remove unnecessary braces.
Ok, everything should be done here now. But I may need to change the mention, so the merging needs to be delayed a bit. @Gustorn did a great job here again and without him the code would be worse. So I don't know if I should just write "with great help by x" in the files, since the help was big or just "with help by x", so things don't get subjective. |
I think it's a good idea to mention who reviewed it, so |
I'm not terribly attached to having contributor comments on the top of the files. It's clear from the git history and from the PRs. If someone wants to give some credit, I think something like: |
I think, I'll go for |
Should be done now. [put in party emoji here I guess] :) |
ToDo
EncodeResult
toEncodingResult
.IComparable
.Nodes
property ofNodePriorityList
private.NodePriorityList
to providePop()
.string
instead of aList<bool>
.boolean
tobit
.Node
from theNodePriorityList
.