-
-
Notifications
You must be signed in to change notification settings - Fork 46.8k
Added Burrows-Wheeler transform algorithm. #1029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…t number_theory folder
…e 'python3 -m doctest -v data_structures/hashing/*.py' and 'python3 -m doctest -v data_structures/stacks/*.py' were failing not finding hash_table.py and stack.py modules.
…ere is a space after it the script will break making it much more error prone.
…. Limited line length to 79 and executed python black over all scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I approve as is but consider using PEP8 and PEP484 --> PEP526...
https://pep8.org/#programming-recommendations
https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html
compression/burrows_wheeler.py
Outdated
""" | ||
|
||
|
||
def all_rotations(string): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Python type hints https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html They take some getting used to but can catch bugs. Our automated testing already runs mypy on all pull requests.
Unfortunately, string is the name of a module in the Python Standard Library and str is the name of a Python data type so to avoid shadowing them, do not use those words as variable names. This is one of the very few places where I would advocate using a single letter variable name: s.
compression/burrows_wheeler.py
Outdated
:param str string: The string that will be rotated len(string) times. | ||
:return: A list with len(string) rotations of the parameter string. | ||
:rtype: list[str] | ||
:raises TypeError: If the string parameter type is not str. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hints + a textual description might actually be more helpful than restructured text but if you stick with restructured text then avoid repeating things that are already in the type hints.
compression/burrows_wheeler.py
Outdated
... | ||
TypeError: The parameter string type must be str. | ||
""" | ||
if not (type(string) is str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 says
Object type comparisons should always use isinstance() instead of comparing types directly.
flake8 E721 will find these automatically.
compression/burrows_wheeler.py
Outdated
:raises TypeError: If the string parameter type is not str. | ||
Examples: | ||
|
||
>>> all_rotations("^BANANA|") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> all_rotations("^BANANA|") # doctest: +NORMALIZE_WHITESPACE
This comment is a doctest directive that allows you to lose the backslashes and indent the way your want because PEP8 does not advocate the use of backslashes. It says:
The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.
compression/burrows_wheeler.py
Outdated
... | ||
ValueError: The parameter string must not be empty. | ||
""" | ||
if not (type(string) is str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance()
compression/burrows_wheeler.py
Outdated
|
||
|
||
if __name__ == "__main__": | ||
string = input("Provide a string that I will generate its BWT transform: ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider doing input().strip() to remove leading and/or trailing whitespace from user input.
@cclauss my last commit takes into account your last comments. Thanks they were really helpful. If everything is ok I think it can be merged. |
* Added doctest and more explanation about Dijkstra execution. * tests were not passing with python2 due to missing __init__.py file at number_theory folder * Removed the dot at the beginning of the imported modules names because 'python3 -m doctest -v data_structures/hashing/*.py' and 'python3 -m doctest -v data_structures/stacks/*.py' were failing not finding hash_table.py and stack.py modules. * Moved global code to main scope and added doctest for project euler problems 1 to 14. * Added test case for negative input. * Changed N variable to do not use end of line scape because in case there is a space after it the script will break making it much more error prone. * Added problems description and doctests to the ones that were missing. Limited line length to 79 and executed python black over all scripts. * Changed the way files are loaded to support pytest call. * Added __init__.py to problems to make them modules and allow pytest execution. * Added project_euler folder to test units execution * Changed 'os.path.split(os.path.realpath(__file__))' to 'os.path.dirname()' * Added Burrows-Wheeler transform algorithm. * Added changes suggested by cclauss
I added a docstring with parameters and return explanation but I'm not sure what is the right way to write it, I looked at google for a reference and only found PEP 257 that does not define exactly how these things should be written in a strict way.
I also found this stack overflow question that talks about PEP 287 and how reStructuredText (reST) format is used by Sphinx to generate documentation and seems to be a good approach.
I think it would be nice to adopt a style to standardize the project documentation. What you guys think?