Open
Description
We don’t have a lot of Word docs in our DB, but there are a few, and Analysts have noted that they are a pain to work with. That said, we aren’t any worse than the existing tool (Versionista), and we can also do edgi-govdata-archiving/web-monitoring-ui#186, so this isn’t a high priority.
I don’t know if there are any great Linux tools out there for rendering a .doc file, but there are certainly a few libraries that can handle .docx, like Mammoth: https://github.com/mwilliamson/python-mammoth, which can convert to HTML, Markdown, or plain text, any of which we could then diff with existing algorithms.
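As a rough sketch of the Mammoth route (not a proposal for the final implementation), something like the following could extract plain text from two versions of a .docx file and diff them. The function names `docx_to_text` and `diff_text` are made up for illustration, and `difflib` from the standard library is only a stand-in for whichever existing diff algorithm we would actually use.

```python
import difflib


def docx_to_text(path):
    """Extract plain text from a .docx file using Mammoth."""
    # Imported lazily so diff_text() below still works without Mammoth installed.
    import mammoth

    with open(path, "rb") as docx_file:
        result = mammoth.extract_raw_text(docx_file)
    # result.messages holds any conversion warnings; ignored in this sketch.
    return result.value


def diff_text(old_text, new_text):
    """Return a unified diff of two text versions as a list of lines.

    difflib is a placeholder here, standing in for our existing diff code.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
```

Usage would look roughly like `diff_text(docx_to_text("old.docx"), docx_to_text("new.docx"))`, where both file names are hypothetical. Swapping `mammoth.extract_raw_text` for `mammoth.convert_to_html` would give HTML instead, which might fit better with an HTML-aware diff.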
We could also use a service like Zamzar to convert, then diff.