
Add differ for word docs? #7

Open
@Mr0grog

Description


We don’t have a lot of Word docs in our DB, but there are a few, and Analysts have noted that they are a pain to review. That said, we aren’t any worse than the existing tool (Versionista), plus we can do edgi-govdata-archiving/web-monitoring-ui#186, so this isn’t a high priority.

I don’t know if there are any great Linux tools out there for rendering a .doc file, but there are certainly a few libraries that can handle .docx, like Mammoth (https://github.com/mwilliamson/python-mammoth), which can convert to HTML, Markdown, or plain text. We could then diff any of those outputs with our existing algorithms.
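For a rough idea of the "convert to text, then diff" approach: Mammoth's `extract_raw_text()` or `convert_to_html()` would be the natural converter, but the sketch below avoids that dependency. Instead it uses a much cruder, swapped-in technique: it reads paragraph text straight out of the `.docx` ZIP package (`word/document.xml`) with the standard library, then diffs it with `difflib`, which stands in for our existing diff algorithms. `docx_paragraphs`, `diff_docx`, and `make_test_docx` are hypothetical names for illustration only.

```python
import difflib
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used inside word/document.xml of a .docx package.
W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_NS = "{" + W_URI + "}"


def docx_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the plain text of each w:p paragraph in a .docx file.

    Crude stand-in for Mammoth (assumption, not its API): it only joins w:t
    text runs and ignores formatting, headers/footers, footnotes, etc.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [
        "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        for paragraph in root.iter(W_NS + "p")
    ]


def diff_docx(old: bytes, new: bytes) -> list[str]:
    """Unified diff of two .docx files, treating each paragraph as a line."""
    return list(difflib.unified_diff(
        docx_paragraphs(old), docx_paragraphs(new),
        fromfile="old.docx", tofile="new.docx", lineterm=""))


def make_test_docx(paragraphs: list[str]) -> bytes:
    """Build an in-memory ZIP holding only word/document.xml.

    Hypothetical test helper: not a complete Word file, just enough
    structure for docx_paragraphs() to parse.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs)
    xml = f'<w:document xmlns:w="{W_URI}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    old = make_test_docx(["Emissions fell in 2016.", "Contact us."])
    new = make_test_docx(["Emissions rose in 2016.", "Contact us."])
    print("\n".join(diff_docx(old, new)))
```

Treating each paragraph as a "line" keeps the diff readable for prose. A real implementation would probably prefer Mammoth's HTML output so formatting changes show up too.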

We could also use a conversion service like Zamzar to turn the documents into one of those formats, then diff the results.
