Consistent interface to get text and bytes #895

Open
@jdavid

Follow-up to #610, #790, and #893.

General policy:

  • Do not implement str()
  • To get unicode string use .text
  • In .text, decode as UTF-8 with the replace error handler (the rationale for replace is explained in a comment on #790, "Patch: Add __str__ and __bytes__ for undecoded content")
  • To get the byte string use .data or .raw (this is to be decided)
  • For attributes, the plain attribute name returns text; prefix it with raw_ to get bytes. For instance, Signature.name and Signature.raw_name
  • Implement the buffer protocol, bytes(..) where appropriate
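
To make the policy above concrete, here is a minimal sketch in plain Python. The classes `Line` and `Person` are hypothetical stand-ins, not pygit2 classes, and the choice of `.data` over `.raw` is only an assumption for the example since that point is still undecided.

```python
class Line:
    """Hypothetical wrapper around undecoded content (not a pygit2 class)."""

    def __init__(self, data: bytes):
        # Byte string. The name .data is an assumption; .raw is the alternative.
        self.data = data

    @property
    def text(self) -> str:
        # UTF-8 with "replace": undecodable bytes become U+FFFD
        # instead of raising UnicodeDecodeError.
        return self.data.decode("utf-8", errors="replace")

    def __bytes__(self) -> bytes:
        # Supports bytes(obj); note that __str__ is deliberately not defined.
        return self.data


class Person:
    """Hypothetical attribute pair: name returns text, raw_name returns bytes."""

    def __init__(self, raw_name: bytes):
        self.raw_name = raw_name

    @property
    def name(self) -> str:
        return self.raw_name.decode("utf-8", errors="replace")


line = Line(b"caf\xc3\xa9 \xff")   # valid UTF-8 followed by one invalid byte
person = Person(b"J. David")
```

With this sketch, `line.text` never raises on bad input, `bytes(line)` gives back the original bytes unchanged, and `person.name` / `person.raw_name` follow the raw_ prefix convention.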

Open for discussion.

TODO:

  • Replace TreeEntry._name by .raw_name
  • Replace DiffLine.content by .text
  • Inventory all the places where we get bytes, text, or the buffer protocol
  • Settle on .data or .raw
  • Replace DiffLine.raw_content by .data or .raw
  • Replace Object.read_raw() by .data (or .raw), then remove Blob.data (it will inherit from Object)
  • Settle on str(), bytes(), and the buffer protocol

The case of Oid. What we have now:

  • oid.raw returns the byte string (that's good, unless we decide to settle on .data)
  • str(oid) and oid.hex both return the hex representation, always <str> (bytes in Python 2 and text in Python 3)
  • Oid is the only place where we implement str(...)
  • Object.hex and TreeEntry.hex behave the same: they always return <str>. Apparently these are the only places where we always return <str>.
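
The relationship between the raw and hex forms described above can be illustrated with the standard library alone. `FakeOid` is a hypothetical stand-in written for this example, not the real Oid type, and it only mimics the behaviour listed in the bullets as seen from Python 3.

```python
import binascii


class FakeOid:
    """Hypothetical stand-in for an object id (not the pygit2 Oid class)."""

    def __init__(self, raw: bytes):
        self.raw = raw  # byte string, e.g. the 20 bytes of a SHA-1

    @property
    def hex(self) -> str:
        # hexlify returns bytes; decode so the result is <str> (text in Python 3).
        return binascii.hexlify(self.raw).decode("ascii")

    def __str__(self) -> str:
        # Mirrors the current situation: str(oid) and oid.hex agree.
        return self.hex


oid = FakeOid(bytes(range(20)))          # placeholder id: bytes 0x00..0x13
round_trip = binascii.unhexlify(oid.hex)  # hex form converts back to the raw bytes
```

Here `oid.raw` is `bytes`, while `oid.hex` and `str(oid)` are the same 40-character text string.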
