Google Docs dates returned as Unicode characters (e.g., \ue907) #2547

Open
@nick-youngblut

Description

Example code:

```python
import os

from google.auth import default
from google.oauth2 import service_account
from googleapiclient.discovery import build


def get_document_dates(doc_id, creds_file=None):
    scopes = ['https://www.googleapis.com/auth/documents.readonly']
    if creds_file and os.path.exists(creds_file):
        # Service-account key files are loaded via google.oauth2.service_account;
        # google.oauth2.credentials.Credentials (user OAuth) has no from_service_account_file.
        creds = service_account.Credentials.from_service_account_file(creds_file, scopes=scopes)
    else:
        creds, _project = default(scopes=scopes)

    # Build the Docs API service
    service = build('docs', 'v1', credentials=creds)

    # Get the document (only the body is needed)
    document = service.documents().get(
        documentId=doc_id,
        fields='body'
    ).execute()

    # Access the document's structural elements
    content = document.get('body', {}).get('content', [])

    # Print every paragraph element
    for element in content:
        if 'paragraph' in element:
            paragraph = element.get('paragraph')
            elements = paragraph.get('elements', [])

            for elem in elements:
                print(elem)
```
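Printing raw dicts makes it hard to see at a glance which kind of element each one is. A minimal pure-Python sketch that flattens `body.content` into `(startIndex, kind, payload)` tuples, so `textRun` and `richLink` elements can be told apart. The helper name `iter_paragraph_elements` and the tuple layout are hypothetical choices of ours, not part of the Docs API client; the sketch relies only on the `paragraph` / `elements` structure the example code already walks.

```python
from typing import Any, Iterator


def iter_paragraph_elements(
    body_content: list[dict[str, Any]],
) -> Iterator[tuple[int, str, Any]]:
    """Yield (startIndex, kind, payload) for every paragraph element.

    `kind` is the element's content key (e.g. "textRun", "richLink");
    `payload` is the value stored under that key.
    """
    for structural in body_content:
        paragraph = structural.get("paragraph")
        if paragraph is None:
            continue  # skip section breaks, tables, etc.
        for elem in paragraph.get("elements", []):
            # Everything except the index fields is the element's content.
            kinds = [k for k in elem if k not in ("startIndex", "endIndex")]
            kind = kinds[0] if kinds else "unknown"
            # startIndex may be omitted when it is 0, so default to 0.
            yield elem.get("startIndex", 0), kind, elem.get(kind)
```

With the output shown below, this would yield `textRun`, `richLink`, `textRun`, `textRun`, which makes it clear that the date is not a separate element kind but sits inside an ordinary text run.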

The first section of the doc:

[Screenshot: first section of the doc]

I want to parse the date shown there, Jan 13, 2025, via the Python API.

The first few elements printed:

```python
{'startIndex': 1, 'endIndex': 5, 'textRun': {'content': '\ue907 | ', 'textStyle': {}}}
{'startIndex': 5, 'endIndex': 6, 'richLink': {'richLinkId': 'kix.p3Xj3hkh7bXl', 'textStyle': {}, 'richLinkProperties': {'title': 'Asana Board New NGS Submissions', 'uri': 'https://www.google.com/calendar/event?eid=XXX'}}}
{'startIndex': 6, 'endIndex': 7, 'textRun': {'content': '\n', 'textStyle': {}}}
{'startIndex': 7, 'endIndex': 18, 'textRun': {'content': 'Attendees: ', 'textStyle': {}}}
```
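U+E907 falls in the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF), so it has no standard Unicode meaning. It looks like a stand-in for the date chip rather than an encoding of the date itself; that is our interpretation, not something the API output confirms. The hypothetical helper `find_private_use_chars` sketched here only locates such characters and their document indices in elements shaped like the printed ones; it cannot recover the date from them.

```python
from typing import Any

# BMP Private Use Area: code points with no standard meaning.
PUA_START, PUA_END = 0xE000, 0xF8FF


def find_private_use_chars(elements: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (document index, character) for each Private Use Area
    character found in textRun content."""
    hits = []
    for elem in elements:
        text_run = elem.get("textRun")
        if not text_run:
            continue  # richLink and other kinds carry no text content
        index = elem.get("startIndex", 0)
        for ch in text_run.get("content", ""):
            if PUA_START <= ord(ch) <= PUA_END:
                hits.append((index, ch))
            # Docs API indices count UTF-16 code units, so characters
            # outside the BMP (e.g. emoji) advance the index by 2.
            index += len(ch.encode("utf-16-le")) // 2
    return hits
```

Run on the four elements above, this would report a single hit at index 1: the `\ue907` placeholder.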

The date is returned in the first element only as the character \ue907 (U+E907, a Unicode Private Use Area code point). How can that be converted to a date?
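One possible workaround, untested against this document: export the doc as plain text through the Drive API v3 (`files().export` with `mimeType='text/plain'`) and parse the date string out of the result. This rests on two assumptions we have not verified: that the plain-text export renders the date chip as its displayed text (e.g. `Jan 13, 2025`), and that the credentials carry a Drive scope such as `https://www.googleapis.com/auth/drive.readonly`. The helpers `export_plain_text` and `find_dates`, and the regex, are hypothetical and only handle the `Mon D, YYYY` format.

```python
import re
from datetime import date, datetime

# Hypothetical pattern for dates displayed like "Jan 13, 2025".
DATE_PATTERN = re.compile(r"\b[A-Z][a-z]{2} \d{1,2}, \d{4}\b")


def export_plain_text(drive_service, doc_id: str) -> str:
    """Export a Google Doc as plain text.

    `drive_service` is assumed to be build('drive', 'v3', credentials=creds);
    export(...).execute() is assumed to return the file content as bytes.
    """
    data = drive_service.files().export(fileId=doc_id, mimeType="text/plain").execute()
    # "utf-8-sig" strips a leading byte-order mark if the export includes one.
    return data.decode("utf-8-sig")


def find_dates(text: str) -> list[date]:
    """Parse every 'Mon D, YYYY' match; skip matches that are not real dates."""
    found = []
    for match in DATE_PATTERN.finditer(text):
        try:
            # %b expects English month abbreviations under the default C locale.
            found.append(datetime.strptime(match.group(0), "%b %d, %Y").date())
        except ValueError:
            continue  # e.g. "Foo 1, 2025" matches the regex but is not a date
    return found
```

Usage would be along the lines of `find_dates(export_plain_text(build('drive', 'v3', credentials=creds), doc_id))`. The regex step is deliberately loose and defers validation to `strptime`, so false matches are dropped rather than raising.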

Note: the richLinkId in the second element belongs to a separate calendar-event element, not to the Jan 13, 2025 date element.

More generally, why are date elements returned as a Unicode placeholder character instead of something easier to work with?

Metadata

Assignees: No one assigned

Labels:
- priority: p2 (Moderately-important priority. Fix may not be included in next release.)
- type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
