open in xarray without dask? #138

Closed
@rsignell

I have a kerchunked dataset that loads in about 20s if I use Dask, and about 1s if I don't:

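import xarray as xr
# Import the submodule explicitly; `import fsspec` alone may not load it
from fsspec.implementations.reference import ReferenceFileSystem

combined_parquet_aws = 's3://usgs-coawst/useast-archive/combined.parq'

# Lazily read the kerchunk references stored as Parquet on S3
fs_ref = ReferenceFileSystem(
    combined_parquet_aws, remote_protocol="s3", target_protocol="s3", lazy=True)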


# Method 1 (with Dask) -- takes 15-30s:
ds = xr.open_dataset(
    fs_ref.get_mapper(), engine="zarr", drop_variables=['dstart'],
    backend_kwargs={"consolidated": False}, chunks={})

# Method 2 (no Dask) -- takes 1-3s:
ds = xr.open_dataset(
    fs_ref.get_mapper(), engine="zarr", drop_variables=['dstart'],
    backend_kwargs={"consolidated": False})
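
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two timings could be measured. `time_call` and `compare_open_times` are hypothetical helpers written for this illustration, not part of xarray, fsspec, or Intake. The order of the calls may matter, since the first open can warm the reference cache.

```python
import time
from typing import Any, Callable, Dict, Tuple


def time_call(fn: Callable[[], Any]) -> Tuple[float, Any]:
    """Run fn once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


def compare_open_times(
    open_with_dask: Callable[[], Any],
    open_without_dask: Callable[[], Any],
) -> Dict[str, float]:
    """Time each opener once and return the elapsed seconds for both."""
    t_dask, _ = time_call(open_with_dask)
    t_plain, _ = time_call(open_without_dask)
    return {"with_dask": t_dask, "without_dask": t_plain}


# Assumed usage with the fs_ref defined above (not executed here):
#
# common = dict(engine="zarr", drop_variables=["dstart"],
#               backend_kwargs={"consolidated": False})
# times = compare_open_times(
#     lambda: xr.open_dataset(fs_ref.get_mapper(), chunks={}, **common),
#     lambda: xr.open_dataset(fs_ref.get_mapper(), **common),
# )
# print(times)
```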

When I want to use Intake to open into Xarray, I have always used to_dask() (Method 1):

import intake
intake_catalog_url = 's3://usgs-coawst/useast_archive/coawst_useast.yml'
cat = intake.open_catalog(intake_catalog_url)
coawst = cat['COAWST_USEAST_Archive']
ds = coawst.to_dask() 

I tried .to_chunked(), and it took the same amount of time as .to_dask().

How can I specify Method 2 using Intake (and get the datasets opening in a few seconds instead of 15-30)?
