I have a kerchunked dataset that loads in about 20s if I use Dask, and about 1s if I don't:
import fsspec
import xarray as xr

combined_parquet_aws = 's3://usgs-coawst/useast-archive/combined.parq'

fs_ref = fsspec.implementations.reference.ReferenceFileSystem(
    combined_parquet_aws, remote_protocol="s3", target_protocol="s3", lazy=True)

# Method 1 (with Dask) -- takes 15-30s:
ds = xr.open_dataset(
    fs_ref.get_mapper(), engine="zarr", drop_variables=['dstart'],
    backend_kwargs={"consolidated": False}, chunks={})

# Method 2 (no Dask) -- takes 1-3s:
ds = xr.open_dataset(
    fs_ref.get_mapper(), engine="zarr", drop_variables=['dstart'],
    backend_kwargs={"consolidated": False})
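In case it helps to reproduce the comparison, here is a minimal timing sketch (reusing fs_ref from above; wall-clock times will of course vary with network latency and caching):

import time

# chunks={} -> Dask-backed open (Method 1); chunks=None -> plain NumPy-backed open (Method 2)
for label, chunks in [("with Dask", {}), ("no Dask", None)]:
    t0 = time.perf_counter()
    ds = xr.open_dataset(
        fs_ref.get_mapper(), engine="zarr", drop_variables=['dstart'],
        backend_kwargs={"consolidated": False}, chunks=chunks)
    print(f"{label}: {time.perf_counter() - t0:.1f} s")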
When I open this dataset into Xarray via Intake, I have always used to_dask() (which corresponds to Method 1):
import intake
intake_catalog_url = 's3://usgs-coawst/useast_archive/coawst_useast.yml'
cat = intake.open_catalog(intake_catalog_url)
coawst = cat['COAWST_USEAST_Archive']
ds = coawst.to_dask()
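For what it's worth, the entry can be inspected to see which arguments (including chunks) end up being passed to the driver, and Method 2 can be replicated by hand with the same references, bypassing Intake entirely. This is just a sketch using describe() from the Intake source API and reusing combined_parquet_aws from the first snippet:

# Show the driver and arguments the catalog entry resolves to
print(coawst.describe())

# Manual bypass: open the same references with no chunks argument,
# exactly as in Method 2 above (no Dask involved)
import fsspec
import xarray as xr

fs_ref = fsspec.implementations.reference.ReferenceFileSystem(
    combined_parquet_aws, remote_protocol="s3", target_protocol="s3", lazy=True)
ds = xr.open_dataset(
    fs_ref.get_mapper(), engine="zarr", drop_variables=['dstart'],
    backend_kwargs={"consolidated": False})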
I tried .to_chunked(), and it took the same amount of time as .to_dask().
How can I specify Method 2 using Intake (and get the datasets opening in a few seconds instead of 15-30!)?
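In other words, I'm hoping for something roughly like the sketch below. This is hypothetical: I'm assuming that keyword arguments passed when instantiating the catalog entry can override the driver's chunks argument, which may or may not be how this catalog is set up.

# Hypothetical: override the driver's chunks argument at open time so the
# source opens without Dask (i.e., Method 2). to_dask() is just the usual
# accessor; with chunks=None it would presumably return a plain xarray.Dataset.
coawst_nodask = cat['COAWST_USEAST_Archive'](chunks=None)
ds = coawst_nodask.to_dask()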