TFDV on Dataflow getting OOMs frequently #190

Open
@cyc

Description

I am using TFDV 1.2.0 and my workers consistently OOM on Dataflow, even with very large instance types (e.g. n2-highmem-16 and n2-highmem-32). I tried decreasing --number_of_worker_harness_threads to half the number of available CPUs in the hope of reducing memory usage, but it did not help. The OOMs typically happen at the very end of the job, when it has finished loading the data but is still combining the stats.
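For reference, the worker settings described above can be sketched as follows. This is a minimal illustration, not my exact launch script: `worker_flags` is a hypothetical helper, and it assumes the vCPU count is the trailing number in the machine type name (true for the n2-highmem types mentioned here).

```python
from typing import List


def worker_flags(machine_type: str, thread_fraction: float = 0.5) -> List[str]:
    """Build Dataflow worker flags with a reduced harness thread count.

    Hypothetical helper: assumes the machine type name ends in its vCPU
    count, e.g. "n2-highmem-16" has 16 vCPUs.
    """
    vcpus = int(machine_type.rsplit("-", 1)[1])
    # Run fewer harness threads than vCPUs, but never fewer than one.
    threads = max(1, int(vcpus * thread_fraction))
    return [
        "--runner=DataflowRunner",
        f"--worker_machine_type={machine_type}",
        f"--number_of_worker_harness_threads={threads}",
    ]


if __name__ == "__main__":
    for flag in worker_flags("n2-highmem-16"):
        print(flag)
```

The resulting list can be passed to Beam's `PipelineOptions` together with the usual project, region, and temp location flags.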

The dataset itself is quite large: 4000-5000 features across ~1e9 rows, and some of the variable-length (varlen) features are quite long. Do you have any tips for decreasing memory usage on workers? I suspect the issue comes from some high-cardinality string features (~1e8 distinct values), since computing frequency counts for them is probably very memory-intensive.
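To put a rough number on that suspicion, here is a back-of-envelope estimate of what an exact frequency table for one such feature would need. It is only a sketch: the average value size (32 bytes) and the per-entry overhead (16 bytes) are assumed values, not measurements, and it says nothing about how TFDV actually stores its intermediate state.

```python
def exact_count_bytes(distinct_values: int,
                      avg_value_bytes: int,
                      per_entry_overhead_bytes: int = 16) -> int:
    """Lower-bound memory for an exact value -> count table.

    Both avg_value_bytes and per_entry_overhead_bytes are assumptions
    chosen for illustration; real in-memory overhead is usually higher.
    """
    return distinct_values * (avg_value_bytes + per_entry_overhead_bytes)


if __name__ == "__main__":
    # ~1e8 distinct strings, assumed to average 32 bytes each.
    per_feature = exact_count_bytes(10**8, avg_value_bytes=32)
    print(f"{per_feature / 1e9:.1f} GB per high-cardinality feature")
```

Under these assumptions a single feature already needs several GB if its counts are held in one place, so a handful of such features being combined on the same worker could plausibly exhaust memory.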
