CI: enable parallel testing on arm64 build #36719 #38905

Closed
wants to merge 14 commits into from

Conversation

fangchenli
Member

@fangchenli fangchenli commented Jan 2, 2021

@fangchenli
Member Author

It took about 19 min to set up the environment and cache...

@jreback
Contributor

jreback commented Jan 2, 2021

make sure it's actually using the workers in setup.py (-j)

and u can use 4
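
A minimal sketch of how the build step could pick its `-j` value, assuming the `python setup.py build_ext -j4` form used in the Travis config further down this thread. The helper name `build_ext_command` is hypothetical and not part of pandas; it only caps the requested job count at the number of CPUs the runner reports.

```python
import os


def build_ext_command(requested_jobs=4):
    """Return a build_ext command line with a capped -j value.

    Hypothetical helper: the job count never exceeds the CPUs reported
    by the machine and never drops below 1.
    """
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    jobs = max(1, min(requested_jobs, available))
    return ["python", "setup.py", "build_ext", f"-j{jobs}"]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_ext_command(4)))
```

Printing the command in the CI log is one way to confirm that the requested worker count is what the build actually receives.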

@jreback
Contributor

jreback commented Jan 2, 2021

looks like it completed (with some failure)

mark the longest tests with arm_slow and try more workers
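
One way the `arm_slow` idea could be wired into test selection, as a sketch only. It assumes an `arm_slow` marker has been registered and applied to the longest tests, and it reuses the `-m` expression from the Travis config in this thread. The function name `marker_expression` is hypothetical.

```python
import platform

# Marker expression taken from the pytest invocation in the Travis config.
BASE_MARKERS = "not slow and not network and not clipboard"

# Machine strings commonly reported on 64-bit ARM (Linux and macOS respectively).
ARM64_MACHINES = {"aarch64", "arm64"}


def marker_expression(machine=None):
    """Build the value for pytest's -m option.

    On arm64 the (assumed) arm_slow marker is excluded as well;
    on every other architecture the base expression is returned unchanged.
    """
    machine = (machine or platform.machine()).lower()
    if machine in ARM64_MACHINES:
        return BASE_MARKERS + " and not arm_slow"
    return BASE_MARKERS


if __name__ == "__main__":
    print(marker_expression())
```

The result would be passed as `pytest -m "<expression>" pandas`, so only the arm64 build deselects the tests marked `arm_slow`.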

@fangchenli
Member Author

I did a simple test on my Travis account.

os: linux

dist: bionic

language: python

python: 3.8

arch:
  - arm64
  - arm64-graviton2

addons:
  apt:
    packages:
      - libsnappy-dev

branches:
  only:
    - main

before_install:
  - |
    git clone https://github.com/pandas-dev/pandas.git
    cd pandas
install:
  - python -c 'import os,sys,fcntl; flags = fcntl.fcntl(sys.stdout, fcntl.F_GETFL); fcntl.fcntl(sys.stdout, fcntl.F_SETFL, flags&~os.O_NONBLOCK);'
  - python -m pip install --no-deps -U pip wheel setuptools
  - python -m pip install cython numpy python-dateutil pytz pytest pytest-xdist hypothesis
  - python setup.py build_ext -j4
  - python -m pip install -e . --no-build-isolation --no-use-pep517

script:
  - pytest -n 4 -m 'not slow and not network and not clipboard' pandas --junitxml=test-data.xml

It only took about 7 min to set up and 21 min to run the tests. Let's see whether using 4 cores cuts the test time to the 20 min range.

@jreback
Contributor

jreback commented Jan 2, 2021

let's start by limiting the directories it is running

we do this for the windows builds (for reference)

@fangchenli
Member Author

let's start by limiting the directories it is running

we do this for the windows builds (for reference)

I did some tests locally. The setting that affects the runtime the most is the distribution algorithm. We were using loadfile, which groups tests by file; it's extremely slow on the arm machine. I changed it to the default value, no, which assigns tests to workers one by one. Now the test run took only 1131.13s.
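
To illustrate why grouping by file can hurt when a few files hold most of the work, here is a toy model. It is not pytest-xdist's actual scheduler: it simply hands each unit of work, in order, to whichever worker frees up first. The file names and durations are made up for the example.

```python
import heapq


def makespan(chunks, workers):
    """Wall-clock time when each chunk goes to the worker that is free first."""
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for cost in chunks:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical per-test durations in seconds, keyed by test file.
tests = {
    "test_a.py": [30.0, 30.0, 30.0, 30.0],  # one file holds most of the work
    "test_b.py": [5.0, 5.0],
    "test_c.py": [5.0, 5.0],
    "test_d.py": [5.0, 5.0],
}

# Grouped by file: a whole file is one indivisible chunk.
by_file = [sum(durations) for durations in tests.values()]
# One by one: every test is its own chunk.
by_test = [d for durations in tests.values() for d in durations]

file_time = makespan(by_file, workers=4)
test_time = makespan(by_test, workers=4)

if __name__ == "__main__":
    print(f"grouped by file: {file_time:.0f}s, one by one: {test_time:.0f}s")
```

In this made-up case the grouped run is bounded by the single heavy file (120s), while handing out tests individually spreads that file across all four workers (40s). Real runs also pay per-test dispatch overhead, which this sketch ignores.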

@jreback
Contributor

jreback commented Jan 3, 2021

wow this looks great

ok can u open an issue for the failing arm tests themselves, xfail them and then make this build a required one

@jreback jreback added the CI Continuous Integration label Jan 3, 2021
@jreback jreback added this to the 1.3 milestone Jan 3, 2021
@jreback jreback added the ARM aarch64 architecture label Jan 3, 2021
@fangchenli
Member Author

wow this looks great

ok can u open an issue for the failing arm tests themselves, xfail them and then make this build a required one

I xfailed those tests, but they still failed on CI. I tested them with py38 and numpydev on an M1 MBP. The rolling test passed without xfail, but the other two tests passed with xfail.

@jreback
Contributor

jreback commented Jan 3, 2021

hmm do we need to disable the cache for arm?

Contributor

@jreback jreback left a comment

lgtm ping on green

@jreback
Contributor

jreback commented Jan 4, 2021

/azp run

@azure-pipelines
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@fangchenli fangchenli changed the title CI: test 2 workers on arm64 #36719 CI: enable parallel testing on arm64 build #36719 Jan 4, 2021
@azure-pipelines
Contributor

Commenter does not have sufficient privileges for PR 38905 in repo pandas-dev/pandas

@fangchenli
Member Author

The test failure in the py37 macOS build is caused by --dist=no.

@jreback
Contributor

jreback commented Jan 4, 2021

note that travis is currently not running at all :-< working on a credit issue there

@jreback
Contributor

jreback commented Jan 24, 2021

let's close for now as travis does not look like it's coming back

@jreback jreback closed this Jan 24, 2021
@fangchenli fangchenli deleted the multicore-arm64 branch March 18, 2021 16:42
Labels
ARM aarch64 architecture CI Continuous Integration
Development

Successfully merging this pull request may close these issues.

CI: arm Travis build timing out
2 participants