
Use pre-commit in CI as well to reduce deduplication #11211

Closed · VannTen opened this issue May 19, 2024 · 15 comments · Fixed by #11226
Labels: kind/feature (Categorizes issue or PR as related to a new feature.)

VannTen (Contributor) commented May 19, 2024

What would you like to be added

Switch some of our CI (all the tests mirroring our pre-commit hooks) to pre-commit.ci

Why is this needed

This would avoid duplication between .pre-commit-config.yaml and our .gitlab-ci/ files

It would also free some of our CI resources.
We should check that pre-commit.ci works correctly with the GitHub status API (prow relies on that to merge PRs), and we probably don't want the autofix PR stuff.
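For what it's worth, pre-commit.ci reads its settings from a ci: block at the top of .pre-commit-config.yaml, so turning off the autofix PRs should just be a matter of something like this (only a sketch, keys to be double-checked against the pre-commit.ci docs):

ci:
  autofix_prs: false            # don't open automatic "fix" PRs on contributor branches
  autoupdate_schedule: monthly  # optional: how often pre-commit.ci bumps hook revisions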

I don't know who can add the app to the repository, though. Are approvers also repository admins?
@yankay @mzaian @floryut

VannTen added the kind/feature label on May 19, 2024
yankay (Member) commented May 20, 2024

> What would you like to be added
>
> Switch some of our CI (all the tests mirroring our pre-commit hooks) to pre-commit.ci
>
> Why is this needed
>
> This would avoid duplication between .pre-commit-config.yaml and our .gitlab-ci/ files
>
> It would also free some of our CI resources. We should check that pre-commit.ci works correctly with the GitHub status API (prow relies on that to merge PRs), and we probably don't want the autofix PR stuff.
>
> I don't know who can add the app to the repository, though. Are approvers also repository admins? @yankay @mzaian @floryut

Hi @VannTen

Great Idea.

@floryut may have the permission :-)

ant31 (Contributor) commented May 21, 2024

I've added it to the project

See the linked PR (#11223).

Looks like pre-commit can be run from the CI to avoid the duplication mentioned above. To add in .gitlab-ci.yml:

build:
  image: python:3.8
  script:
    - pip install pre-commit
    - pip install .
    - pre-commit run --all-files   # run every configured hook against the whole repository

That way it can fail at an early stage and avoid triggering all the tests/builds when it isn't needed.
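If we do run it from GitLab CI, caching the pre-commit environments should keep that job fast; a rough sketch (cache key and paths are assumptions, not something tested on our runners):

pre-commit:
  image: python:3.8
  variables:
    PRE_COMMIT_HOME: ${CI_PROJECT_DIR}/.cache/pre-commit
  cache:
    key: pre-commit-${CI_COMMIT_REF_SLUG}
    paths:
      - .cache/pre-commit      # reuse the hook environments between pipelines
  script:
    - pip install pre-commit
    - pre-commit run --all-files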

VannTen (Contributor, Author) commented May 21, 2024 via email

ant31 (Contributor) commented May 21, 2024

> pre-commit in gitlab ci, it's not avoiding duplications?

Which duplications are you talking about, then?

VannTen (Contributor, Author) commented May 21, 2024

I mean if we run it both in GitLab CI and in pre-commit.ci.

(Though yeah, it would still deduplicate between .pre-commit-config.yaml and .gitlab-ci/, but we should probably not run it twice, I think?)

ant31 (Contributor) commented May 22, 2024

My understanding is that some of the tests are duplicated in .gitlab-ci/ and in .pre-commit-config.yaml.

For example, in .pre-commit-config.yaml:

  - repo: https://github.com/adrienverge/yamllint.git
    rev: v1.27.1
    hooks:
      - id: yamllint
        args: [--strict]

And in .gitlab-ci/lint.yaml:

---
yamllint:
  extends: .job
  stage: unit-tests
  tags: [light]
  variables:
    LANG: C.UTF-8
  script:
    - yamllint --strict .
  except: ['triggers', 'master']

Instead of having them duplicated, as you suggested, we can have all the .pre-commit-config.yaml hooks run as one or a few jobs in the current pipeline by using the command pre-commit run --all-files.
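If we'd rather keep a couple of separate jobs (say, to keep a heavy check in its own job), pre-commit can also run a single hook by id, so the job definitions stay thin either way. A sketch reusing the conventions of the existing lint.yaml (the split itself is just an illustration):

pre-commit-yamllint:
  extends: .job
  stage: unit-tests
  tags: [light]
  script:
    - pip install pre-commit
    - pre-commit run yamllint --all-files   # run only the yamllint hook
  except: ['triggers', 'master']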

VannTen (Contributor, Author) commented May 22, 2024

> Instead of having them duplicated, as you suggested, we can have all the .pre-commit-config.yaml hooks run as one or a few jobs in the current pipeline by using the command pre-commit run --all-files.

Yes, there are two different things we are talking about, I think:

  1. Don't duplicate .pre-commit-config.yaml into .gitlab-ci/.
  2. Choose whether we run the pre-commit hooks from our own CI (gitlab-ci), either as a single stage or with, for instance, a dynamic child pipeline (rough sketch below), OR in pre-commit.ci (via the GitHub app) and not in GitLab CI at all.

I think 1 is needed anyway, but for 2 I'm not sure what the preferred option is. pre-commit.ci is more rigid but we don't have to maintain it. We should also check that it blocks tide merges correctly.
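To make the child-pipeline variant of 2 a bit more concrete, a rough sketch (the generator script is hypothetical and the stage names are placeholders):

generate-precommit-pipeline:
  stage: build
  image: python:3.11
  script:
    # hypothetical helper that emits one GitLab job per hook id found in .pre-commit-config.yaml
    - python scripts/gen_precommit_jobs.py > precommit-pipeline.yml
  artifacts:
    paths: [precommit-pipeline.yml]

precommit:
  stage: unit-tests
  trigger:
    include:
      - artifact: precommit-pipeline.yml
        job: generate-precommit-pipeline
    strategy: depend   # the parent pipeline waits for the child pipeline result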

ant31 (Contributor) commented May 22, 2024

Yes, 100% with 1.
For 2, I tend to go for fewer systems rather than more. It would still be something to look at, configure, upgrade...
Another issue is that pre-commit.ci would not be part of the pipeline, so tests would move to the next stage regardless of its result.
Currently, the pipeline stops at the 'unit-tests' stage if any job is failing. We should keep doing that, as spinning up VMs and the other tests are much more expensive in resources.


VannTen (Contributor, Author) commented May 22, 2024 via email

ant31 (Contributor) commented May 22, 2024

> I actually think we should have fewer dependencies between jobs

We don't want to spend "money" on tests for a PR that doesn't pass unit tests and/or fails to deploy a simple setup.
It got to this configuration because we had too many parallel long-running jobs on every push, and that used too much.

How many pushes per PR on average? How many commits failed on early tests?
The pipeline is set up so that each successive stage uses more resources.

> If we could achieve that, it would mean faster pipelines and maximize resource utilization.

That's assuming resources are not utilized; they are.
Faster pipeline: yes in theory, but in practice no, once the waiting time for an available runner or VM is added.

Each job in each stage is configured to run in parallel when possible. Sometimes they don't run in parallel, because the limit has already been reached.

We have a fixed set of resources / $ to use.
That said, the pipeline can still be reorganized and improved, but an overall guideline is that it should not use much "more".

VannTen (Contributor, Author) commented May 23, 2024 via email

VannTen changed the title from "Switching our lint test to use pre-commit ci" to "Use pre-commit in CI as well to reduce deduplication" on May 23, 2024
ant31 (Contributor) commented May 23, 2024

Such changes were explored several years back (everything in a single stage) and rolled back due to too high usage, and at the time we had even more resources than now: #2325

> I don't want to use more resources, I want to use the same amount, but faster (in other words, maximize resource utilization). The cluster does not autoscale, right?

It does for some jobs, up to 12 machines. Otherwise, it's only one machine running it all, and it's already heavily used in terms of memory; of course there are low-usage periods in the day.
There are still various improvements possible. For example, the machines that are created on each push are 1. slow to get online and 2. expensive (we could have one more server for that cost).
Things would already be faster without them (the docker-machine jobs).

> auto-cancellation on re-push / fail (that'd need a rework of the provisioning so that GC is independent from gitlab CI jobs, I think. Maybe doable with ownerRefs in the cluster 🤔)

Last I checked, there's already a GC on schedule, so that's fine. And a new push on the same PR should cancel the previous pipeline (if not, it's misconfigured).

> Even better would be coscheduling (so that we only schedule a complete testcase inventory in the max-usage scenario). Since we now have it in Kubespray, that might be easier than I think.

^ not sure I understand this

ant31 (Contributor) commented May 23, 2024

@VannTen Once we remove the autoscaling part (all jobs with the tag "c3.small.x86"), things can be tested and remapped.
Worst case, it's reverted.
Happy to try a few stages and see.

VannTen (Contributor, Author) commented May 23, 2024 via email

ant31 (Contributor) commented May 23, 2024

> That's the docker-machine jobs?

Yes

> Coscheduling[1] would ensure that a set of pods are either all scheduled together or not at all (well, until more resources are freed). So it minimizes the impact of "pending" jobs.

sounds good
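(For context, with the scheduler-plugins coscheduling plugin that would roughly be one PodGroup per testcase plus a label on each pod; a sketch based on the upstream plugin's API, not verified against our cluster:)

apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: testcase-inventory        # one group per testcase (illustrative name)
spec:
  minMember: 4                    # e.g. the number of pods/VMs in the testcase: all scheduled together, or none

# each pod of the testcase then carries the matching label:
#   metadata:
#     labels:
#       scheduling.x-k8s.io/pod-group: testcase-inventory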

> It does not work completely, because some jobs are marked interruptible: false, so once they are started the pipeline does not cancel. I'm not sure if it's because it would leave stuff behind otherwise?

I don't know. I suggest removing interruptible: false and evaluating the situation.
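(For reference, that could be as small as flipping the default in .gitlab-ci.yml, assuming a recent enough GitLab and the project's "auto-cancel redundant pipelines" setting being on; otherwise it's set per job. Just a sketch:)

default:
  interruptible: true   # let a newer push on the same ref cancel still-running jobs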

> Yeah but the schedule is hourly, right?

Yes, and the VMs / docker-machines had an automatic teardown after 4h (I reduced it to 1h last week).

> What I was thinking about was having ownerRefs to the job Pod in the created KubeVirt objects, so once the job is over and the pod discarded, the KubeVirt stuff is always GC'ed by k8s immediately.

Yes, the idea was in the air at some point. Not sure why it wasn't implemented; it could be lack of time from maintainers or a blocker that I'm not aware of. Worth investigating again.
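To sketch the ownerRef idea (the kind and names are illustrative, and owner and dependent must live in the same namespace): the provisioning job's pod would own the KubeVirt objects, so deleting the pod cascades to them:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testcase-node-1
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: runner-job-pod        # the GitLab CI job pod doing the provisioning
      uid: <uid-of-that-pod>      # required; garbage collection keys off the UID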
