Why is my distribution of tests unbalanced in the Queue Mode?
If you have slow test files that are a bottleneck, this could lead to one of the parallel jobs running tests for too long. You could split test files by test examples.
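To illustrate the bottleneck, here is a minimal sketch that models a shared queue where the next free job takes the next test. The durations, the job count, and the `finish_times` helper are hypothetical. This is a simplified model and not the actual Knapsack Pro implementation.

```python
import heapq

def finish_times(test_durations, job_count):
    """Give each test to whichever job becomes free first; return finish times."""
    # Each heap entry is (time the job becomes free, job index).
    free_at = [(0, job) for job in range(job_count)]
    heapq.heapify(free_at)
    finish = [0] * job_count
    for duration in test_durations:
        now, job = heapq.heappop(free_at)
        finish[job] = now + duration
        heapq.heappush(free_at, (finish[job], job))
    return finish

# Hypothetical suite: one slow 60-second test file plus six 10-second files.
whole_files = [60] + [10] * 6
# The same suite with the slow file split into six 10-second test examples.
split_by_examples = [10] * 12

print(finish_times(whole_files, job_count=3))        # [60, 30, 30]
print(finish_times(split_by_examples, job_count=3))  # [40, 40, 40]
```

In this model, the job that picks up the slow file finishes 30 seconds after the others. Once the file is split into smaller units, all three jobs finish together.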
When you retry a CI build and you use the KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true environment variable, you can see that parallel jobs sometimes do not finish work at a similar time.
When you use the Queue Mode, you most likely have set the environment variable KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true.
Thanks to that, when you retry an individual parallel job or retry the whole CI build (retry all the parallel jobs), the tests will be distributed the same way as during the first run of the CI build.
This allows users to retry individual parallel jobs and run the same set of tests as when tests were assigned to the job from the Queue API for the first time.
Your parallel jobs can run the Knapsack Pro command at different times:
- one of the parallel jobs could start work later because CI server resources were not available
- there could be a random delay before the Knapsack Pro command starts, for example when loading your project dependencies from the CI cache or when the database setup is slower on one of the parallel jobs. In short, the steps that happen before the Knapsack Pro command is run can take a different amount of time on each parallel job.
All of the above means that the Knapsack Pro commands can start work at different times across your parallel jobs during the very first CI build run. The Queue Mode mitigates this problem so that the jobs still finish at a similar time, but it also means some of the parallel jobs could run more tests than others.
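The effect of staggered start times can be sketched with a small simulation. It assumes a simplified queue that hands out one test file per request (the real Queue API may hand out tests in batches). The durations, delays, and the `run_queue` helper are hypothetical.

```python
import heapq

def run_queue(test_durations, start_delays):
    """Simulate a shared queue: the next free job takes the next test.

    Returns (tests assigned to each job, finish time of each job).
    """
    # Each heap entry is (time the job becomes free, job index).
    free_at = [(delay, job) for job, delay in enumerate(start_delays)]
    heapq.heapify(free_at)
    assigned = [[] for _ in start_delays]
    finish = list(start_delays)
    for duration in test_durations:
        now, job = heapq.heappop(free_at)
        assigned[job].append(duration)
        finish[job] = now + duration
        heapq.heappush(free_at, (finish[job], job))
    return assigned, finish

# Hypothetical suite: 12 test files, 10 seconds each.
tests = [10] * 12
# Job 0 starts immediately; job 1 starts 40 seconds late (e.g. a slow cache restore).
assigned, finish = run_queue(tests, start_delays=[0, 40])

for job, (files, done) in enumerate(zip(assigned, finish)):
    print(f"job {job}: {len(files)} test files, finishes at {done}s")
# job 0: 8 test files, finishes at 80s
# job 1: 4 test files, finishes at 80s
```

Both jobs finish at the same moment, but the job that started first ran twice as many test files.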
When you retry a CI build or a parallel job and you use KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true, you expect the same set of tests to run on the parallel jobs. This is possible thanks to the caching of the distribution of tests on the API side. Some of the parallel jobs may run more tests than others because this is how the tests were distributed during the first CI build run, before the distribution of tests was cached.
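As a rough illustration, the sketch below replays a cached split under different start delays. The split, the delays, and the `replay_fixed_split` helper are hypothetical numbers and names chosen for the example, not output from the Knapsack Pro API.

```python
def replay_fixed_split(cached_split, start_delays):
    """Each job re-runs exactly the tests it was assigned in the first run."""
    return [
        delay + sum(durations)
        for durations, delay in zip(cached_split, start_delays)
    ]

# Hypothetical split cached from a first run in which job 1 started 40s late:
# job 0 took eight 10-second test files, job 1 took four.
cached_split = [[10] * 8, [10] * 4]

first_run = replay_fixed_split(cached_split, start_delays=[0, 40])
retry = replay_fixed_split(cached_split, start_delays=[0, 0])

print("first run finish times:", first_run)  # [80, 80]
print("retry finish times:", retry)          # [80, 40]
```

The split that balanced the first run no longer balances the retry, because the delay that shaped it is gone while the cached assignment stays the same.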
If you would like to run a CI build in Queue Mode (without caching on the API side) to get the optimal CI build time for the retried CI build, you could set KNAPSACK_PRO_FIXED_QUEUE_SPLIT=false, but then you shouldn't retry individual jobs because this can lead to a bug. If you retry an individual job, a new queue will start on the API side, and the retried job would consume all tests from the queue.
Because of that, when you use KNAPSACK_PRO_FIXED_QUEUE_SPLIT=false, you must always retry all parallel jobs.
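The sketch below shows why a single retried job ends up with the whole suite when a new queue is started. It is a simplified model in which jobs take turns pulling one test at a time. The job names, durations, and the `consume_queue` helper are hypothetical.

```python
from collections import deque

def consume_queue(test_durations, job_names):
    """Jobs take turns pulling one test from a fresh queue until it is empty.

    Returns the total seconds of tests each job ended up running.
    """
    queue = deque(test_durations)
    totals = {name: 0 for name in job_names}
    while queue:
        for name in job_names:
            if not queue:
                break
            totals[name] += queue.popleft()
    return totals

# Hypothetical suite: 12 test files, 10 seconds each.
tests = [10] * 12

# Retrying all parallel jobs: the new queue is shared between them.
print(consume_queue(tests, ["job 0", "job 1"]))  # {'job 0': 60, 'job 1': 60}

# Retrying only one job: nobody else consumes the new queue.
print(consume_queue(tests, ["job 1"]))           # {'job 1': 120}
```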
There is a trade-off that comes down to this:
- (KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true) being able to retry individual parallel jobs (it preserves CI resources because you don't have to retry all parallel jobs)
- (KNAPSACK_PRO_FIXED_QUEUE_SPLIT=false) being able to run faster CI builds on retry, but this means all parallel jobs must be retried (and you need to ensure developers can't accidentally retry only one job: that job would run for a very long time executing the whole test suite, because no other job would be consuming the queue).