One scenario in which this bug occurs:
Prerequisites:
- using the execunix.c implementation
- exec_wait() may detect multiple finished jobs at the same time
  and call all of their make_closure() callbacks in sequence (see
  the sketch after this list)
- globs.max_jobs set to 2
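
A hedged sketch of that multiple-completion behaviour (simplified;
not the actual execunix.c code - the names job_slot, cmd_count and
on_done are illustrative only):

    /* One exec_wait() pass may reap several already finished children
     * and run their completion callbacks back to back.  This is what
     * lets make_closure(A) and make_closure(B) both push their states
     * before control returns to make1(). */
    #include <sys/types.h>
    #include <sys/wait.h>

    struct job_slot
    {
        pid_t pid;
        void (*on_done)( struct job_slot *, int exit_status );
    };

    static struct job_slot cmds[ 2 ];  /* globs.max_jobs == 2 */
    static int const cmd_count = 2;

    static void exec_wait_sketch( void )
    {
        int status;
        pid_t pid;
        while ( ( pid = waitpid( -1, &status, WNOHANG ) ) > 0 )
        {
            int i;
            for ( i = 0; i < cmd_count; ++i )
                if ( cmds[ i ].pid == pid && cmds[ i ].on_done )
                    cmds[ i ].on_done( &cmds[ i ], status );
        }
    }
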
Notes:
- <<< X. f() >>> marks a function call f() made at depth X
- [...] are additional comments about the current program state
- We use the following target dependency tree (X <-- Y meaning Y is a
dependency (source) for X):
A
B
C <-- B
D <-- B
E <-- A, B
<<< 1. make1() >>>
...
<<< 2. make1c(B) >>>
...
[stack: <empty>]
[running jobs: A & B]
<<< 3. exec_wait() >>>
- detects job A as failed (which leaves a partially built
target A)
- detects job B as completed successfully
<<< 4. make_closure(A) >>>
- pushes state MAKE1D(A)
<<< 4. make_closure(B) >>>
- pushes state MAKE1D(B)
[control returns to make1()]
[stack: MAKE1D(A), MAKE1D(B)]
[running jobs: <none>]
- goes on to process the next state on the stack - MAKE1D(B)
<<< 2. make1d(B) >>>
- morphs its state node into MAKE1C(B)
[control returns to make1()]
[stack: MAKE1D(A), MAKE1C(B)]
[running jobs: <none>]
- goes on to process the next state on the stack - MAKE1C(B)
<<< 2. make1c(B) >>>
- pops its state off the stack
- there are no more commands to execute for building B so that
target is considered built and we notify its parents C, D & E
about this by pushing MAKE1B(E), MAKE1B(D) & MAKE1B(C) states
[control returns to make1()]
[stack: MAKE1D(A), MAKE1B(E), MAKE1B(D), MAKE1B(C)]
[running jobs: <none>]
- goes on to process the next state on the stack - MAKE1B(C)
<<< 2. make1b(C) >>>
- there are no more dependencies target C has to wait for so it
goes on to build target C by morphing its state to MAKE1C(C)
[control returns to make1()]
[stack: MAKE1D(A), MAKE1B(E), MAKE1B(D), MAKE1C(C)]
[running jobs: <none>]
- goes on to process the next state on the stack - MAKE1C(C)
<<< 2. make1c(C) >>>
- pops its state off the stack
<<< 3. exec_cmd(C) >>>
- runs a new job for building target C
[control returns to make1()]
[stack: MAKE1D(A), MAKE1B(E), MAKE1B(D)]
[running jobs: C]
- goes on to process the next state on the stack - MAKE1B(D)
<<< 2. make1b(D) >>>
- there are no more dependencies target D has to wait for so it
goes on to build target D by morphing its state to MAKE1C(D)
[control returns to make1()]
[stack: MAKE1D(A), MAKE1B(E), MAKE1C(D)]
[running jobs: C]
- goes on to process the next state on the stack - MAKE1C(D)
<<< 2. make1c(D) >>>
- pops its state off the stack
<<< 3. exec_cmd(D) >>>
- runs a new job for building target D.
- since we are already at the globs.max_jobs limit - wait for
one of the jobs to complete
<<< 3. exec_wait() >>>
- detects job C as interrupted
<<< 4. make_closure(C) >>>
- raises the intr interrupt flag
- pushes state MAKE1D(C)
[control returns to make1()]
[stack: MAKE1D(A), MAKE1B(E), MAKE1D(C)]
[running jobs: D]
- goes on to process the next state on the stack - MAKE1D(C)
- intr is set - pops the MAKE1D(C) state from the stack
- as an unrelated issue, note that from this point on we access
  the MAKE1D(C) stack node information through 'released memory';
  this smells but happens to work as long as no new state gets
  pushed onto the stack, because released stack nodes are not
  actually freed - they are kept on an internal 'free list' to be
  reused by later pushes (see the free-list sketch after this
  scenario)
<<< 2. make1d(C) >>>
- morphs its (already popped!) state node into MAKE1C(C)
[control returns to make1()]
[stack: MAKE1D(A), MAKE1B(E)]
[running jobs: D]
- goes on to process the next state on the stack - MAKE1B(E)
- intr is set - pops the MAKE1B(E) state from the stack
<<< 2. make1b(E) >>>
- target E still has an unbuilt dependency (A), so it only wants
  to pop its own state off the stack; but since that state has
  already been popped, it actually pops off the next state -
  MAKE1D(A)
[control returns to make1()]
[stack: <empty>]
[running jobs: D]
...
As the last shown state illustrates, the MAKE1D(A) state got popped prematurely and never got processed - which is why the partially built target A never gets removed.
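
A minimal C sketch of the double pop at the heart of this scenario
(illustrative only - push_state()/pop_state() here merely mimic the
make1() state stack, this is not the actual make1.c code):

    /* The driver pops the top state when intr is set, and the state's
     * own handler then pops again, silently removing an unrelated
     * entry - here MAKE1D(A). */
    #include <assert.h>
    #include <stdlib.h>

    struct state
    {
        char const * label;   /* e.g. "MAKE1D(A)" */
        struct state * prev;
    };

    static struct state * stack;

    static void push_state( char const * label )
    {
        struct state * s = malloc( sizeof( *s ) );
        s->label = label;
        s->prev = stack;
        stack = s;
    }

    static struct state * pop_state( void )
    {
        struct state * s = stack;
        if ( s ) stack = s->prev;
        return s;
    }

    int main( void )
    {
        push_state( "MAKE1D(A)" );  /* cleanup for the failed target A */
        push_state( "MAKE1B(E)" );

        /* intr is set: the driver pops MAKE1B(E) before dispatching
         * to its handler ... */
        struct state * e = pop_state();

        /* ... and the handler, seeing an unbuilt dependency, pops
         * 'its' state once more - which now removes MAKE1D(A). */
        struct state * lost = pop_state();

        /* MAKE1D(A) is gone without ever being processed, so the
         * partially built target A never gets removed. */
        assert( stack == NULL );
        free( e );
        free( lost );
        return 0;
    }
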
There are several issues present in this scenario and fixing any of them would solve the problem at hand. This commit just picks 'the easiest one'.
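
As for the 'released memory' aside in the scenario above, here is a
hedged sketch of the free-list pattern it refers to (node_alloc() and
node_release() are illustrative names, not the engine's actual
allocator):

    /* 'Releasing' a node only threads it onto a free list, so its
     * payload stays readable until a later allocation hands the same
     * node out again - which is why reading a just-popped state node
     * appears to work right up until another state gets pushed. */
    #include <stdlib.h>

    struct node
    {
        int value;            /* stands in for the state payload */
        struct node * next;   /* free-list link */
    };

    static struct node * free_list;

    static struct node * node_alloc( void )
    {
        if ( free_list )
        {
            struct node * n = free_list;  /* reuse a released node */
            free_list = n->next;
            return n;
        }
        return malloc( sizeof( struct node ) );
    }

    static void node_release( struct node * n )
    {
        /* Not returned to the system - the node keeps its payload
         * until node_alloc() recycles it. */
        n->next = free_list;
        free_list = n;
    }

    int main( void )
    {
        struct node * a = node_alloc();
        a->value = 42;
        node_release( a );
        /* a->value still reads 42 here ... */
        {
            struct node * b = node_alloc();  /* ... but b == a, so any */
            b->value = 0;                    /* new push clobbers it   */
        }
        return 0;
    }
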
[SVN r79412]
BUG: May cause access violations (crashes, core dumps or other undefined behaviour) by dereferencing a NULL pointer on non-Windows builds.
[SVN r79311]
command status when the max buffer size is not
unlimited and the buffer is full. Change the character
before the buffer's null terminator to be a newline so
the command status appears on its own line.
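
A minimal sketch of the described fix, assuming a fixed-size capture
buffer (the names buf and buf_size are hypothetical, not the engine's
actual variables):

    /* When the captured command output has filled the whole buffer,
     * force the character just before the terminating '\0' to be a
     * newline so the status line printed afterwards starts on a line
     * of its own. */
    #include <string.h>

    static void end_full_buffer_with_newline( char * buf,
        size_t buf_size )
    {
        size_t const len = strlen( buf );
        if ( buf_size > 1 && len == buf_size - 1 )  /* buffer is full */
            buf[ len - 1 ] = '\n';  /* last char before the '\0' */
    }
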
[SVN r79184]
If the -p option value 0 is specified (the default), the child's stdout & stderr output streams are both collected into a single pipe and sent, merged, to the build process's stdout.
If any other -p option value is specified, the child's stdout & stderr output streams are collected separately and redirected based on the -p parameter value (see the sketch after this list):
1 - stdout to stdout, stderr forgotten
2 - stdout forgotten, stderr to stderr
3 - stdout to stdout, stderr to stderr.
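
A hedged sketch of how a POSIX implementation of the above might wire
the forked child's descriptors (redirect_child_output(), out_pipe_wr
and err_pipe_wr are hypothetical names; this is not the engine's
actual code and error handling is omitted):

    #include <unistd.h>

    /* mode is the -p value; out_pipe_wr/err_pipe_wr are the write
     * ends of pipes whose read ends the build process drains into its
     * own stdout/stderr. */
    static void redirect_child_output( int mode, int out_pipe_wr,
        int err_pipe_wr )
    {
        if ( mode == 0 )
        {
            /* 0: merge both streams into a single pipe; the build
             * process forwards it to its stdout. */
            dup2( out_pipe_wr, STDOUT_FILENO );
            dup2( out_pipe_wr, STDERR_FILENO );
        }
        else
        {
            /* 1: keep stdout, forget stderr
             * 2: forget stdout, keep stderr
             * 3: keep both, routed separately */
            if ( mode == 1 || mode == 3 )
                dup2( out_pipe_wr, STDOUT_FILENO );
            if ( mode == 2 || mode == 3 )
                dup2( err_pipe_wr, STDERR_FILENO );
            /* a 'forgotten' stream would be pointed at /dev/null by
             * the caller; that detail is left out here */
        }
    }
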
[SVN r79123]