Hello, This patchset is an attempt at cleaning up ptrace and signal behaviors, especially the interaction between ptrace and group stop. There are quite some number of problems in the area and the current behavior is often racy, indeterministic and sometimes outright buggy. This patchset aims to clean up the muddy stuff, and define and implement a clear interaction between ptrace and job control. This patchset contains the following fourteen patches. 0001-signal-fix-SIGCONT-notification-code.patch 0002-freezer-fix-a-race-during-freezing-of-TASK_STOPPED-t.patch 0003-freezer-remove-superflous-try_to_freeze-loop-in-do_s.patch 0004-signal-don-t-notify-parent-if-not-stopping-after-tra.patch 0005-signal-fix-premature-completion-of-group-stop-when-i.patch 0006-signal-use-GROUP_STOP_PENDING-to-avoid-stopping-mult.patch 0007-ptrace-add-why-to-ptrace_stop.patch 0008-ptrace-make-do_signal_stop-use-ptrace_stop-if-the-ta.patch 0009-ptrace-clean-transitions-between-TASK_STOPPED-and-TR.patch 0010-ptrace-don-t-consume-group-count-from-ptrace_stop.patch 0011-ptrace-make-group-stop-notification-reliable-against.patch 0012-ptrace-reorganize-__ptrace_unlink-and-ptrace_untrace.patch 0013-ptrace-make-SIGCONT-notification-reliable-against-pt.patch 0014-ptrace-remove-the-extra-wake_up_process-from-ptrace_.patch 0001-0002 are self-contained fixes. 0003-0004 are preparation patches. 0005 fixes over-consumption of group_stop_count by ptraced tasks which can lead to premature completion of group stop with some of the tasks in the group still running. 0006 prevents a ptraced task from being stopped multiple times for the same group stop instance. 0007-0009 updates the code such that a ptracee always enters and leaves TASK_TRACED on its own. ptracer no longer changes tracee's state underneath it; instead, it tells the tracee to enter the target state. A TASK_TRACED task is guaranteed to be stopped inside ptrace_stop() after executing the arch hooks while TASK_STOPPED task is ...
After calling freeze_task(), try_to_freeze_tasks() see whether the
task is stopped or traced and if so, considers it to be frozen;
however, nothing guarantees that either the task being frozen sees
TIF_FREEZE or the freezer sees TASK_STOPPED -> TASK_RUNNING
transition. The task being frozen may wake up and not see TIF_FREEZE
while the freezer fails to notice the transition and believes the task
is still stopped.
This patch fixes the race by making freeze_task() always go through
fake_signal_wake_up() for applicable tasks. The function goes through
the target task's scheduler lock and thus guarantees that either the
target sees TIF_FREEZE or try_to_freeze_task() sees TASK_RUNNING.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
---
kernel/freezer.c | 9 +++++++--
kernel/power/process.c | 6 ++++++
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/freezer.c b/kernel/freezer.c
index bd1d42b..66ecd2e 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -104,8 +104,13 @@ bool freeze_task(struct task_struct *p, bool sig_only)
}
if (should_send_signal(p)) {
- if (!signal_pending(p))
- fake_signal_wake_up(p);
+ fake_signal_wake_up(p);
+ /*
+ * fake_signal_wake_up() goes through p's scheduler
+ * lock and guarantees that TASK_STOPPED/TRACED ->
+ * TASK_RUNNING transition can't race with task state
+ * testing in try_to_freeze_tasks().
+ */
} else if (sig_only) {
return false;
} else {
diff --git a/kernel/power/process.c b/kernel/power/process.c
index e50b4c1..eb2c88a 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -64,6 +64,12 @@ static int try_to_freeze_tasks(bool sig_only)
* perturb a task in TASK_STOPPED or TASK_TRACED.
* It is "frozen enough". If the task does wake
* up, it will immediately call try_to_freeze.
+ *
+ * Because freeze_task() goes through p's
+ * scheduler lock after setting ...Looks good. I'll take this one to my tree, if you don't mind. Thanks, --
do_signal_stop() is used only by get_signal_to_deliver() and after a
successful signal stop, it always calls try_to_freeze(), so the
try_to_freeze() loop around schedule() in do_signal_stop() is
superflous and confusing. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
---
kernel/signal.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index fe004b5..0a6816a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1781,9 +1781,7 @@ static int do_signal_stop(int signr)
}
/* Now we don't run again until woken by SIGCONT or SIGKILL */
- do {
- schedule();
- } while (try_to_freeze());
+ schedule();
tracehook_finish_jctl();
current->exit_code = 0;
--
1.7.1
--
Acked-by: Rafael J. Wysocki <rjw@sisk.pl> --
Now that group stop pending and consumed states are properly tracked
per task, there is no need to consume group_stop_count from
ptrace_stop() which is inaccurate.
The only behavior change is the increased likelihood of missing
notifications for group stops if the thread group contains one or more
ptraced tasks, but they never were reliable in the presence of ptraced
tasks. With this change, consume_group_stop() is called only if group
stop is pending. Make sure it is so by adding WARN_ON_ONCE().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
kernel/signal.c | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 8341667..c084ea8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -225,6 +225,8 @@ static inline void print_dropped_signal(int sig)
static bool consume_group_stop(void)
{
+ WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_PENDING));
+
if (!(current->group_stop & GROUP_STOP_CONSUME))
return false;
@@ -1657,13 +1659,6 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
return;
}
- /*
- * If there is a group stop in progress,
- * we must participate in the bookkeeping.
- */
- if (current->signal->group_stop_count > 0)
- consume_group_stop();
-
current->last_siginfo = info;
current->exit_code = exit_code;
--
1.7.1
--
task->signal->group_stop_count is used to tracke the progress of group
stop. It's initialized to the number of tasks which need to stop for
group stop to finish and each stopping or trapping task decrements.
However, each task doesn't keep track of whether it decremented the
counter or not and if woken up before the group stop is complete and
stops again, it can decrement the counter multiple times.
Please consider the following example code.
static void *worker(void *arg)
{
while (1) ;
return NULL;
}
int main(void)
{
pthread_t thread;
pid_t pid;
int i;
pid = fork();
if (!pid) {
for (i = 0; i < 5; i++)
pthread_create(&thread, NULL, worker, NULL);
while (1) ;
return 0;
}
ptrace(PTRACE_ATTACH, pid, NULL, NULL);
while (1) {
waitid(P_PID, pid, NULL, WSTOPPED);
ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
}
return 0;
}
The child creates five threads and the parent continuously traps the
first thread and whenever the child gets a signal, SIGSTOP is
delivered. If an external process sends SIGSTOP to the child, all
other threads in the process should reliably stop. However, due to
the above described bug, the first thread will often end up consuming
group_stop_count multiple times and SIGSTOP often ends up stopping
none or part of the other four threads.
This patch adds a new field task->group_stop which is protected by
siglock and uses GROUP_STOP_CONSUME flag to track which task is still
to consume group_stop_count to fix this bug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
include/linux/sched.h | 6 ++++++
kernel/signal.c | 32 +++++++++++++++++++++++++-------
2 files changed, 31 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c79e92..93157a4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1245,6 +1245,7 @@ struct task_struct ...Everything is fine without ptrace, I hope. I didn't even read the test-case ;) Yes, known problems. ptrace is very wrong when it comes to group_stop_count/SIGNAL_STOP_STOPPED/etc. Almost everything is wrong. Cough, this is fixed in utrace ;) It doesn't use ptrace_stop/ Yes, currently the tracee never knows how it should react to This doesn't look exactly right. If we participate (decrement the counter), we should stop even if we race with ptrace_detach(). And what if consume_group_stop() returns true? We should set SIGNAL_STOP_STOPPED and notify ->parent. Otherwise looks correct at first glance... Of course, there are more problems. To me, even ptrace_resume()->wake_up_process() is very wrt jctl. Every caller should check ->group_stop_count != 0. do_signal_stop() does this too in fact. May be it would be cleaner to move this check into consume_group_stop() and remove WARN_ON_ONCE(). This way it is more clear why prepare_signal(SIGCONT) do not reset task->group_stop, it has no effect unless ->group_stop_count is set by do_signal_stop() which updates ->group_stop for every thread. Probably consume_group_stop() should also set SIGNAL_STOP_STOPPED if it returns true. But, I didn't read the next patches yet... Oleg. --
Hey, Oleg. Yeah, I read about utrace in your previous posts but git grepping it gave me nothing. Ah, okay, it's out of tree patchset. Are you planning on merging it? Why not just fix up and extend ptrace? Is there some fundamental problem? I was thinking about adding PTRACE_SEIZE operation after this patchset which allows nesting and transparent operation (without the nasty implied SIGSTOP). As long as I can get that, I don't really mind whether it's p or utrace, but still am curious why we need something Yeah, exactly, both issues are dealt with later. I'm just fixing Yeap, it changes later. Thanks a lot for reviewing. -- tejun --
do_signal_stop() tests sig->group_stop_count one more time after
calling tracehook_notify_jctl() as it's allowed release siglock. If
group_stop_count has changed to zero, it no longer stops but still
notifies the parent. For both SIGCONT and KILL which could cause the
condition, this notification is unnecessary.
SIGCONT will be notified to the parent when the task calls
get_signal_to_deliver() right after returning from do_signal_stop()
which will handle the collapsed notification correctly by itself. The
notification from do_signal_stop() in this case would only cause
duplication. For SIGKILL, the imminent death of the task will be
notified to parent and it's completely superflous to report the
skipped stop.
Also, tracehook_notify_jctl() doesn't release siglock, so, currently,
none of these matters at all.
This patch updates do_signal_stop() such that it jumps out of the
function if group_stop_count has dropped during
tracehook_notify_jctl().
This doesn't cause any behavior difference as the condition never
triggers in the current code.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
kernel/signal.c | 21 +++++++++++++--------
1 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 0a6816a..6f7407d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1763,15 +1763,20 @@ static int do_signal_stop(int signr)
notify = tracehook_notify_jctl(notify, CLD_STOPPED);
/*
* tracehook_notify_jctl() can drop and reacquire siglock, so
- * we keep ->group_stop_count != 0 before the call. If SIGCONT
- * or SIGKILL comes in between ->group_stop_count == 0.
+ * we test ->group_stop_count again. If SIGCONT or SIGKILL
+ * comes in between, ->group_stop_count == 0.
*/
- if (sig->group_stop_count) {
- if (!--sig->group_stop_count)
- sig->flags = SIGNAL_STOP_STOPPED;
- current->exit_code = ...Only if tracehook_notify_jctl() returns nonzero. That was the
point, tracehook_ can ask to notify even if we are not going to
Yes. But with this patch it is not possible to drop siglock in
I don't understand the motivation then (probably I should read
the next patches).
If we kill the ability to drop ->siglock in tracehook_notify_jctl(),
we can simplify the code further. No need to start with
->group_stop_count = 1 and then decrement it again.
We could do something like
static int do_signal_stop(int signr)
{
struct signal_struct *sig = current->signal;
int notify;
if (sig->group_stop_count) {
--sig->group_stop_count;
} else {
struct task_struct *t;
if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
unlikely(signal_group_exit(sig)))
return 0;
/*
* There is no group stop already in progress.
* We must initiate one now.
*/
sig->group_exit_code = signr;
for (t = next_thread(current); t != current; t = next_thread(t))
/*
* Setting state to TASK_STOPPED for a group
* stop is always done with the siglock held,
* so this check has no races.
*/
if (!(t->flags & PF_EXITING) &&
!task_is_stopped_or_traced(t)) {
sig->group_stop_count++;
signal_wake_up(t, 0);
}
}
if (!sig->group_stop_count) {
sig->flags = SIGNAL_STOP_STOPPED;
current->exit_code = sig->group_exit_code;
__set_current_state(TASK_STOPPED);
notify = CLD_STOPPED;
}
spin_unlock_irq(&current->sighand->siglock);
if (notify || current->ptrace) {
read_lock(&tasklist_lock);
do_notify_parent_cldstop(current, notify);
read_unlock(&tasklist_lock);
}
/* Now we don't run again until woken by SIGCONT or SIGKILL */
schedule();
tracehook_finish_jctl();
current->exit_code = 0;
return 1;
}
In short, this patch dismisses tracehook_notify_jctl().
Roland?
Oleg.
--
Hello, Oleg. The thing is that with the further changes in the series tracehook_notify_jctl() becomes rather pointless as do_signal_stop() becomes more involved with the ptrace logic and bypasses tracehook_notify_jctl(), so I'm basically getting it out of the way. tracehook might not be such a bad idea for parts which are further away but for parts which are this close to ptrace implementation, it seems more of obfuscation than helpful layering. Especially, restrictions and capabilities of tracehooks without actual (even proposed) users aren't very healthy to keep. Other people have to keep guessing what the intentions behind the unused features are, which is very frustrating and silly. Anyways, we can deal with tracehooks later. Let's just ignore tracehook related changes for now. Thanks. -- tejun --
Currently task->signal->group_stop_count is used to decide whether to
stop for group stop. However, if there is a task in the group which
is taking a long time to stop, other tasks which are continued by
ptrace would repeatedly stop for the same group stop until the group
stop is complete.
This patch introduces GROUP_STOP_PENDING which tracks whether a task
is yet to stop for the group stop in progress. The flag is set when a
group stop starts and cleared when the task stops the first time for
the group stop, so the task won't stop multiple times for the same
group stop.
Note that currently GROUP_STOP_PENDING tracks the same state as
GROUP_STOP_CONSUME. Both always get set and cleared together without
releasing siglock inbetween. This will change with future patches.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
include/linux/sched.h | 1 +
kernel/signal.c | 21 +++++++++++++--------
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93157a4..1261993 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1760,6 +1760,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
/*
* task->group_stop flags
*/
+#define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */
#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */
#ifdef CONFIG_PREEMPT_RCU
diff --git a/kernel/signal.c b/kernel/signal.c
index 7e2da0d..b859690 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -732,6 +732,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
t = p;
do {
unsigned int state;
+
+ t->group_stop = 0;
+
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
/*
* If there is a handler for SIGCONT, we must make
@@ -1742,8 +1745,8 @@ static int do_signal_stop(int signr)
struct signal_struct *sig = ...I am stucked at this point ;) Yes. but the tracee won't abuse ->group_stop_count, this was fixed by the previous patch. But, otoh, what if debugger resumes the tracee when the group stop was completed by other sub-threads ? The tracee will run with GROUP_STOP_PENDING set. ->group_stop_count is zero. If this tracee recieves a signal (or spurious TIF_SIGPENDING), suddenly it will notice GROUP_STOP_PENDING and report the stop to debugger. This looks a bit strange. OK, perhaps it makes sense to report the stop to "ack" the group stop which wasn't acked in ptrace_stop(). Or, if it was untraced after resume, it makes sense to "silently" stop as well. Hmm. This means, the ptraced task can initiate the group stop while it is already in progress... Debugger can constantly resume a tracee while the group stop is not finished. Finally this tracee can dequeue SIGSTOP without GROUP_STOP_PENDING. At first glance, nothing bad can happen, but I am not sure. We can have other ptraced threads which were resumed after Yes. I tried to study this series patch-by-patch. I think I should read the whole series to really understand the intermediate changes. I'll try to return on Monday. Cough. I didn't expect I forgot this code that much ;) Oleg. --
Hello, Oleg. Well, then the tracee continues. What this patch does is making each task in a group to stop once for a single group stop instance. If ptracer decides to resume the tracee (w/o sending SIGCONT, that is), then it can do so and the tracee won't stop for the same group stop Yeah, of course. That's the tracee participating in the group stop. Oh, the tracee _should_ always have TIF_SIGPENDING set or be guaranteed to run get_signal_to_deliver(). I think there are traced points where that is not true. We probably need to set TIF_SIGPENDING Hmmm.... right. I think it is better to test for GROUP_STOP_PENDING there. That happens on delivery of a new stop signal, so semantic-wise, it's correct. Given the statelessness of group stop across STOP/CONT attempts, I think it should be okay. I'll think Heh, I thought adding transparent/nestable ptrace attach would take me several days; instead, understanding the code and producing this patchset took me two weeks filled with swearing. This is a truly hairy piece of code. :-) Thanks a lot for reviewing. -- tejun --
Currently task->signal->group_stop_count is used to decide whether to stop for group stop. However, if there is a task in the group which is taking a long time to stop, other tasks which are continued by ptrace would repeatedly stop for the same group stop until the group stop is complete. This patch introduces GROUP_STOP_PENDING which tracks whether a task is yet to stop for the group stop in progress. The flag is set when a group stop starts and cleared when the task stops the first time for the group stop, so the task won't stop multiple times for the same group stop. Note that currently GROUP_STOP_PENDING tracks the same state as GROUP_STOP_CONSUME. Both always get set and cleared together without releasing siglock inbetween. This will change with future patches. Oleg spotted that a task might skip signal check even when its GROUP_STOP_PENDING is set. Fixed by updating recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of group_stop_count. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> --- recalc_sigpending_tsk() updated to test GROUP_STOP_PENDING too. This makes sure a ptracee always checks it after woken up. Whether a task already in TASK_STOPPED should have GROUP_STOP_PENDING set or not is a different issue. I think it probably would be better to do so so that the ptracer is notified that the tracee is participating in group stop. The necessary change is simple, we just need to replace task_is_stopped_or_traced() in group stop init path to task_is_stopped(). With later patches, CLD_STOPPED/CONTINUED notifications are made reliable w.r.t. ptrace by delaying them until ptrace detach if necessary, so letting ptracees participate in group stop doesn't cost anything. Anyways, that's a separate issue. The git tree is updated accordingly. Thanks. include/linux/sched.h | 1 + kernel/signal.c | 23 ++++++++++++++--------- 2 files changed, 15 insertions(+), 9 ...
Yes. But, afaics, this is not enough. Say, what about zap_other_threads() ? We shouldn't allow sub-threads to stop in this case. Basically, every time we clear ->group_stop_count we should also reset ->group_stop for every thread. Fortunately, every time we already do while_each_thread(). Oleg. --
Hello, Oleg. Yes, indeed. Will update accordingly. Thanks. -- tejun --
To prepare for cleanup of the interaction between group stop and
ptrace, add @why to ptrace_stop(). Existing users are updateda such
that there is no behavior change.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/signal.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index b859690..3b3e69c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1633,7 +1633,7 @@ static int sigkill_pending(struct task_struct *tsk)
* If we actually decide not to stop at all because the tracer
* is gone, we keep current->exit_code unless clear_code.
*/
-static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
+static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
__releases(&current->sighand->siglock)
__acquires(&current->sighand->siglock)
{
@@ -1671,7 +1671,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
if (may_ptrace_stop()) {
- do_notify_parent_cldstop(current, CLD_TRAPPED);
+ do_notify_parent_cldstop(current, why);
/*
* Don't want to allow preemption here, because
* sys_ptrace() needs this task to be inactive.
@@ -1730,7 +1730,7 @@ void ptrace_notify(int exit_code)
/* Let the debugger run. */
spin_lock_irq(&current->sighand->siglock);
- ptrace_stop(exit_code, 1, &info);
+ ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
spin_unlock_irq(&current->sighand->siglock);
}
@@ -1826,7 +1826,7 @@ static int ptrace_signal(int signr, siginfo_t *info,
ptrace_signal_deliver(regs, cookie);
/* Let the debugger run. */
- ptrace_stop(signr, 0, info);
+ ptrace_stop(signr, CLD_TRAPPED, 0, info);
/* We're back. Did the debugger cancel the sig? */
signr = current->exit_code;
--
1.7.1
--
Group stop notifications are unreliable if one or more tasks of the
task group are being ptraced. If a ptraced task ends up finishing a
group stop, the notification is sent to the ptracer and the real
parent never gets notified.
This patch adds a new signal flag SIGNAL_NOTIFY_STOP which is set on
group stop completion and cleared after notification to the real
parent or together with other stopped flags on SIGCONT/KILL. This
guarantees that the real parent is notified correctly regardless of
ptrace. If a ptraced task is the last task to stop, the notification
is postponed till ptrace detach or canceled if SIGCONT/KILL is
received inbetween.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
include/linux/sched.h | 2 ++
kernel/signal.c | 20 +++++++++++++-------
2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e78b1e5..3e40761 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -653,6 +653,8 @@ struct signal_struct {
#define SIGNAL_UNKILLABLE 0x00000040 /* for init: ignore fatal signals */
+#define SIGNAL_NOTIFY_STOP 0x00000100 /* notify parent of group stop */
+
/* If true, all threads except ->group_exit_task have pending SIGKILL */
static inline int signal_group_exit(const struct signal_struct *sig)
{
diff --git a/kernel/signal.c b/kernel/signal.c
index c084ea8..f2da456 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1739,7 +1739,6 @@ void ptrace_notify(int exit_code)
static int do_signal_stop(int signr)
{
struct signal_struct *sig = current->signal;
- int notify = 0;
if (!(current->group_stop & GROUP_STOP_PENDING)) {
unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
@@ -1780,8 +1779,9 @@ static int do_signal_stop(int signr)
*/
if (sig->group_stop_count == 1 &&
(current->group_stop & GROUP_STOP_CONSUME))
- notify = CLD_STOPPED;
- notify = ...Yes. I do not even know if this is bug or not, but certainly I agree, This can race with ptrace_attach() in between. IOW, this notification can go to the new tracer anyway. Oleg. --
Hello, Hmmm, okay, we should hold both locks when checking for notification to remove that race, or we can just tell do_notify_parent_cldstop() which parent to use. -- tejun --
Currently, SIGCONT notifications which are pending on ptrace attach or
occur while ptraced are reported to the tracer and never make it to
the real parent.
This patch adds a new signal flag SIGNAL_NOTIFY_CONT which is set when
a task is woken up by SIGCONT and cleared once the event is notified
to the parent. SIGNAL_CLD_MASK bits are no longer cleared after
notification. Combined with clearing SIGNAL_CLD_MASK if
!SIGNAL_NOTIFY_CONT on ptrace attach, these bits are set on ptrace
detach iff the tracee owes a notification to the real parent.
__ptrace_unlink() is updated to check these bits and reschedule
SIGCONT notification if necessary.
This combined with the previous changes makes ptrace attach/detach
mostly transparent with respect to job control signal handling. The
remaining problems are the extra unconditional wake_up_process() from
ptrace_detach() and SIGSTOP generated by ptrace_attach() clearing
pending SIGCONT. These will be dealt with future patches.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
include/linux/sched.h | 1 +
kernel/ptrace.c | 40 +++++++++++++++++++++++++++++++++++++++-
kernel/signal.c | 14 +++++++++-----
3 files changed, 49 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3e40761..4b7f3ca 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -654,6 +654,7 @@ struct signal_struct {
#define SIGNAL_UNKILLABLE 0x00000040 /* for init: ignore fatal signals */
#define SIGNAL_NOTIFY_STOP 0x00000100 /* notify parent of group stop */
+#define SIGNAL_NOTIFY_CONT 0x00000200 /* notify parent of continuation */
/* If true, all threads except ->group_exit_task have pending SIGKILL */
static inline int signal_group_exit(const struct signal_struct *sig)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 71141bf..a6c92ac 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -52,6 +52,7 @@ ...Currently, if the task is STOPPED on ptrace attach, it's left alone and the state is silently changed to TRACED on the next ptrace call. The behavior breaks the assumption that arch_ptrace_stop() is called before any task is poked by ptrace operations and is ugly in that a task manipulates the state of another task directly. With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and TRACED can be made clean. The tracer can use the flag to tell the tracee to retry stop on attach and detach. On retry, the tracee will enter the desired state the correct way. The lower 16bits of task->group_stop is used to remember the signal number which caused the last group stop. This is used while retrying for ptrace attach as the original group_exit_code could have been consumed with wait(2) by then. As the real parent may wait(2) and consume the group_exit_code anytime, the group_exit_code needs to be saved separately so that it can be used when switching from regular sleep to ptrace_stop(). This is recorded in the lower 16bits of task->group_stop. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> --- include/linux/sched.h | 1 + kernel/ptrace.c | 27 +++++++++++++++++++-------- kernel/signal.c | 23 +++++++++++++++++------ 3 files changed, 37 insertions(+), 14 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 1261993..e78b1e5 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1760,6 +1760,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t * /* * task->group_stop flags */ +#define GROUP_STOP_SIGMASK 0xffff /* signr of the last group stop */ #define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */ #define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */ diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 99bbaa3..08b18f2 100644 --- a/kernel/ptrace.c +++ ...
This adds another user-visible change, this time it is more serious. OK. Now we have a window if the tracer attaches to the stopped task. Say, child = fork() if (!child) return child_do_something(); kill(child, SIGSTOP); wait(); // <--- ensures it is stopped ptrace(PTRACE_ATTACH, child); assert(ptrace(PTRACE_WHATEVER, child) == 0); Currently this code is correct. With this patch the assertion above can fail, the child may be running, changing its state from STOPPED This doesn't look right even without ptrace. Suppose we have to threads, T1 and T2, both stopped, SIGNAL_STOP_STOPPED is set. SIGCONT wakes them up. To simplify the discussion, suppose that T1 takes a long preemption when it returns from schedule(), right before it takes ->siglock again. T2 sends CLD_CONTINUED to parent and dequeues another SIGSTOP. It initiates another group stop, sees T1 as running, and stops with ->group_stop_count == 1. Now we are waiting for T1 which should participate. Finally T1 resumes and sees GROUP_STOP_PENDING. It goes to 'retry:' and stops, but nobody notifies the parent ang ->group_stop_count is still non-zero. Oleg. --
GDB now has code (as rewritten by Daniel Jacobowitz):
ptrace (PTRACE_ATTACH) has been done and:
linux_nat_post_attach_wait:
if (pid_is_stopped (pid)) ### it means `State' is `T (stopped)'
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LNPAW: Attaching to a stopped process\n");
/* The process is definitely stopped. It is in a job control
stop, unless the kernel predates the TASK_STOPPED /
TASK_TRACED distinction, in which case it might be in a
ptrace stop. Make sure it is in a ptrace stop; from there we
can kill it, signal it, et cetera.
First make sure there is a pending SIGSTOP. Since we are
already attached, the process can not transition from stopped
to running without a PTRACE_CONT; so we know this signal will
go into the queue. The SIGSTOP generated by PTRACE_ATTACH is
probably already in the queue (unless this kernel is old
enough to use TASK_STOPPED for ptrace stops); but since SIGSTOP
is not an RT signal, it can only be queued once. */
kill_lwp (pid, SIGSTOP);
/* Finally, resume the stopped process. This will deliver the SIGSTOP
(or a higher priority signal, just like normal PTRACE_ATTACH). */
ptrace (PTRACE_CONT, pid, 0, 0); ### line B
}
### point A
/* Make sure the initial process is stopped. The user-level threads
layer might want to poke around in the inferior, and that won't
work if things haven't stabilized yet. */
new_pid = my_waitpid (pid, &status, 0); ### this is in fact waitpid()
The problem is this code is already racy. At `point A' someone may
`kill (tracee, SIGSTOP)', `waitpid (tracee)' -- eats SIGSTOP, and then
my_waitpid() will hang because the <pid_is_stopped (pid)> block was not
executed to prevent it.
So if we are already discussing the ptrace race safety I would prefer some
kernel ptrace API suggestion how to safely ...Hello, Oleg. Hmmm... rather convoluted, but yeap. I can update the code to wait for stopped -> traced transition to complete, so that the transition Good point. I'll think about how the problem can be solved. Maybe the retry and pending states need to be separated. Thanks. -- tejun --
Hey, again. Ah, crap, I missed a comma between Rafael's and Pavel's email addresses. Sorry about that. Rafael, Pavel, the thread can be read from the following URL. Probably only patch 02 and 03 are of interest to you guys. http://thread.gmane.org/gmane.linux.kernel/1068420 Thanks. -- tejun --
