Re: [PATCH] audit: speedup for syscalls when auditing is disabled

Previous thread: [PATCH 1/4] regulator/wm831x-dcdc: fix potential NULL dereference by Axel Lin on Thursday, August 19, 2010 - 6:58 pm. (6 messages)

Next thread: Regression in 2.6.36-rc1 bisected to commit 9d8888c by Larry Finger on Thursday, August 19, 2010 - 7:26 pm. (3 messages)
From: Michael Neuling
Date: Thursday, August 19, 2010 - 7:13 pm

We found that when auditing is disabled using "auditctl -D", that
there's still a significant overhead when doing syscalls.  This overhead
is not present when a single never rule is inserted using "auditctl -a
task,never".  

Using Anton's null syscall microbenchmark from
http://ozlabs.org/~anton/junkcode/null_syscall.c we currently have on a
powerpc machine:

  # auditctl -D
  No rules
  # ./null_syscall
	  null_syscall:     739.03 cycles     100.00%
  # auditctl -a task,never
  # ./null_syscall
	  null_syscall:     204.63 cycles     100.00%

This doesn't seem right, as we'd hope that auditing would have the same
minimal impact when disabled via -D as when we have a single never rule.

The patch below creates a fast path when initialising a task.  If the
rules list for tasks is empty (the disabled -D option), we mark auditing
as disabled for this task.  

When this is applied, our null syscall benchmark improves in the
disabled case to match the single never rule case.

  # auditctl -D
  No rules
  # ./null_syscall
	  null_syscall:     204.62 cycles     100.00%
  # auditctl -a task,never
  # ./null_syscall
	  null_syscall:     204.63 cycles     100.00%

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
I'm not familiar with the auditing code/infrastructure so I may have
misunderstood something here

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 1b31c13..1cd6ec7 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -666,6 +666,11 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
 	enum audit_state   state;
 
 	rcu_read_lock();
+	/* Fast path.  If the list is empty, disable auditing */
+	if (list_empty(&audit_filter_list[AUDIT_FILTER_TASK])) {
+		rcu_read_unlock();
+		return AUDIT_DISABLED;
+	}
 	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_TASK], list) {
 		if (audit_filter_rules(tsk, &e->rule, NULL, NULL, &state)) {
 			if (state == ...
From: Eric Paris
Date: Monday, August 23, 2010 - 10:56 am

I don't think this works at all.  I don't see how syscall audit'ing can
work.  What if I have nothing in the AUDIT_FILTER_TASK list but I want
to audit all 'open(2)' syscalls?  This patch is going to leave the task
in the DISABLED state and we won't ever be able to match on the syscall
rules.


I wonder if you could get much back, in terms of performance, by moving
the
         context->dummy = !audit_n_rules;
line to the top and just returning if context->dummy == 1;

I'll play a bit, but I thought that was supposed to be a safe thing to
do....


--

From: Michael Neuling
Date: Monday, August 23, 2010 - 7:11 pm

Sorry my bad.  I'm not too familiar with the audit infrastructure.

On reflection, we might have a bug in audit_alloc though.  Currently we
have this:

  int audit_alloc(struct task_struct *tsk)
  {
	  <snip>
	  state = audit_filter_task(tsk, &key);
	  if (likely(state == AUDIT_DISABLED))
		  return 0;

	  <snip>
	  set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
	  return 0;
  }

This gets called on fork.  If we have "task,never" rule, we hit this
state == AUDIT_DISABLED path, return immediately and the tasks
TIF_SYSCALL_AUDIT flags doesn't get set.  On powerpc, we check
TIF_SYSCALL_AUDIT in asm on syscall entry to fast path not calling the
syscall audit code.

This seems wrong to me as a "never" _task_ audit rule shouldn't effect
_syscall_ auditing?  Is there some interaction between task and syscall

We get 668.09 cycles with this optimisation, so it comes down a bit, but
no where near if the auditing is disabled altogether.

Like I said above, powerpc has a fast path in asm on system call entry
to check the thread_info flags for if syscall auditing is disabled.  If
it's disabled, we don't call the audit code, hence why it's very fast in

Thanks!

Mikey

--

From: Eric Paris
Date: Monday, August 23, 2010 - 8:43 pm

I'm guessing it actually bypasses audit if the flag is not set?  So we
might have a bug, but i'd be surprised since I think we tested audit on

There are 3 states for a given task, I don't remember the names off the
top of my head, so I'll guess with: on, off, build.  'Build' is the
state most processes usually live in.  In this state we collect audit
information about the task during the whole syscall and then we might
(likely) throw that information away at syscall exit.

Some types of audit rule, which alter this state, can be checked at
either 'entry' or 'exit' (first rule wins) At syscall entry we only have
enough information (questionable if we even have enough information at
all but that's a different question) to filter based on the task.  You
can create rules that will audit all tasks, or in your case will
explicitly disable auditing for all tasks.

Normally a process would be in the default 'build' state after syscall
entry, we will collect information about the syscall, and then we will
check syscall rules at exit.  Once you explicitly say 'I do not want any

Clean that patch up and send it.  Sounds like a win no matter what else

Here's a new idea to think about with obvious tradeoffs.  What do you
think about doing a little bit of assembly rejiggering?

Add a new spot in the assembly which will call a function which will
check if audit_n_rules > 0 and if so will set TIF_SYSCALL_AUDIT and if
not will clear TIF_SYSCALL_AUDIT?  It might make things slightly worse
on systems which explictly disable audit and the flag would always be
clear on every task (like you did with the explicit rule) but I'm
guessing might be a win on systems with no rules which are wasting time
on the audit slow path.....

-Eric

--

From: Michael Neuling
Date: Monday, August 23, 2010 - 10:56 pm

ok...

audit: speedup audit_syscall_entry when there are zero rules

This creates a check at the start of audit_syscall_entry to see if there
are zero rules in the audit filter list.  If there are zero rules return
immediately.

This buys about ~10% on a null syscall on powerpc.

Signed-off-by: Michael Neuling <mikey@neuling.org>

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 1b31c13..bc0872b 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1579,6 +1579,9 @@ void audit_syscall_entry(int arch, int major,
 	if (unlikely(!context))
 		return;
 
+	context->dummy = !audit_n_rules;
+	if (context->dummy == 1)
+		return;
 	/*
 	 * This happens only on certain architectures that make system
 	 * calls in kernel_thread via the entry.S interface, instead of
@@ -1628,7 +1631,6 @@ void audit_syscall_entry(int arch, int major,
 	context->argv[3]    = a4;
 
 	state = context->state;
-	context->dummy = !audit_n_rules;
 	if (!context->dummy && state == AUDIT_BUILD_CONTEXT) {
 		context->prio = 0;
 		state = audit_filter_syscall(tsk, context, &audit_filter_list[AUDIT_FILTER_ENTRY]);

--

From: Michael Neuling
Date: Monday, August 23, 2010 - 10:56 pm

So on powerpc we have this in entry_64.S

#define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
#define _TIF_SYSCALL_T_OR_A (_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP)

	andi.	r11,r10,_TIF_SYSCALL_T_OR_A
	bne-	syscall_dotrace

and there is similar code in x86 in entry_64.S

#ifdef CONFIG_AUDITSYSCALL
	bt $TIF_SYSCALL_AUDIT,%edx
	jc sysret_audit
#endif

So I think other archs are broken too.

I think the bug is in the generic code though.  Shouldn't syscall

So does this mean that an "auditctl -a task,never" _should_ disable



This sounds good to me except for one thing...

If we set TIF_SYSCALL_AUDIT when audit_n_rules > 0, it'll change the
functionality from what we have now in the "auditctl -a task,never"
case.  Currently in this case, TIF_SYSCALL_AUDIT will not be set and
syscalls won't be audited.  I think this is a bug (as discussed above),
but I wanted to point it out anyway.

Anyway, I'll take a stab at this in a bit.

Regards,
Mikey
	
--

From: Eric Paris
Date: Tuesday, August 24, 2010 - 1:06 pm

Yes.  We only have one kind of auditing.  It audits what happens during
a syscall.  We have lots of different types of rules that tell the
kernel if we should actually send that info to userspace or not.  If we
hit a rule that says 'do not send anything to userspace' (like your
rule) the we should not send anything to userspace.  It does not matter
if some later rule would have sent it.  So given two rules like:

auditctl -a task,never
auditctl -a exit,always -S open

We hit the task,never rule first, so it wins.  We won't alloc a context,
we wont set the thread flag, we won't go slowly.  You can't get a task
out of this state.  If that task,never rule was removed and you created
a task it would get set up as 'build' and would at syscall exit match
the exit,always rule and the collected information would get dumped.
exit,* rules are a LOT more flexible than task,* rules (as exit,* rules
can match on things collected by the syscall whereas task,* rules can
only match on what we know at the beginning, pid, selinux context, uid,
etc.....)

My suggestion was to more liberaly clear the thread audit flag so we get
the assembly fastpath.  But this also requires that we set the flag when
appropriate.

-Eric

--

From: Michael Neuling
Date: Tuesday, August 24, 2010 - 8:11 pm

BTW, do you think we can do this in audit_syscall_exit() too?  

If I do, I get down to 387 cycles (739.03 vanilla, 668.09 with
audit_syscall_entry() optimisation, 204 best case) so about
another 50% perf improvement.

Patch was simply:

--- linux-next.orig/kernel/auditsc.c
+++ linux-next/kernel/auditsc.c
@@ -1681,7 +1683,7 @@ void audit_syscall_exit(int valid, long 
 
 	context = audit_get_context(tsk, valid, return_code);
 
-	if (likely(!context))
+	if (likely((!context) || (audit_n_rules == 0)))
 		return;
 
 	if (context->in_syscall && context->current_state == AUDIT_RECORD_CONTEXT)

--

From: Eric Paris
Date: Wednesday, August 25, 2010 - 4:59 am

No, I don't think that is safe, consider the case where we remove the
last rule while this task was inside a syscall.  It may have information
stored which is supposed to get freed in the syscall exit.

We could probably drop the if (!context) statement altogether and then
before the audit_get_context() call we could add

if (audit_dummy_context())
    return;

Which should be safe since that would imply there were no rules when we
entered the syscall.  I'm impressed with how much difference these 2
simple patches can make!

Looks like audit_get_context() could use some cleanups too.  What's the
point of setting the return code and crap like that when we know it's
not going anywhere.  In any case, if you want to clean up this last idea
and send it I'll make sure it gets queued up for the next go round.



--

From: Anton Blanchard
Date: Wednesday, August 25, 2010 - 8:34 pm

Hi Eric,

Here's another approach Mikey and I were discussing. We allocate the
tsk->audit_context as before, but we avoid setting the TIF_SYSCALL_AUDIT until
the first rule gets added.

We could look at clearing the flag when the rules go back to zero, but this
simple patch covers the most common case I think.

Anton
---

Index: powerpc.git/kernel/auditfilter.c
===================================================================
--- powerpc.git.orig/kernel/auditfilter.c	2010-08-26 08:04:19.998892577 +1000
+++ powerpc.git/kernel/auditfilter.c	2010-08-26 08:04:30.290374256 +1000
@@ -859,6 +859,21 @@ out:
 static u64 prio_low = ~0ULL/2;
 static u64 prio_high = ~0ULL/2 - 1;
 
+#ifdef CONFIG_AUDITSYSCALL
+static void enable_syscall_auditing(void)
+{
+	unsigned long flags;
+	struct task_struct *g, *t;
+
+	read_lock_irqsave(&tasklist_lock, flags);
+	do_each_thread(g, t) {
+		if (t->audit_context)
+			set_tsk_thread_flag(t, TIF_SYSCALL_AUDIT);
+	} while_each_thread(g, t);
+	read_unlock_irqrestore(&tasklist_lock, flags);
+}
+#endif
+
 /* Add rule to given filterlist if not a duplicate. */
 static inline int audit_add_rule(struct audit_entry *entry)
 {
@@ -922,9 +937,14 @@ static inline int audit_add_rule(struct
 		list_add_tail_rcu(&entry->list, list);
 	}
 #ifdef CONFIG_AUDITSYSCALL
-	if (!dont_count)
+	if (!dont_count) {
 		audit_n_rules++;
 
+		/* Did we add our first rule? */
+		if (audit_n_rules == 1)
+			enable_syscall_auditing();
+	}
+
 	if (!audit_match_signal(entry))
 		audit_signals++;
 #endif
Index: powerpc.git/kernel/auditsc.c
===================================================================
--- powerpc.git.orig/kernel/auditsc.c	2010-08-26 08:04:19.998892577 +1000
+++ powerpc.git/kernel/auditsc.c	2010-08-26 08:04:30.390388654 +1000
@@ -886,7 +886,10 @@ int audit_alloc(struct task_struct *tsk)
 	context->filterkey = key;
 
 	tsk->audit_context  = context;
-	set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+
+	/* We postpone setting the thread ...
From: Eric Paris
Date: Friday, August 27, 2010 - 10:49 am

It just dawned on me where we are going to have problems.  We have
things other than syscall filter rules that can cause us to want the
collected audit info.  Namely SELinux (or other LSM) denials.

Crap.

So the change in audit_alloc() should probably be conditionalized on
more than just audit_n_rules().  Not exactly sure what that is though.  

It might also make our syscall entry/exit speedups not as great of an
idea as I thought.  I need to look for other audit users to see how
these things are oging to affect them   :(

-Eric

--

From: Anton Blanchard
Date: Monday, August 23, 2010 - 7:16 pm

That's a good point. What if we went through and created an audit context
for each thread at the point where we add a rule to the audit subsystem?

That would make the common case where no one touches audit go fast. It's
only once you add a rule that you get the syscall entry/exit overhead of
audit.

Anton
--

From: Eric Paris
Date: Monday, August 23, 2010 - 8:51 pm

It might be viable but I certainly couldn't say.  I'm a bit scared of
allocating and attaching a struct audit_context to a running task as we
have no idea if that task is about to exit a syscall with an unfilled
out audit_context struct nor do I personally off the top of my head know
the task flag setting rules (especially when you want to set other
people's flags)

Maybe another approach would be to move audit_alloc() into the syscall
entry path rather than only at fork().

I think the best solution is going to be some better way of
setting/clearing TIF_SYSCALL_AUDIT. (notice the thing is never cleared,
ever, today)

-Eric

--

Previous thread: [PATCH 1/4] regulator/wm831x-dcdc: fix potential NULL dereference by Axel Lin on Thursday, August 19, 2010 - 6:58 pm. (6 messages)

Next thread: Regression in 2.6.36-rc1 bisected to commit 9d8888c by Larry Finger on Thursday, August 19, 2010 - 7:26 pm. (3 messages)