Re: [PATCH 0/3] tracing/filters: Support specifying filter hook to a TRACE_EVENT field

Previous thread: [git pull] Input updates for 2.6.31-rc5 by Dmitry Torokhov on Wednesday, August 5, 2009 - 10:24 pm. (1 message)

Next thread: [perf/rc5] oops running perf record -s with multiple -e by Brice Goglin on Wednesday, August 5, 2009 - 11:26 pm. (6 messages)
From: Li Zefan
Date: Wednesday, August 5, 2009 - 11:05 pm

Currently only static strings and dynamic strings have their
own filter functions, other types of field are treated as
integers.

This patchset allows assigning a specific filter type to a
field, so a field which is defined as:

	__field_ext(const char *, str, FILTER_PTR_STR)

will be treated as a string but not a plain pointer, and then
we can set the filter like this:

	# echo 'str == foo' > filter

And it's easy to add more filter functions for different types
to turn these into valid operations:

	(dev is of type dev_t)
	# echo 'dev == 8:0' > filter

	(callsite is of type void * or unsigned long)
	# echo 'callsite == skb_free' > filter

[PATCH 1/3] tracing/filters: Add filter_type to struct ftrace_event_field
[PATCH 2/3] tracing/filters: Add __field_ext() to TRACE_EVENT
[PATCH 3/3] tracing/filters: Support filtering for char * strings
---
 include/linux/ftrace_event.h       |   12 +++++++-
 include/trace/ftrace.h             |   28 ++++++++++++++++----
 kernel/trace/trace.h               |    2 +
 kernel/trace/trace_events.c        |    9 ++++++-
 kernel/trace/trace_events_filter.c |   47 +++++++++++++++++++++++++----------
 kernel/trace/trace_export.c        |    7 +++--
 6 files changed, 79 insertions(+), 26 deletions(-)

--

From: Li Zefan
Date: Wednesday, August 5, 2009 - 11:06 pm

The type of a field is stored as a string in @type, and here
we add @filter_type which is an enum value.

This prepares for later patches, so we can specifically assign
different @filter_type for the same @type.

For example normally a "char *" field is treated as a ptr,
but we may want it to be treated as a string when doing filting.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/trace/trace.h               |    2 ++
 kernel/trace/trace_events.c        |    2 ++
 kernel/trace/trace_events_filter.c |   21 +++++++++++++--------
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3a87d46..b168927 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -759,6 +759,7 @@ struct ftrace_event_field {
 	struct list_head	link;
 	char			*name;
 	char			*type;
+	int			filter_type;
 	int			offset;
 	int			size;
 	int			is_signed;
@@ -804,6 +805,7 @@ extern int apply_subsystem_event_filter(struct event_subsystem *system,
 					char *filter_string);
 extern void print_subsystem_event_filter(struct event_subsystem *system,
 					 struct trace_seq *s);
+extern int filter_assign_type(const char *type);
 
 static inline int
 filter_check_discard(struct ftrace_event_call *call, void *rec,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 90cdec5..71b2085 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -44,9 +44,11 @@ int trace_define_field(struct ftrace_event_call *call, char *type,
 	if (!field->type)
 		goto err;
 
+	field->filter_type = filter_assign_type(type);
 	field->offset = offset;
 	field->size = size;
 	field->is_signed = is_signed;
+
 	list_add(&field->link, &call->fields);
 
 	return 0;
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 27c2dbe..4c8c5fd 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -477,10 +477,11 @@ static int ...
From: Li Zefan
Date: Wednesday, August 5, 2009 - 11:06 pm

Add __field_ext(), so a field can be assigned to a specific
filter_type, which matches a corresponding filter function.

For example, a later patch will allow this:
	__field_ext(const char *, str, FILTER_PTR_STR);

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/ftrace_event.h       |   11 +++++++++--
 include/trace/ftrace.h             |   28 ++++++++++++++++++++++------
 kernel/trace/trace_events.c        |    9 +++++++--
 kernel/trace/trace_events_filter.c |    6 ------
 kernel/trace/trace_export.c        |    7 ++++---
 5 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 26d3673..14c388e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -138,8 +138,15 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 					void *rec,
 					struct ring_buffer_event *event);
 
+enum {
+	FILTER_STATIC_STRING = 1,
+	FILTER_DYN_STRING,
+	FILTER_OTHER,
+};
+
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
-			      char *name, int offset, int size, int is_signed);
+			      char *name, int offset, int size,
+			      int is_signed, int filter_type);
 
 #define is_signed_type(type)	(((type)(-1)) < 0)
 
@@ -167,7 +174,7 @@ do {									\
 #define __common_field(type, item, is_signed)				\
 	ret = trace_define_field(event_call, #type, "common_" #item,	\
 				 offsetof(typeof(field.ent), item),	\
-				 sizeof(field.ent.item), is_signed);	\
+				 sizeof(field.ent.item), is_signed, -1);\
 	if (ret)							\
 		return ret;
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 380b603..8d8518f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -21,6 +21,9 @@
 #undef __field
 #define __field(type, item)		type	item;
 
+#undef __field_ext
+#define __field_ext(type, item, filter_type)	type	item;
+
 #undef __array
 #define __array(type, item, len)	type	item[len];
 
@@ -64,6 ...
From: Steven Rostedt
Date: Thursday, August 6, 2009 - 7:17 am

Yeah, I don't like the use of that magic -1 here.

Other than my comments, I like the patches so far.

--

From: Li Zefan
Date: Wednesday, August 5, 2009 - 11:06 pm

Usually, char * entries are dangerous in traces because the string
can be released whereas a pointer to it can still wait to be read from
the ring buffer.

But sometimes we can assume it's safe, like in case of RO data
(eg: __file__ or __line__, used in bkl trace event). If these RO data
are in a module and so is the call to the trace event, then it's safe,
because the ring buffer will be flushed once this module get unloaded.

To allow char * to be treated as a string:

	TRACE_EVENT(...,

		TP_STRUCT__entry(
			__field_ext(const char *, name, FILTER_PTR_STRING)
			...
		)

		...
	);

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/ftrace_event.h       |    1 +
 kernel/trace/trace_events_filter.c |   26 +++++++++++++++++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 14c388e..1a98a61 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -141,6 +141,7 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 enum {
 	FILTER_STATIC_STRING = 1,
 	FILTER_DYN_STRING,
+	FILTER_PTR_STRING,
 	FILTER_OTHER,
 };
 
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 5e7f031..b16923e 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -163,6 +163,20 @@ static int filter_pred_string(struct filter_pred *pred, void *event,
 	return match;
 }
 
+/* Filter predicate for char * pointers */
+static int filter_pred_pchar(struct filter_pred *pred, void *event,
+			     int val1, int val2)
+{
+	char **addr = (char **)(event + pred->offset);
+	int cmp, match;
+
+	cmp = strncmp(*addr, pred->str_val, pred->str_len);
+
+	match = (!cmp) ^ pred->not;
+
+	return match;
+}
+
 /*
  * Filter predicate for dynamic sized arrays of characters.
  * These are implemented through a list of strings at the end
@@ -489,7 +503,8 @@ int ...
From: Steven Rostedt
Date: Thursday, August 6, 2009 - 7:21 am

I'm a little dense here, where do we protect against someone making a 
tracepoint that points to unsafe data?

--

From: Li Zefan
Date: Thursday, August 6, 2009 - 6:20 pm

We can't prevent anyone from doing insane things deliberately, but
we prevent from doing wrong things unconsciously.

Only if a TRACE_EVENT has a field defined as:

	__field_ext(char *, name, FILTER_PTR_STR)

Here using FILTER_PTR_STR explicitly, he should know what he's doing.

Anyway, he can make a ptr pointing to unsafe data this way:

	TP_STRUCT__entry(
		__field(char *, name)
	)
	TP_printk("%s", name)

--

From: Steven Rostedt
Date: Thursday, August 6, 2009 - 7:54 pm

I guess the thing I'm missing is what's the difference of the two? Why 
would a developer use __field_ext instead of doing it the unsafe way of 
just __field?

I guess I don't see the developer doing something wrong unconsciously. 
Well maybe I don't see this making the developer do it right 
unconsciously.

What protection is this giving us?

-- Steve

--

From: Li Zefan
Date: Thursday, August 6, 2009 - 8:12 pm

__field(char *) suggests it should be treated as plain pointer,
while __field_ext(char *, FILTER_PTR_STR) suggests he's aware it's
safe to dereference the pointer, for example the case in Frederic's
blk events.

In Frederic's initial version, "char *" field will always be
attached to ptr_str filter function. This is unsafe, because for
other fields defined as "char *" but not safe to dereference,
a user still can do this:

	# echo 'name == abc' > filter

Then we'll deref a pointer that can point to unsafe data.

In this patch, this won't happen, as long as the developer is
aware that his use of __field_ext(char *) is right.

Otherwise, he will just use normal __field(char *) and print
the pointer itself in TP_printk().

--

From: Steven Rostedt
Date: Thursday, August 6, 2009 - 8:22 pm

Ah, so the answer I'm looking for is:

The filtering will not dereference "char *" unless the developer 
explicitly sets FILTER_PTR_STR in __field_ext.

Is this above statement correct?

-- Steve

--

From: Li Zefan
Date: Thursday, August 6, 2009 - 8:24 pm

Exactly. Sorry that I didn't explain it clearly in the
first place. :)

--

From: Steven Rostedt
Date: Thursday, August 6, 2009 - 8:31 pm

No problem. It is quite often that things so blatantly obvious to a 
developer is totally foreign for the observer, that the simple explanation 
is commonly overlooked.

I'll pull in your patches, with the -1 update, as well as adding my 
above statement to this patch's change log.

I'll queue it for 32.

Thanks!

-- Steve

--

From: Steven Rostedt
Date: Thursday, August 6, 2009 - 7:23 am

I like the first two patches, but would like to see a V2 with the updates 
I've mentioned. The last patch I do not fully understand its purpose.

I'll wait for V2 to pull it in.

Thanks!

--

From: Li Zefan
Date: Thursday, August 6, 2009 - 6:08 pm

The last one is wanted by Frederic in his bkl trace events patchset:

	http://lkml.org/lkml/2009/8/1/26

In that patch a ptr_str filter hook is always attached to "char *"
--

Previous thread: [git pull] Input updates for 2.6.31-rc5 by Dmitry Torokhov on Wednesday, August 5, 2009 - 10:24 pm. (1 message)

Next thread: [perf/rc5] oops running perf record -s with multiple -e by Brice Goglin on Wednesday, August 5, 2009 - 11:26 pm. (6 messages)