Currently only static strings and dynamic strings have their own filter functions, other types of field are treated as integers. This patchset allows assigning a specific filter type to a field, so a field which is defined as: __field_ext(const char *, str, FILTER_PTR_STR) will be treated as a string but not a plain pointer, and then we can set the filter like this: # echo 'str == foo' > filter And it's easy to add more filter functions for different types to turn these into valid operations: (dev is of type dev_t) # echo 'dev == 8:0' > filter (callsite is of type void * or unsigned long) # echo 'callsite == skb_free' > filter [PATCH 1/3] tracing/filters: Add filter_type to struct ftrace_event_field [PATCH 2/3] tracing/filters: Add __field_ext() to TRACE_EVENT [PATCH 3/3] tracing/filters: Support filtering for char * strings --- include/linux/ftrace_event.h | 12 +++++++- include/trace/ftrace.h | 28 ++++++++++++++++---- kernel/trace/trace.h | 2 + kernel/trace/trace_events.c | 9 ++++++- kernel/trace/trace_events_filter.c | 47 +++++++++++++++++++++++++---------- kernel/trace/trace_export.c | 7 +++-- 6 files changed, 79 insertions(+), 26 deletions(-) --
The type of a field is stored as a string in @type, and here
we add @filter_type which is an enum value.
This prepares for later patches, so we can specifically assign
different @filter_type for the same @type.
For example normally a "char *" field is treated as a ptr,
but we may want it to be treated as a string when doing filting.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/trace/trace.h | 2 ++
kernel/trace/trace_events.c | 2 ++
kernel/trace/trace_events_filter.c | 21 +++++++++++++--------
3 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3a87d46..b168927 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -759,6 +759,7 @@ struct ftrace_event_field {
struct list_head link;
char *name;
char *type;
+ int filter_type;
int offset;
int size;
int is_signed;
@@ -804,6 +805,7 @@ extern int apply_subsystem_event_filter(struct event_subsystem *system,
char *filter_string);
extern void print_subsystem_event_filter(struct event_subsystem *system,
struct trace_seq *s);
+extern int filter_assign_type(const char *type);
static inline int
filter_check_discard(struct ftrace_event_call *call, void *rec,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 90cdec5..71b2085 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -44,9 +44,11 @@ int trace_define_field(struct ftrace_event_call *call, char *type,
if (!field->type)
goto err;
+ field->filter_type = filter_assign_type(type);
field->offset = offset;
field->size = size;
field->is_signed = is_signed;
+
list_add(&field->link, &call->fields);
return 0;
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 27c2dbe..4c8c5fd 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -477,10 +477,11 @@ static int ...Add __field_ext(), so a field can be assigned to a specific
filter_type, which matches a corresponding filter function.
For example, a later patch will allow this:
__field_ext(const char *, str, FILTER_PTR_STR);
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/ftrace_event.h | 11 +++++++++--
include/trace/ftrace.h | 28 ++++++++++++++++++++++------
kernel/trace/trace_events.c | 9 +++++++--
kernel/trace/trace_events_filter.c | 6 ------
kernel/trace/trace_export.c | 7 ++++---
5 files changed, 42 insertions(+), 19 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 26d3673..14c388e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -138,8 +138,15 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
+enum {
+ FILTER_STATIC_STRING = 1,
+ FILTER_DYN_STRING,
+ FILTER_OTHER,
+};
+
extern int trace_define_field(struct ftrace_event_call *call, char *type,
- char *name, int offset, int size, int is_signed);
+ char *name, int offset, int size,
+ int is_signed, int filter_type);
#define is_signed_type(type) (((type)(-1)) < 0)
@@ -167,7 +174,7 @@ do { \
#define __common_field(type, item, is_signed) \
ret = trace_define_field(event_call, #type, "common_" #item, \
offsetof(typeof(field.ent), item), \
- sizeof(field.ent.item), is_signed); \
+ sizeof(field.ent.item), is_signed, -1);\
if (ret) \
return ret;
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 380b603..8d8518f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -21,6 +21,9 @@
#undef __field
#define __field(type, item) type item;
+#undef __field_ext
+#define __field_ext(type, item, filter_type) type item;
+
#undef __array
#define __array(type, item, len) type item[len];
@@ -64,6 ...Yeah, I don't like the use of that magic -1 here. Other than my comments, I like the patches so far. --
Usually, char * entries are dangerous in traces because the string
can be released whereas a pointer to it can still wait to be read from
the ring buffer.
But sometimes we can assume it's safe, like in case of RO data
(eg: __file__ or __line__, used in bkl trace event). If these RO data
are in a module and so is the call to the trace event, then it's safe,
because the ring buffer will be flushed once this module get unloaded.
To allow char * to be treated as a string:
TRACE_EVENT(...,
TP_STRUCT__entry(
__field_ext(const char *, name, FILTER_PTR_STRING)
...
)
...
);
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/ftrace_event.h | 1 +
kernel/trace/trace_events_filter.c | 26 +++++++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 14c388e..1a98a61 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -141,6 +141,7 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
enum {
FILTER_STATIC_STRING = 1,
FILTER_DYN_STRING,
+ FILTER_PTR_STRING,
FILTER_OTHER,
};
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 5e7f031..b16923e 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -163,6 +163,20 @@ static int filter_pred_string(struct filter_pred *pred, void *event,
return match;
}
+/* Filter predicate for char * pointers */
+static int filter_pred_pchar(struct filter_pred *pred, void *event,
+ int val1, int val2)
+{
+ char **addr = (char **)(event + pred->offset);
+ int cmp, match;
+
+ cmp = strncmp(*addr, pred->str_val, pred->str_len);
+
+ match = (!cmp) ^ pred->not;
+
+ return match;
+}
+
/*
* Filter predicate for dynamic sized arrays of characters.
* These are implemented through a list of strings at the end
@@ -489,7 +503,8 @@ int ...I'm a little dense here, where do we protect against someone making a tracepoint that points to unsafe data? --
We can't prevent anyone from doing insane things deliberately, but we prevent from doing wrong things unconsciously. Only if a TRACE_EVENT has a field defined as: __field_ext(char *, name, FILTER_PTR_STR) Here using FILTER_PTR_STR explicitly, he should know what he's doing. Anyway, he can make a ptr pointing to unsafe data this way: TP_STRUCT__entry( __field(char *, name) ) TP_printk("%s", name) --
I guess the thing I'm missing is what's the difference of the two? Why would a developer use __field_ext instead of doing it the unsafe way of just __field? I guess I don't see the developer doing something wrong unconsciously. Well maybe I don't see this making the developer do it right unconsciously. What protection is this giving us? -- Steve --
__field(char *) suggests it should be treated as plain pointer, while __field_ext(char *, FILTER_PTR_STR) suggests he's aware it's safe to dereference the pointer, for example the case in Frederic's blk events. In Frederic's initial version, "char *" field will always be attached to ptr_str filter function. This is unsafe, because for other fields defined as "char *" but not safe to dereference, a user still can do this: # echo 'name == abc' > filter Then we'll deref a pointer that can point to unsafe data. In this patch, this won't happen, as long as the developer is aware that his use of __field_ext(char *) is right. Otherwise, he will just use normal __field(char *) and print the pointer itself in TP_printk(). --
Ah, so the answer I'm looking for is: The filtering will not dereference "char *" unless the developer explicitly sets FILTER_PTR_STR in __field_ext. Is this above statement correct? -- Steve --
Exactly. Sorry that I didn't explain it clearly in the first place. :) --
No problem. It is quite often that things so blatantly obvious to a developer is totally foreign for the observer, that the simple explanation is commonly overlooked. I'll pull in your patches, with the -1 update, as well as adding my above statement to this patch's change log. I'll queue it for 32. Thanks! -- Steve --
I like the first two patches, but would like to see a V2 with the updates I've mentioned. The last patch I do not fully understand its purpose. I'll wait for V2 to pull it in. Thanks! --
The last one is wanted by Frederic in his bkl trace events patchset: http://lkml.org/lkml/2009/8/1/26 In that patch a ptr_str filter hook is always attached to "char *" --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel Mailing List | ixbge: remove TX lock and redo TX accounting. |
| Linux Kernel Mailing List | <
