All blame goes to: color white,red "[^[:graph:]]+$" in .nanorc ;). Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- arch/x86/kernel/entry_64.S | 200 ++++++++++++++++++++++---------------------- 1 files changed, 100 insertions(+), 100 deletions(-) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 983d85a..a3a7fb2 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -11,15 +11,15 @@ * * NOTE: This code handles signal-recognition, which happens every time * after an interrupt and after each system call. - * - * Normal syscalls and interrupts don't save a full stack frame, this is + * + * Normal syscalls and interrupts don't save a full stack frame, this is * only done for syscall tracing, signals or fork/exec et.al. - * - * A note on terminology: - * - top of stack: Architecture defined interrupt frame from SS to RIP - * at the top of the kernel process stack. + * + * A note on terminology: + * - top of stack: Architecture defined interrupt frame from SS to RIP + * at the top of the kernel process stack. * - partial stack frame: partially saved registers upto R11. - * - full stack frame: Like partial stack frame, but all register saved. + * - full stack frame: Like partial stack frame, but all register saved. * * Some macro usage: * - CFI macros are used to generate dwarf2 unwind information for better @@ -147,7 +147,7 @@ END(mcount) #ifndef CONFIG_PREEMPT #define retint_kernel retint_restore_args -#endif +#endif #ifdef CONFIG_PARAVIRT ENTRY(native_usergs_sysret64) @@ -166,14 +166,14 @@ ENTRY(native_usergs_sysret64) .endm /* - * C code is not supposed to know about undefined top of stack. Every time - * a C function with an pt_regs argument is called from the SYSCALL based + * C code is not supposed to know about undefined top of stack. Every time + * a C function with an pt_regs argument is called from the SYSCALL based * fast path FIXUP_TOP_OF_STACK ...
From: Alexander van Heukelum <heukelum@sleipnir.lusi.uni-sb.de> The macro "interrupt" in entry_64.S generates a lot of code. This patch moves most of its contents into an external function. It saves anywhere between 500 and 2500 bytes of text depending on the configuration. The function returns with an altered stackpointer. Dwarf2-annotations are most probably wrong or missing. There is a comment in the original code about saving rbp twice, but I don't understand what the code tries to do. First of all, the second copy of rbp is written below the stack. Second, if the current stack is already the irqstack, this second copy is overwritten. Third, as far as I can tell, ebp should not be saved in pt_regs at all at this stage as it is 'preserved' due to the C calling conventions. So I left this second copy out and everything seems to work fine. If it wouldn't, all exceptions and NMIs would be dangerous anyhow, as far as I can see. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/entry_64.S | 123 ++++++++++++++++++++++++++------------------ 1 files changed, 73 insertions(+), 50 deletions(-) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index a3a7fb2..eb57c23 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -247,6 +247,78 @@ ENTRY(native_usergs_sysret64) CFI_REL_OFFSET rsp,RSP /*CFI_REL_OFFSET ss,SS*/ .endm + +/* + * initial frame state for interrupts and exceptions + */ + .macro _frame ref + CFI_STARTPROC simple + CFI_SIGNAL_FRAME + CFI_DEF_CFA rsp,SS+8-\ref + /*CFI_REL_OFFSET ss,SS-\ref*/ + CFI_REL_OFFSET rsp,RSP-\ref + /*CFI_REL_OFFSET rflags,EFLAGS-\ref*/ + /*CFI_REL_OFFSET cs,CS-\ref*/ + CFI_REL_OFFSET rip,RIP-\ref + .endm + +/* initial frame state for interrupts (and exceptions without error code) */ +#define INTR_FRAME _frame RIP +/* initial frame state for exceptions with error code (and interrupts with + vector ...
This is not for the stack, but to keep the registers as the pt_regs struct expects them. We'll be casting this structure later, and not doing that --
On Mon, 17 Nov 2008 10:14:44 -0200, "Glauber Costa" <glommer@redhat.com> Hi Glauber, Thanks for your reply, but I'm afraid I still don't see why this second copy is needed. A CONFIG_FRAME_POINTER configuration seems to be working fine. Of course regs->bp will give a bogus value, but so will r12, r13, r14, r15, and rbx. However, if you have CONFIG_FRAME_POINTER=y, then rbp can be trusted to be used as a framepointer... so one should get it directly from the registers, not from regs->bp. Could you give me a configuration that fails, and describe what goes wrong? Probably the right fix is not to use regs->bp. Greetings, -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - mmm... Fastmail... --
The only duplication is in the apicinterrupt entry points which have expanded recently (when I wrote all that there weren't as many) I think it would be cleaner to just have a common apic_interrupt entry point similar to how the exceptions work that try to factor out "interrupt" like this. As more and more of them get added (I have another new one in recent) patches that will likely save more space. The only ugly part is that passing the handler to the common stub requires the manual pt_regs setup that the exception handler currently does. Because that could be factored out in a new macro. Or just copied (I have heard complaints To be honest I didn't understand this one either when it was added. In standard frame pointer format rbp has to be at the place pointed to by the real rbp register with the return address directly on top of it. But pushing %rbp below the pt_regs doesn't put it into this format, because the return address is at the wrong place. -Andi --
On Mon, 17 Nov 2008 13:53:17 +0100, "Andi Kleen" <andi@firstfloor.org> Hi Andi, I liked this way of calling an external function, because it circumvents exactly most of this ugly setup. For two bytes extra per stub, one can make this setup function a completely normal one (without relocating the return address on the stack). I intended the stubs to fit in one cache line (32 bytes) as it does not make much sense to make them smaller than that. A further advantage is that no indirect call is needed, but maybe this is not as slow anymore as it used to be? B.t.w., I intended to change the exception handler stubs in a similar way to get The second copy of %rbp is indeed placed at the 'correct' position inside the pt_regs struct. However, at this point only a partial stack frame is saved: the C argument registers and the scratch registers. r12-r15,ebp,and rbx will be saved only later if necessary. A problem could arise if some code uses pt_regs.bp of this partial stack frame, because it will contain a bogus value. Glauber's patch makes pt_regs.bp contain the right value most of the time... Which means that if the patch fixed something for him, the problem has only been made unlikely to happen. The place where things should be fixed are the places where pt_regs.bp is used, but not filled in. Greetings, -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - Access your email from home and the web --
It depends on the CPU. But always it requires resources which Ok. Hopefully it's worth the effort. The branch misprediction bubble should not be too bad, perhaps it'll make up for the other cycles you're adding. But even if it doesn't decreasing cache line foot print is Hmm I first thought it was rather so that backtracing over the exception stubs with FPs works better. But then it didn't even set up a correct frame for that so it might have been something else. The original design was that you should only use these extended registers in special calls that go through the PTREGS stubs. But then sysprof/oprofile application profiling was added which wants to do user space backtracing using FPs (which seemed pointless to me because most/all of 64bit user space and recently also pretty much all 32bit user space is compiled without FPs by default) The only way to make this work though is to always save RBP. But there is no need to do the strange double saving, you can just store it always directly. But I personally have doubts its worth making every interrupt /syscall slower since it won't work with most apps anyways. The only sure way to do these backtraces is to do dwarf2 unwinding in user space too which doesn't need hacks like this. -Andi --
[Andi Kleen - Mon, Nov 17, 2008 at 07:23:40PM +0100] ... | | Ok. Hopefully it's worth the effort. The branch misprediction bubble | should not be too bad, perhaps it'll make up for the other cycles | you're adding. But even if it doesn't decreasing cache line foot print is | always a good thing. ... May I turn in? :) Original .macro interrupt func has testl $3, CS(%rdi) je 1f SWAPGS 1: incl %gs:pda_irqcount jne 2f pop %rax mov %gs:pda_irqstackptr,%rsp push %rax 2: TRACE_IRQS_OFF Wouldn't we help the branch predictor a bit by testl $3, CS(%rdi) je 1f SWAPGS jne 2f 1: pop %rax mov %gs:pda_irqstackptr,%rsp push %rax 2: incl %gs:pda_irqcount TRACE_IRQS_OFF I hope I didn't miss anything but maybe it's just a churning or plain wrong. Don't shoot me :) - Cyrill - --
[Cyrill Gorcunov - Mon, Nov 17, 2008 at 10:22:16PM +0300] | [Andi Kleen - Mon, Nov 17, 2008 at 07:23:40PM +0100] | ... | | | | Ok. Hopefully it's worth the effort. The branch misprediction bubble | | should not be too bad, perhaps it'll make up for the other cycles | | you're adding. But even if it doesn't decreasing cache line foot print is | | always a good thing. | ... | | May I turn in? :) | I meant - turn into thread not "turn in" (ouch). - Cyrill - --
On Mon, 17 Nov 2008 22:29:41 +0300, "Cyrill Gorcunov" -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - Access all of your messages and folders wherever you are --
[Alexander van Heukelum - Mon, Nov 17, 2008 at 08:49:35PM +0100] | On Mon, 17 Nov 2008 22:29:41 +0300, "Cyrill Gorcunov" | <gorcunov@gmail.com> said: | > [Cyrill Gorcunov - Mon, Nov 17, 2008 at 10:22:16PM +0300] | > | [Andi Kleen - Mon, Nov 17, 2008 at 07:23:40PM +0100] | > | ... | > | | | > | | Ok. Hopefully it's worth the effort. The branch misprediction bubble | > | | should not be too bad, perhaps it'll make up for the other cycles | > | | you're adding. But even if it doesn't decreasing cache line foot | > print is | > | | always a good thing. | > | ... | > | | > | May I turn in? :) | > | | > | > I meant - turn into thread not "turn in" (ouch). | | Even more likely is that you meant: "May I jump in" ;) | | > - Cyrill - | -- | Alexander van Heukelum | heukelum@fastmail.fm | | -- | http://www.fastmail.fm - Access all of your messages and folders | wherever you are | It doesn't matter since I was killed out :) But I have some argument to justify myself -- je and jne don't depend on CF :-) - Cyrill - --
On Mon, 17 Nov 2008 22:22:16 +0300, "Cyrill Gorcunov" I'm sure it helps the branch predictor, though ;). -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - A fast, anti-spam email service. --
[Alexander van Heukelum - Mon, Nov 17, 2008 at 08:43:26PM +0100] | http://www.fastmail.fm - A fast, anti-spam email service. | ... (me: gone out in deep shame) - Cyrill - --
The macro "interrupt" in entry_64.S generates a lot of code and it is used more and more often. This patch moves most of its contents into an external function. This saves anywhere between 500 and 2500 bytes of text depending on the configuration. Dwarf2-annotations are most probably wrong or missing at all. v2 moves adjusting the stack to the caller. This avoids the ugly shuffle to handle the position of the return address on the stack. After this patch, a typical handler looks like this: <thermal_interrupt>: 68 05 ff ff ff pushq $0xffffffffffffff05 48 83 ec 50 sub $0x50,%rsp e8 72 f4 ff ff callq ffffffff80211260 <save_args> e8 ec 08 00 00 callq ffffffff802126df <smp_thermal_inter_interrupt> e9 16 fd ff ff jmpq ffffffff80211b0e <ret_from_intr> 0f 1f 84 00 00 nopl 0x0(%rax,%rax,1) 00 00 00 I think this approach (v2) is much cleaner than using the same strategy as for the exception handlers, where the address of the C-handler is passed to a common entry point which makes an indirect call to the handler. <coprocessor_error>: ff 15 f2 71 1c callq *0x1c71f2(%rip) # <pv_irq_ops+0x38> 00 6a 00 pushq $0x0 50 push %rax 48 8d 05 d9 11 lea 0x11d9(%rip),%rax # <do_coprocessor_error> 00 00 e9 4b 99 0f 00 jmpq ffffffff8030bbb0 <error_entry> 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1) 84 00 00 00 00 00 The advantage of _this_ way of doing things is that the stubs can probably be made to fit in 16 bytes, but it comes at the cost of doing an unnecessary indirect call. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Andi Kleen <andi@firstfloor.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: Thomas Gleixner <tglx@linutronix.de> arch/x86/kernel/entry_64.S | 135 ++++++++++++++++++++++++++------------------ 1 files changed, 81 insertions(+), 54 deletions(-) --- Hi all, I just want to give this one more shot ;). Comments? This patch is on top of ...
Without numbers, I'd doubt that the code size reduction outweighs the Indeed - do you have intentions to address this? Jan --
I have not measured yet. I'll see if I can highjack a machine. What Yes, I'ld like to get it right. What do you use to check the annotations? Greetings, --
No tool, if you mean that. Extensive changes I verify by looking at the dump, problems are usually found only when back traces don't come out right. Jan --
that's a fundamental weakness of all the CFI annotations. It is outright wrong to waste humans on this mechanic task: as it is abundantly clear to GAS where we change a stack pointer and by how much - it could emit magic annotations automatically just as much. So if you care about it, please fix this in the tools space. The entry_64.S impact of finegrained annotations is just too ugly for things like this. One limited exception is for basic stack frames where we do syscalls or call into other C code. (i.e. the patch proposed here would have to do that limited annotation) But the per instruction annotations currently in that code are madness and must either be cleaned up significantly via the use of GAS macros (so that all stack pointer manipulations go via a single macro invocation), or be completely auto-generated by GAS. Ingo --
Making gas auto-generate this is not really possible (much like ia64 requires the annotations to be inserted manually), mainly because gas can't know whether e.g. a push of a register is in order to preserve its value, or for some other purpose. I do have a set of macros for this in nlkd, maybe (as you're asking for it) I should get them out of there (and convert them to AT&T syntax). Jan --
but that's the exception. Most of the annotations could be i'd definitely like to have a look ... if you can make this clean enough, most of the resistence to CFI annotations will go away. The requirements is extreme cleanliness: single line in the source that gets us _both_ the instruction and the annotation. Also always insert the proper frame pointer as well, when we call into C. Make it as hard as possible to mess up the annotations - we could even run a build-time grep on the .S files that matter to see whether there's any (common) "naked" stack-manipulating instructions that shouldnt be used. Ingo --
Not really. An implicit .cfi_undefined can be auto-generated for an unannotated instruction with an output register. An implicit .cfi_register can be auto-generated for an unannotated register-to-register move. An implicit .cfi_same_value can be auto-generated when you can tell a register is being written with the register or stack slot tha the current CFI state says holds the caller's value of that register. Beyond that, it gets into either assumptions or hairy analysis of how stack slots are being used and so forth. I don't think auto-generation is very a useful angle to take for this any time soon. Explicit (but simple) macros in the assembly is what I favor. Thanks, Roland --
Do you mean macros that generate both the instruction and the CFI or separate? The major disadvantage of doing it together in a single macro is that it is not really readable for any assembler programmer anymore, but starts becoming a Linux specific assembler language. Likely not a good thing for maintenance. anyone who wants to know the real assembler would need to read objdump -S output, which is not nice. Perhaps it would be a reasonable readability improvement to just use shorter cfi macros which are not shouted? -Andi --
Not really. At the moment we have two parallel assembly languages which
say different things about the same instructions. In practice, almost
nobody understands the cfi parts, so they just get ignored while the x86
instructions change around them, leaving them either stale or missing.
If we had a sensible macro layer which emits both instructions and cfi
annotations, it at least means that people who write plain x86
instructions will simply get no annotations, and people who bother to
learn the (clearly and fully documented) macros will get the best of both.
J
--
I think that it would be nice to have macros for the most commonly annotatable instructions, e.g. push, and stack pointer movement. Just compactifying the code should improve readability, if perhaps not writability. -hpa --
Yes. Something that obviously relates to both the instruction and the semantic intent of the annotation: add_sp, sub_sp, save_reg, etc. And at least that will eliminate the differently-signed(!) constant for stack movement. --
Yes, I mean a single macro that produces both the instruction and the CFI pseudo-op to go with it. This is the essential characteristic that makes it an improvement for maintaining the code. The main problem we have now is that it's easy to write/modify plain assembly instructions and forget to add or update the CFI to match. A well-considered set of macros can solve this without making it any harder for the average assembly programmer to understand what each line of the source means intuitively. Thanks, Roland --
Hmm, but if the assembler cannot auto generate it how should the assembler writer know if he should use the macro or the direct instruction without understanding CFI? Also what will the assembler reader do? Do they first have to understand CFI to understand everything? I personally would probably just resort to objdump -S in this situation. I think you're saying that for the user the macros would be just equivalent, but if that's true they could be just auto generated by the assembler. But it's obviously not, so you'll end up with the Linux magic asm dialect (and its maintenance disadvantages) and you'll still require CFI knowledge to understand/write everything anyways. -Andi --
We already have a "Linux magic asm dialect" which require CFI knowledge. Nothing can change that other than dumping the requirement that we have valid CFI data. However, the current code is hard to read and easy to trip up on. We can at least make it easier, especially to read -- and making it easier to read will help writers, too. -hpa --
i dont buy that argument at all.
Yes, of course full "no changes to the current code at all" automation
is hard.
But _at minimum_ GAS should automate a large part of this and help us
out syntactically: make it really easy to human-annotate instructions
in a _minimal way_. Also, automate the easy bits while at it. Plus
warn about missing annotations - nesting errors, etc. etc.
Yeah. This is the second-best option - but has some limitations. Still
it is much better than what we have now.
What _clearly_ sucks is the current mess of:
CFI_ADJUST_CFA_OFFSET 8
/*CFI_REL_OFFSET ss,0*/
pushq %rax /* rsp */
CFI_ADJUST_CFA_OFFSET 8
CFI_REL_OFFSET rsp,0
pushq $(1<<9) /* eflags - interrupts on */
CFI_ADJUST_CFA_OFFSET 8
/*CFI_REL_OFFSET rflags,0*/
pushq $__KERNEL_CS /* cs */
CFI_ADJUST_CFA_OFFSET 8
/*CFI_REL_OFFSET cs,0*/
pushq \child_rip /* rip */
CFI_ADJUST_CFA_OFFSET 8
CFI_REL_OFFSET rip,0
pushq %rax /* orig rax */
CFI_ADJUST_CFA_OFFSET 8
Compared to what we could have (stupid mockup):
pushq_cf1 %rax /* rsp */
pushq_cf1 $(1<<9) /* eflags - interrupts on */
pushq_cf1 $__KERNEL_CS /* cs */
pushq_cf2 \child_rip /* rip */
pushq_cf1 %rax /* orig rax */
Whoever claims that this cannot be automated in _large_ part isnt
thinking it through really. Those CFI annotations should never have
been added in this form.
Ingo
--
Something like this would be a lot cleaner equivalent replacement:
PUSHQ %rax /* rsp */
PUSHQ $(1<<9) /* eflags - interrupts on */
PUSHQ $__KERNEL_CS /* cs */
PUSHQ \child_rip /* rip */
cfi_map rip, 0
PUSHQ %rax /* orig rax */
as most of the really annoying CFI annotations in entry_64.S that
obscruct code reading are just plain CFA offset modifications related
to stack shuffling.
[ Sidenote: trying to connect up RIP like that in the FAKE_STACK_FRAME
is pretty wrong to begin with - the annotation is incomplete up to
this point. ]
The problems are not caused by the prologue or epilogue annotations,
nor by any of the trickier stack shuffling annotations we do around
syscall/sysret and around exception frames. A lot of the frame formats
we use are special, controlled by hw details and we do have to map
those details to the debuginfo - it's an inevitably manual piece of
work.
It's the plain crappy:
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
popq %rdi
CFI_ADJUST_CFA_OFFSET -8
annotation spam that hurts readability the most. The "+8" and "-8"
CFA-offset lines are completely uninformative and they obsctruct the
reading of this already very trick type of source code (assembly
language).
It should be something like this:
PUSHQ %rdi
call schedule
POPQ %rdi
instead.
Ingo
--
Hi all,
Here is a combined patch that moves "save_args" out-of-line for
the interrupt macro and moves "error_entry" mostly out-of-line
for the zeroentry and errorentry macros.
The save_args function becomes really straightforward and easy
to understand, with the possible exception of the stack switch
code, which now needs to copy the return address of to the
calling function. Normal interrupts arrive with ((~vector)-0x80)
on the stack, which gets adjusted in common_interrupt:
<common_interrupt>:
(5) addq $0xffffffffffffff80,(%rsp) /* -> ~(vector) */
(4) sub $0x50,%rsp /* space for registers */
(5) callq ffffffff80211290 <save_args>
(5) callq ffffffff80214290 <do_IRQ>
<ret_from_intr>:
...
An apic interrupt stub now look like this:
<thermal_interrupt>:
(5) pushq $0xffffffffffffff05 /* ~(vector) */
(4) sub $0x50,%rsp /* space for registers */
(5) callq ffffffff80211290 <save_args>
(5) callq ffffffff80212b8f <smp_thermal_interrupt>
(5) jmpq ffffffff80211f93 <ret_from_intr>
Similarly the exception handler register saving function becomes
simpler, without the need of any parameter shuffling. The stub
for an exception without errorcode looks like this:
<overflow>:
(6) callq *0x1cad12(%rip) # ffffffff803dd448 <pv_irq_ops+0x38>
(2) pushq $0xffffffffffffffff /* no syscall */
(4) sub $0x78,%rsp /* space for registers */
(5) callq ffffffff8030e3b0 <error_entry>
(3) mov %rsp,%rdi /* pt_regs pointer */
(2) xor %esi,%esi /* no error code */
(5) callq ffffffff80213446 <do_overflow>
(5) jmpq ffffffff8030e460 <error_exit>
And one for an exception with errorcode like this:
<segment_not_present>:
(6) callq *0x1cab92(%rip) # ffffffff803dd448 <pv_irq_ops+0x38>
(4) sub $0x78,%rsp /* space for registers */
(5) callq ffffffff8030e3b0 <error_entry>
(3) mov %rsp,%rdi /* pt_regs pointer */
(5) mov 0x78(%rsp),%rsi /* load error code */
(9) movq ...Sorry, I'm away on a trip at the moment, so sorry for the delayed feedback. First of all, if we're going to go through common code here, we should do the vector number adjustment in save_args and be able to use the short form of pushq in the common case. What isn't clear to me is if we should just push a target field to the stack and then do an indirect call. That way we can do save_args and ret_from_intr inline, but at the expense of an indirect call which will not necessarily speculate cleanly. All of this qualify as "tweaking", which means we need to be very careful so we don't end up burning more performance than we gain, but all in all, I think this is a good idea. -hpa --
okay - will queue it up in tip/x86/irq, lets see how it goes. Ingo --
This add-on patch to x86: move entry_64.S register saving out
of the macros visually cleans up the appearance of the code by
introducing some basic helper macro's. It also adds some cfi
annotations which were missing.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
arch/x86/kernel/entry_64.S | 220 ++++++++++++++++++++++----------------------
1 files changed, 112 insertions(+), 108 deletions(-)
Hello Ingo,
This patch improves the CFI-situation in entry_64.S, but restricted
mostly to the areas touched by "x86: move entry_64.S register saving
out of the macros". I'm sure there will be some small errors somewhere,
but it compiles and runs fine.
Greetings,
Alexander
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 5a12432..4a3a3f4 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -60,6 +60,23 @@
#define __AUDIT_ARCH_LE 0x40000000
.code64
+/*
+ * Some macro's to hide the most frequently occuring CFI annotations.
+ */
+ .macro cfi_pushq reg
+ pushq \reg
+ CFI_ADJUST_CFA_OFFSET 8
+ .endm
+
+ .macro cfi_popq reg
+ popq \reg
+ CFI_ADJUST_CFA_OFFSET -8
+ .endm
+
+ .macro cfi_store reg offset=0
+ movq %\reg, \offset(%rsp)
+ CFI_REL_OFFSET \reg, \offset
+ .endm
#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_DYNAMIC_FTRACE
@@ -213,84 +230,84 @@ ENTRY(native_usergs_sysret64)
CFI_ADJUST_CFA_OFFSET -(6*8)
.endm
- .macro CFI_DEFAULT_STACK start=1
+/*
+ * initial frame state for interrupts (and exceptions without error code)
+ */
+ .macro EMPTY_FRAME start=1 offset=0
.if \start
- CFI_STARTPROC simple
+ CFI_STARTPROC simple
CFI_SIGNAL_FRAME
- CFI_DEF_CFA rsp,SS+8
+ CFI_DEF_CFA rsp,8+\offset
.else
- CFI_DEF_CFA_OFFSET SS+8
+ CFI_DEF_CFA_OFFSET 8+\offset
...So it's not standard assembler anymore. Welcome objdump -S ! -Andi --
very nice cleanup! This is exactly what should be done. Applied to
tip/x86/irq.
Note, i did a small rename:
cfi_pushq => pushq_cfi
cfi_popq => popq_cfi
cfi_store => movq_cfi
as the goal is to have the actual source code read mostly as regular
assembly code. The fact that the macro is equivalent to a
default-annotated pushq/popq/movq instruction is much more important
than the fact that it also does CFI annotations.
Also, while cfi_store is correct as well, the usual x86 assembly term
(and instruction used here) is movq.
Find below how the commit ended up looking like.
Thanks,
Ingo
----------------->
From 250f351f57022f125c19178af1b0335503f1a02d Mon Sep 17 00:00:00 2001
From: Alexander van Heukelum <heukelum@mailshack.com>
Date: Thu, 20 Nov 2008 14:40:11 +0100
Subject: [PATCH] x86: clean up after: move entry_64.S register saving out of the macros
This add-on patch to x86: move entry_64.S register saving out
of the macros visually cleans up the appearance of the code by
introducing some basic helper macro's. It also adds some cfi
annotations which were missing.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/entry_64.S | 220 ++++++++++++++++++++++----------------------
1 files changed, 112 insertions(+), 108 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 5a12432..6df9d31 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -60,6 +60,23 @@
#define __AUDIT_ARCH_LE 0x40000000
.code64
+/*
+ * Some macro's to hide the most frequently occuring CFI annotations.
+ */
+ .macro pushq_cfi reg
+ pushq \reg
+ CFI_ADJUST_CFA_OFFSET 8
+ .endm
+
+ .macro popq_cfi reg
+ popq \reg
+ CFI_ADJUST_CFA_OFFSET -8
+ .endm
+
+ .macro movq_cfi reg offset=0
+ movq %\reg, \offset(%rsp)
+ CFI_REL_OFFSET \reg, \offset
+ .endm
#ifdef CONFIG_FUNCTION_TRACER
#ifdef ...Now I have a little problem with my next patch... I wanted to introduce cfi_load. Guess what assembly instruction that maps to ;). Greetings, -- Alexander van Heukelum --
heh ;-)
the restore direction could be named movq_cfi_restore, and have the same
order of arguments as the regular movq that it replaces. I.e.:
movq 8(%rsp),%r11
CFI_RESTORE r11
would map to:
movq_cfi_restore 8, r11
or so.
cfi_store has really a bad name: it's confusing whether it's the CFI
info we are storing/registering (which we are), or a 'store' instruction
(which this is too).
If then we should name it movq_cfi_store or movq_cfi_register - but
that's too long.
movq_cfi for the frame construction direction and movq_cfi_restore for
the frame deconstruction phase sounds like a good naming compromise, hm?
Ingo
--
How about movq_spill and movq_fill (losely following some ia64 terminology)? Jan --
Does not work... But if you are attached to the underscores, I think we can force it to work by using CPP to convert it to something the assembler does parse right: #define pushq_cfi pushq.cfi etc? Or is that just too ugly? -- Alexander van Heukelum --
[Alexander van Heukelum - Thu, Nov 20, 2008 at 04:57:44PM +0100] | On Thu, Nov 20, 2008 at 04:39:54PM +0100, Ingo Molnar wrote: | > | > * Alexander van Heukelum <heukelum@mailshack.com> wrote: | > | > > On Thu, Nov 20, 2008 at 04:04:12PM +0100, Ingo Molnar wrote: | > > > | > > > * Alexander van Heukelum <heukelum@mailshack.com> wrote: | > > > | > > > > This add-on patch to x86: move entry_64.S register saving out of the | > > > > macros visually cleans up the appearance of the code by introducing | > > > > some basic helper macro's. It also adds some cfi annotations which | > > > > were missing. | > > > > | > > > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | > > > > --- | > > > > arch/x86/kernel/entry_64.S | 220 ++++++++++++++++++++++---------------------- | > > > > 1 files changed, 112 insertions(+), 108 deletions(-) | > > > > | > > > > Hello Ingo, | > > > > | > > > > This patch improves the CFI-situation in entry_64.S, but restricted | > > > > mostly to the areas touched by "x86: move entry_64.S register saving | > > > > out of the macros". I'm sure there will be some small errors | > > > > somewhere, but it compiles and runs fine. | > > > | > > > very nice cleanup! This is exactly what should be done. Applied to | > > > tip/x86/irq. | > > > | > > > Note, i did a small rename: | > > > | > > > cfi_pushq => pushq_cfi | > > > cfi_popq => popq_cfi | > > > cfi_store => movq_cfi | | Does not work... But if you are attached to the underscores, I | think we can force it to work by using CPP to convert it to | something the assembler does parse right: | | #define pushq_cfi pushq.cfi | | etc? | | Or is that just too ugly? | | Alexander | | > > > as the goal is to have the actual source code read mostly as regular | > > > assembly code. The fact that the macro is equivalent to a | > > > default-annotated pushq/popq/movq instruction is much more important | > > > than the fact that it also does CFI ...
Hi Ingo, False alarm. It's happily building at the moment. Does it still fail for you? Greetings, --
changed it to pushq.cfi / popq.cfi / movq.cfi - does that read OK to you? It looks fine here. Ingo --
The save_rest function completes a partial stack frame for use
by the PTREGSCALL macro. This also avoid the indirect call in
PTREGSCALLs.
This adds the macro movq_cfi_restore to hide the CFI_RESTORE
annotation when restoring a register from the stack frame.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
arch/x86/kernel/entry_64.S | 91 +++++++++++++++++++++++++------------------
1 files changed, 53 insertions(+), 38 deletions(-)
Hello Ingo,
Here are three more patches cleaning up entry_64.S. They apply to
current tip/x86/irq. Thanks again for your help on the binutils
problems!
Greetings,
Alexander
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 92c5e18..ef95c45 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -78,6 +78,11 @@
CFI_REL_OFFSET \reg, \offset
.endm
+ .macro movq_cfi_restore offset reg
+ movq \offset(%rsp), %\reg
+ CFI_RESTORE \reg
+ .endm
+
#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_DYNAMIC_FTRACE
ENTRY(mcount)
@@ -186,21 +191,21 @@ ENTRY(native_usergs_sysret64)
*/
/* %rsp:at FRAMEEND */
- .macro FIXUP_TOP_OF_STACK tmp
- movq %gs:pda_oldrsp,\tmp
- movq \tmp,RSP(%rsp)
- movq $__USER_DS,SS(%rsp)
- movq $__USER_CS,CS(%rsp)
- movq $-1,RCX(%rsp)
- movq R11(%rsp),\tmp /* get eflags */
- movq \tmp,EFLAGS(%rsp)
+ .macro FIXUP_TOP_OF_STACK tmp offset=0
+ movq %gs:pda_oldrsp,\tmp
+ movq \tmp,RSP+\offset(%rsp)
+ movq $__USER_DS,SS+\offset(%rsp)
+ movq $__USER_CS,CS+\offset(%rsp)
+ movq $-1,RCX+\offset(%rsp)
+ movq R11+\offset(%rsp),\tmp /* get eflags */
+ movq \tmp,EFLAGS+\offset(%rsp)
.endm
- .macro RESTORE_TOP_OF_STACK tmp,offset=0
- movq RSP-\offset(%rsp),\tmp
- movq \tmp,%gs:pda_oldrsp
- movq EFLAGS-\offset(%rsp),\tmp
- movq \tmp,R11-\offset(%rsp)
+ .macro RESTORE_TOP_OF_STACK tmp offset=0
+ movq RSP+\offset(%rsp),\tmp
+ movq \tmp,%gs:pda_oldrsp
+ movq EFLAGS+\offset(%rsp),\tmp
+ movq \tmp,R11+\offset(%rsp)
...Also expand the paranoid_exit0 macro into nmi_exit inside the nmi stub in the case of enabled irq-tracing. This gives a few hundred bytes code size reduction. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- arch/x86/kernel/entry_64.S | 151 +++++++++++++++++++++++++++++-------------- 1 files changed, 102 insertions(+), 49 deletions(-) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index ef95c45..fad777b 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -353,6 +353,36 @@ ENTRY(save_rest) CFI_ENDPROC END(save_rest) +/* save complete stack frame */ +ENTRY(save_paranoid) + XCPT_FRAME 1 RDI+8 + cld + movq_cfi rdi, RDI+8 + movq_cfi rsi, RSI+8 + movq_cfi rdx, RDX+8 + movq_cfi rcx, RCX+8 + movq_cfi rax, RAX+8 + movq_cfi r8, R8+8 + movq_cfi r9, R9+8 + movq_cfi r10, R10+8 + movq_cfi r11, R11+8 + movq_cfi rbx, RBX+8 + movq_cfi rbp, RBP+8 + movq_cfi r12, R12+8 + movq_cfi r13, R13+8 + movq_cfi r14, R14+8 + movq_cfi r15, R15+8 + movl $1,%ebx + movl $MSR_GS_BASE,%ecx + rdmsr + testl %edx,%edx + js 1f /* negative -> in kernel */ + SWAPGS + xorl %ebx,%ebx +1: ret + CFI_ENDPROC +END(save_paranoid) + /* * A newly forked process directly context switches into this. */ @@ -1012,24 +1042,15 @@ END(spurious_interrupt) .endm /* error code is on the stack already */ - /* handle NMI like exceptions that can happen everywhere */ - .macro paranoidentry sym, ist=0, irqtrace=1 - SAVE_ALL - cld - movl $1,%ebx - movl $MSR_GS_BASE,%ecx - rdmsr - testl %edx,%edx - js 1f - SWAPGS - xorl %ebx,%ebx -1: + .macro paranoidentry sym ist=0 + subq $15*8, %rsp + CFI_ADJUST_CFA_OFFSET 15*8 + call save_paranoid + DEFAULT_FRAME 0 .if \ist movq %gs:pda_data_offset, %rbp .endif - .if \irqtrace TRACE_IRQS_OFF - .endif movq %rsp,%rdi movq ORIG_RAX(%rsp),%rsi movq $-1,ORIG_RAX(%rsp) @@ -1041,9 +1062,7 @@ END(spurious_interrupt) addq $EXCEPTION_STKSZ, per_cpu__init_tss + ...
DISABLE_INTERRUPTS(CLBR_NONE)/TRACE_IRQS_OFF is now always executed just before paranoid_exit. Move it there. Split out paranoidzeroentry, paranoiderrorentry, and paranoidzeroentry_ist to get more readable macro's. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- arch/x86/kernel/entry_64.S | 102 ++++++++++++++++++++++---------------------- 1 files changed, 51 insertions(+), 51 deletions(-) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index fad777b..692c1da 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1026,6 +1026,39 @@ END(spurious_interrupt) CFI_ENDPROC .endm + .macro paranoidzeroentry sym + INTR_FRAME + PARAVIRT_ADJUST_EXCEPTION_FRAME + pushq $-1 /* ORIG_RAX: no syscall to restart */ + CFI_ADJUST_CFA_OFFSET 8 + subq $15*8, %rsp + call save_paranoid + TRACE_IRQS_OFF + movq %rsp,%rdi /* pt_regs pointer */ + xorl %esi,%esi /* no error code */ + call \sym + jmp paranoid_exit /* %ebx: no swapgs flag */ + CFI_ENDPROC + .endm + + .macro paranoidzeroentry_ist sym ist + INTR_FRAME + PARAVIRT_ADJUST_EXCEPTION_FRAME + pushq $-1 /* ORIG_RAX: no syscall to restart */ + CFI_ADJUST_CFA_OFFSET 8 + subq $15*8, %rsp + call save_paranoid + TRACE_IRQS_OFF + movq %rsp,%rdi /* pt_regs pointer */ + xorl %esi,%esi /* no error code */ + movq %gs:pda_data_offset, %rbp + subq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp) + call \sym + addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp) + jmp paranoid_exit /* %ebx: no swapgs flag */ + CFI_ENDPROC + .endm + .macro errorentry sym XCPT_FRAME PARAVIRT_ADJUST_EXCEPTION_FRAME @@ -1042,27 +1075,20 @@ END(spurious_interrupt) .endm /* error code is on the stack already */ - .macro paranoidentry sym ist=0 - subq $15*8, %rsp + .macro paranoiderrorentry sym + XCPT_FRAME + PARAVIRT_ADJUST_EXCEPTION_FRAME + subq $15*8,%rsp CFI_ADJUST_CFA_OFFSET 15*8 call save_paranoid DEFAULT_FRAME ...
cool! Could you please also collapse it into just a single macro? The ENTRY/END sequence can be generated by the macro. (if you do it then please do it in a followup patch, i'll look at and apply your current series) Something like: PARANOID_ERROR_ENTRY(stack_segment) Could do all the magic, right? Ingo --
Impact: cleanup of entry_64.S Except for the order and the place of the functions, this patch should not change the generated code. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- arch/x86/kernel/entry_64.S | 259 +++++++++++++++++++------------------------- 1 files changed, 109 insertions(+), 150 deletions(-) I chose to just reuse the existing names, but it's bikeshedding, so change it if you like ;) I now know why I did not hit the bug that was fixed by "x86: split out some macro's and move common code to paranoid_exit, fix"... *blush* I was doing my real-world testing using an i386-image of debian. Greetings, diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index e5ddf57..ec0ed9c 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -945,76 +945,70 @@ END(common_interrupt) /* * APIC interrupts. */ - .p2align 5 - - .macro apicinterrupt num,func +.macro apicinterrupt num sym do_sym +ENTRY(\sym) INTR_FRAME pushq $~(\num) CFI_ADJUST_CFA_OFFSET 8 - interrupt \func + interrupt \do_sym jmp ret_from_intr CFI_ENDPROC - .endm +END(\sym) +.endm -ENTRY(thermal_interrupt) - apicinterrupt THERMAL_APIC_VECTOR,smp_thermal_interrupt -END(thermal_interrupt) +#ifdef CONFIG_SMP +apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \ + irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt +#endif -ENTRY(threshold_interrupt) - apicinterrupt THRESHOLD_APIC_VECTOR,mce_threshold_interrupt -END(threshold_interrupt) +apicinterrupt 220 \ + uv_bau_message_intr1 uv_bau_message_interrupt +apicinterrupt LOCAL_TIMER_VECTOR \ + apic_timer_interrupt smp_apic_timer_interrupt #ifdef CONFIG_SMP -ENTRY(reschedule_interrupt) - apicinterrupt RESCHEDULE_VECTOR,smp_reschedule_interrupt -END(reschedule_interrupt) - - .macro INVALIDATE_ENTRY num -ENTRY(invalidate_interrupt\num) - apicinterrupt ...
Impact: moves some code out of .kprobes.text
KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
uses .popsection to get back to the previous section (.text, normally).
Also replace ENDPROC by END, for consistency.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
arch/x86/kernel/entry_64.S | 16 +++++++---------
1 files changed, 7 insertions(+), 9 deletions(-)
Hi Ingo,
One more small change for today. The xen-related functions
xen_do_hypervisor_callback and xen_failsafe_callback are put
in the .kprobes.text even in the current kernel: ignore_sysret
is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
KPROBE_END, but I guess the situation is harmless.
Greetings,
Alexander
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index ec0ed9c..477a428 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1190,7 +1190,7 @@ paranoid_schedule:
TRACE_IRQS_OFF
jmp paranoid_userspace
CFI_ENDPROC
-END(paranoid_exit)
+KPROBE_END(paranoid_exit)
/*
* Exception entry point. This expects an error code/orig_rax on the stack.
@@ -1282,7 +1282,7 @@ gs_change:
CFI_ADJUST_CFA_OFFSET -8
ret
CFI_ENDPROC
-ENDPROC(native_load_gs_index)
+END(native_load_gs_index)
.section __ex_table,"a"
.align 8
@@ -1336,7 +1336,7 @@ ENTRY(kernel_thread)
UNFAKE_STACK_FRAME
ret
CFI_ENDPROC
-ENDPROC(kernel_thread)
+END(kernel_thread)
child_rip:
pushq $0 # fake return address
@@ -1352,7 +1352,7 @@ child_rip:
mov %eax, %edi
call do_exit
CFI_ENDPROC
-ENDPROC(child_rip)
+END(child_rip)
/*
* execve(). This function needs to use IRET, not SYSRET, to set up all state properly.
@@ -1383,9 +1383,7 @@ ENTRY(kernel_execve)
UNFAKE_STACK_FRAME
ret
CFI_ENDPROC
-ENDPROC(kernel_execve)
-
-
+END(kernel_execve)
/* runs on exception stack */
KPROBE_ENTRY(nmi)
@@ -1460,14 +1458,14 @@ ENTRY(call_softirq)
decl %gs:pda_irqcount
...yeah. It narrows no-kprobes protection for that code, but it should indeed be fine (and that's the intention as well). Note that this is a reoccuring bug type, and rather long-lived. Can you think of any way to get automated nesting protection of both the .cfi_startproc/endproc macros and kprobes start/end? A poor man's solution would be to grep the number of start and end methods and enforce that they are equal. Ingo --
[Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100] | | * Alexander van Heukelum <heukelum@mailshack.com> wrote: | | > Impact: moves some code out of .kprobes.text | > | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END | > uses .popsection to get back to the previous section (.text, normally). | > Also replace ENDPROC by END, for consistency. | > | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | | applied to tip/x86/irq, thanks Alexander! | | > One more small change for today. The xen-related functions | > xen_do_hypervisor_callback and xen_failsafe_callback are put | > in the .kprobes.text even in the current kernel: ignore_sysret | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY / | > KPROBE_END, but I guess the situation is harmless. | | yeah. It narrows no-kprobes protection for that code, but it should | indeed be fine (and that's the intention as well). | | Note that this is a reoccuring bug type, and rather long-lived. Can | you think of any way to get automated nesting protection of both the | .cfi_startproc/endproc macros and kprobes start/end? A poor man's | solution would be to grep the number of start and end methods and | enforce that they are equal. | | Ingo | I think we could play with preprocessor and check if ENTRY/END matches. Looking now. - Cyrill - --
[Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300] | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100] | | | | * Alexander van Heukelum <heukelum@mailshack.com> wrote: | | | | > Impact: moves some code out of .kprobes.text | | > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END | | > uses .popsection to get back to the previous section (.text, normally). | | > Also replace ENDPROC by END, for consistency. | | > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | | | | applied to tip/x86/irq, thanks Alexander! | | | | > One more small change for today. The xen-related functions | | > xen_do_hypervisor_callback and xen_failsafe_callback are put | | > in the .kprobes.text even in the current kernel: ignore_sysret | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY / | | > KPROBE_END, but I guess the situation is harmless. | | | | yeah. It narrows no-kprobes protection for that code, but it should | | indeed be fine (and that's the intention as well). | | | | Note that this is a reoccuring bug type, and rather long-lived. Can | | you think of any way to get automated nesting protection of both the | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's | | solution would be to grep the number of start and end methods and | | enforce that they are equal. | | | | Ingo | | | | I think we could play with preprocessor and check if ENTRY/END matches. | Looking now. | | - Cyrill - Here is what I've done 1) Add some macros like: .macro __set_entry .set _ENTRY_IN, 1 .endm .macro __unset_entry .set _ENTRY_IN, 0 .endm .macro __check_entry .ifeq _ENTRY_IN .error "END should be used" .abort .endif .endm So the code ENTRY(mcount) __unset_entry retq __check_entry END(mcount) will fail like cyrill@lenovo linux-2.6.git $ make arch/x86/kernel/entry_64.o CHK include/linux/version.h CHK include/linux/utsrelease.h SYMLINK ...
looks good! Can we somehow detect a missing .cfi_endproc? That's another pattern i've seen. Ingo --
[Ingo Molnar - Sun, Nov 23, 2008 at 03:55:54PM +0100] | | * Cyrill Gorcunov <gorcunov@gmail.com> wrote: | | > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300] | > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100] | > | | | > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote: | > | | | > | | > Impact: moves some code out of .kprobes.text | > | | > | > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END | > | | > uses .popsection to get back to the previous section (.text, normally). | > | | > Also replace ENDPROC by END, for consistency. | > | | > | > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | > | | | > | | applied to tip/x86/irq, thanks Alexander! | > | | | > | | > One more small change for today. The xen-related functions | > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put | > | | > in the .kprobes.text even in the current kernel: ignore_sysret | > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY / | > | | > KPROBE_END, but I guess the situation is harmless. | > | | | > | | yeah. It narrows no-kprobes protection for that code, but it should | > | | indeed be fine (and that's the intention as well). | > | | | > | | Note that this is a reoccuring bug type, and rather long-lived. Can | > | | you think of any way to get automated nesting protection of both the | > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's | > | | solution would be to grep the number of start and end methods and | > | | enforce that they are equal. | > | | | > | | Ingo | > | | | > | | > | I think we could play with preprocessor and check if ENTRY/END matches. | > | Looking now. | > | | > | - Cyrill - | > | > Here is what I've done | > | > 1) Add some macros like: | > | > .macro __set_entry | > .set _ENTRY_IN, 1 | > .endm | > | > .macro __unset_entry | > .set _ENTRY_IN, 0 | > .endm | > | > .macro __check_entry | > ...
Looks like a good approach to me. But I assume the ENTRY cppmacro will include magic? Greetings, --
[Alexander van Heukelum - Sun, Nov 23, 2008 at 04:04:18PM +0100] | On Sun, Nov 23, 2008 at 05:12:37PM +0300, Cyrill Gorcunov wrote: | > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300] | > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100] | > | | | > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote: | > | | | > | | > Impact: moves some code out of .kprobes.text | > | | > | > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END | > | | > uses .popsection to get back to the previous section (.text, normally). | > | | > Also replace ENDPROC by END, for consistency. | > | | > | > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | > | | | > | | applied to tip/x86/irq, thanks Alexander! | > | | | > | | > One more small change for today. The xen-related functions | > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put | > | | > in the .kprobes.text even in the current kernel: ignore_sysret | > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY / | > | | > KPROBE_END, but I guess the situation is harmless. | > | | | > | | yeah. It narrows no-kprobes protection for that code, but it should | > | | indeed be fine (and that's the intention as well). | > | | | > | | Note that this is a reoccuring bug type, and rather long-lived. Can | > | | you think of any way to get automated nesting protection of both the | > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's | > | | solution would be to grep the number of start and end methods and | > | | enforce that they are equal. | > | | | > | | Ingo | > | | | > | | > | I think we could play with preprocessor and check if ENTRY/END matches. | > | Looking now. | > | | > | - Cyrill - | > | > Here is what I've done | > | > 1) Add some macros like: | > | > .macro __set_entry | > .set _ENTRY_IN, 1 | > .endm | > | > .macro __unset_entry | > .set _ENTRY_IN, 0 | > .endm | > | > .macro ...
i'd suggest to introduce another entry macro name for that, for the time being. If other architectures want to pick up the method, they can generalize and test it. ( but this is assembly magic so i'm doubtful - while the features used are generic GAS features that should work everywhere, binutils variants tend to be rather fragile. So lets go with some other name like X86_ENTRY()/X86_END() or so - or maybe ENTRY_CFI()/END_CFI(). ) Ingo --
[Ingo Molnar - Sun, Nov 23, 2008 at 04:31:40PM +0100] | | * Cyrill Gorcunov <gorcunov@gmail.com> wrote: | | > [Alexander van Heukelum - Sun, Nov 23, 2008 at 04:04:18PM +0100] | > | On Sun, Nov 23, 2008 at 05:12:37PM +0300, Cyrill Gorcunov wrote: | > | > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300] | > | > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100] | > | > | | | > | > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote: | > | > | | | > | > | | > Impact: moves some code out of .kprobes.text | > | > | | > | > | > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END | > | > | | > uses .popsection to get back to the previous section (.text, normally). | > | > | | > Also replace ENDPROC by END, for consistency. | > | > | | > | > | > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | > | > | | | > | > | | applied to tip/x86/irq, thanks Alexander! | > | > | | | > | > | | > One more small change for today. The xen-related functions | > | > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put | > | > | | > in the .kprobes.text even in the current kernel: ignore_sysret | > | > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY / | > | > | | > KPROBE_END, but I guess the situation is harmless. | > | > | | | > | > | | yeah. It narrows no-kprobes protection for that code, but it should | > | > | | indeed be fine (and that's the intention as well). | > | > | | | > | > | | Note that this is a reoccuring bug type, and rather long-lived. Can | > | > | | you think of any way to get automated nesting protection of both the | > | > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's | > | > | | solution would be to grep the number of start and end methods and | > | > | | enforce that they are equal. | > | > | | | > | > | | Ingo | > | > | | | > | > | | > | > | I think we could play with preprocessor and check if ENTRY/END matches. | > | > | Looking ...
[Alexander van Heukelum - Sun, Nov 23, 2008 at 04:04:18PM +0100] | On Sun, Nov 23, 2008 at 05:12:37PM +0300, Cyrill Gorcunov wrote: | > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300] | > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100] | > | | | > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote: | > | | | > | | > Impact: moves some code out of .kprobes.text | > | | > | > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END | > | | > uses .popsection to get back to the previous section (.text, normally). | > | | > Also replace ENDPROC by END, for consistency. | > | | > | > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | > | | | > | | applied to tip/x86/irq, thanks Alexander! | > | | | > | | > One more small change for today. The xen-related functions | > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put | > | | > in the .kprobes.text even in the current kernel: ignore_sysret | > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY / | > | | > KPROBE_END, but I guess the situation is harmless. | > | | | > | | yeah. It narrows no-kprobes protection for that code, but it should | > | | indeed be fine (and that's the intention as well). | > | | | > | | Note that this is a reoccuring bug type, and rather long-lived. Can | > | | you think of any way to get automated nesting protection of both the | > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's | > | | solution would be to grep the number of start and end methods and | > | | enforce that they are equal. | > | | | > | | Ingo | > | | | > | | > | I think we could play with preprocessor and check if ENTRY/END matches. | > | Looking now. | > | | > | - Cyrill - | > | > Here is what I've done | > | > 1) Add some macros like: | > | > .macro __set_entry | > .set _ENTRY_IN, 1 | > .endm | > | > .macro __unset_entry | > .set _ENTRY_IN, 0 | > .endm | > | > .macro ...
very nice! Looks like the way to go. Ingo --
I disagree to this and similar changes in this patch: Why do we need to get rid of the ENDPROC() here? It's a procedure that's being ended, and using ENDPROC() is the only (existing) way to mark something as a procedure in assembly code. And btw., while described so in the patch comment, this change has nothing to do with the subject of the patch. Jan --
Hallo Jan Beulich, You are right. ENDPROC(name) adds ".type name, @function;" as compared Right. I thought of END and ENDPROC as equivalent, so I added the change to this patch as a small cleanup only. But if we want this .type annotation, what about KPROBE_END? should it include one there too? I'm getting a feeling we would be better off removing KPROBE_ENTRY and KPROBE_END if favour of explicitly changing sections in the .S files? And using the ENDPROC annotation for all procedures? Greetings, --
Yes, it always bothered me that there's no KPROBE_ENDPROC() (or alternatively, as this being code is implied by the macro, it didn't do the It got explicitly added a while back, so there must have been a reason to *not* do the section adjustments explicitly. And given the current discussion I'd also assume that more hiding of code in macros is the preferred route. Jan --
entry_64.S is the only user of KPROBE_ENTRY / KPROBE_END on x86_64. This patch reorders entry_64.S and explicitly generates a separate section for functions that need the protection. The generated code before and after the patch is equal. Implicitly changing sections in assembly files makes it more difficult to follow why the assembler is doing certain things. For example, .p2align 5 KPROBE_ENTRY(...) was not doing what you would expect. Other section changes (__ex_table, .fixup, .init.rodata) are done explicitly already. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Jan Beulich <jbeulich@novell.com> --- arch/x86/kernel/entry_64.S | 444 ++++++++++++++++++++++---------------------- 1 files changed, 220 insertions(+), 224 deletions(-) Hallo Jan Beulich, Let's see if this catches on. On i386 there is also only one user of KPROBES_ENTRY / KPROBES_END: entry_32.S. If we remove those entries in a similar way, we can get rid of them tree-wide. Greetings, Alexander diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index f2d546e..38fcd05 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1002,7 +1002,7 @@ END(\sym) .endm .macro paranoidzeroentry sym do_sym -KPROBE_ENTRY(\sym) +ENTRY(\sym) INTR_FRAME PARAVIRT_ADJUST_EXCEPTION_FRAME pushq $-1 /* ORIG_RAX: no syscall to restart */ @@ -1015,11 +1015,11 @@ KPROBE_ENTRY(\sym) call \do_sym jmp paranoid_exit /* %ebx: no swapgs flag */ CFI_ENDPROC -KPROBE_END(\sym) +END(\sym) .endm .macro paranoidzeroentry_ist sym do_sym ist -KPROBE_ENTRY(\sym) +ENTRY(\sym) INTR_FRAME PARAVIRT_ADJUST_EXCEPTION_FRAME pushq $-1 /* ORIG_RAX: no syscall to restart */ @@ -1035,15 +1035,11 @@ KPROBE_ENTRY(\sym) addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp) jmp paranoid_exit /* %ebx: no swapgs flag */ CFI_ENDPROC -KPROBE_END(\sym) +END(\sym) .endm -.macro errorentry sym do_sym entry=0 -.if ...
I like things being done this way (including the goal of doing away with entry_64.S is the only user of KPROBE_ENTRY / KPROBE_END on x86_64. This patch reorders entry_64.S and explicitly generates a separate section for functions that need the protection. The generated code before and after the patch is equal. Implicitly changing sections in assembly files makes it more difficult to follow why the assembler is doing certain things. For example, .p2align 5 KPROBE_ENTRY(...) was not doing what you would expect. Other section changes (__ex_table, .fixup, .init.rodata) are done explicitly already. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Jan Beulich <jbeulich@novell.com> --- arch/x86/kernel/entry_64.S | 444 ++++++++++++++++++++++---------------------- 1 files changed, 220 insertions(+), 224 deletions(-) Hallo Jan Beulich, Let's see if this catches on. On i386 there is also only one user of KPROBES_ENTRY / KPROBES_END: entry_32.S. If we remove those entries in a similar way, we can get rid of them tree-wide. Greetings, Alexander diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index f2d546e..38fcd05 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1002,7 +1002,7 @@ END(\sym) .endm .macro paranoidzeroentry sym do_sym -KPROBE_ENTRY(\sym) +ENTRY(\sym) INTR_FRAME PARAVIRT_ADJUST_EXCEPTION_FRAME pushq $-1 /* ORIG_RAX: no syscall to restart */ @@ -1015,11 +1015,11 @@ KPROBE_ENTRY(\sym) call \do_sym jmp paranoid_exit /* %ebx: no swapgs flag */ CFI_ENDPROC -KPROBE_END(\sym) +END(\sym) .endm .macro paranoidzeroentry_ist sym do_sym ist -KPROBE_ENTRY(\sym) +ENTRY(\sym) INTR_FRAME PARAVIRT_ADJUST_EXCEPTION_FRAME pushq $-1 /* ORIG_RAX: no syscall to restart */ @@ -1035,15 +1035,11 @@ KPROBE_ENTRY(\sym) addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp) jmp paranoid_exit /* %ebx: no swapgs flag */ ...
entry_32.S is now the only user of KPROBE_ENTRY / KPROBE_END, treewide. This patch reorders entry_64.S and explicitly generates a separate section for functions that need the protection. The generated code before and after the patch is equal. The KPROBE_ENTRY and KPROBE_END macro's are removed too. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- arch/x86/kernel/entry_32.S | 438 ++++++++++++++++++++++--------------------- include/linux/linkage.h | 8 - 2 files changed, 224 insertions(+), 222 deletions(-) Thanks for the comment. Here is the i386-side of the patch, including the removal of the KPROBE_ENTRY and KPROBE_END macro's. Greetings, Alexander diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index bd02ec7..6e96028 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -688,65 +688,6 @@ ENDPROC(name) /* The include is where all of the SMP etc. interrupts come from */ #include "entry_arch.h" -KPROBE_ENTRY(page_fault) - RING0_EC_FRAME - pushl $do_page_fault - CFI_ADJUST_CFA_OFFSET 4 - ALIGN -error_code: - /* the function address is in %fs's slot on the stack */ - pushl %es - CFI_ADJUST_CFA_OFFSET 4 - /*CFI_REL_OFFSET es, 0*/ - pushl %ds - CFI_ADJUST_CFA_OFFSET 4 - /*CFI_REL_OFFSET ds, 0*/ - pushl %eax - CFI_ADJUST_CFA_OFFSET 4 - CFI_REL_OFFSET eax, 0 - pushl %ebp - CFI_ADJUST_CFA_OFFSET 4 - CFI_REL_OFFSET ebp, 0 - pushl %edi - CFI_ADJUST_CFA_OFFSET 4 - CFI_REL_OFFSET edi, 0 - pushl %esi - CFI_ADJUST_CFA_OFFSET 4 - CFI_REL_OFFSET esi, 0 - pushl %edx - CFI_ADJUST_CFA_OFFSET 4 - CFI_REL_OFFSET edx, 0 - pushl %ecx - CFI_ADJUST_CFA_OFFSET 4 - CFI_REL_OFFSET ecx, 0 - pushl %ebx - CFI_ADJUST_CFA_OFFSET 4 - CFI_REL_OFFSET ebx, 0 - cld - pushl %fs - CFI_ADJUST_CFA_OFFSET 4 - /*CFI_REL_OFFSET fs, 0*/ - movl $(__KERNEL_PERCPU), %ecx - movl %ecx, %fs - UNWIND_ESPFIX_STACK - popl %ecx - CFI_ADJUST_CFA_OFFSET -4 - /*CFI_REGISTER es, ecx*/ - movl PT_FS(%esp), %edi # get the ...
[Alexander van Heukelum - Sun, Nov 23, 2008 at 10:08:28AM +0100] | Impact: cleanup of entry_64.S | | Except for the order and the place of the functions, this | patch should not change the generated code. | | Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | | --- | arch/x86/kernel/entry_64.S | 259 +++++++++++++++++++------------------------- | 1 files changed, 109 insertions(+), 150 deletions(-) | Hi Alexander, great! One moment is not obvious for me -- why we stopped to align interrupt section to 32 bytes? Did I miss anyhing? - Cyrill - --
I put a ".p2align 5" in earlier in the series which caused the apicinterrupts to be 32-byte aligned. But it is a hack, really, relying on the generated code per stub to be between 17 and 32 bytes, on the default alignment to be 16 bytes and all stubs to be in the .text section. I'm in favour of aligning all of the interrupt/exception stubs to 32 bytes, but it should be implemented the right way ;), which means that we need KPROBE_ENTRY_P5ALIGNED and so on :-/. Greetings, --
[Alexander van Heukelum - Sun, Nov 23, 2008 at 12:23:54PM +0100] | On Sun, Nov 23, 2008 at 12:21:36PM +0300, Cyrill Gorcunov wrote: | > [Alexander van Heukelum - Sun, Nov 23, 2008 at 10:08:28AM +0100] | > | Impact: cleanup of entry_64.S | > | | > | Except for the order and the place of the functions, this | > | patch should not change the generated code. | > | | > | Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> | > | | > | --- | > | arch/x86/kernel/entry_64.S | 259 +++++++++++++++++++------------------------- | > | 1 files changed, 109 insertions(+), 150 deletions(-) | > | | > | > Hi Alexander, | > | > great! One moment is not obvious for me -- why we | > stopped to align interrupt section to 32 bytes? | > Did I miss anyhing? | | I put a ".p2align 5" in earlier in the series which caused the | apicinterrupts to be 32-byte aligned. But it is a hack, really, | relying on the generated code per stub to be between 17 and 32 | bytes, on the default alignment to be 16 bytes and all stubs | to be in the .text section. | | I'm in favour of aligning all of the interrupt/exception stubs | to 32 bytes, but it should be implemented the right way ;), | which means that we need KPROBE_ENTRY_P5ALIGNED and so on :-/. | | Greetings, | Alexander | | > - Cyrill - | ah. thanks for info. - Cyrill - --
I'm sorry, I really don't follow that logic at all. Why the heck would KPROBE_ENTRY_P5ALIGNED be better than .p5align? For the record, I think we already have way to much macro crappage. It makes the code painful to read and hard to figure out what the real constraints on it is. -hpa --
Hello hpa, KPROBE_ENTRY_P5ALIGNED It isn't better, that's the point. It's needed, because we have this problem: .p2align 5 KPROBE_ENTRY(something) somecode KPROBE_END(something) This does not align "something", because something is not in the I agree with that to some point. But even without the macro's, the code is hard to read. As an alternative to more macro's, what do you think of Cyrills suggestion of putting CFI-annotations next to the instuctions, like: pushq %rax ; CFI_ADJUST_CFA_OFFSET 8 movq %rax, RAX+8(%rsp) ; CFI_REL_OFFSET rax, RAX+8 instead? That still leaves the problem that the instruction and the annotation might not match, but it leaves the code more readable for someone who is used to reading x86 assembly. Greetings, --
I think that is braindamaged. If KPROBE_ENTRY() changes the section behind the programmers back, that's completely and totally stupid in an utterly reckless sort of way. The author of a macro like that should be taken out and shot, because it makes the programmer think the code does something completely different than what the code actually does. I think using the macros for opcodes make sense, as the CFI operation is tied to the operation. That is a good way to use macros. -hpa --
yeah. But such things, if they pile up long enough, can result in real heh, that indeed explains :) Ingo --
applied to tip/x86/cleanups, thanks Alexander. This file still has a horrible code structure but now at least our eyes do not burn out when looking at it ;-) Ingo --
Lots of thanks! Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - I mean, what is it about a decent email service? --
