Re: [PATCH] x86: clean up after: move entry_64.S register savingout of the macros

Previous thread: Suspend regression in 2.6.25.18 by Rafael J. Wysocki on Sunday, November 16, 2008 - 6:04 am. (1 message)

Next thread: Turn CONFIG_STRICT_DEVMEM in sysctl dev.mem.restricted by Bernhard Walle on Sunday, November 16, 2008 - 7:47 am. (21 messages)
From: Alexander van Heukelum
Date: Sunday, November 16, 2008 - 7:29 am

All blame goes to: color white,red "[^[:graph:]]+$"
in .nanorc ;).

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
 arch/x86/kernel/entry_64.S |  200 ++++++++++++++++++++++----------------------
 1 files changed, 100 insertions(+), 100 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 983d85a..a3a7fb2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -11,15 +11,15 @@
  *
  * NOTE: This code handles signal-recognition, which happens every time
  * after an interrupt and after each system call.
- * 
- * Normal syscalls and interrupts don't save a full stack frame, this is 
+ *
+ * Normal syscalls and interrupts don't save a full stack frame, this is
  * only done for syscall tracing, signals or fork/exec et.al.
- * 
- * A note on terminology:	 
- * - top of stack: Architecture defined interrupt frame from SS to RIP 
- * at the top of the kernel process stack.	
+ *
+ * A note on terminology:
+ * - top of stack: Architecture defined interrupt frame from SS to RIP
+ * at the top of the kernel process stack.
  * - partial stack frame: partially saved registers upto R11.
- * - full stack frame: Like partial stack frame, but all register saved. 
+ * - full stack frame: Like partial stack frame, but all register saved.
  *
  * Some macro usage:
  * - CFI macros are used to generate dwarf2 unwind information for better
@@ -147,7 +147,7 @@ END(mcount)
 
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
-#endif	
+#endif
 
 #ifdef CONFIG_PARAVIRT
 ENTRY(native_usergs_sysret64)
@@ -166,14 +166,14 @@ ENTRY(native_usergs_sysret64)
 .endm
 
 /*
- * C code is not supposed to know about undefined top of stack. Every time 
- * a C function with an pt_regs argument is called from the SYSCALL based 
+ * C code is not supposed to know about undefined top of stack. Every time
+ * a C function with an pt_regs argument is called from the SYSCALL based
  * fast path FIXUP_TOP_OF_STACK ...
From: Alexander van Heukelum
Date: Sunday, November 16, 2008 - 7:29 am

From: Alexander van Heukelum <heukelum@sleipnir.lusi.uni-sb.de>

The macro "interrupt" in entry_64.S generates a lot of code. This
patch moves most of its contents into an external function. It
saves anywhere between 500 and 2500 bytes of text depending on the
configuration.

The function returns with an altered stackpointer.

Dwarf2-annotations are most probably wrong or missing.

There is a comment in the original code about saving rbp twice, but
I don't understand what the code tries to do. First of all, the
second copy of rbp is written below the stack. Second, if the current
stack is already the irqstack, this second copy is overwritten. Third,
as far as I can tell, ebp should not be saved in pt_regs at all at
this stage as it is 'preserved' due to the C calling conventions. So
I left this second copy out and everything seems to work fine. If it
wouldn't, all exceptions and NMIs would be dangerous anyhow, as far
as I can see.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Glauber Costa <gcosta@redhat.com>

---
 arch/x86/kernel/entry_64.S |  123 ++++++++++++++++++++++++++------------------
 1 files changed, 73 insertions(+), 50 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a3a7fb2..eb57c23 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -247,6 +247,78 @@ ENTRY(native_usergs_sysret64)
 	CFI_REL_OFFSET	rsp,RSP
 	/*CFI_REL_OFFSET	ss,SS*/
 	.endm
+
+/*
+ * initial frame state for interrupts and exceptions
+ */
+	.macro _frame ref
+	CFI_STARTPROC simple
+	CFI_SIGNAL_FRAME
+	CFI_DEF_CFA rsp,SS+8-\ref
+	/*CFI_REL_OFFSET ss,SS-\ref*/
+	CFI_REL_OFFSET rsp,RSP-\ref
+	/*CFI_REL_OFFSET rflags,EFLAGS-\ref*/
+	/*CFI_REL_OFFSET cs,CS-\ref*/
+	CFI_REL_OFFSET rip,RIP-\ref
+	.endm
+
+/* initial frame state for interrupts (and exceptions without error code) */
+#define INTR_FRAME _frame RIP
+/* initial frame state for exceptions with error code (and interrupts with
+   vector ...
From: Glauber Costa
Date: Monday, November 17, 2008 - 5:14 am

This is not for the stack, but to keep the registers as the pt_regs struct
expects them. We'll be casting this structure later, and not doing that
--

From: Alexander van Heukelum
Date: Monday, November 17, 2008 - 8:13 am

On Mon, 17 Nov 2008 10:14:44 -0200, "Glauber Costa" <glommer@redhat.com>

Hi Glauber,

Thanks for your reply, but I'm afraid I still don't see why this
second copy is needed.

A CONFIG_FRAME_POINTER configuration seems to be working fine. Of
course regs->bp will give a bogus value, but so will r12, r13,
r14, r15, and rbx. However, if you have CONFIG_FRAME_POINTER=y,
then rbp can be trusted to be used as a framepointer... so one
should get it directly from the registers, not from regs->bp.

Could you give me a configuration that fails, and describe what
goes wrong? Probably the right fix is not to use regs->bp.

Greetings,
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - mmm... Fastmail...

--

From: Andi Kleen
Date: Monday, November 17, 2008 - 5:53 am

The only duplication is in the apicinterrupt entry points which
have expanded recently (when I wrote all that there weren't as many)

I think it would be cleaner to just have a common apic_interrupt
entry point similar to how the exceptions work that try to factor
out "interrupt" like this. As more and more of them get added
(I have another new one in recent) patches that will likely
save more space.

The only ugly part is that passing the handler to the common
stub requires the manual pt_regs setup that the exception
handler currently does. Because that could be factored
out in a new macro. Or just copied (I have heard complaints

To be honest I didn't understand this one either when it was added. In 
standard frame pointer format rbp has to be at the place pointed to by the real 
rbp register with the return address directly on top of it. But pushing
%rbp below the pt_regs doesn't put it into this format, because
the return address is at the wrong place.

-Andi
--

From: Alexander van Heukelum
Date: Monday, November 17, 2008 - 8:37 am

On Mon, 17 Nov 2008 13:53:17 +0100, "Andi Kleen" <andi@firstfloor.org>

Hi Andi,

I liked this way of calling an external function, because it
circumvents exactly most of this ugly setup. For two bytes
extra per stub, one can make this setup function a completely
normal one (without relocating the return address on the stack).
I intended the stubs to fit in one cache line (32 bytes) as it
does not make much sense to make them smaller than that. A
further advantage is that no indirect call is needed, but maybe
this is not as slow anymore as it used to be? B.t.w., I intended
to change the exception handler stubs in a similar way to get

The second copy of %rbp is indeed placed at the 'correct' position
inside the pt_regs struct. However, at this point only a partial
stack frame is saved: the C argument registers and the scratch
registers. r12-r15,ebp,and rbx will be saved only later if necessary.
A problem could arise if some code uses pt_regs.bp of this partial
stack frame, because it will contain a bogus value. Glauber's patch
makes pt_regs.bp contain the right value most of the time... Which
means that if the patch fixed something for him, the problem has only
been made unlikely to happen. The place where things should be fixed
are the places where pt_regs.bp is used, but not filled in.

Greetings,
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Access your email from home and the web

--

From: Andi Kleen
Date: Monday, November 17, 2008 - 11:23 am

It depends on the CPU. But always it requires resources which

Ok. Hopefully it's worth the effort. The branch misprediction bubble
should not be too bad, perhaps it'll make up for the other cycles
you're adding. But even if it doesn't decreasing cache line foot print is 


Hmm I first thought it was rather so that backtracing over the exception
stubs with FPs works better. But then it didn't even set up
a correct frame for that so it might have been something else.

The original design was that you should only use these extended
registers in special calls that go through the PTREGS stubs.

But then sysprof/oprofile application profiling was added which
wants to do user space backtracing using FPs (which seemed pointless to me
because most/all of 64bit user space and recently also pretty
much all 32bit user space is compiled without FPs by default)

The only way to make this work though is to always save RBP.
But there is no need to do the strange double saving, you
can just store it always directly.

But I personally have doubts its worth making every interrupt
/syscall slower since it won't work with most apps anyways.
The only sure way to do these backtraces is to do dwarf2 unwinding
in user space too which doesn't need hacks like this.

-Andi
--

From: Cyrill Gorcunov
Date: Monday, November 17, 2008 - 12:22 pm

[Andi Kleen - Mon, Nov 17, 2008 at 07:23:40PM +0100]
...
| 
| Ok. Hopefully it's worth the effort. The branch misprediction bubble
| should not be too bad, perhaps it'll make up for the other cycles
| you're adding. But even if it doesn't decreasing cache line foot print is 
| always a good thing.
...

May I turn in? :)

Original .macro interrupt func has

	testl $3, CS(%rdi)
	je 1f
	SWAPGS
1:	incl %gs:pda_irqcount
	jne 2f
	pop %rax
	mov %gs:pda_irqstackptr,%rsp
	push %rax
2:	TRACE_IRQS_OFF

Wouldn't we help the branch predictor a bit by

	testl $3, CS(%rdi)
	je 1f
	SWAPGS
	jne 2f
1:	pop %rax
	mov %gs:pda_irqstackptr,%rsp
	push %rax	
2:	incl %gs:pda_irqcount
	TRACE_IRQS_OFF

I hope I didn't miss anything but maybe it's just a churning
or plain wrong. Don't shoot me :)

		- Cyrill -
--

From: Cyrill Gorcunov
Date: Monday, November 17, 2008 - 12:29 pm

[Cyrill Gorcunov - Mon, Nov 17, 2008 at 10:22:16PM +0300]
| [Andi Kleen - Mon, Nov 17, 2008 at 07:23:40PM +0100]
| ...
| | 
| | Ok. Hopefully it's worth the effort. The branch misprediction bubble
| | should not be too bad, perhaps it'll make up for the other cycles
| | you're adding. But even if it doesn't decreasing cache line foot print is 
| | always a good thing.
| ...
| 
| May I turn in? :)
|

I meant - turn into thread not "turn in" (ouch).
 
		- Cyrill -
--

From: Alexander van Heukelum
Date: Monday, November 17, 2008 - 12:49 pm

On Mon, 17 Nov 2008 22:29:41 +0300, "Cyrill Gorcunov"

-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Access all of your messages and folders
                          wherever you are

--

From: Cyrill Gorcunov
Date: Monday, November 17, 2008 - 12:54 pm

[Alexander van Heukelum - Mon, Nov 17, 2008 at 08:49:35PM +0100]
| On Mon, 17 Nov 2008 22:29:41 +0300, "Cyrill Gorcunov"
| <gorcunov@gmail.com> said:
| > [Cyrill Gorcunov - Mon, Nov 17, 2008 at 10:22:16PM +0300]
| > | [Andi Kleen - Mon, Nov 17, 2008 at 07:23:40PM +0100]
| > | ...
| > | | 
| > | | Ok. Hopefully it's worth the effort. The branch misprediction bubble
| > | | should not be too bad, perhaps it'll make up for the other cycles
| > | | you're adding. But even if it doesn't decreasing cache line foot
| > print is 
| > | | always a good thing.
| > | ...
| > | 
| > | May I turn in? :)
| > |
| > 
| > I meant - turn into thread not "turn in" (ouch).
| 
| Even more likely is that you meant: "May I jump in" ;)
| 
| > 		- Cyrill -
| -- 
|   Alexander van Heukelum
|   heukelum@fastmail.fm
| 
| -- 
| http://www.fastmail.fm - Access all of your messages and folders
|                           wherever you are
|

It doesn't matter since I was killed out :) But
I have some argument to justify myself -- je and jne
don't depend on CF :-)

		- Cyrill -
--

From: Alexander van Heukelum
Date: Monday, November 17, 2008 - 12:43 pm

On Mon, 17 Nov 2008 22:22:16 +0300, "Cyrill Gorcunov"




I'm sure it helps the branch predictor, though ;).

-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - A fast, anti-spam email service.

--

From: Cyrill Gorcunov
Date: Monday, November 17, 2008 - 12:49 pm

[Alexander van Heukelum - Mon, Nov 17, 2008 at 08:43:26PM +0100]
| http://www.fastmail.fm - A fast, anti-spam email service.
| 
...
(me: gone out in deep shame)

		- Cyrill -
--

From: Alexander van Heukelum
Date: Monday, November 17, 2008 - 10:52 am

The macro "interrupt" in entry_64.S generates a lot of code and it
is used more and more often. This patch moves most of its contents
into an external function. This saves anywhere between 500 and 2500
bytes of text depending on the configuration.

Dwarf2-annotations are most probably wrong or missing at all.

v2 moves adjusting the stack to the caller. This avoids the ugly
shuffle to handle the position of the return address on the stack.

After this patch, a typical handler looks like this:

<thermal_interrupt>:
68 05 ff ff ff  pushq  $0xffffffffffffff05
48 83 ec 50     sub    $0x50,%rsp
e8 72 f4 ff ff  callq  ffffffff80211260 <save_args>
e8 ec 08 00 00  callq  ffffffff802126df <smp_thermal_inter_interrupt>
e9 16 fd ff ff  jmpq   ffffffff80211b0e <ret_from_intr>
0f 1f 84 00 00  nopl   0x0(%rax,%rax,1)
00 00 00 

I think this approach (v2) is much cleaner than using the same
strategy as for the exception handlers, where the address of
the C-handler is passed to a common entry point which makes
an indirect call to the handler.

<coprocessor_error>:
ff 15 f2 71 1c  callq  *0x1c71f2(%rip)     # <pv_irq_ops+0x38>
00
6a 00           pushq  $0x0
50              push   %rax
48 8d 05 d9 11  lea    0x11d9(%rip),%rax   # <do_coprocessor_error>
00 00 
e9 4b 99 0f 00  jmpq   ffffffff8030bbb0 <error_entry>
66 66 2e 0f 1f  nopw   %cs:0x0(%rax,%rax,1)
84 00 00 00 00 00 

The advantage of _this_ way of doing things is that the stubs can
probably be made to fit in 16 bytes, but it comes at the cost of
doing an unnecessary indirect call.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>

 arch/x86/kernel/entry_64.S |  135 ++++++++++++++++++++++++++------------------
 1 files changed, 81 insertions(+), 54 deletions(-)

---

Hi all,

I just want to give this one more shot ;). Comments?

This patch is on top of ...
From: Jan Beulich
Date: Tuesday, November 18, 2008 - 1:09 am

Without numbers, I'd doubt that the code size reduction outweighs the

Indeed - do you have intentions to address this?

Jan

--

From: Alexander van Heukelum
Date: Tuesday, November 18, 2008 - 4:16 am

I have not measured yet. I'll see if I can highjack a machine. What

Yes, I'ld like to get it right. What do you use to check the
annotations?

Greetings,
--

From: Jan Beulich
Date: Tuesday, November 18, 2008 - 5:51 am

No tool, if you mean that. Extensive changes I verify by looking at the
dump, problems are usually found only when back traces don't come out
right.

Jan

--

From: Ingo Molnar
Date: Tuesday, November 18, 2008 - 7:03 am

that's a fundamental weakness of all the CFI annotations.

It is outright wrong to waste humans on this mechanic task: as it is 
abundantly clear to GAS where we change a stack pointer and by how 
much - it could emit magic annotations automatically just as much.

So if you care about it, please fix this in the tools space. The 
entry_64.S impact of finegrained annotations is just too ugly for 
things like this.

One limited exception is for basic stack frames where we do syscalls 
or call into other C code. (i.e. the patch proposed here would have to 
do that limited annotation)

But the per instruction annotations currently in that code are madness 
and must either be cleaned up significantly via the use of GAS macros 
(so that all stack pointer manipulations go via a single macro 
invocation), or be completely auto-generated by GAS.

	Ingo
--

From: Jan Beulich
Date: Tuesday, November 18, 2008 - 7:52 am

Making gas auto-generate this is not really possible (much like ia64
requires the annotations to be inserted manually), mainly because gas
can't know whether e.g. a push of a register is in order to preserve its
value, or for some other purpose.

I do have a set of macros for this in nlkd, maybe (as you're asking for it)
I should get them out of there (and convert them to AT&T syntax).

Jan

--

From: Ingo Molnar
Date: Tuesday, November 18, 2008 - 8:00 am

but that's the exception. Most of the annotations could be 

i'd definitely like to have a look ...

if you can make this clean enough, most of the resistence to CFI 
annotations will go away.

The requirements is extreme cleanliness: single line in the source 
that gets us _both_ the instruction and the annotation. Also always 
insert the proper frame pointer as well, when we call into C. Make it 
as hard as possible to mess up the annotations - we could even run a 
build-time grep on the .S files that matter to see whether there's any 
(common) "naked" stack-manipulating instructions that shouldnt be 
used.

	Ingo
--

From: Roland McGrath
Date: Tuesday, November 18, 2008 - 3:53 pm

Not really.  An implicit .cfi_undefined can be auto-generated for an
unannotated instruction with an output register.  An implicit .cfi_register
can be auto-generated for an unannotated register-to-register move.  An
implicit .cfi_same_value can be auto-generated when you can tell a register
is being written with the register or stack slot tha the current CFI state
says holds the caller's value of that register.  Beyond that, it gets into
either assumptions or hairy analysis of how stack slots are being used and
so forth.  

I don't think auto-generation is very a useful angle to take for this any
time soon.  Explicit (but simple) macros in the assembly is what I favor.


Thanks,
Roland
--

From: Andi Kleen
Date: Tuesday, November 18, 2008 - 4:35 pm

Do you mean macros that generate both the instruction and the CFI
or separate? The major disadvantage of doing it together in a 
single macro is that it is not really readable for any assembler 
programmer anymore, but starts becoming a Linux specific assembler 
language. Likely not a good thing for maintenance. anyone who
wants to know the real assembler would need to read objdump -S
output, which is not nice.

Perhaps it would be a reasonable readability improvement to just use shorter 
cfi macros which are not shouted?

-Andi
--

From: Jeremy Fitzhardinge
Date: Tuesday, November 18, 2008 - 4:36 pm

Not really.  At the moment we have two parallel assembly languages which 
say different things about the same instructions.  In practice, almost 
nobody understands the cfi parts, so they just get ignored while the x86 
instructions change around them, leaving them either stale or missing.

If we had a sensible macro layer which emits both instructions and cfi 
annotations, it at least means that people who write plain x86 
instructions will simply get no annotations, and people who bother to 
learn the (clearly and fully documented) macros will get the best of both.

    J
--

From: H. Peter Anvin
Date: Tuesday, November 18, 2008 - 4:44 pm

I think that it would be nice to have macros for the most commonly 
annotatable instructions, e.g. push, and stack pointer movement.  Just 
compactifying the code should improve readability, if perhaps not 
writability.

	-hpa
--

From: Jeremy Fitzhardinge
Date: Tuesday, November 18, 2008 - 5:08 pm

Yes.  Something that obviously relates to both the instruction and the 
semantic intent of the annotation: add_sp, sub_sp, save_reg, etc.  And 
at least that will eliminate the differently-signed(!) constant for 
stack movement.


--

From: Roland McGrath
Date: Tuesday, November 18, 2008 - 4:45 pm

Yes, I mean a single macro that produces both the instruction and the CFI
pseudo-op to go with it.  This is the essential characteristic that makes
it an improvement for maintaining the code.  The main problem we have now
is that it's easy to write/modify plain assembly instructions and forget to
add or update the CFI to match.  A well-considered set of macros can solve
this without making it any harder for the average assembly programmer to
understand what each line of the source means intuitively.


Thanks,
Roland
--

From: Andi Kleen
Date: Tuesday, November 18, 2008 - 5:06 pm

Hmm, but if the assembler cannot auto generate it how should the assembler
writer know if he should use the macro or the direct instruction without
understanding CFI?

Also what will the assembler reader do? Do they first have to understand
CFI to understand everything? I personally would probably just
resort to objdump -S in this situation.

I think you're saying that for the user the macros would be just
equivalent, but if that's true they could be just auto generated 
by the assembler. But it's obviously not, so you'll end up
with the Linux magic asm dialect (and its maintenance disadvantages)
and you'll still require CFI knowledge to understand/write everything 
anyways.

-Andi
--

From: H. Peter Anvin
Date: Tuesday, November 18, 2008 - 5:01 pm

We already have a "Linux magic asm dialect" which require CFI knowledge. 
  Nothing can change that other than dumping the requirement that we 
have valid CFI data.  However, the current code is hard to read and easy 
to trip up on.  We can at least make it easier, especially to read -- 
and making it easier to read will help writers, too.

	-hpa
--

From: Ingo Molnar
Date: Wednesday, November 19, 2008 - 3:34 am

i dont buy that argument at all.

Yes, of course full "no changes to the current code at all" automation 
is hard.

But _at minimum_ GAS should automate a large part of this and help us 
out syntactically: make it really easy to human-annotate instructions 
in a _minimal way_. Also, automate the easy bits while at it. Plus 
warn about missing annotations - nesting errors, etc. etc.


Yeah. This is the second-best option - but has some limitations. Still 
it is much better than what we have now.

What _clearly_ sucks is the current mess of:

        CFI_ADJUST_CFA_OFFSET   8
        /*CFI_REL_OFFSET        ss,0*/
        pushq %rax /* rsp */
        CFI_ADJUST_CFA_OFFSET   8
        CFI_REL_OFFSET  rsp,0
        pushq $(1<<9) /* eflags - interrupts on */
        CFI_ADJUST_CFA_OFFSET   8
        /*CFI_REL_OFFSET        rflags,0*/
        pushq $__KERNEL_CS /* cs */
        CFI_ADJUST_CFA_OFFSET   8
        /*CFI_REL_OFFSET        cs,0*/
        pushq \child_rip /* rip */
        CFI_ADJUST_CFA_OFFSET   8
        CFI_REL_OFFSET  rip,0
        pushq   %rax /* orig rax */
        CFI_ADJUST_CFA_OFFSET   8

Compared to what we could have (stupid mockup):

        pushq_cf1 %rax			/* rsp */
        pushq_cf1 $(1<<9)		/* eflags - interrupts on */
        pushq_cf1 $__KERNEL_CS		/* cs */
        pushq_cf2 \child_rip		/* rip */
        pushq_cf1 %rax			/* orig rax */

Whoever claims that this cannot be automated in _large_ part isnt 
thinking it through really. Those CFI annotations should never have 
been added in this form.

	Ingo
--

From: Ingo Molnar
Date: Wednesday, November 19, 2008 - 1:09 pm

Something like this would be a lot cleaner equivalent replacement:

         PUSHQ %rax			/* rsp 			  */
         PUSHQ $(1<<9)			/* eflags - interrupts on */
         PUSHQ $__KERNEL_CS		/* cs 			  */
         PUSHQ \child_rip		/* rip			  */
          cfi_map rip, 0
         PUSHQ %rax			/* orig rax		  */

as most of the really annoying CFI annotations in entry_64.S that 
obscruct code reading are just plain CFA offset modifications related 
to stack shuffling.

[ Sidenote: trying to connect up RIP like that in the FAKE_STACK_FRAME
  is pretty wrong to begin with - the annotation is incomplete up to
  this point. ]

The problems are not caused by the prologue or epilogue annotations, 
nor by any of the trickier stack shuffling annotations we do around 
syscall/sysret and around exception frames. A lot of the frame formats 
we use are special, controlled by hw details and we do have to map 
those details to the debuginfo - it's an inevitably manual piece of 
work.

It's the plain crappy:

        pushq %rdi
        CFI_ADJUST_CFA_OFFSET 8
        call schedule
        popq  %rdi
        CFI_ADJUST_CFA_OFFSET -8

annotation spam that hurts readability the most. The "+8" and "-8" 
CFA-offset lines are completely uninformative and they obsctruct the 
reading of this already very trick type of source code (assembly 
language).

It should be something like this:

        PUSHQ %rdi
        call schedule
        POPQ %rdi

instead.

	Ingo
--

From: Alexander van Heukelum
Date: Tuesday, November 18, 2008 - 5:18 pm

Hi all,

Here is a combined patch that moves "save_args" out-of-line for
the interrupt macro and moves "error_entry" mostly out-of-line
for the zeroentry and errorentry macros.

The save_args function becomes really straightforward and easy
to understand, with the possible exception of the stack switch
code, which now needs to copy the return address of to the
calling function. Normal interrupts arrive with ((~vector)-0x80)
on the stack, which gets adjusted in common_interrupt:

<common_interrupt>:
(5)  addq   $0xffffffffffffff80,(%rsp)		/* -> ~(vector) */
(4)  sub    $0x50,%rsp				/* space for registers */
(5)  callq  ffffffff80211290 <save_args>
(5)  callq  ffffffff80214290 <do_IRQ>
<ret_from_intr>:
     ...

An apic interrupt stub now look like this:

<thermal_interrupt>:
(5)  pushq  $0xffffffffffffff05			/* ~(vector) */
(4)  sub    $0x50,%rsp				/* space for registers */
(5)  callq  ffffffff80211290 <save_args>
(5)  callq  ffffffff80212b8f <smp_thermal_interrupt>
(5)  jmpq   ffffffff80211f93 <ret_from_intr>

Similarly the exception handler register saving function becomes
simpler, without the need of any parameter shuffling. The stub
for an exception without errorcode looks like this:

<overflow>:
(6)  callq  *0x1cad12(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
(2)  pushq  $0xffffffffffffffff			/* no syscall */
(4)  sub    $0x78,%rsp				/* space for registers */
(5)  callq  ffffffff8030e3b0 <error_entry>
(3)  mov    %rsp,%rdi				/* pt_regs pointer */
(2)  xor    %esi,%esi				/* no error code */
(5)  callq  ffffffff80213446 <do_overflow>
(5)  jmpq   ffffffff8030e460 <error_exit>

And one for an exception with errorcode like this:

<segment_not_present>:
(6)  callq  *0x1cab92(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
(4)  sub    $0x78,%rsp				/* space for registers */
(5)  callq  ffffffff8030e3b0 <error_entry>
(3)  mov    %rsp,%rdi				/* pt_regs pointer */
(5)  mov    0x78(%rsp),%rsi			/* load error code */
(9)  movq   ...
From: H. Peter Anvin
Date: Wednesday, November 19, 2008 - 10:54 am

Sorry, I'm away on a trip at the moment, so sorry for the delayed feedback.

First of all, if we're going to go through common code here, we should 
do the vector number adjustment in save_args and be able to use the 
short form of pushq in the common case.

What isn't clear to me is if we should just push a target field to the 
stack and then do an indirect call.  That way we can do save_args and 
ret_from_intr inline, but at the expense of an indirect call which will 
not necessarily speculate cleanly.

All of this qualify as "tweaking", which means we need to be very 
careful so we don't end up burning more performance than we gain, but 
all in all, I think this is a good idea.

	-hpa
--

From: Ingo Molnar
Date: Wednesday, November 19, 2008 - 1:16 pm

okay - will queue it up in tip/x86/irq, lets see how it goes.

	Ingo
--

From: Alexander van Heukelum
Date: Thursday, November 20, 2008 - 6:40 am

This add-on patch to x86: move entry_64.S register saving out
of the macros visually cleans up the appearance of the code by
introducing some basic helper macro's. It also adds some cfi
annotations which were missing.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
 arch/x86/kernel/entry_64.S |  220 ++++++++++++++++++++++----------------------
 1 files changed, 112 insertions(+), 108 deletions(-)

Hello Ingo,

This patch improves the CFI-situation in entry_64.S, but restricted
mostly to the areas touched by "x86: move entry_64.S register saving
out of the macros". I'm sure there will be some small errors somewhere,
but it compiles and runs fine.

Greetings,
    Alexander

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 5a12432..4a3a3f4 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -60,6 +60,23 @@
 #define __AUDIT_ARCH_LE	   0x40000000
 
 	.code64
+/*
+ * Some macro's to hide the most frequently occuring CFI annotations.
+ */
+	.macro cfi_pushq reg
+	pushq \reg
+	CFI_ADJUST_CFA_OFFSET 8
+	.endm
+
+	.macro cfi_popq reg
+	popq \reg
+	CFI_ADJUST_CFA_OFFSET -8
+	.endm
+
+	.macro cfi_store reg offset=0
+	movq %\reg, \offset(%rsp)
+	CFI_REL_OFFSET \reg, \offset
+	.endm
 
 #ifdef CONFIG_FUNCTION_TRACER
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -213,84 +230,84 @@ ENTRY(native_usergs_sysret64)
 	CFI_ADJUST_CFA_OFFSET	-(6*8)
 	.endm
 
-	.macro	CFI_DEFAULT_STACK start=1
+/*
+ * initial frame state for interrupts (and exceptions without error code)
+ */
+	.macro EMPTY_FRAME start=1 offset=0
 	.if \start
-	CFI_STARTPROC	simple
+	CFI_STARTPROC simple
 	CFI_SIGNAL_FRAME
-	CFI_DEF_CFA	rsp,SS+8
+	CFI_DEF_CFA rsp,8+\offset
 	.else
-	CFI_DEF_CFA_OFFSET SS+8
+	CFI_DEF_CFA_OFFSET 8+\offset
 ...
From: Andi Kleen
Date: Thursday, November 20, 2008 - 7:01 am

So it's not standard assembler anymore. Welcome objdump -S !

-Andi
--

From: Ingo Molnar
Date: Thursday, November 20, 2008 - 8:04 am

very nice cleanup! This is exactly what should be done. Applied to 
tip/x86/irq.

Note, i did a small rename:

    cfi_pushq  =>  pushq_cfi
    cfi_popq   =>  popq_cfi
    cfi_store  =>  movq_cfi

as the goal is to have the actual source code read mostly as regular 
assembly code. The fact that the macro is equivalent to a 
default-annotated pushq/popq/movq instruction is much more important 
than the fact that it also does CFI annotations.

Also, while cfi_store is correct as well, the usual x86 assembly term 
(and instruction used here) is movq.

Find below how the commit ended up looking like.

Thanks,

	Ingo

----------------->
From 250f351f57022f125c19178af1b0335503f1a02d Mon Sep 17 00:00:00 2001
From: Alexander van Heukelum <heukelum@mailshack.com>
Date: Thu, 20 Nov 2008 14:40:11 +0100
Subject: [PATCH] x86: clean up after: move entry_64.S register saving out of the macros

This add-on patch to x86: move entry_64.S register saving out
of the macros visually cleans up the appearance of the code by
introducing some basic helper macro's. It also adds some cfi
annotations which were missing.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/entry_64.S |  220 ++++++++++++++++++++++----------------------
 1 files changed, 112 insertions(+), 108 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 5a12432..6df9d31 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -60,6 +60,23 @@
 #define __AUDIT_ARCH_LE	   0x40000000
 
 	.code64
+/*
+ * Some macro's to hide the most frequently occuring CFI annotations.
+ */
+	.macro pushq_cfi reg
+	pushq \reg
+	CFI_ADJUST_CFA_OFFSET 8
+	.endm
+
+	.macro popq_cfi reg
+	popq \reg
+	CFI_ADJUST_CFA_OFFSET -8
+	.endm
+
+	.macro movq_cfi reg offset=0
+	movq %\reg, \offset(%rsp)
+	CFI_REL_OFFSET \reg, \offset
+	.endm
 
 #ifdef CONFIG_FUNCTION_TRACER
 #ifdef ...
From: Alexander van Heukelum
Date: Thursday, November 20, 2008 - 8:26 am

Now I have a little problem with my next patch... I wanted to
introduce cfi_load. Guess what assembly instruction that maps
to ;).

Greetings,
-- 
Alexander van Heukelum
--

From: Ingo Molnar
Date: Thursday, November 20, 2008 - 8:39 am

heh ;-)

the restore direction could be named movq_cfi_restore, and have the same 
order of arguments as the regular movq that it replaces. I.e.:

        movq 8(%rsp),%r11
        CFI_RESTORE r11

would map to:

	movq_cfi_restore 8, r11

or so.

cfi_store has really a bad name: it's confusing whether it's the CFI 
info we are storing/registering (which we are), or a 'store' instruction 
(which this is too).

If then we should name it movq_cfi_store or movq_cfi_register - but 
that's too long.

movq_cfi for the frame construction direction and movq_cfi_restore for 
the frame deconstruction phase sounds like a good naming compromise, hm?

	Ingo
--

From: Jan Beulich
Date: Thursday, November 20, 2008 - 8:50 am

How about movq_spill and movq_fill (losely following some ia64 terminology)?

Jan

--

From: Alexander van Heukelum
Date: Thursday, November 20, 2008 - 8:57 am

Does not work... But if you are attached to the underscores, I
think we can force it to work by using CPP to convert it to
something the assembler does parse right:

#define pushq_cfi pushq.cfi

etc?

Or is that just too ugly?


-- 
Alexander van Heukelum
--

From: Cyrill Gorcunov
Date: Thursday, November 20, 2008 - 9:07 am

[Alexander van Heukelum - Thu, Nov 20, 2008 at 04:57:44PM +0100]
| On Thu, Nov 20, 2008 at 04:39:54PM +0100, Ingo Molnar wrote:
| > 
| > * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| > 
| > > On Thu, Nov 20, 2008 at 04:04:12PM +0100, Ingo Molnar wrote:
| > > > 
| > > > * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| > > > 
| > > > > This add-on patch to x86: move entry_64.S register saving out of the 
| > > > > macros visually cleans up the appearance of the code by introducing 
| > > > > some basic helper macro's. It also adds some cfi annotations which 
| > > > > were missing.
| > > > > 
| > > > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| > > > > ---
| > > > >  arch/x86/kernel/entry_64.S |  220 ++++++++++++++++++++++----------------------
| > > > >  1 files changed, 112 insertions(+), 108 deletions(-)
| > > > > 
| > > > > Hello Ingo,
| > > > > 
| > > > > This patch improves the CFI-situation in entry_64.S, but restricted 
| > > > > mostly to the areas touched by "x86: move entry_64.S register saving 
| > > > > out of the macros". I'm sure there will be some small errors 
| > > > > somewhere, but it compiles and runs fine.
| > > > 
| > > > very nice cleanup! This is exactly what should be done. Applied to 
| > > > tip/x86/irq.
| > > > 
| > > > Note, i did a small rename:
| > > > 
| > > >     cfi_pushq  =>  pushq_cfi
| > > >     cfi_popq   =>  popq_cfi
| > > >     cfi_store  =>  movq_cfi
| 
| Does not work... But if you are attached to the underscores, I
| think we can force it to work by using CPP to convert it to
| something the assembler does parse right:
| 
| #define pushq_cfi pushq.cfi
| 
| etc?
| 
| Or is that just too ugly?
| 
| Alexander
| 
| > > > as the goal is to have the actual source code read mostly as regular 
| > > > assembly code. The fact that the macro is equivalent to a 
| > > > default-annotated pushq/popq/movq instruction is much more important 
| > > > than the fact that it also does CFI ...
From: Alexander van Heukelum
Date: Thursday, November 20, 2008 - 9:29 am

Hi Ingo,

False alarm. It's happily building at the moment. Does it still
fail for you?

Greetings,
--

From: Ingo Molnar
Date: Thursday, November 20, 2008 - 10:24 am

changed it to pushq.cfi / popq.cfi / movq.cfi - does that read OK to 
you? It looks fine here.

	Ingo
--

From: Alexander van Heukelum
Date: Friday, November 21, 2008 - 8:41 am

The save_rest function completes a partial stack frame for use
by the PTREGSCALL macro. This also avoid the indirect call in
PTREGSCALLs.

This adds the macro movq_cfi_restore to hide the CFI_RESTORE
annotation when restoring a register from the stack frame.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
 arch/x86/kernel/entry_64.S |   91 +++++++++++++++++++++++++------------------
 1 files changed, 53 insertions(+), 38 deletions(-)

Hello Ingo,

Here are three more patches cleaning up entry_64.S. They apply to
current tip/x86/irq. Thanks again for your help on the binutils
problems!

Greetings,
    Alexander

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 92c5e18..ef95c45 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -78,6 +78,11 @@
 	CFI_REL_OFFSET \reg, \offset
 	.endm
 
+	.macro movq_cfi_restore offset reg
+	movq \offset(%rsp), %\reg
+	CFI_RESTORE \reg
+	.endm
+
 #ifdef CONFIG_FUNCTION_TRACER
 #ifdef CONFIG_DYNAMIC_FTRACE
 ENTRY(mcount)
@@ -186,21 +191,21 @@ ENTRY(native_usergs_sysret64)
  */
 
 	/* %rsp:at FRAMEEND */
-	.macro FIXUP_TOP_OF_STACK tmp
-	movq	%gs:pda_oldrsp,\tmp
-	movq  	\tmp,RSP(%rsp)
-	movq    $__USER_DS,SS(%rsp)
-	movq    $__USER_CS,CS(%rsp)
-	movq 	$-1,RCX(%rsp)
-	movq	R11(%rsp),\tmp  /* get eflags */
-	movq	\tmp,EFLAGS(%rsp)
+	.macro FIXUP_TOP_OF_STACK tmp offset=0
+	movq %gs:pda_oldrsp,\tmp
+	movq \tmp,RSP+\offset(%rsp)
+	movq $__USER_DS,SS+\offset(%rsp)
+	movq $__USER_CS,CS+\offset(%rsp)
+	movq $-1,RCX+\offset(%rsp)
+	movq R11+\offset(%rsp),\tmp  /* get eflags */
+	movq \tmp,EFLAGS+\offset(%rsp)
 	.endm
 
-	.macro RESTORE_TOP_OF_STACK tmp,offset=0
-	movq   RSP-\offset(%rsp),\tmp
-	movq   \tmp,%gs:pda_oldrsp
-	movq   EFLAGS-\offset(%rsp),\tmp
-	movq   \tmp,R11-\offset(%rsp)
+	.macro RESTORE_TOP_OF_STACK tmp offset=0
+	movq RSP+\offset(%rsp),\tmp
+	movq \tmp,%gs:pda_oldrsp
+	movq EFLAGS+\offset(%rsp),\tmp
+	movq \tmp,R11+\offset(%rsp)
 ...
From: Alexander van Heukelum
Date: Friday, November 21, 2008 - 8:43 am

Also expand the paranoid_exit0 macro into nmi_exit inside the
nmi stub in the case of enabled irq-tracing.

This gives a few hundred bytes code size reduction.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
 arch/x86/kernel/entry_64.S |  151 +++++++++++++++++++++++++++++--------------
 1 files changed, 102 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index ef95c45..fad777b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -353,6 +353,36 @@ ENTRY(save_rest)
 	CFI_ENDPROC
 END(save_rest)
 
+/* save complete stack frame */
+ENTRY(save_paranoid)
+	XCPT_FRAME 1 RDI+8
+	cld
+	movq_cfi rdi, RDI+8
+	movq_cfi rsi, RSI+8
+	movq_cfi rdx, RDX+8
+	movq_cfi rcx, RCX+8
+	movq_cfi rax, RAX+8
+	movq_cfi r8, R8+8
+	movq_cfi r9, R9+8
+	movq_cfi r10, R10+8
+	movq_cfi r11, R11+8
+	movq_cfi rbx, RBX+8
+	movq_cfi rbp, RBP+8
+	movq_cfi r12, R12+8
+	movq_cfi r13, R13+8
+	movq_cfi r14, R14+8
+	movq_cfi r15, R15+8
+	movl $1,%ebx
+	movl $MSR_GS_BASE,%ecx
+	rdmsr
+	testl %edx,%edx
+	js 1f	/* negative -> in kernel */
+	SWAPGS
+	xorl %ebx,%ebx
+1:	ret
+	CFI_ENDPROC
+END(save_paranoid)
+
 /*
  * A newly forked process directly context switches into this.
  */
@@ -1012,24 +1042,15 @@ END(spurious_interrupt)
 	.endm
 
 	/* error code is on the stack already */
-	/* handle NMI like exceptions that can happen everywhere */
-	.macro paranoidentry sym, ist=0, irqtrace=1
-	SAVE_ALL
-	cld
-	movl $1,%ebx
-	movl  $MSR_GS_BASE,%ecx
-	rdmsr
-	testl %edx,%edx
-	js    1f
-	SWAPGS
-	xorl  %ebx,%ebx
-1:
+	.macro paranoidentry sym ist=0
+	subq $15*8, %rsp
+	CFI_ADJUST_CFA_OFFSET 15*8
+	call save_paranoid
+	DEFAULT_FRAME 0
 	.if \ist
 	movq	%gs:pda_data_offset, %rbp
 	.endif
-	.if \irqtrace
 	TRACE_IRQS_OFF
-	.endif
 	movq %rsp,%rdi
 	movq ORIG_RAX(%rsp),%rsi
 	movq $-1,ORIG_RAX(%rsp)
@@ -1041,9 +1062,7 @@ END(spurious_interrupt)
 	addq	$EXCEPTION_STKSZ, per_cpu__init_tss + ...
From: Alexander van Heukelum
Date: Friday, November 21, 2008 - 8:44 am

DISABLE_INTERRUPTS(CLBR_NONE)/TRACE_IRQS_OFF is now always
executed just before paranoid_exit. Move it there.

Split out paranoidzeroentry, paranoiderrorentry, and
paranoidzeroentry_ist to get more readable macro's.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
 arch/x86/kernel/entry_64.S |  102 ++++++++++++++++++++++----------------------
 1 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index fad777b..692c1da 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1026,6 +1026,39 @@ END(spurious_interrupt)
 	CFI_ENDPROC
 	.endm
 
+	.macro paranoidzeroentry sym
+	INTR_FRAME
+	PARAVIRT_ADJUST_EXCEPTION_FRAME
+	pushq $-1		/* ORIG_RAX: no syscall to restart */
+	CFI_ADJUST_CFA_OFFSET 8
+	subq $15*8, %rsp
+	call save_paranoid
+	TRACE_IRQS_OFF
+	movq %rsp,%rdi		/* pt_regs pointer */
+	xorl %esi,%esi		/* no error code */
+	call \sym
+	jmp paranoid_exit	/* %ebx: no swapgs flag */
+	CFI_ENDPROC
+	.endm
+
+	.macro paranoidzeroentry_ist sym ist
+ 	INTR_FRAME
+	PARAVIRT_ADJUST_EXCEPTION_FRAME
+	pushq $-1		/* ORIG_RAX: no syscall to restart */
+	CFI_ADJUST_CFA_OFFSET 8
+	subq $15*8, %rsp
+	call save_paranoid
+	TRACE_IRQS_OFF
+	movq %rsp,%rdi		/* pt_regs pointer */
+	xorl %esi,%esi		/* no error code */
+	movq %gs:pda_data_offset, %rbp
+	subq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
+	call \sym
+	addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
+	jmp paranoid_exit	/* %ebx: no swapgs flag */
+	CFI_ENDPROC
+	.endm
+
 	.macro errorentry sym
 	XCPT_FRAME
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
@@ -1042,27 +1075,20 @@ END(spurious_interrupt)
 	.endm
 
 	/* error code is on the stack already */
-	.macro paranoidentry sym ist=0
-	subq $15*8, %rsp
+	.macro paranoiderrorentry sym
+	XCPT_FRAME
+	PARAVIRT_ADJUST_EXCEPTION_FRAME
+	subq $15*8,%rsp
 	CFI_ADJUST_CFA_OFFSET 15*8
 	call save_paranoid
 	DEFAULT_FRAME ...
From: Ingo Molnar
Date: Friday, November 21, 2008 - 9:06 am

cool!

Could you please also collapse it into just a single macro? The 
ENTRY/END sequence can be generated by the macro. (if you do it then 
please do it in a followup patch, i'll look at and apply your current 
series)

Something like:

  PARANOID_ERROR_ENTRY(stack_segment)

Could do all the magic, right?

	Ingo
--

From: Alexander van Heukelum
Date: Sunday, November 23, 2008 - 2:08 am

Impact: cleanup of entry_64.S

Except for the order and the place of the functions, this
patch should not change the generated code.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---
 arch/x86/kernel/entry_64.S |  259 +++++++++++++++++++-------------------------
 1 files changed, 109 insertions(+), 150 deletions(-)


I chose to just reuse the existing names, but it's bikeshedding, so
change it if you like ;)

I now know why I did not hit the bug that was fixed by "x86: split
out some macro's and move common code to paranoid_exit, fix"... 
*blush* I was doing my real-world testing using an i386-image of
debian.

Greetings,


diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e5ddf57..ec0ed9c 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -945,76 +945,70 @@ END(common_interrupt)
 /*
  * APIC interrupts.
  */
-	.p2align 5
-
-	.macro apicinterrupt num,func
+.macro apicinterrupt num sym do_sym
+ENTRY(\sym)
 	INTR_FRAME
 	pushq $~(\num)
 	CFI_ADJUST_CFA_OFFSET 8
-	interrupt \func
+	interrupt \do_sym
 	jmp ret_from_intr
 	CFI_ENDPROC
-	.endm
+END(\sym)
+.endm
 
-ENTRY(thermal_interrupt)
-	apicinterrupt THERMAL_APIC_VECTOR,smp_thermal_interrupt
-END(thermal_interrupt)
+#ifdef CONFIG_SMP
+apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \
+	irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt
+#endif
 
-ENTRY(threshold_interrupt)
-	apicinterrupt THRESHOLD_APIC_VECTOR,mce_threshold_interrupt
-END(threshold_interrupt)
+apicinterrupt 220 \
+	uv_bau_message_intr1 uv_bau_message_interrupt
+apicinterrupt LOCAL_TIMER_VECTOR \
+	apic_timer_interrupt smp_apic_timer_interrupt
 
 #ifdef CONFIG_SMP
-ENTRY(reschedule_interrupt)
-	apicinterrupt RESCHEDULE_VECTOR,smp_reschedule_interrupt
-END(reschedule_interrupt)
-
-	.macro INVALIDATE_ENTRY num
-ENTRY(invalidate_interrupt\num)
-	apicinterrupt ...
From: Alexander van Heukelum
Date: Sunday, November 23, 2008 - 2:15 am

Impact: moves some code out of .kprobes.text

KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
uses .popsection to get back to the previous section (.text, normally).
Also replace ENDPROC by END, for consistency.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---
 arch/x86/kernel/entry_64.S |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

Hi Ingo,

One more small change for today. The xen-related functions 
xen_do_hypervisor_callback and xen_failsafe_callback are put
in the .kprobes.text even in the current kernel: ignore_sysret
is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
KPROBE_END, but I guess the situation is harmless.

Greetings,
	Alexander

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index ec0ed9c..477a428 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1190,7 +1190,7 @@ paranoid_schedule:
 	TRACE_IRQS_OFF
 	jmp paranoid_userspace
 	CFI_ENDPROC
-END(paranoid_exit)
+KPROBE_END(paranoid_exit)
 
 /*
  * Exception entry point. This expects an error code/orig_rax on the stack.
@@ -1282,7 +1282,7 @@ gs_change:
 	CFI_ADJUST_CFA_OFFSET -8
         ret
 	CFI_ENDPROC
-ENDPROC(native_load_gs_index)
+END(native_load_gs_index)
 
         .section __ex_table,"a"
         .align 8
@@ -1336,7 +1336,7 @@ ENTRY(kernel_thread)
 	UNFAKE_STACK_FRAME
 	ret
 	CFI_ENDPROC
-ENDPROC(kernel_thread)
+END(kernel_thread)
 
 child_rip:
 	pushq $0		# fake return address
@@ -1352,7 +1352,7 @@ child_rip:
 	mov %eax, %edi
 	call do_exit
 	CFI_ENDPROC
-ENDPROC(child_rip)
+END(child_rip)
 
 /*
  * execve(). This function needs to use IRET, not SYSRET, to set up all state properly.
@@ -1383,9 +1383,7 @@ ENTRY(kernel_execve)
 	UNFAKE_STACK_FRAME
 	ret
 	CFI_ENDPROC
-ENDPROC(kernel_execve)
-
-
+END(kernel_execve)
 
 	/* runs on exception stack */
 KPROBE_ENTRY(nmi)
@@ -1460,14 +1458,14 @@ ENTRY(call_softirq)
 	decl %gs:pda_irqcount
 ...
From: Ingo Molnar
Date: Sunday, November 23, 2008 - 6:27 am

yeah. It narrows no-kprobes protection for that code, but it should 
indeed be fine (and that's the intention as well).

Note that this is a reoccuring bug type, and rather long-lived. Can 
you think of any way to get automated nesting protection of both the 
.cfi_startproc/endproc macros and kprobes start/end? A poor man's 
solution would be to grep the number of start and end methods and 
enforce that they are equal.

	Ingo
--

From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 6:51 am

[Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100]
| 
| * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| 
| > Impact: moves some code out of .kprobes.text
| > 
| > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
| > uses .popsection to get back to the previous section (.text, normally).
| > Also replace ENDPROC by END, for consistency.
| > 
| > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| 
| applied to tip/x86/irq, thanks Alexander!
| 
| > One more small change for today. The xen-related functions 
| > xen_do_hypervisor_callback and xen_failsafe_callback are put
| > in the .kprobes.text even in the current kernel: ignore_sysret
| > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
| > KPROBE_END, but I guess the situation is harmless.
| 
| yeah. It narrows no-kprobes protection for that code, but it should 
| indeed be fine (and that's the intention as well).
| 
| Note that this is a reoccuring bug type, and rather long-lived. Can 
| you think of any way to get automated nesting protection of both the 
| .cfi_startproc/endproc macros and kprobes start/end? A poor man's 
| solution would be to grep the number of start and end methods and 
| enforce that they are equal.
| 
| 	Ingo
| 

I think we could play with preprocessor and check if ENTRY/END matches.
Looking now.

		- Cyrill -
--

From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 7:12 am

[Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300]
| [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100]
| | 
| | * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| | 
| | > Impact: moves some code out of .kprobes.text
| | > 
| | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
| | > uses .popsection to get back to the previous section (.text, normally).
| | > Also replace ENDPROC by END, for consistency.
| | > 
| | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| | 
| | applied to tip/x86/irq, thanks Alexander!
| | 
| | > One more small change for today. The xen-related functions 
| | > xen_do_hypervisor_callback and xen_failsafe_callback are put
| | > in the .kprobes.text even in the current kernel: ignore_sysret
| | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
| | > KPROBE_END, but I guess the situation is harmless.
| | 
| | yeah. It narrows no-kprobes protection for that code, but it should 
| | indeed be fine (and that's the intention as well).
| | 
| | Note that this is a reoccuring bug type, and rather long-lived. Can 
| | you think of any way to get automated nesting protection of both the 
| | .cfi_startproc/endproc macros and kprobes start/end? A poor man's 
| | solution would be to grep the number of start and end methods and 
| | enforce that they are equal.
| | 
| | 	Ingo
| | 
| 
| I think we could play with preprocessor and check if ENTRY/END matches.
| Looking now.
| 
| 		- Cyrill -

Here is what I've done

1) Add some macros like:

	.macro __set_entry
	.set _ENTRY_IN, 1
	.endm

	.macro __unset_entry
	.set _ENTRY_IN, 0
	.endm

	.macro __check_entry
	.ifeq _ENTRY_IN
	.error "END should be used"
	.abort
	.endif
	.endm

So the code

ENTRY(mcount)
	__unset_entry
	retq
	__check_entry
END(mcount)

will fail like

cyrill@lenovo linux-2.6.git $ make arch/x86/kernel/entry_64.o
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  SYMLINK ...
From: Ingo Molnar
Date: Sunday, November 23, 2008 - 7:55 am

looks good!

Can we somehow detect a missing .cfi_endproc? That's another pattern 
i've seen.

	Ingo
--

From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 8:04 am

[Ingo Molnar - Sun, Nov 23, 2008 at 03:55:54PM +0100]
| 
| * Cyrill Gorcunov <gorcunov@gmail.com> wrote:
| 
| > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300]
| > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100]
| > | | 
| > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| > | | 
| > | | > Impact: moves some code out of .kprobes.text
| > | | > 
| > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
| > | | > uses .popsection to get back to the previous section (.text, normally).
| > | | > Also replace ENDPROC by END, for consistency.
| > | | > 
| > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| > | | 
| > | | applied to tip/x86/irq, thanks Alexander!
| > | | 
| > | | > One more small change for today. The xen-related functions 
| > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put
| > | | > in the .kprobes.text even in the current kernel: ignore_sysret
| > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
| > | | > KPROBE_END, but I guess the situation is harmless.
| > | | 
| > | | yeah. It narrows no-kprobes protection for that code, but it should 
| > | | indeed be fine (and that's the intention as well).
| > | | 
| > | | Note that this is a reoccuring bug type, and rather long-lived. Can 
| > | | you think of any way to get automated nesting protection of both the 
| > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's 
| > | | solution would be to grep the number of start and end methods and 
| > | | enforce that they are equal.
| > | | 
| > | | 	Ingo
| > | | 
| > | 
| > | I think we could play with preprocessor and check if ENTRY/END matches.
| > | Looking now.
| > | 
| > | 		- Cyrill -
| > 
| > Here is what I've done
| > 
| > 1) Add some macros like:
| > 
| > 	.macro __set_entry
| > 	.set _ENTRY_IN, 1
| > 	.endm
| > 
| > 	.macro __unset_entry
| > 	.set _ENTRY_IN, 0
| > 	.endm
| > 
| > 	.macro __check_entry
| > ...
From: Alexander van Heukelum
Date: Sunday, November 23, 2008 - 8:04 am

Looks like a good approach to me. But I assume the ENTRY cppmacro
will include magic?

Greetings,
--

From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 8:12 am

[Alexander van Heukelum - Sun, Nov 23, 2008 at 04:04:18PM +0100]
| On Sun, Nov 23, 2008 at 05:12:37PM +0300, Cyrill Gorcunov wrote:
| > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300]
| > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100]
| > | | 
| > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| > | | 
| > | | > Impact: moves some code out of .kprobes.text
| > | | > 
| > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
| > | | > uses .popsection to get back to the previous section (.text, normally).
| > | | > Also replace ENDPROC by END, for consistency.
| > | | > 
| > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| > | | 
| > | | applied to tip/x86/irq, thanks Alexander!
| > | | 
| > | | > One more small change for today. The xen-related functions 
| > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put
| > | | > in the .kprobes.text even in the current kernel: ignore_sysret
| > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
| > | | > KPROBE_END, but I guess the situation is harmless.
| > | | 
| > | | yeah. It narrows no-kprobes protection for that code, but it should 
| > | | indeed be fine (and that's the intention as well).
| > | | 
| > | | Note that this is a reoccuring bug type, and rather long-lived. Can 
| > | | you think of any way to get automated nesting protection of both the 
| > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's 
| > | | solution would be to grep the number of start and end methods and 
| > | | enforce that they are equal.
| > | | 
| > | | 	Ingo
| > | | 
| > | 
| > | I think we could play with preprocessor and check if ENTRY/END matches.
| > | Looking now.
| > | 
| > | 		- Cyrill -
| > 
| > Here is what I've done
| > 
| > 1) Add some macros like:
| > 
| > 	.macro __set_entry
| > 	.set _ENTRY_IN, 1
| > 	.endm
| > 
| > 	.macro __unset_entry
| > 	.set _ENTRY_IN, 0
| > 	.endm
| > 
| > 	.macro ...
From: Ingo Molnar
Date: Sunday, November 23, 2008 - 8:31 am

i'd suggest to introduce another entry macro name for that, for the 
time being. If other architectures want to pick up the method, they 
can generalize and test it.

( but this is assembly magic so i'm doubtful - while the features used 
  are generic GAS features that should work everywhere, binutils 
  variants tend to be rather fragile. So lets go with some other name 
  like X86_ENTRY()/X86_END() or so - or maybe ENTRY_CFI()/END_CFI(). )

	Ingo
--

From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 8:41 am

[Ingo Molnar - Sun, Nov 23, 2008 at 04:31:40PM +0100]
| 
| * Cyrill Gorcunov <gorcunov@gmail.com> wrote:
| 
| > [Alexander van Heukelum - Sun, Nov 23, 2008 at 04:04:18PM +0100]
| > | On Sun, Nov 23, 2008 at 05:12:37PM +0300, Cyrill Gorcunov wrote:
| > | > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300]
| > | > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100]
| > | > | | 
| > | > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| > | > | | 
| > | > | | > Impact: moves some code out of .kprobes.text
| > | > | | > 
| > | > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
| > | > | | > uses .popsection to get back to the previous section (.text, normally).
| > | > | | > Also replace ENDPROC by END, for consistency.
| > | > | | > 
| > | > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| > | > | | 
| > | > | | applied to tip/x86/irq, thanks Alexander!
| > | > | | 
| > | > | | > One more small change for today. The xen-related functions 
| > | > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put
| > | > | | > in the .kprobes.text even in the current kernel: ignore_sysret
| > | > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
| > | > | | > KPROBE_END, but I guess the situation is harmless.
| > | > | | 
| > | > | | yeah. It narrows no-kprobes protection for that code, but it should 
| > | > | | indeed be fine (and that's the intention as well).
| > | > | | 
| > | > | | Note that this is a reoccuring bug type, and rather long-lived. Can 
| > | > | | you think of any way to get automated nesting protection of both the 
| > | > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's 
| > | > | | solution would be to grep the number of start and end methods and 
| > | > | | enforce that they are equal.
| > | > | | 
| > | > | | 	Ingo
| > | > | | 
| > | > | 
| > | > | I think we could play with preprocessor and check if ENTRY/END matches.
| > | > | Looking ...
From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 8:37 am

[Alexander van Heukelum - Sun, Nov 23, 2008 at 04:04:18PM +0100]
| On Sun, Nov 23, 2008 at 05:12:37PM +0300, Cyrill Gorcunov wrote:
| > [Cyrill Gorcunov - Sun, Nov 23, 2008 at 04:51:34PM +0300]
| > | [Ingo Molnar - Sun, Nov 23, 2008 at 02:27:52PM +0100]
| > | | 
| > | | * Alexander van Heukelum <heukelum@mailshack.com> wrote:
| > | | 
| > | | > Impact: moves some code out of .kprobes.text
| > | | > 
| > | | > KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
| > | | > uses .popsection to get back to the previous section (.text, normally).
| > | | > Also replace ENDPROC by END, for consistency.
| > | | > 
| > | | > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| > | | 
| > | | applied to tip/x86/irq, thanks Alexander!
| > | | 
| > | | > One more small change for today. The xen-related functions 
| > | | > xen_do_hypervisor_callback and xen_failsafe_callback are put
| > | | > in the .kprobes.text even in the current kernel: ignore_sysret
| > | | > is enclosed in KPROBE_ENTRY / ENDPROC, instead of KPROBE_ENTRY /
| > | | > KPROBE_END, but I guess the situation is harmless.
| > | | 
| > | | yeah. It narrows no-kprobes protection for that code, but it should 
| > | | indeed be fine (and that's the intention as well).
| > | | 
| > | | Note that this is a reoccuring bug type, and rather long-lived. Can 
| > | | you think of any way to get automated nesting protection of both the 
| > | | .cfi_startproc/endproc macros and kprobes start/end? A poor man's 
| > | | solution would be to grep the number of start and end methods and 
| > | | enforce that they are equal.
| > | | 
| > | | 	Ingo
| > | | 
| > | 
| > | I think we could play with preprocessor and check if ENTRY/END matches.
| > | Looking now.
| > | 
| > | 		- Cyrill -
| > 
| > Here is what I've done
| > 
| > 1) Add some macros like:
| > 
| > 	.macro __set_entry
| > 	.set _ENTRY_IN, 1
| > 	.endm
| > 
| > 	.macro __unset_entry
| > 	.set _ENTRY_IN, 0
| > 	.endm
| > 
| > 	.macro ...
From: Ingo Molnar
Date: Sunday, November 23, 2008 - 9:29 am

very nice! Looks like the way to go.

	Ingo
--

From: Jan Beulich
Date: Monday, November 24, 2008 - 2:17 am

I disagree to this and similar changes in this patch: Why do we need to
get rid of the ENDPROC() here? It's a procedure that's being ended, and
using ENDPROC() is the only (existing) way to mark something as a
procedure in assembly code.

And btw., while described so in the patch comment, this change has nothing
to do with the subject of the patch.

Jan

--

From: Alexander van Heukelum
Date: Monday, November 24, 2008 - 3:26 am

Hallo Jan Beulich,

You are right. ENDPROC(name) adds ".type name, @function;" as compared

Right. I thought of END and ENDPROC as equivalent, so I added the change
to this patch as a small cleanup only. But if we want this .type
annotation, what about KPROBE_END? should it include one there too?

I'm getting a feeling we would be better off removing KPROBE_ENTRY and
KPROBE_END if favour of explicitly changing sections in the .S files?
And using the ENDPROC annotation for all procedures?

Greetings,
--

From: Jan Beulich
Date: Monday, November 24, 2008 - 3:35 am

Yes, it always bothered me that there's no KPROBE_ENDPROC() (or
alternatively, as this being code is implied by the macro, it didn't do the

It got explicitly added a while back, so there must have been a reason to
*not* do the section adjustments explicitly. And given the current discussion
I'd also assume that more hiding of code in macros is the preferred route.

Jan

--

From: Alexander van Heukelum
Date: Monday, November 24, 2008 - 5:24 am

entry_64.S is the only user of KPROBE_ENTRY / KPROBE_END on
x86_64. This patch reorders entry_64.S and explicitly generates
a separate section for functions that need the protection. The
generated code before and after the patch is equal.

Implicitly changing sections in assembly files makes it more
difficult to follow why the assembler is doing certain things.
For example,

.p2align 5
KPROBE_ENTRY(...)

was not doing what you would expect. Other section changes
(__ex_table, .fixup, .init.rodata) are done explicitly already.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Jan Beulich <jbeulich@novell.com>

---
 arch/x86/kernel/entry_64.S |  444 ++++++++++++++++++++++----------------------
 1 files changed, 220 insertions(+), 224 deletions(-)


Hallo Jan Beulich,

Let's see if this catches on. On i386 there is also only one user
of KPROBES_ENTRY / KPROBES_END: entry_32.S. If we remove those
entries in a similar way, we can get rid of them tree-wide.

Greetings,
	Alexander


diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index f2d546e..38fcd05 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1002,7 +1002,7 @@ END(\sym)
 .endm
 
 .macro paranoidzeroentry sym do_sym
-KPROBE_ENTRY(\sym)
+ENTRY(\sym)
 	INTR_FRAME
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	pushq $-1		/* ORIG_RAX: no syscall to restart */
@@ -1015,11 +1015,11 @@ KPROBE_ENTRY(\sym)
 	call \do_sym
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 	CFI_ENDPROC
-KPROBE_END(\sym)
+END(\sym)
 .endm
 
 .macro paranoidzeroentry_ist sym do_sym ist
-KPROBE_ENTRY(\sym)
+ENTRY(\sym)
  	INTR_FRAME
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	pushq $-1		/* ORIG_RAX: no syscall to restart */
@@ -1035,15 +1035,11 @@ KPROBE_ENTRY(\sym)
 	addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 	CFI_ENDPROC
-KPROBE_END(\sym)
+END(\sym)
 .endm
 
-.macro errorentry sym do_sym entry=0
-.if ...
From: Jan Beulich
Date: Monday, November 24, 2008 - 6:33 am

I like things being done this way (including the goal of doing away with
entry_64.S is the only user of KPROBE_ENTRY / KPROBE_END on
x86_64. This patch reorders entry_64.S and explicitly generates
a separate section for functions that need the protection. The
generated code before and after the patch is equal.

Implicitly changing sections in assembly files makes it more
difficult to follow why the assembler is doing certain things.
For example,

.p2align 5
KPROBE_ENTRY(...)

was not doing what you would expect. Other section changes
(__ex_table, .fixup, .init.rodata) are done explicitly already.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Jan Beulich <jbeulich@novell.com>

---
 arch/x86/kernel/entry_64.S |  444 ++++++++++++++++++++++----------------------
 1 files changed, 220 insertions(+), 224 deletions(-)


Hallo Jan Beulich,

Let's see if this catches on. On i386 there is also only one user
of KPROBES_ENTRY / KPROBES_END: entry_32.S. If we remove those
entries in a similar way, we can get rid of them tree-wide.

Greetings,
	Alexander


diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index f2d546e..38fcd05 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1002,7 +1002,7 @@ END(\sym)
 .endm
 
 .macro paranoidzeroentry sym do_sym
-KPROBE_ENTRY(\sym)
+ENTRY(\sym)
 	INTR_FRAME
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	pushq $-1		/* ORIG_RAX: no syscall to restart */
@@ -1015,11 +1015,11 @@ KPROBE_ENTRY(\sym)
 	call \do_sym
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 	CFI_ENDPROC
-KPROBE_END(\sym)
+END(\sym)
 .endm
 
 .macro paranoidzeroentry_ist sym do_sym ist
-KPROBE_ENTRY(\sym)
+ENTRY(\sym)
  	INTR_FRAME
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	pushq $-1		/* ORIG_RAX: no syscall to restart */
@@ -1035,15 +1035,11 @@ KPROBE_ENTRY(\sym)
 	addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 ...
From: Alexander van Heukelum
Date: Monday, November 24, 2008 - 7:38 am

entry_32.S is now the only user of KPROBE_ENTRY / KPROBE_END,
treewide. This patch reorders entry_64.S and explicitly generates
a separate section for functions that need the protection. The
generated code before and after the patch is equal.

The KPROBE_ENTRY and KPROBE_END macro's are removed too.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
 arch/x86/kernel/entry_32.S |  438 ++++++++++++++++++++++---------------------
 include/linux/linkage.h    |    8 -
 2 files changed, 224 insertions(+), 222 deletions(-)


Thanks for the comment. Here is the i386-side of the patch, including
the removal of the KPROBE_ENTRY and KPROBE_END macro's.

Greetings,
	Alexander

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index bd02ec7..6e96028 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -688,65 +688,6 @@ ENDPROC(name)
 /* The include is where all of the SMP etc. interrupts come from */
 #include "entry_arch.h"
 
-KPROBE_ENTRY(page_fault)
-	RING0_EC_FRAME
-	pushl $do_page_fault
-	CFI_ADJUST_CFA_OFFSET 4
-	ALIGN
-error_code:
-	/* the function address is in %fs's slot on the stack */
-	pushl %es
-	CFI_ADJUST_CFA_OFFSET 4
-	/*CFI_REL_OFFSET es, 0*/
-	pushl %ds
-	CFI_ADJUST_CFA_OFFSET 4
-	/*CFI_REL_OFFSET ds, 0*/
-	pushl %eax
-	CFI_ADJUST_CFA_OFFSET 4
-	CFI_REL_OFFSET eax, 0
-	pushl %ebp
-	CFI_ADJUST_CFA_OFFSET 4
-	CFI_REL_OFFSET ebp, 0
-	pushl %edi
-	CFI_ADJUST_CFA_OFFSET 4
-	CFI_REL_OFFSET edi, 0
-	pushl %esi
-	CFI_ADJUST_CFA_OFFSET 4
-	CFI_REL_OFFSET esi, 0
-	pushl %edx
-	CFI_ADJUST_CFA_OFFSET 4
-	CFI_REL_OFFSET edx, 0
-	pushl %ecx
-	CFI_ADJUST_CFA_OFFSET 4
-	CFI_REL_OFFSET ecx, 0
-	pushl %ebx
-	CFI_ADJUST_CFA_OFFSET 4
-	CFI_REL_OFFSET ebx, 0
-	cld
-	pushl %fs
-	CFI_ADJUST_CFA_OFFSET 4
-	/*CFI_REL_OFFSET fs, 0*/
-	movl $(__KERNEL_PERCPU), %ecx
-	movl %ecx, %fs
-	UNWIND_ESPFIX_STACK
-	popl %ecx
-	CFI_ADJUST_CFA_OFFSET -4
-	/*CFI_REGISTER es, ecx*/
-	movl PT_FS(%esp), %edi		# get the ...
From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 2:21 am

[Alexander van Heukelum - Sun, Nov 23, 2008 at 10:08:28AM +0100]
| Impact: cleanup of entry_64.S
| 
| Except for the order and the place of the functions, this
| patch should not change the generated code.
| 
| Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| 
| ---
|  arch/x86/kernel/entry_64.S |  259 +++++++++++++++++++-------------------------
|  1 files changed, 109 insertions(+), 150 deletions(-)
|

Hi Alexander,

great! One moment is not obvious for me -- why we
stopped to align interrupt section to 32 bytes?
Did I miss anyhing?
 
		- Cyrill -
--

From: Alexander van Heukelum
Date: Sunday, November 23, 2008 - 4:23 am

I put a ".p2align 5" in earlier in the series which caused the
apicinterrupts to be 32-byte aligned. But it is a hack, really,
relying on the generated code per stub to be between 17 and 32
bytes, on the default alignment to be 16 bytes and all stubs
to be in the .text section.

I'm in favour of aligning all of the interrupt/exception stubs
to 32 bytes, but it should be implemented the right way ;),
which means that we need KPROBE_ENTRY_P5ALIGNED and so on :-/.

Greetings,
--

From: Cyrill Gorcunov
Date: Sunday, November 23, 2008 - 4:35 am

[Alexander van Heukelum - Sun, Nov 23, 2008 at 12:23:54PM +0100]
| On Sun, Nov 23, 2008 at 12:21:36PM +0300, Cyrill Gorcunov wrote:
| > [Alexander van Heukelum - Sun, Nov 23, 2008 at 10:08:28AM +0100]
| > | Impact: cleanup of entry_64.S
| > | 
| > | Except for the order and the place of the functions, this
| > | patch should not change the generated code.
| > | 
| > | Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| > | 
| > | ---
| > |  arch/x86/kernel/entry_64.S |  259 +++++++++++++++++++-------------------------
| > |  1 files changed, 109 insertions(+), 150 deletions(-)
| > |
| > 
| > Hi Alexander,
| > 
| > great! One moment is not obvious for me -- why we
| > stopped to align interrupt section to 32 bytes?
| > Did I miss anyhing?
| 
| I put a ".p2align 5" in earlier in the series which caused the
| apicinterrupts to be 32-byte aligned. But it is a hack, really,
| relying on the generated code per stub to be between 17 and 32
| bytes, on the default alignment to be 16 bytes and all stubs
| to be in the .text section.
| 
| I'm in favour of aligning all of the interrupt/exception stubs
| to 32 bytes, but it should be implemented the right way ;),
| which means that we need KPROBE_ENTRY_P5ALIGNED and so on :-/.
| 
| Greetings,
|         Alexander
|  
| > 		- Cyrill -
| 

ah. thanks for info.

		- Cyrill -
--

From: H. Peter Anvin
Date: Sunday, November 23, 2008 - 1:13 pm

I'm sorry, I really don't follow that logic at all.  Why the heck would
KPROBE_ENTRY_P5ALIGNED be better than .p5align?

For the record, I think we already have way to much macro crappage.  It
makes the code painful to read and hard to figure out what the real
constraints on it is.

	-hpa
--

From: Alexander van Heukelum
Date: Monday, November 24, 2008 - 3:06 am

Hello hpa,

KPROBE_ENTRY_P5ALIGNED It isn't better, that's the point.
It's needed, because we have this problem:

	.p2align 5
	KPROBE_ENTRY(something)
		somecode	
	KPROBE_END(something)

This does not align "something", because something is not in the

I agree with that to some point. But even without the macro's, the
code is hard to read. As an alternative to more macro's, what do you
think of Cyrills suggestion of putting CFI-annotations next to the
instuctions, like:

	pushq %rax			; CFI_ADJUST_CFA_OFFSET 8
	movq %rax, RAX+8(%rsp)		; CFI_REL_OFFSET rax, RAX+8

instead? That still leaves the problem that the instruction and the
annotation might not match, but it leaves the code more readable for
someone who is used to reading x86 assembly.

Greetings,
--

From: H. Peter Anvin
Date: Monday, November 24, 2008 - 11:07 am

I think that is braindamaged.  If KPROBE_ENTRY() changes the section
behind the programmers back, that's completely and totally stupid in an
utterly reckless sort of way.  The author of a macro like that should be
taken out and shot, because it makes the programmer think the code does
something completely different than what the code actually does.


I think using the macros for opcodes make sense, as the CFI operation is
tied to the operation.  That is a good way to use macros.

	-hpa
--

From: Ingo Molnar
Date: Sunday, November 23, 2008 - 6:23 am

yeah. But such things, if they pile up long enough, can result in real 

heh, that indeed explains :)

	Ingo
--

From: Ingo Molnar
Date: Monday, November 17, 2008 - 2:47 am

applied to tip/x86/cleanups, thanks Alexander. This file still has a 
horrible code structure but now at least our eyes do not burn out when 
looking at it ;-)

	Ingo
--

From: Alexander van Heukelum
Date: Monday, November 17, 2008 - 8:14 am

Lots of thanks!

Alexander
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - I mean, what is it about a decent email service?

--

Previous thread: Suspend regression in 2.6.25.18 by Rafael J. Wysocki on Sunday, November 16, 2008 - 6:04 am. (1 message)

Next thread: Turn CONFIG_STRICT_DEVMEM in sysctl dev.mem.restricted by