Re: [linux-pm] Freezer: Don't count threads waiting for frozen filesystems.

Previous thread: Linux 2.6.28-rc2 by Linus Torvalds on Sunday, October 26, 2008 - 12:41 pm. (11 messages)

Next thread: Fw: Re: unexpected extra pollout events from epoll by Paul P on Sunday, October 26, 2008 - 1:19 pm. (1 message)
From: Rafael J. Wysocki
Date: Sunday, October 26, 2008 - 1:01 pm

While your description above seems completely correct to me and I have no
objections to the patch, I'd prefer it if someone having more experience with
the VFS looked at it.  Miklos, can you have a look, please?

Thanks,
--

From: Miklos Szeredi
Date: Monday, October 27, 2008 - 4:12 am

The description is missing some details: why is the filesystem frozen
before suspend?  AFAICS this can happen when DM calls bdev_freeze() on
the device before the task freezing begins.  Is this the case?

Also, while the patch might solve some of the symptoms of the fuse
vs. process freezer interaction, it will not fully fix that problem.
As such it's just a hack to hide the problem, making it less likely to
appear.

So I'm not at all conviced about this patch.

Thanks,
Miklos
--

From: Nigel Cunningham
Date: Monday, October 27, 2008 - 4:20 am

Hi Miklos.


It doesn't matter why a process is sitting in that wait_event call. What
does matter is that one can be there. In the case where I saw it, I was
working on fuse freezing. I don't remember the details, as it's a year

No, it's part of the solution. I haven't posted the full fuse freezing
patch because I thought this could be profitably merged without the rest
of the patch.

Regards,

Nigel

--

From: Rafael J. Wysocki
Date: Monday, October 27, 2008 - 4:37 am

Well, I guess it's better if you post the entire thing so that we can see
what the role of the $subject patch is in it, even if this patch finally gets
merged separately.

Thanks,
Rafael
--

From: Nigel Cunningham
Date: Monday, October 27, 2008 - 4:40 am

Hi.


Ah.. that makes me see how vfs_check_frozen was getting triggered...
(fs/namei.c, below).

Regards,

Nigel

 fs/buffer.c                 |   87 ++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/control.c           |    1 
 fs/fuse/dev.c               |    7 +++
 fs/fuse/dir.c               |   35 +++++++++++++++--
 fs/fuse/file.c              |   14 +++++++
 fs/fuse/fuse.h              |   13 ++++++
 fs/fuse/inode.c             |    4 +-
 fs/namei.c                  |    2 +
 include/linux/buffer_head.h |    5 ++
 include/linux/freezer.h     |   15 +++++++
 include/linux/fs.h          |   10 ++++-
 kernel/power/process.c      |   48 +++++++++++++++++++++---
 12 files changed, 227 insertions(+), 14 deletions(-)
diff -ruNp 700-BRANCH-fuse-freezing.patch-old/fs/buffer.c 700-BRANCH-fuse-freezing.patch-new/fs/buffer.c
--- 700-BRANCH-fuse-freezing.patch-old/fs/buffer.c	2008-10-19 04:57:22.000000000 +1100
+++ 700-BRANCH-fuse-freezing.patch-new/fs/buffer.c	2008-10-21 13:16:08.000000000 +1100
@@ -247,6 +247,93 @@ void thaw_bdev(struct block_device *bdev
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+#if 0
+#define FS_PRINTK(fmt, args...) printk(fmt, ## args)
+#else
+#define FS_PRINTK(fmt, args...)
+#endif
+
+/* #define DEBUG_FS_FREEZING */
+
+/**
+ * freeze_filesystems - lock all filesystems and force them into a consistent
+ * state
+ * @which:	What combination of fuse & non-fuse to freeze. 
+ */
+void freeze_filesystems(int which)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+
+	/*
+	 * Freeze in reverse order so filesystems dependant upon others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		FS_PRINTK(KERN_INFO "Considering %s.%s: (root %p, bdev %x)",
+			sb->s_type->name ? sb->s_type->name : "?",
+			sb->s_subtype ? sb->s_subtype : "", sb->s_root,
+			sb->s_bdev ? sb->s_bdev->bd_dev : 0);
+
+		if (sb->s_type->fs_flags & FS_IS_FUSE &&
+		    sb->s_frozen == SB_UNFROZEN &&
+		   ...
From: Miklos Szeredi
Date: Monday, October 27, 2008 - 5:38 am

Nigel, thanks for the patch, it makes thinks a lot clearer.

From a quick look through the patch it seems to solve a bunch of cases
where new fuse requests during the freezing could cause problems.  But
it doesn't do anything with requests that are already under way when
the freezing starts, which would still result in all the same
problems.

Take this scenario:

 1) process A does rename("/mnt/fuse/a", "/mnt/fuse/b")
 2) request goes to process B serving the fuse filesystem
 3) filesystems are frozen, no new fuse requests
 4) processes are frozen, let's say B first, then A
 5) freezing A will fail, since it's still waiting for the request to finish

Several solutions have been posted, none of which really solve the problem:

 a) Let's tag fuse server processes and freeze them later.  This is
 basically impossible, because many processes could be interoperating
 and there's no way to know which is depending on which (example:
 sshfs uses ssh for communication, which possibly relies on openvpn
 process for packet transmission).

 b) While waiting for replies to fuse request, allow process to
 freeze.  Does not fully solve the problem, as VFS might be holding
 locks, and other processes waiting for those locks will not be
 freezable.

Miklos
--

From: Nigel Cunningham
Date: Monday, October 27, 2008 - 1:59 pm

Hi.


I'll have a look at the code again. I deliberately didn't stop existing
requests, but perhaps that's the wrong behaviour.

In the above scenario, process B won't see process A's new request until
post-resume if the filesystem is already frozen, so steps 4 & 5 don't
happen. Process B will also always be frozen before process A if process
A is userspace (most likely in the above scenario) or was mounted after
process B. (I've had this patch distributed as is for almost a year,

I agree that these solutions won't work.

Regards,

Nigel

--

From: Miklos Szeredi
Date: Monday, October 27, 2008 - 2:09 pm

No, I mean the filesystem is only frozen at 3 not before, so B is

Both A and B are userspace processes.  The order of freezing userspace
processes is basically random, there's no way to make sure that
processes doing work on behalf of a fuse filesystem (B) are frozen
after processes accessing the fuse filesystem (A).

Miklos
--

From: Nigel Cunningham
Date: Monday, October 27, 2008 - 3:13 pm

Hi.


Sorry. You're right - I was confusing things in what I said, but I do
have a (unconfused) solution:

The answer is to freeze the fuse filesystems first, stopping new
requests (freezing the processes making them) before they are passed on
to userspace and allowing existing requests to complete or freeze. Once
that is done, the userspace fuse processes will be idle, at least as far
as fuse activity is concerned. After fuse activity is stopped, userspace
can be frozen without worrying about what processes are fuse and what
aren't. This is what my patch implements so far.

To deal with requests that are already in progress, I'd suggest three
possibilities, in the order I think they should be preferred.

1) Use wait_event_freezeable[_timeout] in fuse code. Probably preferable
to #2, but I thought of it later :)

2) Use freezer_do_not_count as part of FUSE_MIGHT_FREEZE (resetting
before exiting the callers, of course). If the request doesn't complete
during the freezing period, it must be because the userspace thread was
already frozen. If the request does complete, we're counted again during
the normal freezing of userspace that follows the freezing of
filesystems.

3) Adding a means to check whether processes being frozen are fuse
requests. The code could then wait for fc->num_waiting - (say)
fc->num_frozen == 0.

Regards,

Nigel

--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 1:25 pm

Yup, these fix the freezing of tasks which have outstanding fuse
requests.

However it does not fix the freezing of tasks which are waiting for
VFS locks (i.e. inode->i_mutex) held by the outstanding fuse requests.
This is the tricky part...

Miklos
--

From: Nigel Cunningham
Date: Tuesday, October 28, 2008 - 2:29 pm

Hi.


Convert them all to wait_event_freezeable. If you know the locks they're
waiting on definitely aren't going to be free until post-resume and
we're going to be trying to freeze them while they're waiting, that
would be the right behaviour.

Regards,

Nigel

--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 2:51 pm

You mean convert mutexes to event queues?  Not a very good idea.

I fear we are going down the same path as the last time.  I still
don't think rewriting the VFS is the right solution to the freezing
problem.  But hey, if you want, sumbit a patch or an RFD and lets see
what others think.

Miklos
--

From: Rafael J. Wysocki
Date: Tuesday, October 28, 2008 - 3:00 pm

So, what solution would you prefer?

Rafael
--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 3:02 pm

I would prefer a freezer-less solution.  Suspend to ram doesn't need
the freezer, and with the kexec approach hibernate could be done
without it also.

I don't think adding hacks to the VFS to work around the issues with
the freezer is the right way to solve this.  But this is just my
personal opinion, the VFS maintainers may think otherwise.

Miklos
--

From: Rafael J. Wysocki
Date: Tuesday, October 28, 2008 - 3:21 pm

There are problems with that too, although they aren't directly related to

Well, my personal opinion is that we need filesystems to support suspend,
this way or another and the sooner it happens, the better.  Still, I'm rather
not going to make that happen myself. :-)

Apparently, Nigel is willing to work in this direction and we can use this as
an opportunity to learn what exactly is necessary for this purpose and _then_
decide if this is reasonable or not instead of dismissing it upfront.

Thanks,
Rafael
--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 4:21 pm

Umm, OK.  Last I remember everybody agreed that there's absolutely no
reason why processes need to be frozen, and the only important thing
is that drivers are not twiddling the hardware during suspend, and

I haven't dismissed it, just voiced my opinion that I think it's the
wrong direction.

Miklos
--

From: Rafael J. Wysocki
Date: Tuesday, October 28, 2008 - 4:59 pm

Well, this turned out not to be the case in the meantime.

In fact to handle that without the freezer we'd have to synchronize every
driver's suspend/resume callbacks with every possible way in which
applications can access the device for regular I/O (for example for PCI devices
this means any I/O other than configuration space accesses).

While this is possible in theory, I don't see this happening any time soon,
especially that we're going to keep the bar for accepting new drivers
relatively low.

Thanks,
Rafael
--

From: Miklos Szeredi
Date: Wednesday, October 29, 2008 - 1:10 am

Not all callbacks.  I don't know what the current model is but AFAIR
it should be something like this:

  1) call drivers to prepare for suspend (allocate space, etc)
  2) stop all driver activity (plug queues, disable interrupts, etc)
  3) call drivers to actually save state and power down
  4) suspend

The part we are concerned is stopping driver activity.  It could be
done with a mutex, or it could be done by freezing tasks.  Adding a
mutex or other mechanism is the one I most like, but it's probably the
biggest work, so lets look at how to fix the freezing:

Currently the criteria for freezing is that userspace task has to exit
kernelspace, and kernel task has to hit a specific "freeze point".
This causes problems where we want to freeze tasks which are "stuck"
inside filesystems or other non-driver parts of the kernel.  We can
fix this two ways:

  a) mark additional places to freeze for userspace tasks as
  well. This is the direction Nigel seems to be taking.

  b) or instead we could allow freezing anywhere in uninterruptible
  sleep, _except_ where explicity marked.

Which is easier?  I don't know.  But I very storgly feel that marking
un-freezable places instead of marking freezable places is a much
cleaner solution.  It only affects parts of the kernel which have
something to do with suspend, instead of affecting parts of the kernel
which have absolutely nothing to do with suspend.

Miklos
--

From: Alan Stern
Date: Wednesday, October 29, 2008 - 6:51 am

Not only is adding a mutex the biggest amount of work, it has has the 
largest impact.  Every I/O pathway would have to acquire the 

The problem with unrestricted freezing shows up when you freeze tasks
that hold a mutex or other sort of lock.  If this mutex is needed later
on for suspending a device then the suspend will hang, because a frozen
task can't release any mutexes.

I suppose you could try to categorize mutexes as "freezable" and 
"non-freezable".  Ones used by device drives would generally be 
non-freezable, whereas others (such as those used by VFS) would be 
freezable.  Still, it would be pretty difficult.  Among other things, 
it would be necessary to verify that a task holding a non-freezable 
mutex never tries to acquire a freezable mutex.

Alan Stern

--

From: Miklos Szeredi
Date: Wednesday, October 29, 2008 - 7:50 am

Actually I was thinking of an rw-semaphore, not a mutex.  But yeah
that still has scalability problems.  But it could be done with custom
locking primitives, optimized for this case:

   suspend_disable();
   /* driver stuff */

I did a random sampling of ->suspend() callbacks, and they don't seem
to be taking mutexes.  Does that happen at all?

Did anybody ever try modifying the freezer for suspend (not
hibernate), so that it allows tasks not in running state to freeze?
If not, I think that's an experiment worth doing.

Miklos
--

From: Alan Stern
Date: Wednesday, October 29, 2008 - 8:28 am

Yes, it could be done.  And the overhead could be minimized by using
per-CPU variables.  It would still be an awful lot of work, and easy to

It does, particularly among drivers that do runtime PM, which is 
becoming more and more important.

Besides, suspend has to synchronize with I/O somehow.  Right now that 
is handled by making suspend wait until no tasks are doing I/O (because 
they are all frozen).  If you allow tasks to be frozen at more or less 
arbitrary times, while holding arbitrary locks, then you may end up 
freezing a task that's in the middle of I/O.  That should certainly 

What happens if the reason the task isn't running is because it's 
waiting for I/O to complete?  I just don't think this can be made to 
work.

Alan Stern

--

From: Miklos Szeredi
Date: Wednesday, October 29, 2008 - 8:50 am

What is the middle of I/O?  Depending the type of I/O it could be
messed up regardless of whether tasks happen to be in userspace or not
(e.g. printing).

And some types of I/O are already mostly decoupled from userspace
(file I/O, networking), so the userspace freezing shoudln't make any

Don't know.  I've never written a driver, and I'm not familiar with
runtime PM, etc.  So I can't come up with a detailed design for
solving the freezer issues there.

But I do think that the solution does not lie in "fixing" the VFS.

Miklos
--

From: Alan Stern
Date: Wednesday, October 29, 2008 - 9:17 am

I don't know.  Maybe it slipped through the cracks.  It wouldn't be 

I'll wriggle out of this by saying that "in the middle of I/O" means
the task has gotten a device into a state from which it can't
successfully be suspended.  An example would be if the system had sent
the device part of a command.  Even if you did manage to suspend the
device and resume it later, you wouldn't expect it to work right when
it received the second half of the command.

In general, drivers don't leave devices in this kind of intermediate
state for long.  A driver might sleep at such times, but it wouldn't 

True.  We're mostly talking about character device I/O.  There are a 

I tend to agree.  The problems posed by fuse are very difficult.  It is 
not at all obvious how to attack them.

Alan Stern

--

From: Miklos Szeredi
Date: Wednesday, October 29, 2008 - 9:10 am

OK, getting back to this, as it seems to be the only way that we agree
is doable.

How about this,

 a) identify syscalls that may make drivers do I/O:

 - read
 - write
 - ioctl
 ???

 b) add the suspend_disable/enable() primitives to these syscalls

 c) push primitives inside the implementation

c) is slightly tricky, but could be done for example by setting a flag
on open: FMODE_NO_SUSPEND_DISABLE (better name required), saying that
implementation is responsible for getting the suspend disable magic
right.

For starters this flag could be set for all non-device opens (maybe all
non-char-dev opens?), solving the fuse vs. freezer issues without any
complicated trickery.

Miklos
--

From: Alan Stern
Date: Wednesday, October 29, 2008 - 1:37 pm

I discussed this last summer with Rafael.  It's a lot harder than it 
looks, for all sorts of reasons.  For example, what about user tasks 

I don't know.  There are other interfaces too, like sysfs attributes, 
that would have to be handled specially.  On the whole, the freezer 
seems much, much simpler.

Regarding fuse, something like Nigel's scheme for preventing new 
requests and then waiting for old requests to complete might work out.  
Especially if you combine it with a strategy for making the freezer 
back and retry after a delay when something goes wrong.

Alan Stern

--

From: Rafael J. Wysocki
Date: Wednesday, October 29, 2008 - 2:11 pm

Yeah.

The current design of the freezer is rather simplistic and I'm not really sure
it's the best one possible.  Perhaps we can redesign the freezer to work
differently and handle the cases like fuse.

I didn't have the time to think about that yet, though.

Thanks,
Rafael
--

From: Nigel Cunningham
Date: Wednesday, October 29, 2008 - 2:45 pm

Hi.


Why redo what I've already done? In the full patch, you have the basis
of what you're talking about. I haven't seen a failure to freeze fuse or
anything else in a year of use.

Nigel

--

From: Rafael J. Wysocki
Date: Wednesday, October 29, 2008 - 3:07 pm

Still, Miklos noticed some problems with it.

I'm not talking about doing things on top of the current signal-based
freezing mechanism, but rather about moving the freezer a bit closer towards
the scheduler.  Maybe this is the way to go.  I don't know.

In any case, the freezing of user space seems to be much simpler than modifying
all drivers to add suspend synchronization.

Thanks,
Rafael
--

From: Miklos Szeredi
Date: Wednesday, October 29, 2008 - 4:53 pm

As I mentioned, we don't _have_ to modify all drivers.  It could start
out by just modifying syscalls that result in calls to drivers.  That
would be a 10-liner patch.

Miklos
--

From: Pavel Machek
Date: Sunday, November 9, 2008 - 6:44 am

If you can solve current freezer vs. fuse in 10 lines, that would be
useful...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Miklos Szeredi
Date: Wednesday, October 29, 2008 - 4:48 pm

Well yeah, your patch handles the straightforward cases.  But it
doesn't help with the more tricky cases, where one fuse filesystem is
using another, and as those may become more widespread, this approach
will fail.

Miklos
--

From: Nigel Cunningham
Date: Thursday, October 30, 2008 - 6:04 am

Hi.


At the moment, yes. But it's not impossible for us to modify the patch
to handle that as well.

Regards,

Nigel

--

From: Alan Stern
Date: Thursday, October 30, 2008 - 6:56 am

It depends on what you mean.  The most direct reading of your statement
is simply wrong: It is _theoretically_ impossible to find the correct
order for freezing filesystems.  To do so would be equivalent to
solving the halting problem.

Alan Stern

--

From: Nigel Cunningham
Date: Thursday, October 30, 2008 - 2:44 pm

Hi Alan.


I'm not sure that's true. You see, I'm thinking of this as not that
different to the problem of unmounting filesystems. There, too, we need
to unmount in a particular order, and let transactions on each
filesystem stop cleanly before we can unmount them. Even if there are
differences, perhaps looking at how we handle unmounting will help with
handling freezing.

Regards,

Nigel

--

From: Miklos Szeredi
Date: Friday, October 31, 2008 - 1:49 am

There's nothing magic about umount, it just uses a refcount on the fs.

But umount changes the namespace, that's the big difference.  For
example if a process is accessing path P which has a component inside
the mount, it _will_ get different results before and after the
umount.  This is not acceptable for freezing.

For freezing to work with such a refcounting scheme, we'd have to
count _future_ uses of the fs as well, not just current ones, which is
obviously impossible.

Miklos
--

From: Nigel Cunningham
Date: Friday, October 31, 2008 - 2:10 am

Hi.


I must be missing something. If you're freezing future users of the
filesystem before they can start anything new, doesn't that deal with
this problem?

Regards,

Nigel

--

From: Miklos Szeredi
Date: Friday, October 31, 2008 - 2:16 am

How do you determine which are the future users?

Miklos
--

From: Nigel Cunningham
Date: Friday, October 31, 2008 - 4:28 am

Hi again.


You don't. You just use FUSE_MIGHT_FREEZE to stop them before they take
locks that might cause problems. So my suggestion is:

1) Stop new requests at FUSE_MIGHT_FREEZE
2) Handle existing requests by using freezer_do_not_count in
request_send and request_send_nowait before the spin_lock and
freezer_count after the spin_unlock.

With #2, we don't need to care about whether the request is completed
before freezing completes or post-resume.

If the userspace process tries to use an already frozen fuse filesystem
and gets frozen, that's okay because we'll sit waiting for an answer,
not be counted by the freezer and so not contribute to any failure to
freeze processes.

If the userspace process completes its work, we'll either get caught at
the freezer_count (if we've already been told to freeze) or be gotten
later, after exiting the fuse code.

Regards,

Nigel

--

From: Miklos Szeredi
Date: Friday, October 31, 2008 - 5:44 am

Before?  FUSE_MIGHT_FREEZE is called _after_ i_mutex is taken by the

Yes, this is the variant of categorizing sleeps to freezing and
non-freezing.  The problem is, you need to do that with all
mutex_lock(&inode->i_mutex) instances as well.  Try grepping for that
in fs/*.c!

It _is_ possible, it's just not practical.

Miklos
--

From: Nigel Cunningham
Date: Friday, October 31, 2008 - 2:11 pm

Hi.



Let's get practical. You know fuse a lot better than me. Can you give me
the most complicated, ugly configuration you can think of, and I'll work
on making it freeze first time, every time.

Regards,

Nigel

--

From: Miklos Szeredi
Date: Wednesday, October 29, 2008 - 4:43 pm

I don't think it will work out, because to be able to do this some
ordering between freezing the filesystems must be done.  But this is
basically impossible, for all the same reasons it's impossible to
order the freezing of userspace tasks.

Also this is not just a fuse issue: we have userspace network devices,
we have userspace USB drivers, etc, affected by this problem.

If there _is_ a solution with the freezer that does solve all of this,
I haven't yet heard it.  But yeah, in the end the simpler solution
should win.

Miklos
--

From: Alan Stern
Date: Thursday, October 30, 2008 - 6:54 am

Sure it does.  A frozen process can't touch a memory-mapped I/O region, 
whereas a non-frozen process can.  Although as Matthew pointed out, 

Would you like to write a first-pass patch?  I don't think it will 

To do this properly, it is necessary to categorize sleeping states.  
Some should count as frozen and others shouldn't.  So for instance, all 
the mutexes and wait queues involved in VFS would count as frozen; a 
process waiting on one of them can simply be kept from waking up until 
the suspend is over.  Others (those involved in device I/O) wouldn't 
count as frozen; a task waiting on one of them would have to wake up 
and move forward before the system could suspend.

Doing that seems like a lot of work, just as modifying every driver 
does.  Changing a few kernel entry points is simpler, but I'm pretty
sure it won't work.  For instance, tasks can block arbitrarily long on 
read calls (waiting for data to arrive); you can't allow such things to 
prevent the system from suspending.

Alan Stern

--

From: Miklos Szeredi
Date: Thursday, October 30, 2008 - 7:39 am

If somebody doesn't beat me to it, I'll do that (first implemented

But we already do: either

 a) it's in interruptible sleep (I/O on sockets, pipes, etc), and
    freezing simply interrupts it, or

 b) it's in uninterruptible sleep and suspend will wait it out (or
    time out).

In the new scheme we could retain that part of the freezer: interrupt
all tasks which are inside the critical region and wait for them to
exit the critical region.

To put it in another way: it's still the freezer, it does all the same
things as the old freezer, except that the condition for freezing is
not that the task is out of the kernel, rather that it's out of the
disable_supend - enable_suspend region.  As such it's not a big change
to the whole suspend system, and so there shouldn't be anything big
going wrong there.

Miklos
--

From: Alan Stern
Date: Thursday, October 30, 2008 - 10:07 am

Okay.  Don't forget things like ioctl for sockets -- they often involve 
doing I/O directly to the network interface device.

What happens to a task accessing a non-regular file on a fuse 
filesystem?  :-)

Alan Stern

--

From: Miklos Szeredi
Date: Thursday, October 30, 2008 - 10:43 am

Yeah, ioctls should probably just always be protected (at least

The same as on any other filesystem, i.e. the fs is only involved as
far as calling init_special_inode(), the rest is taken care of by the
VFS.

Tejun Heo recently posted patches to fuse which enable emulating a
char device from userspace.  That is another matter, obviously we'd
want to keep the "allow suspend during I/O" property of fuse in that
case, even though there's a char device involved (but no hardware, at
least not on that level).

Miklos
--

From: Rafael J. Wysocki
Date: Thursday, October 30, 2008 - 1:17 pm

I like this idea.

I was thinking about using task flags to mark processes as "not freezable
at the moment", which would make the freezer to wait for such tasks
(currently a task may only be always freezable or not freezable at all,
which is not very flexible).

Unfortunately, I didn't have the time to implement this.

Thanks,
Rafael
--

From: Pavel Machek
Date: Saturday, November 15, 2008 - 9:58 am

Cool!

Disadvantage is it will add overhead to regular syscalls, at least
initialy. That's why I implemented freezer initialy...

Of course, suspend is more important than it was back then, and
disadvantages of freezer are now well known, so maybe a little
overhead in exchange of cleaner design is worth it...?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Matthew Garrett
Date: Wednesday, October 29, 2008 - 4:56 pm

The only interesting cases of this I know of are X (which ought to stop 
doing so in the near future) and certain sound operations (which are 
going via a well-defined API anyway and need to handle devices going 
away at random, so not a problem). There may be some other niche cases, 
but really - if you're mmapping hardware then it's generally because you 
haven't written a proper kernel driver. Do that instead. Runtime power 
management's already going to make you wildly unhappy.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Nigel Cunningham
Date: Tuesday, October 28, 2008 - 3:03 pm

[Empty message]
From: Nigel Cunningham
Date: Tuesday, October 28, 2008 - 4:04 pm

Hi again.


Okay. Looking back on our conversation brought me back to this message,
which I think needs another reply.

If we take the strategy of holding new requests and allowing existing
ones to complete, then am I right in thinking that we only need to worry
about where inode->i_mutex is taken in fs/fuse/file.c (I don't see it
taken in other fs/fuse/*.c files).

If that's correct, dealing with that issue looks relatively straight
forward: we need some more FUSE_MIGHT_FREEZE calls for those functions,
and something done to the vfs_check_frozen call - I'm a bit confused by
that - inode->i_sb will refer to the fuse filesystem, won't it?

Regards,

Nigel

--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 4:12 pm

Nope, i_mutex is usually taken by the VFS not the filesystem.  That
means that the filesystem is called with the mutex already held.  Try
"grep i_mutex fs/*.c".  There's also sb->s_vfs_rename_mutex, for all
the gory details see Documentation/filesystems/Locking.

So it's not just having to fix fuse, it's having to "fix" the VFS as
well.

Miklos
--

From: Nigel Cunningham
Date: Tuesday, October 28, 2008 - 4:17 pm

Hi.


Remember, though, that we're only freezing fuse at the moment, and
strictly one filesystem at a time. We can thus happily wait for the
i_mutex taken by some other process to be released.

Regards,

Nigel

--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 4:24 pm

Not going to work: you need to wait for all requests to be finished,
but those might depend on some other fuse filesystem which has already
been frozen.

Miklos
--

From: Nigel Cunningham
Date: Tuesday, October 28, 2008 - 4:41 pm

Hi.


Okay. In that case, am I right in thinking that the request waiting on
the frozen filesystem will be stuck in request_wait_answer, and the
userspace process that was trying to satisfy the request will be stuck
in the FUSE_MIGHT_FREEZE call that was invoked for the frozen
filesystem?

Regards,

Nigel

--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 4:45 pm

No, it already passed that, before the filesystem got frozen.  But it
doesn't matter, in either case i_mutex will already have been taken by
the VFS and it won't be released until the request completely
finishes.

Miklos
--

From: Miklos Szeredi
Date: Tuesday, October 28, 2008 - 4:50 pm

Sorry, I misunderstood this.  Yes you're right, in the case of one
fuse filesystem relying on another to complete the request the already
frozen one will be stuck in FUSE_MIGHT_FREEZE().

How does that help?

Miklos
--

From: Nigel Cunningham
Date: Tuesday, October 28, 2008 - 4:58 pm

Hi.


Well, my next question was going to be: can we find a way to know that
the userspace process we're waiting on was frozen? If we can know that,
then perhaps we can apply that knowledge in this thread to avoid a
freezing failure.

Regards,

Nigel

--

From: Nigel Cunningham
Date: Tuesday, October 28, 2008 - 4:54 pm

Hi Miklos.


I think we're making different assumptions. I'm assuming that one of
those solutions we already discussed is implemented, such that we don't
start freezing a new filesystem until all the requests for the current
filesystem are dealt with. Perhaps I should come up with a new version
of the patch that implements that.

Nigel

--

From: Miklos Szeredi
Date: Monday, October 27, 2008 - 4:37 am

Right, but it could also be sleeping in any other wait_event(),

Please post the full patch, it'd be a lot better to see the big
picture instead of just a small part of it.

Thanks,
Miklos
--

Previous thread: Linux 2.6.28-rc2 by Linus Torvalds on Sunday, October 26, 2008 - 12:41 pm. (11 messages)

Next thread: Fw: Re: unexpected extra pollout events from epoll by Paul P on Sunday, October 26, 2008 - 1:19 pm. (1 message)