>
> for files, I would first only care about stdios (make sure they're
> relinked to something safe on restart) and the state of regular files.
> file contents are generally handled externally (deleted files being an
> annoying exception).
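
A minimal user-space sketch of "relinking to something safe", assuming the
safe target is a path such as /dev/null and that the descriptors to relink
are known up front. The helper name relink_fds() is hypothetical; only
open(), dup2() and close() are real calls.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Point every descriptor in fds[] at safe_path (for example "/dev/null").
 * Hypothetical helper, not part of any patchset discussed here.
 * Returns 0 on success, -1 on error.
 */
int relink_fds(const char *safe_path, const int *fds, size_t nfds)
{
    int safe = open(safe_path, O_RDWR);
    int keep = 0;
    int rc = 0;

    if (safe < 0)
        return -1;

    for (size_t i = 0; i < nfds; i++) {
        /* dup2() atomically closes fds[i] and makes it an alias of 'safe'. */
        if (dup2(safe, fds[i]) < 0) {
            rc = -1;
            break;
        }
        if (fds[i] == safe)
            keep = 1;
    }

    /* Drop the temporary descriptor unless it is itself one of the targets. */
    if (!keep)
        close(safe);
    return rc;
}
```

On restart, a task could call this with the array {0, 1, 2} so that stdin,
stdout and stderr never point at a terminal that no longer exists.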
>
> then, support for OpenMP applications is a nice-to-have, so I'd probably
> go on with thread support.
>
> > I think it's important to demonstrate how shared resources and multiple
> > processes are handled. FDs demonstrate the former (with a fixed version
> > of the recent patchset - I will post soon).
>
> shared resources are only useful in a multiprocess/multitask context.
> I'd start working on this first. here we jump directly into the pid
> namespace issues: how do we start a set of processes in a pid namespace?
> how do we relink it to its parent pid namespace? are signals well
> propagated? etc. but hey, we'll have to solve it one day.
>
> FDs are shared but have many types, which are a pain to handle. (it would
> be interesting to see if we can add checkpoint() and restart() operations
> in fileops.) So, for a shared resources demonstration, I'd work on sysvipc:
> there are fewer types to handle and they force us to think about how we
> are going to merge with the sysvipc namespaces.
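
To make the checkpoint()/restart()-in-fileops idea concrete, here is a
user-space model of it. Everything named sketch_* or regfile_* is
hypothetical and is not the kernel's struct file or file_operations; the
point is only that each file type supplies its own pair of callbacks, and
that generic code can detect a type with no support instead of silently
skipping it.

```c
#include <stddef.h>
#include <string.h>

struct sketch_file;

/* Hypothetical per-type operations, modelled on the fileops idea. */
struct sketch_file_ops {
    /* Serialize f into buf; return bytes written, or -1 on error. */
    int (*checkpoint)(const struct sketch_file *f, void *buf, size_t len);
    /* Rebuild f from buf; return 0 on success, -1 on error. */
    int (*restart)(struct sketch_file *f, const void *buf, size_t len);
};

/* Hypothetical model of an open file: just an offset and open flags. */
struct sketch_file {
    const struct sketch_file_ops *ops;
    long long pos;
    int flags;
};

/* Saved image of a regular file's state (contents are handled elsewhere). */
struct regfile_image {
    long long pos;
    int flags;
};

static int regfile_checkpoint(const struct sketch_file *f, void *buf, size_t len)
{
    struct regfile_image img;

    if (len < sizeof img)
        return -1;
    memset(&img, 0, sizeof img);    /* do not leak padding bytes */
    img.pos = f->pos;
    img.flags = f->flags;
    memcpy(buf, &img, sizeof img);
    return (int)sizeof img;
}

static int regfile_restart(struct sketch_file *f, const void *buf, size_t len)
{
    struct regfile_image img;

    if (len < sizeof img)
        return -1;
    memcpy(&img, buf, sizeof img);
    f->pos = img.pos;
    f->flags = img.flags;
    return 0;
}

const struct sketch_file_ops regfile_ops = {
    .checkpoint = regfile_checkpoint,
    .restart = regfile_restart,
};

/* Generic entry points: a missing callback means "type not supported". */
int sketch_checkpoint(const struct sketch_file *f, void *buf, size_t len)
{
    if (!f->ops || !f->ops->checkpoint)
        return -1;
    return f->ops->checkpoint(f, buf, len);
}

int sketch_restart(struct sketch_file *f, const void *buf, size_t len)
{
    if (!f->ops || !f->ops->restart)
        return -1;
    return f->ops->restart(f, buf, len);
}
```

With this shape, adding a new file type means adding one ops table, and a
type without callbacks makes the whole checkpoint fail early.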
>
> > The latter will increase the size of the patchset significantly, so
> > perhaps can indeed wait for now.
>
> hmm, that depends how you do it.
>
> If you restart all the hierarchy in the kernel, it will increase the patch
> footprint for sure. However, if you restore the hierarchy from user space
> and then let each process restore itself from some binary blob, it should
> not. This, of course, means that the binary blob representing the state of
> the container (we call it statefile) is not totally opaque. I see it a bit
> like /proc, a directory containing shared states (all namespaces) and
> task states. That's something to discuss.
>
> I do prefer the second option for many reasons:
>
> . each process restarts itself from its current context, this makes it
> easier to reuse kernel services depending on current.
>
> . user tools can evaluate more precisely what they are going to restart
> from the statefile. see this as a generalised 'readelf' that would be run
> on the statefile, like we do on a core file today.
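
A rough sketch of the second option, assuming the statefile exposes a
well-formed task tree as an array of parent indices. The struct task_entry
layout and restore_self() are hypothetical placeholders; only fork(),
waitpid() and _exit() are real calls. User space rebuilds the hierarchy,
and each process then restores itself from its own context.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical statefile entry: index of the parent task, -1 for the root. */
struct task_entry {
    int parent;
};

/*
 * Placeholder for the per-task step. A real version would ask the kernel to
 * load this task's blob into 'current'; here it always succeeds.
 */
static int restore_self(int idx)
{
    (void)idx;
    return 0;
}

/*
 * Runs in the process that plays task 'idx': fork one child per child entry,
 * let each child rebuild its own subtree, then restore this task.
 * Assumes the parent indices form a tree (no cycles).
 * Returns 0 on success, -1 on failure.
 */
int restore_subtree(const struct task_entry *tasks, int ntasks, int idx)
{
    for (int c = 0; c < ntasks; c++) {
        pid_t pid;
        int status;

        if (tasks[c].parent != idx)
            continue;

        pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: exits once restored in this sketch; a real restart
             * would resume the checkpointed task instead. */
            _exit(restore_subtree(tasks, ntasks, c) == 0 ? 0 : 1);
        }

        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1;
    }
    return restore_self(idx);
}
```

Because the tree is walked by ordinary user-space code, the same walk could
drive an inspection tool instead of a restart, which is the 'readelf' use
case mentioned above.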
>
> > It should not be hard for me to add functionality on top of a more
> > basic patchset. The question is, what is "basic" ? Anyway, I will be
> > back towards the end of the week. Let's try to discuss this over IRC
> > then (e.g. Friday afternoon ?).
>
> IMHO, the first step is to support a 'basic' computational program in
> a real environment (under an HPC load manager). your POC nearly reaches
> it but the user space API (how to launch, checkpoint, restart) needs to
> be worked on.
>
>
> There are some big steps in the development.
>
> Multi-task is a big step which opens plenty of other big steps with
> shared resources: mem, ipc, fds, etc. Not all of them have to be solved,
> but the ones we don't support must at least be detected.
>
> Network is another one. This is an interesting step to support
> distributed applications using MPI over TCP. It may be a priority.
>
> there are also plenty of funky kernel resources used by misc servers
> and databases that will need special attention.
>
>
>
> I'll be happy to start with the basic menu first as I know that it will
> be useful for many applications !
>
> Thanks,
>
> C.
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
>
> _______________________________________________
> Devel mailing list
> Devel@openvz.org
> https://openvz.org/mailman/listinfo/devel