Thread (18 messages), 8 authors, 2016-02-19

Distributed Process Scheduling Algorithm

From: Nitin Varyani <hidden>
Date: 2016-02-16 09:46:17

My project requires a distributed algorithm, so Mesos will not work. Work
stealing looks like the best trade-off, since it will save communication
costs. Thank you. Could you please elaborate on the last part of your reply?

On Tue, Feb 16, 2016 at 2:12 PM, Dominik Dingel [off-list ref] wrote:
On Tue, 16 Feb 2016 00:13:34 -0500
Valdis.Kletnieks at vt.edu wrote:
On Tue, 16 Feb 2016 10:18:26 +0530, Nitin Varyani said:
1) Sending process context via network
Note that this is a non-trivial issue by itself.  At a *minimum*,
you'll need all the checkpoint-restart code.  Plus, if the process
has any open TCP connections, *those* have to be migrated without
causing a security problem.  Good luck figuring out how to properly
route packets in this case: consider 4 nodes, 10.0.0.1 through 10.0.0.4.
You migrate a process from 10.0.0.1 to 10.0.0.3.  How do you make sure
*that process*'s packets go to 0.3 while all other packets still go to
0.1?  Also, consider the impact this may have on iptables: if there is
a state=RELATED,ESTABLISHED rule on 0.1, that info needs to be relayed
to 0.3 as well.
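
To make the routing problem concrete, here is a toy model, not real kernel or
iptables code. It assumes a hypothetical redirector sitting in front of the
nodes; the names Flow, FlowRedirector, migrate and deliver_to are invented for
illustration. The point it shows is that the forwarding decision has to be
keyed on the whole connection 4-tuple, because the destination address alone
cannot tell the migrated process's packets from everyone else's.

```python
from typing import Dict, NamedTuple


class Flow(NamedTuple):
    """A TCP connection identified by its 4-tuple (hypothetical model)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class FlowRedirector:
    """Toy per-connection redirect table; not an existing API."""

    def __init__(self) -> None:
        # Maps a migrated connection to the node that now hosts its process.
        self.redirects: Dict[Flow, str] = {}

    def migrate(self, flow: Flow, new_node: str) -> None:
        # Record that this one connection now terminates on new_node.
        self.redirects[flow] = new_node

    def deliver_to(self, flow: Flow) -> str:
        # Migrated flows go to their new node; all others go to the
        # original destination address unchanged.
        return self.redirects.get(flow, flow.dst_ip)


redirector = FlowRedirector()
migrated = Flow("192.0.2.7", 40000, "10.0.0.1", 80)
untouched = Flow("192.0.2.7", 40001, "10.0.0.1", 80)
redirector.migrate(migrated, "10.0.0.3")
```

Both flows carry the same destination address, yet only the first one may be
delivered to 10.0.0.3. Any connection-tracking state held on 10.0.0.1 for that
flow would have to move with it, which this sketch does not attempt.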

For bonus points, what's the most efficient way to transfer a large
process image (say 500M, or even a bloated Firefox at 3.5G), without
causing timeouts while copying the image?

I hope your research project is *really* well funded - you're going
to need a *lot* of people.  (Hint - find out how many people work on
VMware - that should give you a rough idea.)

I wouldn't see things quite so darkly. Besides, this is an interesting puzzle.

To migrate processes I would pick an existing solution, such as the ones
available for containers. So every process should, if possible, run in a
container.
To migrate them efficiently without some form of distributed shared memory,
you might want to look at userfaultfd.

Now, back to the scheduling: I do not think that every node should keep
track of every process on every other node, as this would require massive
communication and hurt scalability. So you would either implement
something like work stealing or go for a central entity like Mesos, which
could do the process/job/container scheduling for you.
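
As a rough illustration of the work-stealing option, here is a minimal
single-process sketch in Python. It is an assumption-laden model, not a
distributed implementation: Node and next_task are hypothetical names, the
"network" is just a list of peers, and a real system would need messaging,
locking and actual process migration. What it shows is the core idea: a node
only talks to others when its own queue is empty, so no node has to track
every process on every other node.

```python
import random
from collections import deque


class Node:
    """One machine in the cluster with a local queue of runnable tasks."""

    def __init__(self, name):
        self.name = name
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)

    def pop_local(self):
        # The owner takes work from the tail of its own queue (newest first).
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # A thief takes from the head of the victim's queue (oldest first),
        # so owner and thief work on opposite ends.
        return victim.tasks.popleft() if victim.tasks else None


def next_task(node, peers, rng=random):
    """Return the next task for node, probing peers only when it is idle."""
    task = node.pop_local()
    if task is not None:
        return task
    candidates = [p for p in peers if p is not node]
    rng.shuffle(candidates)  # probe victims in random order
    for victim in candidates:
        task = node.steal_from(victim)
        if task is not None:
            return task
    return None  # nothing to run anywhere


a, b, c = Node("a"), Node("b"), Node("c")
for t in ("t1", "t2", "t3"):
    a.push(t)
cluster = [a, b, c]
first_stolen = next_task(b, cluster)   # b is idle, so it steals from a
own_task = next_task(a, cluster)       # a still has local work
```

A busy node never sends a message in this model; the communication cost is
paid only by idle nodes looking for work.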

There are now two pitfalls, each hard enough on its own:
- interprocess communication between two processes over something other
  than a socket;
  in such a case you would probably need to merge the two distinct
  containers

- dedicated hardware

Dominik