Michael Bishop
2012-04-19 11:36:37 UTC
Hello Everyone,
I've recently finished an implementation of -j <max_concurrent_jobs> and would be delighted if the members of this list would take a detailed look at it.
It's in this Rake branch on GitHub:
https://github.com/michaeljbishop/rake/commit/295c7a4d6d58b3e10c27b940a8259dd3e01c52f0
or
http://bit.ly/JMVARy
I've also opened a pull request to merge the change into the master branch (allowing, of course, for further changes to match the style and simplicity of the original):
https://github.com/jimweirich/rake/pull/112
My apologies if this has been hashed out many times on this list. I'm hoping to offer a fresh look at the problem. Thank you for your consideration.
Sincerely,
Michael Bishop
---
SUMMARY
-------
USER INTERFACE:
The user interface is simply a new -j flag that specifies the maximum number of tasks that can execute simultaneously (as I've seen, this has been discussed here before). If the -j flag is omitted, the old behavior of unlimited concurrent tasks is retained.
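For illustration, a -j/--jobs option could be parsed with Ruby's standard OptionParser roughly as below. This is a minimal sketch, not the code in the patch: `parse_jobs` and the `:max_jobs` key are hypothetical names, and `nil` stands for "no -j given", i.e. the old unlimited behavior.

```ruby
require 'optparse'

# Hypothetical sketch: parse -j/--jobs into options[:max_jobs].
# nil means the flag was omitted (unlimited concurrency, the old behavior).
def parse_jobs(argv)
  options = { max_jobs: nil }
  OptionParser.new do |opts|
    opts.on('-j', '--jobs NUMBER', Integer,
            'Maximum number of tasks to execute simultaneously') do |n|
      # Reject nonsensical limits such as 0 or negative numbers.
      raise OptionParser::InvalidArgument, "--jobs must be positive" unless n > 0
      options[:max_jobs] = n
    end
  end.parse!(argv)
  options
end

p parse_jobs(%w[-j 4])[:max_jobs]   # → 4
p parse_jobs([])[:max_jobs]         # → nil
```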
IMPLEMENTATION:
The implementation is inspired by Apple's Grand Central Dispatch model. The core of the problem is solved via changes to the file "multi-task.rb".
The current implementation of MultiTask#invoke_prerequisites spawns a new thread for each prerequisite and then waits until all of those threads have completed before returning.
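A minimal Ruby sketch of that thread-per-prerequisite strategy makes the unbounded concurrency visible. It is an illustration, not Rake's code: `invoke_unbounded` is a hypothetical name, plain lambdas stand in for tasks, and `sleep` stands in for an external process.

```ruby
require 'thread'

# One new thread per prerequisite, then join them all before returning.
def invoke_unbounded(prereqs)
  threads = prereqs.map { |prereq| Thread.new { prereq.call } }
  threads.each(&:join)
end

# Record the peak number of prerequisites running at the same time.
running = 0
peak = 0
lock = Mutex.new
prereqs = Array.new(20) do
  lambda do
    lock.synchronize { running += 1; peak = [peak, running].max }
    sleep 0.05   # stand-in for waiting on an external process
    lock.synchronize { running -= 1 }
  end
end

invoke_unbounded(prereqs)
puts "peak concurrency: #{peak}"
```

Nothing in `invoke_unbounded` limits the thread count, so the peak grows with the number of prerequisites.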
In the alternate implementation, MultiTask#invoke_prerequisites creates a list of blocks, each of which invokes one prerequisite. Each block is then added to a thread-safe Queue for processing. A class-level ThreadPool is then expanded to include enough threads to consume and call the blocks, but only up to the limit passed to -j.
Then, rather than sleeping while it joins the newly spawned threads, MultiTask#invoke_prerequisites helps process the queued blocks while it waits. This prevents deadlock when prerequisites running on pool threads themselves call MultiTask#invoke_prerequisites: otherwise every pool thread could end up waiting on queued blocks that no thread is free to run.
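To see why the waiting caller has to help, consider a deliberately naive pool with a single worker thread. In this hypothetical demo (`nested_run` and its `help:` flag are illustrative names, and the timeout exists only so the stuck case terminates), a parent job enqueues a child job on the shared queue and then waits for it:

```ruby
require 'thread'

# Hypothetical demo: a "pool" of exactly one worker. The parent job enqueues
# a child job, then waits up to `timeout` seconds for it to run.
# help: false -> the parent only waits.
# help: true  -> the parent first drains the queue itself.
def nested_run(help:, timeout: 0.2)
  queue = Queue.new
  worker = Thread.new { loop { queue.pop.call } }   # the entire pool
  outcome = Queue.new

  queue << lambda do
    lock = Mutex.new
    cv = ConditionVariable.new
    child_ran = false

    # The parent's prerequisite goes onto the same shared queue.
    queue << -> { lock.synchronize { child_ran = true; cv.signal } }

    if help
      # Participate: run queued jobs on this thread while waiting.
      until queue.empty?
        job = (queue.pop(true) rescue nil)   # non-blocking pop
        job&.call
      end
    end

    ran = lock.synchronize do
      cv.wait(lock, timeout) unless child_ran
      child_ran
    end
    outcome << (ran ? :finished : :stuck)
  end

  result = outcome.pop
  worker.kill
  result
end

p nested_run(help: false)   # → :stuck
p nested_run(help: true)    # → :finished
```

With `help: false`, the pool's only thread is busy waiting for a job that only it could run; with `help: true`, the parent drains the queue itself and the child completes.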
How, then, does MultiTask#invoke_prerequisites know when its prerequisites have finished? Before enqueuing the prerequisite blocks, it wraps each original prerequisite block in *another* block that does the bookkeeping: it records which thread is working on the prerequisite and when the prerequisite has completed. It is this wrapper block that is added to the queue.
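The wrapper idea can be sketched as a small tracker built on the 'thread' and 'set' libraries. This is a hypothetical illustration, not the patch's code: `PrereqTracker`, `wrap`, `completed?`, and `threads_for` are invented names.

```ruby
require 'thread'
require 'set'

# Hypothetical bookkeeping: which thread runs each prerequisite, and which
# prerequisites have completed.
class PrereqTracker
  def initialize
    @lock = Mutex.new
    @running = {}          # prerequisite name => Thread currently running it
    @completed = Set.new
  end

  # Returns a new block that runs `block` and updates the bookkeeping.
  def wrap(name, &block)
    lambda do
      @lock.synchronize { @running[name] = Thread.current }
      begin
        block.call
      ensure
        # Record completion even if the prerequisite raised.
        @lock.synchronize do
          @running.delete(name)
          @completed << name
        end
      end
    end
  end

  def completed?(names)
    @lock.synchronize { names.all? { |n| @completed.include?(n) } }
  end

  # Threads currently executing any of the given prerequisites.
  def threads_for(names)
    @lock.synchronize { @running.values_at(*names).compact.uniq }
  end
end

tracker = PrereqTracker.new
queue = Queue.new
names = %w[compile link]
names.each { |name| queue << tracker.wrap(name) { sleep 0.01 } }

worker = Thread.new { queue.pop.call }   # one wrapped block runs on a worker
queue.pop.call                           # the caller runs the other one
worker.join

p tracker.completed?(names)    # → true
p tracker.threads_for(names)   # → []
```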
MultiTask#invoke_prerequisites keeps processing queued blocks until one of two conditions is met:
1 - It notices that all of its prerequisites have been processed, at which point it returns.
2 - It notices that there are no more blocks on the queue but its prerequisites are not yet finished. In this case, it joins the threads that are still executing its prerequisites and then returns.
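Putting the pieces together, here is a minimal sketch of a bounded pool in which the caller helps drain the queue and checks the two conditions above. It is an illustration under stated assumptions, not the code in the pull request: `JobPool`, `invoke_all`, and `max_jobs` are hypothetical names; because the calling thread also runs jobs, the sketch starts at most max_jobs - 1 extra workers (an assumption of this sketch, so that at most max_jobs threads execute jobs); and for condition 2 it waits on a ConditionVariable signalled by each wrapper instead of calling Thread#join, since joining a thread that never exits (such as the main thread) could block forever.

```ruby
require 'thread'

# Hypothetical bounded pool where the caller participates while it waits.
class JobPool
  # max_jobs: positive Integer limit, or nil for "no limit".
  def initialize(max_jobs)
    @max_jobs = max_jobs
    @queue = Queue.new
    @workers = []
    @workers_lock = Mutex.new
  end

  # Runs every callable in prereqs; returns once all of them have finished.
  def invoke_all(prereqs)
    lock = Mutex.new
    finished = ConditionVariable.new
    remaining = prereqs.size

    # Wrap each prerequisite so that its completion is counted and signalled.
    prereqs.each do |prereq|
      @queue << lambda do
        begin
          prereq.call
        ensure
          lock.synchronize do
            remaining -= 1
            finished.broadcast
          end
        end
      end
    end
    grow_pool(prereqs.size)

    loop do
      # Condition 1: all of our prerequisites have been processed.
      break if lock.synchronize { remaining.zero? }

      job = begin
        @queue.pop(true)   # non-blocking; raises ThreadError if empty
      rescue ThreadError
        nil
      end

      if job
        job.call           # help out, even if the job belongs to another caller
      else
        # Condition 2: queue empty, but some prerequisites are still running
        # on other threads; wait for them, then return.
        lock.synchronize { finished.wait(lock) while remaining > 0 }
        break
      end
    end
  end

  private

  # The calling thread also runs jobs, so start at most max_jobs - 1 workers.
  def grow_pool(wanted)
    @workers_lock.synchronize do
      cap = @max_jobs ? [wanted, @max_jobs - 1].min : wanted
      (cap - @workers.size).times do
        @workers << Thread.new { loop { @queue.pop.call } }
      end
    end
  end
end

pool = JobPool.new(2)
lock = Mutex.new
running = peak = ran = 0
leaf = lambda do
  lock.synchronize { running += 1; peak = [peak, running].max }
  sleep 0.01   # stand-in for an external process
  lock.synchronize { running -= 1; ran += 1 }
end
# Four "multitasks", each invoking five prerequisites of its own (nested calls).
parents = Array.new(4) { -> { pool.invoke_all(Array.new(5) { leaf }) } }
pool.invoke_all(parents)
puts "ran #{ran} leaf jobs, peak concurrency #{peak}"
```

Only two threads (the caller plus one worker) ever run jobs here, and each runs at most one leaf job at a time, so the peak should never exceed the limit of 2. Error propagation is omitted for brevity.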
What is attractive to me about this implementation is that the flow retains the simplicity of the original: MultiTask#invoke_prerequisites sends all the prerequisites off to be executed, then waits until they are done.
THE CASE FOR -J
---------------
(Quoted from the pull-request)
PROBLEM SUMMARY:
Rake can be unusable for builds invoking large numbers of concurrent external processes.
PROBLEM DESCRIPTION:
Rake makes it easy to maximize concurrency in builds with its "multitask" function. When Rake is used to build non-Ruby projects, it quite often needs to execute shell commands to process files. Unfortunately, when executing multitasks, Rake spawns a new thread for each task prerequisite. This shouldn't cause problems when the build code is pure Ruby (with green threads), but when the tasks are executing external processes, the sheer number of spawned processes can cause the machine to thrash. Additionally, Ruby can hit the maximum number of open files (presumably because it is reading stdout from all of those processes).
SOLUTION SUMMARY:
This request adds support for a "--jobs NUMBER" (-j) command-line option that specifies the maximum number of tasks to execute simultaneously.
SOLUTION:
The solution creates a work queue, to which blocks calling the task prerequisites are added, and a thread pool to process them. To prevent deadlock, the task that added the prerequisites processes items on the queue (alongside the thread pool) until its prerequisites have been processed.
To maintain backward compatibility, not passing -j reverts to the old behavior of unlimited concurrent tasks.
REQUIREMENTS:
The Ruby version requirements remain the same. "multi-task.rb" adds two new requires: 'thread' and 'set'.