ANN: (yet another) version of -j implementation (includes -m for multitask)
Michael Bishop
2012-04-27 13:54:53 UTC
Hello Everyone,

My last implementation of -j had a fatal flaw that could lead to a stack overflow when running large numbers of tasks. I discovered this after prototyping a version with drake's feature of turning tasks into multitasks.

I have a new version with a rewritten MultiTask implementation that removes that flaw, is smaller, and fits easily into the Rake code. In addition, I've added an optional "--multitask (-m)" flag to turn every task into a multitask (in direct homage to drake).

https://github.com/michaeljbishop/rake

I've again opened a pull request to include the change in the master branch (as always, I'm open to further changes to match the style and simplicity of the original):

https://github.com/jimweirich/rake/pull/113

Questions and comments are, as always, welcome.

Sincerely,

Michael Bishop


---


## PROBLEM SUMMARY (THE CASE FOR -j and -m)

Rake can be unusable for builds invoking large numbers of concurrent external processes.

## PROBLEM DESCRIPTION

Rake makes it easy to maximize concurrency in builds with its `multitask` function. When rake is used to build non-Ruby projects, it quite often needs to run shell commands to process files. Unfortunately, when executing multitasks, rake spawns a new thread for each task prerequisite. This shouldn't cause problems when the build code is pure Ruby (with green threads), but when the tasks execute external processes, the sheer number of spawned processes can cause the machine to thrash. Additionally, Ruby can reach the maximum number of open files (presumably because it is reading stdout for all of those processes).

## SOLUTION SUMMARY

This pull request adds support for a `--jobs NUMBER` (`-j`) command-line option that specifies the maximum number of tasks to execute simultaneously.

* To maintain backward compatibility, omitting `-j` keeps the old behavior of unlimited concurrent tasks.

As a nod to [drake](http://drake.rubyforge.org), a `--multitask` (`-m`) flag is also included which, when supplied, turns every task into a multitask.
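
As a rough illustration of how the two flags could be declared, here is a minimal sketch using Ruby's standard `OptionParser`. The flag names and `thread_pool_size` come from this description; `SketchOptions`, `parse_sketch_options`, and `always_multitask` are hypothetical names invented for the example, and this is not the option-handling code from the pull request.

```ruby
require 'optparse'

# Hypothetical container for the two settings (not Rake's options object).
SketchOptions = Struct.new(:thread_pool_size, :always_multitask)

# Parses -j/--jobs and -m/--multitask from an argument list.
# A nil thread_pool_size stands for "no limit", mirroring the old behavior.
def parse_sketch_options(argv)
  options = SketchOptions.new(nil, false)
  parser = OptionParser.new do |opts|
    opts.on('-j', '--jobs NUMBER', Integer,
            'Maximum number of tasks to execute simultaneously') do |n|
      options.thread_pool_size = n
    end
    opts.on('-m', '--multitask', 'Treat every task as a multitask') do
      options.always_multitask = true
    end
  end
  parser.parse(argv) # non-destructive; leaves argv untouched
  options
end
```

With this sketch, `parse_sketch_options(%w[-j 4 -m])` yields a pool size of 4 with multitasking enabled, while an empty argument list leaves the pool size as `nil` (unlimited).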

## SOLUTION

Rather than spawning a new thread per prerequisite, `MultiTask` now sends its prerequisites to a `WorkerPool` object. `WorkerPool.new(n).execute_blocks` has the same semantics as `Thread.new`...`join` but caps the thread count at `n`.
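
The "run everything, wait for all of it, but never use more than `n` threads" contract can be sketched roughly as follows. `BoundedPoolSketch` is a hypothetical name, and this is a simplified stand-in rather than the `WorkerPool` class from the pull request: it assumes no block calls `execute_blocks` itself and it omits error handling.

```ruby
require 'thread'

# Hypothetical illustration, not the WorkerPool from the pull request.
class BoundedPoolSketch
  def initialize(max_threads)
    @max_threads = max_threads
  end

  # Runs every block using at most @max_threads threads and returns only
  # after all of them have finished (like Thread.new ... join).
  def execute_blocks(blocks)
    queue = Queue.new
    blocks.each { |b| queue << b }

    worker_count = [@max_threads, blocks.size].min
    workers = Array.new(worker_count) do
      Thread.new do
        loop do
          begin
            job = queue.pop(true) # non-blocking; raises ThreadError if empty
          rescue ThreadError
            break                 # nothing left to do, let the thread end
          end
          job.call
        end
      end
    end

    workers.each(&:join)
  end
end
```

Calling `BoundedPoolSketch.new(3).execute_blocks(blocks)` with ten lambdas runs all ten, but no more than three at any one time.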

### Core Change

    threads = @prerequisites.collect { |p|
      Thread.new(p) { |r| application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    threads.each { |t| t.join }

...becomes...

    @@wp ||= WorkerPool.new(application.options.thread_pool_size)

    blocks = @prerequisites.collect { |r|
      lambda { application[r, @scope].invoke_with_call_chain(args, invocation_chain) }
    }
    @@wp.execute_blocks blocks


To support `-m`, the `MultiTask` implementation has moved to `Task#invoke_prerequisites_concurrently` and is called from `MultiTask#invoke_prerequisites`. This enables concurrent behavior for `Task` when `-m` is used.
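
The shape of that delegation can be sketched with toy classes. Everything here is a stand-in: `SketchTask` and `SketchMultiTask` are hypothetical names, prerequisites are plain callables, the `always_multitask` switch is an assumed way of representing `-m`, and ordinary threads take the place of the worker pool.

```ruby
# Toy stand-ins for illustration only; these are not Rake's classes.
class SketchTask
  def initialize(prerequisites, always_multitask = false)
    @prerequisites = prerequisites       # callables standing in for tasks
    @always_multitask = always_multitask # assumed representation of -m
  end

  # An ordinary task runs its prerequisites one after another, unless the
  # (assumed) -m switch asks for concurrent behavior.
  def invoke_prerequisites
    if @always_multitask
      invoke_prerequisites_concurrently
    else
      @prerequisites.each(&:call)
    end
  end

  # Shared concurrent implementation, available to every task.
  # Plain threads are used here in place of the worker pool.
  def invoke_prerequisites_concurrently
    @prerequisites.map { |p| Thread.new { p.call } }.each(&:join)
  end
end

# A multitask always takes the concurrent path.
class SketchMultiTask < SketchTask
  def invoke_prerequisites
    invoke_prerequisites_concurrently
  end
end
```

Keeping the concurrent code on the base class means the subclass shrinks to a one-line override, which is consistent with `lib/rake/multi_task.rb` getting smaller in this change.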

### Details

`WorkerPool#execute_blocks` adds the passed-in blocks to a queue, ensures there are enough threads to execute them (up to the maximum), and sleeps the current thread until the blocks have been processed.

What if all of the blocks then called `#execute_blocks`? Wouldn't that put every thread to sleep?

Yes, it would. To avoid this, `#execute_blocks` removes the current thread from the thread pool just before it sleeps and creates a new one in its place. When all the blocks have been processed, the current thread is added back to the pool (adjusting for the maximum size). This way there are always enough threads available in the pool for processing.

When do the threads shut down?

`WorkerPool#execute_blocks` knows how many threads are waiting for their blocks to be processed. If, on waking, it finds that no threads are waiting on blocks, it shuts down the thread pool.
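
The "give up your slot while you sleep" idea can be sketched as below. This is a simplified illustration under several assumptions, not the code from the pull request: `NestedPoolSketch` and `CountdownLatch` are hypothetical names, workers simply retire when the queue is empty instead of following the shutdown rule described above, a woken thread may briefly push the thread count over the cap until a surplus worker retires, and error handling is omitted.

```ruby
require 'set'

# Hypothetical helper: lets a caller sleep until `count` jobs have finished.
class CountdownLatch
  def initialize(count)
    @count = count
    @lock = Mutex.new
    @cond = ConditionVariable.new
  end

  def count_down
    @lock.synchronize do
      @count -= 1
      @cond.broadcast if @count <= 0
    end
  end

  def wait
    @lock.synchronize { @cond.wait(@lock) while @count > 0 }
  end
end

# Hypothetical pool whose blocks may themselves call execute_blocks.
class NestedPoolSketch
  def initialize(max_threads)
    @max = max_threads
    @lock = Mutex.new
    @queue = []        # pending [block, latch] pairs
    @workers = Set.new # threads that currently count against the cap
  end

  def execute_blocks(blocks)
    latch = CountdownLatch.new(blocks.size)
    was_worker = false
    @lock.synchronize do
      blocks.each { |b| @queue << [b, latch] }
      # About to sleep: release this thread's slot so a replacement can run.
      was_worker = @workers.include?(Thread.current)
      @workers.delete(Thread.current)
      spawn_workers
    end
    latch.wait
    # Awake again: reclaim a slot; surplus workers retire in next_job.
    @lock.synchronize { @workers << Thread.current } if was_worker
  end

  private

  # Caller must hold @lock. Starts workers up to the cap, no more than needed.
  def spawn_workers
    wanted = [@max - @workers.size, @queue.size].min
    wanted.times { @workers << Thread.new { work } }
  end

  def work
    while (job = next_job)
      block, latch = job
      block.call
      latch.count_down
    end
  end

  # Returns the next job, or nil (after releasing the slot) when the queue
  # is empty or the pool is over its cap.
  def next_job
    @lock.synchronize do
      if @queue.empty? || @workers.size > @max
        @workers.delete(Thread.current)
        nil
      else
        @queue.shift
      end
    end
  end
end
```

With a cap of 2, two outer blocks that each submit three inner blocks would stall a pool whose waiting threads kept their slots; in this sketch each sleeping thread is replaced, so the inner blocks still get processed.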

### Statistics

    ---LINES----  ----LOC-----
      old    new    old    new   File Name
    ------------  ------------   -----------------------
      598    605    477    484   lib/rake/application.rb
       16     13     11      8   lib/rake/multi_task.rb
      327    341    210    222   lib/rake/task.rb
             111            80   lib/rake/worker_pool.rb
     4264   4393   2696   2792   TOTAL
    ----------------------------------------------------
            +129           +96   SUMMARY

## TESTS

Tests are included for all new functionality.

## REQUIREMENTS

The Ruby version requirements remain the same. `lib/rake/worker_pool.rb` adds two new requirements, both from Ruby's standard library: `thread` and `set`.