Discussion:
Rake 0.9.3.beta.2 with -j option
Jim Weirich
2012-10-22 19:04:47 UTC
I've merged Michael Bishop's "-j"/thread pool pull request into the master branch and intend to include it in the next release. I've pushed the current code base out as Rake 0.9.3.beta.2, so feel free to download it and give it a try.

I am especially interested in having developers running on Windows give the -j option with multitasks a spin.

Here's a rakefile I used in playing with the -j option. Try rake with different numbers on the -j option and see how it behaves.

#--Start Rakefile --
#!/usr/bin/ruby -wKU

require 'thread'

# Serialize output so lines from different threads do not interleave.
$m = Mutex.new

def out(*str)
  $m.synchronize do
    puts(*str)
  end
end

DELAY = 0.1

TASKS = ('a'..'z').map { |prefix| "#{prefix}_task" }

TASKS.each do |name|
  desc "#{name} prereq"
  subtasks = ('0'..'9').map { |suffix| "#{name}_sub#{suffix}" }
  # Renamed from |name| to avoid shadowing the outer variable under -w.
  subtasks.each do |subtask_name|
    task subtask_name do
      sleep DELAY
    end
  end
  multitask name => subtasks do
    sleep DELAY
  end
end

multitask :main => TASKS

# Time the whole run so different -j values can be compared.
task :default do
  t = Time.now
  Rake::Task[:main].invoke
  delta = Time.now - t
  out "#{delta} seconds have passed"
end
#--End Rakefile --
--
-- Jim Weirich
-- ***@gmail.com
Jos Backus
2012-10-22 19:38:45 UTC
Post by Jim Weirich
I've merged Michael Bishop's "-j"/thread pool pull request into the master
branch and intend to include it in the next release. I've pushed the current
code base out as Rake 0.9.3.beta.2, so feel free to download it and give it
a try.
I was hoping that multitask would be deprecated in favor of task, and that
we could use `-jN' to specify the concurrent set of tasks to be operated
on, as with GNU and other make versions. I believe this is what drake does.
Any reason you didn't merge the drake code, which looks like the more
general solution?

At any rate, thanks for working on Rake!

Cheers,
Jos
--
Jos Backus
jos at catnook.com
Vassilis Rizopoulos
2012-10-22 19:45:13 UTC
Post by Jim Weirich
I've merged Michael Bishop's "-j"/thread pool pull request into the
master branch and intend to include it in the next release. I've
pushed the current code base out as Rake 0.9.3.beta.2, so feel free to
download it and give it a try.
I was hoping that multitask would be deprecated in favor of task, and
that we could use `-jN' to specify the concurrent set of tasks to be
operated on, as with GNU and other make versions. I believe this is
what drake does. Any reason you didn't merge the drake code, which
looks like the more general solution?
At any rate, thanks for working on Rake!
That could be an option for the 10.x series.
rake is so central to the ruby ecosystem that I don't mind a certain
conservative approach to new features.

V.-
--
http://www.ampelofilosofies.gr
Hongli Lai
2012-10-22 20:04:56 UTC
On Mon, Oct 22, 2012 at 9:45 PM, Vassilis Rizopoulos wrote:
Post by Vassilis Rizopoulos
That could be an option for the 10.x series.
rake is so central to the ruby ecosystem that I don't mind a certain
conservative approach to new features.
Conservative is one thing, but drake was written 2 years ago. There has
been no response every time someone asks why drake was not merged.

Furthermore, this -j behavior is so different from GNU make and other build
tools that it raises the wrong expectations from users. It should not be
called -j. Reserve -j for when drake is eventually (if ever) merged.
--
Phusion | Ruby & Rails deployment, scaling and tuning solutions

Web: http://www.phusion.nl/
E-mail: ***@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)
Jos Backus
2012-10-22 20:18:38 UTC
Post by Hongli Lai
On Mon, Oct 22, 2012 at 9:45 PM, Vassilis Rizopoulos wrote:
Post by Vassilis Rizopoulos
That could be an option for the 10.x series.
rake is so central to the ruby ecosystem that I don't mind a certain
conservative approach to new features.
Conservative is one thing, but drake was written 2 years ago. There has
been no response every time someone asks why drake was not merged.
Furthermore, this -j behavior is so different from GNU make and other
build tools that it raises the wrong expectations from users. It should not
be called -j. Reserve -j for when drake is eventually (if ever) merged.
+1

There's not much extra work right now to merge drake, just integration. If
this use of -j (multitask) catches on, it will be much harder to migrate to
the proper solution as implemented in drake later, so if this change has to
go in, I agree that it should not use -j. Otherwise there will be a
backward compatibility issue, which we don't have right now. Please choose
wisely.

Jos
--
Jos Backus
jos at catnook.com
Jim Weirich
2012-10-23 16:18:09 UTC
Post by Hongli Lai
Conservative is one thing, but drake was written 2 years ago. There has been no response every time someone asks why drake was not merged.
My main problem with drake is that it adds a second task execution engine that is subtly different from the mainline rake engine. The difference isn't critical and most projects won't even notice it, but having two similar but different engines offends my sensibilities.

If drake were to be merged, I would want to either (a) discard the current engine and use drake's engine exclusively, or (b) make the parallelization mechanism work more closely with the current rake engine.

I know drake uses a dry-run pass to compute the dependency tree, but I'm not sure if the dry run pass uses the regular rake engine (which might impact option (a)) or if it does its own thing.

In any case, a drake merge won't happen in the 0.9.x series as I would like to work out the current bug list and hit some simple features. The Thread pool looked like an easy win and is really needed for the multitask stuff anyways. Michael has also proposed a -m option that implicitly turns tasks into multitasks, and I'm considering that instead of a drake integration.

However, if the -m flag is deemed inadequate, I will probably hold off on the thread pool as well and reconsider a drake move a bit farther down the line.

Thoughts are welcome.

(Postscript: I also have some concerns about turning on parallel execution in arbitrary Rakefiles. I suspect it will work fine in projects that mostly shell out to compilers and linkers, but Rakefiles that run mostly Ruby code will probably be broken in ways that are hard to detect and reproduce. If anyone has any ideas on addressing that issue, I would love to hear them.)
--
-- Jim Weirich
-- ***@gmail.com
Jos Backus
2012-10-23 20:34:18 UTC
Post by Jim Weirich
Post by Hongli Lai
Conservative is one thing, but drake was written 2 years ago. There has
been no response every time someone asks why drake was not merged.
My main problem with drake is that it adds a second task execution engine
that is subtly different from the mainline rake engine. The difference isn't
critical and most projects won't even notice the difference, but having two
similar but different engines offends my sensibilities.
It would trigger my OCD ;)
Post by Jim Weirich
If drake were to be merged, I would want to either (a) discard the current
engine and use drake's engine exclusively, or (b) make the parallelization
mechanism work more closely with the current rake engine.
I know drake uses a dry-run pass to compute the dependency tree, but I'm
not sure if the dry run pass uses the regular rake engine (which might
impact option (a)) or if it does its own thing.
Is this something the drake author could help gain certainty about?
Post by Jim Weirich
In any case, a drake merge won't happen in the 0.9.x series as I would
like to work out the current bug list and hit some simple features. The
Thread pool looked like an easy win and is really needed for the multitask
stuff anyways. Michael has also proposed a -m option that implicitly turns
tasks into multitasks, and I'm considering that instead of a drake
integration.
I like -m better, it avoids a future behavioral change conflict with -j.
Post by Jim Weirich
However, if the -m flag is deemed inadequate, I will probably hold off on
the thread pool as well and reconsider a drake move a bit farther down the
line.
Thoughts are welcome.
(Postscript: I also have some concerns about turning on parallel execution
in arbitrary Rakefiles. I suspect it will work fine in projects that mostly
shell out to compilers and linkers, but Rakefiles that run mostly Ruby code
will probably be broken in ways that are hard to detect and reproduce. If
anyone has any ideas on addressing that issue, I would love to hear them.)
But would it not require users to specify some option? Iow, the default
case would not be affected. And if someone specifies a new option, the
documentation could point out that in the case of incomplete dependency
specifications, recipes that depend on pure sequential operation for
correctness could break, and the missing dependencies need to be specified.

Jos
Post by Jim Weirich
--
-- Jim Weirich
_______________________________________________
Rake-devel mailing list
http://rubyforge.org/mailman/listinfo/rake-devel
--
Jos Backus
jos at catnook.com
Jim Weirich
2012-10-23 21:06:02 UTC
Post by Jos Backus
It would trigger my OCD ;)
<an aside>
Saw a post that read: "I have CDO. It's like OCD but with the letters in alphabetical order."
</an aside>
Post by Jos Backus
<questions on drake>
Is this something the drake author could help gain certainty about?
Oh, yes. Certainly. The fault is my own laziness.
Post by Jos Backus
<discussion of options>
I like -m better, it avoids a future behavioral change conflict with -j.
Michael's proposal introduces both a -j and a -m flag. The -j flag sets the thread pool size and the -m flag turns tasks into multitasks. The drake behavior is to use -j to do both jobs, which leaves no way of setting the thread pool size for multitasks.
Post by Jos Backus
<problems with arbitrarily turning on multithreading>
But would it not require users to specify some option? Iow, the default case would not be affected. And if someone specifies a new option, the documentation could point out that in the case of incomplete dependency specifications, recipes that depend on pure sequential operation for correctness could break, and the missing dependencies need to be specified.
The problem is not incomplete dependency specifications, but using shared/mutable objects in tasks (that suddenly could be executed in multiple threads). I doubt there is any completely safe way to do this in general, but would like to hear ideas on reducing risk.
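
(A minimal sketch of this hazard, not taken from any real Rakefile. The `run_tasks` helper and the shared `state` hash are hypothetical stand-ins for task bodies that touch a common object. Without a lock the read-modify-write can lose updates once the bodies run on separate threads; with a Mutex around the update the total is always correct.)

```ruby
require 'thread'

# Hypothetical stand-in for task bodies that update shared state.
# Each "task" does a read-modify-write with a pause in between,
# which is what makes lost updates likely under threads.
def run_tasks(count, lock = nil)
  state = { :total => 0 }
  threads = count.times.map do
    Thread.new do
      update = lambda do
        current = state[:total]       # read
        sleep 0.01                    # simulate work; other threads can interleave here
        state[:total] = current + 1   # write back a possibly stale value
      end
      # With a lock the whole update is atomic; without one it is not.
      lock ? lock.synchronize(&update) : update.call
    end
  end
  threads.each(&:join)
  state[:total]
end

puts "unguarded total: #{run_tasks(10)}"            # may be anything from 1 to 10
puts "guarded total:   #{run_tasks(10, Mutex.new)}" # 10
```

The lock only helps when the Rakefile author knows which objects are shared, which is exactly the part that is hard to audit in an arbitrary Rakefile.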
--
-- Jim Weirich
-- ***@gmail.com
Jos Backus
2012-10-23 22:38:42 UTC
Post by Jim Weirich
Post by Jos Backus
It would trigger my OCD ;)
<an aside>
Saw a post that read: "I have CDO. It's like OCD but with the letters in
alphabetical order."
</an aside>
Heh, good one.
Post by Jim Weirich
Post by Jos Backus
<questions on drake>
Is this something the drake author could help gain certainty about?
Oh, yes. Certainly. The fault is my own laziness.
Okay, just checking :)
Post by Jim Weirich
Post by Jos Backus
<discussion of options>
I like -m better, it avoids a future behavioral change conflict with -j.
Michael's proposal introduces both a -j and -m flag. The -j flag sets the
thread pool size and the -m turns tasks into multi-task. The drake
behavior is to use -j to do both jobs and leave no way of setting the
thread pool for multitasks.
Separating them sounds like it would give us more flexibility. All I was
worried about was mainly a change of semantics of -j down the road. This
approach avoids that, good to hear.
Post by Jim Weirich
Post by Jos Backus
<problems with arbitrarily turning on multithreading>
But would it not require users to specify some option? Iow, the default
case would not be affected. And if someone specifies a new option, the
documentation could point out that in the case of incomplete dependency
specifications, recipes that depend on pure sequential operation for
correctness could break, and the missing dependencies need to be specified.
The problem is not incomplete dependency specifications, but using
shared/mutable objects in tasks (that suddenly could be executed in
multiple threads). I doubt there is any completely safe way to do this in
general, but would like to hear ideas on reducing risk.
Ah, so it's a general thread-safety issue.

Thanks, Jim.

Jos
--
Jos Backus
jos at catnook.com
Mark Watson
2012-10-23 20:54:38 UTC
What about having the old code called by default and if you specify -j
the new parallel code is executed? That way old rakefiles still work,
and new ones can take advantage of the -j feature (after all that was
good enough for GNUmake). This is what I've done with my own
parallelization patch. From the number of patches, it seems -j is
certainly a much-wanted rake feature! :)

https://github.com/watsonmw/rakecpp/blob/master/minusj/minusj.rb
Jim Weirich
2012-10-23 23:05:47 UTC
Post by Mark Watson
What about having the old code called by default and if you specify -j
the new parallel code is executed? That way old rakefiles still work,
and new ones can take advantage of the -j feature
So you check out a new project from GitHub and decide to run rake on it. How do you decide if it's safe to run with -j or not? Try it and see? Wait for subtle unreproducible race conditions to manifest?
Post by Mark Watson
(after all that was good enough for GNUmake).
GNUMake mainly deals with shelling out to commands. I suspect Rakefiles that mainly shell out to compilers and linkers will have little problem with -j.

It's the Rakefiles that execute significant Ruby code in process that I'm concerned about. And maybe I'm overly concerned about this issue, but I've dealt with real-time systems and multiple threads in a past life and know how tricky it can be to get things right.[1]
--
-- Jim Weirich
-- ***@gmail.com

[1] Ask me sometime about my 1 in a million failure.
Mark Watson
2012-10-24 00:30:49 UTC
With GNUMake it is usually safe to assume that a project will *not* work
with -j by default. Like you said, there are probably a bunch of
subtle and not so subtle race conditions. Even if the developers of a
makefile use -j, you can be pretty sure it doesn't work for some build
targets. So, yeah, I agree that would argue in favor of multitask and
the developer of the rakefile making explicit that they want to allow
a task to execute its dependencies in parallel.
Hugh Sasse
2012-10-24 09:41:12 UTC
Post by Jim Weirich
Post by Mark Watson
What about having the old code called by default and if you specify -j
the new parallel code is executed? That way old rakefiles still work,
and new ones can take advantage of the -j feature
So you check out a new project from GitHub and decide to run rake on it. How do you decide if it's safe to run with -j or not? Try it and see? Wait for subtle unreproducible race conditions to manifest?
I've done a little of this parallel programming in Ruby for an EM
solver, and it does get tricky to find this sort of bug. And I
tried to simplify it with Tuplespaces. Does any of this community
have contacts in the Fortran 90,95,2003,2008 community? From what
I have read of modern Fortran, the compilers are pretty good (i.e.
much better than me) at figuring this stuff out, so there may be
things that could be learned. The question then becomes: "Is it
tractable for a dynamic language like Ruby?". Also, do the
algorithms permit one to detect certainty of success, so one can
reject parallel approaches if it comes back "uncertain"?

Actually, this is beginning to sound like a PhD project.
Post by Jim Weirich
Post by Mark Watson
(after all that was good enough for GNUmake).
GNUMake mainly deals with shelling out to commands. I suspect Rakefiles that mainly shell out to compilers and linkers will have little problem with -j.
Although GNUmakefiles probably make more use of variables than traditional
ones do, this is essentially true.
Post by Jim Weirich
It's the Rakefiles that execute significant Ruby code in process that I'm concerned about. And maybe I'm overly concerned about this issue, but I've dealt with real-time systems and multiple threads in a past life and know how tricky it can be to get things right.[1]
--
-- Jim Weirich
[1] Ask me sometime about my 1 in a million failure.
Quite often enough at GHz speeds running for days, weeks!
Hugh
T***@Robitzki.de
2012-10-27 19:03:50 UTC
Hello to all,
Post by Jim Weirich
So you check out a new project from GitHub and decide to run rake on it. How do you decide if it's safe to run with -j or not? Try it and see? Wait for subtle unreproducible race conditions to manifest?
One solution could be to have an API function that must be called from the rakefile to allow concurrent execution of tasks. If that function wasn't called, -j defaults to 1 (is ignored). This has the drawback that a rakefile has to explicitly enable parallel execution, but on the other hand, a thread-unsafe rakefile won't be executed in parallel. Example:

rakefile:

enable_parallel

task :one do
  # compile file one
end

task :two do
  # compile file two
end

task :all => [:one, :two] do
  # link file one and two
end

Running rake with -j 2 could execute task :one and :two in parallel. Without the call to enable_parallel(), -j would effectively be ignored.
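
(A rough sketch of how such an opt-in gate might behave. Everything here is hypothetical: Rake has no `enable_parallel` call, and the `ParallelGate` module and `effective_jobs` method are names invented for illustration. The only point is that the requested -j value is honored only after the rakefile has opted in.)

```ruby
# Hypothetical sketch only: neither `enable_parallel` nor this module
# exists in Rake. It models the opt-in rule described above.
module ParallelGate
  @enabled = false

  class << self
    # Would be called from the rakefile to opt in to parallel execution.
    def enable_parallel
      @enabled = true
    end

    # Forget a previous opt-in (useful when experimenting).
    def reset
      @enabled = false
    end

    # Number of worker threads to use for a requested -j value.
    # Without the opt-in, the request is ignored and 1 is used.
    def effective_jobs(requested)
      @enabled ? [requested, 1].max : 1
    end
  end
end

puts ParallelGate.effective_jobs(2) # rakefile has not opted in: prints 1
ParallelGate.enable_parallel
puts ParallelGate.effective_jobs(2) # rakefile has opted in: prints 2
```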

kind regards
Torsten

Michael Bishop
2012-10-23 20:03:45 UTC
Hi Everyone,

I've been thinking about this question of the Drake implementation vs. the ThreadPool implementation and I wanted to share my thoughts. I had no idea the resulting email would be so long. It's my hope to offer interesting points for discussion.

These are all ordered by importance so you can bail when you like :)

Please bear with me...


What Should -j mean? (Part 1.)

There are two features for which I've made pull requests:

1 - Limit the number of concurrent tasks executing.
2 - All tasks process their prerequisites in parallel.

Both of these features are activated with separate flags: -j and -m, respectively. Neither feature requires the other. They are complementary.
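
To make the first feature concrete, here is a minimal sketch of a fixed-size worker pool in plain Ruby. It is not the ThreadPool code from the pull request; `run_with_pool` is a hypothetical helper that only illustrates the idea of capping how many jobs run at once, which is the role -j plays.

```ruby
require 'thread'

# Run each job (any callable) on at most pool_size worker threads.
# Returns the highest number of jobs observed running at the same time.
def run_with_pool(jobs, pool_size)
  queue = Queue.new
  jobs.each { |job| queue << job }
  pool_size.times { queue << :stop } # one stop marker per worker

  lock = Mutex.new
  running = 0
  peak = 0

  workers = pool_size.times.map do
    Thread.new do
      # Each worker pulls jobs until it sees its stop marker.
      while (job = queue.pop) != :stop
        lock.synchronize do
          running += 1
          peak = running if running > peak
        end
        job.call
        lock.synchronize { running -= 1 }
      end
    end
  end
  workers.each(&:join)
  peak
end

jobs = Array.new(12) { lambda { sleep 0.05 } }
puts "peak concurrency with a pool of 3: #{run_with_pool(jobs, 3)}"
```

However many jobs are queued, the peak never exceeds the pool size, so the degree of parallelism is controlled separately from the decision about which tasks are allowed to run in parallel.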

Drake uses one flag to specify both features but there is no technical reason why Rake couldn't also activate both features with a single -j.

I raise this to separate the issue of "what -j means" from the possibly larger issue of the advantages of the drake implementation.


A Perk of the ThreadPool Implementation

The reason I ask if the issue isn't simply about "what -j means" is because the drake implementation is documented as breaking the existing contract exposed by the Rake API. From the drake page ( http://quix.github.com/rake/files/doc/parallel_rdoc.html ):

Task#invoke inside Task#invoke

Parallelizing tasks means surrendering control over the micro-management
of their execution. Manually invoking tasks inside other tasks is rather
contrary to this notion, throwing a monkey wrench into the system. An
exception will be raised when this is attempted in -j mode.

The ThreadPool implementation does not share this same limitation or limit any features of the Rake API.

[A use case for this is below...]


What Should -j mean? (Part 2.)

As a Rakefile author, I have found a lot of utility in being able to incrementally parallelize my Rakefile. Allowing both task and multitask enables me to quickly activate parallelization for a section of my Rakefile. I like that if I've detected a parallelization bug, I can quickly fix it by simply removing the parallelization for that section, leaving the rest of the file to remain in parallel (which hopefully still maintains good performance). I've been grateful for those times when I can quickly fix the build by changing a multitask to a task.

Being able to choose between task and multitask has always seemed to me a gentler way to allow authors to parallelize their Rakefiles while retaining the power to really take advantage of the machine upon which it runs.

That's why I like the separation of the -m option.


Use Case For Task#invoke inside Task#invoke

Being able to call and activate tasks on the fly is also important to me because the build system at my job uses Task#invoke from within another Task#invoke. It's possible that I'm misusing Rake (and if so, this is a great opportunity for me to get a better solution from the community).

Here's how we use Task#invoke:

Our build system has a packaging component which creates a deployable "package" containing variations of the product, and a collection of global items used by all variations. For each product variation, there is a binary of the build with its corresponding symbol files.

Package
-------
- variations
  - debug
    - product.exe
    - product.pdb
  - release
    - ...
  - debug-only-feature-A
  - release-only-feature-B
  - etc...
- global-items
  - assets
  - manifest
  - etc...

We need to be able to specify at the rake command-line:
- Which variations will be included
- Overall options that affect every variation in the package

I tried to write a Rakefile that would take all those options and build a giant dependency tree. Inside an enumeration of variations would be a declaration of our :build task for the current variation. The :build task would be declared with a unique name based on the configuration, essentially creating a parameterized task (akin to C++ templates). A resulting :package task would depend on all of these. Each variation's build would depend on its own prerequisite task, and those would all depend on a single task, :preprocess_assets.

Here's pseudo-code:

multitask :preprocess_assets => asset_tasks do |t,args|
  [code]
end

variations.each do |variation|

  task "build_prereq(#{variation.to_s})" => :preprocess_assets do |t,args|
    [code]
  end

  task "build(#{variation.to_s})" => "build_prereq(#{variation.to_s})" do |t,args|
    [use variation in build code]
  end

  task :package => "build(#{variation.to_s})"

end

task :package do |t,args|
  [packaging code]
end

Here's an ascii diagram (note that there were many more variables than "conf" and "features"):

[asset,asset,...]  <-- (in parallel)
  |
:preprocess_assets
  |
  +-- "build_prereq(conf=release,features=A,B)" -- "build(conf=release,features=A,B)" --+
  +-- "build_prereq(conf=debug,features=A)"     -- "build(conf=debug,features=A)"     --+
  +-- "build_prereq(conf=debug,features=A,B)"   -- "build(conf=debug,features=A,B)"   --+-- :package
  +-- "build_prereq(conf=release,features=B)"   -- "build(conf=release,features=B)"   --+


It seemed very straightforward, but it was difficult to read and debug the Rakefile. All the task names were generated (making them hard to find in the code when referenced from rake output) and the tree was very large.

Using Task#invoke allowed me to get rid of all the parameterization and create a Rakefile that better matched the flow of the process and was simpler to read.

multitask :preprocess_assets => asset_tasks do |t,args|
  [code]
end

task :build_prereq, [:conf, :features] => :preprocess_assets do |t,args|
  [code]
end

task :build, [:conf, :features] => :build_prereq do |t,args|
  [use args]
end

task :package do |t,args|

  variations.each do |variation|
    Rake::Task[:build].invoke(*variation)
    [reenable :build and its prerequisites]
  end

  [packaging code]
end


Here's an ascii diagram

[asset,...] <-- (in parallel)
|
:preprocess_assets
|
:build_prereq
|
:build <--loops over-- :package
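
For readers who want to try the invoke-and-reenable loop, here is a small runnable sketch under stated assumptions. The two variations and the `$build_log` global are made up for illustration, the asset step is left out, and `Rake::Task.define_task` is used in place of the `task` DSL call only so the snippet runs as a plain Ruby script (inside a Rakefile, `task` does the same thing). It reenables :build and its direct prerequisites after each pass.

```ruby
require 'rake'

$build_log = [] # records each step so the effect of reenable is visible

Rake::Task.define_task(:build_prereq, [:conf, :features]) do |_t, args|
  $build_log << "prereq #{args[:conf]}"
end

Rake::Task.define_task(:build, [:conf, :features] => :build_prereq) do |_t, args|
  $build_log << "build #{args[:conf]} #{args[:features]}"
end

# Hypothetical variations; the real list is not given in this thread.
variations = [%w[debug A], %w[release B]]

Rake::Task.define_task(:package) do
  variations.each do |variation|
    build = Rake::Task[:build]
    build.invoke(*variation)
    # A task runs only once until it is reenabled, so reset :build and
    # its direct prerequisites before the next variation.
    build.prerequisites.each { |name| Rake::Task[name].reenable }
    build.reenable
  end
  $build_log << 'package'
end

Rake::Task[:package].invoke
puts $build_log
```

Without the two reenable calls, the second pass through the loop would be a no-op, because Rake considers :build and :build_prereq already invoked.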



Keeping Rake Flexible

On a more general note, Rake has always been presented to me as an API to enable dependency-based programming, and the DSL is a (significant) perk that enables writing a dependency tree in a declarative style. But as far as I know, there has never been a formal boxing of the Rake system into a "declare tasks" mode and an "execute tasks" mode, which it seems the drake implementation encourages, if not requires.


Thank you for making it this far. I look forward to the discussion generated by these points.

Sincerely,

_ michael bishop
Vassilis Rizopoulos
2012-10-24 10:12:07 UTC
Post by Michael Bishop
Hi Everyone,
*A Perk of the ThreadPool Implementation*
The reason I ask if the issue isn't simply about "what -j means" is
because the drake implementation is documented as breaking the existing
contract exposed by the Rake API. From the drake page (
http://quix.github.com/rake/files/doc/parallel_rdoc.html ):
Task#invoke inside Task#invoke
Parallelizing tasks means surrendering control over the
micro-management
of their execution. Manually invoking tasks inside other tasks is rather
contrary to this notion, throwing a monkey wrench into the system. An
exception will be raised when this is attempted in -j mode.
The ThreadPool implementation does not share this same limitation or
limit any features of the Rake API.
[A use case for this is below...]
I have a much better use case, and since my patch for allowing this
within the tasks was rejected because of abuse potential, I'm dreading
losing the ability to use invoke within tasks.
What I do is

task :build do
  t = calculate_task_with_dynamic_dependencies(params)
  Rake::Task[t].invoke
end

Now for various reasons import and other tricks do not work for my use
case (there's a bit more info on the pull request for dynamic prereqs
https://github.com/jimweirich/rake/pull/103) but the above idiom works
really well.
Not allowing it would be fatal for my system.
I'll also +1 the differentiation of -m and -j.
Much prefer explicitly specifying MultiTask instead of having to hunt
down subtle race condition and resource contention bugs because of the
implicitly multi threaded environment.
Cheers,
V.-
--
http://www.ampelofilosofies.gr