Heath Kehoe
2010-05-19 17:13:55 UTC
Hello all,
I have recently implemented a build system based on Rake for a
decently sized project (~8000 source files) that builds code and data
for several libraries and applications targeting four separate platforms.
I was new to both rake and ruby when I started, and it's a testament to
the awesomeness of both that I was able to learn them quickly enough to
implement a fairly complicated system in less than a month.
Anyway, I wanted to pass along some comments.
First, when I initially got things building with rake (replacing a build
environment that used GNU make), I noticed that a null build (where
everything was up to date) took a really long time. I used the ruby
profiler and found that approximately half the run time was spent on file
stats (exist? and mtime), with an average of 70 calls *per file* during
a rake run. Since the build runs under Cygwin on Windows, those stat
operations are more expensive than they are on Linux. I should note that
I'm using ruby 1.8.7 and the rake version installed by gem (0.8.7).
If you look at the FileTask code, the needed? method calls File.exist?
and then timestamp, which in turn calls File.exist? and then File.mtime.
That's three stats in a row for the file itself; then each prerequisite
is asked for its timestamp, which costs two more stats per prerequisite
(for prerequisites that are FileTasks, that is). My approach was to
create a simple global cache that uses the filename as the key and
stores the file's mtime.
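module Rake
  # Modify Rake's FileTask to use our cached file tests.
  # File.cached_exist?, File.cached_mtime and File.invalidate_cache are
  # the helpers backed by the global filename => mtime cache.
  class FileTask < Task
    # Needed if the target is missing or older than a prerequisite.
    def needed?
      ! File.cached_exist?(name) || out_of_date?(timestamp)
    end

    # Cached mtime of the target, or Rake::EARLY if it doesn't exist.
    def timestamp
      if File.cached_exist?(name)
        File.cached_mtime(name.to_s)
      else
        Rake::EARLY
      end
    end

    # Run the task's actions, then drop the (now possibly stale) entry
    # for this task's own target.
    def execute(args=nil)
      ret = super
      File.invalidate_cache(name)
      ret
    end
  end
end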
The invalidate_cache method simply deletes the cache entry for the given
file, which is necessary if the task's action changed that file.
The execute method does this for the FileTask's own target; if an
action creates or modifies other files as a side effect, I explicitly
call invalidate_cache in the action block for each side-effect file to
make sure the cache doesn't contain any stale info.
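The message doesn't include the cache helpers themselves. Below is a minimal sketch of what they might look like, assuming a process-wide Hash keyed by filename; the Hash, the :missing marker and the private cached_entry helper are illustrative assumptions, not Heath's actual code. Each name costs one File.stat until it is invalidated, and that single stat answers both the exist? and the mtime question.

```ruby
# Minimal sketch of a global stat cache (an assumption, not the
# original helpers): filename => mtime, or :missing if absent.
class File
  @stat_cache = {}

  class << self
    # True if the file existed when it was first looked up (or when it
    # was looked up again after the last invalidate_cache for it).
    def cached_exist?(name)
      !cached_entry(name).equal?(:missing)
    end

    # The cached mtime; raises Errno::ENOENT like File.mtime would.
    def cached_mtime(name)
      entry = cached_entry(name)
      raise Errno::ENOENT, name.to_s if entry.equal?(:missing)
      entry
    end

    # Forget the entry so the next query stats the file again.
    def invalidate_cache(name)
      @stat_cache.delete(name.to_s)
    end

    private

    # One File.stat per uncached name answers both exist? and mtime.
    def cached_entry(name)
      key = name.to_s
      unless @stat_cache.key?(key)
        @stat_cache[key] =
          begin
            File.stat(key).mtime
          rescue Errno::ENOENT, Errno::ENOTDIR
            :missing
          end
      end
      @stat_cache[key]
    end
  end
end
```

Caching the "missing" result matters for null-build speed as well, since absent files would otherwise be re-stat'ed on every query. The flip side is that a file created outside a FileTask action keeps reading as missing until something calls invalidate_cache for it.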
This change resulted in an order-of-magnitude improvement in run time.
Now, I'm not saying rake should adopt this specific optimization;
however, I think you should consider some kind of caching to reduce the
number of exist?/mtime calls. Perhaps the FileTask could simply cache
its own exist?/mtime results (invalidated when execute runs).
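The per-task alternative suggested above could look roughly like the following standalone sketch. CachingFileTarget, its stat_calls counter and its EARLY constant are hypothetical stand-ins, not Rake's FileTask API; the point is only that a target stats its file at most once between executions, and that execute clears the memoized result because the action may have rewritten the file.

```ruby
# Hypothetical standalone illustration, not Rake's FileTask: a file target
# that memoizes its own exist?/mtime result until its action runs.
class CachingFileTarget
  EARLY = Time.at(0) # stand-in for Rake::EARLY

  attr_reader :name, :stat_calls

  def initialize(name, prerequisites = [], &action)
    @name = name
    @prerequisites = prerequisites
    @action = action
    @stat_calls = 0 # how many times the file was actually stat'ed
    @stamp = nil    # memoized mtime or :missing; nil = not checked yet
  end

  # The file's mtime, or EARLY if the file does not exist.
  def timestamp
    stamp = memoized_stamp
    stamp.equal?(:missing) ? EARLY : stamp
  end

  # Needed if the file is missing or any prerequisite is newer.
  def needed?
    memoized_stamp.equal?(:missing) ||
      @prerequisites.any? { |pre| pre.timestamp > timestamp }
  end

  # Run the action, then forget the memoized stat, since the file may
  # have been created or rewritten.
  def execute
    @action.call(self) if @action
  ensure
    @stamp = nil
  end

  private

  # Stat the file at most once until execute clears the memo.
  def memoized_stamp
    if @stamp.nil?
      @stat_calls += 1
      @stamp =
        begin
          File.stat(@name).mtime
        rescue Errno::ENOENT, Errno::ENOTDIR
          :missing
        end
    end
    @stamp
  end
end
```

Unlike the global table, this needs no manual invalidation for a task's own target, but a file written as a side effect by some other task's action would still leave its own target holding a stale memo.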
I have more to say about dependency generation and multitasking, but
I'll send those thoughts in separate emails.
-Heath
______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________