Comments from a new rake user

Discussion:

Heath Kehoe

2010-05-19 17:13:55 UTC

Hello all,

I have recently implemented a build system based on Rake for a
decently-sized project (~8000 source files) which builds code and data
for several libraries and applications targeting four separate platforms.

I was new to both rake and ruby when I started, and it's a testament to
the awesomeness of both that I was able to learn them quickly enough
to implement a fairly complicated system in less than a month.

Anyway, I wanted to pass along some comments.

First, when I initially got things building with rake (replacing a build
environment that used GNU make), I noticed that a null build (where
everything was up to date) took a really long time. I used the ruby
profiler and found that approximately half the run time was spent doing
file stats (exist? and mtime), with an average of 70 calls *per file*
during a rake run. Because it runs under Cygwin on Windows, those file
stat operations are more expensive than they are on Linux. I should note
that I'm using ruby 1.8.7 and the rake that gem installed (0.8.7).
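
For anyone wanting to reproduce that kind of measurement without a full
profiler, the calls can also be counted directly. This is only a minimal
sketch, not what was actually used here: the StatCounter module, the
naive_* helpers and the demo path are hypothetical names, and
Module#prepend needs Ruby 2.0 or later, so it would not run on the 1.8.7
mentioned above. The naive_* helpers imitate the exist?/timestamp
pattern discussed in this thread.

```ruby
require "tmpdir"

# Hypothetical helper: counts calls to File.exist? and File.mtime per path.
module StatCounter
  COUNTS = Hash.new(0)

  def exist?(path)
    COUNTS[[:exist?, path.to_s]] += 1
    super
  end

  def mtime(path)
    COUNTS[[:mtime, path.to_s]] += 1
    super
  end
end

# Wrap the class-level methods of File with the counting versions.
File.singleton_class.prepend(StatCounter)

# Imitation of an uncached timestamp check: exist? followed by mtime.
def naive_timestamp(path)
  File.exist?(path) ? File.mtime(path) : Time.at(0)
end

# Imitation of an uncached needed? check: exist? again, then the timestamp.
def naive_needed?(path, newest_prereq)
  !File.exist?(path) || naive_timestamp(path) < newest_prereq
end

demo_path = File.join(Dir.tmpdir, "stat_counter_demo_#{Process.pid}.txt")
File.write(demo_path, "demo")

naive_needed?(demo_path, Time.at(0))

# For an existing file this records exist? twice and mtime once.
StatCounter::COUNTS.each do |(method, path), count|
  puts "#{method} #{path}: #{count}" if path == demo_path
end

File.delete(demo_path)
```

Keying the counts by path makes it easy to see which files are checked
most often during a run.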

If you look at the FileTask code, the needed? method calls File.exist?
then timestamp, which calls File.exist? then File.mtime. That's three
stats in a row right there for the file itself; then each prerequisite
is asked for its timestamp which generates two stats for each (for
FileTasks that is). My approach was to create a simple global cache that
uses the filename as the key and stores the file's mtime.

module Rake
  # Modify Rake's FileTask to use our cached file tests
  class FileTask < Task
    def needed?
      !File.cached_exist?(name) || out_of_date?(timestamp)
    end

    def timestamp
      if File.cached_exist?(name)
        File.cached_mtime(name.to_s)
      else
        Rake::EARLY
      end
    end

    def execute(args=nil)
      ret = super
      File.invalidate_cache(name)
      ret
    end
  end
end

The invalidate_cache method simply deletes the cache entry for the given
file, which is necessary if the file was changed by the task's action.
The execute method does this for the FileTask's own target; if an
action creates or modifies other files as a side effect, I explicitly
call invalidate_cache in the action block for each side-effect file to
make sure the cache doesn't contain any stale info.
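
The File.cached_exist?, File.cached_mtime and File.invalidate_cache
helpers themselves are not shown in this message. The following is a
minimal sketch of how such a global cache might look, assuming a plain
Hash keyed by filename; the MTIME_CACHE constant and the choice to store
nil for a missing file are assumptions, not the original code.

```ruby
require "tmpdir"

class File
  # Assumed layout: filename => mtime (Time), or nil if the file is missing.
  MTIME_CACHE = {}

  # Returns the cached mtime, hitting the filesystem only on a cache miss.
  def self.cached_mtime(path)
    key = path.to_s
    return MTIME_CACHE[key] if MTIME_CACHE.key?(key)

    MTIME_CACHE[key] = begin
      File.mtime(key)
    rescue Errno::ENOENT, Errno::ENOTDIR
      nil
    end
  end

  # A file "exists" if a (possibly cached) mtime could be read for it.
  def self.cached_exist?(path)
    !cached_mtime(path).nil?
  end

  # Forgets the entry so the next query goes back to the filesystem.
  def self.invalidate_cache(path)
    MTIME_CACHE.delete(path.to_s)
  end
end

demo_path = File.join(Dir.tmpdir, "mtime_cache_demo_#{Process.pid}.txt")
File.delete(demo_path) if File.exist?(demo_path)

puts File.cached_exist?(demo_path)   # file is missing; the miss is cached
File.write(demo_path, "demo")
puts File.cached_exist?(demo_path)   # stale answer until invalidated
File.invalidate_cache(demo_path)
puts File.cached_exist?(demo_path)   # re-read from the filesystem

File.delete(demo_path)
File.invalidate_cache(demo_path)
```

The middle call shows why the explicit invalidation matters: once an
answer is cached, a file created behind the cache's back stays invisible
until its entry is dropped.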

This change resulted in an order of magnitude improvement in run-time.

Now, I'm not saying rake should adopt this specific optimization;
however I think you should consider some type of caching to reduce the
quantity of exist?/mtime calls. Perhaps the FileTask could simply cache
its own exist?/mtime results (invalidated when execute runs).
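
The per-task variant suggested here could look roughly like the
following. It is a standalone sketch rather than a patch to Rake: the
StatMemo class and its EARLY constant (a stand-in for Rake::EARLY) are
hypothetical, and a real change would live inside FileTask itself.

```ruby
require "tmpdir"

# Hypothetical stand-in for a file task that remembers its own stat result.
class StatMemo
  EARLY = Time.at(0) # stand-in for Rake::EARLY in this sketch

  def initialize(name)
    @name = name.to_s
    @stat = nil
    @checked = false
  end

  def exist?
    !stat.nil?
  end

  def timestamp
    s = stat
    s ? s.mtime : EARLY
  end

  # Call after the task's action has run, since the file may have changed.
  def invalidate
    @checked = false
    @stat = nil
  end

  private

  # One File.stat per object until invalidated; nil means "no such file".
  def stat
    return @stat if @checked

    @checked = true
    @stat = begin
      File.stat(@name)
    rescue Errno::ENOENT, Errno::ENOTDIR
      nil
    end
  end
end

demo_path = File.join(Dir.tmpdir, "stat_memo_demo_#{Process.pid}.txt")
File.delete(demo_path) if File.exist?(demo_path)

memo = StatMemo.new(demo_path)
puts memo.exist?                       # file is missing
File.write(demo_path, "demo")          # simulate the task's action
memo.invalidate
puts memo.exist?                       # file is now seen
puts memo.timestamp == File.mtime(demo_path)

File.delete(demo_path)
```

Compared with a global cache, this keeps the cached state next to the
task that owns it, at the cost of not sharing results between tasks that
happen to name the same file.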

I have more to say about dependency generation and multitasking, but
I'll send those thoughts in separate emails.

-Heath

______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________

Antoine Toulme

2010-05-19 17:22:48 UTC

Hey,

What kind of sources were you building? I'm a committer on Buildr, which
is built on Rake as well. Just curious.

Antoine

_______________________________________________
Rake-devel mailing list
http://rubyforge.org/mailman/listinfo/rake-devel

Heath Kehoe

2010-05-19 18:02:45 UTC

Post by Antoine Toulme
Hey,
what kind of sources were you building ? I'm committer on Buildr and
we built on Rake as well. Just curious.
Antoine

C++, along with custom data/asset builders

-h

Antoine Toulme

2010-05-19 18:06:27 UTC

Cool, let me know if it becomes/is open source, I'd be interested to look at
it :)

Heath Kehoe

2010-05-19 19:23:25 UTC

Not much chance of it ever being open source, but I should be able to
share snippets of rake code, especially those that are modified versions
of existing rake methods.

-h

James Tucker

2010-05-19 18:42:14 UTC

Yeah, this is a problem for Ruby on Windows generally, and it is somewhat hard to overcome in the generic case.

Post by Heath Kehoe
If you look at the FileTask code, the needed? method calls File.exist? then timestamp, which calls File.exist? then File.mtime. That's three stats in a row right there for the file itself; then each prerequisite is asked for its timestamp which generates two stats for each (for FileTasks that is). My approach was to create a simple global cache that uses the filename as the key and stores the file's mtime.
Now, I'm not saying rake should adopt this specific optimization; however I think you should consider some type of caching to reduce the quantity of exist?/mtime calls. Perhaps the FileTask could simply cache its own exist?/mtime results (invalidated when execute runs).

It's a bit hacky and incomplete, but here's an intro implementation...

class FileTask < Task
  NOFILESTAT = File::Stat.allocate
  class << NOFILESTAT
    def mtime
      Rake::EARLY
    end
    # TODO fixup other methods that are busted by the fact we had to hack
    # with allocate.
  end

  # Is this file task needed? Yes if it doesn't exist, or if its time stamp
  # is out of date.
  def needed?
    # Identity check: File::Stat#== goes through <=>, which would read the
    # uninitialized sentinel's fields.
    stat.equal?(NOFILESTAT) || out_of_date?(timestamp)
  end

  # Time stamp for file task.
  def timestamp
    stat.mtime
  end

  # Drop the cached stat after the action runs, since the file may have
  # changed.
  def execute(*a)
    super.tap { @stat = nil }
  end

  private

  def stat(refresh = false)
    return @stat if @stat && !refresh
    @stat = begin
      File.stat(name)
    rescue Errno::ENOENT
      NOFILESTAT
    end
  end
...