Monday, June 24, 2013


Johanna Rothman recently published a series of articles on dzone.com on Software Project Estimation that I thought was quite good.  I would encourage anybody to read it:  http://agile.dzone.com/articles/estimating-unknown-dates-or

I think she has really distilled down the most essential aspects of software estimation.  The most difficult thing about estimating for software projects is that they are almost always new territory.  How can one reliably make an estimate on something one has never done before?
This is very different from estimation for a traditional project, say constructing a building.  In this case, probably 90% or more of the project is identical to many past projects.  So, right off the bat one can expect a variance of less than 10% in the estimation accuracy w/o any effort at all.  This is even easier if the project is similar to past projects (e.g. all commercial office buildings) and one is using largely the same workforce (all the same subcontractors) or skilled tradesmen (e.g. steelworkers) who are largely interchangeable.
Since most software projects are plowing new territory for the first time, one has no past projects on which to base predictions of the future.  In this case, iteration (i.e. successive refinement of an estimate) is the only way to improve on the SWAG that is the first "estimate". I think Kent Beck's book "Extreme Programming" was the first place I read the comparison of hitting a target with either a rifle & bullet vs. a cruise missile.  In the first case, one carefully aims the rifle, calculating the distance to target, wind drift, elevation difference, and many other variables.  This method only works if one has extensive history with this particular rifle, scope, cartridge, and other details.  In the other case, a cruise missile is launched in the general direction of the target and makes continuous course corrections during every second of flight. The cruise missile is able to compensate for headwinds, bad weather, changes in fuel load, and works for targets at any range or elevation within its capability.
I love her suggestion of giving confidence ranges along with estimates.  That really helps managers (& the whole team) realize that these estimates have an inherent uncertainty to them (i.e. a probability distribution). It also helps to make clear up front what the unknowns are in the project, in order to help focus people on answering some of those unknowns. This has the beneficial effect that, when an unknown is resolved, one gets a reward in the form of tighter confidence intervals.
Keep up the great work, Johanna!
Alan Thompson

Sunday, June 16, 2013

Simple makefile dependencies using the "touch" command

Many software projects such as C++ or even (gasp!) PL/I have a complicated dependency hierarchy.  For example, a source code file dogSays.cpp may depend on an include file dogSays.h, so that the *.cpp file needs to be recompiled whenever the *.h file is modified.  This may become even more complicated if dogSays.h, in turn, depends on another file such as dogSound.h, etc.

The Unix "make" utility attempts to ease the pain in this process by allowing the user to specify the dependencies for each project artifact in a simple manner, along with the commands needed to rebuild that artifact should it become outdated.  For the example above, we would have a makefile entry such as:

dogSays.o : dogSays.cpp dogSays.h
	g++ -c $<

This makefile snippet says that the target file dogSays.o depends on both of the prerequisite files dogSays.cpp and dogSays.h.  If either of those files has a timestamp newer than dogSays.o, then the object file is "out of date" and must be remade with the command "g++ -c $<", where the "$<" is an automatic makefile variable that will be replaced with the first prerequisite filename, in this case "dogSays.cpp".

This is all fine and well, but creating and maintaining the required dependency specifications in a makefile is a boring and error-prone process.  Fortunately, this process can be automated to a great degree by using the "-MM" option for gcc and other compilers (please see GNU Automatic Prerequisites, Autodependencies with GNU make, and Advanced Auto-Dependency Generation).  However, you may be using a language that gcc doesn't support (like PL/I) or you may be stuck using a compiler that doesn't have the "-MM" dependency generation option.

So, what to do?

Well, the first thing to do is to realize that your source code is nothing but a simple text file and that searching for lines beginning with "#include" is pretty easy with regular expressions.  We can use regex groups to pick out the name of the included file such as "dogSays.h".  We can then build our own dependency specification file for input to the make utility to automate the whole process.
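To illustrate the idea, here is a minimal shell sketch of that search. It is not the Groovy program described next, only an approximation of the technique using sed. The helper names list_includes and emit_touch_rule and the sample file contents are made up for this demo, and system headers in angle brackets are skipped on the assumption that they rarely change.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the file names included by one source file.
# Matches  #include "x.h"  and  %include 'x.inc' ; lines such as
# #include <iostream> are deliberately ignored.
list_includes() {
  sed -n -E "s/^[[:space:]]*[#%][[:space:]]*include[[:space:]]+[\"']([^\"']+)[\"'].*/\1/p" "$1"
}

# Hypothetical helper: print a make rule that touches the source file
# whenever one of its include files is newer. Prints nothing if the
# file has no quoted includes.
emit_touch_rule() {
  local src=$1 deps
  deps=$(list_includes "$src" | tr '\n' ' ')
  if [ -n "$deps" ]; then
    printf '%s : %s\n\ttouch $@\n' "$src" "$deps"
  fi
}

# Demo in a scratch directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

cat > dogSays.cpp <<'EOF'
#include <iostream>
#include "dogSound.h"
#include "dogSays.h"
EOF

emit_touch_rule dogSays.cpp
```

Run as written, this prints the line "dogSays.cpp : dogSound.h dogSays.h" followed by a tab-indented "touch $@" line, which is the kind of rule make can read directly.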

I wrote a simple Groovy program named Depends.groovy (source code) to parse C/C++ and PL/I source files in order to identify prerequisite include files on lines such as the following: 

#include "dogSays.h"          // C++ normal
#include <iostream>           // C++ system
%include 'someHeader.inc'     // PL1-style

The program also knows the common suffixes used by the various source code files, so it is easy to invoke by either:

> Depends.groovy *

or

> Depends.groovy -r .

As an example, consider the following sample C++ program dog.exe (source code here).  It consists of 2 *.cpp files and 3 *.h include files, creating the following  dependency tree (in outline form):

dog.exe  
  dog.cpp    
    dogSays.h      
      strType.h  
  dogSays.cpp    
    dogSound.h    
    dogSays.h      
      strType.h

So the executable dog.exe depends on both *.cpp files, which in turn depend on all of the *.h files, either directly or indirectly.  Note that dogSays.h depends on strType.h, so dependencies need to be considered in a recursive manner.  Now, using "g++ -MM" will recursively follow all include files to make a complete dependency specification, but we don't need to get that fancy. Instead, we will simply search all *.cpp files and all *.h files, making separate dependency specifications for each of them.  

A makefile uses file system timestamps to decide when a program artifact must be remade.  So, if the file dog.cpp has a more recent timestamp than the corresponding object file dog.o, then make will cause dog.o to be rebuilt using the appropriate compiler command.  The make utility will then recognize that the executable dog.exe is out of date relative to the newly rebuilt dog.o, and will then perform the relevant linker command to rebuild dog.exe.  Note that it really doesn't matter how a file is rebuilt and brought up to date.  The only crucial part is that the timestamp of a file is reset to the current time once the file has been rebuilt.
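The timestamp test can be mimicked in a few lines of shell. This is only a sketch of the idea, not how make is implemented: the helper name is_out_of_date is hypothetical, it relies on the "-nt" (newer than) file test available in bash, and the fixed dates passed to "touch -t" are there just to make the demo repeatable.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit 0) when the target is missing or
# when any prerequisite has a newer timestamp than the target.
is_out_of_date() {
  local target=$1 prereq
  shift
  [ -e "$target" ] || return 0
  for prereq in "$@"; do
    if [ "$prereq" -nt "$target" ]; then
      return 0
    fi
  done
  return 1
}

report() {
  if is_out_of_date dog.o dog.cpp; then
    echo "rebuild dog.o"
  else
    echo "dog.o is current"
  fi
}

# Demo in a scratch directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

touch -t 202001010000 dog.cpp   # source last edited on Jan 1
touch -t 202001020000 dog.o     # object built on Jan 2
report                          # prints: dog.o is current

touch -t 202001030000 dog.cpp   # source edited again on Jan 3
report                          # prints: rebuild dog.o
```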

But, how does one "rebuild" a source file such as dog.cpp if a prerequisite like dogSays.h is updated?  After all, only the object file dog.o really needs to be recompiled - no changes are made to dog.cpp itself. At first, this seems like a conundrum: how can we "rebuild" a file when nothing needs to be done?  The answer is simple!  Just do "nothing" (i.e. a noop).  We do, however, need to signal to make that the file (dog.cpp in this case) has been "rebuilt" and any downstream dependencies must also be rebuilt.  All we need in order to accomplish both the noop and the signalling is to use the unix command touch.

The unix command touch, with syntax "touch <filename>", has no effect on the specified file except to update its timestamp to the current time.  As a side effect, using touch on a nonexistent file will cause an empty file (zero bytes in length) with the specified name to be created.  So, if any prerequisite include file (*.h or *.inc) has been updated, all we need do is touch the dependent source file (e.g. dog.cpp), and make will rebuild the corresponding downstream dependent files in the normal manner.  Note that this also will work for any recursive prerequisites, since the whole point of the make utility is that it can easily understand the entire dependency tree and rebuild any portions that are required.
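Both properties of touch are easy to check by hand. The following is a small sketch run in a scratch directory; the one-line dog.cpp contents, the backdated timestamps, and the file name brandNew.h are invented for the demo.

```shell
#!/usr/bin/env bash

# Work in a scratch directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'int main() { return 0; }\n' > dog.cpp
touch -t 202001010000 dog.cpp   # pretend dog.cpp was last edited long ago
touch -t 202001020000 dog.o     # pretend dog.o was built a day later
before=$(cksum < dog.cpp)

touch dog.cpp                   # the "noop rebuild": only the timestamp moves
after=$(cksum < dog.cpp)

[ "$before" = "$after" ] && echo "contents unchanged"
[ dog.cpp -nt dog.o ] && echo "dog.cpp is now newer than dog.o"

touch brandNew.h                # nonexistent file: created with zero bytes
[ -e brandNew.h ] && [ ! -s brandNew.h ] && echo "brandNew.h created empty"
```

All three messages are printed: the checksum of dog.cpp is the same before and after, yet dog.cpp now looks newer than dog.o, which is exactly the signal make needs.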

Let's see an example.  Running Depends.groovy on the 5 source code files for dog.exe produces the following file depSpecs.mk (source code):

dog.cpp : dogSays.h
	touch $@
dogSays.cpp : dogSound.h dogSays.h
	touch $@
dogSays.h : strType.h
	touch $@

So, both *.cpp files depend on dogSays.h, and dogSays.cpp also depends on dogSound.h.  Note that dogSays.h, in turn, depends on strType.h, which recursively affects both of the *.cpp files as well.  So, if strType.h is updated, make will execute the touch command on dogSays.h, which will then cause both dog.cpp and dogSays.cpp to also be updated (the symbol "$@" is an automatic makefile variable which expands to the current target filename).

The dependencies specified in depSpecs.mk are included into the basic makefile (source code).  Note that the dependency specifications are concerned only with source code files (i.e. *.cpp and *.h files).  The actual object code dependencies are then easy to handle with a single generic dependency rule:

####################################################
# Generic build rules
%.o : %.cpp
	${CXX} ${CFLAGS} -c $<

This generic make rule says that any file matching the pattern %.o can be built from the corresponding file %.cpp (where the "%" character matches any nonempty string, which make calls the stem). As is usual in makefiles, we use variables to hold the explicit build commands. Here, the make variable CXX is set to be "g++" and the variable CFLAGS is empty.

To see all of this in action, here is a demonstration of partial recompilation:

> make clean ; make    # Build clean from scratch
rm -f *.o *.exe
g++  -c dog.cpp
g++  -c dogSays.cpp
g++  dog.o dogSays.o -o dog.exe
dog.exe
  dog says:  bark

> make    # Run again - no need to recompile
dog.exe
  dog says:  bark

> touch strType.h ; make    # Recompile all dependent files
touch dogSays.h
touch dog.cpp
g++  -c dog.cpp
touch dogSays.cpp
g++  -c dogSays.cpp
g++  dog.o dogSays.o -o dog.exe
dog.exe
  dog says:  bark

> touch dogSound.h ; make    # Recompile all dependent files
touch dogSays.cpp
g++  -c dogSays.cpp
g++  dog.o dogSays.o -o dog.exe
dog.exe
  dog says:  bark

> touch dogSays.h ; make    # Recompile all dependent files
touch dog.cpp
g++  -c dog.cpp
touch dogSays.cpp
g++  -c dogSays.cpp
g++  dog.o dogSays.o -o dog.exe
dog.exe
  dog says:  bark

> touch dogSays.cpp ; make    # Recompile all dependent files
g++  -c dogSays.cpp
g++  dog.o dogSays.o -o dog.exe
dog.exe
  dog says:  bark

> touch dog.cpp ; make    # Recompile all dependent files
g++  -c dog.cpp
g++  dog.o dogSays.o -o dog.exe
dog.exe
  dog says:  bark

# We can even make a real change, such as switching from
# C-style strings to the C++ std::string class.
> echo "#define StrType string" > strType.h
> cat strType.h
#define StrType string
> make
touch dogSays.h
touch dog.cpp
g++  -c dog.cpp
touch dogSays.cpp
g++  -c dogSays.cpp
g++  dog.o dogSays.o -o dog.exe
dog.exe
  dog says:  bark

>

So, by using Depends.groovy to assemble the tree of source code dependencies into depSpecs.mk for inclusion into our makefile (line 55), the make utility is able to automatically determine which source files need to be recompiled at any time.  This will work correctly no matter how intricate the dependency tree becomes or how many levels of include file recursion may be present.

The astute reader may notice one wrinkle in this scheme.  As it stands now, the user must manually run Depends.groovy whenever a new #include dependency is added to a source file (or removed).  We'll work on adding that into our makefile in a future post.

Before we leave this post, we should mention one additional option.  If your source code is small and compiles quickly and you don't mind the occasional needless recompilation, you can get by with a much simpler version of the makefile that does not require Depends.groovy.  In this case, simply delete line 55 from the makefile ("include depSpecs.mk") and instead substitute in the lines:

$(wildcard *.cpp) : $(wildcard *.h)
	touch $@

The above makefile snippet specifies a type of "fail-safe" dependency specification.  The "wildcard" function of make will expand the specified file pattern just like a unix shell, so that we get a single line that consists of all *.cpp filenames, a colon, and a list of all *.h filenames.  This specifies to make that every *.cpp file depends on every *.h file so that, if any *.h file is changed, every *.cpp file will be sent to the touch command.  In turn, make will recompile every *.cpp file.  For small projects this simpler & more foolproof option may be the best choice.
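Since the expansion works like a shell glob, the resulting rule line can be previewed straight from the command line. This sketch recreates the five file names from the dog.exe example as empty files in a scratch directory; it only shows the set of names involved, and make may list them in a different order than the shell does.

```shell
#!/usr/bin/env bash

# Work in a scratch directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Empty stand-ins for the example's source files.
touch dog.cpp dogSays.cpp dogSays.h dogSound.h strType.h

# The shell expands both patterns, giving the effective rule header.
echo *.cpp : *.h
```

This prints "dog.cpp dogSays.cpp : dogSays.h dogSound.h strType.h", i.e. every *.cpp file is a target and every *.h file is a prerequisite of each of them.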

Please note that all of the examples in this post were performed with GNU Make (version 3.82).  Versions of make from vendors other than GNU are often older or of inferior capability.  If you do not yet have GNU Make installed on your system I highly recommend that you upgrade to GNU Make at your first opportunity.

----------------------------------------------------------------------------------------------------------
Notes:
2013-6-19 Updated to include makefile source code link

Friday, June 14, 2013

Comparison of Git and Perforce


Hello All,

I recently had to do a comparison of  Git and Perforce at work and thought I'd share the most important points I used in arguing that Git was the best SCM tool for us to adopt.  As usual, the names have been changed to protect the guilty.

Enjoy!
Alan Thompson