PetaPoco
A tiny ORM-ish thing for your POCOs
PetaPoco is a tiny, fast, single-file micro-ORM for .NET and Mono.
- Like Massive it's a single file that you easily add to any project
- Unlike Massive it works with strongly typed POCOs
- Like Massive, it now also supports dynamic Expandos
- Like ActiveRecord, it supports a close relationship between object and database table
- Like SubSonic, it supports generation of poco classes with T4 templates
- Like Dapper, it's fast because it uses dynamic method generation (MSIL) to assign column values to properties
Background
PetaPoco was originally inspired by Rob Conery's Massive project, but for use with non-dynamic POCO objects. It came about because I was finding many of my projects that used SubSonic/Linq were slow or becoming mixed bags of Linq and CodingHorror.
I needed a data access layer that was tiny, fast, easy to use and could run on .NET 3.5 and/or Mono 2.6 (ie: no support for dynamic expandos). Rob's claim of Massive being only 400 lines of code intrigued me and I wondered if something similar could be done without dynamics.
So, what's with the name? Well if Massive is massive, this is "Peta" massive (it's now over 1,500 lines after all) and since it works with "Poco"s ... "PetaPoco" seemed like a fun name!!
PetaPoco's line count has grown to more than I originally hoped - it's not the tiny 400 lines of Massive. But check out what it can do ... it packs a lot of punch for its size.
Features at a Glance
- Tiny, no dependencies... a single C# file you can easily add to any project.
- Works with strictly undecorated POCOs, or attributed almost-POCOs.
- Helper methods for Insert/Delete/Update/Save and IsNew
- Paged requests automatically work out total record count and fetch a specific page.
- Easy transaction support.
- Better parameter replacement support, including grabbing named parameters from object properties.
- Great performance by eliminating Linq and fast property assignment with DynamicMethod generation.
- Includes T4 templates to automatically generate POCO classes for you.
- The query language is SQL... no weird fluent or Linq syntaxes (yes, matter of opinion)
- Includes a low friction SQL builder class that makes writing inline SQL much easier.
- Hooks for logging exceptions, installing value converters and mapping columns to properties without attributes.
- Works with SQL Server, SQL Server CE, MySQL, PostgreSQL and Oracle.
- Works under .NET 3.5 or Mono 2.6 and later.
- Experimental support for `dynamic` under .NET 4.0 and Mono 2.8
- NUnit unit tests.
- Open source (Apache License)
- All of this in about 1,500 lines of code
Download
PetaPoco is available from:
Show Me the Code!
These examples start out more verbose than they need to be but become less so as more features are introduced... make sure you read to the bottom for the full experience. I've explicitly referenced the PetaPoco namespace to make it obvious what comes from where, but in reality you'd probably chuck in a `using PetaPoco;`.
Also, all of these examples have been hand-typed and never compiled. There are probably typos. If so, please let me know.
No Assembly
PetaPoco is supplied as a single file - PetaPoco.cs. With no dependencies other than what's in the GAC, just add this file to your project and you're set to go...
Running Queries
First define your POCOs:
```csharp
// Represents a record in the "articles" table
public class article
{
    public long article_id { get; set; }
    public string title { get; set; }
    public DateTime date_created { get; set; }
    public bool draft { get; set; }
    public string content { get; set; }
}
```

Next, create a `PetaPoco.Database` and run the query:

```csharp
// Create a PetaPoco database object
var db = new PetaPoco.Database("connectionStringName");

// Show all articles
foreach (var a in db.Query<article>("SELECT * FROM articles"))
{
    Console.WriteLine("{0} - {1}", a.article_id, a.title);
}
```

To query a scalar:
```csharp
long count = db.ExecuteScalar<long>("SELECT COUNT(*) FROM articles");
```

Or, to get a single record:

```csharp
var a = db.SingleOrDefault<article>("SELECT * FROM articles WHERE article_id=@0", 123);
```

Paged Fetches
PetaPoco can automatically perform paged requests.
```csharp
var result = db.Page<article>(1, 20,    // <-- page number and items per page
    "SELECT * FROM articles WHERE category=@0 ORDER BY date_posted DESC",
    "coolstuff");
```

In return you'll get a `Page<article>` object:

```csharp
public class Page<T> where T : new()
{
    public long CurrentPage { get; set; }
    public long ItemsPerPage { get; set; }
    public long TotalPages { get; set; }
    public long TotalItems { get; set; }
    public List<T> Items { get; set; }
}
```

Behind the scenes, PetaPoco does the following:
- Synthesizes and executes a query to retrieve the total number of matching records
- Modifies your original query to request just a subset of the entire record set
You now have everything to display a page of data and a pager control all wrapped up in one handy little object!
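To make that concrete, here's a hand-typed usage sketch (carrying over the query and category value from the example above) showing how the one result object drives both the page of data and the pager display:

```csharp
// Fetch page 1, 20 items per page
var result = db.Page<article>(1, 20,
    "SELECT * FROM articles WHERE category=@0 ORDER BY date_posted DESC",
    "coolstuff");

// The page of data...
foreach (var a in result.Items)
    Console.WriteLine(a.title);

// ...and everything a pager control needs
Console.WriteLine("Page {0} of {1} ({2} items total)",
    result.CurrentPage, result.TotalPages, result.TotalItems);
```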
Query vs Fetch
The Database class has two methods for retrieving records: `Query` and `Fetch`. These are pretty much identical except Fetch returns a List<> of POCOs whereas Query uses `yield return` to iterate over the results without loading the whole set into memory.

Non-query Commands
To execute non-query commands, use the Execute method:

```csharp
db.Execute("DELETE FROM articles WHERE draft<>0");
```

Inserts, Updates and Deletes
PetaPoco has helpers for insert, update and delete operations.
To insert a record, you need to specify the table and its primary key:
```csharp
// Create the article
var a = new article();
a.title = "My new article";
a.content = "PetaPoco was here";
a.date_created = DateTime.UtcNow;

// Insert it
db.Insert("articles", "article_id", a);

// by now a.article_id will have the id of the new article
```

Updates are similar:

```csharp
// Get a record
var a = db.SingleOrDefault<article>("SELECT * FROM articles WHERE article_id=@0", 123);

// Change it
a.content = "PetaPoco was here again";

// Save it
db.Update("articles", "article_id", a);
```

Or you can pass an anonymous type to update a subset of fields. In this case only the article's title field will be updated:

```csharp
db.Update("articles", "article_id", new { title = "New title" }, 123);
```

To delete:

```csharp
// Delete an article extracting the primary key from a record
db.Delete("articles", "article_id", a);

// Or if you already have the ID elsewhere
db.Delete("articles", "article_id", null, 123);
```

Decorating Your POCOs
In the above examples, it's a pain to have to specify the table name and primary key all over the place, so you can attach this info to your POCO:
```csharp
// Represents a record in the "articles" table
[PetaPoco.TableName("articles")]
[PetaPoco.PrimaryKey("article_id")]
public class article
{
    public long article_id { get; set; }
    public string title { get; set; }
    public DateTime date_created { get; set; }
    public bool draft { get; set; }
    public string content { get; set; }
}
```

Now inserts, updates and deletes get simplified to this:

```csharp
// Insert a record
var a = new article();
a.title = "My new article";
a.content = "PetaPoco was here";
a.date_created = DateTime.UtcNow;
db.Insert(a);

// Update it
a.content = "Blah blah";
db.Update(a);

// Delete it
db.Delete(a);
```

There are also other overloads for Update and Delete:

```csharp
// Delete an article
db.Delete<article>("WHERE article_id=@0", 123);

// Update an article
db.Update<article>("SET title=@0 WHERE article_id=@1", "New Title", 123);
```

You can also tell it to ignore certain fields:

```csharp
public class article
{
    [PetaPoco.Ignore]
    public long SomeCalculatedFieldPerhaps { get; set; }
}
```

Or, perhaps you'd like to be a little more explicit. Rather than automatically mapping all columns, you can use the ExplicitColumns attribute on the class and the Column attribute to indicate just those columns that should be mapped:

```csharp
// Represents a record in the "articles" table
[PetaPoco.TableName("articles")]
[PetaPoco.PrimaryKey("article_id")]
[PetaPoco.ExplicitColumns]
public class article
{
    [PetaPoco.Column] public long article_id { get; set; }
    [PetaPoco.Column] public string title { get; set; }
    [PetaPoco.Column] public DateTime date_created { get; set; }
    [PetaPoco.Column] public bool draft { get; set; }
    [PetaPoco.Column] public string content { get; set; }
}
```

This works great with partial classes: put all your table binding stuff in one .cs file, and calculated and other useful properties can be added in a separate file without thinking about the data layer.
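For example, the calculated half might look like this. (A sketch: it assumes the article class above is declared `partial` in both files, and `IsRecent` is a made-up property for illustration.)

```csharp
// In a separate .cs file, away from the table-binding declarations.
// Not mapped to a column because the other half uses [ExplicitColumns].
public partial class article
{
    public bool IsRecent
    {
        get { return date_created > DateTime.UtcNow.AddDays(-7); }
    }
}
```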
Hey! Aren't there already standard attributes for decorating a POCO's database info?
Well, I could have used them, but PetaPoco supports so few of them that I didn't want to cause confusion over what it could do.
Hey! Wait a minute... they're not POCO objects!
You're right, the attributes really do break the strict concept of POCO, but if you can live with that they really do make working with PetaPoco easy.
T4 Template
Writing all those POCO objects can soon get tedious and error prone... so PetaPoco includes a T4 template that can automatically write classes for all the tables in your SQL Server, SQL Server CE, MySQL, PostgreSQL or Oracle database.
Using the T4 template is pretty simple. The git repository includes three files (the NuGet package adds these to your project automatically in the folder `\Models\Generated`):
- PetaPoco.Core.ttinclude - includes all the helper routines for reading the DB schema
- PetaPoco.Generator.ttinclude - the actual template that defines what's generated
- Database.tt - the template itself; it contains various settings and includes the two other ttinclude files.
A typical Database.tt file looks like this:
```
<#@ include file="PetaPoco.Core.ttinclude" #>
<#
    // Settings
    ConnectionStringName = "jab";
    Namespace = ConnectionStringName;
    DatabaseName = ConnectionStringName;
    string RepoName = DatabaseName + "DB";
    bool GenerateOperations = true;

    // Load tables
    var tables = LoadTables();
#>
<#@ include file="PetaPoco.Generator.ttinclude" #>
```

To use the template:
- Add the three files to your C# project
- Make sure you have a connection string and provider name set in your app.config or web.config file
- Edit the ConnectionStringName property in Database.tt (ie: change it from "jab" to the name of your connection string)
- Save Database.tt.
All going well, Database.cs should be generated with POCO objects representing all the tables in your database. To get the project to build you'll also need to add PetaPoco.cs to your project and ensure it is set to compile (NuGet does this for you).
The template is based on the SubSonic template. If you're familiar with its ActiveRecord templates you'll find PetaPoco's template very similar.
Automatic Select clauses
When using PetaPoco, most queries start with "SELECT * FROM table". Given that we can now grab the table name from the POCO object using the TableName attribute, there's no reason we can't automatically generate this part of the select statement.
If you run a query that doesn't start with "SELECT", PetaPoco will automatically add it. So this:
```csharp
// Get a record
var a = db.SingleOrDefault<article>("SELECT * FROM articles WHERE article_id=@0", 123);
```

can be shortened to this:

```csharp
// Get a record
var a = db.SingleOrDefault<article>("WHERE article_id=@0", 123);
```

PetaPoco doesn't actually generate "SELECT *"... rather it picks the column names of the POCO and just queries for those columns.
IsNew and Save Methods
Sometimes you have a POCO and you want to know if it's already in the database. Since we have the primary key all we need to do is check if that property has been set to something other than the default value.
So to test if a record is new:
```csharp
// Is this a new record?
if (db.IsNew(a))
{
    // Yes it is...
}
```

And related, there's a Save method that will work out whether to Insert or Update:

```csharp
// Save a new or existing record
db.Save(a);
```

Transactions
Transactions are pretty simple:
```csharp
using (var scope = db.Transaction)
{
    // Do transacted updates here

    // Commit
    scope.Complete();
}
```

Transactions can be nested, so you can call out to other methods with their own nested transaction scopes and the whole lot will be wrapped up in a single transaction. So long as all nested transaction scopes are completed, the entire root-level transaction is committed; otherwise everything is rolled back.
Note: for transactions to work, all operations need to use the same instance of the PetaPoco database object. So you'll probably want to use a per-HTTP-request or per-thread IOC container to serve up a shared instance of this object. Personally, StructureMap is my favourite for this.
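Here's a hand-typed sketch of the nesting behaviour (the `audit_log` table and both method names are made up for illustration; the attributed article POCO from earlier is assumed):

```csharp
void SaveArticleWithAudit(PetaPoco.Database db, article a)
{
    using (var scope = db.Transaction)     // root scope
    {
        db.Save(a);
        LogChange(db, a);                  // opens its own nested scope below
        scope.Complete();                  // root completes -> everything commits
    }
}

void LogChange(PetaPoco.Database db, article a)
{
    using (var scope = db.Transaction)     // nested: joins the outer transaction
    {
        db.Execute("INSERT INTO audit_log (article_id) VALUES (@0)", a.article_id);
        scope.Complete();                  // if this were skipped, the whole
                                           // root transaction would roll back
    }
}
```

Note both methods take the same Database instance, per the shared-instance requirement above.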
But where's the LINQ stuff?
There isn't any. I've used Linq with Subsonic for a long time now and more and more I find myself descending into CodingHorror for things that:
- can't be done in Linq easily
- work in .NET but not under Mono (especially Mono 2.6)
- don't perform efficiently. Eg: Subsonic's activerecord.SingleOrDefault(x => x.id == 123) seems to be about 20x slower than CodingHorror. (See here)
Now that I've got CodingHorror all over the place it bugs me that half the code is Linq and half is SQL.
Also, I've realized that for me the most annoying thing about SQL directly in the code is not the fact that it's SQL but that it's nasty to format nicely and to build up those SQL strings.
So...
PetaPoco's SQL Builder
There have been plenty of attempts at building fluent APIs for building SQL. This is my version and it's really basic.
The point of this is to make formatting the SQL strings easy and to use proper parameter replacements to protect from SQL injection. This is not an attempt to ensure the SQL is syntactically correct, nor is it trying to hold anyone's hand with intellisense.
Here's its most basic form:
```csharp
var id = 123;
var a = db.Query<article>(PetaPoco.Sql.Builder
    .Append("SELECT * FROM articles")
    .Append("WHERE article_id=@0", id));
```

Big deal, right? Well, what's cool about this is that the parameter indices are specific to each `.Append` call:

```csharp
var id = 123;
var a = db.Query<article>(PetaPoco.Sql.Builder
    .Append("SELECT * FROM articles")
    .Append("WHERE article_id=@0", id)
    .Append("AND date_created<@0", DateTime.UtcNow));
```

You can also conditionally build SQL:

```csharp
var id = 123;
var sql = PetaPoco.Sql.Builder
    .Append("SELECT * FROM articles")
    .Append("WHERE article_id=@0", id);

if (start_date.HasValue)
    sql.Append("AND date_created>=@0", start_date.Value);

if (end_date.HasValue)
    sql.Append("AND date_created<=@0", end_date.Value);

var a = db.Query<article>(sql);
```

Notice that each Append call uses parameter @0? PetaPoco builds the full list of arguments and updates the parameter indices internally for you.
You can also use named parameters, and PetaPoco will look for an appropriately named property on any of the passed arguments:

```csharp
sql.Append("AND date_created>=@start AND date_created<=@end",
    new { start = DateTime.UtcNow.AddDays(-2), end = DateTime.UtcNow });
```

With both numbered and named parameters, if any of the parameters can't be resolved an exception is thrown.
There are also methods for building common SQL stuff:
```csharp
var sql = PetaPoco.Sql.Builder
    .Select("*")
    .From("articles")
    .Where("date_created < @0", DateTime.UtcNow)
    .OrderBy("date_created DESC");
```

SQL Command Tracking
Sometimes it's useful to be able to see what SQL was just executed. PetaPoco exposes these three properties:
- LastSQL - pretty obvious
- LastArgs - an object[] array of all arguments passed
- LastCommand - a string that shows the SQL and the arguments
Watching the LastCommand property in the debugger makes it easy to see what just happened!
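For instance, a quick hand-typed sketch of inspecting these properties right after a query (output format is illustrative, not exact):

```csharp
var a = db.SingleOrDefault<article>("WHERE article_id=@0", 123);

Console.WriteLine(db.LastSQL);          // the SQL text that was executed
Console.WriteLine(db.LastArgs.Length);  // how many arguments were bound
Console.WriteLine(db.LastCommand);      // SQL plus arguments, handy for pasting into a log
```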
OnException Handler Routine
PetaPoco wraps all SQL command invocations in try/catch statements. Any exceptions are passed to the virtual OnException method. By logging these exceptions (or setting a breakpoint on this method) you can easily track where and when there are problems with your SQL.
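One way to hook this (a sketch, assuming OnException is declared as a virtual method taking the Exception; the logging destination is up to you):

```csharp
// Derive from PetaPoco.Database and override OnException to log failures
public class LoggingDatabase : PetaPoco.Database
{
    public LoggingDatabase(string connectionStringName)
        : base(connectionStringName)
    {
    }

    public override void OnException(Exception x)
    {
        // LastCommand shows exactly what SQL and arguments caused the failure
        Console.Error.WriteLine("DB error: {0}\n{1}", x.Message, LastCommand);
        base.OnException(x);   // preserve the default behaviour
    }
}
```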
More
This covers most of the basics of working with PetaPoco, but for more information please read these blog posts about PetaPoco.
Wednesday, December 28, 2011
Topten Software - petapoco
Friday, December 23, 2011
Thursday, December 15, 2011
John Sheehan : Blog: I love travel sites that auto-populate the search...
I think we do this. Otherwise it will get added to our backlog if I can tomorrow!
The Best Email Client for Mac OS X
an older post and one I wished I had seen before paying for PostBox even though I love it.
Sparrow — Get mail done
I'm using and enjoying PostBox but this looks quite good as well. no time to check it out yet but posting it here for when I do have some spare time.
Plaform Updates: Operation Developer Love - Facebook developers
Last week we introduced the Subscribe Button for Websites, announced the removal of App Profile Pages on February 1st, 2012, provided a recap on our recent Social Games Hack and published a tutorial on debugging Open Graph apps.
Finally, we updated the JavaScript SDK to only support OAuth 2.0. This new JavaScript SDK was first announced in July with the requirement that all apps to migrate by October 1, 2011. If your app was affected, read more about the specific changes that you need to make here. A friendly reminder that it is important to follow the Facebook Developer Blog or the Roadmap for breaking change updates. This can be done by email or RSS.
Platform Changes
This week we completed the discontinuation of the Dashboard APIs with the exception of
dashboard.incrementCount
,dashboard.decrementCount
,dashboard.setCount
anddashboard.getCount
methods. All apps should upgrade to the new Requests 2.0.Upcoming changes on January 1, 2012
- Deprecating the FB.Data.* JS SDK APIs This will be no longer supported and will not be available to new apps.
- Deprecating FB.Canvas.setAutoResize We have renamed FB.Canvas.setAutoResize to FB.Canvas.setAutoGrow so that the method more accurately represents its function. Fb.Canvas.setAutoResize will be removed.
- Deprecating FBML FBML will no longer be supported as of January 1, 2012. Aside from security and privacy related bugs, we will not fix any bugs related to FBML after January 1, 2012. On June 1, 2012 FBML endpoints will be removed from Platform
- All apps will be opted into "Upgrade to Requests 2.0" and "Requests 2.0 Efficient" Existing apps will be opted into “Requests 2.0 Efficient” and "Upgrade to Requests 2.0" migrations and all developers must ensure that they are using the correct request_id format and deleting requests appropriately. Details here
- Enforcing Credits Policy We have added a new policy to the Facebook Credits Terms that prohibits routing Credits from one app to another app without our prior authorization.
2.14 You may not accept Credits in one app and deliver or transfer the purchase to the user in another app without our prior authorization. For example, an app solely designed to facilitate transactions is not permitted.Apps that are not compliant by January 1, 2012 run the risk of having their Credits disabled shortly after.
Please refer to the Platform Roadmap for more info on these and other upcoming changes.
Bug Activity from 12/6 to 12/13
- 210 bugs were reported
- 89 bugs were reproducible and accepted (after duplicates removed)
- 61 bugs were by design
- 27 bugs were fixed
- 71 bugs were duplicate, invalid, or need more information
Bugs fixed from 12/6 to 12/13
- Link to the "How do I get reimbursed" page is wrong
- Duplicate attribute - num_posts - listed on Comment Plugin Documentation Page
- Dead link on http://developers.facebook.com/docs/guides/canvas/#insights
- BREAKING CHANGE: Format of Facebook API Response
- Documentation typo: "JavaScript" misspelled in link "JavaScrtip SDK" at https://developers.facebook.com/docs/reference/javascript/FB.init/
- "format error in Old REST API Documentation page
- New Auth does not send "code" to redirect page if users skips extended permissions
- Requests 1.0 no longer showing request message
- Cannot delete posts to app and fan pages
- Like Box feed photo squished
- connect() timed out!
- server whitelist does not accept ipv6 adddresses
- Facebook comments disappear past May 2011 v2
- Graph API stats for an adgroup - missing unique impressions
- Like Box Error.
- Facebook Send button missing icon
- Live stream box configuration error
- Allow page
- iphone and ipad apps lose URL context on redirect for canvas apps
- Facebook Ads API - Ad group Suggested Bid
- Dark scheme hides links on social plugin
- Broken Link - Mobile SSO Documentation
- fb:comment returning httpS content when called over http
- Notifications auto-scroll not working for game requests
- Live Stream Plugin not displaying posts
- Like button not counting likes for some app pages
- Bad request error
Activity on facebook.stackoverflow.com this week:
- 140 questions asked
- 16 answered, 11% answered rate
- 50 replied, 36% reply rate
Some facebook developer love.
Most Popular Mac Downloads and Posts of 2011
must be careful of disk space on my SSD but here are some useful links of what lifehacker considers the best mac downloads and posts of 2011.
Friday, December 09, 2011
KsigDo. A Knockout, SignalR, ASP.net, MVVM, MVC and EF all mixed together into a multiuser real time sync To Do" example | Coding4Fun Blog | Channel 9
Magnets and mirrors the only missing technology is this wonderful example.Things have moved on somewhat since I added a hit counter to a website back in the 1990's.
CSSCop - Visual Studio Gallery
CSSCop extension made by Mads, the same guy now at Microsoft making cool plugins that will probably end up in Visual Studio.Next
Hello MonoDroid
Quite a few versions to target/support. Hmm. How hard can it be. Seeing as Java is closer to C Sharp than Objective-C perhaps Android development should be native. Will see how I get on with MonoDevelop first.
Hello iPad and hello iPhone
MonoDevelop updated itself and launched this help screen. Next up should be MonoDroid.
Tuesday, November 29, 2011
Wednesday, November 23, 2011
Friday, November 18, 2011
Postbox — Awesome Email
Giving this a trial as a replacement for Mail.App. Seeing if it handles multiple gmail app accounts properly.
Thursday, November 03, 2011
MacHg
Going to give this a go for my mercurial source control work on my new shiny macbook pro
Monday, October 24, 2011
User Interfaces - HTML Edit Control for .NET
A user control for windows form for editing html content. I wanted this ages ago for a project I was working on. Not sure I'll ever need this but I am posting here just in case!
Thursday, October 20, 2011
Your ctor says that your code is headache inducing: Explanation - Ayende @ Rahien
The Single Responsibility Principle as highlighted by Ayende the guy behind RavenDb and NHibernate
Introducing the Microsoft “Roslyn” CTP
Microsoft releasing a CTP of a compiler as a service type project with public api around the .net compiler.
Friday, October 14, 2011
Mercurial-Cheat-Sheet.png (1024×768)
So as our team will be moving to Mercurial as our new shiny source control system next week and not everyone likes using the tortoise hg windows shell extension or source control plug-ins to visual studio.net.The other option is the command line which I am starting to try and learn because it's:
a) quicker in most cases
b) more powerful)
c) cooler
I've dug out a nicely laid out basic cheat sheet for the team to use/ignore
Monday, October 10, 2011
amazon.com really want me don't they
The second person from Amazon on linked in trying to get in touch
Thursday, October 06, 2011
I have finally bitten the bullet and have bought a mac!
Bring on the iPhone and iPad development with monotouch now. 10+ years coding for windows and IIS and now I've gone to the other side for a bit.
Sunday, September 18, 2011
Saturday, September 17, 2011
Windows 8 – First Impression
Visual Studio v.next looks much less of a memory hog looking at the task manager on this first impressions blog post
Thursday, September 15, 2011
if this then that
Like Yahoo Pipes but can be used but those that are not rocket surgeons. Must find time to have a play with this.
ASP.NET MVC 4: The Official Microsoft ASP.NET Site
And MVC continues forward. Just bought a book on MVC3. Sigh.
Like the built in jQuery mobile and adaptive rendering. Its getting better and better.
Organize anything, together. | Trello
From fog creek software, reminds me a little bit of the kanban board that I used from the folks at agilezen. Might give it a trial at some point.
In fact I'll sign up now.
Windows 8 Developer Preview Install Screenshots « Sean’s Stuff
The new metro interface. html5 and css as an alternative to c/c++ or .NET to build apps. WPF and Silverlight?
Wednesday, September 07, 2011
Sunday, September 04, 2011
Book review: Learning Monotouch by Michael Bluestein | Peter van Ooijen
Must get a copy of this chap. Then buy a mac mini or something so I can actually start writing something. The apple tax.
Intro to NoSQL
A good into to what this NoSQL stuff is all about. RavenDB my favourite at the moment just because it's so .net friendly and very very fast.
Red Gate | SQL Compare Review
I bought the entire toolkit at version 5 when I ran my own software development company and did contracting.
It was awesome back then and it looks like it's got even better. I don't use SQL server enough any more these days but if that ever changed the virtual cheque book would be out again.
Thursday, September 01, 2011
Download Windows 7 ISOs to Reinstall Without Restoring Your System
As long as you have a valid licence key. But if your machine didnt come with the original media and you like to pave your machine every now and again then read this.
Exploding Software-Engineering Myths - Microsoft Research
At Microsoft Research, there are computer scientists and mathematicians who live in a world of theory and abstractions. Then there is Nachi Nagappan, who was on loan to the Windows development group for a year while building a triage system for software bugs. For Nagappan, a senior researcher at Microsoft Research Redmond with the Empirical Software Engineering and Measurement Research Group (ESM), the ability to observe software-development processes firsthand is critical to his work.
The ESM group studies large-scale software development and takes an empirical approach. When Nagappan gets involved in hands-on projects with Microsoft development teams, it’s all part of ongoing research in his quest to validate conventional software-engineering wisdom.
Nachi Nagappan“A big part of my learning curve when I joined Microsoft in 2005,” Nagappan says, “was getting familiar with how Microsoft does development. I found then that many of the beliefs I had in university about software engineering were actually not that true in real life.”
That discovery led Nagappan to examine more closely the observations made by Frederick Brooks in The Mythical Man Month, his influential book about software engineering and project management.
“To some degree, The Mythical Man Month formed the foundation of a lot of the work we did,” Nagappan says. “But we also studied other existing assumptions in software engineering. They can be good or bad, because people make decisions based on these assumptions. Our primary goal was to substantiate some of these beliefs in a Microsoft context, so managers can make decisions based on data rather than gut feel or subjective experience.”
More Isn’t Always Better
One assumption Nagappan examined was the relationship between code coverage and software quality. Code coverage measures how comprehensively a piece of code has been tested; if a program contains 100 lines of code and the quality-assurance process tests 95 lines, the effective code coverage is 95 percent. The ESM group measures quality in terms of post-release fixes, because those bugs are what reach the customer and are expensive to fix. The logical assumption would be that more code coverage results in higher-quality code. But what Nagappan and his colleagues saw was that, contrary to what is taught in academia, higher code coverage was not the best measure of post-release failures in the field.
“Furthermore,” Nagappan says, “when we shared the findings with developers, no one seemed surprised. They said that the development community has known this for years.”
The reason is that software quality depends on so many other factors and dynamics that no one metric can predict quality—and not all metrics apply to all projects. Nagappan points out two of the most obvious reasons why code coverage alone fails to predict error rates: usage and complexity.
Code coverage is not indicative of usage. If 99 percent of the code has been tested, but the 1 percent that did not get tested is what customers use the most, then there is a clear mismatch between usage and testing. There is also the issue of code complexity; the more complex the code, the harder it is to test. So it becomes a matter of leveraging complexity versus testing. After looking closely at code coverage, Nagappan now can state that, given constraints, it is more beneficial to achieve higher code coverage of more complex code than to test less complex code at an equivalent level. Those are the kinds of tradeoffs that development managers need to keep in mind.
Write Test Code First
Nagappan and his colleagues then examined development factors that impact quality, another area of software engineering discussed in The Mythical Man Month. One of the recent trends that caught their interest was development practices; specifically, test-driven development (TDD) versus normal development. In TDD, programmers first write the test code, then the actual source code, which should pass the test. This is the opposite of conventional development, in which writing the source code comes first, followed by writing unit tests. Although TDD adherents claim the practice produces better design and higher quality code, no one had carried out an empirical substantiation of this at Microsoft.
“The nice thing about working at Microsoft,” Nagappan says, “is that the development organization is large enough that we could select teams that allowed for an apples-to-apples comparison. We picked three development projects under the same senior manager and looked at teams that used TDD and those that didn’t. We collected data from teams working on Visual Studio, Windows, and MSN and also got data from a team at IBM, since the project was a joint study.”
The study and its results were published in a paper entitled Realizing quality improvement through test driven development: results and experiences of four industrial teams, by Nagappan and research colleagues E. Michael Maximilien of the IBM Almaden Research Center; Thirumalesh Bhat, principal software-development lead at Microsoft; and Laurie Williams of North Carolina State University. What the research team found was that the TDD teams produced code that was 60 to 90 percent better in terms of defect density than non-TDD teams. They also discovered that TDD teams took longer to complete their projects—15 to 35 percent longer.
“Over a development cycle of 12 months, 35 percent is another four months, which is huge,” Nagappan says. “However, the tradeoff is that you reduce post-release maintenance costs significantly, since code quality is so much better. Again, these are decisions that managers have to make—where should they take the hit? But now, they actually have quantified data for making those decisions.”
Proving the Utility of Assertions
Another development practice that came under scrutiny was the use of assertions. In software development, assertions are contracts or ingredients in code, often written as annotations in the source-code text, describing what the system should do rather than how to do it.
“One of our very senior researchers is Turing Award winner Tony Hoare of Microsoft Research Cambridge in the U.K.,” Nagappan says. “Tony has always promoted the utility of assertions in software. But nobody had done the work to quantify just how much assertions improved software quality.”
One reason why assertions have been difficult to investigate is a lack of access to large commercial programs and bug databases. Also, many large commercial applications contain significant amounts of legacy code in which there is minimal use of assertions. All of this contributes to lack of conclusive analysis.
At Microsoft however, there is systematic use of assertions in some Microsoft components, as well as synchronization between the bug-tracking system and source-code versions; this made it relatively easy to link faults against lines of code and source-code files. The research team managed to find assertion-dense code in which assertions had been used in a uniform manner; they collected the assertion data and correlated assertion density to code defects. The results are presented in the technical paper Assessing the Relationship between Software Assertions and Code Quality: An Empirical Investigation, by Gunnar Kudrjavets, a senior development lead at Microsoft, along with Nagappan and Tom Ball.
The team observed a definite negative correlation: more assertions and code verifications means fewer bugs. Looking behind the straight statistical evidence, they also found a contextual variable: experience. Software engineers who were able to make productive use of assertions in their code base tended to be well-trained and experienced, a factor that contributed to the end results. These factors built an empirical body of knowledge that proved the utility of assertions.
The work also brings up another issue: What kind of action should development managers take based on these findings? The research team believes that enforcing the use of assertions would not work well; rather, there needs to be a culture of using assertions in order to produce the desired results. Nagappan and his colleagues feel there is an urgent need to promote the use of assertions and plan to collaborate with academics to teach this practice in the classroom. Having the data makes this easier.
Has there been any feedback from Hoare?
“Absolutely,” Nagappan says. “He followed up and read our work on assertions and was very happy that someone was proving the relationship between assertions and software quality.”
Organizational Structure Does Matter—a Lot.
Nagappan recognized that although metrics such as code churn, code complexity, code dependencies, and other code-related factors have an impact on software quality, his team had yet to investigate the people factor. The Mythical Man-Month is most famous for describing how communication overhead increases with the number of programmers on a project, but it also cites Conway’s Law, paraphrased as, “If there are N product groups, the result will be a software system that to a large degree contains N versions or N components.” In other words, the system will resemble the organization building the system.
The first challenge was to somehow describe the relationships between members of a development group. The team settled on using organizational structure, taking the entire tree structure of the Windows group as an example. They took into account reporting structure but also degrees of separation between engineers working on the same project, the level to which ownership of a code base rolled up, the number of groups contributing to the project, and other metrics developed for this study.
The Influence of Organizational Structure on Software Quality: An Empirical Case Study, by Nagappan, Brendan Murphy of Microsoft Research Cambridge, and Victor R. Basili of the University of Maryland, presents startling results: Organizational metrics, which are not related to the code, can predict software failure-proneness with a precision and recall of 85 percent. This is a significantly higher precision than traditional metrics such as churn, complexity, or coverage that have been used until now to predict failure-proneness. This was probably the most surprising outcome of all the studies.
“That took us by surprise,” Nagappan says. “We didn't expect it to be that accurate. We thought it would be comparable to other code-based metrics, but these factors were at least 8 percent better in terms of precision and recall than the closest factor we got from the code measures. Eight percent, on a code base the size of Windows, is a huge difference.”
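Precision and recall, the two figures quoted above, are standard measures for a predictor like this one: precision asks how many of the components flagged as failure-prone truly were, and recall asks how many of the truly failure-prone components were flagged. A sketch with invented counts (not the study's data), chosen so both measures come out to 0.85:

```python
def precision_recall(tp, fp, fn):
    # tp: failure-prone components correctly flagged (true positives)
    # fp: healthy components wrongly flagged (false positives)
    # fn: failure-prone components the predictor missed (false negatives)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 85 correct flags, 15 false alarms, 15 misses.
p, r = precision_recall(tp=85, fp=15, fn=15)
print(p, r)  # 0.85 0.85
```

An "8 percent better" precision/recall, as Nagappan notes, translates into thousands of components correctly triaged on a code base the size of Windows.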
Geographical Distance Doesn’t Matter—Much.
One of the most cherished beliefs in software project management is that a distributed-development model has a negative impact on software quality because of problems with communication, coordination, culture, and other factors. Again, this meant looking at organizational metrics.
“The fact is,” Nagappan says, “that no one has really studied a large project. Most studies were either based on assumptions or on an outsourced model where a piece of the project is handled outside. But at Microsoft, we don’t outsource product development. Our global development teams are Microsoft employees: the same management structure, access to the same resources.”
But first of all, how do you define “distributed”? The research team took the corporate address book and came up with six degrees of distribution:
- In the same building.
- In different buildings but sharing a cafeteria.
- On the same campus, within walking distance.
- In the same region, within easy driving distance.
- In the same time zone.
- More than three time zones away.
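The six buckets above form an ordinal scale from most to least collocated. A hypothetical sketch (field names, people, and locations are invented for illustration, not from the study) of how a pair of engineers might be classified:

```python
def distribution_bucket(a, b):
    """Map two engineers (dicts of invented location fields) to the
    finest-grained bucket they share, 0 = same building ... 5 = far apart."""
    if a["building"] == b["building"]:
        return 0  # in the same building
    if a["cafeteria"] == b["cafeteria"]:
        return 1  # different buildings, shared cafeteria
    if a["campus"] == b["campus"]:
        return 2  # same campus, walking distance
    if a["region"] == b["region"]:
        return 3  # same region, easy driving distance
    if a["utc_offset"] == b["utc_offset"]:
        return 4  # same time zone
    return 5  # more than three time zones away

alice = {"building": "B40", "cafeteria": "C4", "campus": "Redmond",
         "region": "PNW", "utc_offset": -8}
bob = {"building": "B99", "cafeteria": "C9", "campus": "Hyderabad",
       "region": "IN", "utc_offset": 5}
print(distribution_bucket(alice, bob))  # 5
```

With every developer pair assigned a bucket, components can then be compared by the distribution profile of the team that built them, which is the comparison the study performed.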
Next, they classified all developers in the Windows team into these buckets. Then they looked for statistical evidence that components developed by distributed teams resulted in software with more errors than components developed by collocated teams.
Does distributed development affect software quality? An empirical case study of Windows Vista—by Christian Bird, University of California, Davis; Nagappan; Premkumar Devanbu, University of California, Davis; Harald Gall, University of Zurich, and Murphy—found that the differences were statistically negligible. In order to verify results, the team also conducted an anonymous survey with researchers Sriram Rajamani and Ganesan Ramalingam in Microsoft Research India, asking engineers who they would talk to if they ran into problems. Most people preferred to talk to someone from their own organization 4,000 miles away rather than someone only five doors down the hall but from a different organization. Organizational cohesiveness played a bigger role than geographical distance.
Putting Research to Work in the Real World
Many of the findings from these papers have been put to use by Microsoft product teams. Some of the tools that shipped with Visual Studio 2005 and Visual Studio 2008 incorporate work from the ESM group, and Microsoft’s risk-analysis and bug-triage system for Windows Vista SP2 made use of the team’s technology for risk estimation and analysis.
Nagappan believes there is value in further exploring organizational metrics but adds that more real-world context needs to be applied, because his studies have been confined to Microsoft development groups. He would like to extend his work to other commercial environments, distributed development teams, and open-source environments.
But there is one point that gives this software-engineering myth buster a great deal of satisfaction.
“I feel that we’ve closed the loop,” Nagappan says. “It started with Conway’s Law, which Brooks cited in The Mythical Man-Month; now, we can show that, yes, the design of the organization building the software system is as crucial as the system itself.”
An interesting post about software engineering myths and a few stats thrown in as well.