Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Feed aggregator


Music on the page is just a potential. Like software, it changes when it is implemented.

The twelve principles that underpin the Agile Manifesto include several that link the concept of value to the delivery of working software. The focus on working software stems from one of the four values, “Working software over comprehensive documentation,” which is a reaction to projects and programs that seem to value reports and PowerPoint presentations more than putting software in the hands of users. For a typical IT organization that develops, enhances and maintains the software that the broader organization uses to do their ultimate business, value is only delivered when software can be used in production. Implementing software provides value through the following four mechanisms:

  1. Validation – To get to the point where software is written, tested and implemented, a number of decisions must be made. The process of implementing functional software and getting real people to use it provides a tool to validate not only the ideas that the software represents, but also the assumptions that were made to prioritize the need and build the software. Implementing and using software provides the information needed to validate the ideas and decisions made during the process.
  2. Real life feedback – The best feedback is generated when users actually have to use the software to do their job in their day-to-day environment. Reviews and demonstrations are a great tool to generate initial feedback; however, those are artificial environments that lack the complexity of most office environments.
  3. Proof of performance – One of the most salient principles of the Agile Manifesto is that working software is the primary measure of progress. The delivery of valuable working software communicates to the wider organizational community that they are getting something of value for their investment.
  4. Revenue – When the software being delivered, enhanced or maintained is customer facing, it can’t generate revenue until it is in use, whether the implementation is a new software-supported product or an improvement in the user experience of an existing product.

In most scenarios, software that is both in production and being used creates value for the organization. Software that is either being worked on or sitting in a library waiting to be implemented into production might have potential value, but that potential has little real value unless it can be converted. In batteries, the longer we wait to convert potential energy into kinetic energy, the less energy exists, because the capacity of the battery decays over time. In any reasonably dynamic environment, information, like the capacity of a battery, decays over time. Software requirements and the ideas encompassed by the physical software also decay over time as the world we live and work in changes. Bottom line: If the software is not in production, we can’t get value from using it, nor can we get feedback that tells us whether the work environment it will someday run in is changing; therefore, all we have is a big ball of uncertainty. And, as we know, uncertainty reduces value.

Categories: Process Management

Connect With the World Around You Through Nearby APIs

Android Developers Blog - Tue, 08/04/2015 - 23:51

Originally posted on the Google Developers blog.

Posted by Akshay Kannan, Product Manager

Mobile phones have made it easy to communicate with anyone, whether they’re right next to you or on the other side of the world. The great irony, however, is that those interactions can often feel really awkward when you're sitting right next to someone.

Today, it takes several steps -- whether it’s exchanging contact information, scanning a QR code, or pairing via bluetooth -- to get a simple piece of information to someone right next to you. Ideally, you should be able to just turn to them and do so, the same way you do in the real world.

This is why we built Nearby. Nearby provides a proximity API, Nearby Messages, for iOS and Android devices to discover and communicate with each other, as well as with beacons.

Nearby uses a combination of Bluetooth, Wi-Fi, and inaudible sound (using the device’s speaker and microphone) to establish proximity. We’ve incorporated Nearby technology into several products, including Chromecast Guest Mode, Nearby Players in Google Play Games, and Google Tone.

With the latest release of Google Play services 7.8, the Nearby Messages API becomes available to all developers across iOS and Android devices (Gingerbread and higher). Nearby doesn’t use or require a Google Account. The first time an app calls Nearby, users get a permission dialog to grant that app access.

A few of our partners have built creative experiences to show what's possible with Nearby.

Edjing Pro uses Nearby to let DJs publish their tracklist to people around them. The audience can vote on tracks that they like, and their votes are updated in realtime.

Trello uses Nearby to simplify sharing. Share a Trello board with the people around you at the tap of a button.

Pocket Casts uses Nearby to let you find and compare podcasts with people around you. Open the Nearby tab in Pocket Casts to view a list of podcasts that people around you have, as well as podcasts that you have in common with others.

Trulia uses Nearby to simplify the house hunting process. Create a board and use Nearby to make it easy for the people around you to join it.

To learn more, visit

Categories: Programming

M Developer Preview Gets Its First Update

Android Developers Blog - Tue, 08/04/2015 - 23:48

By Jamal Eason, Product Manager, Android

Earlier this summer at Google I/O, we launched the M Developer Preview. The developer preview is an early access opportunity to test and optimize your apps for the next release of Android. Today we are releasing an update to the M Developer Preview that includes fixes and updates based on your feedback.

What’s New

The Developer Preview 2 update includes the up-to-date M release platform code and near-final APIs for you to validate your app. To provide more testing support, we have refined the Nexus system images and emulator system images with the Android platform updates. In addition to platform updates, the system images also include Google Play services 7.6.

How to Get the Update

If you are already running the M developer preview launched at Google I/O (Build #MPZ44Q) on a supported Nexus device (e.g. Nexus 5, Nexus 6, Nexus 9, or Nexus Player), the update can be delivered to your device via an over-the-air update. We expect all devices currently on the developer preview to receive the update over the next few days. We also posted a new version of the preview system image on the developer preview website. (To view the preview website in a language other than English, select the appropriate language from the language selector at the bottom of the page).

For those developers using the emulator, you can update your M preview system images via the SDK Manager in Android Studio.

What are the Major Changes?

We have addressed many issues brought up during the first phase of the developer preview. Check out the release notes for a detailed list of changes in this update. Some of the highlights to the update include:

  • Android Platform Changes:
    • Modifications to platform permissions including external storage, Wi-Fi & Bluetooth location, and changes to contacts/identity permissions. Device connections through the USB port are now set to charge-only mode by default. To access the device, users must explicitly grant permission.
  • API Changes:
    • Updated Bluetooth Stylus APIs with updated callback events. Use View.OnContextClickListener and GestureDetector.OnContextClickListener to listen for stylus button presses and to perform secondary actions.
    • Updated Media API with a new InputDevice.hasMicrophone() method for determining if a device microphone exists.
  • Fixes for developer-reported issues:
    • TextInputLayout doesn't set hint for embedded EditText. (fixed issue)
    • Camera Permission issue with Legacy Apps (fixed issue)

Next Steps

With the final M release still on schedule for this fall, the platform features and API are near final. However, there is still time to report critical issues as you continue to test and validate your apps on the M Developer Preview. You can also visit our M Developer Preview community to share ideas and information.

Thanks again for your support. We look forward to seeing your apps that are ready to go for the M release this fall.

Categories: Programming

An update on Eclipse Android Developer Tools

Android Developers Blog - Tue, 08/04/2015 - 23:44

Posted by Jamal Eason, Product Manager, Android

Over the past few years, our team has focused on improving the development experience for building Android apps with Android Studio. Since the launch of Android Studio, we have been impressed with the excitement and positive feedback. As the official Android IDE, Android Studio gives you access to a powerful and comprehensive suite of tools to evolve your app across Android platforms, whether it's on the phone, wrist, car or TV.

To that end and to focus all of our efforts on making Android Studio better and faster, we are ending development and official support for the Android Developer Tools (ADT) in Eclipse at the end of the year. This specifically includes the Eclipse ADT plugin and Android Ant build system.

Time to Migrate

If you have not had the chance to migrate your projects to Android Studio, now is the time. To get started, download Android Studio. For many developers, migration is as simple as importing your existing Eclipse ADT projects in Android Studio with File → New → Import Project.

For more details on the migration process, check out the migration guide. Also, to learn more about Android Studio and the underlying build system, check out this overview page.

Next Steps

Over the next few months, we are migrating the rest of the standalone performance tools (e.g. DDMS, Trace Viewer) and building additional support for the Android NDK into Android Studio.

We are focused on Android Studio so that our team can deliver a great experience on a unified development environment. Android tools inside Eclipse will continue to live on in the open source community via the Eclipse Foundation. Check out the latest Eclipse Andmore project if you are interested in contributing or learning more.

For those of you that are new to Android Studio, we are excited for you to integrate Android Studio into your development workflow. If you want to contribute to Android Studio, you can also check out the project source code. To follow all the updates on Android Studio, join our Google+ community.

Categories: Programming

Project Tango Tablet Development Kits coming to select countries

Google Code Blog - Tue, 08/04/2015 - 19:32

Posted by Larry Yang, Product Manager, Project Tango

Project Tango Tablet Development Kits are available in South Korea and Canada starting today, and on August 26, will be available in Denmark, Finland, France, Germany, Ireland, Italy, Norway, Sweden, Switzerland, and the United Kingdom. The dev kit is intended for software developers only. To order a device, visit the Google Store.

Project Tango is a mobile platform that uses computer vision to give devices the ability to sense the world around them. The Project Tango Tablet Development Kit is a standard Android device plus a wide-angle camera, a depth sensing camera, accurate sensor timestamping, and a software stack that exposes this new computer vision technology to application developers. Learn more on our website.

The Project Tango community is growing. We’ve shipped more than 3,000 developer devices so far. Developers have already created hundreds of applications that enable users to explore the physical space around them, including precise navigation without GPS, windows into virtual 3D worlds, measurement of physical spaces, and games that know where they are in the room and what’s around them. And we have an app development contest in progress right now.

We’ve released 13 software updates that make it easier to create Area Learning experiences with new capabilities such as high-accuracy and building-scale ADFs, more accurate re-localization, indoor navigation, and GPS/maps alignment. Depth Perception improvements include the addition of real-time textured and Unity meshing. Unity developers can take advantage of an improved Unity lifecycle. The updates have also included improvements in IMU characterization, performance, thermal management and drift-reduction. Visit our developer site for details.

We have a lot more to share over the coming months. Sign up for our monthly newsletter to keep up with the latest news. Join the conversation in our Google+ community. Get help from other developers by using the Project Tango tag in Stack Overflow. See what others are saying on our YouTube channel. And share your story on Twitter with #ProjectTango.

Join us on our journey.

Categories: Programming

Using markup to promote your critic reviews within Google Search

Google Code Blog - Tue, 08/04/2015 - 18:23

Posted by Jonathan Wald, Product Manager

When Google announced Rich Snippets for reviews six years ago, it provided publishers with an entirely new way to promote their content by incorporating structured markup into their webpages. Since then, structured data has only become more important to Google Search and we’ve been building out the Knowledge Graph to better understand the world, the web, and users’ queries. When a user asks “did ex machina get good reviews?”, Google is now aware of the semantics - recognizing that the user wants critic reviews for the 2015 film Ex Machina and, equally importantly, where to find them.

With the recent launch of critic reviews in the Knowledge Graph, we’ve leveraged this technology to once again provide publishers with an opportunity to increase the discoverability and consumption of their reviews using markup. This feature, available across mobile, tablet, and desktop, organizes publishers’ reviews into a prominent card at the top of the page.

By using markup to identify their reviews and how they relate to Knowledge Graph entities, publishers can increase the visibility of their reviews and expose their reviews to a new audience whenever a Knowledge Graph card for a reviewed entity is surfaced.

Critic reviews are currently launched for movie entities, but we’re expanding the feature to other verticals like TV shows and books this year! Publishers with long-form reviews for these verticals can get up and running by selecting snippets from their reviews and then adding markup to their webpages. This process, detailed in our critic reviews markup instructions, allows publishers to communicate to Google which snippet they prefer, what URL is associated with the review, and other metadata about the reviewed item that allows us to ensure that we’re showing the right review for the right entity.

Google can understand a variety of markup formats, including the JSON-LD data format, which makes it easier than ever to incorporate structured data about reviews into your webpage! Get started here.
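
As a rough illustration of what such markup might look like, the sketch below builds a JSON-LD object for a critic review in Python and wraps it in a script tag. The property names come from schema.org's Review and Movie types. Which properties Google actually requires is spelled out in the critic reviews markup instructions, so treat the selection here, along with the reviewer, publication, snippet, and URL values, as assumptions made up for the example.

```python
import json


def critic_review_jsonld(snippet, review_url, author, publisher, movie_name, movie_url):
    """Build a schema.org Review object as a plain dict (illustrative fields only)."""
    return {
        "@context": "http://schema.org",
        "@type": "Review",
        "description": snippet,   # the snippet the publisher prefers
        "url": review_url,        # URL associated with the review
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": publisher},
        "itemReviewed": {         # metadata identifying the reviewed entity
            "@type": "Movie",
            "name": movie_name,
            "sameAs": movie_url,
        },
    }


def as_script_tag(data):
    """Serialize the dict into the <script> element that carries JSON-LD in a page."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


# Hypothetical values, for illustration only.
review = critic_review_jsonld(
    snippet="Snippet chosen by the publisher goes here.",
    review_url="https://example.com/reviews/ex-machina",
    author="Jane Doe",
    publisher="Example Film Review",
    movie_name="Ex Machina",
    movie_url="https://example.com/movies/ex-machina",
)
print(as_script_tag(review))
```

The printed element can be pasted into the head or body of the review page. Keeping the data in a dict until the last step makes it easy to generate the markup from whatever system already stores the reviews.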

Categories: Programming

SE-Radio Episode 234: Barry O’Reilly on Lean Enterprise

Johannes Thönes talks to Barry O’Reilly, principal consultant at ThoughtWorks, about his recently published book Lean Enterprise. A lean enterprise is a large organization that manages to keep innovating while keeping its existing products in the market. O’Reilly talks about the idea of scientific experiments and the build-measure-learn loop popularized by the lean-startup method. He shares […]
Categories: Programming

Flaws and Fallacies of #NoEstimates

Herding Cats - Glen Alleman - Tue, 08/04/2015 - 17:04

All the work we do in the projects domain is driven by uncertainty. Uncertainty of some probabilistic future event impacting our project. Uncertainty in the work activities performed while developing a product or service.

Decision making in the presence of these uncertainties is a natural process in all of business.

The decision maker is asked to express her beliefs by assigning probabilities to certain possible states of the system in the future and the resulting outcomes of those states.

What's the chance we'll have this puppy ready for VMWorld in August? What's the probability that when we go live and 300,000 users log on we'll be able to handle the load? What's our test coverage for the upcoming release given we've added 14 new enhancements to the code base this quarter? Questions like that are normal everyday business questions, along with what's the expected delivery date, what's the expected total sunk cost, and what's the expected bookable value measured in Dead Presidents for the system when it goes live?

To answer these and the unlimited number of other business, technical, operational, performance, security, and financial questions, we need to know something about probability and statistics. This knowledge  is an essential tool for decision making no matter the domain.

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write - H.G. Wells

If we accept the notion that all project work is probabilistic, driven by the underlying statistical processes of time, cost, and technical outcomes, including Effectiveness, Performance, Capabilities, and all the ...ilities that manifest and determine value after a system is put into initial use, then these conditions are the source of uncertainty and come in two types:

  • Reducible - event based, with a probability of occurrence within a specified time period.
  • Irreducible - naturally occurring variances of the underlying process, described by a Probability Distribution Function.
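
To make the two types concrete, here is a minimal Monte Carlo sketch in Python. Every number in it is invented for illustration: the three tasks, their triangular duration ranges (standing in for irreducible variance), and the single risk event with a 30% chance of adding 10 days (standing in for a reducible, event-based uncertainty). It is one common way to combine the two, not a method prescribed by any of the references listed in this post.

```python
import math
import random

# Irreducible uncertainty: each task's duration varies naturally.
# (low, most likely, high) in days -- hypothetical values.
TASKS = [(3, 5, 10), (8, 10, 20), (2, 4, 6)]

# Reducible uncertainty: a discrete event with a probability of occurring.
# Hypothetical: 30% chance that a vendor delay adds 10 days.
RISK_PROBABILITY = 0.30
RISK_IMPACT_DAYS = 10


def simulate_once(rng):
    """Draw one possible project duration in days."""
    # random.triangular takes its arguments in the order (low, high, mode).
    total = sum(rng.triangular(low, high, mode) for low, mode, high in TASKS)
    if rng.random() < RISK_PROBABILITY:
        total += RISK_IMPACT_DAYS
    return total


def simulate(trials=10_000, seed=1):
    """Return sorted simulated durations; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return sorted(simulate_once(rng) for _ in range(trials))


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list, 0 < p <= 100."""
    index = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[index]


def chance_of_finishing_by(sorted_values, deadline_days):
    """Fraction of simulated outcomes that finish on or before the deadline."""
    return sum(1 for d in sorted_values if d <= deadline_days) / len(sorted_values)


durations = simulate()
print("50th percentile (days):", round(percentile(durations, 50), 1))
print("80th percentile (days):", round(percentile(durations, 80), 1))
print("Chance of finishing within 30 days:", chance_of_finishing_by(durations, 30))
```

The output is an estimate in the sense used here: a range of outcomes with probabilities attached, which is what a decision maker needs to answer a question like "what's the chance we'll be ready by a given date?"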

If you don't accept this - that all project work is probabilistic in nature - stop reading; this blog is not for you.

If you do accept that all project work is uncertain, then there are some more assumptions we need to make sense of the decision making processes. The term statistic has two definitions - one long ago and a current one. The long ago one means a fact, referring to numerical facts. A numerical fact is a measurement, a count, or a rank. This number can represent a total, an average or a percentage of several such measures. This term also applied to the broad discipline of statistical manipulation in the same way accounting applies to entering and balancing accounts.

Statistics in the second sense is a set of methods for obtaining, organizing, and summarizing numerical facts. These facts usually represent a partial rather than complete knowledge about a situation. For example, in the case of a census, a sample of the population is measured rather than counting the entire population.

These numbers - statistics - are usually subjected to formal statistical analysis to help in our decision making in the presence of uncertainty.
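
A short sketch of statistics in this second sense, assuming we want to estimate an average from a sample rather than from a full count. The population here is synthetic and the sample size of 200 is arbitrary. The interval uses the usual normal approximation (mean plus or minus 1.96 standard errors), which is a choice made for the example rather than something this post prescribes.

```python
import random
import statistics

rng = random.Random(7)

# Synthetic "population": 100,000 task durations in hours (made-up data).
population = [rng.lognormvariate(2.0, 0.5) for _ in range(100_000)]

# Partial knowledge: measure only a random sample of 200 items.
sample = rng.sample(population, 200)


def mean_with_interval(values, z=1.96):
    """Sample mean with an approximate 95% confidence interval."""
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / len(values) ** 0.5
    return mean, mean - z * std_error, mean + z * std_error


estimate, low, high = mean_with_interval(sample)
print(f"sample estimate: {estimate:.2f} hours ({low:.2f} to {high:.2f})")
print(f"full count:      {statistics.mean(population):.2f} hours")
```

The sample gives a number plus a stated range of doubt around it, which is the form in which these numerical facts become useful for decision making.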

In our software project world uncertainty is an inherent fact. Software uncertainty is likely much higher than in construction, since the requirements in software development are soft, unlike the requirements in interstate highway development. But while the domain may have different variance in the level of uncertainty, estimates are still needed to make decisions in the presence of these uncertainties. Highway development has many uncertainties - not the least of which is the weather and weather delays.

When you measure what you are speaking about and express it in numbers you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind - Lord Kelvin

Decisions are made on data. Otherwise those decisions are just gut feel, intuition, and at their core guesses. When you are guessing with other people's money you have a low probability of keeping your job or the business staying in business.

... a tale told by an idiot, full of sound and fury, signifying nothing - Shakespeare

When we hear personal anecdotes about how to correct a problem and the conjecture that those anecdotes are applicable outside the individual telling the anecdote - beware. Without a test of any conjecture it is just a conjecture. 

He uses statistics as a drunken man uses lampposts - for support rather than illumination - Andrew Lang

We often confuse a symptom with the cause. When reading about all the failures in IT projects, the probability of failure, and the number of failures versus successes, there is rarely - in those naive posts on that topic - any assessment of the cause of the failure. The Root Cause analysis is not present. The Chaos Report is the most egregious of these.

There is no merit where there is no trial; and till experience stamps the mark of strength, cowards may pass for heroes, and faith for falsehood - A. Hill

Tossing out anecdotes, platitudes, and misquoted quotes does not make for a credible argument for anything. "I knew a person that did X successfully, therefore you should have the same experience" is common. Or "just try it, you may find it works for you just like it worked for me."

It seems there are no Principles or tested Practices in the approach to improving project success. Just platitudes and anecdotes - masking chatter as process improvement advice.

I started to write a detailed exposition using this material for the #NoEstimates conjecture that decisions can be made without an estimate. But Steve McConnell's post is much better than anything I could have done. So here's the wrap up...

If it is conjectured that decisions (any decisions, some decisions, self-selected decisions) can be made in the presence of uncertainty without also making an estimate of the outcome of that decision, the cost of that decision, and the impact of that decision, then let's hear how, so we can test it outside personal opinion and anecdote.

It's time for #NoEstimates advocates to provide some principle-based examples of how to make decisions in the presence of uncertainty without estimating. Here are some populist books (books without the heavy math) that are still capable of conveying the principles of the topic and can be a source of learning.

  1. Flaws and Fallacies in Statistical Thinking, Stephen K. Campbell, Prentice Hall, 1974
  2. The Economics of Iterative Software Development: Steering Toward Better Business Results, Walker Royce, Kurt Bittner, and Mike Perrow, Addison Wesley, 2009.
  3. How Not to be Wrong: The Power of Mathematical Thinking, Jordan Ellenberg, Penguin Press, 2014
  4. Hard Facts, Dangerous Half-Truths & Total Nonsense: Profiting from Evidence Based Management, Jeffrey Pfeffer and Robert I. Sutton, Harvard Business School Press, 2006.
  5. How to Measure Anything, Finding the Value of Intangibles in Business, 3rd Edition, Douglas W. Hubbard, John Wiley & Sons, 2014.
  6. Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie With Statistics, Gary Smith
  7. Center for Informed Decision Making
  8. Decision Making for the Professional, Peter McNamee and John Celona

Some actual math books on the estimating problem

  1. Probability Methods for Cost Uncertainty Analysis, Paul R. Garvey
  2. Making Hard Decisions: An Introduction to Decision Analysis, 2nd Edition, Robert T. Clemen, Duxbury Press, 1996.
  3. Estimating Software Intensive Systems, Richard D. Stutzke, Addison Wesley, 2005.
  4. Probabilities as Similarly Weighted Frequencies, Antoine Billot · Itzhak Gilboa · Dov Samet · David Schmeidler

Related articles

  • Making Conjectures Without Testable Outcomes
  • Estimating Processes in Support of Economic Analysis
  • Applying the Right Ideas to the Wrong Problem
Categories: Project Management

Sponsored Post: Surge, Redis Labs, VoltDB, Datadog, Power Admin, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events
  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?

Cool Products and Services
  • MongoDB Management Made Easy. Gain confidence in your backup strategy. MongoDB Cloud Manager makes protecting your mission critical data easy, without the need for custom backup scripts and storage. Start your 30 day free trial today.

  • In a recent benchmark for NoSQL databases on the AWS cloud, Redis Labs Enterprise Cluster's performance obliterated Couchbase, Cassandra and Aerospike in this real life, write-intensive use case. Full backstage pass and all the juicy details are available in this downloadable report.

  • Real-time correlation across your logs, metrics and events. just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • VoltDB is a full-featured fast data platform that has all of the data processing capabilities of Apache Storm and Spark Streaming, but adds a tightly coupled, blazing fast ACID-relational database, scalable ingestion with backpressure; all with the flexibility and interactivity of SQL queries. Learn more.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at

  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a .NET native in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM and ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Who Should be Your Product Owner?

In agile, we separate the Product Owner function from functional (development) management. The reason is that we want the people who can understand, evaluate, and articulate the business value to tell the people who understand the work’s value when to implement what. The technical folks determine how to implement the what.

Separating the when/what from how is a great separation. It gives the people who are considering the long term and customer impact of a given feature or set of features a way to rank the work. Technical teams may not realize when to release a given feature/feature set.

In my recent post, Product Manager, Product Owner, or Business Analyst?, I discussed what these different roles might deliver. Now it’s time to consider who should do the product management/product ownership roles.

If you have someone called a product manager, that person defines the product, asks the product development team(s) for features, and talks to customers. Notice the last part, the talking to customers part. This person is often out of the office. The product manager is an outward-facing job, not an internally-focused job.

The product owner works with the team to define and refine features, to replan the backlogs, and to know when it is time to release. The product owner is an inward-facing function.

(Just for completeness, the business analyst is an inward-facing function. The BA might sit with people in the business to ask, “Exactly what did you mean when you asked for this functionality? What does that mean to you?” A product owner might ask that same question.)

What happens when your product manager is your product owner? The product development team doesn’t have enough time with the product owner. Maybe the team doesn’t understand the backlog, or the release criteria, or even something about a specific story.

Sometimes, functional managers become product owners. They have the domain expertise and the ability to create a backlog and to work with the product manager when that person is available. Is this a good idea?

If the manager is not the PO for his/her team, it’s okay. I wonder how a manager can build relationships with people in his/her team and manage the problems and remove impediments that the team needs. Maybe the manager doesn’t need to manage so much and can be a PO. Maybe the product ownership job isn’t too difficult. I’m skeptical, but it could happen.

There is a real problem when a team’s manager is also the product owner. People are less likely to have a discussion and disagree with their managers, especially if the organization hasn’t moved to team compensation. In Weird Ideas That Work: How to Build a Creative Company, Sutton discusses the issue of how and when people feel comfortable challenging their managers. 

Many people do not feel comfortable challenging their managers. At all.

We want the PO and the team to be able to have that give-and-take about ranking, value, when it makes sense to do what. The PO makes the decision, and with information from the team, can take all the value into account. The PO might hear, “We can implement this feature first, and then this other feature is much easier.” Or, “If we fix these defects now, we can make these features much easier.” You want those conversations. The PO might say, “No, I want the original order” and the team will do it. The conversations are critical.

If you are a manager, considering being a PO for your team, reconsider. Your organization may have too many managers and not enough POs. That’s a problem to fix. Don’t make it difficult for your team to have honest discussions with you. Make it possible for people with the best interests of the product to have real discussions without being worried about their jobs.

(If you are struggling with the PO role, consider my Product Owner Training for Agencies. It starts next week.)

Categories: Project Management

Spark: pyspark/Hadoop – py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

Mark Needham - Tue, 08/04/2015 - 07:35

I’ve been playing around with pyspark – Spark’s Python library – and I wanted to execute the following job which takes a file from my local HDFS and then counts how many times each FBI code appears using Spark SQL:

from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)
file = "hdfs://localhost:9000/user/markneedham/Crimes_-_2001_to_present.csv"
sqlContext.load(source="com.databricks.spark.csv", header="true", path = file).registerTempTable("crimes")
rows = sqlContext.sql("select `FBI Code` AS fbiCode, COUNT(*) AS times FROM crimes GROUP BY `FBI Code` ORDER BY times DESC").collect()
for row in rows:
    print("{0} -> {1}".format(row.fbiCode, row.times))

I submitted the job and waited:

$ ./spark-1.3.0-bin-hadoop1/bin/spark-submit --driver-memory 5g --packages com.databricks:spark-csv_2.10:1.1.0
Traceback (most recent call last):
  File "/Users/markneedham/projects/neo4j-spark-chicago/", line 11, in <module>
    sqlContext.load(source="com.databricks.spark.csv", header="true", path = file).registerTempTable("crimes")
  File "/Users/markneedham/projects/neo4j-spark-chicago/spark-1.3.0-bin-hadoop1/python/pyspark/sql/", line 482, in load
    df = self._ssql_ctx.load(source, joptions)
  File "/Users/markneedham/projects/neo4j-spark-chicago/spark-1.3.0-bin-hadoop1/python/lib/", line 538, in __call__
  File "/Users/markneedham/projects/neo4j-spark-chicago/spark-1.3.0-bin-hadoop1/python/lib/", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.
: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(
	at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(
	at org.apache.hadoop.ipc.RPC.getProxy(
	at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(
	at org.apache.hadoop.hdfs.DFSClient.<init>(
	at org.apache.hadoop.hdfs.DFSClient.<init>(
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(
	at org.apache.hadoop.fs.FileSystem.createFileSystem(
	at org.apache.hadoop.fs.FileSystem.access$200(
	at org.apache.hadoop.fs.FileSystem$Cache.get(
	at org.apache.hadoop.fs.FileSystem.get(
	at org.apache.hadoop.fs.Path.getFileSystem(
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1156)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1189)
	at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:129)
	at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:127)
	at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:109)
	at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:62)
	at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:115)
	at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:40)
	at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:28)
	at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
	at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
	at org.apache.spark.sql.SQLContext.load(SQLContext.scala:667)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at py4j.reflection.MethodInvoker.invoke(
	at py4j.reflection.ReflectionEngine.invoke(
	at py4j.Gateway.invoke(
	at py4j.commands.AbstractCommand.invokeMethod(
	at py4j.commands.CallCommand.execute(

It looks like my Hadoop client and server are using different versions, which in fact they are! We can see from the name of the Spark folder that I’m using Hadoop 1.x there, and if we check the local Hadoop version we’ll notice it’s using the 2.x series:

$ hadoop version
Hadoop 2.6.0

In this case the easiest fix is to use a version of Spark that’s compiled against Hadoop 2.6, which as of now means Spark 1.4.1.
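As a rough illustration of the mismatch, here is a minimal Python sketch that compares the Hadoop major version implied by a Spark distribution folder name with the first line of `hadoop version` output. The function names are hypothetical, the `-bin-hadoopN` naming is taken from the folder names used in this post, and matching major versions is only a heuristic rather than a guarantee that client and server can talk to each other.

```python
import re


def hadoop_major_from_spark_dir(spark_dir):
    """Extract the Hadoop major version from a Spark folder name (assumed '-bin-hadoopN' naming)."""
    match = re.search(r"-bin-hadoop(\d+)", spark_dir)
    return int(match.group(1)) if match else None


def hadoop_major_from_version_output(output):
    """Extract the major version from the 'Hadoop X.Y.Z' line printed by `hadoop version`."""
    match = re.search(r"Hadoop (\d+)\.", output)
    return int(match.group(1)) if match else None


def is_compatible(spark_dir, version_output):
    """Heuristic: the client and server Hadoop major versions should be equal."""
    client = hadoop_major_from_spark_dir(spark_dir)
    server = hadoop_major_from_version_output(version_output)
    return client is not None and client == server


if __name__ == "__main__":
    # Folder names and version string as they appear in this post.
    print(is_compatible("spark-1.3.0-bin-hadoop1", "Hadoop 2.6.0"))
    print(is_compatible("spark-1.4.1-bin-hadoop2.6", "Hadoop 2.6.0"))
```

A check like this could run before submitting a job, so a mismatch shows up as a clear message instead of a long Java stack trace.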

Let’s try and run our job again:

$ ./spark-1.4.1-bin-hadoop2.6/bin/spark-submit --driver-memory 5g --packages com.databricks:spark-csv_2.10:1.1.0
06 -> 859197
08B -> 653575
14 -> 488212
18 -> 457782
26 -> 431316
05 -> 257310
07 -> 197404
08A -> 188964
03 -> 157706
11 -> 112675
04B -> 103961
04A -> 60344
16 -> 47279
15 -> 40361
24 -> 31809
10 -> 22467
17 -> 17555
02 -> 17008
20 -> 15190
19 -> 10878
22 -> 8847
09 -> 6358
01A -> 4830
13 -> 1561
12 -> 835
01B -> 16

And it’s working!

Categories: Programming

Notes on setting up an ELK stack and logstash-forwarder

Agile Testing - Grig Gheorghiu - Tue, 08/04/2015 - 00:22
I set up the ELK stack a while ago and I want to jot down some notes on installing and configuring it.  I was going to write "before I forget how to do it", but that's not true anymore, because I have ansible playbooks and roles for this setup. As I said before, using ansible as executable documentation has been working really well for me. I still need to write this blog post though just so I refresh my memory about the bigger picture of ELK when I revisit it next.

Some notes:

  • Used Jeff Geerling's ansible-role-logstash for the main setup of the ELK server I have
  • Used logstash-forwarder (used to be called lumberjack) on all servers that need to send their logs to the ELK server
  • Wrapped the installation and configuration of logstash-forwarder into a simple ansible role which installs the .deb file for this package and copies over a templatized logstash-forwarder.conf file; here is my ansible template for this file
  • Customized the lumberjack input config file on the ELK server (still called lumberjack, but actually used in conjunction with the logstash-forwarder agents running on each box that sends its logs to ELK); here is my /etc/logstash/conf.d/01-lumberjack-input.conf file
  • Added my app-specific config file on the ELK server; here is my /etc/logstash/conf.d/20-app.conf file with a few things to note
    • the grok stanza applies the 'valid' tag only to the lines that match the APPLOGLINE pattern (see below for more on this pattern)
    • the 'payload' field of any line that matches the APPLOGLINE pattern is parsed as JSON; this is nice because I can change the names of the fields in the JSON object in the log file and all these fields will be individually shown in ELK
    • all lines that are not tagged as 'valid' will be dropped
  • Created a file called myapp in the /opt/logstash/patterns directory on the ELK server; this file contains all my app-specific patterns referenced in the 20-app.conf file above, in this example just 1 pattern: 
    • APPLOGLINE \[myapp\] %{TIMESTAMP_ISO8601:timestamp}Z\+00:000 \[%{WORD:severity}\] \[myresponse\] \[%{NUMBER:response}\] %{GREEDYDATA:payload}
    • this pattern uses predefined logstash patterns such as TIMESTAMP_ISO8601, WORD, NUMBER and GREEDYDATA
    • note the last field called payload; this is the JSON payload that gets parsed by logstash
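
To make the effect of the APPLOGLINE pattern and the JSON parsing of the payload more concrete, here is a small Python sketch that approximates the same steps with a regular expression. It is not how logstash implements grok: the regex is a simplified stand-in for TIMESTAMP_ISO8601, WORD, NUMBER and GREEDYDATA, the sample log line is invented, and dropping the raw payload field after parsing is an assumption made for the sketch.

```python
import json
import re

# Simplified approximation of the APPLOGLINE grok pattern (assumption, not the grok definitions).
APPLOGLINE = re.compile(
    r"\[myapp\] (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?)Z\+00:000 "
    r"\[(?P<severity>\w+)\] \[myresponse\] \[(?P<response>-?\d+(?:\.\d+)?)\] "
    r"(?P<payload>.*)"
)


def parse_line(line):
    """Return the extracted fields, or None for lines that would not be tagged 'valid'."""
    match = APPLOGLINE.search(line)
    if match is None:
        return None  # non-matching lines are dropped
    event = match.groupdict()
    # Parse the payload as a JSON object and promote its keys to top-level fields.
    event.update(json.loads(event.pop("payload")))
    return event


if __name__ == "__main__":
    # Hypothetical log line in the shape the pattern expects.
    sample = ('[myapp] 2015-08-04T00:22:10.123Z+00:000 [INFO] [myresponse] [200] '
              '{"user": "alice", "latency_ms": 12}')
    print(parse_line(sample))
    print(parse_line("a line from some other program"))
```

Because the payload keys become fields of their own, renaming a key in the application's JSON changes the field name downstream without touching the pattern.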

Seven of the Nastiest Anti-patterns in Microservices

Daniel Bryant gave an energetic talk at Devoxx UK 2015 on lessons learned from over five years of experience with microservice based projects. The talk: The Seven Deadly Sins of Microservices: Redux (video, slides).

If you don't want to risk your immortal API then be sure to avoid:

  1. Lust - using the latest and greatest tech with the idea it will solve all your problems. It won't. Do you really need microservices at all? If you do go microservices do you really need new tech in your stack? Choose boring technology. Know why you are choosing something. A monolith can perform better, and because a monolith can be developed faster it may also be the correct choice for proving your business case.
  2. Gluttony - excessive communication protocols. Projects often have a crazy number of protocols for gluing parts together. Standardize on the glue across an organization. Choose one synchronous and one asynchronous protocol. Don't gold-plate.
  3. Greed - all your service are belong to us. Do not underestimate the impact moving to a microservice approach will have on your organization. Your business organization needs to change to take advantage of microservices. Typically orgs will have silos between Dev, QA, and Ops with even more silos inside each silo like front-end, middleware, and database. Use cross functional teams like Spotify, Amazon, and Gilt. Connect rather than divide your company. 
  4. Sloth - creating a distributed monolith. If you can't deploy your services independently then they aren't microservices. Decouple. Transform data at a less central part of the stack. Some options are schema-first design and consumer-driven contracts.
  5. Wrath - blowing up when bad things happen. Bad things happen all the time so you need to test. Microservices are inherently distributed so you have network problems to deal with that weren't a problem in a monolith. The book Release It! has a lot of good fault tolerance patterns. Operationally you need to implement continuous delivery, agile, and devops. Test for failures using real life disaster scenarios testing, live injection failure testing, and something like Netflix's Simian Army.
  6. Envy - the shared single domain fallacy. A lot of time has been spent building and perfecting the model of a single domain. There's one big database with a unified schema. Microservices decompose a system along different lines and that can cause contention in an organization. Reports can be generated using pull by service or data pumps with events. 
  7. Pride - testing in the world of transience. Does your stuff really work? We all make mistakes. Think testing at the developer level, operational level, and business level. Surprisingly little has been written about testing microservices. Invest in your build pipeline testing. Some tools: Serenity BDD, Wiremock/Saboteur, Jenkins Performance Plugin. Testing in production is an emerging idea with companies that deploy many microservices.
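
The consumer-driven contracts mentioned under Sloth can be sketched in a few lines of Python. This is only an illustration of the idea, not something taken from the talk: the contract format (field name mapped to expected type), the `orders` consumer and the sample responses are all hypothetical.

```python
def unmet_fields(contract, response):
    """Return the contract fields that the provider response is missing or mistypes."""
    return [
        field
        for field, expected_type in contract.items()
        if not isinstance(response.get(field), expected_type)
    ]


# Hypothetical contract published by an "orders" consumer: only the fields it actually reads.
ORDERS_CONSUMER_CONTRACT = {"id": int, "email": str}

if __name__ == "__main__":
    # Extra fields in the provider response are fine; the consumer ignores them.
    print(unmet_fields(ORDERS_CONSUMER_CONTRACT,
                       {"id": 42, "email": "a@example.com", "nickname": "al"}))
    # A change that retypes "id" and drops "email" breaks this consumer.
    print(unmet_fields(ORDERS_CONSUMER_CONTRACT, {"id": "42"}))
```

Running each consumer's contract in the provider's build is what lets the provider deploy independently: it can change anything the contracts do not mention.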
Categories: Architecture

7 Things Your Boss Doesn’t Understand About Software Development

Making the Complex Simple - John Sonmez - Mon, 08/03/2015 - 16:00

Your boss may be awesome. I’ve certainly had a few awesome bosses in my programming career, but even the most awesome bosses don’t always seem to “get it.” In fact, I’d say that most software development managers are a little short-sighted when it comes to more than a few elements of programming. So, I’ve compiled […]

The post 7 Things Your Boss Doesn’t Understand About Software Development appeared first on Simple Programmer.

Categories: Programming

Building IntelliJ plugins from the command line

Xebia Blog - Mon, 08/03/2015 - 13:16

For a few years already, IntelliJ IDEA has been my IDE of choice. Recently I dove into the world of plugin development for IntelliJ IDEA and was unhappily surprised. Plugin development relies entirely on IDE features. It looked hard to create a build script that does the actual plugin compilation and packaging. The JetBrains folks simply have not catered for that. Unless you're using TeamCity as your CI tool, you're out of luck.

For me it makes no sense writing code if:

  1. it can not be compiled and packaged from the command line
  2. the code can not be compiled and tested on a CI environment
  3. IDE configurations can not be generated from the build script

Google did not help out a lot. Tomasz Dziurko pointed me in the right direction.

In order to build and test a plugin, the following needs to be in place:

  1. First of all you'll need IntelliJ IDEA. This is quite obvious. The Plugin DevKit plugins need to be installed. If you want to create a language plugin you might want to install Grammar-Kit too.
  2. An IDEA SDK needs to be registered. The SDK can point to your IntelliJ installation.

The plugin module files are only slightly different from your average project.

Compiling and testing the plugin

Now for the build script. My build tool of choice is Gradle. My plugin code adheres to the default Gradle project structure.

First thing to do is to get a hold of the IntelliJ IDEA libraries in an automated way. Since the IDEA libraries are not available via Maven repos, an IntelliJ IDEA Community Edition download is probably the best option to get a hold of the libraries.

The plan is as follows: download the Linux version of IntelliJ IDEA, and extract it in a predefined location. From there, we can point to the libraries and subsequently compile and test the plugin. The libraries are Java, and as such platform independent. I picked the Linux version since it has a nice, simple file structure.

The following code snippet caters for this:

apply plugin: 'java'

// Pick the Linux version, as it is a tar.gz we can simply extract
def IDEA_SDK_URL = ''
def IDEA_SDK_NAME = 'IntelliJ IDEA Community Edition IC-139.1603.1'

configurations {
    ideaSdk // IntelliJ IDEA libraries, referenced in the dependencies below
    bundle // dependencies bundled with the plugin
}

dependencies {
    ideaSdk fileTree(dir: 'lib/sdk/', include: ['*/lib/*.jar'])

    compile configurations.ideaSdk
    compile configurations.bundle
    testCompile 'junit:junit:4.12'
    testCompile 'org.mockito:mockito-core:1.10.19'
}

// IntelliJ IDEA can still run on a Java 6 JRE, so we need to take that into account.
sourceCompatibility = 1.6
targetCompatibility = 1.6

task downloadIdeaSdk(type: Download) {
    sourceUrl = IDEA_SDK_URL
    target = file('lib/idea-sdk.tar.gz')
}

task extractIdeaSdk(type: Copy, dependsOn: [downloadIdeaSdk]) {
    def zipFile = file('lib/idea-sdk.tar.gz')
    def outputDir = file("lib/sdk")

    from tarTree(resources.gzip(zipFile))
    into outputDir
}

compileJava.dependsOn extractIdeaSdk

class Download extends DefaultTask {
    String sourceUrl

    File target

    @TaskAction
    void download() {
        // Make sure the target directory exists before downloading
        if (!target.parentFile.exists()) {
            target.parentFile.mkdirs()
        }
        ant.get(src: sourceUrl, dest: target, skipexisting: 'true')
    }
}

If parallel test execution does not work for your plugin, you'd better turn it off as follows:

test {
    // Avoid parallel execution, since the IntelliJ boilerplate is not up to that
    maxParallelForks = 1
}

The plugin deliverable

Obviously, the whole build process should be automated. That includes the packaging of the plugin. A plugin is simply a zip file with all libraries together in a lib folder.
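
To make that layout concrete, here is a small Python sketch, separate from the Gradle build, that writes a set of jar files into a zip under a top-level lib folder and lists the resulting entries. The function name and the file names in the usage example are hypothetical.

```python
import os
import zipfile


def package_plugin(jar_paths, zip_path):
    """Write each jar into zip_path under lib/ and return the sorted entry names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for jar in jar_paths:
            # Only the file name is kept; every jar ends up directly in lib/.
            archive.write(jar, arcname="lib/" + os.path.basename(jar))
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


if __name__ == "__main__":
    import tempfile

    workdir = tempfile.mkdtemp()
    jars = []
    for name in ("my-plugin.jar", "some-dependency.jar"):  # hypothetical jar names
        path = os.path.join(workdir, name)
        open(path, "wb").close()  # empty placeholder files stand in for real jars
        jars.append(path)
    print(package_plugin(jars, os.path.join(workdir, "my-plugin.zip")))
```

The Gradle task that does the real packaging follows.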

task dist(type: Zip, dependsOn: [jar, test]) {
    from configurations.bundle
    from jar.archivePath
    rename { f -> "lib/${f}" }
}

build.dependsOn dist

Handling IntelliJ project files

We also need to generate IntelliJ IDEA project and module files so the plugin can live within the IDE. Telling the IDE it's dealing with a plugin opens some nice features, mainly the ability to run the plugin from within the IDE. Anton Arhipov's blog post put me on the right track.

The Gradle idea plugin helps out in creating those files. This works out of the box for your average project, but for plugins IntelliJ expects some things differently. The project files should mention that we're dealing with a plugin project and the module file should point to the plugin.xml file required for each plugin. Also, the SDK libraries are not to be included in the module file; so, I excluded those from the configuration.

The following code snippet caters for this:

apply plugin: 'idea'

idea {
    project {
        languageLevel = '1.6'
        jdkName = IDEA_SDK_NAME

        ipr {
            withXml {
                it.node.find { node ->
                    node.@name == 'ProjectRootManager'
                }.'@project-jdk-type' = 'IDEA JDK'

                logger.warn "=" * 71
                logger.warn " Configured IDEA JDK '${jdkName}'."
                logger.warn " Make sure you have it configured in IntelliJ before opening the project!"
                logger.warn "=" * 71
            }
        }
    }

    module {
        scopes.COMPILE.minus = [ configurations.ideaSdk ]

        iml {
            beforeMerged { module ->
            }
            withXml {
                it.node.@type = 'PLUGIN_MODULE'
                //  <component name="DevKit.ModuleBuildProperties" url="file://$MODULE_DIR$/src/main/resources/META-INF/plugin.xml" />
                def cmp = it.node.appendNode('component')
                cmp.@name = 'DevKit.ModuleBuildProperties'
                cmp.@url = 'file://$MODULE_DIR$/src/main/resources/META-INF/plugin.xml'
            }
        }
    }
}

Put it to work!

Combining the aforementioned code snippets will result in a build script that can be run on any environment. Have a look at my idea-clock plugin for a working example.

SPaMCAST 353 -Learning Styles, Microservices for All, Tame Flow

Software Process and Measurement Cast - Sun, 08/02/2015 - 22:00

This week’s Software Process and Measurement Cast features three columns. The first is our essay on learning styles. Learning styles are useful to consider when you are trying to change the world or just an organization. Opposites might attract in poetry and sitcoms, but opposite learning styles rarely work together well in teams without empathy and a dash of coaching. Therefore, the coach and teams need to have an inventory of learning styles on the team. Models and active evaluation against a model are tools to generate knowledge about teams so they can tune how they work to maximize effectiveness.

Our second column features Gene Hughson bringing the ideas from his wonderful Form Follows Function Blog.  Gene talks about the topic of microservices. Gene challenges the idea that microservices are a silver bullet.

We anchor this week’s SPaMCAST with Steve Tendon’s column discussing the TameFlow methodology and his great new book, Hyper-Productive Knowledge Work Performance.   One of the topics Steve tackles this week is the idea of knowledge workers and why a knowledge worker is different.  The differences Steve describes are key to developing a hyper-productive environment.

Call to Action!

I have a challenge for the Software Process and Measurement Cast listeners for the next few weeks. I would like you to find one person that you think would like the podcast and introduce them to the cast. This might mean sending them the URL or teaching them how to download podcasts. If you like the podcast and think it is valuable they will be thankful to you for introducing them to the Software Process and Measurement Cast. Thank you in advance!

Re-Read Saturday News

Remember that the Re-Read Saturday of The Mythical Man-Month is in full swing.  This week we tackle the essay titled “The Second-System Effect”!  Check out the new installment at Software Process and Measurement Blog.

Upcoming Events

Software Quality and Test Management 

September 13 – 18, 2015

San Diego, California

I will be speaking on the impact of cognitive biases on teams!  Let me know if you are attending! If you are still deciding on attending let me know because I have a discount code!


Agile Development Conference East

November 8-13, 2015

Orlando, Florida

I will be speaking on November 12th on the topic of Agile Risk!  Let me know if you are going and we will have a SPaMCAST Meetup.


The next Software Process and Measurement Cast features our interview with Allan Kelly. We talked about #NoProjects and focusing on delivering a consistent flow of value.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

Categories: Process Management

Spark: Processing CSV files using Databricks Spark CSV Library

Mark Needham - Sun, 08/02/2015 - 19:08

Last year I wrote about exploring the Chicago crime data set using Spark and the OpenCSV parser and while this worked well, a few months ago I noticed that there’s now a spark-csv library which I should probably use instead.

I thought it’d be a fun exercise to translate my code to use it.

So to recap our goal: we want to count how many times each type of crime has been committed. I have a more up to date version of the crimes file now so the numbers won’t be exactly the same.

First let’s launch the spark-shell and register our CSV file as a temporary table so we can query it as if it was a SQL table:

$ ./spark-1.3.0-bin-hadoop1/bin/spark-shell
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val crimeFile = "/Users/markneedham/Downloads/Crimes_-_2001_to_present.csv"
crimeFile: String = /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@9746157
scala> sqlContext.load("com.databricks.spark.csv", Map("path" -> crimeFile, "header" -> "true")).registerTempTable("crimes")
java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:268)
	at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:279)
	at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
        at java.lang.reflect.Method.invoke(
	at org.apache.spark.repl.SparkIMain$
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I’ve actually forgotten to tell spark-shell about the CSV package so let’s restart the shell and pass it as an argument:

$ ./spark-1.3.0-bin-hadoop1/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val crimeFile = "/Users/markneedham/Downloads/Crimes_-_2001_to_present.csv"
crimeFile: String = /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@44587c44
scala> sqlContext.load("com.databricks.spark.csv", Map("path" -> crimeFile, "header" -> "true")).registerTempTable("crimes")
15/08/02 18:57:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/02 18:57:46 INFO DAGScheduler: Stage 0 (first at CsvRelation.scala:129) finished in 0.207 s
15/08/02 18:57:46 INFO DAGScheduler: Job 0 finished: first at CsvRelation.scala:129, took 0.267327 s

Now we can write a simple SQL query on our ‘crimes’ table to find the most popular crime types:

scala> sqlContext.sql("""
        select `Primary Type` as primaryType, COUNT(*) AS times
        from crimes
        group by `Primary Type`
        order by times DESC
        """).save("/tmp/agg.csv", "com.databricks.spark.csv")

That spits out a load of CSV ‘part files’ into /tmp/agg.csv so let’s bring in the merge function that we’ve used previously to combine these into one CSV file:

scala> import org.apache.hadoop.conf.Configuration
scala> import org.apache.hadoop.fs._
scala> def merge(srcPath: String, dstPath: String): Unit =  {
         val hadoopConfig = new Configuration()
         val hdfs = FileSystem.get(hadoopConfig)
         FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), false, hadoopConfig, null)
       }
scala> merge("/tmp/agg.csv", "agg.csv")

And finally let’s browse the contents of our new CSV file:

$ cat agg.csv

Great! We’ve got the same output with much less code which is always a #win.

Categories: Programming

17 Theses on Software Estimation

10x Software Development - Steve McConnell - Sun, 08/02/2015 - 17:20

(with apologies to Martin Luther for the title)

Arriving late to the #NoEstimates discussion, I’m amazed at some of the assumptions that have gone unchallenged, and I’m also amazed at the absence of some fundamental points that no one seems to have made so far. The point of this article is to state unambiguously what I see as the arguments in favor of estimation in software and put #NoEstimates in context.  

1. Estimation is often done badly and ineffectively and in an overly time-consuming way. 

My company and I have taught upwards of 10,000 software professionals better estimation practices, and believe me, we have seen every imaginable horror story of estimation done poorly. There is no question that “estimation is often done badly” is a true observation of the state of the practice. 

2. The root cause of poor estimation is usually lack of estimation skills. 

Estimation done poorly is most often due to lack of estimation skills. Smart people using common sense is not sufficient to estimate software projects. Reading two-page blog articles on the internet is not going to teach anyone how to estimate very well. Good estimation is not that hard, once you’ve developed the skill, but it isn’t intuitive or obvious, and it requires focused self-education or training. 

3. Many comments in support of #NoEstimates demonstrate a lack of basic software estimation knowledge. 

I don’t expect most #NoEstimates advocates to agree with this thesis, but as someone who does know a lot about estimation I think it’s clear on its face. Here are some examples:

(a) Are estimation and forecasting the same thing? As far as software estimation is concerned, yes they are. (Just do a Google or Bing search of “definition of forecast”.) Estimation, forecasting, prediction--it's all the same basic activity, as far as software estimation is concerned. 

(b) Is showing someone several pictures of kitchen remodels that have been completed for $30,000 and implying that the next kitchen remodel can be completed for $30,000 estimation? Yes, it is. That’s an implementation of a technique called Reference Class Forecasting. 

(c) Does doing a few iterations, calculating team velocity, and then using that empirical velocity data to project a completion date count as estimation? Yes, it does. Not only is it estimation, it is a really effective form of estimation. I’ve heard people argue that because velocity is empirically based, it isn’t estimation. That argument is incorrect and shows a lack of basic understanding of the nature of estimation. 

(d) Is estimation time consuming and a waste of time? One of the most common symptoms of lack of estimation skill is spending too much time on the wrong activities. This work is often well-intentioned, but it’s common to see well-intentioned people doing more work than they need to get worse answers than they could be getting.  
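The velocity-based projection described in (c) can be sketched in a few lines of Scala. This is only an illustration, not a method taken from the article: the object name, the function names, and the sample numbers are all hypothetical, and using the best and worst observed sprint as the optimistic and pessimistic bounds is a simplifying assumption.

```scala
object VelocityProjection {
  /** Sprints needed to finish `remainingPoints` at a given velocity, rounded up. */
  def sprintsNeeded(remainingPoints: Double, velocity: Double): Int = {
    require(velocity > 0, "velocity must be positive")
    math.ceil(remainingPoints / velocity).toInt
  }

  /**
   * Projects (optimistic, expected, pessimistic) sprint counts from the
   * story points completed in past sprints. Assumption: the fastest and
   * slowest observed sprints bound the range; the mean gives the expected case.
   */
  def project(remainingPoints: Double, pastVelocities: Seq[Double]): (Int, Int, Int) = {
    require(pastVelocities.nonEmpty, "need at least one completed sprint")
    val mean = pastVelocities.sum / pastVelocities.size
    (sprintsNeeded(remainingPoints, pastVelocities.max),
     sprintsNeeded(remainingPoints, mean),
     sprintsNeeded(remainingPoints, pastVelocities.min))
  }
}
```

With made-up data, VelocityProjection.project(200.0, Seq(18.0, 22.0, 20.0)) returns (10, 10, 12): the same empirical numbers that supposedly "aren't estimation" yield a projected range of 10 to 12 sprints.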

4. Being able to estimate effectively is a skill that any true software professional needs to develop, even if they don’t need it on every project. 

“Estimating is problematic, therefore software professionals should not develop estimation skill” – this is a common line of reasoning in #NoEstimates. Unless a person wants to argue that the need for estimation is rare, this argument is not supported by the rest of the #NoEstimates premises. 

If I agreed, for sake of argument, that 50% of the projects don’t need to be estimated, the other 50% of the projects would still benefit from the estimators having good estimation skills. If you’re a true software professional, you should develop estimation skill so that you can estimate competently on the 50% of projects that do require estimation. 

In practice, I think the number of projects that need estimates is much higher than 50%. 

5. Estimates serve numerous legitimate, important business purposes.

Estimates are used by businesses in numerous ways, including: 

  • Allocating budgets to projects (i.e., estimating the effort and budget of each project)
  • Making cost/benefit decisions at the project/product level, which is based on cost (software estimate) and benefit (defined feature set)
  • Deciding which projects get funded and which do not, which is often based on cost/benefit
  • Deciding which projects get funded this year vs. next year, which is often based on estimates of which projects will finish this year
  • Deciding which projects will be funded from CapEx budget and which will be funded from OpEx budget, which is based on estimates of total project effort, i.e., budget
  • Allocating staff to specific projects, i.e., estimates of how many total staff will be needed on each project
  • Allocating staff within a project to different component teams or feature teams, which is based on estimates of scope of each component or feature area
  • Allocating staff to non-project work streams (e.g., budget for a product support group, which is based on estimates for the amount of support work needed)
  • Making commitments to internal business partners (based on projects’ estimated availability dates)
  • Making commitments to the marketplace (based on estimated release dates)
  • Forecasting financials (based on when software capabilities will be completed and revenue or savings can be booked against them)
  • Tracking project progress (comparing actual progress to planned (estimated) progress)
  • Planning when staff will be available to start the next project (by estimating when staff will finish working on the current project)
  • Prioritizing specific features on a cost/benefit basis (where cost is an estimate of development effort)

These are just a subset of the many legitimate reasons that businesses request estimates from their software teams. I would be very interested to hear how #NoEstimates advocates suggest that a business would operate if you remove the ability to use estimates for each of these purposes.

The #NoEstimates response to these business needs is typically of the form, “Estimates are inaccurate and therefore not useful for these purposes” rather than, “The business doesn’t need estimates for these purposes.” 

That argument really just says that businesses are currently operating on much worse quality information than they should be, and probably making poorer decisions as a result, because the software staff are not providing very good estimates. If software staff provided more accurate estimates, the business would make better decisions in each of these areas, which would make the business stronger. 

This all supports my point that improved estimation skill should be part of the definition of being a true software professional. 

6. Part of being an effective estimator is understanding that different estimation techniques should be used for different kinds of estimates. 

One thread that runs throughout the #NoEstimates discussions is lack of clarity about whether we’re estimating before the project starts, very early in the project, or after the project is underway. The conversation is also unclear about whether the estimates are project-level estimates, task-level estimates, sprint-level estimates, or some combination. Some of the comments imply ineffective attempts to combine kinds of estimates—the most common confusion I’ve read is trying to use task-level estimates to estimate a whole project, which is another example of lack of software estimation skill. 

Effective estimation requires that the right kind of technique be applied to each different kind of estimate. Learning when to use each technique, as well as learning each technique, requires some professional skills development. 

7. Estimation and planning are not the same thing, and you can estimate things that you can’t plan. 

Many of the examples given in support of #NoEstimates are actually indictments of overly detailed waterfall planning, not estimation. The simple way to understand the distinction is to remember that planning is about “how” and estimation is about “how much.” 

Can I “estimate” a chess game, if by “estimate” I mean how each piece will move throughout the game? No, because that isn’t estimation; it’s planning; it’s “how.”

Can I estimate a chess game in the sense of “how much”? Sure. I can collect historical data on the length of chess games and know both the average length and the variation around that average and predict the length of a game. 

More to the point, estimating software projects is not analogous to estimating one chess game. It’s analogous to estimating a series of chess games. People who are not skilled in estimation often assume it’s more difficult to estimate a series of games than to estimate an individual game, but estimating the series is actually easier. Indeed, the more chess games in the set, the more accurately we can estimate the set, once you understand the math involved. 
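The math behind that last claim can be sketched as follows. This is an illustration under a stated assumption, not something the article spells out: if game lengths are independent with the same mean and standard deviation, the expected total grows with n while the standard deviation of the total grows only with the square root of n, so the relative spread shrinks as 1/sqrt(n). The names and numbers below are hypothetical.

```scala
object SeriesEstimate {
  /**
   * Relative spread (standard deviation divided by mean) of the total
   * length of `games` games. Assumption: game lengths are independent
   * and share the same mean and standard deviation.
   */
  def relativeSpread(meanLength: Double, stdDev: Double, games: Int): Double = {
    require(games > 0 && meanLength > 0, "need at least one game and a positive mean")
    val totalMean = games * meanLength                        // grows like n
    val totalStdDev = stdDev * math.sqrt(games.toDouble)      // grows like sqrt(n)
    totalStdDev / totalMean
  }
}
```

For a hypothetical mean of 40 moves with a standard deviation of 20, SeriesEstimate.relativeSpread(40.0, 20.0, 1) is 0.5 (plus or minus 50% for one game), while SeriesEstimate.relativeSpread(40.0, 20.0, 100) is 0.05 (plus or minus 5% for a series of 100 games).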

8. You can estimate what you don’t know, up to a point. 

In addition to estimating “how much,” you can also estimate “how uncertain.” In the #NoEstimates discussions, people throw out lots of examples along the lines of, “My project was doing unprecedented work in Area X, and therefore it was impossible to estimate the whole project.” That isn’t really true. What you would end up with in cases like that is high variability in your estimate for Area X, and a common estimation mistake would be letting X’s uncertainty apply to the whole project rather than constraining its uncertainty just to Area X. 

Most projects contain a mix of precedented and unprecedented work, or certain and uncertain work. Decomposing the work, estimating uncertainty in different areas, and building up an overall estimate from that is one way of dealing with uncertainty in estimates. 
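One way to picture the decomposition is the small Scala sketch below. It is a hypothetical illustration rather than a technique prescribed by the article: the `Area` type, the staff-day figures, and the assumption that estimation errors in different areas are independent (so that variances add) are all introduced here for the example.

```scala
object DecomposedEstimate {
  /** One work area: expected effort and uncertainty (standard deviation), in staff-days. */
  final case class Area(name: String, expected: Double, stdDev: Double)

  /**
   * Combines per-area estimates into (total expected effort, total standard deviation).
   * Assumption: errors in different areas are independent, so variances add.
   */
  def combine(areas: Seq[Area]): (Double, Double) = {
    val total = areas.map(_.expected).sum
    val stdDev = math.sqrt(areas.map(a => a.stdDev * a.stdDev).sum)
    (total, stdDev)
  }
}
```

Suppose two precedented areas are estimated at 30 and 40 staff-days with standard deviations of 3 and 4, and the unprecedented Area X at 30 staff-days with a standard deviation of 12. Calling DecomposedEstimate.combine on those three areas gives (100.0, 13.0). Area X on its own is uncertain by 40%, but the project as a whole is uncertain by only 13%, because X's variability stays confined to X.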

9. Both estimation and control are needed to achieve predictability. 

Much of the writing on Agile development emphasizes project control over project estimation. I actually agree that project control is more powerful than project estimation; however, effective estimation usually plays an essential role in achieving effective control. 

To put this in Agile Manifesto-like terms:

We have come to value project control over project estimation, 
as a means of achieving predictability

As in the Agile Manifesto, we value both terms, which means we still value the term on the right. 

#NoEstimates seems to pay lip service to both terms, but the emphasis from the hashtag onward is really about discarding the term on the right. This is a case where I believe the right answer is both/and, not either/or.

10. People use the word "estimate" sloppily. 

No doubt. Lack of understanding of estimation is not limited to people tweeting about #NoEstimates. Business partners often use the word “estimate” to refer to what would more properly be called a “planning target” or “commitment.” Further, one common mistake software professionals make is trying to create estimates when the business is really asking for a commitment, or asking for a plan to meet a target, but using the word “estimate” to ask for that. 

We have worked with many companies to achieve organizational clarity about estimates, targets, and commitments. Clarifying these terms makes a huge difference in the dynamics around creating, presenting, and using software estimates effectively. 

11. Good project-level estimation depends on good requirements, and average requirements skills are about as bad as average estimation skills. 

A common refrain in Agile development is “It’s impossible to get good requirements,” and that statement has never been true. I agree that it’s impossible to get perfect requirements, but that isn’t the same thing as getting good requirements. I would agree that “It is impossible to get good requirements if you don’t have very good requirement skills,” and in my experience that is a common case. I would also agree that “Projects usually don’t have very good requirements,” as an empirical observation, but not as a normative statement that we should accept as inevitable. 

Like estimation skill, requirements skill is something that any true software professional should develop, and the state of the art in requirements at this time is far too advanced for even really smart people to invent everything they need to know on their own. Like estimation skill, a person is not going to learn adequate requirements skills by reading blog entries or watching short YouTube videos. Acquiring skill in requirements requires focused, book-length self-study or explicit training or both. 

Why would we care about getting good requirements if we’re Agile? Isn’t trying to get good requirements just waterfall? The answer is both yes and no. You can’t achieve good predictability of the combination of cost, schedule, and functionality if you don’t have a good definition of functionality. If your business truly doesn’t care about predictability (and some truly don’t), then letting your requirements emerge over the course of the project can be a good fit for business needs. But if your business does care about predictability, you should develop the skill to get good requirements, and then you should actually do the work to get them. You can still do the rest of the project using by-the-book Scrum, and then you’ll get the benefits of both good requirements and Scrum. 

12. The typical estimation context involves moderate volatility and a moderate level of unknowns.

Ron Jeffries writes, “It is conventional to behave as if all decent projects have mostly known requirements, low volatility, understood technology, …, and are therefore capable of being more or less readily estimated by following your favorite book.” 

I don’t know who said that, but it wasn’t me, and I agree with Ron that that statement doesn’t describe most of the projects that I have seen. 

I think it would be more true to say, “The typical software project has requirements that are knowable in principle, but that are mostly unknown in practice due to insufficient requirements skills; low volatility in most areas with high volatility in selected areas; and technology that tends to be either mostly leading edge or mostly mature; …; and are therefore amenable to having both effective requirements work and effective estimation work performed on those projects, given sufficient training in both skill sets.”

In other words, software projects are challenging, and they’re even more challenging if you don’t have the skills needed to work on them. If you have developed the right skills, the projects will still be challenging, but you’ll be able to overcome most of the challenges or all of them. 

Of course there is a small percentage of projects that do have truly unknowable requirements and across-the-board volatility. I consider those to be corner cases. It’s good to explore corner cases, but also good not to lose sight of which cases are most common. 

13. Responding to change over following a plan does not imply not having a plan. 

It’s amazing that in 2015 we’re still debating this point. Many of the #NoEstimates comments literally emphasize not having a plan, i.e., treating 100% of the project as emergent. They advocate a process—typically Scrum—but no plan beyond instantiating Scrum. 

According to the Agile Manifesto, while agile is supposed to value responding to change, it also is supposed to value following a plan. Doing no planning at all is not only inconsistent with the Agile Manifesto, it also wastes some of Scrum's capabilities. One of the amazingly powerful aspects of Scrum is that it gives you the ability to respond to change; and that doesn’t imply that you need to avoid committing to plans in the first place. 

My company and I have seen Agile adoptions shut down in some companies because an Agile team is unwilling to commit to requirements up front or refuses to estimate up front. As a strategy, that’s just dumb. If you fight your business up front about providing estimates, even if you win the argument that day, you will still get knocked down a peg in the business’s eyes. 

Instead, use your velocity to estimate how much work you can do over the course of a project, and commit to a product backlog based on your demonstrated capacity for work. Your business will like that. Then, later, when your business changes its mind—which it probably will—you’ll be able to respond to change. Your business will like that even more. Wouldn’t you rather look good twice than look bad once? 
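A minimal sketch of that commitment step, assuming a backlog already sorted by priority and sized in story points. The `Story` type, the function name, and the figures are hypothetical, and real planning would normally leave some buffer rather than fill capacity exactly.

```scala
object BacklogCommitment {
  /** A backlog item with its size in story points. */
  final case class Story(title: String, points: Int)

  /**
   * Returns the leading stories, in priority order, whose cumulative
   * points fit within velocity * sprints. Stops at the first story
   * that would exceed capacity, so priority order is never skipped.
   */
  def commit(backlog: Seq[Story], velocity: Int, sprints: Int): Seq[Story] = {
    val capacity = velocity * sprints
    // Running total of points after each story (the leading 0 is dropped).
    val cumulative = backlog.scanLeft(0)(_ + _.points).tail
    backlog.zip(cumulative)
      .takeWhile { case (_, total) => total <= capacity }
      .map(_._1)
  }
}
```

With a demonstrated velocity of 20 points and 2 sprints, capacity is 40 points. Given a backlog of Login (8), Search (13), Checkout (21), and Reports (13), the running totals are 8, 21, 42, and 55, so the commitment is Login and Search. When the business later changes its mind, the backlog is reordered and the same calculation is rerun.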

14. Scrum provides better support for estimation than waterfall ever did, and there does not have to be a trade off between agility and predictability. 

Some of the #NoEstimates discussion seems to interpret challenges to #NoEstimates as challenges to the entire ecosystem of Agile practices, especially Scrum. Many of the comments imply that predictability comes at the expense of agility. The examples cited to support that are mostly examples of unskilled misapplications of estimation practices, so I see them as additional examples of people not understanding estimation very well. 

The idea that we have to trade off agility to achieve predictability is a false trade off. In particular, if no one had ever uttered the word “agile,” I would still want to use Scrum because of its support for estimation and predictability. 

The combination of story pointing, product backlog, velocity calculation, short iterations, just-in-time sprint planning, and timely retrospectives after each sprint creates a nearly perfect context for effective estimation. Scrum provides better support for estimation than waterfall ever did. 

If a company truly is operating in a high uncertainty environment, Scrum can be an effective approach. In the more typical case in which a company is operating in a moderate uncertainty environment, Scrum is well-equipped to deal with the moderate level of uncertainty and provide high predictability (e.g., estimation) at the same time. 

15. There are contexts where estimates provide little value. 

I don’t estimate how long it will take me to eat dinner, because I know I’m going to eat dinner regardless of what the estimate says. If I have a defect that keeps taking down my production system, the business doesn’t need an estimate for that because the issue needs to get fixed whether it takes an hour, a day, or a week. 

The most common context I see where estimates are not done on an ongoing basis and truly provide little business value is online contexts, especially mobile, where the cycle times are measured in days or shorter, the business context is highly volatile, and the mission truly is, “Always do the next most useful thing with the resources available.” 

In both these examples, however, there is a point on the scale at which estimates become valuable. If the work on the production system stretches into weeks or months, the business is going to want and need an estimate. As the mobile app matures from one person working for a few days to a team of people working for a few weeks, with more customers depending on specific functionality, the business is going to want more estimates. Enjoy the #NoEstimates context while it lasts; don’t assume that it will last forever. 

16. This is not religion. We need to get more technical and economic about software discussions. 

I’ve seen #NoEstimates advocates treat these questions of requirements volatility, estimation effectiveness, and supposed tradeoffs between agility and predictability as value-laden moral discussions in which their experience with usually-bad requirements and usually-bad estimates calls for an iterative approach like pure Scrum, rather than a front-loaded approach like Scrum with a pre-populated product backlog. In these discussions, “Waterfall” is used as an invective, where the tone of the argument is often more moral than economic. That religion isn’t unique to Agile advocates, and I’ve seen just as much religion on the non-Agile sides of various discussions. I’ve appreciated my most recent discussion with Ron Jeffries because he hasn’t done that. It would be better for the industry at large if people could stay more technical and economic more often. 

For my part, software is not religion, and the ratio of work done up front on a software project is not a moral issue. If we assume professional-level skills in agile practices, requirements, and estimation, the decision about how much work to do up front should be an economic decision based on cost of change and value of predictability. If the environment is volatile enough, then it’s a bad economic decision to do lots of up front requirements work just to have a high percentage of requirements spoil before they can be implemented. If there’s little or no business value created by predictability, that also suggests that emphasizing up front estimation work would be a bad economic decision.

On the other hand, if the business does value predictability, then how we support that predictability should also be an economic decision. If we do a lot of the requirements work up front, and some requirements spoil, but most do not, and that supports improved predictability, and the business derives value from that, that would be a good economic choice. 

The economics of these decisions are affected by the skills of the people involved. If my team is great at Scrum but poor at estimation and requirements, the economics of up front vs. emergent will tilt one way. If my team is great at estimation and requirements but poor at Scrum, the economics might tilt the other way. 

Of course, skill sets are not divinely dictated or cast in stone; they can be improved through focused self-study and training. So we can treat the question of whether we should invest in developing additional skills as an economic issue too. 

What is the cost of training staff to reach proficiency in estimation and requirements? Does the cost of achieving proficiency exceed the likely benefits that would derive from proficiency? That goes back to the question of how much the business values predictability. If the business truly places no value on predictability, there won’t be any ROI from training staff in practices that support predictability. But I do not see that as the typical case. 

My company and I can train software professionals to become proficient in both requirements and estimation in about a week. In my experience most businesses place enough value on predictability that investing a week to make that option available provides a good ROI to the business. Note: this is about making the option available, not necessarily exercising the option on every project. 

My company and I can also train software professionals to become proficient in a full complement of Scrum and other Agile technical practices in about a week. That produces a good ROI too. In any given case, I would recommend both sets of training. If I had to recommend only one or the other, sometimes I would recommend starting with the Agile practices. But I wouldn’t recommend stopping with them. 

Skills development in practices that support predictability vs. practices that support agility is not an either/or decision. A truly agile business would be able to be flexible when needed, or predictable when needed. A true software professional will be most effective when skilled in both skill sets. 

17. Agility plus predictability is better than agility alone. 

If you think your business values agility only, ask your business what it values. Businesses vary, and you might work in a business that truly does value agility over predictability or that values agility exclusively. 

In some cases, businesses will value predictability over agility. Odds are that your business actually values both agility and predictability. The point is, ask the business, don’t just assume it’s one or the other. 

I think it’s self-evident that a business that has both agility and predictability will outperform a business that has agility only. We need to get past the either/or thinking that limits us to one set of skills or the other and embrace both/and thinking that leads us to develop the full set of skills needed to become true software professionals. 


Re-Read Saturday: The Mythical Man-Month, Part 5 – The Second-System Effect

The Mythical Man-Month

In the fifth essay of The Mythical Man-Month, titled The Second-System Effect, Brooks circles back to a question he left unanswered in the essay Aristocracy, Democracy and System Design. The question was: If functional specialties are split, what bounds are left to constrain the possibility of a runaway architecture and design? The thought is that without the pressure of implementation, an architect does not have to consider constraints.

Brooks begins the essay by establishing a context to consider the second-system effect with a section titled, “Interactive discipline for the architect”. All architects work within a set of constraints typically established by project stakeholders. Operating within these constraints requires self-discipline. In order to drive home the point, Brooks uses the analogy of a building architect. When designing a building an architect works against a budget and other constraints, such as a location and local ordinances. Implementation falls to the general contractor and subcontractors. In order to test the design, the architect will ask for estimates from the contractors (analogous to the teams in a software environment). Estimates provide the architect with the feedback needed to test ideas and assumptions and to assure the project’s stakeholders that their build can be completed within the constraints. When estimates come in too high, the architect will need to either alter the design or challenge the estimates.

When an architect challenges an estimate, he or she could be seen as leveraging the power hierarchy established by separating functions (see last week’s essay). To challenge an estimate successfully, however, the architect needs to remember four points.

  1. The contractors (development personnel in software) are responsible for implementation. The architect can only suggest, not dictate, changes in implementation. Force will generate a power imbalance that will generate poor behaviors.
  2. When challenging an estimate be prepared to suggest a means of implementation, but be willing to accept other ways to implement to achieve the same goal. Recognize that if you make a suggestion before being asked that you will establish an anchor bias and may not end up with an optimal solution.
  3. When making suggestions, make them discreetly. Quiet leadership is often most effective.
  4. The architect should be prepared to forgo credit for the changes generated as estimates and constraints are negotiated. Brooks pointed out that in the end it is not about the architect, but rather about the solution.

The first part of the essay established both the context and framework for developing the self-discipline needed to control runaway design. Brooks concludes the essay by exposing the exception he observed. Brooks called this exception the second-system effect. In a nutshell, the second-system effect reflects the observation that a first work is apt to be spare, whereas second attempts tend to be overdesigned as frills and signature embellishments start to creep in. Brooks points out this behavior can often be seen in scenarios in which a favorite function or widget is continually refined even as it becomes obsolete. For example, why are designers spending precious design time on the steering wheel for the Google self-driving car? (It should be noted that recently the steering wheel was removed from the self-driving car, then put back in . . . with a brake.)

How can you avoid the second-system effect? The simplest approach would be to never hire an architect with only one design job under their belt, thereby avoiding the second-system effect. Unfortunately, over time that solution is a non-starter. Who would replace the system architects who retire or move to other careers? Other techniques, like ensuring everyone is aware of the effect or stealing an idea from Extreme Programming and pairing architects with other, more seasoned architects or business analysts, are far more effective and scalable.

Brooks provides the punch line for the essay in the first two paragraphs. A project or organization must establish an environment in which self-discipline and communication exist to reduce the potential for runaway design.

Previous installments of Re-Read Saturday for The Mythical Man-Month

Introductions and The Tar Pit

The Mythical Man-Month (The Essay)

The Surgical Team

Aristocracy, Democracy and System Design

Categories: Process Management