
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Building A Backlog: Notes on Showing For Gathering Requirements

The evacuation instructions could be a form of paper prototype.


The most powerful single technique for generating requirements is showing users and stakeholders something and collecting reactions. There are numerous techniques for developing something to generate feedback, ranging from functional code that could be implemented in production at one extreme, to functional prototypes in the middle, to paper prototypes at the other end of the scale. Functional code is used to generate feedback to evolve the backlog in Agile projects; however, showing techniques are not used as often as they should be to generate the initial requirements backlog. The reason these techniques are not used is the perceived level of effort needed to generate the prototypes, or the impetus to begin building the solution immediately.

The power of the showing techniques is based on the theory that many people only know what they want when they see it. The process of generating feedback and requirements is also risk management. A prototype reduces the risk that the project will either build the wrong thing or build the right thing wrong. Functional prototypes are also, to an extent, useful to prove that a solution is at least conceptually feasible (prototypes are generally too small and not fully functional, and therefore do not truly serve to prove technical feasibility). Paper prototypes address generating requirements and reducing risk, but are faster and cheaper to generate because there is no code or code-related infrastructure. A very simple example of a paper prototype for a customer relationship management (CRM) system might be a set of drawings of screens showing the rough flow of work. Users use the paper screens to imagine the process and discuss the flow. In comparison, a functional prototype would have mock screens on a computer showing the system flow, but typically without any background functionality.

The first issue with prototypes is that developing them requires time and effort. Project teams are often presented with a goal, a budget and a deadline. In most corporate IT organizations, performance against the project budget is considered a critical measure of progress (and in the longer term, of project success). Therefore teams and project/program managers mercilessly manage the budget to improve the possibility of project success. Unless prototyping is built into the budget and the planned approach for the project, there will be pressure to use the less costly, albeit less effective, asking techniques to generate requirements. Teams, sponsors and project managers make a rational choice to manage the budget risk against the possibility of not generating a good enough backlog to get started. However, generating requirements using prototypes is a process that can be used to balance requirements risk against budget risk. The higher the risk of not having a thorough early backlog, the more important techniques like prototyping become to mitigate that risk.

The second issue that the use of prototypes faces is what I call “starting fever.” In many methods, including Agile, the whole cross-functional project team is assembled to start the project. There are many individuals who don’t believe that gathering requirements is important; therefore, unless there is something for them to build or test, they will find other things to do. There are numerous ways to deal with this type of structural slack; two extreme cases will illustrate the range. The first solution is to have a subset of the team (like the Three Amigos) generate the initial backlog before kicking the project off. The second solution is to have the whole team spend a day as the project kicks off to generate a quick initial backlog and then use the demonstrations at the end of each sprint to continue to flesh out the backlog. I lean towards scenario one, or variants in which the other portions of the team work on framing the technical and physical infrastructure.

Another way “starting fever” can be triggered is the panic caused by a fixed budget, fixed scope and fixed date project that someone actually said yes to before the requirements were known. Before you protest: regardless of how much every developer, tester, BA, project manager or CIO believes this is an irrational situation, such projects happen (I see this as the norm in some organizations), and they cause teams to abandon good practices like prototyping. In the long run, project teams have to pick up the pieces when these types of projects have problems. When teams are put under immediate schedule, budget and scope pressure, leaping into action and then creating and firming up a backlog later can look like a great solution. Action is still equated to progress.

Prototyping is an effective tool for driving out requirements that can’t be expressed until they are seen. The backlog building techniques that show the users and stakeholders something to react to also serve to mitigate the risk of building the wrong thing or the right thing wrong. The power of these techniques is offset by the cost and time required to generate prototypes and a fear that unless you are building something, the project has not really started and the due date is at risk.


Categories: Process Management

R: ggplot – Plotting back to back charts using facet_wrap

Mark Needham - 8 hours 48 min ago

Earlier in the week I showed a way to plot back to back charts using R’s ggplot library, but looking back on the code it felt a bit hacky to ‘glue’ two charts together using a grid.

I wanted to find a better way.

To recap, I came up with the following charts showing the RSVPs to Neo4j London meetup events using this code:

[Chart: the back to back ‘yes’/‘no’ RSVP charts from the earlier post]
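The code from that earlier post isn’t reproduced here, but as a rough, hypothetical sketch of the ‘gluing’ style it replaced (assuming two data frames, yesRSVPs and noRSVPs, each with a difference column, and the gridExtra package), it looked something like this:

library(ggplot2)
library(gridExtra)  # provides grid.arrange for arranging separate plots
 
# Hypothetical data frames standing in for the separate 'yes' and 'no' query results
p1 <- ggplot(yesRSVPs, aes(x = difference)) + geom_bar(binwidth = 1, fill = "#00FF00")
p2 <- ggplot(noRSVPs,  aes(x = difference)) + geom_bar(binwidth = 1, fill = "#FF0000")
 
# 'Glue' the two independent charts together side by side
grid.arrange(p1, p2, ncol = 2)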

The first thing we need to do to simplify chart generation is to return ‘yes’ and ‘no’ responses in the same cypher query, like so:

timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01", tz = "GMT")
 
query = "MATCH (e:Event)<-[:TO]-(response {response: 'yes'})
         WITH e, COLLECT(response) AS yeses
         MATCH (e)<-[:TO]-(response {response: 'no'})<-[:NEXT]-()
         WITH e, COLLECT(response) + yeses AS responses
         UNWIND responses AS response
         RETURN response.time AS time, e.time + e.utc_offset AS eventTime, response.response AS response"
allRSVPs = cypher(graph, query)
allRSVPs$time = timestampToDate(allRSVPs$time)
allRSVPs$eventTime = timestampToDate(allRSVPs$eventTime)
allRSVPs$difference = as.numeric(allRSVPs$eventTime - allRSVPs$time, units="days")

The query is a bit more involved because we want to capture the ‘no’ responses from people who initially said yes, which is why we check for a ‘NEXT’ relationship when looking for the negative responses.

Let’s inspect allRSVPs:

> allRSVPs[1:10,]
                  time           eventTime response difference
1  2014-06-13 21:49:20 2014-07-22 18:30:00       no   38.86157
2  2014-07-02 22:24:06 2014-07-22 18:30:00      yes   19.83743
3  2014-05-23 23:46:02 2014-07-22 18:30:00      yes   59.78053
4  2014-06-23 21:07:11 2014-07-22 18:30:00      yes   28.89084
5  2014-06-06 15:09:29 2014-07-22 18:30:00      yes   46.13925
6  2014-05-31 13:03:09 2014-07-22 18:30:00      yes   52.22698
7  2014-05-23 23:46:02 2014-07-22 18:30:00      yes   59.78053
8  2014-07-02 12:28:22 2014-07-22 18:30:00      yes   20.25113
9  2014-06-30 23:44:39 2014-07-22 18:30:00      yes   21.78149
10 2014-06-06 15:35:53 2014-07-22 18:30:00      yes   46.12091

We’ve returned the actual response with each row so that we can distinguish between responses. It will also come in useful for pivoting our single chart later on.

The next step is to get ggplot to generate our side by side charts. I started off by plotting both types of response on the same chart:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  geom_bar(binwidth=1)

[Chart: ‘yes’ and ‘no’ responses stacked in a single bar chart]

This one stacks the ‘yes’ and ‘no’ responses on top of each other which isn’t what we want as it’s difficult to compare the two.

What we need is the facet_wrap function which allows us to generate multiple charts grouped by key. We’ll group by ‘response’:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  geom_bar(binwidth=1) + 
  facet_wrap(~ response, nrow=2, ncol=1)

[Chart: separate facet panels for the ‘no’ and ‘yes’ responses]

The only thing we’re missing now is the red and green colours which is where the scale_fill_manual function comes in handy:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  scale_fill_manual(values=c("#FF0000", "#00FF00")) + 
  geom_bar(binwidth=1) +
  facet_wrap(~ response, nrow=2, ncol=1)

[Chart: facet panels with red and green fill colours]

If we want to show the ‘yes’ chart on top we can pass in an extra parameter to facet_wrap to change where it places the highest value:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  scale_fill_manual(values=c("#FF0000", "#00FF00")) + 
  geom_bar(binwidth=1) +
  facet_wrap(~ response, nrow=2, ncol=1, as.table = FALSE)

[Chart: facet panels with the ‘yes’ panel on top]
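As an alternative sketch (not from the original post), the same ‘yes on top’ ordering can be achieved by reordering the factor levels of the response column, since facet_wrap lays out panels in factor level order:

# Reorder the levels so 'yes' comes first; the facet panels and the manual fill
# colours both follow the factor level order, so the colour mapping swaps too.
allRSVPs$response = factor(allRSVPs$response, levels = c("yes", "no"))
 
ggplot(allRSVPs, aes(x = difference, fill = response)) + 
  scale_fill_manual(values = c("#00FF00", "#FF0000")) + 
  geom_bar(binwidth = 1) +
  facet_wrap(~ response, nrow = 2, ncol = 1)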

We could go one step further and group by response and day. First let’s add a ‘day’ column to our data frame:

allRSVPs$dayOfWeek = format(allRSVPs$eventTime, "%A")

And now let’s plot the charts using both columns:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  scale_fill_manual(values=c("#FF0000", "#00FF00")) + 
  geom_bar(binwidth=1) +
  facet_wrap(~ response + dayOfWeek, as.table = FALSE)

[Chart: facet panels grouped by response and day of week]

The distribution of dropouts looks fairly similar for all the days – Thursday is just an order of magnitude below the other days because we haven’t run many events on Thursdays so far.

At a glance it doesn’t appear that as many people sign up for Thursday events on the day or one day before.

One potential hypothesis is that people have things planned for Thursday whereas they decide more last minute what to do on the other days.

We’ll have to run some more events on Thursdays to see whether that trend holds out.

The code is on GitHub if you want to play with it.

Categories: Programming

Why Project Management is a Control System

Herding Cats - Glen Alleman - 9 hours 56 min ago

When it is mentioned that project management is a control system, many in the agile world wince. But in fact a project is a control system, a closed loop control system.

Here's how it works.

  • We have a goal, a target, some desired outcome.
  • The desired outcome usually comes with a budget - some expected cost.
  • It also comes with a time frame for achieving that desired outcome.
  • That outcome usually has - or should, if we're doing it right - a beneficial result.

Each of these elements has some unit of measure:

  • Time - the day we need to deliver value or a capability to meet the business goals or accomplish a mission. There can of course be incremental deliverables, but those also have a time element.
  • Outcome - might be the accomplishment of a mission or fulfillment of the business strategy.
  • Cost - there is no way to determine the value of anything without knowing its cost. This is the foundation of microeconomics. Not knowing cost can only happen by intentionally ignoring the principles of micro-economics. It's done, I know, but Don't Do Stupid Things On Purpose.

Here's a small example of incremental delivery of value in an enterprise domain:

Project Maturity Flow is the Incremental Delivery of Business Value

The accomplishment of a mission or fulfillment of a business strategy can be called the value produced by the project. In the picture above the value delivered to the business is incremental, but fully functional on delivery to accomplish the business goal. These goals are defined in Measures of Effectiveness and Measures of Performance, and these measures are derived from the business strategy or mission statement. So if I want a fleet of cars for my taxi service, producing a skateboard, then a bicycle, is not likely to accomplish the business goal.

The term value alone is nice, but not sufficient. Value needs to have some unit of measure. Revenue, cost reduction, environmental cleanup, education of students, reduction of disease, the processing of sales orders at a lower cost, flying the 747 to its destination with minimal fuel. Something that can be assessed in tangible units of measure.

In exchange for this value, with its units of measure, we have the cost of producing this value.

To assess the value or the cost, we need to know the other item. We can't know the value of something without knowing its cost. We can't know if the cost is appropriate without knowing the value produced by the cost.

This is one principle of the Microeconomics of software development.

The process of deciding between choices about cost and value - the trade space between cost and value - starts with information about both cost and value. This information lives in the realm of uncertainty before and during the project's life-cycle. It is only known on the cost side after the project completes. And the value may never be known in the absence of some uncertainty as to the actual measure. This is also a principle of microeconomics - the measures we use to make decisions are random variables.
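As a tiny illustrative sketch only (the distributions and numbers below are assumptions, not figures from this post), treating cost and value as random variables when weighing a choice looks like this in R:

# Illustrative only: cost and value modelled as random variables
set.seed(11)
n <- 10000
cost  <- rnorm(n, mean = 500000, sd = 75000)   # assumed cost distribution
value <- rnorm(n, mean = 650000, sd = 150000)  # assumed value distribution
 
net_benefit <- value - cost
mean(net_benefit)      # expected net benefit of the choice
mean(net_benefit > 0)  # probability the value exceeds the cost

Both numbers are estimates, and the decision rests on the probability, not on a single point value.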

To determine the value of these random variables we need to estimate, since of course they are random. With these random variables - the cost of producing value and the value exchanged for the cost - the next step in projects is to define what we want the project to do:

  • The desired outcome in the form of capabilities.
  • The desired cost for the desired value.
  • The desired time for the delivery of that desired value for the desired cost.
  • The confidence that we can show up on or before the desired time, at or below the desired cost to deliver the desired value, and deliver the needed capabilities to fulfill the mission.

The actual delivery of this value can be incremental, iterative, evolutionary, linear, big bang, or other ways. Software can many times be iterative or incremental; pouring concrete and welding pipe can as well. Building the Interstate might be incremental; the high rise usually needs to wait for the occupancy permit before the value is delivered to the owners. There is no single approach.

For each of these a control system is needed to assure progress to plan is being made. The two types of control systems are Open Loop and Closed Loop. The briefing below speaks to those and their use.

So in the end, when we hear about a control loop applied to project management, we'll know about Open and Closed Loop control, and that there can't be a Closed Loop control process without the following (a small illustrative sketch follows the list):
  • A desired outcome - the target budget, date, or some performance parameter.
  • Measures of progress to plan - what has the project been doing to date? This should be measured in units of physical percent complete. Working software is a popular platitude in the agile community. But is it the right working software to fulfill the needed capabilities developed through a Capabilities Based Planning process?
  • Variances between actual and plan - with the target outcomes, capture actuals and calculate variances. These variances are the information needed to make business decisions.
    • These decisions are based not only on past performance - the actuals - but on future performance - the estimates of future performance given the past performance.
    • This future performance must also be risk adjusted.
    • Both the future performance and the uncertainties that create risks to this performance are statistical processes, producing probabilistic outcomes, that are integrated into the decision making processes.
  • Take Corrective Actions - to keep the project inside the white lines. Using the past performance or cost, schedule, and technical outcomes, the assessment of variances, the role of management is to take corrective action to meet the desired outcomes of the project.
    • The cost goals - we have a target budget that is used for assessing the Return on Investment or target product margin.
    • The schedule goals - we have a planned go-live or release date that has been communicated to the market or the customer.
    • The Needed Capabilities - for the product (internal or external) to earn its keep.
    • Adjustments - to each of these attributes require management action, and assessment of this action to actually get back to GREEN, in our parlance, and keep the project headed to success.
  • Monitor these actions against plan - once corrections are taken, management must still monitor the project to assure it stays on plan.
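For illustration only (the measures, variable names, and thresholds below are invented, not from this post), one pass through such a closed loop can be sketched in R as:

# One pass through a closed loop project control cycle (illustrative numbers)
planned_percent_complete <- 0.60   # where the plan says we should be
actual_percent_complete  <- 0.52   # physical percent complete measured to date
planned_cost <- 600000             # planned spend to date
actual_cost  <- 640000             # actual spend to date
 
schedule_variance <- actual_percent_complete - planned_percent_complete
cost_variance     <- planned_cost - actual_cost
 
# Comparing variances to thresholds and feeding the result back into the plan
# is what closes the loop.
threshold <- 0.05
if (abs(schedule_variance) > threshold || cost_variance < 0) {
  message("Variance exceeds threshold - take corrective action and re-plan")
} else {
  message("Within thresholds - keep monitoring against plan")
}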
Related articles: Elements of Project Success, The Value of Information, Control Systems - Their Misuse and Abuse, Four Critical Elements of Project Success, Critical Thinking Skills Needed for Any Change To Be Effective, Seven Immutable Activities of Project Success, How Not To Make Decisions Using Bad Estimates, Why is Statistical Thinking Hard?
Categories: Project Management

Marketing scrum vs IT scrum - a report published and presented at agile 2014

Xebia Blog - 13 hours 55 min ago

As we know, Scrum is the perfect framework for IT / software development projects to learn, adapt to change and deliver great software of value, faster.

But is Scrum also usable outside of software development? Can we apply similar or maybe even the same principles in other departments in the enterprise?

Yes, we can! And yes there are differences but there are also a lot of similarities.

We (Remco and I) successfully implemented Scrum in the marketing departments of two large companies: the Dutch AAA and ING Bank. Both companies are now using Scrum for the development of new campaigns, their full commercial expressions and even at the product development level. They wanted a faster time to market, more ownership, and greater innovation. How did we approach and realize a transition with those goals in the marketing environment? And what are the results?

So when we are not delivering software but other things, how does Scrum change? Well, a great deal actually. The people working in these other departments are, in general, quite different to those in Software Development (and yes more than you would expect). This means coaches or change agents need to take another approach.

Since the people are different, it is possible to go faster or ‘deeper’ in certain areas. Entrepreneurial skills or ambitions are more present in marketing. This gives a sense of ‘act first apologize later’, taking ownership, a higher drive to succeed, and upfront and willing behavior. Scrumming here means thinking more about business goals and KPIs (how to go from department to scrumteam goals for example). After that the fun begins…

I will be speaking about this topic at Agile 2014. A great honor of course to be standing there. I will also attend the conference and will therefore try to post some updates here.

To read more about this topic you can read my publication about marketing scrum. It contains the extensive research paper I published about this story. Please feel free to give me comments and questions either about Agile 2014 or the paper.

 

Enjoy reading the paper:

Marketing scrum vs IT scrum – two marketing case studies who now ‘act first and apologize later'

 

Stuff The Internet Says On Scalability For July 25th, 2014

Hey, it's HighScalability time:


It's systems all the way down. Bugs That Call Us Home.
  • 1 million users in just 4 days: Yo;  30 billion: Pinterest Pins
  • Quotable Quotes:
    • @GlennF: Amazon still dreams it is a startup, like a dog dreaming of chasing rabbits, twitching its legs while asleep.
    • @mfdii: Nobody knows how git works. We all just type in commands like monkeys trying to write Shakespeare. #devopsdays 
    • Benedict Evans: When you pull these strands together, smartphones don't just increase the size of the internet by 2x or 3x, but more like 5x or 10x. It's not just how many devices, but how different those devices are, that has the multiplier effect.
    • @Aaronontheweb: @codinghorror I broke this rule for myself last week. Spent 3 days fixing a problem that we finally solved by a $0.06/hour AWS bill increase
    • Physicist George Ellis: Barring something very unforeseen – the possible tests of the very large and the very small are coming towards the limits of whatever will be possible.
    • The Master Switch: Once the industry had concluded that its profits could be maximized if more people listened to fewer stations, the government, acting as if the business of America were only business, did the industry’s bidding, showing only the most feeble awareness of its consequences for the American ideal of free expression.

  • Ex-Googlers try to recreate Spanner with CockroachDB (awesome name!), which is A Scalable, Geo-Replicated, Transactional Datastore. The design is here and looks good. There's an article on Wired. Good discussion on HackerNews. A globally distributed transactional database it's not yet, but it's early days. After all, they can only work in the dark.

  • Useful post on Handling 1 Billion requests a week with Symfony2. Symfony2 provides good performance and a nice development environment. HAProxy distributes to application servers. Varnish runs on every application server to keep high availability – without having a single point of failure (SPOF). Redis and MySQL for storing data. MySQL is mostly used as a third-tier cache layer (Varnish > Redis > MySQL) for non-expiring resources.

  • The truest form of the Interest Graph on the net? Details on how Pinterest scales their data infrastructure to create a personalized discovery engine. 20 terabytes of new data each day. 10 petabytes of data in S3. 100 regular Mapreduce users run over 2,000 jobs each day through Qubole. 6 standing Hadoop clusters comprised of over 3,000 nodes. 

  • Just like Captain Kirk. Shifts In Algorithm Design: Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful. The brilliance of the current approach is that we can change the problem. 

  • Wet Computing Could Put a Terabyte in a Tablespoon: Researchers from the University of Michigan and New York University demonstrated how plastic nanoparticles, deposited in a liquid, can form a one-bit cluster—the essential building block for information storage. It's called "wet computing," and the technique mimics other biological processes found in nature, like DNA in living cells.

  • Daniel Eloff: The world is not just going massively multicore, it's going heterogeneous core. The one core fits all model of programming is going away. Big performance and efficiency gains can be had from splitting your application among different types of specialized processors. Programmable hardware with FPGAs seems like a natural extension of this trend.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

The Drag of Old Mental Models on Innovation and Change

“Don’t worry about people stealing your ideas. If your ideas are any good, you’ll have to ram them down people’s throats.” — Howard Aiken

It's not a lack of risk taking that holds innovation and change back. 

Even big companies take big risks all the time.

The real barrier to innovation and change is the drag of old mental models.

People end up emotionally invested in their ideas, or they are limited by their beliefs or their world views.  They can't see what's possible with the lens they look through, or fear and doubt hold them back.  In some cases, it's even learned helplessness.

In the book The Future of Management, Gary Hamel shares some great insight into what holds people and companies back from innovation and change.

Yesterday’s Heresies are Tomorrow’s Dogmas

Yesterday's ideas that were profoundly at odds with what is generally accepted, eventually become the norm, and then eventually become a belief system that is tough to change.

Via The Future of Management:

“Innovators are, by nature, contrarians.  Trouble is, yesterday's heresies often become tomorrow's dogmas, and when they do, innovation stalls and the growth curve flattens out.”

Deeply Held Beliefs are the Real Barrier to Strategic Innovation

Success turns beliefs into barriers by cementing ideas that become inflexible to change.

Via The Future of Management:

“... the real barrier to strategic innovation is more than denial -- it's a matrix of deeply held beliefs about the inherent superiority of a business model, beliefs that have been validated by millions of customers; beliefs that have been enshrined in physical infrastructure and operating handbooks; beliefs that have hardened into religious convictions; beliefs that are held so strongly, that nonconforming ideas seldom get considered, and when they do, rarely get more than grudging support.”

It's Not a Lack of Risk Taking that Holds Innovation Back

Big companies take big risks every day.  But the risks are scoped and constrained by old beliefs and the way things have always been done.

Via The Future of Management:

“Contrary to popular mythology, the thing that most impedes innovation in large companies is not a lack of risk taking.  Big companies take big, and often imprudent, risks every day.  The real brake on innovation is the drag of old mental models.  Long-serving executives often have a big chunk of their emotional capital invested in the existing strategy.  This is particularly true for company founders.  While many start out as contrarians, success often turns them into cardinals who feel compelled to defend the one true faith.  It's hard for founders to credit ideas that threaten the foundations of the business models they invented.  Understanding this, employees lower down self-edit their ideas, knowing that anything too far adrift from conventional thinking won't win support from the top.  As a result, the scope of innovation narrows, the risk of getting blindsided goes up, and the company's young contrarians start looking for opportunities elsewhere.”

Legacy Beliefs are a Much Bigger Liability When It Comes to Innovation

When you want to change the world, sometimes it takes a new view, and existing world views get in the way.

Via The Future of Management:

“When it comes to innovation, a company's legacy beliefs are a much bigger liability than its legacy costs.  Yet in my experience, few companies have a systematic process for challenging deeply held strategic assumptions.  Few have taken bold steps to open up their strategy process to contrarian points of view.  Few explicitly encourage disruptive innovation.  Worse, it's usually senior executives, with their doctrinaire views, who get to decide which ideas go forward and which get spiked.  This must change.”

What you see, or can’t see, changes everything.

You Might Also Like

The New Competitive Landscape

The New Realities that Call for New Organizational and Management Capabilities

Who’s Managing Your Company

Categories: Architecture, Programming

Playing with two most interesting new features of elasticsearch 1.3.0

Gridshore - 19 hours 57 min ago

Just a few days ago elasticsearch released version 1.3.0 of their flagship product. Two of the new features stand out. The first is the long awaited Top hits aggregation. Basically this is what is called grouping: you want to group certain items based on one characteristic, but within this group you want to have the best matching result(s) based on score. Another very important feature is the improved support for scripts: better security options when using scripts, through sandboxed script languages.

In this blogpost I am going to explain and show the top_hits feature as well as the new scripting support.


Top hits

I am going to show a very simple example of top hits using my music index. This index contains all the songs I have in my iTunes library. The first step is to find songs by genre; the following query gives (the default) 10 hits based on the match_all query, plus the terms aggregation as requested.

GET /mymusic/_search
{
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 10
      }
    }
  }
}

The response is of the format:

{
	"hits": {},
	"aggregations": {
		"byGenre": {
			"buckets": [
				{"key":"rock","doc_count":1910},
				...
			]
		}
    }
}

Now we add a query to the request, for songs containing the word love in the title.

GET /mymusic/_search
{
  "query": {
    "match": {
      "name": "love"
    }
  }, 
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 10
      }
    }
  }
}

Now we have fewer hits, but still a number of buckets and the number of songs that match our query within each bucket. The biggest change is the score in the returned hits. In the previous query the score was always 1; now the score is different due to the query we execute. The highest score now is the song Love by The Mission. The genre for this song is Rock and the song is from the year 1990. Time to introduce the top hits aggregation. With this query we can return the top song containing the word love in the title per genre:

GET /mymusic/_search
{
  "query": {
    "match": {
      "name": "love"
    }
  },
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 5
      },
      "aggs": {
        "topFoundHits": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

Again we get hits, but they are not different from the query before. The interesting part is in the aggs part. Here we add a sub aggregation to the byGenre aggregation. This aggregation is called topFoundHits and is of type top_hits. We only return the best hit per genre. The next code block shows the part of the response with the top hits; I removed the content of the _source field in the top_hits to keep the response shorter.

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 141,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "byGenre": {
         "buckets": [
            {
               "key": "rock",
               "doc_count": 52,
               "topFoundHits": {
                  "hits": {
                     "total": 52,
                     "max_score": 4.715253,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "4147",
                           "_score": 4.715253,
                           "_source": {
                              "name": "Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "pop",
               "doc_count": 39,
               "topFoundHits": {
                  "hits": {
                     "total": 39,
                     "max_score": 3.3341873,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "11381",
                           "_score": 3.3341873,
                           "_source": {
                              "name": "Love To Love You",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "alternative",
               "doc_count": 12,
               "topFoundHits": {
                  "hits": {
                     "total": 12,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "b",
               "doc_count": 9,
               "topFoundHits": {
                  "hits": {
                     "total": 9,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "r",
               "doc_count": 7,
               "topFoundHits": {
                  "hits": {
                     "total": 7,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

Did you notice a problem with my analyser for genre? Hint: R&B!

More information on the top_hits aggregation can be found here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

Scripting

Elasticsearch has had support for scripts for a long time. The default scripting language was and is mvel up to version 1.3. It will change to groovy in 1.4. Mvel is not a well known scripting language. Its biggest advantage is that mvel is very powerful. The disadvantage is that it is too powerful. Mvel does not come with a sandbox principle, therefore it is possible to write some very nasty scripts even when only doing a query. This was very well shown by a colleague of mine (Byron Voorbach) who created a query to read private keys on the machines of developers who did not safeguard their elasticsearch instance. Therefore dynamic scripting was switched off in version 1.2 by default.

This came with a very big disadvantage: it was no longer possible to use the function_score query without resorting to stored scripts on the server. In version 1.3 of elasticsearch a much better way is introduced. Now you use sandboxed scripting languages like groovy to keep the flexible approach. Groovy can be configured to allow only certain object creation and method calls. More information about this is provided in the elasticsearch documentation about scripting.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html

Next is an example query against my music index. This index contains all the songs from my music library. It queries all the songs after the year 1999 and calculates the score based on the year, so the newest songs get the highest score. And yes, I know a sort by year desc would have given the same result.

GET mymusic/_search
{
  "query": {
    "function_score": {
      "query": {
        "range": {
          "year": {
            "gte": 2000
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "lang": "groovy", 
            "script": "_score * doc['year'].value"
          }
        }
      ]
    }
  }
}

The scores now become large: since we do a range query we only get back scores of one, and using the function_score to multiply the score by the year, the end score is the year. I added the year as the only field to return; some of the results are:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 2895,
      "max_score": 2014,
      "hits": [
         {
            "_index": "mymusic",
            "_type": "itunes",
            "_id": "12965",
            "_score": 2014,
            "fields": {
               "year": [
                  "2014"
               ]
            }
         },
         {
            "_index": "mymusic",
            "_type": "itunes",
            "_id": "12975",
            "_score": 2014,
            "fields": {
               "year": [
                  "2014"
               ]
            }
         }
      ]
   }
}

Next up is the last sample, a combination of top_hits and scripting.

Top hits with scripting

We start with the sample from top_hits using my music index. Now we want to sort the buckets on the score of the best matching document in the bucket. The default is the number of documents in the bucket. As mentioned in the documentation you need a trick to do this.

The top_hits aggregator isn’t a metric aggregator and therefore can’t be used in the order option of the terms aggregator.

GET /mymusic/_search?search_type=count
{
  "query": {
    "match": {
      "name": "love"
    }
  },
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 5,
        "order": {
          "best_hit":"desc"
        }
      },
      "aggs": {
        "topFoundHits": {
          "top_hits": {
            "size": 1
          }
        },
        "best_hit": {
          "max": {
            "lang": "groovy", 
            "script": "doc.score"
          }
        }
      }
    }
  }
}

The results of this query, again with most of the _source taken out, are as follows. Compare it to the query in the top_hits section. Notice the different genres that we get back now. Also check the scores.

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 141,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "byGenre": {
         "buckets": [
            {
               "key": "rock",
               "doc_count": 37,
               "topFoundHits": {
                  "hits": {
                     "total": 37,
                     "max_score": 4.715253,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "4147",
                           "_score": 4.715253,
                           "_source": {
                              "name": "Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.715252876281738
               }
            },
            {
               "key": "alternative",
               "doc_count": 12,
               "topFoundHits": {
                  "hits": {
                     "total": 12,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.194550514221191
               }
            },
            {
               "key": "punk",
               "doc_count": 3,
               "topFoundHits": {
                  "hits": {
                     "total": 3,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.194550514221191
               }
            },
            {
               "key": "pop",
               "doc_count": 24,
               "topFoundHits": {
                  "hits": {
                     "total": 24,
                     "max_score": 3.3341873,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "11381",
                           "_score": 3.3341873,
                           "_source": {
                              "name": "Love To Love You",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 3.3341872692108154
               }
            },
            {
               "key": "b",
               "doc_count": 7,
               "topFoundHits": {
                  "hits": {
                     "total": 7,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 3.027156352996826
               }
            }
         ]
      }
   }
}

This is just a first introduction to top_hits and scripting. Stay tuned for more blogs around these topics.

The post Playing with two most interesting new features of elasticsearch 1.3.0 appeared first on Gridshore.

Categories: Architecture, Programming

Quote of the Month July 2014

From the Editor of Methods & Tools - 23 hours 22 min ago
Research has shown that the presumption of selfishness is true for maybe 30% of most populations; another 50% are reliably unselfish, and the remaining 20% could go either way, depending on the context. If a company presumes that the undecided 20% are selfish, you can bet they will be selfish—it’s a self-fulfilling prophecy. But worse, the company will create an environment where the 50% of the people who are unselfish are forced to act selfishly. And losing the energy, commitment, and intelligence of half the workforce is perhaps the biggest ...

Building A Backlog: Notes On Observing For Gathering Requirements


One of my jobs during high school was in a tire manufacturing plant in Memphis, Tennessee. On more than one occasion the hated and dreaded time and motion “guy” showed up to observe how I was doing my job. I never knew what the outcome of the observation was or whether the change to the four page process I performed to sort green tires was due to the observation. The job was never easier after the change. Reflecting on that time (and several industrial classes later) I understand that observation is an important tool for developing an understanding of how work should be done, but it is not a tool to be used all the time. Using observation in the right scenarios and then taking steps to plan how you will observe is critical to getting value for the effort needed.

Why observe? The simplest and clearest rationale for using observation techniques is that users and stakeholders either don’t always know what they want or can’t always express their current needs and foresee their future needs. Therefore a new set of eyes will expose more and different needs. There are four typical situations in which observing should be considered as a tool to gather knowledge and requirements.

  1. Physical location is a determinant. Processes and workflow are often affected by the physical location. When the physical layout of the people or machines could strongly affect the solution, the team developing the initial backlog should observe the process in action to understand the nuances of the flow of work.
  2. When people can’t tell you. Occasionally the process being studied will be so complex that no one is able to coherently describe how it works or how it should work. Even more occasionally asking is met by silence due to lack of trust. In both cases observation is a valid tool to develop an initial backlog.
  3. When interactions are crucial. Complex processes often require a wide range of interactions between people, tools and applications. Interactions, except when they cross a boundary, are difficult to identify unless you see them.
  4. When the output and the process don’t match. When, on occasion, the measured output or the output described by a manager does not match what is possible based on the published process, then observing the real process is mandatory.

Once you have decided that you must observe, planning becomes a necessity.

  1. Begin by reviewing the known policies, culture and process of the organization or team being observed. This step helps to ensure that you have a sense of the environment and what you will be seeing.
  2. Decide on how long you will observe. Some processes and process variations need time to be seen. If a process requires a week to complete you will need to observe for at least that amount of time.
  3. Determine how you will record what you see. Trying to memorize what you see will capture some information; however, you will at least need to take notes. Remember that recording can include taking notes or recording audio and video. The level of detail needed will help determine the method needed.
  4. Finalize the logistics of the observation session before showing up. Office space, network and physical access can suck up huge quantities of time and effort. If you have a week for observation do not spend the first day dealing with administration tasks.
  5. Decide how you will create rapport with the group you are observing. Your presence will cause disruption. You need to find a way of observing with minimal impact on the results and without scaring those you are observing into calling placement firms. I am a fan of transparency; tell people why you are observing and what will be done with the data. Where possible I usually involve those that I have observed in an early review of the data collected to elicit more information (a hybridization of techniques, combining observing and asking).
  6. Finally do what was planned, but do not be afraid to tweak the plan as needed.

When I was in the tire plant, the time and motion guy would just appear and no one was thrilled. When we saw him coming we followed the prescribed process a bit more carefully, even if it was less effective. Observation can change behavior positively or negatively (the Hawthorne effect). Sometimes observation might be the only way to know what is really happening, but without planning, the data you gather might be what someone wants you to know rather than what you need to know.


Categories: Process Management

Review The Twitter Story by Nick Bilton

Gridshore - Thu, 07/24/2014 - 23:35


A lot of people dream of creating the new Facebook, Twitter or Instagram. Nobody knows when they will create it; most of these ideas just happened to become big. Personally I do not believe I will ever create such a product. I am too busy with too many different things, usually based on the great ideas of others. One thing I do like is reading about the success stories of others. I have read the books about Starbucks, Microsoft, Apple and a few others.

Recently I started reading the Twitter Story. It reads like an exciting story; nevertheless it tells a story, based on interviews and facts, behind one of the most exciting companies on the internet of recent years.

I do not think it is a coincidence that a lot of what I read in The Lean Startup, but also in the Starbucks story, is coming back in the Twitter story. One thing that struck me in this book is what business does to friendship. It also shows that the people with great ideas are usually not the people who turn those ideas into a profitable company.

When starting a company based on your terrific idea, read this book and learn from it. It might make your life a lot better.

The post Review The Twitter Story by Nick Bilton appeared first on Gridshore.

Categories: Architecture, Programming

Purpose, Not Discipline

NOOP.NL - Jurgen Appelo - Thu, 07/24/2014 - 16:06
Purpose, Not Discipline

I tried running a few years ago, but I stopped because of shin splints and impossible work schedules.

I tried a fitness school, for several months, but I hated all the machines and uninteresting equipment.

I tried Pilates exercises, for a few days, but I found the exercises on a mat mind-numbingly boring.

I tried swimming, for a week or two, but the pool was always crowded and it was far away from my home.

I tried body-weight exercises, for exactly two days, but I found them too hard, which didn’t really motivate me.

And I tried yoga, for less than a week, but it was at least as boring as the Pilates exercises.

The post Purpose, Not Discipline appeared first on NOOP.NL.

Categories: Project Management

How Do I Make $2,000 A Month On Passive Income?

Making the Complex Simple - John Sonmez - Thu, 07/24/2014 - 15:00

In this video I answer a question about how to make passive income from a book and a blog.

The post How Do I Make $2,000 A Month On Passive Income? appeared first on Simple Programmer.

Categories: Programming

How Not To Make Decisions Using Bad Estimates

Herding Cats - Glen Alleman - Thu, 07/24/2014 - 04:54

The presentation Dealing with Estimation, Uncertainty, Risk, and Commitment: An Outside-In Look at Agility and Risk Management has become a popular message for those suggesting we can make decisions about software development in the absence of estimates.

The core issue starts with the first chart. It shows the actual completion of a self-selected set of projects versus the ideal estimate. This chart is now used by the #NoEstimates paradigm as evidence that estimating is flawed and should be eliminated. How to eliminate estimates while making decisions about spending other people's money is not actually clear. You'll have to pay €1,300 to find out. 

But let's look at this first chart. It shows the self-selected projects, the vast majority of which completed above the initial estimate. What is this initial estimate? In the original paper, the initial estimate appears to be the estimate made by someone for how long the project would take. It is not clear how that estimate was arrived at - the basis of estimate - or how the estimate was derived. We all know that subject matter expertise alone is the least desirable basis and that past performance, calibrated for all the variables, is the best.

So Herein Lies the Rub - to Misquote Shakespeare's Hamlet

The ideal line is not calibrated. There is no assessment of whether the original estimate was credible or bogus. If it was credible, what was the confidence in that credibility and what was the error band on that confidence?

This is a serious - some might say egregious - error in statistical analysis. We're comparing actuals to a baseline that is not calibrated. This means the initial estimate is meaningless in the analysis of the variances without an assessment of its accuracy and precision. To then construct a probability distribution chart is nice, but measured against what? Against bogus data.
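As a purely illustrative simulation (the numbers are invented, not the data behind the charts discussed here), an uncalibrated, optimistic baseline produces an "overrun" distribution that says more about the baseline than about the projects:

# Illustrative simulation only - not the study's data
set.seed(42)
true_effort      <- rlnorm(500, meanlog = log(100), sdlog = 0.3)   # actual effort
optimism_bias    <- 0.7                                            # uncalibrated estimates run low
initial_estimate <- true_effort * optimism_bias * rlnorm(500, meanlog = 0, sdlog = 0.2)
 
overrun_ratio <- true_effort / initial_estimate
# The apparent overrun mostly reflects the bias baked into the baseline,
# not the performance of the projects themselves.
summary(overrun_ratio)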

This is harsh, but the paper and the presentation provide no description of the credibility of the initial estimates. Without that, any statistical analysis is meaningless. Let's move to another example in the second chart.


The second chart - below - is from a calibrated baseline. The calibration comes from a parametric model, where the parameters of the initial estimate are derived from prior projects - the reference class forecasting paradigm. The tool used here is COCOMO. There are other tools based on COCOMO, Larry Putnam's work, and other methods that can be used for similar calibration of the initial estimates. A few we use are QSM, SEER, and Price.

One place to start is Validation Method for Calibrating Software Effort Models. But this approach started long ago with An Empirical Validation of Software Cost Estimation Models. All the way to the current approaches of ARIMA and PCA forecasting for cost, schedule, and performance using past performance. And current approaches, derived from past research, of tuning those cost drivers using Bayesian Statistics.

[Second chart: actuals compared against a calibrated baseline]

So What's All The Flap About?

The issues of software management - estimates of software cost, time, and performance - abound. We hear about them every day. Our firm works on programs that have gone Over Target Baseline. So we walk the walk every day.

But when bad statistics are used to sell solutions to complex problems, that's when it becomes a larger problem. To solve this nearly intractable problem of project cost and schedule overrun, we need to look to the root cause. Let's start with a book, Facts and Fallacies of Estimating Software Cost and Schedule. From there let's look to some more root causes of software project problems. Why Projects Fail is a good place to move to, with their 101 common causes. Like the RAND and IDA Root Cause Analysis reports, many are symptoms rather than root causes, but good information all the same.

So in the end, when it is suggested that the woes of project success can be addressed by applying:

  • Decision making frameworks for projects that do not require estimates.
  • Investment models for software projects that do not require estimates.
  • Project management (risk management, scope management, progress reporting, etc.) approaches that do not require estimates.

Ask a simple question - is there any tangible, verifiable, externally reviewed evidence for this? Or is this just another self-selected, self-reviewed, self-promoting idea that violates the principles of microeconomics as applied to software development, where:

  • Economics is the study of how people make decisions in resource-limited situations. This definition of economics fits the major branches of classical economics very well. 

  • Macroeconomics is the study of how people make decisions in resource-limited situations on a national or global scale. It deals with the effects of decisions that national leaders make on such issues as tax rates, interest rates, and foreign and trade policy, in the presence of uncertainty.

  • Microeconomics is the study of how people make decisions in resource-limited situations on a personal scale. It deals with the decisions that individuals and organizations make on such issues as how much insurance to buy, which word processor to buy, what features to develop in what order, whether to make or buy a capability, or what prices to charge for their products or services, in the presence of uncertainty. Real Options is part of this decision making process as well.

Economic principles underlie the structure of the software development life cycle, and its primary refinements of prototyping, iterative and incremental development, and emerging requirements.

If we look at writing software for money, it falls into the microeconomics realm. We have limited resources, limited time, and we need to make decisions in the presence of uncertainty.

In order to decide about the future impact of any one decision - making a choice - we need to know something about the future, which is itself uncertain. The tool for making these decisions about the future in the presence of uncertainty is called estimating. There are lots of ways to estimate. Lots of tools to help us. Lots of guidance - books, papers, classrooms, advisers.
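As a hedged sketch of one such tool (the tasks and duration ranges below are invented, and uniform distributions are used purely for simplicity), a simple Monte Carlo simulation turns uncertain estimates into a probabilistic forecast:

# Illustrative Monte Carlo sketch - task ranges are invented
set.seed(7)
n <- 10000
task_a <- runif(n, min = 8,  max = 15)   # days
task_b <- runif(n, min = 10, max = 25)
task_c <- runif(n, min = 5,  max = 12)
 
total <- task_a + task_b + task_c
# Quantiles give durations we can commit to at chosen confidence levels
quantile(total, probs = c(0.50, 0.80, 0.95))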

But asserting we can in fact make decisions about the future in the presence of uncertainty without estimating is mathematically and practically nonsense. 
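
Here is a minimal sketch of what that looks like at its simplest: an expected value comparison between two options. The class name, the probabilities, and the dollar figures are invented for illustration; the point is that the comparison cannot even be written down without estimating them.

public class ExpectedValueSketch
{
    // Expected value = sum over outcomes of probability x payoff.
    static double expectedValue( double[] probabilities, double[] payoffs )
    {
        double ev = 0;
        for ( int i = 0; i < probabilities.length; i++ )
        {
            ev += probabilities[i] * payoffs[i];
        }
        return ev;
    }

    public static void main( String[] args )
    {
        // Option A: build the capability in-house - larger payoff, real chance of rework.
        double evBuild = expectedValue( new double[] { 0.7, 0.3 },
                                        new double[] { 150_000, -40_000 } );

        // Option B: buy a package - smaller but more certain payoff.
        double evBuy = expectedValue( new double[] { 0.9, 0.1 },
                                      new double[] { 90_000, -10_000 } );

        System.out.printf( "Expected value of build: $%.0f%n", evBuild );
        System.out.printf( "Expected value of buy:   $%.0f%n", evBuy );
        // Neither number exists without estimates of the probabilities and payoffs.
    }
}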

So now is the time to learn how to estimate, using your favorite method, because to decide in the absence of knowing the impact of that decision is counter to the stewardship of our customers' money. And if we want to keep writing software for money, we need to be good stewards first.

Related articles: Averages Without Variances are Meaningless - Or Worse Misleading • How to "Lie" with Statistics • How to Fib With Statistics • When Uncertainty is Good • No Estimates of Costs and Schedule? • The Value of Information • COCOMO Model • Why is Statistical Thinking Hard? • Back To The Future • The Failure of Open Loop Thinking
Categories: Project Management

All Decisions Are Based On Mathematics

Herding Cats - Glen Alleman - Thu, 07/24/2014 - 04:25

Obviously, not every decision we make is based on mathematics, but when we're spending money, especially other people's money, we'd better have some good reason to do so. Some reason other than gut feel for any significant value at risk. This is the principle of Microeconomics.

All Things Considered is running a series on how people interpret probability, from capturing a terrorist to the probability it will rain at your house today. The world lives on probabilistic outcomes. These probabilities are driven by underlying statistical processes. These statistical processes create uncertainties in our decision making processes.

Both aleatory and epistemic uncertainty exist on projects. These two uncertainties create risk. This risk impacts how we make decisions. Minimizing risk while maximizing reward is a project management process, as well as a microeconomics process. By applying statistical process control we can engage project participants in the decision making process. Making decisions in the presence of uncertainty is sporty business, and many examples of poor forecasts abound. The flaws of statistical thinking are well documented.
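
To make the distinction concrete, here is a minimal Monte Carlo sketch of a single task, assuming a normal distribution for the aleatory variability and a single epistemic risk event with an assumed probability and impact. The class name and every number in it are illustrative, not calibrated values.

import java.util.Arrays;
import java.util.Random;

public class UncertaintySketch
{
    public static void main( String[] args )
    {
        Random random = new Random( 42 );
        int trials = 100_000;
        double[] durations = new double[trials];

        for ( int i = 0; i < trials; i++ )
        {
            // Aleatory: natural variability in duration, here 20 days +/- 3 days (1 sigma).
            double duration = 20 + 3 * random.nextGaussian();

            // Epistemic: a known risk that may or may not occur - assumed 25% chance of adding 10 days.
            if ( random.nextDouble() < 0.25 )
            {
                duration += 10;
            }
            durations[i] = duration;
        }

        Arrays.sort( durations );
        System.out.printf( "50%% confidence completion: %.1f days%n", durations[trials / 2] );
        System.out.printf( "80%% confidence completion: %.1f days%n", durations[(int) ( trials * 0.8 )] );
    }
}

Reading a completion date off the 80% point of that distribution is an estimate; there is no way to produce the number without one.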

When we encounter the notion that decisions can be made in the absence of statistical thinking, there are some questions that need to be answered. Here's one set of questions and answers from the point of view of the mathematics of decision making using probability and statistics.

The book How Not To Be Wrong opens with a simple example.

Here's the question. We're designing airplanes - during WWII - in ways that will prevent them from getting shot down by enemy fighters, so we provide them with armor. But armor makes them heavier. Heavier planes are less maneuverable and use more fuel. Armoring planes too much is a problem. Too little is a problem. Somewhere in between is the optimum.

When the planes came back from a mission, the number of bullet holes was recorded. The damage was not uniformly distributed, but followed this pattern:

  • Engine - 1.11 bullet holes per square foot (BH/SF)
  • Fuselage - 1.73 BH/SF
  • Fuel System - 1.55 BH/SF
  • Rest of plane - 1.8 BH/SF

The first thought was to provide armor where the need was the highest. But after some thought, the right answer was to provide armor where the bullet holes aren't - on the engines. "Where are the missing bullet holes?" The answer was on the missing planes. The total number of planes leaving minus those returning is the number of planes that were hit in a location that caused them not to return - the engines.

The mathematics here is simple. Start by setting a variable to zero. This variable is the probability that a plane that takes a hit in the engine manages to stay in the air and return to base. The result of this analysis (pp. 5-7 of the book) can be applied to our project work.
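
A small simulation makes the selection bias visible. This is a sketch under invented hit rates and survival probabilities, not the analysis from the book: when planes hit in the engine rarely make it home, engine hits are under-represented in the sample of returning planes.

import java.util.Random;

public class SurvivorshipSketch
{
    public static void main( String[] args )
    {
        Random random = new Random( 7 );
        int missions = 100_000;
        int returned = 0, engineHitsSeen = 0, fuselageHitsSeen = 0;

        for ( int i = 0; i < missions; i++ )
        {
            // Assumed, equal hit rates on both locations.
            boolean engineHit = random.nextDouble() < 0.30;
            boolean fuselageHit = random.nextDouble() < 0.30;

            // Assumed survival: engine hits are usually fatal, fuselage hits are survivable.
            boolean survives = !engineHit || random.nextDouble() < 0.20;

            if ( survives )
            {
                returned++;
                if ( engineHit ) engineHitsSeen++;
                if ( fuselageHit ) fuselageHitsSeen++;
            }
        }

        System.out.printf( "Returning planes with engine hits:   %.1f%%%n", 100.0 * engineHitsSeen / returned );
        System.out.printf( "Returning planes with fuselage hits: %.1f%%%n", 100.0 * fuselageHitsSeen / returned );
        // Both locations were hit equally often, but engine hits are rare among
        // the returning planes because most of those planes did not come back.
    }
}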

This is an example of the thought processes needed for project management and the decision making processes needed for spending other people's money. The mathematician's approach is to ask: what assumptions are we making? Are they justified? The first assumption - the erroneous assumption - was that the planes returning represented a random sample of all the planes. Only if that were true could the conclusions be drawn.

In The End

Show me the numbers. "Numbers talk, BS walks" is the crude phrase, but true. When we hear some conjecture about the latest fad, think about the numbers. But before that, read Beyond the Hype: Rediscovering the Essence of Management, by Robert Eccles and Nitin Nohria. This is an important book that lays out the processes for sorting out the hype - the untested and likely untestable conjectures - from the testable processes.

Related articles: How To Fix Martin Fowler's Estimating Problem in 3 Easy Steps • The World of Probability and Statistics • Stationary processes • How Not To Be Wrong • Why is Statistical Thinking Hard? • Selection bias and bombers • How Not To Make Decisions Using Bad Estimates
Categories: Project Management

Building A Backlog: Notes On Asking For Requirements

Asking requires listening and writing down what you hear!

Asking stakeholders to describe or define requirements is the most common way to develop requirements for projects. Specific techniques include talking to stakeholders informally in the hall, interviews and questionnaires, and very formal joint application design (JAD). These techniques are popular because asking and talking to people is easy and opens a dialog. However, while stakeholders may know their business need, they may not know the details of what they really want and need. Moderation and planning are critical for making all of the techniques in this category as effective as possible for creating an initial backlog. Examples of how moderation and planning could be implemented in two classic “asking” techniques are shown below:

Joint Application Design (JAD) is a very formal technique that is an off-shoot of Joint Application Development, which evolved in the 1970s. JAD is a highly structured approach to developing the requirements and design for an application or project. The process is based on the interaction between key roles (sponsor, subject matter experts including business and IT participants, facilitator, scribe and potentially observers), and it requires all of these roles. It should be noted that the JAD process was one of the earliest techniques used to embed business and IT personnel together for any substantial period of time. The process (documented in many places, including Wikipedia) has a number of key steps that provide a structured approach for interaction and for generating information. Setting the goal (one of the key steps) acts as an anchor for the JAD and provides a tool for the facilitator to re-focus the session if it wanders off course. In order for a JAD to work, up-front planning is mandatory. The participants need to be carefully identified, the goals of the JAD defined, and a detailed agenda with supporting documentation developed. Preparing for the JAD can take as long as the session itself. JADs typically run three to eight days, and participants were typically sequestered from their normal working environment during the session. The combination of a skilled facilitator and structure helps IT and business participants interact in a creative and productive fashion. Overall, JAD is a very powerful technique; however, the structure and overhead tend to make it more difficult to apply.

In its classic form, JAD is viewed as less than Agile. Historically it was used to develop the much-abused big up-front design (BUFD). Agile principles call out the concept of emergent design while eschewing BUFD. The practice of Agile is generally more a reflection of finding the balance between what needs to be known and what needs to be discovered. I have used the formal structure of the JAD process as a tool to initiate Agile projects very successfully by refocusing the goal on building an initial product backlog. The combination of structure and facilitation is more valuable when a team is addressing a new business area or in matrix organizations where teams are assembled for each new project.

Interviews are another of the classic “asking” techniques. Interview techniques can range from formally scripted question-and-answer sessions to loosely guided discussions. Formal interview techniques begin by developing a set of questions to be asked during the interview. In formal interview situations, the responses to the questions in the script, and to any follow-on questions, are captured as close to verbatim as possible. A legal deposition is an example of a formal interview. Formal interviews require the interviewers to prepare not only by developing the set of questions to be asked, but also by gathering information about the general outline of the answers they are going to receive. A good interviewer is rarely surprised by the answer they receive. Informal interviews are typically less structured; however, they still require preparation. In less formal scenarios I generally recommend developing a loose set of framing questions (framing questions capture the direction of the interview without being specific) so that the interviewer develops a goal for the interview and then plans the approach to attain that goal. The framing process is important in case the interviewee throws a curve, so that the interviewer can gradually guide the interview back onto the correct track. Take notes (do not trust your memory) in all interviews. While informal interviews seem more like common conversation, interviewers that are good at the informal technique tend to be good counter-punchers (able to deliver well-formed follow-up questions that keep the interviewee talking); however, even in an informal interview, the interviewer must always keep the ultimate goal in mind. In both formal and informal situations, if the interviewer is emotionally involved in what the answer should be, consider using a facilitator or an external interviewer.

Asking stakeholders for requirements is a tried and true method to generate an initial backlog. Asking should not equate to ad hoc or mere order taking. Asking requires preparation to be effective whether using formal techniques based on JAD or informal interviews. As an interviewer you need to map out where you want the session to go and then act as the guide.


Categories: Process Management

Java: Determining the status of data import using kill signals

Mark Needham - Wed, 07/23/2014 - 23:20

A few weeks ago I was working on the initial import of ~ 60 million bits of data into Neo4j and we kept running into a problem where the import process just seemed to freeze and nothing else was imported.

It was very difficult to tell what was happening inside the process – taking a thread dump merely informed us that it was attempting to process one line of the CSV file and was somehow unable to do so.

One way to help debug this would have been to print out every single line of the CSV as we processed it and then watch where it got stuck, but this seemed a bit overkill. Ideally we wanted to only print out the line we were processing, on demand.

As luck would have it we can do exactly this by sending a kill signal to our import process and have it print out where it had got up to. We had to make sure we picked a signal which wasn’t already being handled by the JVM and decided to go with ‘SIGTRAP’ i.e. kill -5 [pid]

We came across a neat blog post that explained how to wire everything up and then created our own version:

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

import sun.misc.Signal;
import sun.misc.SignalHandler;

// Handler that prints the current import position when the JVM receives the signal.
class Kill3Handler implements SignalHandler
{
    private final AtomicInteger linesProcessed;
    private final AtomicReference<Map<String, Object>> lastRowProcessed;

    public Kill3Handler( AtomicInteger linesProcessed, AtomicReference<Map<String, Object>> lastRowProcessed )
    {
        this.linesProcessed = linesProcessed;
        this.lastRowProcessed = lastRowProcessed;
    }

    @Override
    public void handle( Signal signal )
    {
        // Runs on the signal-handling thread, so keep the work minimal.
        System.out.println( "Last Line Processed: " + linesProcessed.get() + " " + lastRowProcessed.get() );
    }
}

We then wired that up like so:

AtomicInteger linesProcessed = new AtomicInteger( 0 );
AtomicReference<Map<String, Object>> lastRowProcessed = new AtomicReference<>(  );
Kill3Handler kill3Handler = new Kill3Handler( linesProcessed, lastRowProcessed );
Signal.handle(new Signal("TRAP"), kill3Handler);
 
// as we iterate each line we update those variables
 
linesProcessed.incrementAndGet();
lastRowProcessed.getAndSet( properties ); // properties = a representation of the row we're processing

This worked really well for us and we were able to work out that we had a slight problem with some of the data in our CSV file which was causing it to be processed incorrectly.

We hadn’t been able to see this by visual inspection since the CSV files were a few GB in size. We’d therefore only skimmed a few lines as a sanity check.

I didn’t even know you could do this but it’s a neat trick to keep in mind – I’m sure it shall come in useful again.

Categories: Programming

The New Realities that Call for New Organizational and Management Capabilities

“The only people who can change the world are people who want to. And not everybody does.” -- Hugh MacLeod

Is it just me or is the world changing faster than ever?

I hear from everybody around me (inside and outside of Microsoft) how radically their worlds are changing under their feet, business models are flipped on their heads, and the game of generating new business value for customers is at an all-time competitive high.

Great.

Challenge is where growth and greatness come from.  It’s always a chance to test what we’re capable of and respond to whatever gets thrown our way.  But first, it helps to put a finger on what exactly these changes are that are disrupting our world, and what to focus on to survive and thrive.

In the book The Future of Management, Gary Hamel shares some great insight into the key challenges that companies are facing that create even more demand for management innovation.

The New Realities We’re Facing that Call for Management Innovation

I think Hamel describes our new world pretty well …

Via The Future of Management:

  • “As the pace of change accelerates more, more and more companies are finding themselves on the wrong side of the change curve.  Recent research by L.G. Thomas and Richard D'Aveni suggests that industry leadership is changing hands more frequently, and competitive advantage is eroding more rapidly, than ever before.  Today, it's not just the occasional company that gets caught out by the future, but entire industries -- be it traditional airlines, old-line department stores, network television broadcasters, the big drug companies, America's carmakers, or the newspaper and music industries.”
  • “Deregulation, along with the de-scaling effects of new technology, are dramatically reducing the barriers to entry across a wide range of industries, from publishing to telecommunications to banking to airlines.  As a result, long-standing oligopolies are fracturing and competitive 'anarchy' is on the rise.”
  • “Increasingly, companies are finding themselves enmeshed in 'value webs' and 'ecosystems' over which they have only partial control.  As a result, competitive outcomes are becoming less the product of market power, and more the product of artful negotiation.  De-verticalization, disintermediation, and outsource-industry consortia, are leaving firms with less and less control over their own destinies.”
  • “The digitization of anything not nailed down threatens companies that make their living out of creating and selling intellectual property.  Drug companies, film studios, publishers, and fashion designers are all struggling to adapt to a world where information and ideas 'want to be free.'”
  • “The internet is rapidly shifting bargaining power from producers to consumers.  In the past, customer 'loyalty' was often an artifact of high search costs and limited information, and companies frequently profited from customer ignorance.  Today, customers are in control as never before -- and in a world of near-perfect information, there is less and less room for mediocre products and services.”
  • “Strategy cycles are shrinking.  Thanks to plentiful capital, the power of outsourcing, and the global reach of the Web, it's possible to ramp up a new business faster than ever before.  But the more rapidly a business grows, the sooner it fulfills the promise of its original business model, peaks, and enters its dotage.  Today, the parabola of success is often a short, sharp spike.”
  • “Plummeting communication costs and globalization are opening up industries to a horde of new ultra-low-cost competitors.  These new entrants are eager to exploit the legacy costs of the old guard.  While some veterans will join the 'race to the bottom' and move their core activities to the world's lowest-cost locations, many others will find it difficult to reconfigure their global operations.  As Indian companies suck in service jobs and China steadily expands its share of global manufacturing, companies everywhere will struggle to maintain their margins.”
Strategically Adaptive and Operationally Efficient

So how do you respond to the challenges?  Hamel says it takes becoming strategically adaptable and operationally efficient.  What a powerful combo.

Via The Future of Management:

“These new realities call for new organizational and managerial capabilities.  To thrive in an increasingly disruptive world, companies must become as strategically adaptable as they are operationally efficient.  To safeguard their margins, they must become gushers of rule-breaking innovation.  And if they're going to out-invent and outthink a growing mob of upstarts, they must learn how to inspire their employees to give the very best of themselves every day.  These are the challenges that must be addressed by 21st-century management innovations.”

There are plenty of challenges.  It’s time to get your greatness on. 

If there ever was a chance to put to the test what you’re capable of, now is the time.

No matter what, as long as you live and learn, you’ll grow from the process.

You Might Also Like

Simplicity is the Ultimate Enabler

The New Competitive Landscape

Who’s Managing Your Company

Categories: Architecture, Programming

How Pairing & Swarming Work & Why They Will Improve Your Products

If you’ve been paying attention to agile at all, you’ve heard these terms: pairing and swarming. But what do they mean? What’s the difference?

When you pair, two people work together to finish a piece of work. Traditionally, two developers paired. The “driver” wrote the piece of work. The other person, the “navigator,” observed the work, providing review, as the work was completed.

I first paired as a developer in 1982 (kicking and screaming). I later paired in the late 1980′s as the tester in several developer-tester pairs. I co-wrote Behind Closed Doors: Secrets of Great Management with Esther Derby as a pair.

There is some data that says that when we pair, the actual coding takes about 15-20% longer. However, because we have built-in code review, there is much less debugging at the end.

When Esther and I wrote the book, we threw out the original two (boring) drafts, and rewrote the entire book in six weeks. We were physically together. I had to learn to stop talking. (She is very funny when she talks about this.) We both had to learn each other's idiosyncrasies about indentations and deletions when writing. That’s what you do when you pair.

However, this book we wrote and published is nothing like what the original drafts were. Nothing. We did what pairs do: We discussed what we wanted this section to look like. One of us wrote for a few minutes. That person stopped. We changed. The other person wrote. Maybe we discussed as we went, but we paired.

After about five hours, we were done for the day. Done. We had expended all of our mental energy.

That’s pairing. Two developers. One work product. Not limited to code, okay?

Now, let’s talk about swarming.

Swarming is when the entire team says, “Let’s take this story and get it to done, all together.” You can think of swarming as pairing on steroids. Everyone works on the same problem. But how?

Someone will have to write code. Someone will have to write tests. The question is this: in what order and who navigates? What does everyone else do?

When I teach my agile and lean workshop, I ask the participants to select one feature that the team can complete in one hour. Everyone groans. Then they do it.

Some teams do it by having the product owner explain what the feature is in detail. Then the developers pair and the tester(s) write tests, both automated and manual. They all come together at about the 45-minute mark. They see if what they have done works. (It often doesn’t.) Then the team starts to work together, to really swarm. “What if we do this here? How about if this goes there?”

Some teams work together from the beginning. “What is the first thing we can do to add value?” (That is an excellent question.) They might move into smaller pairs, if necessary. Maybe. Maybe they need touchpoints every 15-20 minutes to re-orient themselves to say, “Where are we?” They find that if they ask for feedback from the product owner, that works well.

If you first ask, “What is the first thing we can do to add value and complete this story?” you are probably on the right track.

Why Do Pairing and Swarming Work So Well?

Both pairing and swarming:

  • Build feedback into development of the task at hand. No one works alone. Can the people doing the work still make a mistake? Sure. But it’s less likely. Someone will catch the mistake.
  • Create teamwork. You get to know someone well when you work with them that intensely.
  • Expose the work. You know where you are.
  • Reduce the work in progress. You are less likely to multitask, because you are working with someone else.
  • Encourage you to take no shortcuts, at least in my case. Because someone was watching me, I was on my best professional behavior. (Does this happen to you, too?)
How do Pairing and Swarming Improve Your Products?

The effect of pairing and swarming is what improves your products. The built-in feedback is what creates less debugging downstream. The improved teamwork helps people work together. When you expose the work in progress, you can measure it, see it, have no surprises. With reduced work in progress, you can increase your throughput. You have better chances for craftsmanship.

You don’t have to be agile to try pairing or swarming. You can pair or swarm on any project. I bet you already have, if you’ve been on a “tiger team,” where you need to fix something for a “Very Important Customer” or you have a “Critical Fix” that must ship ASAP. If you had all eyes on one problem, you might have paired or swarmed.

If you are agile, and you are not pairing or swarming, consider adding either or both to your repertoire, now.

Categories: Project Management

The Deadline to Sign up for GTAC 2014 is Jul 28

Google Testing Blog - Wed, 07/23/2014 - 01:53
Posted by Anthony Vallone on behalf of the GTAC Committee

The deadline to sign up for GTAC 2014 is next Monday, July 28th, 2014. There is a great deal of interest to both attend and speak, and we’ve received many outstanding proposals. However, it’s not too late to add yours for consideration. If you would like to speak or attend, be sure to complete the form by Monday.

We will be making regular updates to our site over the next several weeks, and you can find conference details there:
  developers.google.com/gtac

For those that have already signed up to attend or speak, we will contact you directly in mid August.

Categories: Testing & QA

Building A Backlog: Three Categories

Requirements don’t grow on trees

Many discussions of Agile techniques begin with the assumption that a backlog has magically appeared on the team’s doorstep. Anyone who has participated in any form of project, whether related to information technology, operations or physical engineering, knows that requirements don’t grow on trees. They need to be developed before a team can start to satisfy those requirements.  There are three primary ways to gather requirements, based on how information is elicited.

  1. Asking: There are many techniques that focus on asking users, potential users and other stakeholders what they want the project to deliver.  Classic examples are joint application design, questionnaires and interviews (formal and informal).  This category gets used most often because of its simplicity (everyone thinks they know how to ask questions). For small and uncomplicated projects simply asking and talking about what is needed may well suffice.  Unfortunately this method can fail if the wrong people are asked and/or if those who are asked don’t know what they really want or need (remember the adage: you don’t know what you don’t know).
  2. Observing: Observation was one of the first methods I was taught when collecting requirements for process changes.  Watching how people work jumps over what people think happens and goes directly to the source. For example, an organization saw a substantial drop in the number of mortgage applications being keyed into the system around lunchtime.  None of the management stakeholders could explain the productivity change well. In order to get to the bottom of the issue, we observed work in the department for a week. What we found was that the form was long (many pages), and forms not completed before lunch were lost and had to be restarted. The system was changed to allow portions of the form to be saved and then resumed, increasing throughput in the department. One of the drawbacks to watching is the impact of observation. As noted in the Hawthorne effect, paying attention to people can impact the output of the process, thereby generating false information.  Further, in cases where trust is an issue, observing a group can generate panic.
  3. Showing: Prototyping (functional or paper) or delivering functional products that stakeholders can interact with and react to are both great ways to push requirements past what is known and currently possible.  Paper prototypes are a mechanism for helping team members and stakeholders visualize how work could be done.  Some techniques start with how work is currently done, then visualize the change. Functional prototyping delivers software that functions at some level.  Functional prototyping requires planning and effort as if it were a project in its own right.

Each of these categories can be overlapped to create hybrids.

An initial backlog is built at the beginning of a project, or at least very early on. The more strenuous your organization's budgeting and strategic planning process, the more you will need to generate a detailed initial backlog before the team starts to work.  Whether a team develops just enough of a backlog to get started or builds a broader backlog, they need to have a backlog before they start developing. Backlogs don’t grow on trees!


Categories: Process Management