There have been many good articles, papers and posts on this subject - see here and here for examples. Beyond that, there is a whole formalized sub-methodology known as Agile Analytics. Experienced practitioners are just about unanimous in the opinion that a certain amount of agility (small a) is mandatory for data warehousing (DW) and Business Intelligence/analytics (BI/A) projects. For my part, I have always rejected the idea that traditional waterfall methodology applies effectively to BI/A projects, and I resisted attempts to impose it on my programs long before anyone was talking about Agile. Some of the major reasons include:
- BI requirements in general, and analytics requirements in particular, tend to be fuzzy at the outset; users often cannot be specific about what they really want until they actually see what is possible in the form of a functional application
- Formal requirements definition can often be skipped in favor of rapid iterative prototyping. I should be quick to point out that this does not remove the need for documentation; more on this later
- This, in turn, really mandates the kind of iterative, rapid development cycles and ‘chunking’ of projects that are pillars of the Agile method
- Turning your program into a portfolio of shorter-term deliverables promotes early and continuous value delivery and ‘envy’ marketing internally
- Small teams with members who can play multiple roles (design, code, test, etc.) that include both technical and business (user) expertise are also very well suited to analytics development
- Knowledge sharing and mentoring among the teams is also uniquely advantageous for analytics, in part because of the need to continuously evolve analytic applications after release, and to sustain supporting processes like data governance
- This model also discourages fine-grained role definition that often relegates junior staff to occasionally mind-numbing roles like QA and report/dashboard development
- Most importantly, I have seen too many BI/DW projects die of their own weight when they broke the rule that says you need to deliver something of value to your users within six months or risk losing your sponsorship, funding, or even relevance
So, end of story… Agile works for analytics, and all analytics shops should welcome an Agile development mandate from on high right? Not so fast… In my experience, Agile (capital A) is not perfectly suited to BI/A in a number of ways, mostly because it was not really developed with BI/A projects in mind.
Unlike more conventional software projects where Agile has been so successful, in BI/A programs we often build environments rather than applications. We create the ability for others, often users, to build the actual applications in self-service mode. Environments are specified as events/elements, dimensions and hierarchies, along with initial outputs like reports and dashboards. Please note that I am generalizing here; there are notable exceptions, such as embedded operational BI applications within process automation. An example would be fraud scoring within customer service applications.
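To make the idea of "specifying an environment" concrete, here is a minimal sketch of what such a specification might look like. The dimension names, hierarchies, and measures below are purely illustrative assumptions, not from any particular project:

```python
# Hypothetical sketch of a BI environment specification: dimensions with
# drill-down hierarchies, plus the events (facts) and measures that users
# can later combine themselves in self-service mode.

environment = {
    "dimensions": {
        "Date":     {"hierarchy": ["Year", "Quarter", "Month", "Day"]},
        "Customer": {"hierarchy": ["Region", "Segment", "Customer"]},
        "Product":  {"hierarchy": ["Category", "Line", "SKU"]},
    },
    "events": {
        "Order":  {"measures": ["order_count", "revenue", "discount"]},
        "Return": {"measures": ["return_count", "refund_amount"]},
    },
    # Initial outputs ship with the environment; they are a starting
    # point, not the end state of what users will build.
    "initial_outputs": ["Revenue by Region dashboard", "Monthly returns report"],
}

def drill_path(dimension: str) -> str:
    """Return the drill-down path for a dimension, e.g. 'Year > Quarter > ...'."""
    return " > ".join(environment["dimensions"][dimension]["hierarchy"])

print(drill_path("Date"))  # Year > Quarter > Month > Day
```

The point of a specification like this is that the deliverable is the set of building blocks, not any one report; which reports users eventually assemble from it cannot be fully known in advance.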
Then there is the fundamental Agile construct that is the user story. This idea needs to be redefined in a BI/A context: we often do not know in advance exactly how our environments will be used, because in data discovery and data investigation the answers usually beget the next questions.
Another BI/A trend that works against Agile is the move toward offshore development. Offshoring generally works best when requirements can be locked down and documented with great rigor, while Agile works best when all stakeholders are co-located, recent advances in collaboration technology notwithstanding.
Another factor that is often overlooked is the Agile mandate that projects be strictly time-boxed and that code that is at least testable be delivered on short cycles, typically 2-3 weeks. This is often not entirely practical in the BI/DW context. Large projects can, and should, be chunked into smaller deliverable cycles; we generally accomplish this by dividing up the work by data subject area or decision class. If necessary, we can split development between a data infrastructure track (DW/ETL development) and a parallel track to develop the user-facing pieces. Prototyping can be done using sample data extracts until the underlying data services infrastructure is complete. This all helps, but strict timeboxing can be hard to enforce, particularly for the infrastructure work, because of dependencies on data source applications, and their schedules, which must be accommodated.
Another difference is that BI/A projects are often evergreen in the sense that they are in a constant state of enhancement to adapt to changes in source data and decision scenarios; that is, the line between development and maintenance becomes blurry. It helps immensely to keep the core development team intact for this and not send them off to the next new Agile project.
You may be thinking that this all sounds pretty consistent with Agile philosophy, and that you just need to modify the orthodoxy a bit to fit the BI/A context. I agree. In fact, I really thought Agile would be a great benefit to BI/DW practice since it discouraged waterfall thinking, but it became a case of ‘be careful what you wish for’, at least as it relates to the websites and mobile applications that are my main data sources these days.
When the applications (and websites) we are looking to instrument and analyze are developed using Agile, far more often than not the requirement that never gets properly documented, or is missed entirely, is the one that specifies the need to capture the necessary event data properly. Why?
- This requirement does not fit very well into the Agile user story paradigm; the user of the source application is usually not the user of the generated data
- Scrum masters are rarely trained to recognize the need to generate data as a fundamental requirement of the application or product. Even when they do, it gets assigned a low priority, especially when schedules get tight
- Data requirements can span the entirety of the application or website; as such, the requirement applies to multiple scrum teams working mostly independently of each other
- Testing becomes especially challenging since the modules of the application must be tested along with the downstream analytic application in an integrated fashion once all the pieces are built
- It becomes necessary to identify the analytics team as a ‘user’ that gets to participate, test and iterate just like any other business user. This works best if the analytics team is part of the scrum, but most analytics teams are spread too thin to make that practical
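One way to make the event-capture requirement concrete enough to live in a backlog, and testable enough for each scrum team to verify on its own before integrated testing, is a simple payload check. The sketch below is a hypothetical illustration; the field names are my own assumptions, not a standard schema:

```python
# Hypothetical sketch: verify that an instrumented application emits
# analytics events containing the fields the downstream analytics team
# needs. Field names here are illustrative assumptions, not a standard.

REQUIRED_FIELDS = {"event_name", "timestamp", "user_id", "session_id", "context"}

def validate_event(event: dict) -> list:
    """Return a sorted list of required fields missing from an event payload."""
    return sorted(REQUIRED_FIELDS - event.keys())

# A complete event passes; a sparse one surfaces gaps that a scrum team
# can catch in its own tests, long before integrated downstream testing.
good = {"event_name": "checkout_complete", "timestamp": "2016-05-01T12:00:00Z",
        "user_id": "u123", "session_id": "s456", "context": {"cart_value": 59.90}}
bad = {"event_name": "page_view", "timestamp": "2016-05-01T12:00:05Z"}

print(validate_event(good))  # []
print(validate_event(bad))   # ['context', 'session_id', 'user_id']
```

A check like this gives the analytics-as-user story an acceptance criterion each team can run independently, which partially addresses the cross-team testing problem described above.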
So how do we reconcile this? Agile is here to stay. If there is an easy way, or one that has worked for you, please let me know. In the meantime, all I can suggest is that you try to work with Program and Scrum Masters who understand that their applications are not fully successful if they do not yield the data necessary to analyze customer/user behavior and overall process effectiveness. Best case, the development and analytics programs become integrated to the point where the analyst looking to perform data discovery or generate performance metrics becomes a fully supported user story, with all that implies upstream.