There have been many good articles,
papers and posts on this subject - see here and here for examples. Beyond that, there
is a whole sub-methodology out there known as Agile
Analytics that has been
formalized. Experienced practitioners are just about unanimous in the opinion
that a certain amount of agility (small a) is mandatory for data warehousing
(DW) and Business Intelligence/analytics (BI/A) projects. For my part,
I have always rejected the idea that traditional waterfall methodology applies
effectively to BI/A projects and have resisted all attempts to impose it on my
programs since long before anyone was talking about Agile. Some of the
major reasons include:
- BI requirements in general, and analytics requirements in particular, are usually somewhat fuzzy at the outset; users often cannot be specific about what they really want until they actually see what is possible in the form of a functional application
- Formal requirements definition can often be skipped in favor of rapid iterative prototyping - I should also be quick to point out that this does not remove the need for documentation; more on this later
- This, in turn, really mandates the kind of iterative, rapid development cycles and ‘chunking’ of projects that are pillars of the Agile method
- Turning your program into a portfolio of
shorter-term deliverables promotes early and continuous value delivery and
‘envy’ marketing internally
- Small teams with members who can play
multiple roles (design, code, test, etc.) that include both technical and
business (user) expertise are also very well suited to analytics
development
- Knowledge sharing and mentoring among the teams is also uniquely advantageous for analytics, in part because of the need to continuously evolve analytic applications after release and sustain supporting processes like data governance
- This model also discourages fine-grained
role definition that often relegates junior staff to occasionally
mind-numbing roles like QA and report/dashboard development
- Most importantly, I have seen too many BI/DW projects die of their own weight when they broke the rule that says you need to deliver something of value to your users within six months or risk losing your sponsorship, funding, or even relevance
So, end of story… Agile works for analytics, and all analytics shops should welcome an Agile development mandate from on high, right? Not so fast… In my
experience, Agile (capital A) is not perfectly suited to BI/A in a number of
ways, mostly because it was not really developed with BI/A projects in mind.
Unlike more
conventional software projects where Agile has been so successful, in BI/A
programs we often build environments as opposed to applications. We
create the ability for others, often users, to build the actual applications in
self-service mode. Environments are specified as events/elements, dimensions, and hierarchies, along with initial outputs like reports and dashboards. Please note that I am generalizing here; there are notable exceptions to this, such as embedded operational BI applications within process automation. An example would be fraud scoring within customer service applications.
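To make this a bit more concrete, here is a minimal, hypothetical sketch (Python, chosen only for illustration) of what such an environment specification might contain; every name in it is invented for the example rather than drawn from a real project.

```python
# Hypothetical sketch only: one way a BI/A "environment" specification might be
# captured as data - events, dimensions, hierarchies, and the initial outputs.
environment_spec = {
    "events": {
        "order_placed": ["order_id", "customer_id", "product_id", "order_ts", "amount"],
    },
    "dimensions": {
        "customer": ["customer_id", "segment", "region"],
        "product": ["product_id", "category", "brand"],
        "date": ["date_key", "day", "month", "quarter", "year"],
    },
    "hierarchies": {
        "calendar": ["year", "quarter", "month", "day"],
        "geography": ["region", "country", "city"],
    },
    "initial_outputs": ["weekly revenue dashboard", "orders by segment report"],
}
```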
Then there
is the fundamental Agile construct that is the user story. This idea needs to
be redefined in a BI/A context. We often do not know in advance exactly
how our environments will be used; when engaging in data discovery and investigation, the answers usually beget the next questions.
Another BI/A
trend that works against Agile is the move toward offshore development. Generally, offshoring works best when requirements can be locked down and documented with great rigor; Agile, by contrast, works best when all stakeholders can be co-located, despite recent advances in collaboration technology.
Another
factor that is often overlooked is the Agile mandate that projects be strictly time-boxed and that at least testable code be delivered on short cycles,
typically 2-3 weeks. This is often not entirely practical in the BI/DW context.
Large projects can, and should, be chunked up into smaller deliverable cycles;
we generally accomplish this by dividing up the work by data subject area or
decision class. If necessary, we can split development between a data
infrastructure track (DW/ETL development) and a parallel track to develop the
user-facing pieces. Prototyping can be done using sample data extracts until the underlying data services infrastructure is complete (a minimal sketch of this follows below). This all helps,
but strict timeboxing can be hard to enforce, particularly for the
infrastructure work, because of the dependencies on data source applications,
and their schedules, which must be accommodated.
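To illustrate that parallel-track idea, here is a hedged sketch of prototyping a user-facing metric against a small sample extract while the DW/ETL track is still under way; the column names and sample rows are assumptions invented for the example.

```python
# Hypothetical sketch: prototype a dashboard metric from a sample data extract
# before the production data services infrastructure exists.
import pandas as pd
from io import StringIO

# Stand-in for a sample extract pulled from a source system (illustrative data).
sample_extract = StringIO(
    "order_id,region,order_ts,amount\n"
    "1001,EMEA,2024-01-03,120.50\n"
    "1002,AMER,2024-01-04,89.99\n"
    "1003,EMEA,2024-01-10,240.00\n"
)
orders = pd.read_csv(sample_extract, parse_dates=["order_ts"])

# Candidate dashboard metric: weekly revenue by region, used to validate the
# shape of the output with users during rapid prototyping.
weekly_revenue = (
    orders.set_index("order_ts")
          .groupby("region")["amount"]
          .resample("W")
          .sum()
)
print(weekly_revenue)
```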
Another
difference is that BI/A projects are often evergreen in the sense that they are in
a constant state of enhancement to adapt to changes in source data and decision
scenarios; that is, the line between development and maintenance becomes
blurry. It helps immensely to keep the core development team intact for
this and not send them off to the next new Agile project.
You may be
thinking that this all sounds pretty consistent with Agile philosophy; you just need to modify the orthodoxy a bit to fit the BI/A context. I agree
with that. In fact, I really thought Agile would be a great benefit to
BI/DW practice since it discouraged waterfall thinking, but it became a case of
‘be careful what you wish for’, at least as it relates to the websites and
mobile applications that are my main data sources these days.
When the
applications (and websites) we are looking to instrument and analyze are
developed using Agile, far more often than not one of the requirements that
never gets properly documented, or is missed entirely, is the one that specifies the need to capture the necessary event data properly. Why?
- This
requirement does not fit very well into the Agile user story paradigm; the
user of the source application is usually not the user of the generated
data
- Scrum
masters are rarely trained to recognize the need to generate data as a
fundamental requirement of the application or product. If they do, it gets assigned a low priority, especially when schedules get tight
- Data
requirements can span the entirety of the application or website; as such, the requirement applies to multiple scrum teams working mostly
independently of each other
- Testing
becomes especially challenging since the modules of the application must
be tested along with the downstream analytic application in an integrated
fashion once all the pieces are built
- It becomes
necessary to identify the analytics team as a ‘user’ that gets to
participate, test and iterate just like any other business user. This
works best if the analytics team is part of the scrum, but most analytics
teams are spread too thin to make that practical
So how do we
reconcile this? Agile is here to stay. If there is an easy way, or
one that has worked for you, please let me know. In the meantime, all I
can suggest is that you try to work with Program and Scrum Masters who
understand the notion that their applications are not fully successful if they
do not yield the data necessary to analyze customer/user behavior and overall
process effectiveness. Best case, the development and analytics programs
become integrated to the point where the analyst looking to perform data
discovery or generate performance metrics becomes a fully supported user story,
with all that implies upstream.
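To show what that upstream support might amount to, here is a minimal, hypothetical sketch of the kind of event capture the analytics user story would ask the source application to provide; the field names and the print-based ‘sink’ are assumptions for the example, not any particular product's API.

```python
# Hypothetical sketch: emit a well-formed analytics event whenever something
# analytically interesting happens in the source application.
import json
import time
import uuid

def emit_analytics_event(event_name, user_id, properties, sink=print):
    """Capture an event with enough context for downstream discovery and metrics."""
    event = {
        "event_id": str(uuid.uuid4()),
        "event_name": event_name,
        "user_id": user_id,
        "timestamp": time.time(),
        "properties": properties,
    }
    # A real application would send this to a logging or streaming endpoint;
    # printing keeps the sketch self-contained.
    sink(json.dumps(event))

# Example: instrumenting a checkout step so behavior can be analyzed later.
emit_analytics_event("checkout_completed", user_id="u-123",
                     properties={"cart_value": 89.99, "items": 3})
```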