Why approaching big data collaboratively leads to big insights

By | April 7, 2014
The Financial Times claims, "'Big data' has arrived, but big insights have not." I disagree. Big insights have arrived too, but not through traditional channels. Big insights don't occur in statisticians' reports. Big insights occur in crowdsourced data analyses.

Crowdsourced data analysis occurs in open tools that let leaders put their data out in public – and let anyone have a stab at applying a plethora of analysis techniques to generate visual and statistical analyses. Some of these analyses are more insightful than others; the particularly insightful ones spark others to add further insight. This is a collaborative approach to statistical analysis, which matches the collaborative approach that the social media paradigm imposes on us. And this is where the real 'big insights' take place today, and will increasingly do so in the future: in collaborative data analysis platforms – one that spontaneously comes to mind is www.manyeyes.com, though there are a few others out there as well.

/via +Francois Demers and +M Sinclair Stevens 

Big data: are we making a big mistake? – FT.com
Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature. Without needing the results of a single medical check-up, they were nevertheless able to track the spread of

7 thoughts on “Why approaching big data collaboratively leads to big insights”

  1. M Sinclair Stevens

    +Sophie Wrobel Hmmm. I left a response to your comment on my own post of this story before I dashed off to work and now, upon returning home, it has disappeared. Too tired to reconstruct it now, but it was about the misleading headline. I think the "mistake" was explained better in the story…mostly about bad samples and thinking that N=all when it doesn't.

  2. Dieter Mueller

    Can a pattern simply emerge from data and if so, will the pattern be better if there are lots and lots of instances?

    Yes, let's say you would try to figure out how Traffic flows in a City and which Streets are "stressed".

    Let's say you use Cell Phones to track Drivers as well as Traffic Data from Cameras.

    I am sure you'll agree that a Pattern will emerge just by itself, because of the Nature of the Task (Streets, Cars, Time, Amount, etc.).

    Patterns might change during different Hours and Seasons, also interesting to see if say one big Street is closed because of an Accident and how the Traffic "re-flows" around it.

    For some Tasks it's actually easy to find Patterns, but it's much harder to find the Causes and Correlations within them and even harder to make Predictions about how to shape them …

  3. Reinhard Puntigam

    In the classical model, theory is inferred from experience while experience is shaped by theory. Can a pattern simply emerge from data and if so, will the pattern be better if there are lots and lots of instances? 

    While there certainly are applications where size clearly matters, I remain unconvinced of the general idea that decisions on a platform "as if" there was no limiting bias will generate better results. 

    I seem to have seen more people making empirically wrong decisions than right ones. That's not the problem though.  "I thinking" might be. 

  4. Dieter Mueller

    I find this Article rather daft, because it knows that Big Data is not the Problem, but rather Humans' false Use of Statistics and Data Interpretation.

    Therefore the usual Nerd Bashing is out of Place, they have hardly invented the Love for Information and wishful Thinking that the Ghost in the Machine gives the perfect Answers about the Future …

    1. Collecting Data and getting Insights from it goes back to Egyptian and Babylonian Tax Collectors. Many early forms of Math, Statistics and Analysis go back to Taxation and running a Country.

    2. With any Form of Data it's important to ask the right Questions, have valid Samples and put them into a meaningful Context. These are the Tasks of "smart" Programmers and Managers – and have nothing to do with (Big) Data at all. In many Cases I have seen Managers struggle to make Heads or Tails of all the Data they are swimming in. The Power of Deduction is not a given …

    3. That an Abstraction of a real Situation is always missing some Points is nothing new. That is why Managers should learn "reading" both: Data & Algorithms on one Side, real and anecdotal Evidence on the other. The same is especially true for Nerds who think that Data is everything and run into the Wiener Trap of Cybernetics (Feedback Loops aren't everything and even if you try to collect everything you will miss something).

    Everybody who works with Abstraction from Reality faces the Problem of only indulging preconceived Opinions, which means they only look at Data or make up Correlations for their Pet Arguments instead of being "neutral". Managers, Scientists and Activists are often guilty of working with too small Data Sets or stupid Interpretations of Abnormalities …

    I can't stress enough how important it is to ask the right Questions when facing huge Amounts of Data.

    4. I agree with +Sophie Wrobel that the more People look at Data the bigger the Chance that somebody asks the right Questions and actually finds meaningful Patterns. The City of New York has made much of its Data public and it was a Success. Overall I think especially public Services and Ministries should make as much Data available as possible – although Bureaucrats are Masters of hiding Turds in plain Sight – but some Data is better than none.

    5. Tools like Hadoop and massive Cloud Computing have made it very easy and simple to collect and analyse massive Amounts of Data relatively cheaply and effectively. That is a good Thing. But with any new Toy Tool you need to be smart to use it and collect some Experience first. Mathematical Models always evolve – especially when it comes to complex "Systems" like the Climate or the Economy.

    Here we have the great Chance of improving our Data Models by improving sample Size, Quality and Context. Sure, they will never perfectly "abstract" Reality, but they can come very close.

    It's an important Tool for our Future that we need to master, because it can help a lot to funnel Resources where they matter and adapt Policies and Behaviour that are dangerous.

    (And yes, it is also dangerous, but we talked about the Datafication of our Lives so many Times it would be too redundant to repeat myself)

  5. Francois Demers

    Define "insight".
    Seriously, the word used to denote "penetrating intuition, serendipity, eureka…" It seems to me that, not only is it not algorithmic, it is also hard to explain after the fact by the insightful. (And that was one lousy sentence….)
    An idea I like, and I like it a lot, is that algorithms can reveal patterns in large sets of data that would take a human years to find: data mining. Example: men who buy SUVs are 20% more likely to divorce than other men.
    Why? Insight needed.

  6. Reinhard Puntigam

    There seems to be a notion in science which has long been abandoned in the humanities: The idea that you can produce information without an initial bias, a thesis or some form of ultimate purpose in mind. "Big data" and many other modern IT-related research fields have been actively denouncing this dynamic and seem to be going about their business as if no priors existed for what they are doing.

    Transparency can address some of the issues at stake here. It will certainly be beneficial if decisions can be made not on the basis of a single analysis, but by some second order sampling of available interpretations. (Note the paradox +Sophie Wrobel)

    For some of the functions addressed by "Big Data" however, we'll have to make our peace with some of the very naive misconceptions about ourselves at work here. The political process, e.g., will mostly not get less messy with better data, because it essentially is about shaping the future based on diverse interests, not on predicting the inevitable by some form of data extrapolation. The medical sector will only be endurable if there is sufficient slack in the system: you can't service dying efficiently unless you are a killer. Good education is mostly about letting kids learn in their own tempo and with their own tools: Stuffing their cognition with statically compliant data is teaching a machine, not a human.

