Big Data has joined the category of big words that people throw around without fully comprehending. While it holds the promise of harnessing the power of large amounts of data, can it really help us better understand the world?
Is there a definitive definition for Big Data?
The term first came into circulation in the 1990s, and many industry experts believe it originated in Silicon Valley. IBM is one of the very few organizations that has put its name behind a definition of Big Data: "Big Data is characterized by the four V's of Volume, Variety, Velocity and Veracity."
The many contexts in which it is used, however, lead to confusion. It has seen usage in fields like marketing, science, politics and sports, and ambiguity has become synonymous with the domain.
Applications of Big Data are not easy to qualify either. Ranking all pages on the internet, for example, is associated with Big Data, and so is searching the phone records of all Airtel or Vodafone customers for calling patterns. Beyond such broadly accepted examples, whether niche applications count as Big Data is still contested.
If data must exceed what a single home computer can process to be classified as Big Data, then marketing analytics and much of the work done by Facebook wouldn't qualify.
Conversely, work that never uses tools from machine learning or artificial intelligence may still be classified as Big Data, which only adds to the ambiguity.
The term encompasses both sophisticated scientific efforts and focused attempts to understand customer patterns in the same data minefield. With so much ambiguity and relativity, a firm definition is never available, and scientists and industry experts end up talking past each other when asked for one.
Is Big Data the cutting edge of technology?
Although the technology and the term have been around since the 1990s, it seems as if it has blown into the industry quite recently. A Reuters report last year said, “If wonks were fashionistas, big data would be this season’s hot new color.” The McKinsey Global Institute also claimed that it was “the next frontier for innovation, competition and productivity.”
The amount of data that can be mined today is nothing short of massive. Whether it is scientific, textual, social or otherwise, the data is mined through the power of the computer and complex algorithms. This computer power is what makes big data seem new and powerful.
In days gone by, "computers" were people who performed calculations. In those times, data sets only a fraction of the size of today's were hard to compute and interpret.
Linguistic analysis of vast data sets can be traced back 800 years. Biblical concordances, alphabetical indexes of the words in the Bible along with their context, were analyses performed on a large data set, comparable in spirit to the analysis conducted today.
Science and statistics have relied on big data for a very long time now. Samuel Arbesman elucidated, "In the early 1600s, Johannes Kepler used Tycho Brahe's detailed astronomical dataset to elucidate certain laws of planetary motion. Astronomy in the age of the Sloan Digital Sky Survey is certainly different and more awesome, but it's still astronomy.
Ask statisticians, and they will tell you that they have been analyzing big data – or 'data', as they less redundantly call it – for centuries. As they like to argue, big data isn't much more than a sexier version of statistics, with a few new tools that allow us to think more broadly about what data can be and how we generate it."
Is Big Data revolutionary?
Viktor Mayer-Schonberger and Kenneth Cukier in their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think”, compare the data deluge that we are in today to the revolution that the Gutenberg printing press brought about.
Big Data is revolutionary in that precise advertising with a minuscule margin of error is now possible. The impact on an ordinary person's life, however, is going to be gradual and modest at best.
Large phenomena or effects don't need huge data sets to be recognized, and science has traditionally focused on these so-called large effects. It is with subtlety that Big Data really comes into play. Although it is termed "Big", the domain leads us to smaller bits of knowledge and information, making it more useful for learning how to treat a disease or tailor a product to the consumer. The effect of each of these bits of information is small. It is therefore not revolutionary for the common man.
Does size matter?
Big Data is automatically heralded as the next big thing, but in reality it is close to a fad. In the scientific field, some groundbreaking analysis has been done using Big Data, and businesses are trying to adopt it to gain an edge over the competition.
The reality is that it often leads to a mess. The number of variables increases with the size of the data set, and Big Data is not very manageable. It delivers a whole lot of quantity without quality, and quality is the feature most often overlooked.
Most businesses assume that throwing a blanket of data over a problem will solve it. Yet the bias in the way the data is collected, not the size of the database, is often the real problem.
Samuel Arbesman said, “For example, if you’re trying to understand how people interact based on mobile phone data, a year of data rather than a month’s worth doesn’t address the limitation that certain populations don’t use mobile phones.”
Smaller data sets are also quite capable of solving problems in business. The six degrees of separation, thanks to Facebook, is now four degrees of separation. The original theory of six degrees of separation was worked out by the psychologist Stanley Milgram, relying on only a small set of postcards and a whole lot of intelligence.
Samuel Arbesman said, "Furthermore, although it's exciting to have massive datasets with incredible breadth, too often they lack much in the way of a temporal dimension. To really understand a phenomenon, such as a social one, we need datasets with large historical sweep. We need long data, not just big data."
Can Big Data put an end to scientific theories?
Chris Anderson argued that Big Data would put an end to theory in his 2008 essay "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". He believes that if you feed enough data to advanced machine learning techniques, they will surface all the correlations and relationships, and that through the use of data alone we will come to understand everything.
What Chris Anderson argues for is not really practical. It is not possible to hope that data alone will explain the world: spurious correlations can easily override data and common sense. Theories, ideas and hypotheses are still required to interpret the data. Without questions, answers are merely meaningless data sets.
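The spurious-correlation problem can be made concrete with a small sketch (not from the article, and all numbers here are illustrative): generate a couple of hundred completely unrelated random variables and search for the strongest pairwise correlation. Chance alone will produce an impressively "strong" relationship that means nothing.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(42)

# 200 completely independent random variables, 10 observations each.
# No pair has any real relationship by construction.
series = [[random.gauss(0, 1) for _ in range(10)] for _ in range(200)]

# Search all ~20,000 pairs for the strongest apparent correlation.
best = max(
    abs(pearson(a, b))
    for i, a in enumerate(series)
    for b in series[i + 1:]
)

print(f"Strongest 'correlation' found among pure noise: {best:.2f}")
```

With this many pairs to search, the best apparent correlation is typically very high even though every variable is pure noise. This is exactly why a hypothesis is needed before mining the data: without one, the strongest pattern found is likely to be an accident of the search.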
Data, therefore, is not a substitute for thinking, exploring deep truths, and recognizing anomalies. Big Data is a domain that can help, but caution is advised, along with a healthy dose of common sense.