The business problem that is Big Data
Big Data is a catchy term that sounds straightforward but is tricky in reality. We all know what the word “big” means and what the word “data” means; put them together, and boom, both words instantly become more interesting and make anyone who uses them an instant geek.
The traditional way of handling data is to store and process it on a single computer and, whenever that computer’s capacity falls short, to replace it with a more powerful one, which is known as a vertical upgrade. In some cases, however, this traditional method can no longer keep up with business needs, and that’s where big data comes in.
The value of data
Generally, big data is characterized using the concept of the five Vs: volume, velocity, variety, veracity, and value. Let’s consider data in the health industry as an example. Hospitals across the world collect 2,314 exabytes (1 exabyte equals 1 billion gigabytes) annually in the form of patients’ records, test results, etc., which gives a sense of the volume involved. All of this data is generated at a very high rate, which accounts for the velocity of big data. The collected data comes in many varieties: patient visits, log files, blood test results, X-ray images, and CT scans. We also need to keep the veracity of the data in mind, knowing that some of it may have come from faulty or uncalibrated sensors. Finally, all of this data, when analyzed properly, can have value: for instance, it can enable faster disease detection and better, faster treatment at a lower cost.
The business applicability of big data is significant. Broad areas of application include cybersecurity, health care, tax compliance, business forecasting, and blockchain in the oil and gas industry, and the list goes on. The opportunities are plentiful; all we need is a strong link between a well-defined business objective and the use of big data technology.
There are many proven frameworks on the market that handle big data, e.g., Cassandra, Hadoop, and Spark, all of which use nontraditional methods of dealing with data. For example, Hadoop stores big data using a technique called a distributed file system, in which a big file is broken into smaller chunks that are stored on different computers. Processing is likewise distributed among multiple computers, which is known as parallel processing. Once the data is stored and processed, it can be analyzed to extract knowledge and insights, with all five Vs taken into account, delivering the benefits back to the business. The approach to choosing the right technology and succeeding with it can be summed up as, “To implement Big Data, take small steps.”
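The chunk-and-distribute idea described above can be illustrated with a toy sketch. It is not how Hadoop itself is implemented: worker threads on one machine stand in for separate computers, and the function names (`split_into_chunks`, `count_words`, `parallel_word_count`) and the sample records are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Break a large list of records into smaller chunks,
    loosely mimicking how a distributed file system splits a big file."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_words(chunk):
    """Process one chunk on its own: count the words it contains."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, chunk_size=2, workers=3):
    """Hand each chunk to a worker, then merge the partial results."""
    chunks = split_into_chunks(lines, chunk_size)
    # Each worker thread stands in for a separate computer (an assumption
    # of this sketch; a real cluster would spread chunks across machines).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # add the per-chunk counts together
    return total


# Hypothetical sample records, invented for this example.
records = [
    "blood test normal",
    "x-ray scan clear",
    "blood test repeat",
    "ct scan scheduled",
    "blood pressure normal",
]
totals = parallel_word_count(records)
print(totals["blood"], totals["scan"], totals["normal"])
```

The point of the sketch is that each chunk is processed without any knowledge of the others, so the final answer does not depend on how the data was split. That independence is what lets such systems grow by adding more computers rather than buying a bigger one.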
One of the biggest misconceptions about getting started with big data is to think of it as a solution to an IT problem. A better way of looking at it is as a tool for solving a business problem. There is no point in collecting and storing all this data and then doing nothing with it. Instead, now that you know what big data is and how it can be used, look around to identify and define a business problem that you would like to solve. Then, start with a small set of data and try to solve the problem. Once you have succeeded, scale up gradually, and that’s how big data will find you.
— By Ashish J. Abraham and Manar M. Sughayer