Are You Familiar with the Key Vocabulary of Big Data?

Big Data is no longer a new phenomenon in the business world. It has established itself as an essential tool to test, track and analyse datasets of unprecedented scale in order to obtain breakthrough results. As a result, a large number of people are taking a keen interest in various aspects of Big Data and are exploring its potential in new business fields.

However, Big Data is a huge field that involves a lot of technologies. There are many among us who are not fully aware of the vocabulary generally associated with Big Data.

Crowned Crane. Photographer MD Ramaswami

Given below are some of the key terms that are important for understanding Big Data:

Algorithm: A mathematical formula or set of rules that software follows to analyse a set of data. An algorithm may consist of multi-step calculations that can be used to process data in an automated manner

Analytics: A procedure to collect, process and analyse data in order to obtain meaningful insights that can be helpful for decision-making

Application: Software developed to perform a particular task or a group of tasks

Behavioural Analytics: A process that utilizes data about people’s behaviour in order to recognize or comprehend their purpose and predict their future course of actions

Biometrics: The use of technology and analytics to identify people based on their physical traits. Biometrics generally involves face recognition, fingerprint recognition, etc.

Business Intelligence (BI): A process that deals with identification, extraction and analysis of data in order to improve business decisions and optimize business performance

Cloud: A term that refers to a network of remote servers that are hosted on the internet and are utilized to store, manage and process data

Columnar Database: A database management system that stores data in columns instead of in rows and offers the advantage of faster hard disk access
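The difference between row and column storage can be illustrated with a minimal Python sketch. It is only an in-memory analogy, not how any particular database engine is implemented; the sample records, the field names and the `to_columns` helper are hypothetical.

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"id": 1, "city": "Chennai", "sales": 120},
    {"id": 2, "city": "Mumbai", "sales": 200},
    {"id": 3, "city": "Chennai", "sales": 80},
]


def to_columns(records):
    """Convert a row-oriented list of dicts into a column-oriented dict of lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Summing one field reads a single list in the columnar layout,
# whereas the row layout requires visiting every record.
total_sales = sum(columns["sales"])
```

Reading only the `sales` list is the in-memory counterpart of a columnar database reading only the columns a query needs.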

Content Management System (CMS): A computer application that is used to edit, modify, publish and maintain content on the Web from a central interface

Crowdsourcing: A process that refers to the act of obtaining information or input by enlisting the services of a large number of people, generally through the internet

Data Cleansing or Data Cleaning or Data Scrubbing: A process that involves reviewing and revising data in order to remove duplicate entries, rectify spelling mistakes, add missing data, remove inaccurate data and bring about more consistency in the data
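A few of these cleansing steps can be sketched in Python. This is a simplified illustration under stated assumptions: the `clean_records` function, the sample city names and the `corrections` lookup table are all hypothetical, and real cleansing tools handle far more cases.

```python
def clean_records(records, corrections=None):
    """Trim whitespace, normalise case, fix known misspellings, drop blanks and duplicates."""
    corrections = corrections or {}
    seen = set()
    cleaned = []
    for raw in records:
        # Collapse stray whitespace and use consistent capitalisation.
        value = " ".join(raw.split()).title()
        # Apply a known spelling correction if one exists (assumed lookup table).
        value = corrections.get(value, value)
        # Skip empty entries and entries already seen.
        if not value or value in seen:
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned


cities = clean_records(
    ["  chennai", "Chennai ", "Mumbay", "", "mumbai"],
    corrections={"Mumbay": "Mumbai"},
)
```

Here the five raw entries reduce to two consistent values, `Chennai` and `Mumbai`.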

Data Aggregation: A term used to describe the act of gathering data from various sources and expressing it in a summary form for the purpose of analysis. Data aggregation can be performed manually or through software
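As a rough illustration of aggregation through software, the Python sketch below combines figures from two sources into one summary. The `aggregate_sales` function, the source names and the numbers are hypothetical and exist only for this example.

```python
from collections import defaultdict


def aggregate_sales(*sources):
    """Combine (region, amount) pairs from several sources into totals per region."""
    totals = defaultdict(int)
    for source in sources:
        for region, amount in source:
            totals[region] += amount
    return dict(totals)


# Two hypothetical data sources reporting sales by region.
store_sales = [("North", 100), ("South", 50)]
online_sales = [("North", 30), ("East", 20)]

summary = aggregate_sales(store_sales, online_sales)
```

The resulting `summary` holds one total per region regardless of which source the individual figures came from.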

Distributed File System (DFS): A data storage system that stores huge volumes of data across multiple storage devices in order to reduce the cost of data storage and to simplify the process

Data Migration: A process to transfer data between storage types, formats, or computer systems

Data Mining: A process that is used to inspect large databases with the intention to derive new information through data analysis

Distributed Processing: The act of execution of a process or application across multiple computers connected by means of a computer network

Enterprise Resource Planning (ERP): A software system that can be used by an organisation to process and manage all its resources, information and business functions

Event Analytics: A process that examines the series of steps that led to a particular action

Failover: A procedure through which a computer system automatically switches or transfers control to another computer system when it discovers a fault or failure
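The idea behind failover can be sketched in a few lines of Python. This is a toy model, assuming each server is represented by a function that raises `ConnectionError` when it is unavailable; the names `call_with_failover`, `primary` and `standby` are hypothetical.

```python
def call_with_failover(servers, request):
    """Try each server in order and return the first successful response."""
    errors = []
    for server in servers:
        try:
            return server(request)
        except ConnectionError as exc:
            # Record the failure and fall through to the next server.
            errors.append(exc)
    raise RuntimeError(f"all {len(servers)} servers failed: {errors}")


def primary(request):
    # Simulates a faulty primary system.
    raise ConnectionError("primary is down")


def standby(request):
    # Simulates a healthy backup system.
    return f"standby handled {request}"


result = call_with_failover([primary, standby], "order-42")
```

In this sketch the caller never sees the primary's failure; control passes to the standby automatically, which is the essence of failover.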

Gamification: A process to utilize game elements in non-game contexts. In Big Data terms, gamification is a way to incentivize data collection.

Grid Computing: The process of performing computer functions by collecting resources from a number of distributed systems. The systems that become part of a grid computing network can be of different designs and can be located in various geographical locations

HANA: A software/hardware in-memory computing platform from SAP that can carry out high volume data transactions and analytics

Hadoop: An open-source software framework developed by the Apache Software Foundation. It allows storage and analysis of large data sets through the use of distributed hardware

Hive: A data warehouse engine built on top of Hadoop that allows querying of large data sets. Its query language, HiveQL, is similar to SQL.

In-database Analytics or In-database Processing: The process of integration of data analytics into data warehousing functionality

Internet of Things: A term that describes the phenomenon in which everyday objects are connected through a network and are able to send and receive data

Kafka: An open-source message broker project, originally created at LinkedIn and now maintained by the Apache Software Foundation. It is designed to provide a unified platform for managing real-time data feeds.

Latency: The delay between an input into a system and the desired outcome

Legacy System: An obsolete or out-dated application, computer system or technology that continues to remain in use because it performs a needed function in an adequate manner

MapReduce: A process to break up an analysis into pieces in order to distribute them across multiple computers on the same network or across dissimilar and geographically separated systems (map), and then collecting the results to combine them into a report (reduce)
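The map and reduce steps can be illustrated with a word count written in plain Python. This is a single-machine sketch of the idea only, not Hadoop's actual API; the function names `map_phase` and `reduce_phase` and the sample text are hypothetical. In a real cluster, each chunk would be mapped on a different computer.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_phase(pairs):
    """Reduce: combine the pairs from all chunks into a total per word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


# Each chunk stands in for the piece of work given to one computer.
chunks = ["big data big insights", "data drives decisions"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(mapped)
```

Because each chunk is mapped independently, the map step can run in parallel; only the reduce step needs to see the results from every chunk.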

Mashup: A process to combine various datasets within a single application with the objective to increase output

Natural Language Processing (NLP): A field of computing concerned with algorithms that enable computers to better understand human languages in order to enhance human-computer interactions

NoSQL: A database management system that allows for storage and retrieval of data that is modelled in means other than the tabular relations used in relational databases

Online Analytical Processing (OLAP): A process to analyse various dimensions of multidimensional data

Operational Data Store (ODS): A database that allows the storage of data from multiple sources in order to enable more operations to be performed on the data before it is sent to a data warehouse for reporting

Pig: A data flow language and execution framework that allows parallel computation

Predictive Analytics: A process of extracting information from existing data sets with the intention to determine trends or patterns in order to predict future events

Query Analysis: A process to analyse a search query in order to optimize it for the best possible result

R: An open source programming language and software environment used for statistical computing and analytics

Radio-Frequency Identification (RFID): A technology used to transfer information about an object or item from one point to another through the use of wireless communications

Software as a Service (SaaS): A software distribution model that allows applications to be hosted and made available to customers over a network, generally the internet

Storm: An open-source computation system that allows the processing of multiple data streams in real time

Transactional Data: Data that changes in an unpredictable manner

Unstructured Data: Data that has no identifiable structure or is not organized in a pre-defined manner

Variable Pricing: A pricing strategy to change prices based on supply and demand through real-time monitoring of consumption and supply
