Big Data is no longer a new phenomenon in the business world. It has established itself as an essential tool for testing, tracking and analysing datasets of unprecedented scale in order to obtain breakthrough results. As a result, a large number of people are taking a keen interest in various aspects of Big Data and exploring its potential in new business fields.
However, Big Data is a broad field that involves many technologies, and many of us are not fully familiar with the vocabulary generally associated with it.
Given below are some of the key terms that are important for understanding the phenomenon of Big Data:
Algorithm: A mathematical formula or set of rules that a piece of software follows to carry out analysis on a set of data. An algorithm may consist of multi-step calculations that can be used to process data in an automated manner
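As a minimal illustration, the short Python sketch below applies a multi-step calculation (a three-point moving average followed by a threshold check) to a small, made-up series of readings:

    # A simple multi-step algorithm: smooth a series of readings with a
    # 3-point moving average, then flag smoothed values above a threshold.
    readings = [10.0, 12.5, 11.0, 30.2, 12.1, 11.8]   # illustrative data only
    window = 3

    smoothed = []
    for i in range(len(readings) - window + 1):
        smoothed.append(sum(readings[i:i + window]) / window)

    flagged = [value for value in smoothed if value > 15.0]
    print(smoothed)
    print(flagged)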
Analytics: A procedure to collect, process and analyse data in order to obtain meaningful insights that can be helpful for decision-making
Application: Software developed to perform a particular task or a group of tasks
Behavioural Analytics: A process that uses data about people’s behaviour in order to understand their intent and predict their future actions
Biometrics: The use of technology and analytics to identify people based on their physical traits. Biometrics generally involves face recognition, fingerprint recognition, etc.
Business Intelligence (BI): A process that deals with identification, extraction and analysis of data in order to improve business decisions and optimize business performance
Cloud: A term that refers to a network of remote servers that are hosted on the internet and are utilized to store, manage and process data
Columnar Database: A database management system that stores data in columns instead of rows, which makes reads faster for queries that only access a subset of columns
Content Management System (CMS): A computer application that is used to create, edit, publish and maintain content on the Web from a central interface
Crowdsourcing: A process that refers to the act of obtaining information or input by enlisting the services of a large number of people, generally through the internet
Data Cleansing or Data Cleaning or Data Scrubbing: A process that involves reviewing and revising data in order to remove duplicate entries, rectify spelling mistakes, add missing data, remove inaccurate data and bring about more consistency in the data
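To make the steps above concrete, here is a minimal Python sketch that trims whitespace, normalises case, fills a missing value and removes duplicate records; the field names and records are hypothetical:

    # A minimal data-cleansing sketch over a few hypothetical records.
    records = [
        {"name": " Alice ", "city": "london"},
        {"name": "Bob", "city": None},
        {"name": "alice", "city": "London"},
    ]

    cleaned, seen = [], set()
    for record in records:
        name = record["name"].strip().title()                 # fix whitespace and case
        city = (record["city"] or "Unknown").strip().title()  # fill missing data
        key = (name, city)
        if key not in seen:                                    # remove duplicate entries
            seen.add(key)
            cleaned.append({"name": name, "city": city})

    print(cleaned)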
Data Aggregation: A term used to describe the act of gathering data from various sources and expressing it in a summary form for the purpose of analysis. Data aggregation can be performed manually or through software
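For example, the following Python sketch gathers sales figures from two hypothetical sources and summarises them per region:

    # A minimal data-aggregation sketch: combine two sources and sum per region.
    from collections import defaultdict

    source_a = [("north", 120), ("south", 80)]
    source_b = [("north", 45), ("east", 200)]

    totals = defaultdict(int)
    for region, amount in source_a + source_b:
        totals[region] += amount

    print(dict(totals))   # {'north': 165, 'south': 80, 'east': 200}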
Distributed File System (DFS): A data storage system that spreads large volumes of data across multiple storage devices or machines while presenting them as a single file system, in order to reduce storage costs and simplify access
Data Migration: A process to transfer data between storage types, formats, or computer systems
Data Mining: A process that is used to inspect large databases with the intention to derive new information through data analysis
Distributed Processing: The execution of a process or application across multiple computers connected by a computer network
Enterprise Resource Planning (ERP): A software system that can be used by an organisation to process and manage all its resources, information and business functions
Event Analytics: The analysis of the discrete events or steps that lead up to an action, used to understand how that action came about
Failover: A procedure through which a computer system automatically switches or transfers control to a standby system when it detects a fault or failure
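The Python sketch below illustrates the idea with two hypothetical service functions standing in for real systems; the call switches to the backup only when the primary raises an error:

    # A minimal failover sketch: try the primary system, fall back to the backup.
    def primary_service():
        raise ConnectionError("primary unavailable")

    def backup_service():
        return "response from backup"

    def call_with_failover():
        try:
            return primary_service()
        except ConnectionError:
            # fault detected: transfer control to the backup system
            return backup_service()

    print(call_with_failover())   # response from backup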
Gamification: A process to utilize game elements in non-game contexts. In Big Data terms, gamification is a way to incentivize data collection.
Grid Computing: The process of performing computing tasks by pooling resources from a number of distributed systems. The systems that form a grid computing network can be of different designs and can be located in different geographical locations
HANA: A software/hardware in-memory computing platform from SAP that can carry out high volume data transactions and analytics
Hadoop: An open-source software framework developed by the Apache Software Foundation. It allows the storage and analysis of large data sets across clusters of distributed hardware
Hive: A data warehouse system built on top of Hadoop that allows data to be queried using an SQL-like language (HiveQL).
In-database Analytics or In-database Processing: The process of integration of data analytics into data warehousing functionality
Internet of Things: A term that describes the phenomenon in which everyday objects are connected to a network and are able to send and receive data
Kafka: An open-source message broker project developed by the Apache Software Foundation. It is designed to provide a unified platform for managing real-time data feeds.
Latency: The delay that occurs between an input into a system and the desired outcome or output.
Legacy System: An obsolete or outdated application, computer system or technology that continues to remain in use because it performs a needed function in an adequate manner
MapReduce: A process to break up an analysis into pieces in order to distribute them across multiple computers on the same network or across dissimilar and geographically separated systems (map), and then collecting the results to combine them into a report (reduce)
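A minimal, single-machine Python sketch of the idea is shown below; in a real cluster the map and reduce steps would run on many nodes, but the shape of the computation is the same:

    # Word count in MapReduce style: map documents to (word, 1) pairs,
    # then reduce the pairs into per-word totals.
    from collections import defaultdict

    documents = ["big data is big", "data is everywhere"]

    def map_phase(doc):
        return [(word, 1) for word in doc.split()]

    def reduce_phase(pairs):
        counts = defaultdict(int)
        for word, n in pairs:
            counts[word] += n
        return dict(counts)

    mapped = []
    for doc in documents:
        mapped.extend(map_phase(doc))

    print(reduce_phase(mapped))   # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}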
Mashup: The combination of different datasets or services within a single application in order to provide new or enhanced functionality
Natural Language Processing (NLP): A field of computing, and the associated algorithms, that enables computers to better understand human languages in order to enhance human-computer interactions
NoSQL: A database management system that allows for storage and retrieval of data that is modelled in means other than the tabular relations used in relational databases
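As a rough illustration of non-tabular modelling, the Python sketch below stores records as nested documents keyed by an identifier, where each document can have a different shape (the data is hypothetical):

    # A document-style (NoSQL-like) data model: look-up is by key, and the
    # two documents do not share the same set of fields.
    users = {
        "u1": {"name": "Alice", "orders": [{"item": "laptop", "qty": 1}]},
        "u2": {"name": "Bob", "newsletter": True},   # no orders field at all
    }

    print(users["u1"]["orders"][0]["item"])   # laptop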
Online Analytical Processing (OLAP): A process to analyse various dimensions of multidimensional data
Operational Data Store (ODS): A database that allows the storage of data from multiple sources in order to enable more operations to be performed on the data before it is sent to a data warehouse for reporting
Pig: A data flow language and execution framework that allows parallel computation
Predictive Analytics: A process of extracting information from existing data sets with the intention to determine trends or patterns in order to predict future events
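For instance, the Python sketch below fits a straight-line trend to some made-up monthly sales figures and extrapolates one month ahead:

    # A minimal predictive-analytics sketch: least-squares trend line plus
    # a one-step-ahead forecast, using illustrative numbers only.
    months = [1, 2, 3, 4, 5]
    sales = [100, 110, 125, 130, 145]

    n = len(months)
    mean_x = sum(months) / n
    mean_y = sum(sales) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / \
            sum((x - mean_x) ** 2 for x in months)
    intercept = mean_y - slope * mean_x

    print(slope * 6 + intercept)   # predicted sales for month 6: 155.0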
Query Analysis: A process to analyse a search query in order to optimize it for the best possible result
R: An open-source programming language and software environment used for statistical computing and analytics
Radio-Frequency Identification (RFID): A technology used to transfer information about an object or item from one point to another through the use of wireless communications
Software as a Service (SaaS): A software distribution model in which applications are hosted and made available to customers over a network, generally the internet
Storm: An open-source computation system that allows the processing of multiple data streams in real time
Transactional Data: Data that records transactions or events, such as purchases and payments; it typically changes frequently and in an unpredictable manner
Unstructured Data: Data that has no identifiable structure or is not organized in a pre-defined manner
Variable Pricing: A pricing strategy in which prices change based on supply and demand, enabled by real-time monitoring of consumption and supply
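As a simple illustration, the Python sketch below adjusts a base price from the ratio of current demand to supply; the base price, inputs and bounds are illustrative assumptions rather than a real pricing model:

    # A minimal variable-pricing sketch: scale the price by demand/supply,
    # bounded so it never falls below half or rises above double.
    def variable_price(base_price, demand, supply):
        ratio = demand / max(supply, 1)
        multiplier = min(max(ratio, 0.5), 2.0)
        return round(base_price * multiplier, 2)

    print(variable_price(10.0, demand=150, supply=100))   # 15.0
    print(variable_price(10.0, demand=40, supply=100))    # 5.0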