Fencing Big Data to Secure its Integrity

Big Data Security. Image source itbusinessedge.com

The now infamous leak of classified NSA documents is attributed to Edward Snowden. His responsibilities as an NSA contractor included moving especially sensitive documents from a common file-sharing location to a more secure one. According to Lonny Anderson, Chief Technology Officer at the NSA, his job description afforded him access to highly classified documents.

The documents were in a common file-sharing location so that the different law enforcement agencies within the nation could collaborate efficiently. The whole point of collecting and assessing big data is to find patterns; if those agencies are denied access, the data becomes pointless.

Integrity or security for big data

The tug of war between data protection and improved collaboration is fast becoming the defining challenge of Big Data. The rapid growth of data is forcing enterprises in both the public and the private sector to take advantage of it. Because most organizations are not equipped to assess and streamline the information themselves, outside organizations are invited to process the data. This flow of information beyond the walls of the organization brings both increased value and increased risk.

A question of value

David Gibson, the vice president of marketing at Varonis, said, “As every process gets digitized, your business cannot function without collaboration. Data you can’t share is a frozen asset. If nobody can access it, it has no value, but if too many people have access, it turns from an asset to liability. Secure collaboration, where all the right people have access, all data use is monitored and abuse is flagged, is where we have to get to.”

Varonis is a company that helps organizations achieve this kind of secure, valuable collaboration. It has over 2,000 customers, and Reuters predicts that the company is heading for an IPO by the end of 2013.

Division of digital labor

One of the most painful and inevitable by-products of the division of labor is the lack of data integrity. While data is created by an individual, the liability and responsibility for it are relegated to an organization. More importantly, data created within the confines of one organization is accessed in another, whether that is a customer, partner or supplier in the chain of digital collaboration. The security risk arises because data is crossing organizational boundaries.

While data ownership itself is under scrutiny, the issue is magnified by the woeful lack of data accountability and accounting. In a survey conducted by Varonis, most organizations were wholly unaware of where their data was stored, who was authorized to access it, who was responsible for it and who knew how to use it within the organization.

Gibson said, “When distributed systems like Windows and UNIX started to proliferate in the mid-1990s, there weren’t tools that allowed you to track your data, certainly not at today’s scale.”

Today's distributed systems encompass far more than just Windows, including computing devices and mobile phones that generate unstructured data. Structured data is easier to audit and keep track of, whereas unstructured data such as presentations, emails and video files does not fit into traditional databases.

The right to data

According to a study conducted by the Ponemon Institute, over 84% of the organizations that responded agreed that their users had access to data for which they had no business need. Gibson said, “One reason for this sorry state of affairs is that manually administering access rights to this data is nearly impossible – there is far too much data and it is growing too rapidly. There was one customer who had four full-time people in the data center answering requests for data access and figuring out who should grant the permission. We were able to automate this entire process.”

According to many industry experts, figuring out who should have access to which set of data is only the tip of the iceberg. Data management faces bigger problems every day. The location of data is often so complicated and hidden that it leads to costly and unnecessary duplication. Rights to sensitive data are often granted only temporarily, when the need arises. Once the project is completed, however, the rights are not revoked. With the increased use of cloud-based storage applications, corporate information is uploaded to personal folders and forgotten about when it should have been deleted. When the employee leaves the organization, the data follows the employee to the next organization, wittingly or unwittingly.
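The problem of temporary rights that are never revoked can be illustrated with a small sketch. It assumes a hypothetical record layout (`AccessGrant`, with a `project_end` date and an optional `revoked_on` date) and a hypothetical helper `stale_grants`; none of these names come from Varonis or any real product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AccessGrant:
    """One temporary permission (hypothetical record layout)."""
    user: str
    resource: str
    project_end: date                  # when the business need ended
    revoked_on: Optional[date] = None  # None means the grant is still active


def stale_grants(grants: List[AccessGrant], today: date) -> List[AccessGrant]:
    """Return grants that are still active although their project has ended."""
    return [g for g in grants if g.revoked_on is None and g.project_end < today]


# Illustrative data only
grants = [
    AccessGrant("alice", "/finance/q3", date(2013, 3, 31)),
    AccessGrant("bob", "/finance/q3", date(2013, 3, 31), revoked_on=date(2013, 4, 1)),
    AccessGrant("carol", "/hr/reviews", date(2013, 12, 31)),
]

for g in stale_grants(grants, today=date(2013, 6, 1)):
    print(f"revoke {g.user} on {g.resource}")  # prints: revoke alice on /finance/q3
```

Running a check like this on a schedule would at least surface forgotten grants to the data owner, who can then decide whether to revoke them.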

While this unauthorized access to sensitive information is inadvertent most of the time, sabotage is possible. Varonis conducted a risk management report on a casino and found that 15 million credit card numbers were vulnerable because they were stored in a folder that everybody within the organization could access. Another file, with 12 million credit card numbers, was similarly left open and vulnerable to theft and misuse. The problem was simply an oversight, which the organization's security engineers fixed promptly.

Varonis is addressing the security issue

Since 2005, Varonis has been systematically addressing the issues plaguing data management. The company began with a comprehensive look at the why, where, what and who of the data produced and accessed by the IT staff and business partners of one particular company.

This comprehensive understanding led to a map of all the data in an enterprise, which allows business heads and IT leaders to make informed, joint decisions about who has permission to access the data and also settles the question of ownership. This data about data is referred to as metadata, and in this scenario it is applied to big data.

Weeding out data

The mapping process employed by Varonis also helps eradicate duplicate information, cutting the metadata down to a manageable size. Left untreated, metadata can easily grow far larger than the original data itself.
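The article does not say how duplicates are identified. One common approach, shown here purely as an assumption, is to group files by a hash of their contents. The function name `find_duplicates` and the sample paths are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group paths whose contents share the same SHA-256 digest."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    # Keep only groups with more than one path, sorted for stable output
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


# Illustrative data only: path -> file contents
files = {
    "/shared/plan.ppt": b"quarterly plan",
    "/home/alice/plan-copy.ppt": b"quarterly plan",
    "/shared/budget.xls": b"budget figures",
}
print(find_duplicates(files))
# [['/home/alice/plan-copy.ppt', '/shared/plan.ppt']]
```

Content hashing only catches byte-identical copies; near-duplicates such as edited versions of a presentation would need a different technique.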

Algorithms and machine learning are then applied to the metadata to alert data owners to any anomalies in the pattern of access. If the profile of the person accessing the data does not match the profiles of all the others accessing it, that constitutes an anomaly.

The pattern observation is similar to the one applied by credit card companies that rely on big data analytics to warn customers of unusual card activity. Varonis also offers a self-service function. Of one of the company’s clients, Gibson said, “In one month, the data owners made thousands of revocations of access on their own.”
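The profile-matching idea can be sketched with a deliberately simple rule rather than real machine learning. The sketch assumes that a user's "profile" is just a department label and that a resource has a clear peer group when at least `min_share` of its users belong to one department. The function `flag_anomalies` and all the data are hypothetical, not the method Varonis uses.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def flag_anomalies(
    accesses: List[Tuple[str, str]],   # (user, resource) access events
    department: Dict[str, str],        # user -> department (the assumed "profile")
    min_share: float = 0.75,
) -> List[Tuple[str, str]]:
    """Flag (user, resource) pairs where the user's department differs from
    the department shared by at least `min_share` of that resource's users."""
    users_by_resource: Dict[str, Set[str]] = {}
    for user, resource in accesses:
        users_by_resource.setdefault(resource, set()).add(user)

    flagged: List[Tuple[str, str]] = []
    for resource, users in users_by_resource.items():
        counts = Counter(department[u] for u in users)
        dominant, n = counts.most_common(1)[0]
        if n / len(users) < min_share:
            continue  # no clear peer group, so nothing to compare against
        flagged.extend((u, resource) for u in sorted(users) if department[u] != dominant)
    return flagged


# Illustrative data only
department = {"ann": "finance", "ben": "finance", "cat": "finance", "dan": "engineering"}
accesses = [
    ("ann", "payroll.xls"), ("ben", "payroll.xls"),
    ("cat", "payroll.xls"), ("dan", "payroll.xls"),
    ("ann", "roadmap.doc"), ("dan", "roadmap.doc"),
]
print(flag_anomalies(accesses, department))  # [('dan', 'payroll.xls')]
```

A production system would presumably use much richer profiles (group memberships, access history, time of day), but the shape of the check is the same: compare one accessor against the peers who normally touch the data.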

Analysis of the metadata also helps companies classify their data. These classifications help companies flag information as and when regulations change. Varonis also capitalized on the additional uses that clients were finding for the metadata analysis, such as other management tasks like preparing for data migration.

Very often, when data has to move from one data center to another, the metadata analysis is handy as a guide. The analysis is also useful when deleting or archiving portions of the data. Gibson said, “We decided to build a data transport engine. It automates end-to-end the process of data migration, archiving and retention.”

Cloud access to data

As with much of today’s technology, Varonis decided to add mobility to its services. DatAnywhere, the latest addition to the company’s portfolio, provides versatile, cloud-like access to all the data stored in the enterprise. Of the launch, Gibson said, “Our mission with the first release of DatAnywhere is to remove the temptation to use Dropbox and Box. We are giving our customers an experience that is comparable or better – they don’t care where the data is stored. The enterprise has complete control and the end user can access the data the way they want to.”


Varonis, like numerous other successful startups, is addressing an unmet need in the marketplace. The previously unidentified need that Varonis is cashing in on can be described by a formulation very close to Metcalfe’s Law: “The level of risk associated with data is proportional to the square of the number of people sharing it. In other words, the more you share data, the greater the risk.”
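Written out, the quoted rule looks like this, where the symbols R (risk), n (number of people sharing the data) and k (a proportionality constant) are introduced here only for illustration:

```latex
R(n) = k\,n^{2}
\qquad\Longrightarrow\qquad
\frac{R(2n)}{R(n)} = \frac{k\,(2n)^{2}}{k\,n^{2}} = 4
```

Under this rule, doubling the number of people who share a data set quadruples the associated risk. Going from 10 to 20 people, for example, takes the risk from 100k to 400k.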

The flip side of the law is also true: sharing data increases the value that one can get from it. Most organizations, however, do not have the protection that this data requires. Gibson spoke at length about this: “Nobody knows who grants access to what data. You are reviewing the keys on everybody’s key ring but you have no idea what doors they unlock. Now people realize they need a data-centric approach. When enough people realize that this really valuable asset is essentially in the dark without metadata and it’s unmanageable in its current form, we are poised for a watershed moment. You will not only be doing the day-to-day data management more efficiently, but you will also get the value out of the data with better and more secure collaboration.”
