Big Data Analytics Needs a Code of Ethics

Drew Robb

Updated · Oct 08, 2014

Just how much data can private companies (or government agencies for that matter) collect on individuals and how much responsibility do they have in that regard? From a purely marketing, revenue or public safety point of view, the answers to those questions are “as much as we need to” and “none,” respectively.

But there have to be other considerations. Otherwise you end up with situations like Facebook being accused of going far beyond the bounds of acceptability by running experiments that manipulated users’ moods.

“Fundamentally good ethics is good business, whether it’s how companies treat employees, how businesses follow regulations, the integrity of customer relationships or ensuring organizations have the proper regard for the communities they operate in,” said YY Lee, COO of data analytics provider FirstRain. “Ethical behavior builds trust that is an important and powerful enabler, and pays off in reducing barriers and increasing efficiency in doing business.”

There is no reason that this fundamental dynamic does not apply in the era of Big Data and data science. But it can be hard for companies to restrain themselves when they have at their fingertips the ability to harvest and analyze phenomenal amounts of data with a few clicks and a couple of cookies. Financial and retail companies, for example, are building incredibly sophisticated profiles of the general populace, while healthcare and pharmaceutical groups are able to map our bodies far better than we know ourselves. 

“Those capabilities don’t relieve them of the responsibility to use the information appropriately,” said Nik Rouda, senior analyst, Enterprise Strategy Group. “Big Data approaches can lead to some amazing benefits for individuals and society, but only if the information is adequately protected and used in an appropriate manner.”

Industry vs. Government Regulation

When the problem of ethics in business rears its head, one of two things tends to happen. Either the industry itself comes up with some kind of standards, or the government steps in. If the latter takes place, the pendulum sometimes swings too far the other way, leaving businesses overly restricted.

So how much of a groundswell is there for creating an industry standard around the gathering and usage of collected data? Revolution Analytics surveyed attendees at the 2014 Joint Statistical Meetings (JSM), the largest gathering of statisticians in North America, on the subject of ethics standards. The 144 respondents represented fields such as education, non-academic research, corporate IT/analytics and consulting.

Almost all of the data scientists and statisticians agreed that consumers should be concerned about privacy issues and that ethics should play a part in research, said David Smith, chief community officer at Revolution Analytics. About 43 percent said that ethics already plays a part in their research. As for creating an industry standard on ethics for collecting and using data, 42 percent said this was a good idea, while 9 percent felt there should not be a standard and that ethics should be examined on a case-by-case basis. Another 3 percent said that ethics should not play any part in data research.

Lee sees a place for such a standard, up to a point. “Standards and rules can be useful when hard boundaries must be set and not crossed,” he said, noting that identity theft, intellectual property theft and the misappropriation or misuse of confidential information are key areas that must be guarded against.

“However, it is difficult to shape behavior and practice with rules — particularly in the micro-interactions that often underlie the dynamics of Big Data — when it is not clear which fragmentary pieces are contributing to harm. It may be much more effective to use standards, guidelines and rules to identify, delineate and guard against harmful outcomes,” he said.

Lee is clearly leery of an overly burdensome standard or, worse, restrictive regulations such as those in Germany, which seek to prevent customer data from leaving the country’s borders. But there appears to be a growing consensus that something has to be done. One problem with standards, though, is that they tend to take a long time to develop. And due to the dynamics of human committees, the results can be far from ideal.

“My instinct is that industry standards are often late and toothless,” Rouda said.

That said, he’d prefer an accepted set of principled best practices over the Wild West, anything-goes mentality that some say prevails today. If messy public flaps persist, the subject of data gathering and usage will be soured in the popular imagination. Such public discontent can lead to government intervention.

“I’m generally not a fan of government regulations for the undue burdens and unintended consequences they can bring, yet someone needs to clearly define a fair approach and enforce it,” said Rouda. “In an ideal world this would be a natural consideration, but capitalism dictates any advantage can and will be exploited unless specifically prohibited.”

Big Data Backlash

Part of the problem may be a lack of education. Once a few flaps hit the headlines, the public can easily turn completely against the idea of Big Data, despite knowing little about what data science is, the benefits of data collection or the advantages of advanced analytics.

“I think the ethics challenge and concerns come up as a public gut reaction to technologies they don’t understand,” Rouda said.  

Rouda thinks companies should take the lead in showing how information is used in a simple and engaging way. Take the subject of end-user license agreements and privacy policy disclosures. Currently, said Rouda, they tend to be written in incredibly verbose legalese that overwhelms people rather than educating them. Although websites collect a ton of information without any obvious notification, most companies bury a “how we use your information” clause somewhere within a lengthy and incomprehensible disclosure. Most people can’t and won’t dig for this information, and some can feel betrayed when they learn the truth.

Further, consumers are essentially blackmailed: they must accept the terms without question or they can’t view a website, purchase online or download software. So somewhere in all this, companies must try to understand why users complain and take those grievances into account. Failing to address these issues will hurt companies and the field of data science as a whole.

“Companies that do not safeguard their customers, act in trustworthy ways, or deliver value that justifies and exceeds the exchange they transact with their customers will suffer in terms of reputation, business and results,” Lee said. “This is as true for Big Data as it is for ‘small’ data.”

Drew Robb is a freelance writer specializing in technology and engineering. Currently living in Florida, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).
