3 Tips for Tackling Big Data Collaboration

Joe Stanganelli

Updated · Jul 01, 2014

Few organizations experience as many Big Data problems as the health and life sciences sector. Companies in this vertical regularly deal with petabytes and exabytes of data (much of it unstructured), with the zettabyte just around the corner. They also place a heavy emphasis on collaboration – not just among their own employees, but with external research partners. 

Given this, data integration, management and accessibility are among this sector’s perennial concerns. At this year’s Bio-IT World Conference, IT and R&D panelists from genomics, pharmaceuticals and other health and life sciences fields gathered to address some of these concerns.

Here are three tips they offered on how to manage Big Data collaboration.

Pick Your Collaboration Partners

According to Sebastien Lefebvre, director of Biogen Idec’s research and development IT platform, the concept of data collaboration in IT has evolved beyond the precepts of a traditional, tightly integrated partnership between two companies. Effective collaboration today is polygamous – and orchestrated with ruthless efficiency.

“First, find your partners,” Lefebvre told attendees, “then second, [ask] what are they going to do for you?” To answer this question, he advises making an honest assessment of each partner’s capabilities. “They may have the skills and talents you are looking for, but [not] from the IT perspective.”

Lefebvre also recommended parceling out responsibilities and strategies among your partnerships to best leverage different assets for the greater needs – your needs – of the collaboration. As he put it: “I’m going to use this partner to do something; I’m going to use this partner to do something else.”

Your partners should approach their collaborations in the same fashion – for the best interests of all involved. “You are basically a component of a wider network,” Lefebvre said.

This utilitarian philosophy extends beyond external collaborators – indeed, to all business partners and vendors. Multi-cloud solutions, for instance, are becoming increasingly popular – especially as cloud heavyweights like Amazon and VMware escalate their cross-compatibility offerings. It’s “a sensible approach to managing risk,” wrote Kevin Casey for Dell’s Tech Page One.

Still, as both the panelists and Casey point out, IT polygamy can lead to IT headaches.

Get Your Own Big Data House in Order

“The bottleneck is always – always – on the other side of where you are,” Anastasia Christianson, head of Translational R&D at Bristol-Myers Squibb, related to the crowd – to much laughter.

It’s a joke, but she’s not really kidding.  One of the major problems of Big Data, the panelists agreed, is that organizations fail from the beginning to properly manage and integrate their data for their own employees – making data management and accessibility nearly impossible for outside collaborators.

“What you must realize is that everything that happens internally must happen externally,” Lefebvre told the workshop audience. “You ask yourself a question: ‘Did we do it well internally?’ … If not, [how] will it work well externally?”

Lefebvre noted that this is especially true of information security. Data improperly secured internally – whether on-premises or via a cloud solution – will not be secure for external collaborators (and may even compromise their data by virtue of the collaboration). The implication, Lefebvre said, is that collaborative organizations must employ a variety of well-vetted, accessible and standardized tools, such as identity and access management (IAM) utilities, content management platforms, R&D “find and share” services, and pattern matching and linguistic analysis tools.

Taxonomies remain a significant obstacle in enterprise collaboration. What means one thing to one person in one part of the world may mean something else to a different person in a different part of the world. The goal, then, is to use IT solutions expansive enough to accommodate these differences.

“How can users organize data and collaborate on different segments?” Sebastian Wernicke, director of Seven Bridges Genomics, rhetorically asked the workshop audience. “The only possible solution is to have an extremely flexible architecture.”

Take Coding out of Collaboration

On the other hand, Christianson pointed out, “The competing requirements are flexibility and standardization.”  The dictate of a standardized data management environment is often at odds with flexibility – but both are necessary for accessibility.

The whole point of collaboration is to get the benefit of the knowledge and contributions from all involved.  Any stymying of a single collaborator – any compromise of accessibility – compromises the entire project.

Kate Blair, director of Product Management for Seven Bridges Genomics, described her own experiences with this dilemma in an on-site conference interview with Enterprise Apps Today. Like many scientists with a clinical research background, Blair is no programmer. “[Working with] the command line…was a brick wall,” she said.

Indeed, forcing non-programmer collaborators to wade through coding is counterproductive – both for the project goals of the collaborative organizations and for the individuals themselves. In her field, Blair said, the “currency” is research papers published in scientific journals. “You don’t get [that currency] for doing the plumbing,” she said.

To counter this problem, Seven Bridges developed a standardized, cloud-based GUI platform for clinical research collaboration.  For added flexibility, the company integrated its platform into an on-premises turnkey “cloud-in-a-box” appliance.

“Anything that can be run on the command line can be turned into a node [on the Seven Bridges platform],” Blair explained. Her reaction to the first time she worked with the GUI? “Oh my god!  This I understand!”

Joe Stanganelli is a writer, attorney and communications consultant. He is also principal and founding attorney of Beacon Hill Law in Boston. Follow him on Twitter at @JoeStanganelli.

