Startup Spotlight: Qubole's Hadoop-as-a-Service
Qubole, a company founded by Facebook alumni, aims to get Hadoop into the hands of business users to give companies more Big Data opportunities.
If you want to make a CEO's eyes glaze over, talk to her about Hadoop. While techies and developers can kill an afternoon debating the merits of different Hadoop distributions, non-techies tend to be put off by its complexity.
Making Hadoop easier for average business people to consume is the aim of Qubole, a Mountain View, Calif.-based startup founded by former members of Facebook's data team. "Hadoop is a great data platform that can really scale, but it doesn't have the interfaces you need to put the power of the thing into the hands of users," said Ashish Thusoo, Qubole's co-founder and CEO.
While at Facebook, Thusoo and Qubole co-founder Joydeep Sen Sarma created SQL interfaces to run on top of Hadoop, integrated them with the company's existing tools and added more self-service tools. The project "worked wonders," Thusoo said, making data a central part of many business discussions. "We thought, 'this is the way agile companies should be consuming data,' and we wondered if we could recreate that magic for other companies."
Some prominent venture capitalists are betting they can. Qubole earlier this month raised $13 million from Norwest Venture Partners in a Series B funding round, bringing its total funding to $20 million.
The company offers a Big Data cloud platform called the Qubole Data Service. It lets customers create on-demand Hadoop clusters and grow and modify them as needed through a self-management module. The service also includes a workbench for adding SQL interfaces, along with connectors for moving data into and out of different systems.
Thusoo said that after years of hype, Big Data is finally starting to gain real momentum in the enterprise.
"In the past few years a lot of the discussion around Big Data involved technology. But now we are seeing conversations shifting more to use cases," he said. "There are a number of industries with many fundamental use cases around ad targeting, for example, and they are starting to talk more about it. That makes it much more real."
Some of Qubole's earliest customers are media and advertising companies, Thusoo said. One prominent customer is Pinterest, which migrated its Hadoop jobs from Elastic MapReduce (EMR) after it began experiencing stability issues with Amazon's Big Data platform.
Pinterest's Big Data Play
Writing on a company blog, Mohammad Shahangian, a Pinterest data engineer, said that Qubole is stable at petabyte scale and offers higher throughput than EMR. More important, Qubole has "made it extremely easy to onboard non-technical users."
Pinterest uses Big Data "to put the most relevant and recent content in front of Pinners through features such as Related Pins, Guided Search and image processing," Shahangian wrote, and also to run experiments and analysis on proposed changes.
The company logs 20 terabytes of new data each day and has around 10 petabytes of data in Amazon's Simple Storage Service (S3). According to Shahangian, Pinterest has more than 100 regular MapReduce users running more than 2,000 jobs each day through Qubole's service.
"We have six standing Hadoop clusters comprised of over 3,000 nodes, and developers can choose to spawn their own Hadoop cluster within minutes. We generate over 20 billion log messages and process nearly a petabyte of data with Hadoop each day," Shahangian wrote.
Many Qubole customers, like Pinterest, already have a lot of data in the cloud, Thusoo said. In the past nine months he has seen more traditional enterprises moving to the cloud. "I believe the market is moving from early adopters to early majority," he said.
Much of the data currently located in on-premise data centers is generated by Web or mobile applications, and "you can easily change those pipes to move that data to the cloud," Thusoo said. "The problem boils down to 'how do I move internal data to the cloud?' and I think there are a lot of products to do that. It becomes an easier problem to solve than saying 'I have to first move my data into my data centers and then move it to the cloud.'"
Companies that have deployed a first generation of on-premise Hadoop clusters "have determined it is hard to operate this beast" and are ready to consider the cloud option, he said. "In terms of flexibility, agility, ease of use and simplicity, the cloud option is so far removed from the on-premise distro-based option that people are willing to take the pain to make the transition."
Fast Facts about Qubole
Founders: Ashish Thusoo and Joydeep Sen Sarma
Funding: $20 million, with investors including Norwest Venture Partners, Lightspeed Venture Partners, Venky Harinarayan and Anand Rajaram
HQ: Mountain View, California, with an office in Bangalore, India
Product: Big Data platform called Qubole Data Service
Customers: Pinterest, Quora, MediaMath, TubeMogul, Answers.com, Videoplaza, Pubmatic and others
Ann All is the editor of Enterprise Apps Today and eSecurity Planet. She has covered business and technology for more than a decade, writing about everything from business intelligence to virtualization.