Texting a friend about dinner plans. Browsing websites at work. Checking email from home. Traveling with a smartphone.
These are just a few of the sources of the ever-growing stream of personal data that people release into the world and that companies collect every day, often without the individual's knowledge.
This collection and analysis of information is referred to as Big Data. The goal is to synthesize vast amounts of information, often in order to create portraits of specific individuals and what they might want or need.
“It isn’t just data about telephone calls or about retail purchases or about investments,” said Sandy Steier, CEO of New York-based 1010data, a company that specializes in Big Data storage and analytics. “It’s everything that companies do. It’s everything that people do, which of course raises the scary factor somewhat, but it also raises the utility.”
Industries from technology to advertising, healthcare to government are abuzz about the ability to draw new insights from their data. As computer processing power has increased and the price of data storage has fallen, huge pools of data can now be captured, analyzed and paired with other sets of data from still more sources.
Companies have long been interested in capturing and analyzing data, but the possibilities for that data and its value have exploded in recent years. While it’s difficult to estimate the size of the industry because of its scope and variable definitions of what qualifies as Big Data, interest in data has never been greater.
At 1010data’s midtown headquarters, a flat-screen TV displayed a visualization of how information might spread through users on a cell phone network: a constellation of interconnected green dots simulates how a message passes from users to their connections.
With about $90,000 of computer hardware and specially designed software, the company was able to analyze two years' worth of texts and calls on a phone network for 1.8 million users.
“This is opening the door up for all sorts of cool analysis we couldn’t do before,” said Afshin Goodarzi, 1010data’s vice president of analytics.
The phone company can now use this information to approach advertisers, offering them the ability to target ads specifically to well-connected people within its network.
When this targeted messaging is done right, it's convenient, according to Oded Netzer, a marketing professor at Columbia Business School.
“If not done right, then the information I get will be totally irrelevant for me, and that’s when consumers get pissed off,” Netzer said. “That’s when the consumer starts to get worried about the information being shared.”
Legally, most companies can collect, sell or share the data they have on their consumers, as long as they tell the consumer what they’re doing with their information (though there are more specific requirements for data regarding children, as well as medical or financial information). That could mean personal characteristics, like age or gender, or transactional data, like purchase history or cell phone GPS locations.
But Netzer says consumers are getting more comfortable sharing their data – as long as they feel they’re getting more benefit than they’re giving up in information.
That benefit could come through better, more customized ads and recommendations, or through better medical treatment.
In the healthcare industry, vast pools of data can be used to develop new drugs, diagnostics or protocols. They can also give researchers and clinicians the ability to tailor existing treatments to individuals based on the genetic components of their condition.
“We try and leverage very, very large-scale data of very, very deep complexity to probe questions about how biology works and that can be used to help patients,” said Andrew Kasarskis, co-director of Mount Sinai’s Institute for Genomics and Multiscale Biology.
Recently, Kasarskis and his team, part biologists and part data analysts, gathered around a whiteboard to brainstorm how to use the information at hand to understand the condition of an 18-month-old with a liver disorder.
They had sequenced DNA from the baby’s parents and from roughly 1,500 other liver samples, and they also had data on known liver conditions tied to specific genes. They hypothesized that the condition was hereditary because the baby's parents had already lost three children to similar symptoms.
“If we can figure it out and maybe understand it, then hopefully we can come up with a treatment specifically for this child,” said Dr. Melissa Wasserstein, director of the Program for Inherited Metabolic Diseases at Mount Sinai’s School of Medicine. “And hopefully in the future for other children with a similar kind of thing.”
The promise of Big Data writ small.