Big Data Gets Personal

Your shopping cart is watching every swipe you make.

Illustration by Charis Tsevis

I have these healthy friends who I often run into at Kroger. Every time our paths cross in the aisle I seem to have the same two items in my cart: beer and mayonnaise. I walk away from our brief exchange slightly embarrassed, wondering why I can’t run into them with a cart full of fresh produce. Getting busted buying lipids and booze, I’m self-conscious because the contents of my shopping cart suggest something about me. Later, I step up to the self-checkout and pass my Kroger Plus Card over its ruby-red laser.

The computer’s female voice intones a soothing “Welcome, valued customer,” confirming that I am special enough to receive exclusive discounts on select items, thanks to my participation in their customer loyalty program. I quickly scan the mayonnaise and hide it in the shopping bag behind my beer. Next I scan a bottle of spicy V8 and a coupon darts tongue-like from the slot: “Save $1.00 off any TWO (2) V8® 100% Vegetable Juice!” I stick that in my wallet, almost certainly to be forgotten. Finally I pay and take the paper ribbon of my receipt as one might take up a length of unruly rope, hand over hand, a tally of Kroger fuel points accrued and additional offers listed beyond my purchases.

A 2014 study by Canadian marketing firm Bond Brand Loyalty found that 60 percent of millennials said they would change where they shop because of the presence of a quality loyalty program.

Later, when I look at the offers that come with my receipts, as well as in mailings Kroger sends me, I get an uneasy twinge not dissimilar to that embarrassment I felt in front of my healthy friends. The loyalty card saved me several bucks, but it also recorded the contents of my shopping cart in a distant database, which Kroger uses to target me with ads. Mayonnaise seems to crop up frequently in this correspondence, as well as products similar to those I habitually use. Should I be buying that extra-strength deodorant? What are they trying to tell me?

Gary Walters, a heating and air conditioning technician from Kentucky, is also less than keen on the collection of his personal data. Whenever possible, Walters pays cash. And when he shops at Kroger, he uses a Kroger Plus Card he found lying on the ground. (For a while he used a card belonging to someone who had died. That card stopped working, so when he noticed a loose card in a Frankfort, Kentucky, parking lot, he picked it up.)

Walters worries that his collected data has the potential to be misused. “There’s that law of unintended consequences,” he says. “You have people who sign up for these cards so they can save a little money, but then somewhere down the line it could come back to bite them.” You just don’t know, Walters says. What if his health insurance company got hold of his receipts and saw that he was at risk for a certain disease based on his shopping habits? “I don’t want them to be able to go back and check and see that I bought two-and-a-half gallons of ice cream every week for the last 12 years,” he says. From his point of view, the implications of vast stores of our personal data are simply unknown. What if he applied for a job, and they had information that kept them from hiring him? Not that a stranger’s Kroger Plus Card is any guarantee of complete privacy—his transactions are still all linked to the card’s unique number. But it makes it more difficult, as he puts it, to connect the dots. “It puts more space between them and being able to know who I am,” he says.

Walters has a nagging sense that our data might somehow be used to judge us, that in surrendering data, we surrender control. In a world where data breaches aren’t exactly unheard of, this ought to give us pause, although on the whole most data is carefully guarded and encrypted. Laws restrict what institutions can do with the data they collect. They can’t just pass it on without our permission. Right?

Spend time downtown and you can’t help but notice that a British company with a melodious, uncapitalized name and American headquarters in Cincinnati is doing rather well. dunnhumbyUSA is doing so well they’ve built new $100 million-plus digs at Fifth and Race, dark and futuristic, like something out of Blade Runner. dunnhumby collects and analyzes data on more than 350 million people worldwide. They’re here first and foremost because they’ve been a “joint venture partner” with Kroger since 2003, looking into the crystal ball of Kroger’s customers’ data to gain insights the grocery chain can put to use. They also work closely with other local companies, including Procter & Gamble and Macy’s.

dunnhumbyUSA uses our data for everything from customizing marketing campaigns to improving product placement in stores, streamlining our movements aisle to aisle, and even identifying markets for yet-to-be-invented products. It’s their job to see that their clients’ customers keep coming back. They cull your loyalty card data to zero in on your favorite products, the ones you’re willing to go out of your way for, to pay a little extra for. And according to the company’s website, they help their clients “reward the behavior they seek.” So that info you’re handing over at the checkout isn’t just getting you a discount. It’s giving them the means to get you to do what they want.

In 2012 alone, dunnhumby netted $1 billion. Clearly, there’s money in your data, wheelbarrows of it. And now, Tesco, a former client that bought a majority stake and now owns the consumer data firm, has fallen on hard times—nothing to do with dunnhumby, to be sure—and is putting dunnhumby up for sale. Buyers are lining up and can expect to pay about $3 billion. Who will buy has been the subject of much speculation, with some analysts wondering whether Kroger might be interested in acquiring the company it relies on. But according to Kroger spokesman Keith Dailey, regardless of who owns dunnhumby, Kroger will continue to have access to the data firm’s insights and analytics. Others speculate that global ad giant WPP might buy the company. With 40 percent of dunnhumby’s revenue coming from retailers and 60 percent from selling data to consumer goods companies like Coca-Cola, you can see how an advertising company would be interested. And you can also see how all that data isn’t exactly stationary.

dunnhumby isn’t anywhere near the only company doing this. As of 2011, global firms Acxiom, Alliance Data Systems, Experian plc, Infogroup, and WPP held some of the largest databases of consumer information. WPP’s “Xaxis” database was said to be the largest, only to be surpassed in 2012 when the Conway, Arkansas–based Acxiom Corporation announced it had amassed 50 trillion data transactions for 500 million customers, a database that included the majority of American adults. In the world of customer data, size matters, but what’s really important is what you can do with it. As stores of information multiply, so do the challenges of making sense of them.

I e-mailed dunnhumbyUSA and asked if they would tell me more about what they term the “customer science” involved in loyalty programs like Kroger’s. They asked whether I’d spoken to Kroger. When I called Kroger’s media spokesman, he politely declined on the grounds that they had a lot going on in the coming months. I followed up with dunnhumbyUSA’s representative via e-mail. Sorry, she wrote, “Kroger, dunnhumby’s joint partner, has suggested that we not participate in this story.”

Fair enough, I thought. Nobody ever got rich just giving their goods away. Except that these goods begin with us, and we do give them away. We trade our data by the terabyte for cents off the dollar. What are we thinking when we hand the stuff over? Or are we simply not thinking?

I got on the Internet and asked my Facebook and Twitter acquaintances, everybody I’m social-network-hooked-up-with, whether they harbor a strong opinion one way or the other about their personal data and what it’s ultimately used for. I got a couple of likes on Facebook, but with the exception of one person, nobody said anything. The one friend who did weigh in said it made her feel a little creepy, this data collection, a sentiment echoed in informal conversations I’ve had at the bar. Further, she remains unconvinced that using her loyalty card actually gets her a better deal, having once gone out of her way to a station that participates in the Kroger Fuel Program, which incentivizes shoppers by giving them bargains at the pump. She got a 10-cent-per-gallon discount, only to discover that gas was 10 cents cheaper at a nonparticipating station nearby.

Critics of loyalty card programs allege that the deals they dangle before us aren’t deals at all. Further, they argue that the programs cost stores more money, and that those costs are then passed on to us. While you can put a dollar figure on things like data and customer loyalty, they’re protean, difficult to nail down. Still, some Kroger Plus Card customers are sure they’re getting a bargain for their data.

I first heard about Rod Alexander, an adjunct associate professor in the accounting department at the University of Cincinnati’s Carl H. Lindner College of Business, from his wife, Katie. Such was her husband’s dedication to squeezing a deal out of Kroger’s Fuel Program that he had on occasion made her follow him to the gas station with a second, on-empty car to cash out their points before they expired.

Alexander is a numbers man if ever there was one. Even his speech is meticulous, precise enough to help a mathematically challenged listener like me understand his detail-oriented deal wrangling.

“I once paid $2.17 for 35 gallons of premium gasoline,” he told me proudly. And this was well before OPEC’s oil ministers opened the great oil faucet, flooding the land with $2-a-gallon gas. Certain items, including gift cards, earn you double fuel points, and occasionally they even run deals where they’ll net you twice that. Alexander watches and when opportunities arrive, he pounces, racking up points, then mambos all the way to the pump. Kroger limits a single fuel purchase to 35 gallons, but that’s a lot of gas. So Alexander also fills his cars’ trunks with empty cans. He’s accumulated quite a collection, he says.

Obviously, Kroger wouldn’t offer big discounts unless they were getting something in return. Whatever the case, they call it loyalty; Alexander calls it easy money. I ask whether it bothers him that they collect data when he uses his card. He concedes that, thanks to their targeted marketing, he probably spends a little more at their stores than he would otherwise.

Just how much data would you be willing to share, I ask him, for a deal? Is there some line you’d be unwilling to cross, wherein this quid pro quo becomes a no-go?

Probably not, he says, because as he sees it he’s buying what he would anyway, unlike those thrifty coupon-clippers he used to see zigzagging through the aisles in search of items they wouldn’t otherwise put in their cart. Alexander might be an extreme example, but he illustrates that in the data-for-deals exchange, a savvy customer can see their way to the benefits.

While dunnhumby didn’t get rich giving away their secrets, they’re fairly open about what they do. Their executives publish a wealth of articles on the latest trends in data science. And they’re not all that secretive about what goes in and what comes out of the analytical black box. The company even provides sample data sets for educational purposes: source files of dummy numbers, “data that replicates the patterns of real, in-store data” for budding data scientists to practice on. The largest of these downloadable data sets consists of 300 million transactions over 117 weeks for about 500,000 distinct customers buying 5,000 distinct products from about 760 different stores. Open any of the tables that comprise this massive relational database and you can see the categories into which information gets sorted with each swipe of a loyalty card.

There’s the time of the transaction, the place, the type of store; the products purchased and the sizes of the packages. They record where the product was located in the store, and how many items you bought of certain brands. They record a customer’s “life stage,” your “price sensitivity” (less affluent, mid-market, up-market, unclassified), household size, marital status, whether you rent or own. They even discern the type and scale of the shopping trip—were you focused on fresh produce? Was it a quick trip in for milk or an expedition to restock your pantry for the month? And of course, they tally the coupons and offers we redeem, as well as where those offers came from, demonstrating our susceptibility to specific types of marketing—which incentives worked, which didn’t. Given enough behavioral data, dunnhumby sees what is actually happening, and based on that data, builds models to test improvements and predict outcomes.
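The categories above can be pictured as a handful of linked records. What follows is only a rough sketch in Python, not dunnhumby's actual schema: every class name and field name is an assumption made up for illustration, chosen to mirror the kinds of information the sample data sets are described as holding.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class LineItem:
    """One product scanned during a trip (hypothetical fields)."""
    product_id: str
    brand: str
    package_size: str
    quantity: int
    shelf_location: str


@dataclass
class Household:
    """Demographic labels attached to a loyalty card (hypothetical fields)."""
    card_number: str
    life_stage: str
    price_sensitivity: str  # e.g. "less affluent", "mid-market", "up-market", "unclassified"
    household_size: int
    marital_status: str
    owns_home: bool


@dataclass
class Transaction:
    """One swipe of the card: when, where, what kind of trip, and what was bought."""
    card_number: str
    timestamp: datetime
    store_id: str
    store_type: str
    trip_type: str  # e.g. "quick trip" or "pantry restock"
    items: List[LineItem] = field(default_factory=list)
    coupons_redeemed: List[str] = field(default_factory=list)

    def total_units(self) -> int:
        """Count every unit in the basket across all line items."""
        return sum(item.quantity for item in self.items)
```

The card number is the thread that ties each Transaction back to a Household, which is why a borrowed card muddies the picture without erasing it.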

Professor Jeff Camm holds a degree in math from Xavier, and uses mathematics to solve problems. From his post at UC’s Lindner College of Business, where he has taught for 30 years, he’s watched the rise of analytics in the private sector, studied how mathematical models are used to boost efficiency and optimize processes for just about everything: distribution, resource allocation, even product design. Analytics really took off, he says, between 2005 and 2007. Part of what a company like dunnhumbyUSA does, he explains, is called “market basket analysis,” looking for correlations between what ends up side-by-side in our shopping carts.

“What if you know that every time someone buys beer, they buy an expensive can of cashews?” asks Camm. “So now you give them a little markdown on the beer and you know when they come into the store they buy cashews, and the profit margin on the cashews is high. You’re giving the shopper what he or she is looking for but you’re selling more than you would have sold otherwise. A lot of this is not new. What’s new is we have more data now and we have better tools for analyzing it.”
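Camm's beer-and-cashews example can be worked through on a toy scale. The sketch below, in Python, computes three measures commonly used in market basket analysis: support (how often two items appear together), confidence (how often the second item shows up given the first), and lift (how much more often than chance). The baskets are invented for illustration, and the function name pair_stats is an assumption; nothing here comes from Kroger or dunnhumby.

```python
# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"beer", "cashews", "mayonnaise"},
    {"beer", "cashews"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "cashews", "bread"},
    {"milk", "cashews"},
]


def pair_stats(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a implies b'."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)

    support = count_ab / n                               # share of trips with both items
    confidence = count_ab / count_a if count_a else 0.0  # share of 'a' trips that also had 'b'
    lift = confidence / (count_b / n) if count_b else 0.0  # above 1 means more often than chance
    return support, confidence, lift


support, confidence, lift = pair_stats(baskets, "beer", "cashews")
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

In this made-up data, three of the four beer trips also include cashews, so the lift comes out a little above 1. A retailer would look for pairs like that across millions of real baskets before deciding where a markdown might pay off.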

The trouble is that with enough data and excellent tools, even our most intimate secrets begin to come to light, including our sex lives. Target infamously demonstrated this when it used data to determine which of its shoppers were pregnant based on changes in their shopping behavior, as reported by Charles Duhigg in a 2012 New York Times article. We tend to stick to our habits, but major life changes like having a baby present rare windows of opportunity for retailers and merchants eager to sell us something we never needed before. For a brief moment, our needs and desires are up for grabs. An analyst at Target was able to identify certain changes in buying behavior which correlated with pregnancy. They started with shoppers’ data collected by the chain’s baby registry and looked at changing trends in their behavior as they approached their due date. For instance, women in their second trimester frequently bought significantly larger quantities of unscented lotion. And in their first 20 weeks, pregnant women often stocked up on supplements containing calcium, magnesium, and zinc. Armed with a workable model that assigned customers a pregnancy prediction “score” and could even estimate their due date, Target sent baby-product offers to the home of a teenage girl, much to her dad’s chagrin.

If this sounds personal, it is.

dunnhumbyUSA CEO Stuart Aitken recently told an industry conference in Las Vegas that the old practice of profiling customers, putting them into categories, is useless. “Just because I’m the same age as you, live next door and have 2.2 children, doesn’t mean we have the same preferences,” he said. Why fumble around with generalities when you have access to hard data and the processing power to extract the specifics of each individual’s experience? Such is dunnhumby’s skill at absorbing data and personalizing it that Kroger calls its direct mailings “snowflakes.” No two are alike. Their stores of individual customer data run long and deep. Nishat Mehta, executive vice president of global partnerships for dunnhumby, told Forbes, “We make decisions not based on what you bought today but what you have bought over the last two years. We can recommend a product you buy every four months. You don’t have to know, but we know.”

Therein resides data’s real value to its diviners: Not only does it allow them to intimately get to know us; it allows them to know the secrets we keep even from ourselves.

“That’s the whole point of collecting the data,” says Camm. “We’re not necessarily lying; we just don’t always do what we think we’re doing. But when you’re collecting the data, it is what it is. I might think I eat healthy. But if you look at what I’m purchasing, it says otherwise, right?”

As cofounder and president of the dating website OkCupid, Christian Rudder had unprecedented access to the private information of millions. He’s the author of Dataclysm: Who We Are (When We Think No One’s Looking), a book that is an expansion of his blog, OkTrends. Also a mathematician, Rudder began to study and post statistical analyses of his online daters’ preferences and behavior. Access to the back end of a dating site exposes some bare truths that do not flatter humanity. When Rudder began to compare what users wrote in their dating profiles with how they actually searched, hooked up, and interacted with one another, stark disparities emerged. Many who adamantly denied having a racist bone in their body showed in their data a clear bias against African-Americans. And while most men claimed to be attracted to women their own age, when they rated women in secrecy (or at least what they thought was secrecy, until it turned out Rudder was looking over their shoulders), those men, regardless of age, almost universally rated women in their 20s as the most attractive. A dating site sells a very different product than a grocery chain, but it gathers information in a similar way: users volunteer it.

Over and over, Rudder’s observations corroborate Professor Camm’s point: We might not intentionally delude ourselves, but our self-vision isn’t anywhere close to 20/20. And when it comes to what we tell others, we do, in fact, lie. Which is why big data is to market research what the laser-guided missile is to the spear.

“Companies and the government are collecting disparate pieces of your private life and trying to fashion them back into an image they can master,” Rudder writes. “The more privacy you lose, the more effective they are. The fundamental question in any discussion of privacy is the trade-off—what you get for losing it.”

That trade-off is very much on the mind of Jack Kennamer, CEO of the Cincinnati startup LOC Card, which has developed a universal loyalty card—a single piece of plastic that works at as many businesses as you link it to through a one-stop, online interface. Kennamer says his company sometimes refers to what it does as a dating service. They make it easy for customers and retailers to hook up and, should the relationship go south, they ensure an easy break up. Like online daters, LOC Card users get to pick and choose. They decide what the retailers associated with their loyalty card can learn, and what types of promotions they, the customers, will receive.

Although it’s a young business, LOC aims to eventually be the only loyalty card you’ll ever need. The idea came to Kennamer while he was standing in line at a Dick’s Sporting Goods store, Christmas shopping. “I heard this cadence,” Kennamer says. “Do you have a Dick’s Card?” “No.” “Do you want one?” “No.” And then a woman in the very long line ahead of him said “yes.”

“And as soon as she said yes, I thought the guy behind her was going to knock her lights out,” he recalls. All along the line, toes were tapping, but Kennamer was listening. “And I get near the register and the cashier asks the guy ahead of me does he want one and he pulls out his key ring and it looks like a donut and he says, ‘Sorry, no room,’ and puts it back in his pants pocket.”

There has to be a better way, thought Kennamer, and he started looking around to see whether anybody was already offering a solution. He noticed that loyalty program participation was declining. We’d hit a wall as far as how many targeted deals, e-mails, tweets, and Groupons we could absorb. Some of us were simply walking away, abandoning entire e-mail accounts to silently fill, unobserved, with commercial e-silt. People had a renewed interest in keeping their private data private. Women in particular, Kennamer says, were balking at the idea of forking over info to a stranger at the register.

“A year ago,” Kennamer says, “millennials were fast and loose with all their data.” But some research suggests that this key demographic has done a 180-degree turnaround. All of a sudden, “as a group they’re more concerned about privacy than any other group out there,” he says, citing a recent joint study by global trends and brand consultancies Contagious Communications and Flamingo.

It became clear to Kennamer that customers liked convenience, but what they really craved was control over their personal info. LOC isn’t in the business of collecting personal data, only streamlining and protecting the process—although from an analytics standpoint, customer data from across a spectrum of interactions with different businesses would be useful. dunnhumbyUSA has expressed an interest, Kennamer says, should his LOC Card gain traction with retailers and their customers.

What businesses really want, Kennamer says, is to connect with us outside of their stores via e-mail, social media, and online ads. I mention how I’d recently gotten unsolicited e-mails, following transactions on a debit card, from the makers of those mobile credit card dongles that turn an iPad or iPhone into a swipeable point of sale. The e-mail asked me to rate my experience, offered me more. A loyalty card is one thing, but aren’t transactions on a personal card private?

“Probably you agreed to that,” Kennamer suggests. “It’s like if you have an iPhone, and the user agreement is 80 freaking pages long. Who reads any of that?”

While data privacy laws require banks and credit card providers to give us the opportunity to opt out of data collection, they’re already making inroads into selling consumer data. Needless to say, it pays to read the fine print.

The possibility that my beer and mayonnaise consumption could one day end up in health insurers’ hands isn’t so far out after all. Kevin Pledge, an actuary and CEO of a firm that advises life insurance companies, told The Economist that he avoided using his loyalty card on junk food. The Wall Street Journal recently reported that insurance giant Aviva hired Deloitte Consulting to study consumer-marketing data as a stand-in for blood and urine samples taken for assessing the risk of applicants. Deloitte’s predictive modeling forecasted health risks with accuracy close to that of more invasive measurements. A predilection for salty, fatty snacks and sweets correlates with a tendency toward high blood pressure or diabetes. The consumer data Aviva used came from Equifax Inc.’s marketing services unit, which has since been bought by Alliance Data, whose gargantuan servers process credit and loyalty card data for many U.S. and Canadian companies. Deloitte’s algorithms also made inferences about lifestyles. Subscribers to Runner’s World, for example, were assumed to be more active, and therefore healthier.

According to research conducted in 2014 by Colloquy, a market research firm that specializes in loyalty, there are 3.3 billion loyalty program memberships in the U.S. That translates to 12 active memberships—meaning members who have not just enrolled but are earning rewards—per household.

In their book Big Data: A Revolution that Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier point out that while such a practice, were it put into wide use, might result in some of us paying higher premiums, it could also reduce the cost of testing, as well as support preventative measures and, big picture, make insurance more affordable. Besides, who wants to submit to blood and urine samples in the first place?

Insurance companies emphasize the potential for prevention inherent in using consumer analytics for risk assessment. In 2009, the Centers for Disease Control and Prevention used loyalty card data to identify the source of a nationwide salmonella outbreak. Consumer data in a more general form—the search terms we type into Google—has even been used to accurately track epidemics. Google took the most common search terms we use and compared these with CDC data on the actual spread of flu. They processed millions of combinations until they found 45 search terms—phrases including words like “chills” and “fever”—that strongly correlate with the official, recorded progress of the disease. Now, like the CDC, Google tracks the flu, but in real time, without the one- to two-week delay of on-the-ground medical testing.
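The basic move described here, lining up how often a phrase is searched against the official case counts and keeping the phrases that track most closely, can be sketched in a few lines of Python. This is a simplified illustration and not Google's actual method: the weekly numbers are invented, and the function names pearson and rank_terms are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)


# Invented weekly figures: reported flu cases and search volume per term.
reported_cases = [10, 20, 40, 80, 60, 30]
search_counts = {
    "fever": [100, 210, 390, 820, 610, 290],
    "chills": [50, 90, 220, 400, 310, 140],
    "mayonnaise": [300, 310, 290, 305, 295, 300],
}


def rank_terms(search_counts, cases):
    """Score each term against the case counts, strongest correlation first."""
    scored = [(term, pearson(counts, cases)) for term, counts in search_counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for term, score in rank_terms(search_counts, reported_cases):
    print(f"{term}: {score:+.2f}")
```

In the toy data the symptom terms rise and fall with the case counts while the grocery term does not, so it lands at the bottom of the ranking. The real exercise did the same kind of sifting across millions of candidate combinations.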

Still, you can’t get much more personal than medical information, and the recent spate of data breaches has caused a justified spike in our concern for who knows what about whom. Professor Camm says that companies he consults for are spooked. “I’ve never seen concern from executives the way I’ve seen it in the last two years around data integrity and data security,” he told me. “It’s a huge, huge issue.”

According to Michael LaMontagne, Vice President of Research and Analytics at Rockfish, an Arkansas-based digital advertising and marketing agency with offices in Cincinnati, consumers’ trust will come to rely heavily on transparency. Only those businesses that are absolutely clear regarding what they collect, how they secure it, and how they use it, will gain our trust, he told me in an e-mail. So perhaps companies will soon win our loyalty not by discovering our secrets, but by giving up their own.


At the gigahertz rate with which information technology cycles through new ideas, all of the above is already, like, so yesterday. The really valuable data is the stuff we collect on ourselves—“first-party” data collected in real time: information we input into apps to record our food intake or our expenditures, as well as biometric data tracked with wearable tech like the “always connected” Nike+ FuelBand that fitness buffs use to hone their workouts. Devices that disclose your location support “real-time marketing.” Identify your customers’ needs, track their location, and you can identify the optimal moment to push that offer, slip in that suggestion.

While our information makes us vulnerable, it also catches those who come to rely on it in a cycle of dependency. The need for newer, better information escalates as it becomes an essential tool to compete. But if they go too far, if they’re caught misusing it, they lose our trust, our trade, and with that the data that keeps them in the game.

Nishat Mehta, the dunnhumby executive, wrote in a 2013 column for the online publication AdExchanger that first-party data is the new frontier in customer analytics. With the release of the Apple Watch imminent, it seems likely to become for many an everyday accessory, with an array of sensors against our skin, recording our movements and pulse. The correlations between your heart rate and your context will show your reaction to certain products or media, even to other people. As network-connected microprocessors are designed into our cars, kitchen appliances, and clothes, the data stream of the minutiae of our lives will become a deluge, difficult to control. Cyber-security researcher Joe Giron recently discovered that a mobile app for the ride-sharing service Uber sends personal information to the company, including users’ call history and text message logs. The New York Times reported that both Apple and Android apps routinely gather information from smartphone address books.

“Third-party credit card providers,” Mehta writes, “can tell a marketer how much I spend at restaurants, but only I know what food ingredients I ate over the last 24 hours.”

Retailers and service providers can claim they own our data, he goes on, but the information we collect on ourselves, there’s no question: it’s ours. And in a world where data has value, that value exchange will need to be made explicit rather than, as it is now, left undefined and murkily traded for discounts through loyalty cards or for the free access we enjoy to search engines and social networks like Google and Facebook.

When things reach this level of datafication, Mehta is saying, no one—not us, not the recipients and users of our information—will be able to afford to be anything but perfectly transparent about what they collect and what they do with it. Perhaps we need some master app, akin to Kennamer’s LOC Card, a single secure portal for all the data we create.

In an increasingly networked future, we can’t expect to ever regain full control. The unintended consequences of the information we exchange will forever extend beyond our comprehension. But of this we can be certain: In the interconnected age ahead, we must become our own Big Brother—and our brother’s keeper.


Originally published in the March 2015 issue.
