Friday, May 28, 2010

Error on Both Sides

Error happens on both sides in sentiment analysis / opinion mining.

Assume you wanted to understand, quantitatively, if the books published by Penguin are better received/perceived by customers than, say, Doubleday. One way to crack that problem would be to go to Amazon.com and mine the opinions. It stands to reason that books published by one company would receive a higher weighted average star rating, wouldn't they? But what if that were inadequate, and you wanted to understand the general mood and tone of what was actually being written - presumably so you could learn and adjust? What then?
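For the star-rating half of that comparison, here is a minimal sketch of one way to compute a weighted average, assuming each book contributes its average stars weighted by its number of reviews. The function name and the (stars, review count) layout are my own illustration, not anything Amazon provides.

```python
def weighted_average_rating(books):
    """Return a publisher's star rating, weighted by review count.

    books: list of (average_stars, review_count) pairs, one per book.
    This layout is an assumption made for the sketch.
    """
    total_reviews = sum(count for _, count in books)
    if total_reviews == 0:
        return None  # no reviews, so no meaningful rating
    return sum(stars * count for stars, count in books) / total_reviews
```

Weighting by review count keeps a book with three reviews from counting as much as a book with three thousand.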

Assume extraction and a well-formatted file. In other words, assume a dataset.

Consider two machines, m and n, that operate on that dataset. They use two different algorithms for summarizing that data into easily digestible, actionable dimensions. (What are those? That's a subject for another post.)

Machine m is a rather naive algorithm. It counts keyword group frequencies and then buckets them. It's also the most scalable and chews through the dataset in just a few seconds. It doesn't take into account k-nearest neighbors and doesn't make use of Machine Learning (ML) algorithms. Machine m would generate a lot of false positives as a result: it would assume that the sentence "I recommend this book for idiots and morons" is actually a good thing. It contains the cluster "I recommend" and "recommend" after all.
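A minimal sketch of something like Machine m, assuming it just counts phrase hits and buckets on the difference. The phrase lists and the function name are hypothetical; I haven't specified the real keyword groups here.

```python
# Hypothetical keyword groups, chosen only for illustration.
POSITIVE_PHRASES = ("i recommend", "recommend", "loved", "great")
NEGATIVE_PHRASES = ("boring", "waste of money", "terrible")


def bucket_review(text):
    """Bucket a review by counting keyword-group hits, ignoring context."""
    lowered = text.lower()
    positive_hits = sum(lowered.count(p) for p in POSITIVE_PHRASES)
    negative_hits = sum(lowered.count(p) for p in NEGATIVE_PHRASES)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


# The false positive from the example: both "i recommend" and "recommend"
# match, and nothing in the negative list does.
print(bucket_review("I recommend this book for idiots and morons"))  # positive
```

It's fast because it is nothing but substring counting, and that is exactly why it can't see who the recommendation is aimed at.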

Machine n is more thorough at the cost of scalability. It'll actually bust out sentences, tokenize words, and classify everything. In the attempt to rule out false positives, it would begin to generate false negatives. While it would certainly rule out "I recommend this book for idiots and morons", it might also rule out "I recommend this book for the Queen West hipster". (The latter being a totally reasonable recommendation.)
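A rough sketch of the Machine n idea, with a hand-written rule standing in for a trained classifier (that substitution is my assumption, made to keep the example short). The lexicon and all function names are hypothetical. "hipster" is in the lexicon on purpose, to show how a rule built to catch insults starts throwing out reasonable recommendations.

```python
import re

# Hypothetical lexicon of audience words treated as insults. "hipster" is
# included deliberately to reproduce the false negative described above.
DEROGATORY_AUDIENCE = {"idiots", "morons", "hipster"}


def split_sentences(text):
    """Split on sentence-ending punctuation; a crude stand-in for a real splitter."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Return lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def classify_sentence(sentence):
    """Classify one sentence, checking who a recommendation is aimed at."""
    tokens = tokenize(sentence)
    if "recommend" not in tokens:
        return "neutral"
    after_recommend = tokens[tokens.index("recommend") + 1:]
    if "for" in after_recommend:
        # Everything after "for" is treated as the intended audience.
        audience = after_recommend[after_recommend.index("for") + 1:]
        if any(word in DEROGATORY_AUDIENCE for word in audience):
            return "negative"
    return "positive"


def classify_review(text):
    """Classify every sentence in a review."""
    return [classify_sentence(s) for s in split_sentences(text)]
```

With this rule, the idiots-and-morons sentence is correctly ruled out, and the Queen West hipster sentence is ruled out along with it.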

Error happens on both sides: in the attempt to pursue greater accuracy, one might actually be acquiring more error for relatively marginal gains. (And I don't believe that 'error cancels out'. If you've seen it happen, could you show me?)

Thursday, May 20, 2010

eMetrics London Roundup

London was certainly an adventure and a lot was learned. It was a whirlwind 20 hours.

I spent a lot of time listening at eMetrics. There was a very large contingent of search and email marketers there - looking to see what this social media thing was all about. The panels on that topic were geared for that audience and it was particularly interesting to hear how others are explaining the opportunities in social media. I can't/won't share the details of private discussions that happened over the course of the day and next night.

I spent a lot of time talking at eMetrics.

I went to London with Three Calls To Action for the Audience:

1. Join the WAA's European Research Committee and be one of the first to know.
2. Advocate within your organization that Word Of Mouth / Social Media Marketing is very effective in customer acquisition and retention.
3. Start with Earned Media Value as the KPI to communicate the value that Social Media delivers.

If there's another point to bring up - it's this: many readers of this space are architects of KPIs and models. Goal Alignment Strategy (GAS) and the underlying Goal Hierarchy (Goal Architecture) contain structure and incentives. Where there are incentives, there will be behavior. And so, as indirect architects of behavior, we need a degree of strategic foresight. A major concern about many operationalizations of Earned Media Value is that they are not architected to engender good behavior. I spent the bulk of the presentation explaining the underlying architecture of Earned Media Value - and why how you define Earned Media Value is a major reflection on what you value as a marketer and an analyst.

You can check out some of the photos here.

I stopped in at the Syncapse Office and rapidly realized why everybody can hear me when I use Skype to call in. (I'll still keep on doing it. Seeing facial expressions while I'm meeting is still very dear to me.) They're an excellent team.

Sadly, I cut the trip short and didn't get to check in on Lepki or my other cohorts in London (Wiertz et al). Next time.

Finally, for those who are interested, I'll be sharing some more personal reflections on product development, evidence based marketing, and a big decision over on my other blog at christopherberry.ca.

Sunday, May 9, 2010

London, eMetrics, and an Observation

I'll be in London next week for a whirlwind 46 hours, coming for eMetrics and staying for the company.

I'm looking forward to talking to Lovett to carry on where we left off two weeks ago. I'll be talking more about Villanueva and linking in Earned Media Value (EMV), as well as talking about the differentiation in the definition. I'd like to meet up with Andy Lepki who does the analytics over at The Guardian. There are a few tweeple on the list. If you're at eMetrics London too, don't be a stranger. I don't know nearly enough of you.

I took in the British Election coverage on BBC World - garnering scorn for how much of it took up the DVR the next day. (I fell asleep at 1:30am to the delightful droning of polite voices going on about swings and stats.)

I observed, both through the BBC's website, and through the great 3D room, just how the British handle information complexity. The UK's 650 seats are more than double Canada's 308, and naturally, I wanted to see how they handle twice the complexity in the same space. Using this notion of 'swings', they can explain, relatively easily, why different seats change hands. The linkage between a national public opinion sample and the local riding level is made pretty clearly. And, they highlight regional seats and use that model to predict whether or not a seat is at risk. It's not totally predictive - but it's good enough.
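To make the 'swing' idea concrete, here is a minimal sketch assuming the conventional two-party swing (the average of one party's gain and the other's loss, in percentage points) applied uniformly to a seat. I don't know what model the BBC actually runs; the function names and the uniform-swing simplification are mine.

```python
def two_party_swing(prev_a, prev_b, now_a, now_b):
    """Swing from party A to party B, in percentage points.

    Takes national vote shares for both parties at the previous and
    current election and averages A's loss with B's gain.
    """
    return ((prev_a - now_a) + (now_b - prev_b)) / 2


def seat_at_risk(holder_share, challenger_share, swing_to_challenger):
    """Apply the national swing uniformly to one seat's last result.

    Assumes the holder loses exactly `swing_to_challenger` points and the
    challenger gains the same amount, which is the simplification here.
    """
    projected_holder = holder_share - swing_to_challenger
    projected_challenger = challenger_share + swing_to_challenger
    return projected_challenger > projected_holder
```

Under that assumption, a seat flips whenever the holder's margin is less than twice the swing, which is why one national number can be used to flag individual ridings.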

There are lessons there. And perhaps the future of analytics. As I've joked in the past - eventually you're going to need an analyst standing in front of a massive animated board with 90 seconds to get a point across. Maybe the future is a 3D room.

So, it's a pleasant coincidence that I'm heading over with the UK on my mind. See you there.