Tuesday, January 26, 2010

A Discrete Choice Dilemma

I hope to get my hands on Michael Lieberman's module on analyzing discrete choice data with SPSS.

I have been caught up with the CBC dilemma for quite some time now, and this gives me an opportunity to document where I am. Below is an excellent video on running a discrete choice analysis with SPSS. A proportional hazards model (Cox regression) is used. I have not worked through the math here, but I assume this is equivalent to using MNL with the time variables appropriately defined. The last time I used a Cox regression, t stood for time to a terminal event in a survival analysis.
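
To make the MNL side of that assumed equivalence concrete, here is a minimal sketch (not the video's SPSS/Cox setup) of a conditional, McFadden-style logit fitted by direct maximum likelihood. The tasks, attributes and values are invented purely for illustration.

```python
# A minimal sketch of a conditional (McFadden) logit fitted by maximum likelihood.
# The attribute layout (price, feature dummy) and the data are invented.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# Three choice tasks, each with 3 alternatives described by 2 attributes,
# plus the index of the alternative actually chosen in that task.
tasks = [
    (np.array([[1.0, 0], [2.0, 1], [3.0, 1]]), 1),
    (np.array([[1.5, 1], [2.5, 0], [2.0, 1]]), 0),
    (np.array([[1.0, 1], [3.0, 0], [2.5, 1]]), 2),
]

def neg_log_likelihood(beta):
    """Negative conditional-logit log-likelihood summed over all tasks."""
    ll = 0.0
    for X, chosen in tasks:
        utilities = X @ beta                             # systematic utility of each alternative
        ll += utilities[chosen] - logsumexp(utilities)   # log P(chosen | task)
    return -ll

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print("estimated part-worths (betas):", result.x)
```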

I found the approach interesting because of the design - it uses one choice task per respondent (each respondent sees each option exactly once) and because individual characteristics are explicitly modeled (as required by the random utility MNL). Watch the video below:

BTW, McFadden's influential 1974 paper is available online at: http://bit.ly/73RmgR. But a much easier read for beginners is the chapter he references in the video. It is quite detailed and starts with the basics, so if you are already comfortable with logit/logistic regression you may skip the early parts. Note that the key to interpretation is the odds ratio, which is the exponential of the corresponding beta: you look at whether it exceeds or falls below 1, and the interpretation is always with respect to the base chosen. The text is available at: http://bit.ly/bC9bbZ.
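
As a toy illustration of that odds-ratio reading (the coefficient below is made up, not taken from the text):

```python
# Toy illustration of reading a logit coefficient as an odds ratio.
# beta is hypothetical; the odds ratio is exp(beta), relative to the chosen base.
import math

beta = 0.405                  # hypothetical coefficient for, say, "brand A vs. base brand"
odds_ratio = math.exp(beta)   # about 1.5: odds of choosing brand A are 1.5x the base's odds
print(round(odds_ratio, 2))   # > 1 means more likely than the base; < 1 means less likely
```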

The problem with using SPSS for discrete choice (either MaxDiff or CBC) is that you cannot look at individual-level utilities the way you can, for example, when running a traditional constant-sum conjoint with SPSS (where you can assess reversals at the individual level). The data are analyzed only in the aggregate.

If you are using a Sawtooth Software solution with their Hierarchical Bayes module, this becomes possible. For example, in a MaxDiff run with HB, Sawtooth also returns the individual raw and rescaled scores along with a fit statistic, which is the root likelihood (RLH) multiplied by 1000. Essentially, this is achieved by assuming a distribution of the part-worths over all respondents, combined with a lower-level MNL estimation for each individual.

This is an efficient way to flag individual respondents who may have raced through the survey and answered the choice questions at random (their RLH would be about the same as for a pure-chance, random draw) while still meeting whatever time-stamp criteria you set up for qualifying acceptable data.
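
A rough sketch of how that RLH screen could be computed; the fitted probabilities below are invented stand-ins for what each respondent's HB utilities would actually imply.

```python
# Sketch of the root-likelihood (RLH) screen. Probabilities are invented;
# in practice they come from the respondent's own HB-estimated utilities.
import numpy as np

def rlh(chosen_probs):
    """Geometric mean of the model's probabilities for the choices actually made."""
    return float(np.exp(np.mean(np.log(chosen_probs))))

n_alternatives = 4                  # alternatives shown per task
chance_rlh = 1.0 / n_alternatives   # roughly what a purely random responder would score

attentive = rlh([0.62, 0.55, 0.70, 0.48, 0.66])   # hypothetical fitted probabilities
random_ish = rlh([0.26, 0.23, 0.27, 0.24, 0.25])

print(round(chance_rlh, 2), round(attentive, 2), round(random_ish, 2))
# Respondents whose RLH sits near the chance value are candidates for removal,
# even if their time stamps look acceptable.
```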

The catch, however, is that if you want to look at this by individual characteristics, then, intuitively, you need to segment the data before running the HB utility estimation. In other words, one needs to allow for differences in the distribution of the part-worths across different segments of the population.

The Sawtooth application, however, uses a simple HB model in which all respondents are assumed to come from the same population of individual characteristics. This runs counter to a central piece of random utility models; note McFadden's warning about this early on in his paper. When individual characteristics are not modeled, those characteristics are, in essence, treated as unobservable/un-measurable/absent. McFadden's approach starts with the definition of an individual with measurable attributes (characteristics) faced with a choice decision.

In a paper available from the Sawtooth website, Sentis and Li of Pathfinder Strategies have argued, based on their own empirical research, that this makes no difference to the predictive accuracy of the analysis: predictions on the hold-out tasks were not significantly improved by first segmenting the sample (by demographics, and by K-means and latent class segments) and then applying the HB estimation, compared with running the simple HB model over the entire data. The paper provides little consolation, since the evidence comes from a single dataset. In fact, this is something researchers could pick up: we seriously need a body of empirical evidence, rather than a one-dataset study, to validate what is a decidedly counter-intuitive conclusion.

Until such complex HB modeling software becomes available (I am talking about user-friendly software that lets me set parameters through a simple GUI), we have no choice but to do one of the following:

1) Stay with aggregate-level analysis and use logit to directly model interactions between individual characteristics and product attributes multiplicatively (see the sketch after this list) - use SPSS plus an Excel simulation tool

2) Use HB estimation, assume that one size fits all (one population, one distribution), and use averages to get at differences by characteristics

3) Plan for large samples, identify the characteristics that are likely to impact choice, and separately estimate utilities using HB for each sub-population. Note that, unlike a logit, this won't give you statistically defined significance levels for the differences.
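
As a sketch of what option 1 looks like in practice; the attribute and characteristic names (price, feature, income) are hypothetical, and the stacked rows would feed the same MNL/conditional-logit estimation sketched earlier.

```python
# Sketch of option 1: interact a respondent characteristic with product attributes
# so that an aggregate logit lets part-worths vary by that characteristic.
# Names (price, feature, income) are invented for illustration.
import numpy as np

def design_row(alt_attributes, respondent_income):
    """Main effects for the alternative plus income-by-attribute interaction terms."""
    main = np.asarray(alt_attributes, dtype=float)   # e.g. [price, feature]
    interactions = respondent_income * main          # income x price, income x feature
    return np.concatenate([main, interactions])

# The same alternative (price 2.0, feature present) seen by two different respondents:
print(design_row([2.0, 1.0], respondent_income=1.0))   # [2. 1. 2. 1.]
print(design_row([2.0, 1.0], respondent_income=0.0))   # [2. 1. 0. 0.]
# The interaction betas then measure how the attribute part-worths shift with income.
```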

I would certainly go for 1) or 3).


Posted via email from Noumenon - The Wayfarer's Stack

Monday, January 25, 2010

In-depth + Survey for Uncertainty Management

Searching for dynamic heat maps, I serendipitously discovered this excellent paper on an application of a qualitative methodology designed to help managers assess strategic uncertainty, contextualized here in business-environmental terms (rather than game-theoretic terms). The authors suggest using impact and likelihood scores from managers within an organization to assess both the opportunities and the exposure arising from strategic uncertainties in each of several uncertainty categories. In step 1, a qualitative approach is proposed to understand the sources of uncertainty, followed by impact/likelihood ratings on 5-point scales (I would use 10-point scales to cover a greater range, and also because likelihood ratings, being subjective probability measures, are better captured on the wider 10-point scale, especially if you are interviewing business managers). The result is a 16-quadrant map, with the quadrants for exposure mapped onto the 4 quadrants for potential. Each quadrant then suggests a specific generic strategy for uncertainty management, which I have tabulated in the picture above.
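
A rough sketch of the scoring step, using the 10-point scales I would prefer; the uncertainty names and the mid-point cut-off are illustrative, not the paper's.

```python
# Sketch of placing impact/likelihood ratings into quadrants on a 10-point grid.
# Category names and the cut-off are invented for illustration.
def quadrant(impact, likelihood, midpoint=5.5):
    """Return (impact level, likelihood level) for one rated uncertainty."""
    return ("high" if impact > midpoint else "low",
            "high" if likelihood > midpoint else "low")

uncertainties = {
    "new entrant":       {"impact": 8, "likelihood": 3},
    "regulatory change": {"impact": 7, "likelihood": 8},
}

for name, scores in uncertainties.items():
    print(name, quadrant(scores["impact"], scores["likelihood"]))
# Doing this once for exposure and once for opportunity, and crossing the two
# 4-quadrant maps, yields the 16-quadrant map the authors describe.
```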

Read the paper at http://bit.ly/4BQYJw


Posted via email from Noumenon - The Wayfarer's Stack

Sunday, January 24, 2010

Flip Mining - the FlipKart way

FlipKart is the Indian, or desi, as the pop-culture term goes, version of Amazon. The site is great for Indian buyers if you know what you are looking for, and you will usually get a discount over Amazon prices. The flip in FlipKart is the data-mining bit. Apparently, the site's data-mining capabilities are limited to a primitive word search across fields, with no respect for library cataloging, let alone market basket analysis à la Amazon. I accessed the site out of curiosity, and searching for Nancy Duarte's "Slide:ology" returned a number of recommendations for 'similar' books - three of these were from the Nancy Drew series, while another appeared to be an attempt at Jane Eyre (Nancy by Rhoda Broughton).

Searching the site again for "Presentation" did return a number of books on the subject, though the collection is not great going by the recommendations of Nancy Duarte and Garr Reynolds. Interestingly, Guy Kawasaki is cited as an author of Presentation Zen, though the introduction makes clear that he only wrote the foreword. Though we understand that FlipKart's ability to do a market basket analysis would be limited by the low traffic for any one title, it's time they did something about their cataloging at least and returned serious results.
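
For contrast, here is a toy illustration of the market-basket logic: recommending by co-purchase counts rather than shared words. The baskets are obviously invented, and this is not how Amazon or FlipKart actually implement their systems.

```python
# Toy market-basket recommender: count how often titles are bought together.
from collections import Counter
from itertools import combinations

orders = [                                   # invented purchase histories
    {"Slide:ology", "Presentation Zen"},
    {"Slide:ology", "Presentation Zen", "Made to Stick"},
    {"Slide:ology", "Nancy Drew #1"},
]

co_counts = Counter()
for basket in orders:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1               # count each co-purchased pair

target = "Slide:ology"
recs = Counter()
for (a, b), n in co_counts.items():
    if target in (a, b):
        recs[b if a == target else a] += n

print(recs.most_common(2))   # ranked by co-purchase count, not by titles sharing a word
```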


Posted via email from Noumenon - The Wayfarer's Stack

On first looking into Duarte's Slide:ology

Nancy Duarte's Slide:ology took up the whole of Sunday. Weekends are blessed as I get to read up, and this weekend was especially blessed because of my first exposure to a text on presentation skills. I was more fascinated by the design side than the presentation side of things. The book is simply unputdownable. I raced through chapters 1 to 5 and took elaborate notes on Chapter 6 in MS OneNote - much more convenient to structure than Evernote. Chapter 6 is where the real technical part of design kicks in, and Nancy Duarte does a fabulous job of putting it across to non-pros. Dare I say I am as excited as Keats looking into Chapman's Homer for the first time.

Chapter 7 on visual elements is packed with information and worth a second read, which is what I am doing now. Found the explanation of the color wheel, along with the classification of color schemes (analogous, monochrome, complementary, split-complementary, triadic, tetradic), especially helpful. Also understood serif and sans-serif, kerning and ligatures for the first time.

Nancy's style is fluid and easy, and she liberally references other well-known authors on the subject, which is why the book is also very informative in terms of follow-up reading. Discovered the principles behind many of the small things one already knows from being in the practice - axes of graphs should be aligned, sub-bullets take a smaller font, caps and full stops on bullets should be applied consistently, color gradations must be discernible in print, avoid text-heavy slides, and so on.

The text also led me to Garr Reynolds' blog, which led me to Kuler. Tried the color wheel at Kuler to come up with some designs and to borrow a few others, which I adapted. Reynolds' Presentation Zen is third on my reading list now, after Nancy's text and Aaker's "Building Strong Brands" (you can get it cheaper from the Reliance Store at Ambience Mall, Gurgaon).

Have not managed to get back to the Morgan Stanley report on the Mobile Internet - hope to do so over the week.   

Posted via email from Noumenon - The Wayfarer's Stack