Sunday, May 20, 2007

ANOVA woes??

It is amazing how people walk into interviews for analytics positions and proudly proclaim they know ANOVA. ANOVA is basic statistics and its knowledge is a must if you are applying for a position that requires you to be familiar with at least undergraduate statistics.
The case in point is a gentleman who walked in for an interview for a senior position and proclaimed that when he presented an ANOVA, all the client was interested in were percentage comparisons. Well, percentage comparisons are important, but an ANOVA has more to do with comparing means and, presented well, it is accessible to the layman.
Here are the basics:
Say 12 blind children currently taught under a traditional pedagogical method are split across three groups: one treatment group uses the Bosch method of teaching, another uses the Ray Ban method (most pedagogical methods are likely to be ill-designed), and a third group continues under the traditional method (whatever that is). This third group we would call the control group. Students are randomly assigned, with 4 students in each of the three groups.

Three months later their IQ scores are taken (or some other test of intelligence is administered to them and the scores taken). The ANOVA will test the hypothesis of no difference among the group means against the hypothesis of some difference. It will not identify which groups differ; we will come to that at a later date.
So our null hypothesis is

Mu(1) = Mu(2) = Mu(3)
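where 1 represents the Bosch method, 2 the Ray Ban method and 3 the control group. Testing this with separate pair-wise tests has a problem: the overall alpha (the chance of at least one false rejection across all the tests) is larger than the alpha you set for each individual test and, because the tests share data and are not independent, its exact value is not readily known.
ANOVA gets around this problem by testing the equality of all the means simultaneously at a specified alpha.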
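
To get a feel for how much separate pair-wise tests inflate alpha, here is a rough sketch in Python. It assumes, purely for illustration, that the three pair-wise tests are independent, which they are not in practice since each pair shares a group with the others. The function name `familywise_alpha` is made up for this example. Treat the result as a ballpark figure, not the true overall alpha.

```python
from math import comb


def familywise_alpha(alpha, num_tests):
    """Chance of at least one false rejection across num_tests tests,
    ASSUMING the tests are independent (an illustrative simplification)."""
    return 1.0 - (1.0 - alpha) ** num_tests


num_groups = 3                     # Bosch, Ray Ban, control
num_pairs = comb(num_groups, 2)    # number of distinct pair-wise comparisons
per_test_alpha = 0.05

overall_alpha = familywise_alpha(per_test_alpha, num_pairs)
print(f"{num_pairs} pair-wise tests at alpha={per_test_alpha}: "
      f"overall alpha is roughly {overall_alpha:.3f}")
```

Under that independence assumption the overall alpha for three tests at 0.05 comes out near 0.14, almost three times the level you thought you were testing at.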
Do the following:

Step 1 - Calculate the grand mean

Step 2 - Calculate the difference of each individual observation from the grand mean, square it, and sum the squares. This we will call the Total Sum of Squares (TSS).

Step 3 - For each group, calculate the mean. For each group, take the difference of the group mean from the grand mean, square it, and multiply by the number of observations in that group (here, 4). Sum these across the groups. This is the sum of squares due to treatments, or between group sum of squares (BSS).

Step 4 - Calculate the within group sum of squares (WSS) as TSS - BSS. This is the sum of squares that is not due to treatments, i.e. the error sum of squares left unexplained by the model.

Step 5 - Calculate the mean sum of squares due to treatments as BSS/(J - 1) where J is the number of groups (in this case, 3). Call this MSB.

Step 6 - Calculate the mean sum of squares due to error as WSS/(n - J) where n = number of observations (in this case, 12). Call this MSW.

Step 7 - Calculate F=MSB/MSW.

Step 8 - Compare the calculated F above with the tabulated F from the F-tables for (J-1) degrees of freedom in the numerator, (n-J) degrees of freedom in the denominator and at the specified level of alpha (say 0.05)

Step 9 - If the calculated F exceeds the tabulated F for (J-1, n-J) degrees of freedom at the specified alpha level, reject the null hypothesis of no difference between treatments at the alpha level of significance. Otherwise, do not reject it.
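
The nine steps above can be sketched in a few lines of Python. The scores below are made up purely for illustration (they are not from any real study), and the function name `one_way_anova` is hypothetical. The between group sum of squares weights each squared deviation by the group size, and the critical value 4.26 is the usual tabulated F for (2, 9) degrees of freedom at alpha = 0.05.

```python
def one_way_anova(groups):
    """Walk through Steps 1-7 for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)          # total number of observations
    J = len(groups)              # number of groups

    grand_mean = sum(all_scores) / n                          # Step 1
    tss = sum((x - grand_mean) ** 2 for x in all_scores)      # Step 2
    bss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2    # Step 3
              for g in groups)
    wss = tss - bss                                           # Step 4
    msb = bss / (J - 1)                                       # Step 5
    msw = wss / (n - J)                                       # Step 6
    f_stat = msb / msw                                        # Step 7
    return {"TSS": tss, "BSS": bss, "WSS": wss,
            "MSB": msb, "MSW": msw, "F": f_stat,
            "df": (J - 1, n - J)}


# Hypothetical scores, 4 students per group (illustrative only).
bosch = [106, 111, 109, 114]
ray_ban = [100, 98, 103, 99]
control = [95, 97, 94, 98]

result = one_way_anova([bosch, ray_ban, control])

# Steps 8-9: compare with the tabulated F for (2, 9) df at alpha = 0.05.
F_CRITICAL = 4.26
reject_null = result["F"] > F_CRITICAL

print(result)
print("Reject the null hypothesis" if reject_null
      else "Do not reject the null hypothesis")
```

With these made-up scores, TSS = 474, BSS = 416 and WSS = 58, so the calculated F is about 32.3, well above the tabulated 4.26, and the null hypothesis of no difference would be rejected at the 0.05 level.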

Sunday, April 29, 2007

After a long time, books again - "Developing Business Strategy" by Aaker and "The Hidden Power of Social Networks" by Rob Cross and Andrew Parker. Cross & Parker's book is exciting, an introduction to Social Network Analysis in the organizational context. It deals primarily with the need for appropriate connectivity in the knowledge organization - across verticals and hierarchies, across merged entities and across organizations.
It draws attention to two types of entities in a network:
1) the centre, comprising one or a few individuals who may become bottlenecks
2) the periphery - often non-networked but sometimes important repositories of knowledge.
In the context of the centre they also discuss the disadvantages of over-centralized decision making - people at the higher levels of hierarchy under such conditions can become bottlenecks.
The first chapter "Across the great divide" describes the need for network connectivity across the four divides mentioned above. Chapter 2 deals with the "sense-and-respond" organization - "As new challenges and opportunities arise, employees need to know who has relevant expertise - who knows what in the network".
The first example in Ch 2 discusses a case from the professional services industry. A consulting firm sets up a unit whose objective is to provide thought leadership and support to the firm's KM consultants. A network analysis reveals that the group is actually fragmented into a group of strategists & management experts and a group of technical people specializing in the IT side of KM. The two groups are held together by a single person who has knowledge of both domains and who keeps the two groups apart by discouraging interaction between them. Once the problem is understood, steps are taken for change: shared documentation is created by both groups collaboratively, a mixed-revenue sales goals system is implemented and fora are created for communication between the two groups.