Steve Hein's Emotional Intelligence Home Page


Does Emotional Intelligence Meet Traditional Standards for an Intelligence? Some New Data and Conclusions

Richard D. Roberts

University of Sydney

Moshe Zeidner

University of Haifa

Gerry Matthews

University of Cincinnati


The following is from a pre-publication draft of the article for Emotion, Vol 1, no 3, pages 196-231. Please do not copy from this draft as it may differ from the final copy. Please see the journal Emotion published by the APA, or contact the authors for reprints of the full text. For now I simply share this without my comments or review in order to help present a balanced view of the field of EI at present. I will say that I believe the authors of this article have some valid points and I share several of their concerns. Soon I will be presenting excerpts from another of their articles as well as my own editorial on my own concerns about the Mayer et al model of EI. (S. Hein April 2002)



Start of article

Models of EI

The assessment of EI: Self-report and performance approaches

Issues pertaining to the scoring of EI tests.

EI: Empirical findings

Positive Results

Negative Results

Objectives of the present study

Emotional Intelligence: Relation to individual and group differences

Gender Differences

Ethnic group differences

The relationship between MEIS branch scores and the Big-Five factors.







Self-reported emotional intelligence (EI) appears to represent little more than personality, while performance-based measures hold promise. However, empirical information on the latter type of measure is sparse. To redress this imbalance, a multivariate investigation was conducted using the performance-based, Multi-Factor Emotional Intelligence Scales (MEIS). In addition, participants (N = 704) completed the Trait-Self-Description Inventory (TSDI, a measure of the Big Five personality factors), and the Armed Services Vocational Aptitude Battery (ASVAB, a measure of intelligence). Results were equivocal. While the MEIS showed convergent validity (correlating moderately with the ASVAB) and divergent validity (correlating minimally with the TSDI), different scoring protocols (i.e., expert and consensus) yielded contradictory (indeed, sometimes opposing) findings. Reliabilities of sub-scales comprising the MEIS were also poor in some instances, with factor analyses identifying problems in the hypothesized dimensional structure. Overall, it is questionable whether EI (as operationalized by the MEIS) represents a reliable and/or valid construct.


Start of the article

Emotional intelligence (EI) is a relatively new domain of psychological investigation, having recently gathered considerable momentum with widespread, international media attention. Daniel Goleman’s (1995) book on the topic appeared on the New York Times Best-Sellers List, at which time a Time Magazine article was devoted to detailed exposition of the topic (Gibbs, 1995). More recently, the influential e-zine Salon devoted a lengthy article to discussion of its application in the work force. Clearly, this attention was inspired by a veritable plethora of trade texts (and web-sites) dealing with self-help and management practices, assessment, and organizational-based applications implicit in the concept of emotional intelligence (see e.g., Abraham, 2000; Bar-On, 1997, 2000; Bar-On, Brown, Kirkcaldy, & Thome, 2000; Cooper & Sawaf, 1997; Epstein, 1998; Ryback, 1998; Saarni, 1999; Weisinger, 1998).

EI has prospered, in part, due to the increasing personal importance of emotion management for people in modern society. Indeed, it is commonly claimed that EI predicts important educational and occupational criteria beyond that predicted by general intellectual ability (e.g., Elias & Weissberg, 2000; Fisher & Ashkanasy, 2000; Fox & Spector, 2000; Goleman, 1995; Mehrabian, 2000; Saarni, 1999; Scherer, 1997). Furthermore, its chief proponents appear to have made strides towards understanding its nature, components, determinants, effects, developmental track, and modes of modification (see Matthews, Zeidner, & Roberts, in preparation, for a critical review).

Historically, ‘emotional intelligence’ first appeared in the scientific literature in the early 1990s (Mayer, DiPaolo, & Salovey, 1990; Salovey & Mayer, 1990), where the term was used to denote a type of intelligence that involved the ability to process emotional information. Subsequently, it has been proposed that EI incorporates a set of conceptually related psychological processes involving the processing of affective information. These processes include: (a) the verbal and non-verbal appraisal and expression of emotion in the self and others; (b) the regulation of emotion in the self and others; and (c) the utilization of emotion to facilitate thought (see Mayer & Geher, 1996; Mayer & Salovey, 1997; Salovey & Mayer, 1990, 1994). Although various authors propose that emotional intelligence is a type of intelligence, in the traditional sense, contemporary research and theorizing lacks any clear conceptual model of intelligence within which to place the construct. For example, Spearman’s (1927) model of g (general ability) affords no special role for emotional intelligence. Neither is emotional (or social, for that matter) intelligence included in Thurstone’s (1938) list of Primary Mental Abilities or Guttman's (1964a, 1964b) Radex Model of Intelligence.

Although EI has captured the public’s imagination during the past five years, the concept's origins trace back to a number of constructs emanating from traditional psychometric models of intelligence. We now briefly examine the construct’s historical roots and conceptual linkages, pointing to similarities and differences between EI and a variety of cognate constructs (i.e., social intelligence, crystallized ability, behavioral cognition, personal intelligence) identified within influential theories of intelligence.


Models of EI

One of the difficulties currently encountered in research on EI would appear to be the multitude of qualities covered by the concept (see Roberts, 2001). Indeed, many appear to overlap with well-established personality constructs, such as those of the Big-Five Factor Model (see Davies et al., 1998; McCrae, 2000). Mayer, Caruso, and Salovey (2000a, 2000b) warn that careful analysis is required to distinguish what is (and what is not) part of EI (see also Mayer, Salovey, & Caruso, 2000a, 2000b). Throughout, Mayer and colleagues distinguish between (1) ‘mental ability models’, focusing on aptitude for processing affective information, and (2) ‘mixed models’ that conceptualize EI as a diverse construct, including aspects of personality as well as the ability to perceive, assimilate, understand, and manage emotions. These 'mixed models' include motivational factors and affective dispositions (e.g., self-concept, assertiveness, empathy; see Bar-On, 1997; Goleman, 1995).

In contrast, Mayer and colleagues have proposed a four-branch, mental ability model of EI, that encompasses the following psychological processes (see e.g., Mayer, Caruso et al., 2000a, 2000b; Mayer & Salovey, 1997; Mayer, Salovey, et al., 2000a, 2000b; Salovey & Mayer, 1990):

(a) The verbal and non-verbal appraisal and expression of emotion in the self and others. EI has been defined as "the ability to perceive emotions, to access and generate emotions so as to assist thought, to understand emotions and emotional knowledge, and to reflectively regulate emotions so as to promote emotional and intellectual growth" (Mayer & Salovey, 1997, p. 5). Within this definitional framework, the most fundamental level of EI includes the perception, appraisal, and expression of emotions (Mayer, Caruso et al., 2000a). In other words, this aspect of EI involves individuals being aware both of their emotions and of their thoughts concerning those emotions, being able to monitor emotions and differentiate among them, and being able to express emotions adequately.

(b) The utilization of emotion to facilitate thought and action. This component of EI involves assimilating basic emotional experiences into mental life (Mayer, Caruso et al., 2000a, 2000b). This includes weighing emotions against one another and against other sensations and thoughts, and allowing emotions to direct attention (e.g., holding an emotional state in consciousness long enough to compare its correspondence to similar sensations in sound, color, and taste). Marshalling emotions in the service of a goal is essential for selective attention, self-monitoring, self-motivation, and so forth.

(c) Understanding and reasoning about emotions. This aspect of EI involves perceiving the lawfulness underlying specific emotions (e.g., to understand that anger arises when justice is denied or an injustice is performed against one’s own self or close ones). This process also involves the understanding of emotional problems, such as knowing what emotions are similar and what relation they convey.

(d) The regulation of emotion in the self and others. According to Mayer, Caruso et al. (2000a), the highest level in the hierarchy of EI skills is the management and regulation of emotions. This facet of EI involves knowing how to calm down after feeling stressed out, or alleviating the stress and emotion of others. This facet facilitates social adaptation and problem solving.


The assessment of EI: Self-report and performance approaches


Although several measures have been (or are currently being) designed for its assessment, it remains uncertain whether there is anything to EI that psychologists working within the fields of personality, intelligence, and applied psychological research do not already know. Moreover, the media hype and the vast number of trade texts devoted to the topic often subsume findings from these fields in a faddish sort of way, rather than deal directly with the topic as defined by its chief exponents. In short, like many psychological constructs, ‘emotional intelligence’ is often loosely defined in the literature, causing considerable confusion among researchers in the field.

Nevertheless, since the term first appeared, there has been a rapid propagation of measures of EI (for a review see Ciarrochi, Chan, Caputi, & Roberts, 2000). Popular measures of EI include the Bar-On Emotional Quotient Inventory (Bar-On, 1997, 2000), the EQ MAP Test (Cooper & Sawaf, 1997), the Schutte Self-Report Inventory (Schutte, Malouff, Hall, Haggerty et al., 1998), the Trait Meta-Mood Scale (Salovey, Mayer, Goldman, Turvey, & Palfai, 1995), and the Multi-Factor Emotional Intelligence Scale (Mayer, Caruso et al., 2000a). The content of these EI measures varies as a function of the different theoretical conceptualizations and interpretations of EI appearing in the literature (Mayer, Salovey et al., 2000a, 2000b). However, many commentators classify indicators of EI according to whether or not these derive from self-reports of typical behaviors in everyday life, as opposed to objective performance in controlled experimental settings. A brief overview and critique of these two distinctive approaches to the assessment of EI follows.

(a) Self-report measures of EI. Self-report protocols have been designed to assess beliefs and perceptions about an individual’s competencies in specific domains of EI (Salovey, Woolery, & Mayer, 2000). These indices generally ask a person to endorse a series of descriptive statements, usually on some form of rating scale. For example, in the Schutte Self-Report Inventory (Schutte et al., 1998) the individual rates themselves from ‘1’ (strongly disagree) to ‘5’ (strongly agree) on 33 statements (e.g., "I know why my emotions change"; "I expect good things to happen"). Self-report measures typically sample a diversity of constructs, and hence assume a mixed model of EI (i.e. as both ability and personality trait), in Mayer, Caruso et al.’s (e.g., 2000a, 2000b) terminology.

A number of problems and serious omissions currently plague the research on EI that employs self-report methodologies (cf. Petrides & Furnham, 2000). These self-report scales rely on a person’s self-understanding; if the self-reports are inaccurate, these measures yield information only concerning the person’s self-perception (rather than their actual level) of emotional intelligence. Self-perceptions may not be particularly accurate or even available to conscious interpretation, being vulnerable to the entire gamut of response sets and social desirability factors afflicting self-report measures, as well as deception and impression management. These problems are, of course, common to all scales based on self-report, including personality assessment. To counteract this criticism in other fields where self-reports are used, researchers have devised a number of procedures, including comparing self-assessed responses to reports provided by a respondent’s peers (see e.g., Costa & McCrae, 1992; Pavot, Diener, & Suh, 1998; Stoeber, 1998). However, validation studies of this type appear not to have been conducted with respect to self-report measures of EI. Hence, whether or not extant scales are free from response biases, and social desirability effects, remains an open, empirical question in urgent need of detailed investigation.

This issue notwithstanding, it is questionable whether items asking participants to self-appraise intellectual ability (e.g., "I am an extremely intelligent student") would make for a valid measure of general intelligence. Under the assumption that EI constitutes a traditional form of intelligence, the usefulness of analogous items about one’s EI seems doubtful (Salovey et al., 2000). Note that past research has reported rather modest associations between self-rated and actual ability measures, with self-report accounting for roughly 10% of intelligence score variance. For example, a meta-analytic review of 55 studies by Mabe and West (1982) yielded a mean correlation (‘validity coefficient of self-rating’) of 0.34 between self-evaluations of intelligence and objective intelligence test scores. More recent studies (see e.g., Paulhus, Lysy, & Yik, 1998) concur that the correlations between self-reports of intelligence and mental test performance tend to be rather modest (about r = .30).
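The step from a validity coefficient to a share of variance is simply the square of the correlation. The short sketch below works through the two coefficients cited above; the function name is ours and purely illustrative.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance in one measure shared with another, given their correlation r."""
    return r ** 2


# Coefficients as cited in the text above.
cited = [
    ("Mabe & West (1982), mean validity coefficient", 0.34),
    ("Paulhus et al. (1998), approximate", 0.30),
]

for label, r in cited:
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.3f}")
```

On these figures, self-ratings share roughly 9% to 12% of their variance with tested intelligence.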

Finally, tests of EI that assess non-cognitive traits (e.g., assertiveness, optimism, impulse control) seem to be tapping dimensions of individual differences that are entirely different from contemporary notions of what constitutes ‘intelligence’ (Davies et al., 1998). Indeed, the information derived from these instruments appears more pertinent to constructs comprising existing personality models (see McCrae, 2000). Empirical data pointing to the substantial relationship between EI and existing personality measures have, curiously, actually been used in support of the discriminant validity and conceptual soundness of EI (see e.g., Bar-On, 2000). For example, a recent study by Dawda and Hart (2000) revealed average correlations approaching 0.50 between measures of the Big Five Personality Factors (i.e., Neuroticism, Extroversion, Openness, Agreeableness, and Conscientiousness) and general EI derived from the Bar-On Emotional Quotient Inventory (see Table 7, p. 807). Noting the relative independence of each of the Big-Five Factors (e.g., Costa & McCrae, 1992), these data suggest that the Bar-On Emotional Quotient Inventory is nothing but a proxy measure of a composite of Big-Five Personality constructs, weighted most strongly towards low neuroticism.

(b) Performance-based EI measures. In view of the foregoing problems associated with the use of self-report measures, several authors have advocated the development of more objective, ability-based indicators of EI (e.g., Mayer, Caruso et al., 2000a, 2000b; Mayer & Salovey, 1997; Mayer, Salovey et al., 2000a, 2000b). According to these authors, ability testing is the "golden standard" in intelligence research, because intelligence refers to the actual capacity to perform well at mental problems -- not just one’s beliefs about such capacities (see also Carroll, 1993). Under this framework, a psychological instrument directly measures ability by having a person solve a problem (e.g., identify the emotion in a person’s face, story, or painting). In addition, the examinee’s answer should be available for evaluation against accuracy criteria (Mayer & Geher, 1996). Consequently, task-based measures engage participants in exercises designed to assess competencies supporting emotionally intelligent skills. The ability-based mode of assessment proposed by Mayer and Salovey (1997) and its underlying four-branch conceptual model of EI, has gained currency, largely because it appears ability-oriented and empirically based. Their four-branch model, described above, is currently operationalized through the Multi-Factor Emotional Intelligence Scale (MEIS; Mayer, Caruso et al., 2000a), and the recently developed Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT; Mayer, Caruso et al., 2000b).

As further discussed below, there is considerable difficulty in determining objectively correct responses to stimuli involving emotional content, and in applying truly veridical criteria in scoring tasks of emotional ability. Proponents of EI ability measures have thus promoted three alternative scoring procedures in order to discriminate right from wrong answers on ability tests of emotional intelligence (Mayer, Caruso et al., 2000a). These are:

(1) Consensual scoring. An examinee receives credit for endorsing responses that the group endorses. Thus, if the group agrees that a face (or design, passage of music etc.) conveys a ‘happy’ or ‘sad’ emotion, then that becomes the correct response. This approach assumes that observations for a large number of people can be pooled and can serve as reliable measures.

(2) Expert scoring. Experts in the field of emotions (e.g., psychologists, psychiatrists, philosophers, and so forth) examine certain stimuli (e.g., a face, passage of music, or design) and then use their best judgment to determine the emotion expressed in that stimulus. Presumably, the expert brings professional know how (along with a history of behavioral knowledge) to bear on judgments about emotional meanings. However, it has been argued that an expert’s assessment may be no more than a reliable indicator of the group consensus, albeit a particularly sensitive one (Legree, 1995). The test taker receives credit for ratings that correspond to those of the experts employed.

(3) Target scoring. A judge (i.e., the test taker) assesses what a target (artist, photographer, musician, and so forth) was portraying at the time the target was engaged in some emotional activity (e.g., writing a poem, playing a musical score, painting, sculpting, photography, etc.). A series of emotion rating scales is then used to match the emotions conveyed by the stimuli to those reported by the target. It is commonly held that the target has more information than is available to the outside observer (Bar-On, 1997; Mayer, Caruso et al., 2000a, 2000b; Mayer & Geher, 1996), and so the target's report is used as the criterion for scoring judges’ responses. Target scoring has received rather little attention in previous research, ostensibly because it is suitable only for emotion identification tasks, and not for other, higher-level aspects of EI. Hence, we will not discuss target scoring at length in the current paper, although it seems promising for some aspects of EI and might be explored further.
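To make the contrast between the first two procedures concrete, the following minimal sketch scores a single hypothetical item both ways. It is an illustration under stated assumptions, not the MEIS scoring algorithm: we assume that consensus credit equals the proportion of a normative group endorsing the chosen option, and that expert credit is all-or-nothing against a single keyed answer. The item, the response options, and the group data are invented.

```python
from collections import Counter


def consensus_weights(norm_responses):
    """Map each option to the proportion of the normative group choosing it (assumed rule)."""
    counts = Counter(norm_responses)
    total = len(norm_responses)
    return {option: n / total for option, n in counts.items()}


def consensus_score(answer, norm_responses):
    """Credit equals the share of the group that gave the same answer."""
    return consensus_weights(norm_responses).get(answer, 0.0)


def expert_score(answer, expert_key):
    """Full credit only when the answer matches the experts' keyed response."""
    return 1.0 if answer == expert_key else 0.0


# Hypothetical item: "Which emotion does this face express?"
norm_group = ["happy"] * 6 + ["surprised"] * 3 + ["sad"] * 1
expert_key = "surprised"

for answer in ("happy", "surprised", "sad"):
    print(answer, consensus_score(answer, norm_group), expert_score(answer, expert_key))
```

In this invented case the modal group response ("happy") earns the most consensus credit but no expert credit, which is the kind of divergence between scoring methods at issue in this paper.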


Issues pertaining to the scoring of EI tests.

The use of multiple scoring methods in objective assessment of EI contrasts with the scoring of conventional intelligence tests. The logic of facet analytic thinking (see e.g., Guttman & Levy, 1991; Most & Zeidner, 1995; Zeidner & Feitelson, 1989) is that the main criterion for an intelligence task is the application of a true veridical criterion against which one judges a response as correct or incorrect. Often, intelligence test items are based on some formal, rule-bound system that indicates unequivocally whether or not an answer is correct. Various formal systems are used depending on item content, such as mathematics (numerical tests), logic (reasoning tests), geometry (spatial tests), and the semantics of language (verbal tests). It is also relatively straightforward to determine which individuals are expert in these areas, and so are professionally qualified to act as arbiters. By contrast, items used in early IQ tests that depended on subjective judgment, such as deciding which of several faces was most attractive, have been largely removed from tests, due, in part, to the risk of cultural bias. This is not to say that conventional intelligence testing is entirely free from scoring problems. An anonymous reviewer of this article pointed out that series completion problems such as "2, 4, 6, ...?" could be completed in any way at all; use of the simplest rule (add 2) is arbitrary (but consensual). In addition, individual testing, especially of children, may require a judgment on the part of the tester as to whether a question has been correctly answered. Concerns also linger over the extent to which intelligence testing is truly 'culture-fair', despite efforts to remove obvious sources of cultural bias. Nevertheless, there is generally a clear rationale for justifying the correctness of an answer, and it is rare for well-informed people to dispute the correct answer to an item.

The assessment of EI as a mental ability depends on the presumption that answers to stimuli assessing various facets of feelings can be categorized as correct or incorrect (Mayer & Salovey, 1997). If this presumption is incorrect, no scoring method can meet the basic psychometric criterion for ability tests, namely, the existence of a true and unequivocal veridical standard against which to judge responses. In fact, the likelihood of there being a veridical standard depends on the nature of the EI test item. As with cognitive intelligence, items may refer to psychological processes at different levels of abstraction from raw sense data. EI might, in principle, be assessed through 'lower-order' processes linked to sensation and perception, such as detecting the presence of an emotion in a face stimulus presented tachistoscopically, or deciding that two words had similar valence. Alternatively, EI test items may refer to 'higher-order' reasoning processes, such as choosing how to cope with a stressful encounter.

Mayer, Salovey, and Caruso (2000a) arrange the four branches in a hierarchy beginning with lower-level or basic skills of perception and appraisal of emotion, and finishing, at the highest level, with synthetic skills for emotion management that integrate lower-level skills. Basic skills appear to be those most open to objective assessment (although it is likely that perception and appraisal of emotion also involve high-level inference). For example, facial expression of emotion is sufficiently well-understood (e.g., Ekman, 1999) that objectively scored tests of identification of facial emotion might be feasible. In such a case, expert scoring seems appropriate, and there is no place for consensus scoring. Conversely, items for tests of the Managing Emotions branch are more problematic. Certain emotional reactions may be assessed according to logically consistent criteria only by reference to personal and societal standards (Matthews & Zeidner, 2000; Matthews, Zeidner, & Roberts, in preparation). For example, what is the best or right response to being insulted or mocked by a co-worker? Clearly, the best response would depend on the situation, the person’s experience with insults, cultural norms, the individual’s position in the status hierarchy, and so forth. Even within a single specified situation, it is often difficult to specify the 'best' response -- there are multiple criteria for adaptation that may conflict (e.g. preserving self-esteem, maintaining good relationships with others, advancing in one's career).

None of the scoring methods appears to be very satisfactory for 'higher-level' aspects of EI (which may be those most relevant to real-world functioning). Experts may be able to use psychological research to provide answers (as did Mayer, Caruso et al., 2000a), but there are two fundamental limitations to expert knowledge in this area. First, research typically reveals only statistical rather than directly contingent relationships: for example, being mocked by a co-worker typically (but not invariably) leads to anger. Second, there are multiple domains of expertise leading to conflicting viewpoints. If we put the question of how a child's emotional problems can best be managed to a cognitive therapist, an evolutionary psychologist, a psychoanalyst, a social worker, a high school teacher and a gender studies professor, what is the probability that these experts will agree on a solution? (We might feel fortunate to find agreement among any two of the above). The adequacy of consensus judgments is based on evolutionary and cultural foundations, where the consistency of emotionally signaled information appears paramount (Bar-On, 1997; Mayer, Caruso et al., 2000a). It is argued that the pooled response of large normative samples is accurate (Legree, 1995), although more evidence is needed. Even if so, there are serious concerns about bias in consensus judgment. Consensus may be influenced by non-veridical cultural beliefs, such as the traditional British belief that a 'stiff upper lip' is always the best response to emotional problems. There are also concerns about the validity of consensus judgments that cross gender and cultural boundaries. The popular, 'Venus and Mars' view of gender relations is that men are good at understanding the emotions of other men, but are inept at understanding women's feelings, and vice versa. In the worst case, consensus scoring might simply indicate extent of agreement with cultural or gender-based prejudices.

If we are prepared to set such difficulties of scoring principles aside, perhaps we can proceed pragmatically, rather as Binet did in developing intelligence tests that would discriminate children of high academic potential. Testing EI might well be worthwhile if there is evidence that EI tests are reliable, in measuring some underlying quality accurately, and valid, in predicting relevant criteria better than other tests do. Given that the MEIS is a new measure, it may be inappropriate to stifle research prematurely by applying over-stringent criteria. However, it is essential that there is convergence between different scoring methods, or the construct might be judged as unreliable. Mayer, Caruso et al. (2000a) point out that, as the different criteria represent different perspectives, it is unlikely that they would be in complete agreement. They go on to state that there should be a general, "rough" convergence, which would substantiate the view that emotional intelligence is, in fact, an intelligence. Unfortunately, it is unclear how high correlations should be to attain "rough" convergence, or whether it is satisfactory for correlations to be substantial but considerably less than unity (e.g., in the range 0.50 - 0.70). The pragmatic approach raises the issue of empirical findings using the MEIS, which will be considered next.
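One way to make the convergence question concrete is to score the same examinees under both protocols and correlate the two sets of scores. The sketch below does this with a hand-written Pearson coefficient; the six pairs of scores are invented for illustration and are not MEIS data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical sub-test scores for the same six examinees under each protocol.
consensus_scores = [0.42, 0.55, 0.61, 0.38, 0.70, 0.49]
expert_scores = [0.50, 0.45, 0.65, 0.40, 0.60, 0.58]

r = pearson_r(consensus_scores, expert_scores)
print(f"consensus-expert r = {r:.2f}, shared variance = {r ** 2:.2f}")
```

Because shared variance is the square of the correlation, coefficients in the 0.50 - 0.70 range correspond to only 25% to 49% common variance between the two scoring methods, which is why "rough" convergence is hard to evaluate.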


EI: Empirical findings


These theoretical issues notwithstanding, recent research by Mayer, Caruso et al. (2000a) suggests that state-of-the-art objective measures of EI meet the standards of validity and reliability expected of traditional cognitive ability measures. However, the scientific study of EI has only recently begun, and the scant empirical evidence available is contradictory. A brief examination of these conflicting results follows.

EI measures: Positive results. Mayer, Caruso et al. (2000a) argue that standard criteria need to be met before any (new) form of intelligence can be considered to constitute a legitimate scientific domain. They focus on the following three standards, which have been replicated many times in psychometric studies of intelligence (and its taxonomic structure) over the past century (see e.g., Carroll, 1993; Cattell, 1971; Guttman & Levy, 1991; Horn & Hofer, 1992; Jensen, 1998):

(1) An ‘intelligence’ should be capable of reflecting "mental performance rather than preferred ways of behaving, or a person’s self-esteem, or non-intellectual attainments" (Mayer, Caruso et al., 2000a, pp. 269-270). In short, this so-called conceptual criterion asserts that the concept in question be operationalized as a set of abilities (in this case, emotion-related abilities) that have clearly defined performance components.

(2) A (new) intelligence should meet prescribed correlational criteria. For example, tests for different aspects of such an intelligence should be positively intercorrelated. Measures of a new ability should also be related to existing psychometric intelligence tests, specifically demonstrating the "positive manifold" phenomenon represented by "a non-negative matrix of correlation coefficients", as prescribed by Guttman’s first law of intelligence (Guttman & Levy, 1991).

(3) Measures of intelligence should vary with experience and age.
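The correlational criterion in (2) can be checked mechanically: a set of tests shows positive manifold when no off-diagonal entry of their correlation matrix is negative. The sketch below is a minimal illustration; both matrices are invented, and the optional tolerance parameter is our own addition for allowing small sampling error.

```python
def is_positive_manifold(corr, tol=0.0):
    """Return True if every off-diagonal correlation is at least -tol."""
    n = len(corr)
    return all(
        corr[i][j] >= -tol
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Hypothetical correlation matrices for three sub-tests.
all_positive = [
    [1.00, 0.35, 0.20],
    [0.35, 1.00, 0.40],
    [0.20, 0.40, 1.00],
]
one_negative = [
    [1.00, 0.35, -0.16],
    [0.35, 1.00, 0.40],
    [-0.16, 0.40, 1.00],
]

print(is_positive_manifold(all_positive))   # no negative entries
print(is_positive_manifold(one_negative))   # one negative pair violates the criterion
```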

It is claimed that available evidence supports the notion that EI meets all three criteria and so is a legitimate form of intelligence (Mayer & Cobb, 2000; Mayer & Salovey, 1993, 1997; Mayer, Salovey et al., 2000a, 2000b; Salovey et al., 2000). With respect to operationalization criteria, EI has been measured by a series of ‘ability’ tasks on state-of-the-art instruments, such as the MEIS, and has been objectively scored using consensus, expert, and (for some scales) target criteria. These criteria are claimed to converge (i.e., were positively correlated) to a satisfactory degree (Mayer, Salovey et al., 2000b). In the Mayer, Caruso et al. (2000a) data, correlations between consensus and expert test scores ranged from -0.16 to 0.95, with half of the 12 correlations exceeding r = 0.52. A median of 0.52 suggests the desired "rough convergence", though it is questionable whether correlations of this magnitude are sufficient to establish a reliable 'common element' to the two forms of scoring. Moreover, Mayer, Caruso et al. (2000a, 2000b) assert that the four-branch model has (more or less) been vindicated by a series of factor analyses, such that the component tests adhere to the stated performance model. Finally, sub-tests comprising the MEIS are generally claimed to exhibit satisfactory levels of internal consistency reliability (see also Ciarrochi, Chan, & Caputi, 2000).

In fulfilling the second criterion, which essentially captures major features of construct validation, measures of EI have been shown to have concurrent validity with cognate measures of EI, such as empathy, parental warmth, and emotional openness (Mayer, Caruso et al., 2000a; Mayer & Geher, 1996), which serve as criteria for validity assessment. Importantly, consensus and target scores appear to correlate to a similar degree with selected outside criteria (e.g., empathy, self-reported SAT scores, decreased emotional defensiveness) in student populations (Mayer & Geher, 1996), although comparability of consensus and expert scores as predictors has been neglected. Other evidence comes from studies using questionnaire measures of EI. For example, this form of EI predicts first-year college students’ success (Schutte et al., 1998). Self-reported EI is also negatively related to alexithymia (i.e., difficulties in identifying, evaluating, describing, and expressing feelings) as measured by the Toronto Alexithymia Scale (e.g., Schutte et al., 1998; Taylor, 2000).

However, the most important construct validation criterion, arguably, is the extent to which EI overlaps with other intelligence(s). In their pioneering study, Mayer, Caruso et al. (2000a) claim that MEIS measures were sufficiently differentiated from verbal intelligence to provide unique variance, but also sufficiently correlated to indicate that concepts underlying the MEIS form an intelligence. Somewhat curiously, the verbal intelligence measure used in the Mayer, Caruso et al. (2000a) study (i.e., the Army Alpha) is seldom employed in contemporary investigations of cognitive ability. Moreover, another study, using an oft-used measure of cognitive abilities, came up with a notably different finding that might be construed as questioning the claim that EI meets the standards expected of an intelligence. In particular, Ciarrochi et al. (2000) found near-zero correlations between general EI, measured by total MEIS scores, and the Australian version of the Raven's Standard Progressive Matrices Test (RSPM; ACER, 1989), and negative correlations between an Understanding and Managing Emotions factor and RSPM score!

With respect to their third criterion, Mayer, Caruso et al. (2000a) report that differences in mean EI scores observed for adolescents and adults serve as evidence supporting the developmental criterion. Note, however, that the above study was based on a cross-sectional design and thus allows interpretation only in terms of age-group differences, not developmental differences. Consensual scoring raises another interesting issue bearing on developmental differences. In particular, if one takes the consensus of the younger group as the measure by which one should score these scales, it remains plausible that these age trends will reverse. In their study, Mayer, Caruso et al. (2000a) actually used an independent adult sample to obtain the consensus scores, meaning that this ‘rival hypothesis’ can certainly not be ruled out. In any event, the developmental criterion espoused by Mayer, Caruso et al. (2000a) is imprecise. In the intelligence literature, a particularly important finding is that certain classes of cognitive ability (e.g., fluid intelligence) actually decline with age (see e.g., Carroll, 1993; Cattell, 1971; Horn & Hofer, 1992). It is difficult to envisage what developmental trend, other than complete insensitivity to age, would call into question the validity of any given measure.

EI: Negative results. Mayer and Salovey (1993) had originally described EI as a type of social intelligence. However, despite much research, the independence of social intelligence from other types of intelligence (i.e., verbal) has not been successfully demonstrated (Carroll, 1993; Cronbach, 1960). Indeed, there is some evidence relating EI to crystallized intelligence, through its mutual relationships with putative measures of social intelligence (Davies et al., 1998). Davies et al. (1998) found a range of measures purportedly assessing EI to have poor psychometric properties. These authors found low correlations among three factors defining the EI construct in their study -- appraisal of emotions in the external world (perception) and appraisal of emotion in the self (awareness and clarity). A positive outcome evidenced in the Davies et al. (1998) investigation was that the perception of consensually judged emotion in external objects represented a clearly defined uni-factorial construct. However, two problems exist with emotion perception as a facet of EI. Firstly, the scales evidenced relatively low reliability. Secondly, consensual scoring may define the factor, rather than emotional content per se. This methods-factor issue is an important one, certainly worthy of more careful consideration than it has received to date.

One of the main criticisms subsequently leveled at the Davies et al. (1998) investigation was that EI measures were still in their infancy such that their conclusions appeared premature (e.g., Mayer, Caruso et al., 2000a; Mayer & Cobb, 2000; Mayer, Salovey et al., 2000a, 2000b). Thus, explicitly citing this reference, Mayer and Cobb note that "the Davies et al. study preceded publication of the highly reliable MEIS" (p. 173). The question, which should then be posed, is to what extent do available data support the efficacy of the MEIS, which (arguably) would now appear the premier vehicle for the assessment of EI (see e.g., Ciarrochi et al., 2001)?

In their recent psychometric analysis of scores obtained from the MEIS, Mayer, Caruso et al. (2000a) demonstrated that for consensus scores, reliabilities ranged from 0.49 to 0.94. Indeed, Ciarrochi et al. (2000) found remarkably similar reliabilities, in what essentially amounted to a replication of this study. Some of these reliabilities are not in the acceptable range, certainly for applied use of the measure for selection, intervention, and treatment purposes (see Anastasi & Urbina, 1997). For expert scores, the reliabilities obtained by Mayer, Caruso et al. (2000a) were even lower, ranging from 0.35 to 0.86. Consequently, it would appear that the accuracy of measurement is a function of the scoring procedure. It may well be that the expert scoring protocol is a misguided one, particularly since the test constructors made these judgments alone. Given the variation of reliabilities with disparate scoring procedures, it would appear that more detailed attention needs to be given to investigating the reliability of performance-based measures of EI.
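To make concrete what an internal-consistency figure such as 0.49 or 0.94 summarizes, the following is a minimal sketch of coefficient alpha, on the assumption that the reliabilities discussed above are alpha coefficients. The function name and the toy item scores are hypothetical and are not MEIS data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                    # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical toy data: four respondents answering three items.
toy_scores = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4]]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

Because the item scores themselves change when the scoring key changes, the same set of responses can yield different alpha values under consensus and expert scoring, which is the pattern described in the paragraph above.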

There also seems to be some inconsistency between the theory underlying the MEIS and factor-analytic research attempting to uncover its factor structure. Although the authors claim that a four-factor model underlies the MEIS, exploratory and confirmatory analyses of various data sets suggest only a three-factor solution (e.g., Ciarrochi et al., 2000; Mayer, Caruso et al., 2000a). Interestingly, using confirmatory factor analysis, Mayer, Caruso et al. (2000a) found that the model estimates of Branch-2 and -3 facets correlated 0.87 and thus correctly asserted that these two facets are virtually indistinguishable. The facets of Understanding and Assimilation of emotion coalesced into a single factor. This noteworthy finding would seem important to replicate in a comparably large and relatively homogeneous sample.

Finally, the overlap between self-report scales of EI and existing personality scales represents a serious challenge to the conceptualization of EI as a cognitive ability rather than a personality trait, and may also extend to performance-based measures. In short, it is unclear whether EI tests possess discriminant validity with respect to existing measures. Because no study has yet examined the relation between performance-based EI measures and the Big-Five personality factors, this would appear an issue worthy of detailed investigation in its own right (see also Mayer, Caruso et al., 2000a). In addition, the dependence of validity coefficients on the various scoring methods has been neglected. Mayer, Caruso et al. (2000a) only report correlations between consensus-scored MEIS scales and criteria such as empathy, with no indication of whether similar correlations were obtained with expert-scoring.


Objectives of the present study

In light of the preceding review, which highlights several contradictory findings, the present study attempts to provide further information that is pertinent to a balanced evaluation of the empirical and conceptual status of EI. To this end, the most comprehensive and contemporary performance-based measure of EI -- the Multi-Factor Emotional Intelligence Scale (MEIS) -- was examined (e.g., Mayer, Caruso et al., 2000a). While it is possible to focus on any number of research questions bearing on the MEIS, it seemed expedient (because the measure is relatively new) to focus on the following objectives of relatively major significance:

1. Is the construct of EI, as assessed by the MEIS, psychometrically sound? In particular, this study sets out to examine the factorial validity of the MEIS, using both exploratory and confirmatory methods. Thus far, the one confirmatory factor analysis conducted with performance-based measures of emotional intelligence (Mayer, Caruso et al., 2000a) yielded rather equivocal results, including marginal fit-statistics and evidence that two branches (e.g., Understanding and Assimilation) could not be differentiated. In addition to exploratory factor analyses, the current study employed structural equation modeling procedures to test the goodness of fit between the four-branch model of EI and the data. In addition, we examined sub-test reliabilities and the patterns of inter-test correlations.

2. Do the two different scoring criteria used in the MEIS (i.e., consensus and expert scoring) demonstrate convergent validity? Do they yield similar reliability coefficients? The Mayer, Caruso et al. (2000a) model predicts a 'positive manifold', or a nonnegative correlation matrix among the sub-tests, supporting three converging factors associated with Emotion Identification, Assimilation/Understanding Emotions, and Managing Emotions. The same factors should be found using both scoring methods, as they are construed as alternative (yet analogous) scoring protocols. Following Mayer, Caruso et al.’s (2000a) decision to focus almost all of their reported analyses on consensual scoring, Ciarrochi et al. (2000) conducted an investigation where no consideration was given to expert scores. Arguably, both studies highlight the need to examine alternative scoring procedures in close detail. In the present investigation, all responses were scored using both consensual and expert criteria, allowing us to determine the convergent validity of these measures. Thus, one of the major goals of this study is to examine in greater depth the relationship between consensus and expert scoring and to ascertain any problems inherent in these two ways of scoring behavioral measures of EI. Mayer and colleagues are not clear as to whether they believe these two forms of scoring are directly equivalent or more loosely related. Indeed, they generally encourage consensus coding for reasons of its facility. In addition, we examine the personality and individual differences correlates of the two scoring procedures.

3. What are the relationships between EI, personality, and ability factors? Put somewhat differently, to what extent does EI vary as a function of personality and intelligence constructs? Is the pattern of relations between EI and personality variables invariant across the type of scoring criteria employed? Based on prior research by Mayer, Caruso et al. (2000a), and the notion of divergent validity, the principal prediction is that EI should relate modestly to general cognitive ability. Mayer and colleagues have not specified the likely personality correlates of MEIS scores. Based upon past empirical research (e.g., Dawda & Hart, 2000), we might expect EI to relate to higher Agreeableness, Conscientiousness, Extraversion, and Openness, and to lower Neuroticism. Associations should generalize across scoring criteria.

4. What is the nature (and magnitude) of gender, ethnic, and age differences in performance-based assessment of EI? The strongest prediction from previous research (e.g., Mayer, Caruso et al., 2000a) is that EI should be higher in women, irrespective of the scoring method employed. In addition, we assess to what degree individual and group differences vary with the type of scoring criteria used.


Emotional Intelligence: Relation to individual and group differences

Gender differences. Table 11 presents descriptive statistics for the MEIS consensus- and expert-based branch and composite scores by gender. When consensus scores were used, females (M = 1.51, SD = 6.59) scored higher on composite MEIS scores than their male counterparts (M = -0.22, SD = 6.82), t (691) = -2.10, p < 0.05, by the order of about a quarter of a standard deviation (d = -0.26). In contrast, when expert scoring criteria were used, males (M = 0.19, SD = 6.00) scored higher than females (M = -1.30, SD = 6.8), t (691) = 2.01, p < 0.05, by the order of about a quarter of a standard deviation (d = 0.23). Thus, the direction of gender group effects varies as a function of the scoring criteria used.
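The effect sizes quoted above can be approximately recovered from the reported means and standard deviations. The sketch below assumes an unweighted pooled standard deviation, because group sizes are not given in this passage, and takes the expert-scored means as 0.19 (males) and -1.30 (females). The function name is illustrative, and the original analysis may have pooled differently.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference (group a minus group b).

    Assumption: the pooled SD is the root mean square of the two SDs,
    i.e. both groups are weighted equally.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd

# Males minus females, composite MEIS scores as reported in the text.
d_consensus = cohens_d(-0.22, 6.82, 1.51, 6.59)
d_expert = cohens_d(0.19, 6.00, -1.30, 6.8)
print(round(d_consensus, 2), round(d_expert, 2))
```

Under this assumption the two values agree with the reported d = -0.26 and d = 0.23, and the change of sign between scoring methods is the reversal described in the text.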


Ethnic group differences

Ethnic group differences. Table 12 presents descriptive statistics for the MEIS consensus- and expert-based branch and composite scores, by ethnic group. When consensus-based scores were employed, white majority (M = 0.28, SD = 6.58) and minority (M = -0.63, SD = 7.12) group participants were not reliably discriminable. However, when expert scoring criteria were used, majority group participants (M = 1.16, SD = 5.86) scored significantly higher than their minority group counterparts (M = -2.43, SD = 5.98), t (688) = 7.39, p < 0.001, by the order of over half a standard deviation (d = 0.60). Partial correlations between ethnic group membership and composite MEIS scores, holding ability constant, indicated that the partial correlations, r (655) = 0.22, p < 0.001, were not meaningfully different from the bivariate correlations, r (688) = 0.27, p < 0.001. These results suggest that the relationships between ethnic group and MEIS composite scores are not mediated by traditional cognitive abilities.
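For readers unfamiliar with the procedure, a first-order partial correlation can be computed from three bivariate correlations. The sketch below uses the standard formula. Only the bivariate value of 0.27 comes from the text. The two correlations with ability (0.20 and 0.30) are hypothetical placeholders, since the actual values are not given in this passage.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# x = group membership, y = composite MEIS score, z = cognitive ability.
r_xy = 0.27          # bivariate correlation reported in the text
r_xz = 0.20          # hypothetical: group membership with ability
r_yz = 0.30          # hypothetical: MEIS composite with ability
r_partial = partial_correlation(r_xy, r_xz, r_yz)
print(round(r_partial, 2))
```

When the partial correlation stays close to the bivariate one, as reported above, the control variable accounts for little of the original association.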


The relationship between MEIS branch scores and the Big-Five factors.

The relationship between MEIS branch scores and the Big-Five factors. Table 14 presents the correlations between MEIS consensus and expert scores (branch and total) and the Big-Five factors. As shown, total MEIS consensus-based scores correlated significantly, though modestly, with each of the Big-Five factors -- correlating negatively with Neuroticism (r = -0.18) and positively with Conscientiousness (r = 0.16), Agreeableness (r = 0.24), Extraversion (r = 0.13), and Openness (r = 0.13). In contrast, MEIS expert-based scores failed to correlate significantly with any of the Big-Five factors, save for Openness (r = 0.15). However, when ability scores are statistically controlled using partial correlation procedures, the correlations between Openness and total expert-based MEIS scores approach zero (r = 0.05). Thus, the observed correlations between MEIS scores and the Openness factor may be mediated by cognitive ability.




Mayer, Caruso et al. (2000a) contend that if EI is to constitute a legitimate form of intelligence it should satisfy three criteria: operationalization as an ability, appropriate correlational properties, and incremental developmental trends. The data presented in this paper may be considered to provide only equivocal support for the first and second of these criteria. The third criterion was not a major focus of this study, but, as we have already pointed out, is not a necessary condition for many traditional forms of intelligence. Certainly, comparison of the present data with those from studies of self-report measures of EI (e.g. Dawda & Hart, 2000) suggests that the MEIS ‘performs’ better, in that it seems distinct from existing personality measures. Unfortunately, consensus- and expert-scored emotional intelligence also appear to be distinct (and in some cases independent) from each other. Factor correlations are insufficient to suggest that the two forms of scoring provide alternate measures of the same underlying construct. Consensus- and expert-scored scales differ also in their relationships with sex and ethnic group. Validity is demonstrated in some respects by the correlational data. Again, however, the two forms of scoring appear to support constructs that show only partial overlap, as evidenced by the lack of consistency in the linear associations between the two sets of branch scores and the personality and ability measures.

In the passages that follow, we consider in more detail the answers provided by the present investigation to three questions posed in the introduction, which relate to the psychometric adequacy of the MEIS, group differences, and personality and ability correlates. Throughout, we also touch on the fourth issue raised therein -- the extent to which consensus- and expert-scoring protocols converge. We conclude by identifying the key issues that will determine whether EI comes to be seen as a real addition to our understanding of personal qualities, or as a construct that is largely illusory. An important caveat is that the MEIS is a relatively new instrument still under development. Indeed, it is now being replaced by a new measure (MSCEIT) that may prove to have stronger psychometric properties. However, the theoretical underpinnings of the MSCEIT are identical to those of the MEIS, and data published so far seem similar to those obtained with the MEIS (see e.g., Mayer, Caruso et al., 2000b). It is likely that the present results should generalize (in intelligence research, older forms of a test are expected to correlate substantially with newer forms), but we remain guarded in extending present findings to the newer performance-based EI measure.

Psychometric status of MEIS scales

Some features of the psychometric analyses support Mayer, Caruso et al.’s (2000a) claim that emotional intelligence meets criteria for an ‘intelligence’. We replicated the finding of a 'positive manifold' between sub-tests of the MEIS, and, generally, the pattern of correlations corresponded well to the Mayer, Caruso et al. (2000a) findings. Exploratory and confirmatory factor analyses showed broad similarities with Mayer et al.’s factor solutions, although there were some differences in detail, and, in the exploratory analyses, sub-scale communalities were often low. In fact, the confirmatory analyses tend to support Mayer et al.’s initial conception of four branches of EI, rather than the three-factor model that has subsequently been derived.

However, other aspects of the data render many of the EI concepts more problematic than is acceptable for ability measures. This is no small point since, beyond research purposes, the MEIS (or its derivative, the MSCEIT) may be attractive to practitioners as a selection device. In particular, the reliabilities of sub-tests that form the highest branches of the model, and are thus probably the most important components of the MEIS for prediction of real-world social behaviors (e.g., Progressions, Managing Others), are among the poorest in this battery. In addition, intercorrelations between sub-tests, while resulting in a 'positive manifold', are notably lower than is common in research involving cognitive ability measures (compare for example the data presented here with various data sets presented in Carroll [1993]). Further still, various factor analyses indicate a structure that is relatively unstable, certainly when compared to similar analyses that have been conducted with intelligence and personality measures.

Perhaps the most severe psychometric difficulty is the lack of convergence between expert- and consensus-scored dimensions. There are instances of agreement, especially for the Blends and Progressions tests, but in general, cross-correlations are too small to suggest convergence. The correlation between the general factors extracted from each of the two data sets was only 0.26. The correlations between corresponding Perception factors (r = 0.00) and Emotion Management factors (r = 0.48) seem to fall short of even the rough equivalence claimed by Mayer, Caruso et al. (2000a). The correlation for Understanding factors (r = 0.74) is more satisfactory, but still falls short of the correlations of 0.80, 0.90 or better that we would expect for alternate forms of a cognitive IQ test. To put these figures in perspective, consider that the cross-correlation places an upper bound on the reliability -- i.e., the extent to which two test versions are measuring the same underlying construct. The standard error of measurement (SEM) of a scale score is calculated as SD × √(1 - r), where r is the reliability and SD the standard deviation (1 for factor scores). Hence, for r = 0.48, SEM is 0.72 SD, and thus the 95% confidence interval for a true score S would be S ± 1.41 SD (i.e., 1.96 × the SEM). For r = 0.74 the interval is S ± 1.00 SD. For an IQ test (M = 100, SD = 15) with an r of 0.48, the score of a person at the mean would be expected to vary between 79 and 121 on different occasions of testing -- not a satisfactory state of affairs. In the case of the MEIS scales, it is likely that scores on each version would fluctuate less, in practice, because each form of scoring may identify non-shared, unique variance that will raise reliability. However, the illustrative data highlight that measurement of whatever is common to corresponding expert- and consensus-scored factors does not meet normal standards for reliability. High standard error of measurement adversely affects the use of the instrument to compare individuals and to predict criteria, severely constraining practical utility. It is difficult to see how the degree of correspondence could be improved to an acceptable level by minor changes to tests or scoring procedures. In the case of the general factor, and two of the branches, it appears that expert and consensus scoring assess substantially different personal qualities.
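The arithmetic in the preceding paragraph can be checked with a short sketch. It assumes the standard formula in which the standard error of measurement is the standard deviation multiplied by the square root of one minus the reliability. The function names are illustrative only.

```python
import math

def sem(reliability, sd=1.0):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def ci95(score, reliability, sd=1.0):
    """Approximate 95% interval around a score: score +/- 1.96 * SEM."""
    half_width = 1.96 * sem(reliability, sd)
    return score - half_width, score + half_width

# Factor scores (SD = 1) for the two cross-correlations discussed above.
for r in (0.48, 0.74):
    print(r, round(sem(r), 2), round(1.96 * sem(r), 2))

# The IQ illustration: mean 100, SD 15, reliability 0.48.
low, high = ci95(100, 0.48, sd=15)
print(round(low), round(high))
```

The values reproduce the figures in the text: an SEM of 0.72 SD and a half-width of 1.41 SD for r = 0.48, a half-width of 1.00 SD for r = 0.74, and an interval of roughly 79 to 121 for the IQ example.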

A further difficulty derives from the patterning of loadings on the consensus and expert scored general factors. In cognitive abilities research, a ubiquitous finding is that the lower-order sensory processes encompassing the domain of human intelligence (e.g., auditory reception, tactile-kinesthetic perception) have lower loadings on the first principal component than do higher-order cognitive processes (e.g., inductive and deductive reasoning) (see e.g., Carroll, 1993; Horn, 1988; Roberts et al., 2001). As discussed in the introduction, EI also encompasses both lower-order and higher-order processes, arranged hierarchically by Mayer, Salovey et al. (2000a), implying that factor loadings for EI branches should vary with level in the hierarchy. In short, if EI really is a type of intelligence, then Emotion Perception should show the lowest loading on a general EI factor, while the Management branch should have the highest loading (with Assimilation and Understanding Emotion factors lying in between). As shown in Table 7, we found a similar result to Mayer, Caruso et al. (2000a) and Ciarrochi et al. (2000). That is, the highest loadings come from the Emotion Perception sub-tests, with Managing Emotion having among the lowest factor loadings on the general EI factor. (This finding is, in any event, prefigured by the rather low correlations that Management of Emotion branch tests share with traditional cognitive abilities.) The fact that exactly the opposite occurs with the MEIS to that expected of traditional cognitive abilities is a significant point and it has perhaps been ignored for too long. If EI is to truly be regarded as a type of intelligence, should it not mimic many of the replicable findings demanded of an ‘intelligence’? If not, might not the MEIS general factor be seen as some lower-level attribute of emotional processing, rather than an ‘intelligence’?

Group differences

We qualify our remarks on group differences by noting that military samples may not be representative of the general population. Nevertheless, the divergence of consensus- and expert-based scores is a cause for great concern. Within the field of cognitive abilities, what is known of gender and ethnic differences is that the patterns underlying them consistently emerge for scoring procedures (and, indeed, dependent variables) that are widely disparate. For example, Jensen (1998) has shown that salient group differences extend beyond measures of accuracy to measures of processing speed, with recent evidence indicating this might also hold true for putative indices of meta-cognition (see e.g., Pallier, Wilkinson, Danthiir et al., 2001). In the case of performance-based EI, there is considerable overlap in the procedures used to derive objective scores but the final results are equivocal. For example, do we take the evidence presented in this paper to indicate that, in the US Air Force, males are more emotionally intelligent than females or that females are more emotionally intelligent than males? Even though the sample of women was relatively small, and perhaps atypical of women in general, a reliable test should give an unequivocal answer to such a question.

The experts who provided the criteria for that scoring procedure were White males (with high ‘verbal’ intelligence). It is troubling that gender, ethnic, and intelligence differences exist in expert scores (favoring groups to which the experts belonged) and that the obverse was often true when consensual scoring procedures were implemented. Certainly, this finding calls into question the objectivity of expert-based scoring protocols that currently comprise the MEIS. It also raises the issue of how the data used for consensual scoring should be constituted. Mayer, Caruso et al. (2000a) point out that their consensus group (also used here) was biased towards women (67%), although they also present analyses of one sub-test suggesting that the gender difference in favor of women was not caused by differences in male-female criteria.

Personality and ability correlates of the MEIS

In favor of the hypothesis that the MEIS assesses an intelligence, branch scores correlated moderately and positively with general intelligence. Recently, Roberts et al. (2001) have provided data indicating that the ASVAB AFQT score is a particularly good indicator of acculturated knowledge (or crystallized intelligence). Thus, the relatively substantial correlations between this measure and the Assimilation, Understanding, and Management branch-scores are exactly as might be predicted of a cognitive ability that has close links to social intelligence (see e.g., Carroll, 1993; Davies et al., 1998). Equally, the lower (yet positive) correlations between the Mechanics and Administration ASVAB composites and MEIS scores are entirely consistent with the fact that these represent broad cognitive abilities that have relatively weak relationships with crystallized intelligence (i.e., broad visualization and clerical-perceptual speed, respectively) (see Roberts et al., 2001).

It is interesting that the association between MEIS total score and intelligence is substantially derived from the Understanding branch. For consensus scores, the bivariate association between Understanding and AFQT score explains 14.3% of the variance in the intelligence measure, and the remaining three branches explain a further 1.0% of the variance only. Similarly, for expert scores, Understanding explained 14.0% of AFQT score, with the other three branches contributing an additional 4.3%. The sub-tests of this branch have a somewhat intellectual character, in, for example, being able to define complex emotions in terms of simple ones (Complex Blends). Other sub-tests, such as those requiring understanding typical emotions and transitions, may have some commonality with crystallized intelligence tests that aim to assess general knowledge and practical judgment, such as the WAIS-R Information and Comprehension tests. One concern about this branch is the extent to which it assesses abstract, explicit knowledge of emotions -- of the type that might be acquired from an undergraduate psychology course. EI might reside in the more contextually-bound, and possibly implicit, knowledge, that supports action in real-world situations.

As noted in the introduction, self-report assessment has consistently resulted in substantial correlations between EI and personality (e.g., Dawda & Hart, 2000). This has led several researchers to question whether emotional intelligence can contribute anything to the science of psychological assessment that is not already encapsulated by personality theory (e.g., Davies et al., 1998). The correlations obtained here with Big Five personality factors were qualitatively similar to those found with self-report EI measures. However, the relatively small magnitudes of the associations between the Big Five personality factors and each of the MEIS branch-scores also offer promise to the conceptual independence of performance-based EI. Performance-based EI shares relationships with measures of personality that resemble the relationships that ‘traditional’, objective measures of intelligence share with personality. Both EI and traditional intelligence tend to correlate only modestly (i.e., seldom exceeding 0.30) with personality scales. Furthermore, correlations tend to vary across the type of intelligence domain (i.e., primary mental ability or branch) assessed. Multiple regression analysis identified two relationships that seem both meaningful and consistent across scoring methods, between (1) Management and Extraversion, and (2) Understanding and Openness.

Although the magnitude and direction of correlates was broadly as expected, these data also raise problems. Again, there were discrepancies between expert and consensus scoring. Consensus scores were generally more predictive of personality in both bivariate and multivariate analyses. The multiple regressions showed that some branch scores, especially Emotion Perception, showed significant relationships of opposing sign in analyses of expert- and consensus-based data. In consensus-scored data, the person adept at identifying emotions is somewhat unintelligent, agreeable, and extraverted, but, in expert-scored data, these qualities are associated with poor emotion perception. Consensus-scored Emotion Management was a relatively strong predictor of Conscientiousness and Agreeableness, a result consistent with expectation, but these relationships were near zero in the expert-scored data. In some cases, the branch-score data show a lack of coherent associations with criteria. In the consensus-scored data, it is unclear why Emotion Perception should be negatively related to general intelligence, when Assimilation and Management show positive relationships, in the multiple regression analyses. Likewise, there is no obvious rationale for agreeable individuals being emotionally intelligent with respect to Perception and Management but unintelligent with respect to Understanding. At the least, the ability model of EI requires a more detailed account of the role of personality.

Perspectives on emotional intelligence

The data analyses raise both some immediate problems with assessment of EI using the MEIS, and some more fundamental issues about the contribution of research on EI. It is likely that some of the psychometric problems we have indicated are soluble through the normal processes of test refinement. There seems to be no barrier in principle to the development of internally consistent scales whose intercorrelations support a replicable three- or four-branch factor solution. However, the data also highlight the more fundamental problem of differences between scoring methods. Dimensions derived from expert and consensus scores fail to correlate well, at both scale and factor levels, and it is especially disturbing that the general factors extracted from the two types of data correlated at only 0.26. Furthermore, expert- and consensus-based scores give very different patterns of association with group and individual difference factors. The discrepancies are sufficiently large that they imply that one or other scoring method should be discarded, in that it is hard to envisage modifications that would bring factors that are correlated at less than 0.50 into alignment.

An optimistic perspective on emotional intelligence would consider the problems of different scoring protocols as surmountable. Indeed, they would appear the type of conceptual problem that plagued Binet, Simon, Wechsler, and their predecessors historically, when they first tried to develop intelligence tests (see e.g., Gregory, 1997). In the emotional domain, it may be inappropriate to insist that test items should have rigid, unequivocal right and wrong answers. Currently, Salovey et al. (2000) recommend a consensus-based approach to scoring the MEIS on the basis that large normative samples tend to be reliable judges of emotional questions. More generally, consensual scoring (or its derivatives) has had great survival value, witnessed in chronicles from the first century AD (e.g., gladiators in the Coliseum) through to the present day, when many of our political leaders are elected through this process. Given that intelligence is often equated with the ability of the organism to adapt to its environment (see e.g., Sternberg, 1985; Thorndike et al., 1921), the ecological validity of this scoring procedure per se should not be underestimated.

However, there is a further twist to the role of consensual agreement. In general terms, we would expect a person whose beliefs match the group consensus to be better adapted to the social environment than someone with deviant views. The conformist is likely to receive more positive reinforcement from others and to negotiate challenges shaped by social convention such as finding a job and life partner more easily. In other words, consensual ‘emotional intelligence’ may be adaptive not because it refers to any cognitive advantage, but because of the social advantages of conformity. Consistent with this suggestion, people with generally desirable personality characteristics such as Conscientiousness and Agreeableness seem to be perceived by others as better at emotion management, as evidenced by the consensus-based data here, but this link is not evident in the expert-scored data. A ‘conformity’ construct is of real-world relevance, but it is highly misleading to label it as an ‘intelligence’, because it relates to person-environment fit, rather than to any characteristic of the individual. Indeed, in some instances it is the nonconformist who should be deemed emotionally intelligent: for example, a writer or artist who finds a new and original way of expressing an emotion.

Another possibility is to develop some hybrid scoring protocol. For example, deriving consensual scores from the corpus of experts/professionals who read this journal, or who are members of organizations such as the International Society for Research into Emotions, seems feasible and conceptually justifiable. Within this hybrid model, it is expert consensus that forms a theoretically defensible scoring criterion, assuming that problems of bias associated with the demographic characteristics of experts may be avoided.

Against these proposals, we might wonder whether an intelligence may ever be satisfactorily defined by consensus, even expert consensus. We have previously discussed the difficulties in principle in deciding on the 'best' or most adaptive response to an emotional encounter (Matthews & Zeidner, 2000; Matthews, Zeidner & Roberts, in preparation). If there is no 'right answer', consensus is of no validity. Furthermore, because consensus methods are largely a function of the particular group assessed (cultural, ethnic, age, or gender), what might be the consensus and modal response for one group may not be so for another group, making scores relatively incomparable across different groups (ethnic, national, etc.) of examinees. There are also difficult issues about the legitimacy and 'ownership' of expertise: many disciplines and sub-disciplines may claim primacy. It is also questionable whether well-educated, professional people should solely set standards for emotional intelligence.




Mayer, Salovey et al. (2000a) deserve much credit for formulating a novel, clearly-articulated model of EI, and seeking to operationalize its constructs as ability tests through careful construct validation studies. The view expressed here is that, despite the merits of this project, there are significant measurement problems to be overcome. Perhaps inevitably, given the level of interest in EI, criticism of work on EI has already invoked strong emotions in many of its chief exponents and critics. For example, Salovey et al. (2000) equated the conclusions reached in Davies et al. (1998) with the French Academy of Sciences' decision, at a time when logical positivism was growing, to destroy all meteorites housed in museums because they were ‘heavenly bodies’, and heaven did not exist. We aim to conclude with a balanced appraisal of the promise and problems of 'ability-model' EI research. From a positive perspective, emotional intelligence is a real quality of the person distinct from existing personality and ability factors, best measured by performance-based tests such as the MEIS. The problems raised here may be essentially technical problems to be solved by advances in test construction.

The sceptic may prefer another astronomical metaphor. In 1877, the Italian astronomer Schiaparelli announced that his telescope revealed linear channels on the surface of Mars. His observation inspired Percival Lowell to map hundreds of ‘canals’ in fine detail, and to enthrall the general public with his popular lectures on the construction of the canals by a Martian civilization. There are various suggestive parallels with EI. Respected scientists, who made important contributions to other areas of astronomy, reported the initial observations. The ‘canals’ were not completely fanciful; Mars does have surface markings (of an irregular nature). An elaborate empirical and theoretical artifice was constructed from fairly modest beginnings, and popular interest was fired by excessive speculation. It remains to be seen whether emotional intelligence, like the canals of Mars, is the product of the tendency of even expert observers to see, in complex data, patterns that do not exist.

At this early stage of research, it is premature to label emotional intelligence as either a real 'meteorite' or an illusory 'Martian canal', but the present study has identified significant issues that require resolution, related to both reliability and validity. The most severe problems relate to scoring (i.e., reliability), including the difficulty of justifying both expert and consensus scoring, the limited psychometric convergence of these methods, and their differing relationships with other criteria. At the least, further progress requires a single scoring method with a strong rationale that what is being measured is a form of cognitive ability. In particular, it should be demonstrated that items have better and worse answers with respect to some objective criterion (although human judgment may be required to assess how well answers meet the criterion). The difficult problems of group differences and possible cultural and gender bias must also be addressed. The data provide some support for Mayer, Salovey et al.'s (2000a) 'correlational criteria', for example, by demonstrating only modest correlation between EI and general intelligence. The data also showed some overlap with personality constructs. Although performance-based EI measures appear to be free of the redundancy with existing personality scales that plagues questionnaire measures, the validity coefficients for the MEIS also appear to be typically small and often less than 0.30 (see Mayer, Salovey et al., 2000a). It is unclear whether the predictive validity of the MEIS would be maintained with personality and ability controlled statistically. Finally, it is not established that emotional intelligence as operationalized by the MEIS is a major construct on a par with general intelligence as a determinant of individual differences in meaningful, real-world behavior. There may well be further ‘primary ability’ dimensions such as emotion perception that are poorly assessed by existing instruments. However, to assert that these abilities are as important in human functioning as general intelligence is to go far beyond the available data. An urgent task for future research is to show real-world adaptive advantages for high scorers on the MEIS, over and above those they obtain from their higher general intelligence and their personality characteristics.
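One simple way to pose the question of statistical control is a first-order partial correlation. The sketch below is illustrative only: it controls for a single covariate (say, general intelligence), whereas controlling for several personality and ability measures at once would require multiple regression. The function name and the input correlations are hypothetical and are not taken from the present data.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out of both.

    r_xy: correlation between predictor x (e.g., an EI score) and criterion y
    r_xz: correlation between predictor x and covariate z
    r_yz: correlation between criterion y and covariate z
    """
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


if __name__ == "__main__":
    # Hypothetical inputs: a validity coefficient of 0.30, an EI-intelligence
    # correlation of 0.30, and an intelligence-criterion correlation of 0.50.
    print(round(partial_correlation(0.30, 0.30, 0.50), 2))
```

With these assumed inputs, a raw validity coefficient of 0.30 shrinks to roughly 0.18 once the covariate is partialled out, which shows how a small validity coefficient can become smaller still under statistical control.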

