Ethics in Language Testing
by Glenn Fulcher, University of Surrey

Moral dilemmas are not new. As long as we live in societies, we will all be faced with them. And a moral dilemma can easily become a crisis. In the 4th century BC this is precisely what happened in Athens. The Sophists were teaching that morality was a set of rules invented by the rulers to keep the ruled under control. Nothing more, nothing less. If ethics is merely a matter of social convenience, behaviour can only be regulated by convention. And convention may differ from society to society, community to community. This is ethical relativism.

Since the Second World War, Western society has once again entered a period of moral and ethical relativism. The most significant philosophical shift of the last 50 years can be characterised as a movement from asking what knowledge is, to what knowledge means. Knowledge in Postmodernism is a product available for consumers, and consumers have the right to select any kind of knowledge they wish to "know". Theories, especially "big" theories, are very much out of fashion. There is no explanation for "everything." The new shopping mall is the Internet, where knowledge is available in many shapes and forms for the choosing. Similarly, we are now free to choose our own ethical base, our own values.

The postmodern writer Jacques Lacan argued that humans are essentially linguistic beings. Building on semiotic theory, he argued that language exists as a structure and that we enter into it. Our self-identity is therefore a fiction, and so morality is essentially a linguistic construct of the society into which we are born. "Reason" has been the basis of moral philosophy since Plato, but reason and the language of logic are now seen as just another linguistic game. Reason does not rule. All values are relative, and we may select those we wish to follow from the range on offer as if buying a new pair of shoes - discarding them next week if they don't fit well, or are unsuitable for a particular walk. Ethics in a postmodern world is local, temporary, and without a logical base.

It is not surprising that the moral problems of the late 20th Century have finally caught up with applied linguists and language testers. The 19th Language Testing Research Colloquium in 1997 was held on the theme "Fairness in Language Testing", and issue 14, 3, 1997 of Language Testing was a special volume on ethics in language testing, edited by Alan Davies. Hamp-Lyons (1998: 324) suggests that the current interest in ethics in language testing stems from the fact that language testers have had a "positivist" approach to their discipline: that "the object of our enquiry really exists." In a postmodern world, it does not. It is a fiction. Nevertheless, Hamp-Lyons claims that it is the understanding that there are no absolute ethical principles that leads us to engage with moral philosophy to look at "fairness" in testing. She raises questions like these:

· If tests are used to keep people out of countries, jobs or education, but this was not their stated purpose when designed, is it the responsibility of the test developers?

· Are testing organisations (ETS/UCLES) responsible for the use of test scores to make decisions that they say they should not be used to make?

· Are testing organisations responsible for any damage caused by cramming schools that earn large amounts of money out of providing test practice, rather than teaching?

Ultimately, Hamp-Lyons cannot answer these questions. This is because she holds that "the growing interest in ethics reflects a post-modern concern with self-evaluation and self-reflection" (ibid., 329). Within this framework, the only answer can be "for me it's okay, for you it isn't, and next week we might change our minds." The requirement that researchers provide evidence for validity may be an Anglo-American quirk that has no bearing whatsoever on the on-going use of the EIKEN test in Japan, for example. Different society, different ethics.

Davies (1997) is surely on safer ground than Hamp-Lyons in dealing with ethics as part of the process of becoming a profession. Professionalism is to do with codes, contracts, training, and standards of practice. These may develop and change over time, but the profession at any one time agrees that its members will behave according to an agreed norm - irrespective of where they live, what nationality they are, or what kind of institution they work in. Language testers have long accepted that the APA Standards for Educational and Psychological Testing (1985) form the ground rules for their activities. These have been questioned since (notably in the writings of Samuel Messick), and are currently being updated, but until the new document is released the 1985 publication remains the benchmark of good practice. Other important documents include the Code of Fair Testing Practices in Education, drawn up by the National Council on Measurement in Education, available on the Internet at:

A report of the International Language Testing Association entitled Report of the Task Force on Testing Standards is also available on the internet at the ILTA web page, maintained at the University of Surrey:

This lengthy document, published in 1995, is a review of the standards used and implemented by a range of tests in use throughout the world. This work was preparatory to drawing up a code of practice for the International Language Testing Association, which is work currently in progress. For a review of these standards and other related work, see the review by Davidson, Turner and Huhta (1998).

Another related area that has recently attracted a great deal of attention is that of accountability. For Norton (1998: 313) this refers to the responsibility of language testers to the "stakeholders" in the testing process. These include the test takers, teachers, school administrators, community agencies, public officials and so on. The principle is that language testers should be accountable to these stakeholders, and should therefore try to ensure that testing practices have a positive impact on them. One aspect of this is the legal framework that may exist to ensure fairness in assessment. In the United States, for example, legislation exists to prevent discrimination against any subgroup of the test taking population, and this legislation has had a direct impact upon the adoption of testing standards (Fulcher and Bamford, 1996). In the future, it is more rather than less likely that language testing in other countries will have to take account of legal requirements for accountability to society.

Finally, the term "washback" is being investigated more carefully than ever before because of the ethical consequences of test use on teaching practice. The so-called "washback hypothesis" was formulated by Wall and Alderson (1993), and since then a number of studies investigating washback have appeared (Alderson & Hamp-Lyons, 1996; Hamp-Lyons, 1997). This early research indicates that the relationship between testing and teaching is much more complex than we had thought. An expansion of the notion of washback is that of "impact" (Wall, 1996; 1998), which seeks to investigate the relationship between test use and the society in which it is used.

These developments have widened the debate in language testing to include not just the technical aspects of test development and implementation, but the entire context in which tests are developed and delivered. The debate is increasingly looking to philosophy, ethics, social policy studies and critical theory to broaden our understanding of the impact of language testing practices in a variety of contexts (McNamara, 1998).

But is this a new awareness that has suddenly blossomed in the last few years? The answer is a most definite "no". Whilst the new studies are to be welcomed, they are generated by a sense of panic, or moral directionlessness. When ethical questions are applied to practical testing problems, there is the "sense that there is something fishy going on", but no real basis for investigating why something may be wrong.

These issues were raised over a decade ago by Messick (1981, 1989a) and fully formulated in his classic essay on validity (Messick, 1989b). Messick called the study of this field the "consequential basis for test use", and isolated as important the types of messages that test developers send to test takers and teachers through the use of terminology and test labels. This is what language testers refer to as "washback", but which Messick called the "consequential basis for test interpretation." He distinguished between this and the "consequential basis for test use", which he described as the impact of the (mis)use of the test that had harmful unintended (for example, systematic bias) or intended (for example, discriminating against certain nationalities for immigration purposes) consequences for test takers or society. The Messick framework incorporates ethics into the concept of test validity or, more specifically, construct validity. Those undertaking studies at the moment would do well to go back to the theoretical framework of Messick for guidance, rather than flailing around in the postmodern sea of uncertainty.

But the elegant Messick framework incorporates a sense of justice and fairness that goes way back into the history of testing. In the early days of investigating reliability, Edgeworth (1888: 616) considered "fairness" in these terms:

"There are some of the pass men as good as some of the honour men; but, like the unsung brave 'who lived before Agamemnon,' they are huddled unknown amongst the ignominious throng, for want, not of talent, or learning, or industry, or judgment, but luck."

Similarly, Edgeworth (1888: 626) says:

"…there remains an inevitable injustice in excluding those who are just below the boundary line of that class. Can nothing be done to mitigate this hardship? Ought not the excluded candidates to have at least a chance of entering within the pale, corresponding to the probability that they really deserve to be there?"


The whole history of the study of reliability is essentially a striving for "fairness". It is a great pity that many of our largest examination boards still do not understand the concept, let alone calculate it.
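Edgeworth's question about candidates just below the boundary can be made concrete with a small calculation. The sketch below is an illustration only, not a procedure taken from Edgeworth or from any examination board. It assumes classical test theory, a normal error distribution centred on the observed score, and invented figures for the cut score, the score standard deviation and the reliability coefficient; the function names are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)


def prob_true_score_at_or_above(observed, cut, sem):
    """Chance that the true score reaches the cut score.

    Simplifying assumption: the true score is normally distributed
    around the observed score with standard deviation equal to the SEM.
    """
    z = (cut - observed) / sem
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Invented figures: pass mark 60, candidate scored 58,
    # score SD of 10 and a reliability coefficient of 0.84.
    sem = standard_error_of_measurement(10.0, 0.84)
    chance = prob_true_score_at_or_above(58.0, 60.0, sem)
    print(f"SEM = {sem:.2f}; chance the candidate deserved to pass = {chance:.2f}")
```

On these invented figures a candidate two marks below the pass mark still has roughly a three in ten chance of "really deserving to be there", which is the kind of injustice Edgeworth had in mind. The lower the reliability, the larger that chance becomes.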

The current interest in ethical issues provides a host of opportunities for our students to combine an interest in sociolinguistics with language testing at the dissertation stage of the course. More studies of the use of test scores are needed, especially where these may be harmful. We need to plot the impact that tests have on test takers and the societies in which they are used. And we need to develop criteria to decide when and how test use is right or wrong. It is once again open season on ethics.


Alderson, J. C. and Hamp-Lyons, L. 1996. "TOEFL preparation courses: a study of washback." Language Testing 13, 3, 280 - 297.

American Psychological Association. 1985. Standards for educational and psychological testing. Washington D.C.: APA.

Davidson, F., Turner, C. E. and Huhta, A. 1998. "Language Testing Standards." In Corson, D. and Clapham, C. (Eds.) Language Testing and Assessment. Vol. 7 of the Encyclopedia of Language and Education. Amsterdam: Kluwer Academic Publishers. 303 - 311.

Davies, A. 1997. "Demands of being professional in language testing." Language Testing 14, 3, 328 - 339.

Edgeworth, F. Y. 1888. "The statistics of examinations." Journal of the Royal Statistical Society 51, 599 - 635.

Fulcher, G. and Bamford, R. 1996. "I didn't get the grade I need. Where's my solicitor?" System 24, 4, 437 - 448.

Hamp-Lyons, L. 1997. "Washback, impact and validity: ethical concerns." Language Testing 14, 3, 295 - 303.

Hamp-Lyons, L. 1998. "Ethics in Language Testing." In Corson, D. and Clapham, C. (Eds.) Language Testing and Assessment. Vol. 7 of the Encyclopedia of Language and Education. Amsterdam: Kluwer Academic Publishers. 323 - 333.

McNamara, T. 1998. "Policy and social considerations in language assessment." Annual Review of Applied Linguistics 18, 304 - 319.

Messick, S. 1981. "Evidence and Ethics in the Evaluation of Tests." Educational Researcher 10, 9, 9 - 20.

Messick, S. 1989a. "Meaning and Values in Test Validation: The Science and Ethics of Assessment." Educational Researcher 18, 2, 5 - 11.

Messick, S. 1989b. "Validity." In Linn, R. L. (Ed.) Educational Measurement. American Council on Education/Macmillan. 13 - 103.

Norton, B. 1998. "Accountability in Language Testing." In Corson, D. and Clapham, C. (Eds.) Language Testing and Assessment. Vol. 7 of the Encyclopedia of Language and Education. Amsterdam: Kluwer Academic Publishers. 313 - 322.

Wall, D. and Alderson, J. C. 1993. "Examining washback: the Sri Lankan Impact Study." Language Testing 10, 1, 41 - 70.

Wall, D. 1996. "Introducing new tests into traditional systems: insights from general education and from innovation theory." Language Testing 13, 3, 334 - 354.

Wall, D. 1998. "Impact and Washback in Language Testing." In Corson, D. and Clapham, C. (Eds.) Language Testing and Assessment. Vol. 7 of the Encyclopedia of Language and Education. Amsterdam: Kluwer Academic Publishers. 291 - 302.

Internet sources of information on ethics and fair testing:

Consortium for Equity and Standards in Education

Fair Test

Fair Test Examiner

Other resources in Language Testing are available at:
