Computer Software Grades Essays Just As Well As People, Profs Announce

April 16, 1998

New computer software can grade the content of essay exams just as well as people and could be a major boon in assessing student performance, researchers at the University of Colorado at Boulder and New Mexico State University announced today.

"From sixth graders to first-year medical students we get consistently good results," said Thomas K. Landauer, a CU-Boulder psychology professor who has worked on the technology behind the program for 10 years. "It's ready."

The computer software, called Intelligent Essay Assessor, uses mathematical analysis to measure the quality of knowledge expressed in essays. It is the only automatic method for scoring the knowledge content of essays that has been extensively tested and published in peer-reviewed journals.

The system was developed by Landauer; Darrell Laham, a CU-Boulder doctoral student; and Peter W. Foltz, an assistant professor of psychology at NMSU. They will discuss the system Thursday, April 16, during the annual meeting of the American Educational Research Association in San Diego.

"We are continually surprised at how well it works," said Landauer, who started on the project as director of cognitive science research at Bellcore.

The grading system has important implications for assessing student writing and helping students improve their writing, Foltz said. In one of his undergraduate psychology classes at NMSU last fall, Foltz tested a version of the program.

"Students submitted essays to a web page and received immediate feedback about the estimated grade for their essays, and suggestions about what was missing," Foltz said. "Students could revise their essays and resubmit them as many times as they wanted. The students' essays all improved with each revision."

Foltz also gave students the choice of having their essays graded by a human or by the computer. "They all chose to have the computer do the grading," he said.

Educators laud essay exams because they provide a better assessment of students' knowledge than other types of tests. A huge drawback is that the tests are time-consuming and difficult to grade fairly and accurately, particularly for large classes or nationally administered exams.

But computer-based evaluations of student writing are becoming increasingly feasible because of the growing numbers of students who write using computers. The researchers have applied for a patent on their software.

The new system requires a computer with about 20 times the memory of an ordinary PC to do the statistical analysis that it needs to "understand" essays. It uses Latent Semantic Analysis, a new type of artificial intelligence that is much like a neural network. "In a sense, it tries to mimic the function of the human brain," Laham said.

First the software program is "fed" information about a topic in the form of 50,000 to 10 million words from on-line textbooks or other sources. It learns from the text and then assigns a mathematical degree of similarity or "distance" between the meaning of each word and any other word. This allows students to use different words that mean the same thing and receive the same score. For example, they could use "physician" instead of "doctor."
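The idea of learning word similarity from text can be illustrated with a small sketch. This is not the researchers' software and it is not full Latent Semantic Analysis, which applies a statistical reduction to a much larger body of text. It is a simplified stand-in that gives each word a vector of the words appearing alongside it and compares those vectors by cosine similarity. The passages, the stopword list and the function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented training passages; the real system reads 50,000 to 10 million words.
PASSAGES = [
    "the doctor examined the patient and prescribed medicine",
    "the physician examined the patient and prescribed medicine",
    "the doctor treated the patient in the hospital",
    "the physician treated the patient in the hospital",
    "the pilot landed the airplane on the runway",
    "the pilot flew the airplane through the storm",
]

# Hypothetical list of function words to ignore.
STOPWORDS = {"the", "and", "in", "on", "through"}


def context_vectors(passages):
    """Map each word to counts of the other words that share a passage with it."""
    vectors = defaultdict(Counter)
    for passage in passages:
        words = [w for w in passage.split() if w not in STOPWORDS]
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = context_vectors(PASSAGES)
sim_synonyms = cosine(vectors["doctor"], vectors["physician"])
sim_unrelated = cosine(vectors["doctor"], vectors["pilot"])
print(sim_synonyms, sim_unrelated)
```

In this toy text "doctor" and "physician" never appear in the same passage, yet they score as near neighbors because they keep the same company, while "doctor" and "pilot" share no context words at all.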

The program then evaluates essays in two primary ways. The first is for a teacher or professor to grade enough essays to provide a good statistical sample and then use the software to grade the remainder.
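This first method can be pictured as a nearest-neighbor lookup: find the human-graded essays most similar to a new essay and reuse their grades. The sketch below assumes plain word counts in place of the system's learned word meanings, and the essays, grades, function names and the `k` parameter are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count the words in a text, ignoring case, punctuation and word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_grade(essay, graded_essays, k=1):
    """Average the grades of the k human-graded essays most similar to `essay`."""
    if not graded_essays:
        raise ValueError("at least one human-graded essay is required")
    target = bag_of_words(essay)
    ranked = sorted(
        graded_essays,
        key=lambda pair: cosine(target, bag_of_words(pair[0])),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(grade for _, grade in nearest) / len(nearest)


# Invented sample of essays already graded by a teacher (text, grade).
GRADED = [
    ("the heart pumps blood through arteries and veins to carry oxygen", 5),
    ("the heart pumps blood", 3),
    ("i like the color red", 1),
]

new_essay = "blood is pumped by the heart through arteries and veins carrying oxygen"
scrambled = " ".join(sorted(new_essay.split()))  # same words, different order

grade = predict_grade(new_essay, GRADED)
scrambled_grade = predict_grade(scrambled, GRADED)
print(grade, scrambled_grade)
```

Because the comparison looks only at which words appear, the scrambled copy of the essay lands on the same graded neighbor and receives the same grade as the original.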

"It takes the combination of words in the student essay and computes its similarity to the combination of words in the comparison essays," Laham said. The student essay then receives the same grade as the human-graded essays it most closely matches.

"The program has perfect consistency in grading -- an attribute that human graders almost never have," Laham said. "The system does not get bored, rushed, sleepy, impatient or forgetful." In one test, both the Intelligent Essay Assessor and faculty members graded essays from 500 psychology students at CU-Boulder. "The correlation between the two scores was very high -- it was the same correlation as if two humans were reading them," Landauer said.

The software only evaluates knowledge content and is not designed to grade stylistic considerations like grammar and spelling, researchers said. Existing programs already can do those functions.

A second Intelligent Essay Assessor method compares all the student essays to a single professor's or expert's essay, a so-called "gold standard." A third variation can tell students what important subject matter was missing from their essays and where to find it in the textbook.
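A rough sketch of these two variations follows, again assuming plain word counts rather than the system's learned word meanings. The expert text, the section labels, the similarity cutoff and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical expert essay, split by topic and keyed by invented textbook sections.
GOLD_TOPICS = {
    "Section 2.1": "the heart pumps blood through the body",
    "Section 2.2": "arteries carry blood away from the heart",
    "Section 2.3": "veins return blood to the heart",
}

# Arbitrary cutoff chosen for this toy data, not a value from the researchers.
THRESHOLD = 0.7


def bag_of_words(text):
    """Count the words in a text, ignoring case, punctuation and word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_against_gold(essay):
    """Similarity between a student essay and the whole expert essay."""
    gold = bag_of_words(" ".join(GOLD_TOPICS.values()))
    return cosine(bag_of_words(essay), gold)


def missing_sections(essay):
    """List the sections whose topic no sentence in the essay resembles."""
    sentences = [bag_of_words(s) for s in essay.split(".") if s.strip()]
    missing = []
    for section, topic in GOLD_TOPICS.items():
        topic_bag = bag_of_words(topic)
        best = max((cosine(s, topic_bag) for s in sentences), default=0.0)
        if best < THRESHOLD:
            missing.append(section)
    return missing


student = "The heart pumps blood through the body. Arteries carry blood away from the heart."
gold_score = score_against_gold(student)
missing = missing_sections(student)
print(gold_score, missing)
```

Here the student essay covers the first two topics but says nothing about veins, so the sketch points the student to the invented "Section 2.3".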

Previous methods of automatic essay scoring simply counted words and then analyzed mechanics and aspects of grammatical style, the researchers said.

How much students write correlates strongly with how well they write, researchers said. This is because students who know a lot write a lot.

The amount of content also counts in the Intelligent Essay Assessor, but it is measured by concepts, not by the number of words. The researchers recommend setting an essay word limit to eliminate length as a factor.

Because the system does not analyze surface form, it is possible that someone could include all the right words in an essay -- in random order -- and get a good grade, they said. The system flags essays that are unusual for that or other reasons so a human can check them. But the team discovered an even better safeguard while trying to fool the system.

"If you wrote a good essay and scrambled the words you would get a good grade," Landauer said. "But try to get the good words without writing a good essay!

"We've tried to write bad essays and get good grades and we can sometimes do it if we know the material really well. The easiest way to cheat this system is to study hard, know the material and write a good essay."

- 30 -

University of Colorado at Boulder
