How AI is used to score writing exams

July 27, 2019

Our AI writing scoring uses a technology called latent semantic analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words, not just their superficial characteristics.

As with our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn how words relate to each other and how they are used in, for example, the English language.
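
To make the idea concrete, here is a minimal sketch of LSA in Python using NumPy. It is an illustration under stated assumptions, not our production scoring engine: the tiny corpus, the function names (`lsa_fit`, `lsa_project`, `cosine`) and the choice of two latent dimensions are all hypothetical. The sketch builds a term-document count matrix, reduces it with a truncated singular value decomposition, and then compares texts in the reduced space.

```python
import numpy as np

# Toy corpus standing in for the "large amount of text" (hypothetical data).
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell on the market",
    "investors sold stocks and bonds",
]


def lsa_fit(docs, k=2):
    """Fit a k-dimensional LSA space from raw word counts."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: row for row, word in enumerate(vocab)}

    # Rows are terms, columns are documents.
    counts = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(docs):
        for word in doc.lower().split():
            counts[index[word], col] += 1

    # Truncated SVD: keep only the k strongest latent dimensions.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    doc_vectors = vt[:k, :].T  # one k-dimensional row per training document
    return index, u[:, :k], s[:k], doc_vectors


def lsa_project(text, index, u_k, s_k):
    """Fold a new text into the latent space; unknown words are ignored."""
    counts = np.zeros(len(index))
    for word in text.lower().split():
        if word in index:
            counts[index[word]] += 1
    return (counts @ u_k) / s_k


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


index, u_k, s_k, doc_vectors = lsa_fit(CORPUS, k=2)
new_text = lsa_project("a cat on a rug", index, u_k, s_k)
similarities = [cosine(new_text, vec) for vec in doc_vectors]
```

In a real system the corpus would be far larger, and the raw counts would normally be weighted (for example with TF-IDF) before the decomposition.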

Once the language model has been established, we then train the engine to score every single written item on a test. As with speaking items, we do this by having expert human raters score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores.
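
The validation step can be sketched in a few lines of Python. This is a simplified illustration, not our actual pipeline. It assumes the two human marks for each response are averaged to form the reference score (a combination rule chosen for this example) and that agreement is measured with the Pearson correlation coefficient. The function names and the sample scores are hypothetical.

```python
from math import sqrt


def combine_double_marks(first_rater, second_rater):
    """Average two raters' marks per response (assumed combination rule)."""
    if len(first_rater) != len(second_rater):
        raise ValueError("both raters must score the same responses")
    return [(a + b) / 2 for a, b in zip(first_rater, second_rater)]


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with 2+ scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical held-out responses for one item, marked on a 0-5 scale.
rater_a = [2, 3, 5, 4, 1, 4]
rater_b = [2, 4, 5, 4, 1, 3]
machine = [2.1, 3.4, 4.8, 4.0, 1.2, 3.6]

human = combine_double_marks(rater_a, rater_b)
agreement = pearson(human, machine)
```

A value of `agreement` close to 1.0 means the machine ranks and spaces the responses much as the human raters do.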

The benchmark is always the expert human scores. If our AI system doesn’t very closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.
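
This rule amounts to a simple filter over per-item agreement figures. The sketch below assumes each item already has a human-machine correlation from validation and that a single cut-off decides whether the item stays on the test. The 0.90 threshold, the item identifiers and the function name are illustrative assumptions, not our published criteria.

```python
def split_items_by_agreement(correlations, threshold=0.90):
    """Return (kept, removed) item ids given human-machine correlations.

    The threshold is a placeholder value chosen for illustration only.
    """
    kept, removed = [], []
    for item_id, r in sorted(correlations.items()):
        (kept if r >= threshold else removed).append(item_id)
    return kept, removed


# Hypothetical validation results for three writing items.
validation = {"essay_01": 0.96, "essay_02": 0.81, "summary_01": 0.93}
kept, removed = split_items_by_agreement(validation)
```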

AI’s ability to mark multiple traits

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation.

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, and maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can then mark items on any number of traits instantaneously – and without error.

AI’s lack of bias

A fundamental premise for any test is that there should be no advantage or disadvantage given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel that they may have received a different score if someone else had heard them or read their work.

Our AI systems remove the issue of bias completely. This is done by ensuring our speaking and writing AI systems are trained on a very wide range of human accents and writing types.

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialled our items and trained our engines using millions of student responses, and we continue to do this now as new items are developed.

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests daily or weekly can be repetitive, time-consuming and not always reliable, and it takes time away from working alongside students in the classroom. The use of AI in formative as well as summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time, and lots of it, to progress to high levels of proficiency. The blended use of AI can:

Address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

Allow students to practice and get instant feedback inside and outside of allocated teaching time

Address the issue of teacher workload

Create a virtuous combination of humans and machines, taking advantage of what humans do best and what machines do best

Provide fair, fast and unbiased summative assessment scores in high-stakes testing

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. An interesting quote from Fei-Fei Li, Chief Scientist at Google and Stanford professor, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it, A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide constant opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At Pearson we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed as placement tests to determine the appropriate level for the learner.

Find out more about how the Versant tests can help you automate scoring processes and learn why it’s important to place university students in the right English program.

PTE Academic

The Pearson Test of English Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.

Find out more on the PTE Academic website.

English Benchmark

English Benchmark is also scored using the same automated assessment technology. This test, which is taken on a tablet, is aimed at young learners and takes the form of a fun, game-like test. Covering the skills of speaking, listening, reading and writing, it not only measures the student’s ability but also suggests follow-up activities and next teaching steps.