LingURed – Linguistically Supported Revising and Editing

“This was former camel trading venue was home to the 1982 stock market crash in Kuwait, which wiped out many billions in regional wealth at the time.” (Financial Times Blog, November 30, 2009)

How do errors like this happen? Most likely, the duplicate verb is a result of editing and revising the text. In this case, we may assume that the author first wrote, “This was a former camel trading venue, which was home to the 1982 stock market crash in Kuwait, which wiped out ...” On rereading, he noticed the rather repetitive “which ... which” and decided to revise the sentence. He deleted the “a” and the first “which” (with its comma) but forgot to delete the first “was.”

But why did the author forget to delete the first “was”? Revising and editing put a high cognitive load on writers. First, writing itself is a very demanding task; second, revising and editing require writers to translate their linguistic goals (here: transforming the original sentence structure into the new one) into the word-processor operations that realize them.

The problem is that word processors operate almost exclusively on characters or ranges of characters, not on linguistic elements such as words, phrases, or sentences. However, natural-language text is not merely a string of characters, but a complex arrangement of interdependent structures: A text consists of paragraphs, which consist of sentences, which consist of phrases, etc.

Writers think about their texts in terms of language units: For example, when revising a text, you may want to put a sentence into the active voice. You may not know the linguistic terminology, but you certainly do not think “I want to delete these five characters and copy those characters to that position.” This conceptual mismatch between the models of writers and word processors necessitates the translation mentioned above.

This translation adds further cognitive load: the writer has to concentrate fully on character-level operations. Placing the cursor at the correct point, deleting characters, and so on require focusing on very small parts of the text. The bigger picture of the sentence is lost, resulting in errors like the one shown in the example above.

In the LingURed project, we are investigating how to prevent these errors. In particular, we are developing functions for word processors that are based on linguistic knowledge and operate on language units instead of characters. The idea is to provide functions that are closer to the conceptual model of the writer and thus reduce the amount of translation required when editing and revising.
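To illustrate the difference between the two models, here is a deliberately simple sketch in Python of one unit-based operation: deleting a word by its position in the sentence instead of by a character range. The regular-expression tokenizer and the names `tokenize` and `delete_word` are hypothetical and not part of LingURed; the functions described here rely on much richer linguistic knowledge than word boundaries. Applied to the erroneous sentence from the Financial Times blog, deleting the second word repairs it without any cursor placement:

```python
import re

# Hypothetical tokenizer: a word is a run of letters, digits, underscores,
# or apostrophes; everything in between (spaces, punctuation) is a separator.
WORD_RE = re.compile(r"[\w']+")
TOKEN_RE = re.compile(r"[\w']+|[^\w']+")


def tokenize(text: str) -> list[str]:
    """Split text into word and separator tokens, losing no characters."""
    return TOKEN_RE.findall(text)


def delete_word(text: str, n: int) -> str:
    """Delete the n-th word (0-based) together with one adjacent space.

    The caller names a language unit ("the second word"); the function
    works out which characters have to go.
    """
    tokens = tokenize(text)
    words = [i for i, tok in enumerate(tokens) if WORD_RE.fullmatch(tok)]
    i = words[n]  # IndexError if the text has fewer than n + 1 words
    if i + 1 < len(tokens) and tokens[i + 1] == " ":
        del tokens[i:i + 2]      # drop the word and the space after it
    elif i > 0 and tokens[i - 1] == " ":
        del tokens[i - 1:i + 1]  # otherwise drop the space before it
    else:
        del tokens[i]            # no single space on either side: word only
    return "".join(tokens)


sentence = "This was former camel trading venue was home to the 1982 stock market crash"
print(delete_word(sentence, 1))
# This former camel trading venue was home to the 1982 stock market crash
```

Because the function removes the word together with one adjacent space, the writer does not have to select the space separately, which is exactly the kind of character-level detail that draws attention away from the sentence as a whole.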

The LingURed project team consists of:

  • Cerstin Mahlow
  • Michael Piotrowski

We work together with researchers in the areas of writing research and authoring aids, both nationally and internationally. To foster cooperation between computational linguistics and writing research, we organized the NAACL-HLT 2010 Workshop on Computational Linguistics and Writing Research (CL&W 2010).

Recognizing the part of speech of word forms and their syntactic functions enables further linguistically motivated functions, such as “syntax highlighting.” This screenshot shows the highlighting of finite verb forms.
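A minimal sketch of finite-verb highlighting in Python, assuming the words have already been tagged by a part-of-speech tagger. It uses the Penn Treebank tag set, in which VBD, VBZ, VBP, and MD mark finite forms; this tag set, the hand-tagged input, and the bracket markup standing in for color are assumptions for illustration, not a description of how LingURed does it. Applied to the first part of the sentence from the Financial Times blog, the highlighting makes the two finite forms of “was” in a single clause easy to spot:

```python
# Penn Treebank tags that mark finite verb forms (assumption of this sketch).
# The base form VB is left out: it is most often an infinitive.
FINITE_TAGS = {"VBD", "VBZ", "VBP", "MD"}


def highlight_finite_verbs(tagged: list[tuple[str, str]]) -> str:
    """Return the sentence with finite verb forms wrapped in [brackets].

    `tagged` is a list of (word, tag) pairs, e.g. the output of a POS tagger.
    Brackets stand in for the color a word processor would use.
    """
    return " ".join(f"[{word}]" if tag in FINITE_TAGS else word
                    for word, tag in tagged)


# Hand-tagged input (hypothetical; a real system would run a tagger).
tagged = [
    ("This", "DT"), ("was", "VBD"), ("former", "JJ"), ("camel", "NN"),
    ("trading", "NN"), ("venue", "NN"), ("was", "VBD"), ("home", "NN"),
    ("to", "TO"), ("the", "DT"), ("1982", "CD"), ("stock", "NN"),
    ("market", "NN"), ("crash", "NN"),
]

print(highlight_finite_verbs(tagged))
# This [was] former camel trading venue [was] home to the 1982 stock market crash
```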

How does it work? See the LingURed website.