C-STAR Subjective Evaluation

Each group participating in the C-STAR III evaluation this year has been assigned a series of translations for human grading.

The grading assignments for each group are split into 5 files, listed under the group name. In order to keep the overall file length manageable for a single grader in a single grading session, each file contains around 100 sentences.

To start grading a file, simply click on the filename in this list. The grading page will open automatically, after prompting you for a group user name and password.

A sample grading turn is shown below. For each translation, the grader must supply grades for fluency and adequacy. These terms are explained in the instructions section of each of the grading pages.

IMPORTANT NOTE: Each of the files contains 100 translations that must be graded in a single grading session. Graders should complete the entire file and submit the grades before moving on to another file or closing the browser.

Special Characters and Null Translations: NO_TRANSLATION_FOUND indicates that the translation engine could not produce any output for the current input sentence. Non-Latin characters in the output indicate that some words from the input could not be translated.

Instructions:

On this page you will be presented with a series of translations to evaluate. Each translation segment appears underneath a Reference sentence.

Compare the two sentences and then make a decision about the quality of the translation, which is labelled “Evaluate this segment.” You will be asked to judge the quality of the translation based on two criteria: Adequacy and Fluency.

Adequacy indicates how much of the information from the Reference sentence is also in the sentence below it. Please select one of “All”, “Most”, “Much”, “Little”, or “None”.

Fluency indicates how the evaluation segment sounds to a native speaker of English. Please select the phrase that best describes the level of English used in the translation: “Flawless English”, “Good English”, “Non-native English”, “Disfluent English”, or “Incomprehensible”.

An area for comments is also provided. You may choose to leave this field blank. 

A rule of thumb for grading is to spend no more than 30 seconds on each sentence. When you are finished, be sure to click on the “Submit” button at the bottom of the page.

After you submit, a result page should be displayed showing the average scores that you assigned to this document. You may wish to save the result page in order to verify that your results were stored.
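
For graders who want to check the reported averages against their own notes, the calculation can be sketched as follows. This is only an illustration: the instructions do not say how the labels are converted to numbers, so the 5-to-1 mapping and the function name average_scores are assumptions, not the evaluation server's actual method.

```python
from statistics import mean

# ASSUMPTION: a 5-point mapping (5 = best). The real numeric values
# used by the grading pages are not stated in the instructions.
ADEQUACY_SCALE = {"All": 5, "Most": 4, "Much": 3, "Little": 2, "None": 1}
FLUENCY_SCALE = {
    "Flawless English": 5,
    "Good English": 4,
    "Non-native English": 3,
    "Disfluent English": 2,
    "Incomprehensible": 1,
}


def average_scores(grades):
    """Return (mean adequacy, mean fluency) for one graded file.

    grades is a list of (adequacy_label, fluency_label) pairs,
    one pair per translation. The function name is hypothetical.
    """
    if not grades:
        raise ValueError("no grades submitted")
    adequacy = mean(ADEQUACY_SCALE[a] for a, _ in grades)
    fluency = mean(FLUENCY_SCALE[f] for _, f in grades)
    return adequacy, fluency


# Three made-up grading turns, purely for illustration.
sample = [
    ("All", "Good English"),
    ("Much", "Non-native English"),
    ("None", "Incomprehensible"),
]
print(average_scores(sample))
```

If the averages on the result page differ from a calculation like this, the most likely explanation is that the page uses a different numeric scale, not that grades were lost.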