C-STAR Workplan

At the C-STAR consortium meeting held in Trento in December 2002, the decision 
was taken to organize speech translation evaluation campaigns and workshops on 
a regular basis, focusing on speech translation research and evaluation.

Activities within C-STAR will also include the development of a large 
multilingual parallel corpus to be used for common evaluations.

Evaluation Campaign 2003

The first evaluation campaign and workshop will be held in May 2003 and 
September 2003, respectively.

This year, both events will be restricted to C-STAR members only, and the 
evaluation will be limited to written texts.

In particular, training and testing data will be based on the BTEC corpus 
developed by ATR and extended by the partners to cover their respective 
languages.

Specifications

– The first evaluation campaign will concentrate on assessing text translation 
   algorithms in the tourism domain. Translation directions will be from 
   Chinese, Italian, Japanese, and Korean into English for the primary 
   condition, and any other direction for the secondary condition.

– Training data will consist of a fixed number of English sentences provided 
   with translations into the respective source language. Participants will be 
   allowed to use any additional monolingual resources, e.g. text corpora, 
   grammars, word lists, and segmentation tools (see the corpus-loading sketch 
   after this list).

– Test data for the primary condition will consist of English sentences taken 
   from phrase-books not included in the training data. Test data for the 
   secondary condition will consist of manual translations of the English 
   sentences into all the considered source languages.

– The primary condition will be mandatory for all participants. Participants 
   will be invited to submit additional runs for each condition, possibly 
   corresponding to different translation directions.
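
As an illustration of the training material described above, the sketch below 
reads a sentence-aligned parallel corpus stored as two plain-text files with 
one sentence per line. The file names and the one-sentence-per-line layout are 
assumptions made for illustration; they are not a prescribed BTEC distribution 
format.

    # Minimal sketch: load a sentence-aligned parallel training corpus.
    # Assumed layout (hypothetical): two plain-text files, one sentence per
    # line, with line i of the source file aligned to line i of the English
    # file.  File names are likewise hypothetical.

    def load_parallel_corpus(src_path, eng_path, encoding="utf-8"):
        """Return a list of (source_sentence, english_sentence) pairs."""
        with open(src_path, encoding=encoding) as f_src, \
             open(eng_path, encoding=encoding) as f_eng:
            src_lines = [line.strip() for line in f_src]
            eng_lines = [line.strip() for line in f_eng]
        if len(src_lines) != len(eng_lines):
            raise ValueError("source and English files are not aligned")
        return list(zip(src_lines, eng_lines))

    if __name__ == "__main__":
        pairs = load_parallel_corpus("btec.train.zh", "btec.train.en")
        print(f"{len(pairs)} sentence pairs loaded")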

Evaluation Protocol

– Automatic scoring will be carried out with the NIST/BLEU software. In 
   particular, a server will be set up that will allow participants to remotely 
   score the output of their systems. For each translation direction, multiple 
   translations will be used as references (see the scoring sketch after this 
   list).

– Subjective evaluation of the primary condition will be distributed across the 
   participating sites. English native speakers will evaluate the output of 
   each system against one gold-standard reference. The evaluation will follow 
   guidelines similar to those applied by LDC in the NIST MT evaluation 
   campaigns.

– While automatic evaluation will be applied to all submitted runs, subjective 
   evaluation will be applied to only one run per participant, namely the first 
   run submitted under the primary condition.

– Finally, participants are allowed to discuss their own results without 
   restriction. Disclosure of the results of other participants is not allowed 
   without their permission.
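
To make the automatic scoring step concrete, the sketch below computes 
corpus-level BLEU against multiple references per segment. NLTK's corpus_bleu 
is used here only as a stand-in for the NIST/BLEU scoring software named above; 
the file names, whitespace tokenization, and number of reference files are 
illustrative assumptions.

    # Minimal BLEU-scoring sketch with multiple references per segment.
    # nltk's corpus_bleu stands in for the NIST/BLEU software; file names,
    # whitespace tokenization, and the number of reference files are
    # illustrative assumptions, not part of the workplan.

    from nltk.translate.bleu_score import corpus_bleu

    def read_tokenized(path):
        with open(path, encoding="utf-8") as f:
            return [line.strip().split() for line in f]  # naive whitespace tokens

    # System output: one translated sentence per line, aligned with the references.
    hypotheses = read_tokenized("system_output.en")

    # Several reference translations per segment, one file per reference set.
    all_refs = [read_tokenized(p) for p in ("ref0.en", "ref1.en", "ref2.en")]

    # corpus_bleu expects, for each segment, the list of its reference translations.
    references = [list(segment_refs) for segment_refs in zip(*all_refs)]

    score = corpus_bleu(references, hypotheses)
    print(f"BLEU = {score:.4f}")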