Refining the state-of-the-art in Machine Translation, optimizing NMT for the JA <-> EN language pair by leveraging personal domain expertise
Authors:
Matthew Bieda
Abstract:
This paper documents the construction of a Neural Machine Translation (NMT) system for En/Ja, based on the Transformer architecture and built with the OpenNMT framework. Corpus pre-processing, hyperparameter tuning, and model architecture are explored systematically to obtain optimal performance. The system is evaluated using standard automatic evaluation metrics such as BLEU, and by my subjective judgment as a Japanese linguist.
Submitted 23 February, 2022;
originally announced February 2022.
YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts
Authors:
Timothy McPhillips,
Tianhong Song,
Tyler Kolisnik,
Steve Aulenbach,
Khalid Belhajjame,
Kyle Bocinsky,
Yang Cao,
Fernando Chirigati,
Saumen Dey,
Juliana Freire,
Deborah Huntzinger,
Christopher Jones,
David Koop,
Paolo Missier,
Mark Schildhauer,
Christopher Schwalm,
Yaxing Wei,
James Cheney,
Mark Bieda,
Bertram Ludaescher
Abstract:
Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems due to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB), and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow also will allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.
Submitted 9 February, 2015;
originally announced February 2015.
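The comment-annotation idea described in the YesWorkflow abstract can be illustrated with a small sketch. The `@begin`, `@end`, `@in`, and `@out` keywords follow YesWorkflow's annotation vocabulary; everything else here is an assumption made for illustration: the sample script, the names `extract_blocks` and `dataflow_edges`, and the simplified parsing rules are hypothetical and are not the actual YesWorkflow implementation.

```python
import re

# Hypothetical script text annotated with YesWorkflow-style comments.
# It is only parsed as text here, never executed.
ANNOTATED_SCRIPT = """
# @begin clean_and_plot
# @in raw_data
# @out figure

# @begin clean
# @in raw_data
# @out cleaned_data
cleaned = [x for x in raw if x is not None]
# @end clean

# @begin plot
# @in cleaned_data
# @out figure
render(cleaned)
# @end plot

# @end clean_and_plot
"""

# Simplified pattern: a comment marker, a keyword, and a single name.
TAG = re.compile(r"#\s*@(begin|end|in|out)\s+(\w+)")


def extract_blocks(source):
    """Map each annotated block name to its declared inputs and outputs."""
    blocks = {}
    stack = []  # names of the currently open blocks, innermost last
    for line in source.splitlines():
        match = TAG.search(line)
        if not match:
            continue
        keyword, name = match.groups()
        if keyword == "begin":
            blocks[name] = {"in": [], "out": []}
            stack.append(name)
        elif keyword == "end":
            if not stack or stack[-1] != name:
                raise ValueError(f"unbalanced @end for {name!r}")
            stack.pop()
        else:
            if not stack:
                raise ValueError(f"@{keyword} {name!r} outside any block")
            # @in / @out attach to the innermost open block
            blocks[stack[-1]][keyword].append(name)
    if stack:
        raise ValueError(f"missing @end for {stack[-1]!r}")
    return blocks


def dataflow_edges(blocks):
    """List (producer, data, consumer) triples linking an @out to a matching @in."""
    edges = []
    for producer, ports in blocks.items():
        for data in ports["out"]:
            for consumer, other in blocks.items():
                if consumer != producer and data in other["in"]:
                    edges.append((producer, data, consumer))
    return edges


if __name__ == "__main__":
    found = extract_blocks(ANNOTATED_SCRIPT)
    for edge in dataflow_edges(found):
        print(edge)
```

The sketch recovers only block names, their ports, and the data items that connect one block to another; graphical rendering and provenance queries, which the abstract also mentions, are out of its scope.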