Corpus Linguistics 2011 - DTAL (formerly RCEAL) Study Abroad Record

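I attended (and presented at) the conference in the title, held at the University of Birmingham. There are three main ways to get from Cambridge to Birmingham: (A) a direct bus, (B) a direct train, or (C) a train via London. I chose C to keep costs down. Cambridge → King's Cross → Euston → Birmingham takes about three hours one way. There were reportedly around 250 attendees in total, about 10 of whom were Japanese, including [twitter:@langstat].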

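Since the conference theme was Discourse and Corpus Linguistics, there were many (critical) discourse analysis papers and few L2 acquisition studies of linguistic features at the syntactic level and below, which is what I'm interested in.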

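I heard 27 talks across the 29 sessions plus the 3 plenaries, 30 talks in all. One of the 30 was my own, and for a few of the others I was really just "in the room." These are the talks I attended: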

#	Title	Presenter(s)
Day 1
Plenary 1.	'Flavours' of corpus linguistics: the case of evaluative language	Susan Hunston
1.	'Read less, more TV': A corpus linguistic perspective on television discourse	Monika Bednarek
2.	From experiments to corpora: The Ariadne Corpus Management System	Frankfurt am Main
3.	A fast and user-friendly interface for large treebank: The treebank.info project	Peter Uhrig and Thomas Proisl
4.	Identifying new verb co-occurrence patterns as criterial features: Using ICCI and JEFLL	Yukio Tono
5.	Concessives in discourse: The case of multi-word discourse markers	Anne Li-E Liu
6.	PECHAKUCHA
7.	Cancelled (presenter absent)
8.	Methodological challenges in bilingual corpus-assisted discourse analysis	Rachelle Freake
9.	Combining the analysis of lexical bundles and POS-n-grams: A phraseological comparison of the BNC and ukWaC	Federico Gaspari
10.	Evaluating the lexical potential of spoken pedagogic corpora: The SACODEYL corpora	Johannes Widmann
Day 2
11.	The Bielefeld Jigsaw Map Game (JMG) Corpus	Andy Lucking et al.
12.	Building "directional corpora" for unbiased contrastive analysis	Bruno Cartoni and Thomas Meyer
13.	Cross-linguistic influence on the accuracy order of English grammatical morphemes: Insights from a learner corpus	Akira Murakami
14.	COMENEGO	Daniel Gallego Hernandez and Ramesh Krishnamurthy
Plenary 2.	Discourse, news representations and Corpus Linguistics	Paul Baker
15.	A Verbal Autopsy Corpus for machine learning of causes of death	Samuel Danso et al.
16.	Investigating syntactic complexity in L2 narrative and argumentative writing	Nicholas Wood and Nicolai Struc
17.	Towards a multi-label sentence classifier for automatic identification of rhetorical moves in English abstracts	Carmen Dayrell et al.
18.	A corpus-based study of the discourse of contemporary machine translation	Dorothy Kenny
19.	A freeware, open-source, web-based framework for distribution and analysis of single and parallel corpora	Laurence Anthony et al.
20.	21st Century Corpus Workbench: Updating a query architecture for the new millennium	Stefan Evert and Andrew Hardie
Day 3
21.	Quantitative measure of productivity and their significance	Stefan Evert
22.	How large is the core of language?	Vaclav Cvrcek
23.	How can corpora improve multiple choice grammar questions with possible answers?	Hiroko Usami
24.	A methodological fusion: Problems combining CDA with corpus linguistics	Laura Louise Paterson
Plenary 3	Quantitative and exploratory corpus approaches to registers and text types	Stefan Th. Gries
25.	Workshop: Open educational principles for designing and developing digital language learning and teaching collections	Alannah Fitzgerald and Shaoqun Wu
26.	The use of an error-tagged learner corpus to investigate L1 Mandarin learners' English article interlanguage before and after explicit grammar teaching	Richard Nickalls
27.	Break
28.	From crawled collections to comparable corpora: An approach based on automatic archive identification	Richard Forsyth and Serge Sharoff
29.	Constructing Chinese as a second language learner corpus and a web-based concordancer	Hao-Jan Howard Chen

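I heard a wide range of talks, from NLP-oriented work like 15 and 17 to quite pedagogically oriented talks like 10 and 25. The ones I found most interesting were 19, 20, 21, and 26, but here I'll introduce 26, which is closest to my own research. The presenter is apparently a PhD student at the University of Birmingham.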

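Summary: The study followed L1 Chinese learners' article use over 15 weeks, building a small longitudinal learner corpus. The corpus was error-tagged to trace how article use developed.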

Research questions

Which article is most difficult for L2 learners?

What factors affect learners' article use?

Method

Participants: 30 L1 Chinese learners of English at CEFR levels B1 to B2.

Groups: three groups in total, two receiving explicit instruction (feedback given by underlining erroneous article uses?) plus a control group.

Data (= the corpus): short essays written by the participants in weeks 1, 8, 14, and 15 of the course, all on business topics, totalling 40,100 words.

Tagging: noun phrases were extracted automatically, and article errors were then tagged manually.

Results

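Measured by TLU (target-like use) score, the zero article was the most accurate (0.90), followed by the definite article (0.83) and the indefinite article (0.81).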
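For readers unfamiliar with TLU: in SLA research it is usually defined as correct suppliance in obligatory contexts divided by the sum of obligatory contexts and suppliance in non-obligatory contexts, so it penalizes both omission and overuse. The talk did not show how its scores were computed, so the following is only a minimal Python sketch assuming that standard definition; the class name, function name, and counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ArticleCounts:
    """Hypothetical per-article tallies for one form (e.g. 'the', 'a/an', or zero)."""
    correct_in_obligatory: int  # form correctly supplied where it is required
    obligatory_contexts: int    # all contexts that require this form
    oversupplied: int           # form supplied where it is NOT required


def tlu(c: ArticleCounts) -> float:
    """Target-like use score: correct / (obligatory contexts + oversuppliance)."""
    denominator = c.obligatory_contexts + c.oversupplied
    if denominator == 0:
        raise ValueError("no obligatory or oversupplied contexts to score")
    return c.correct_in_obligatory / denominator


# Hypothetical counts for illustration only (not from the talk)
the_counts = ArticleCounts(correct_in_obligatory=80, obligatory_contexts=90, oversupplied=10)
print(round(tlu(the_counts), 2))  # prints 0.8
```

Because oversuppliance enters the denominator, a learner who simply puts "the" everywhere cannot reach a high score, which is why TLU is often preferred over plain suppliance-in-obligatory-contexts accuracy.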

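For the definite article, misuse errors were common, but omission errors also occurred, for example with generic "the" (referring to a kind as a whole).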

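The explicit instruction groups improved in accuracy more than the control group (week 1 → week 8), but once explicit instruction stopped, their accuracy fell back to its original level (week 8 → week 14).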

Comments

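I think it's good that the study examines the zero article, which is presumably in complementary distribution with the definite and indefinite articles, alongside those two. As came up in the Q&A, it would also be interesting to look at other determiners. (This is actually something my own research hasn't covered.)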

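Calling it a "corpus" raises awkward questions, such as what it is representative of, so it might be simpler to treat it as ordinary experimental data. That said, there are plenty of other cases of experimental data being called a corpus, so I don't think it's a major problem.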

It might be worth applying statistical analysis to the TLU scores, although comparing TLU scores statistically is tricky.

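Listening to talks from morning to evening for three days is tiring. Research presentations are dense, and several times my concentration lapsed, I missed part of a talk, and I couldn't follow the rest. Especially since coming here, I haven't had many classes, so I've rarely needed to stay focused for long stretches of the day, and my stamina for that kind of concentration may have dropped.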

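At the conference I met up again with several people I'd got to know through EPP and last week's intensive course (summer school) at Lancaster. In particular, a Malaysian friend I've run into at every corpus event since the Sketch Engine workshop last December is doing a PhD at Birmingham, and was a great help, including recommending places to eat. I also met a few new people at this conference, so in that sense too I'm gradually coming to enjoy international conferences more.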