Accepted Paper

Extending a Grammar Support Database with Learner Corpora for Data-Driven Learning: An AI-Assisted Approach  
Keiko Hori (Toyo University)

Paper long abstract

Data-Driven Learning (DDL) is an approach in which learners inductively discover linguistic features by examining corpus data (Johns, 1991). While DDL has been shown to promote discovery learning and learner autonomy, previous studies have also pointed out several obstacles to its implementation, including the need for training and teacher support, the difficulty of understanding corpus examples, and the considerable time required for analysis (Lusta, Demirel, & Mohammadzadeh, 2023). As a result, DDL has not yet been widely adopted.

One way to address the difficulty of corpus examples is to use learner corpora. The presenter's research group has developed Hagoromo, a database of example sentences for function words designed to support grammar instruction. The group has been exploring ways to extend this resource for use in DDL by adding both correct and incorrect examples extracted from learner corpora. Sentences produced by learners tend to be easier for other learners to understand, and they often contain the same points of misuse that other learners struggle with. Previous research has reported that, when creating their own sentences, learners referred to peer-produced examples more often than to native-speaker examples (Hori, 2019).

Until now, the extraction of examples from learner corpora has been carried out manually, and identifying grammatical correctness, causes of errors, and alternative expressions has required substantial effort. To improve efficiency, generative AI was used to judge correctness and identify causes of misuse, followed by human verification. This approach significantly reduced the workload involved in example extraction.

Over 83% of the AI's correctness judgments were found to be accurate. However, some inaccurate judgments were also observed. For example, the AI failed to flag misuse arising from a difference in perspective between the transitive and intransitive verbs of the first and second clauses, as in the following example using sura:

*Te o kizutsukete, kaku koto sura dekinai. (“I injured my hand and cannot even write.”)
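
The accuracy figure above can be read as the share of AI judgments that human verification confirmed. The following is a minimal sketch of that calculation, not the group's actual tooling: the `Judgment` record, its field names, and the two helper functions are hypothetical, and all sample items except the sura sentence are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    sentence: str        # learner sentence (romanized)
    ai_correct: bool     # AI verdict: is the sentence judged correct?
    human_correct: bool  # reviewer verdict after human verification


def agreement_rate(judgments):
    """Share of items where the AI verdict matches the human verdict."""
    if not judgments:
        raise ValueError("no judgments to evaluate")
    agreed = sum(j.ai_correct == j.human_correct for j in judgments)
    return agreed / len(judgments)


def needs_review(judgments):
    """Items where the AI verdict and the human verdict disagree."""
    return [j for j in judgments if j.ai_correct != j.human_correct]


# Toy data for illustration only. The first item mirrors the sura example:
# the AI did not flag it, but the reviewer marked it as a misuse.
sample = [
    Judgment("Te o kizutsukete, kaku koto sura dekinai.", True, False),
    Judgment("placeholder sentence 1", True, True),
    Judgment("placeholder sentence 2", False, False),
    Judgment("placeholder sentence 3", True, True),
]

print(f"agreement: {agreement_rate(sample):.0%}")
for item in needs_review(sample):
    print("check manually:", item.sentence)
```

Keeping the AI verdict and the human verdict as separate fields makes the disagreements easy to list, which is where careful human review matters most.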

In future work, careful human review will be combined with AI assistance to further expand learner-corpus-based examples.

________________________________________________________________

References

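Hori, K. (2019). 機能語ウェブツールを使った自律的文法学習の効果 [Effects of autonomous grammar learning using a function-word web tool]. ヨーロッパ日本語教育 [Japanese Language Education in Europe], 24, 568-579. https://eaje.eu/pdfdownload/pdfdownload.php?index=586-597&filename=koto-hori.pdf&p=belgrade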

Johns, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. English Language Research Journal, 4, 1-16.

Lusta, A., Demirel, Ö., & Mohammadzadeh, B. (2023). Language corpus and data-driven learning (DDL) in language classrooms: A systematic review. Heliyon, 9(2), e22731. DOI: 10.1016/j.heliyon.2023.e22731

Abstract in Japanese

学習者コーパスを利用したDDL推進のための用例抽出 -生成AI利用による効率化-
堀 恵子(東洋大学)

データ駆動型学習(DDL)とは,学習者がコーパスデータを見て,帰納的に言語の特徴を発見し,学ぶ方法である(Johns1991)。学習者の発見学習,自律学習を促す効果が指摘されているが,一方で,DDLの導入時にトレーニングや教師の支援が必要であること,コーパスの例文理解が難しいこと,時間がかかることなどが指摘されており(Lusta, A., Demirel, Ö., & Mohammadzadeh, B. 2023),広く普及するには至っていない。

これらの問題のうち,例文の難しさを克服する方法として,学習者コーパスを利用する方法がある。発表者のグループでは,文法教育の支援ツールとして機能語用例文データベース「はごろも」を開発してきたが,用例に学習者コーパスから抽出した正用と誤用の例文を加えることで,DDLにも活用できる方法を模索してきた。学習者が作成した例文は,他の学習者にとって理解しやすく,また誤用しやすい点も共通している。学習者が例文を作成するとき,母語話者の例文より,学習者の例文をよりよく参照したとの報告がある(堀2019)。

これまで学習者コーパスからの例文抽出は人手で行ってきたが,例文の正誤判断と誤用の原因や代替表現を指摘する作業には,多くの手間がかかった。そこで生成AIを利用して,正誤判断と誤用の原因を指摘させ,その後人手で確認することで,例文抽出の作業を効率化することができた。

生成AIによる正誤判断は,83%以上が適確と判断されたが,不的確な例としては,前件と後件の自動詞・他動詞による視点の異なりに関して,誤用と判断しなかった例などがある。

*手を傷つけて、書くことすらできない。(「すら」の例文)

今後は,人手による慎重なチェックを行い,学習者コーパスの用例を増やしていきたい。

Contribution AJE001
Association of Japanese Language Education: 1
  Session 4, Friday 28 August 2026