Technologically and linguistically adventurous EFL teacher, trainer, writer and manager

corpus (pl. corpora)
a collection of written or spoken material stored on a computer and used to find out how language is used
From the Cambridge English Dictionary online

I’ve been interested in corpora for a while now, but I never seem to have time to go beyond my very basic understanding of how the Brigham Young University (BYU) corpus interface works. I’ve always used it for the BNC (British National Corpus), which covers 1980–1993, but discovered a few seconds ago (!) that COCA (Corpus of Contemporary American English) is constantly updated, so I think I’ll be switching to that from now on!

All I knew before was how to do a basic search for a term and how to look for collocates, possibly with a verb or noun near the key word if I was feeling very adventurous. Thanks to three talks on different corpora that I attended during the conference, I now feel like I know much more! 🙂

COCA

Jennie Wright did a very practical session introducing us to the basic functions of COCA, with three activities you can take straight into the classroom. Mura Nava, the master of corpora, helpfully collected my tweets from the session, which show all three activities (and added notes to make them clearer – thanks!). Jennie has also shared her list of corpus resources on her blog. She particularly recommended COCA Bites, a series of very short YouTube videos designed to introduce you to the corpus.

One thing I particularly like about COCA is that parts of speech are highlighted in different colours. Here’s an example of a KWIC (Key Word In Context) search for ‘conference’, which gives concordance lines with the key word aligned in a single column (a function Jennie taught me!).

COCA 'conference' search
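For anyone curious about what a KWIC display is doing behind the scenes, here is a minimal Python sketch. It is not how COCA itself is built: it simply finds every whole-word match of a key word in a small made-up text and pads the left context so that all the key words line up in one column. The function name `kwic` and the 25-character context width are illustrative assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return Key Word In Context lines for every whole-word match of keyword."""
    lines = []
    # \b marks word boundaries, so 'conference' won't match inside 'conferences'
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # right-align the left context so the key words form a single column
        lines.append(f"{left:>{width}} {match.group()} {right}")
    return lines

# Invented sample text, just for illustration
sample = ("The conference opened with a plenary. Many teachers travel to the "
          "conference every year, and the conference app lists every talk.")
for line in kwic(sample, "conference", width=25):
    print(line)
```

Real concordancers work on millions of pre-indexed words rather than scanning raw text like this, but the alignment idea is the same.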

SKELL

James Thomas taught us how to answer language questions using corpora, focussing on the SKELL (Sketch Engine for Language Learning) concordancer (thanks for correcting that, James!). I didn’t realise that SKELL was created by the people at Masaryk University in Brno, (one of) my second home(s) 🙂 Again, Mura collected the tweets, this time from me, Leo Selivan (another corpus master) and Dan Ruelle.

What makes SKELL different to many corpora is that it uses algorithms to select 40 sentences from however many the search finds, filtering out as many as possible that contain obscure words or are overly long, so that the results are easier for learners to use. This works well for common words, but not always for slightly more obscure ones, like ‘mansplain’ (possibly the word of the conference, thanks to David Crystal’s opening plenary!). You can also use the ‘word sketch’ function to show you lots of collocates, a function I think I will now use instead of a collocations dictionary! Michael Houston Brown has a very clear introduction to SKELL on Mura’s eflnotes blog.
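SKELL’s real selection method is more sophisticated than anything shown here, so this is only a toy Python sketch of the general idea: keep sentences containing the search word, throw out overly long ones, rank the rest so that those with fewer uncommon words come first, and return up to 40. The tiny `COMMON_WORDS` list, the 20-word limit and the ranking rule are all assumptions made up for illustration, not SKELL’s actual data or settings.

```python
import re

# Hypothetical mini word list standing in for real frequency data.
COMMON_WORDS = {
    "the", "a", "an", "is", "was", "to", "of", "and", "in", "at", "on",
    "we", "i", "you", "it", "this", "that", "went", "have", "had",
    "conference", "year", "talk", "teachers", "many", "next", "our",
}

def tokenise(sentence):
    """Lower-case a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def score(sentence, max_words=20):
    """Return None for an unsuitable sentence, else a sort key (lower is better)."""
    words = tokenise(sentence)
    if not words or len(words) > max_words:
        return None  # drop empty or overly long sentences
    rare = sum(1 for w in words if w not in COMMON_WORDS)
    # fewer uncommon words first, then shorter sentences
    return (rare, len(words))

def select_examples(sentences, keyword, limit=40):
    """Keep sentences containing the key word, rank them, return the best `limit`."""
    hits = [s for s in sentences if keyword.lower() in tokenise(s)]
    scored = []
    for s in hits:
        key = score(s)
        if key is not None:
            scored.append((key, s))
    scored.sort(key=lambda pair: pair[0])  # stable sort: ties keep original order
    return [s for _, s in scored[:limit]]

# Invented example sentences
examples = [
    "The conference was a talk about epistemological ramifications of lexicographical methodologies.",
    "We went to the conference.",
    "Many teachers went to the conference this year.",
]
for s in select_examples(examples, "conference", limit=2):
    print(s)
```

It also hints at why a rarer word like ‘mansplain’ gets poorer results: if the word itself is uncommon, every sentence containing it starts with a penalty, and there may be few short, simple sentences to choose from.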

One slight problem, as with most corpora, is that it cannot distinguish between different senses of the same word, which may confuse learners. In this example, ‘conference’ appears both in the sense of the IATEFL conference and in the sense of a sporting league. This could also be seen in the COCA image above, but I think it is easier to spot here.

SKELL 'conference' search

If you’d like to find out more, James has recently written an article for the Humanising Language Teaching magazine.

Making your own corpus

Chad Langford and Joshua Albair are clearly die-hard corpus fans. They trawled through over one million words from over 8,000 TripAdvisor restaurant reviews to create their own corpus of review language. The findings were very interesting and showed up some clear features of the genre, but I’m not sure how practical it would be for most teachers to do this kind of project as anything other than a hobby. They’re based at Lille University, but they didn’t say how much of their time was dedicated to this project versus teaching, or how many groups they used it with, so it was difficult to work out the return on their investment of time. Nevertheless, it was very interesting to see how you go about building a corpus. Again, thanks to Mura for collating my tweets with more information in them.
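I don’t know which tools Chad and Joshua used, so this is only a minimal Python sketch of a typical first step in a do-it-yourself corpus: reading a set of plain-text files, building a word frequency list, and counting the words that appear near a chosen word (its collocates). The one-review-per-`.txt`-file folder layout, the function names and the two-word window are hypothetical choices for illustration.

```python
import re
from collections import Counter
from pathlib import Path

def load_texts(folder):
    """Read every .txt file in a folder (hypothetical layout: one review per file)."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))]

def tokenise(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(texts):
    """Count how often each word occurs across all the texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenise(text))
    return counts

def collocates(texts, node, window=2):
    """Count words appearing within `window` tokens either side of `node`."""
    counts = Counter()
    for text in texts:
        tokens = tokenise(text)
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts

# Invented mini "reviews"; a real project would use load_texts("reviews/")
reviews = [
    "The staff were friendly and the food was delicious.",
    "Friendly staff, delicious desserts, highly recommended!",
    "Slow service but the staff were very friendly.",
]
print(frequency_list(reviews).most_common(3))
print(collocates(reviews, "staff").most_common(3))
```

Raw co-occurrence counts like these tend to be dominated by words such as ‘the’, which is why corpus tools usually apply a statistical measure on top before listing collocates.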

Extras

Mura also collated tweets for one more corpus-related talk at IATEFL, based on the English Grammar Profile. Cambridge have recorded all of their talks from the conference, including this one, so you can watch it at your leisure. Mura also has a free ebook with examples of how to use the BYU-COCA corpus interface.

There are interviews with some of the presenters of corpus talks at this year’s IATEFL, including James, Chad and Josh, on Mura’s blog. This list of talks shows everything connected to corpora from this year’s conference.

Comments on: "IATEFL Birmingham 2016: Corpora" (5)

  1. Dana Comisu said:

    Thank you so much, Sandy! Priceless!

  2. mhoustonbrown said:

    Thank you for mentioning my post on SkELL. I have a new post with a couple ideas of how teachers can turn the “drawback” of SkELL not differentiating senses of a word to their advantage. Please check it out if you get the chance 🙂

    https://corpling4efl.wordpress.com/2016/04/19/skell-homonymy-and-polysemy/

  3. Chad Langford said:

    Hello, Sandy. Thanks a lot for your comments and for coming to our session.
    The work on our project was done on our own time. We work at a university, but we’re English teachers, not university professors – we’re not allocated any research time or funds, so we just do it on top of a full-time teaching load because we find it stimulating. Lots of weekends and late nights, as you can imagine!
    As for the number of groups, we’ve used the materials we’ve developed with various learners from age 18 to about 70 (continuing studies learners come in all ages), in levels ranging from A2 to C1.
    Sorry you came away from it thinking it a somewhat impractical enterprise – I certainly think teachers can create and work with much smaller corpora and get a lot out of it, and give back from it to their learners in meaningful ways.
    As far as investment of time is concerned, we think that if the time we spend can be of potential use to others, then it is time well spent and worth it. We’ve developed a lot of material and share it freely and gladly with anyone who’s interested. And the satisfaction we get from that is return enough for us. 😉
    Thanks for your great blog presence; it’s always a real pleasure to read.
    Best, – Chad

    • Hi Chad,
      Thanks very much for clarifying those points. I did think it sounded intellectually stimulating, but I wondered about the time that went into it.
      Have you made the corpus available to others in any way? If you’d like to, you’d be welcome to share some/all of the process of making it on a guest post here, as I think people would be interested in it – my superficial explanation of it in the post is probably not enough if people want to have a go at it themselves.
      Glad you enjoy the blog,
      Sandy
