Plec 0.16

PLEC online tools help page


This is a help page for the PELCRA Learner English Corpus online tools.


Corpus statistics


An up-to-date overview of the corpus contents is available on the Browse data => Summary page. A dynamically generated table on that page shows the current number of texts, sentences and words in the written and spoken components of the corpus.


Text browser


The corpus text browser is available from the Browse data => Texts menu. Every text included in the PLEC corpus is listed in the browser together with selected bibliographic metadata. Clicking a text's title in the list displays a more detailed view of its metadata. The level annotation was added manually after inspecting samples of each text or transcription.


Corpus search engine


The PLEC corpus search engine (available from the Search menu) is an advanced search tool which can be used to run linguistic queries against the data collected in the project.


Exact matches

To find exact matches of a word or phrase, simply type it into the search box. For example, the following query finds all occurrences of the word area:

    area

Variants


You can search for multiple orthographic or semantic variants in a single query by separating the variants with a pipe symbol (|). For example, the query area|areas returns example sentences containing either of these variants, whereas a query such as:

    in a|the city|cities

returns concordances containing phrases such as in a city, in the city and in the cities (and, in principle, also the ungrammatical *in a cities).
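The following Python sketch spells out which phrases a pipe-separated query covers. It is a simplified model written for this page, not the PLEC search engine's implementation: it assumes that each whitespace-separated slot of the query is an independent set of alternatives, and `expand_variants` is a hypothetical helper name.

```python
from itertools import product


def expand_variants(query: str) -> list[str]:
    """List every concrete phrase covered by a query with '|' alternatives.

    Simplified model: the query is split on whitespace into slots, and each
    slot is split on '|' into alternative word forms.
    """
    slots = [slot.split("|") for slot in query.split()]
    # product() yields one word from each slot, in query order.
    return [" ".join(words) for words in product(*slots)]


print(expand_variants("in a|the city|cities"))
# ['in a city', 'in a cities', 'in the city', 'in the cities']
```

The combinatorial expansion is why a short query can cover ungrammatical combinations such as in a cities as well.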


Slop factor

You can define more flexible phrase queries for combinations of words by increasing the slop factor parameter. For example, the following query:

    look|looked|looking|looks up

with a slop factor of 2 returns results such as:

a) young audience who would probably not bother to look the words up,
b) when subjects of the study know the words , and do not really need to look them up,
c) This procedure is equivalent to looking up words in the dictionary.

Thus, a slop factor of n allows up to n words to occur between the query terms.
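A minimal Python sketch of ordered phrase matching with a slop factor is shown below. It is an illustrative model, not the search engine's code: it assumes whitespace tokenization and treats the slop as a budget for the total number of intervening words across the whole phrase, which coincides with the description above for two-term queries but may differ from the engine for longer phrases. `matches_term` and `phrase_match` are hypothetical names.

```python
def matches_term(token: str, term: str) -> bool:
    """A term may list alternatives separated by '|', e.g. 'look|looked'."""
    return token.lower() in term.lower().split("|")


def phrase_match(tokens: list[str], terms: list[str], slop: int) -> bool:
    """True if the terms occur in order with at most `slop` intervening words in total."""

    def search(term_idx: int, pos: int, budget: int) -> bool:
        if term_idx == len(terms):
            return True  # every query term has been matched
        # Try each position reachable with the remaining slop budget.
        for gap in range(budget + 1):
            i = pos + gap
            if i >= len(tokens):
                break
            if matches_term(tokens[i], terms[term_idx]) and search(term_idx + 1, i + 1, budget - gap):
                return True
        return False

    # Start a candidate match at every occurrence of the first term.
    return any(
        matches_term(tok, terms[0]) and search(1, i + 1, slop)
        for i, tok in enumerate(tokens)
    )


terms = "look|looked|looking|looks up".split()
sentence = "who would probably not bother to look the words up ,".split()
print(phrase_match(sentence, terms, 2))  # True: two words between look and up
print(phrase_match(sentence, terms, 1))  # False: the gap exceeds the slop
```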

Word order

The word order restriction on query terms can be relaxed by unchecking the Preserve order option in the search form. For example, the query:

    nice people

with a slop factor of 3 and the Preserve order option unchecked yields concordances such as:

a) You had promised qualificated , nice and energetic people who were not only teach us Chinese but also show us a little China .
b) The people were very nice so I could talk with them about everything work , world events etc . the surroundings made me feel very confident and happy
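The effect of unchecking Preserve order can be approximated with the Python sketch below. It is an assumption-based model, not the engine's algorithm: it accepts a match when every query term occurs somewhere inside a window that contains at most `slop` extra words, regardless of order. `unordered_match` is a hypothetical name.

```python
from itertools import product


def unordered_match(tokens: list[str], terms: list[str], slop: int) -> bool:
    """True if all terms occur, in any order, inside a window with at most
    `slop` extra words (a simplified model of an unchecked Preserve order)."""
    # For each query term, collect the positions of the tokens it matches.
    positions = [
        [i for i, tok in enumerate(tokens) if tok.lower() in term.lower().split("|")]
        for term in terms
    ]
    # Try every combination of one position per term.
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one token cannot satisfy two query terms
        extra_words = (max(combo) - min(combo) + 1) - len(terms)
        if extra_words <= slop:
            return True
    return False


sentence = "The people were very nice so I could talk with them".split()
print(unordered_match(sentence, ["nice", "people"], 3))  # True: reversed order, 2 words between
```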

Term negation

You can exclude a term variant from the results by prefixing it with an exclamation mark (!). For example, the following query:

    big cit*|!cities

returns concordances containing phrases such as big city, but not big cities.
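The Python sketch below models how a single query term with alternatives and a negated variant might be evaluated against one token. It is an illustration under stated assumptions, not the PLEC implementation: a token is assumed to match when it fits at least one positive alternative and none of the `!` alternatives, and `term_matches` is a hypothetical name. Note that cit* on its own also accepts other words, such as citizens, which the negation does not remove.

```python
import re


def _fits(token: str, pattern: str) -> bool:
    # '*' stands for zero or more word characters; the whole token must fit.
    regex = re.escape(pattern).replace(r"\*", r"\w*")
    return re.fullmatch(regex, token, re.IGNORECASE) is not None


def term_matches(token: str, term: str) -> bool:
    """'|' separates alternatives; a leading '!' marks an excluded alternative."""
    alternatives = term.split("|")
    positives = [a for a in alternatives if not a.startswith("!")]
    negatives = [a[1:] for a in alternatives if a.startswith("!")]
    return any(_fits(token, p) for p in positives) and not any(_fits(token, n) for n in negatives)


for word in ["city", "cities", "citizens"]:
    print(word, term_matches(word, "cit*|!cities"))
# city True
# cities False
# citizens True
```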

Wildcards

Orthographic variants of a term can be specified using the asterisk symbol as a wildcard matching zero or more word characters, e.g.:

    histor*

will match words such as history and historical.
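One way to model this wildcard is to translate it into a regular expression, as in the Python sketch below. This is an assumption made for illustration, not necessarily how the search engine implements it: `*` becomes `\w*` (zero or more word characters) and the whole token must match. `wildcard_to_regex` and `wildcard_match` are hypothetical names.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Compile a term in which '*' stands for zero or more word characters."""
    # Escape the term literally, then turn each escaped '*' into \w*.
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(pattern, re.IGNORECASE)


def wildcard_match(token: str, term: str) -> bool:
    # fullmatch: the whole token must fit the pattern, not just a prefix.
    return wildcard_to_regex(term).fullmatch(token) is not None


words = ["history", "historical", "historian", "story", "prehistoric"]
print([w for w in words if wildcard_match(w, "histor*")])
# ['history', 'historical', 'historian']
```

Because the pattern is anchored at the start of the token, prehistoric does not match histor*.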

Part of speech annotation

The PLEC corpus has been part-of-speech tagged with the Stanford POS tagger, which uses the Penn Treebank tagset (e.g. NNS for plural nouns and JJ for adjectives). This makes it possible to search the corpus using morphological criteria. The search engine currently supports two types of POS queries: lemma queries restricted to a part of speech (prefix >l_) and POS placeholders in phrase queries (prefix >p_). To restrict a lemma to a part-of-speech category, prefix the lemma with >l_ and append the tag after another underscore. Wildcards can be used in POS tags to match whole groups of tags.

For example, the following query:

    >l_test_V*

matches occurrences of test, tests, tested and testing tagged as verbs of any form (any tag beginning with V), whereas:

    >l_test_N*

matches the occurrences of test and tests as nouns. The query:

    >l_test_NNS

only matches instances of tests as plural nouns.

POS tags can also be used as placeholders in phrase queries by prefixing them with >p_. For example, the following query finds all occurrences of adjectives preceding the words people, person or persons:

    >p_JJ* people|person|persons
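The Python sketch below models the two kinds of POS queries over tagged tokens. It is a simplified illustration, not the search engine's implementation: the `Token` structure (word, lemma, Penn Treebank tag), the helper names `token_matches` and `find_phrase`, and the toy sentence are all hypothetical, and matching is limited to contiguous phrases without slop.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    tag: str  # Penn Treebank tag, e.g. NNS, VBD, JJ


def token_matches(token: Token, term: str) -> bool:
    if term.startswith(">l_"):
        # >l_LEMMA_TAG: the lemma must match exactly; the tag may contain '*'.
        lemma, tag = term[3:].rsplit("_", 1)
        return token.lemma == lemma and fnmatchcase(token.tag, tag)
    if term.startswith(">p_"):
        # >p_TAG: POS placeholder, matches any word whose tag fits.
        return fnmatchcase(token.tag, term[3:])
    # Plain word with '|' alternatives (wildcards omitted for brevity).
    return token.word.lower() in term.lower().split("|")


def find_phrase(tokens: list[Token], query: str) -> list[str]:
    """Return every contiguous span of tokens matching the query terms in order."""
    terms = query.split()
    hits = []
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(token_matches(t, q) for t, q in zip(window, terms)):
            hits.append(" ".join(t.word for t in window))
    return hits


# Toy tagged sentence (hypothetical data).
sentence = [
    Token("Nice", "nice", "JJ"), Token("people", "people", "NNS"),
    Token("tested", "test", "VBD"), Token("the", "the", "DT"),
    Token("new", "new", "JJ"), Token("tests", "test", "NNS"),
]
print(find_phrase(sentence, ">l_test_V*"))                    # ['tested']
print(find_phrase(sentence, ">l_test_NNS"))                   # ['tests']
print(find_phrase(sentence, ">p_JJ* people|person|persons"))  # ['Nice people']
```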

Multiple queries

Clicking the plus button next to the search text field adds another query. All the queries are run together, and the union of their results is returned. The example below shows how this functionality can be used to search for such + noun patterns with an optional article and an optional adjective:
multiple queries

Sample results of this complex query are shown below:

multiple queries
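The union semantics of multiple queries can be modelled with the Python sketch below. It is an illustration under simplifying assumptions, not the engine's code: queries are matched as contiguous phrases with `|` alternatives, lexical alternatives stand in for the noun and adjective slots, and `run_query`, `run_union` and the sample sentences are hypothetical.

```python
def run_query(sentences: list[str], query: str) -> list[int]:
    """Return indices of sentences containing the phrase ('|' = alternatives)."""
    terms = [t.lower().split("|") for t in query.split()]
    hits = []
    for idx, sentence in enumerate(sentences):
        tokens = sentence.lower().split()
        for start in range(len(tokens) - len(terms) + 1):
            if all(tokens[start + k] in alts for k, alts in enumerate(terms)):
                hits.append(idx)
                break  # report each sentence once per query
    return hits


def run_union(sentences: list[str], queries: list[str]) -> list[int]:
    """Run several queries and merge their hits; each sentence appears once."""
    hits: set[int] = set()
    for query in queries:
        hits.update(run_query(sentences, query))
    return sorted(hits)


sentences = ["It was such fun", "It was such a nice day", "such an idea", "no match here"]
queries = ["such fun", "such a|an idea|day", "such a|an nice idea|day"]
print(run_union(sentences, queries))  # [0, 1, 2]
```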

Context

A larger context of a matched concordance can be shown by clicking the plus and speaker icons on the right-hand side of the concordance. In spoken context windows, you can also play back the original recording of a matched transcription snippet by clicking the Play icon next to each utterance.


Faceted search

When a query returns results, a set of categories describing those results is displayed just under the search form. These categories, called facets, can be used as filters to narrow down the search results.


Checking one or more of the category boxes adds those categories to the original query as necessary conditions.
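Facet counting and filtering can be sketched in Python as follows. This is a hypothetical model, not the PLEC implementation: results are plain dictionaries, the metadata fields `mode` and `level` and their values are invented for illustration, and every checked facet is treated as a necessary condition, as described above.

```python
from collections import Counter


def facet_counts(results: list[dict], field: str) -> Counter:
    """Count how many results fall into each category of one metadata field."""
    return Counter(r[field] for r in results)


def apply_facets(results: list[dict], checked: dict[str, str]) -> list[dict]:
    """Keep only results that satisfy every checked facet (field -> value)."""
    return [r for r in results if all(r.get(f) == v for f, v in checked.items())]


# Hypothetical search results with hypothetical metadata fields.
results = [
    {"concordance": "in the city", "mode": "written", "level": "intermediate"},
    {"concordance": "in a city", "mode": "spoken", "level": "advanced"},
    {"concordance": "in the cities", "mode": "written", "level": "advanced"},
]
print(facet_counts(results, "mode"))  # Counter({'written': 2, 'spoken': 1})
narrowed = apply_facets(results, {"mode": "written", "level": "advanced"})
print([r["concordance"] for r in narrowed])  # ['in the cities']
```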

Exporting results



Search results can be exported as Microsoft Excel spreadsheets (which can also be opened in LibreOffice). To download them, click the floating Excel spreadsheet icon next to the current results view.


Phrases browser




Syntactic annotation browser