You might want to open your browser's console before you hit "Learn", so you can see more info about the learner's progress.
If you used the Sublexical Learner in your work, please reference our paper and our learner:

  • Allen, Blake & Michael Becker (2015) Learning alternations from surface forms with sublexical phonology. lingbuzz/002503.
  • Allen, Blake & Michael Becker (April 26, 2022) The sublexical morphophonological learner [web application], version of April 26, 2022. Retrieved from http://sublexical.phonologist.org/.
Get the source code at github.com/kuzum99/sublexical.

What's in here?

A web application for learning the relation between a pair of morphological categories, e.g. singular and plural. You can try some of the existing simulations, or input your own files.

Starting your own simulation

You can give the learner your own files. The files never get sent anywhere outside your computer. They are kept in your browser's local storage; they will not be available on a different computer or on a different browser on the same computer.

Training data

This is the only required file. It is a text file with a pair of words in each line, separated by a tab. An optional third tab-separated field can specify the observed frequency of the paradigm. Anything after the third tab is ignored.

The symbols in each word are separated by spaces, e.g. "w ʌ ɡ", followed by a tab, followed by "w ʌ ɡ z".
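
As a rough illustration of the training file format, the following Python sketch parses one line. The function name `parse_training_line` and the default frequency of 1 are assumptions made for this example; they are not taken from the learner's source code.

```python
def parse_training_line(line):
    """Split one training line into (base, derivative, frequency)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected two tab-separated words")
    base = fields[0].split()         # symbols are separated by spaces
    derivative = fields[1].split()
    # Optional third field: observed frequency.
    # The default of 1.0 is an assumption of this sketch.
    has_frequency = len(fields) > 2 and fields[2].strip() != ""
    frequency = float(fields[2]) if has_frequency else 1.0
    # Any fields after the third one are ignored.
    return base, derivative, frequency


print(parse_training_line("w ʌ ɡ\tw ʌ ɡ z\t12\tignored comment"))
```

Splitting on tabs first and on spaces second mirrors the two levels of the format: tabs separate fields, spaces separate symbols.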

Testing data (wugs)

This is a text file, with a word in each line. Anything after a tab character is ignored.

The symbols in each word are separated by spaces, e.g. "w ʌ ɡ".

If you don't supply testing data, the learner will take in the real words of the language, put them into sublexicons, and fit a grammar to each sublexicon. No wug testing will occur.

Constraint set

This is a text file with one constraint in each line. After the first tab, an optional mu can be specified (negated); after the second tab, an optional sigma. Anything after the third tab is ignored. On the use of mu and sigma, consult the manual for the MaxEnt Grammar Tool.

If you don't supply a constraint set, no forms will violate any constraints. The harmony of each candidate will be zero, and the probability distribution over candidates will be uniform.
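
To see why zero harmony for every candidate gives a uniform distribution, recall that a MaxEnt grammar assigns each candidate a probability proportional to the exponential of its harmony. The sketch below is a minimal Python illustration; the function name and the candidate labels are hypothetical, and the sign convention (non-positive weights, so probability grows with harmony) is an assumption.

```python
import math


def maxent_probabilities(harmonies):
    """Map candidate harmonies to probabilities: exp(H) / sum of exp(H)."""
    # Assumption: harmony is a weighted sum of violations with non-positive
    # weights, so a higher (less negative) harmony is more probable.
    exp_scores = {cand: math.exp(h) for cand, h in harmonies.items()}
    total = sum(exp_scores.values())
    return {cand: score / total for cand, score in exp_scores.items()}


# With no constraints, every candidate has harmony 0.
no_constraints = {"cand_a": 0.0, "cand_b": 0.0, "cand_c": 0.0, "cand_d": 0.0}
print(maxent_probabilities(no_constraints))
```

Because exp(0) is 1 for every candidate, each of the four candidates receives the same share of the probability mass.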

Markedness constraints are regular expressions. Two things to think about:

  • Segments are separated by spaces, so "[+round][+voice]" will likely never assign any violation marks. To assign one violation mark to a [+round] segment followed by a [+voice] segment, write "[+round] [+voice]".
  • Violation marks can be assessed unintentionally if one symbol is a substring of another symbol, e.g. [-cont,+back] will match "t", but also the "t" of "tʲ". A judicious use of spaces in the constraint formulation can help.
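
Both pitfalls can be reproduced with ordinary regular expressions. The sketch below uses Python's `re` module as a stand-in for the browser's regular expression engine, and literal symbols as a stand-in for feature expressions such as "[+round]" (how the learner expands those is not shown here). The helper name `count_matches` and the example words are made up for illustration.

```python
import re


def count_matches(pattern, word):
    """Count non-overlapping matches of `pattern` in a space-separated word."""
    return len(re.findall(pattern, word))


word = "a tʲ u d"  # symbols are separated by spaces

# Pitfall 1: without a space between the two symbols, nothing matches.
print(count_matches(r"ud", word))    # no match
print(count_matches(r"u d", word))   # one match

# Pitfall 2: a bare "t" also matches inside the symbol "tʲ".
print(count_matches(r"t", word))     # one unintended match

# Requiring a space or a word edge on both sides matches whole symbols only.
whole_t = r"(?:^| )t(?= |$)"
print(count_matches(whole_t, word))      # "tʲ" is no longer matched
print(count_matches(whole_t, "t a tʲ"))  # only the plain "t" is matched
```

Anchoring each symbol between spaces or word edges is one way to apply the "judicious use of spaces" mentioned above.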

We support two kinds of faithfulness constraints. All faithfulness constraints are identified with an initial "F: ".

  • Context-free Ident for a given feature, e.g. "F: Ident [back]".
  • Contextual Ident, e.g. "F: Ident [back] [+low]" assigns violation marks to changes of the feature [back] on [+low] segments.
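
The difference between the two kinds of Ident constraint can be sketched in Python as follows. The feature values are a toy table rather than the learner's default feature file, the function name is hypothetical, and checking the context on the base segment (rather than the derivative) is an assumption of this sketch.

```python
# Toy feature values (hypothetical; not the learner's default feature file).
FEATURES = {
    "ɑ": {"back": "+", "low": "+"},
    "æ": {"back": "-", "low": "+"},
    "u": {"back": "+", "low": "-"},
    "y": {"back": "-", "low": "-"},
}


def ident_violations(pairs, feature, context=None):
    """Count aligned (base, derivative) pairs whose `feature` value differs.

    `context` is an optional (feature, value) pair. It is checked on the
    base segment, which is an assumption of this sketch.
    """
    count = 0
    for base_seg, deriv_seg in pairs:
        if context is not None:
            ctx_feature, ctx_value = context
            if FEATURES[base_seg][ctx_feature] != ctx_value:
                continue  # segment is outside the context; no violation
        if FEATURES[base_seg][feature] != FEATURES[deriv_seg][feature]:
            count += 1
    return count


aligned = [("ɑ", "æ"), ("u", "y")]  # both pairs change [back]
print(ident_violations(aligned, "back"))                # context-free
print(ident_violations(aligned, "back", ("low", "+")))  # only [+low] segments
```

In this toy alignment the context-free constraint penalizes both changes, while the contextual one penalizes only the change on the low vowel.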

Feature matrix

This is a tab-separated text file, with one sound/phone/segment in each line. The first field in each line specifies the symbol.

The first row must specify column names, separated by tabs. All columns must have non-empty names, including the first one.

Another row must specify the empty symbol, which must be named "empty". The feature specifications of this segment are used in calculating the cost of a mismatch when bases and derivatives are aligned.
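
The requirements above (a header row with non-empty column names and a row named "empty") can be checked with a short Python sketch like the one below. The function name, the sample feature names, and the use of "0" as the feature values of the empty symbol are assumptions made for illustration.

```python
def parse_feature_matrix(text):
    """Parse a tab-separated feature file into {symbol: {feature: value}}."""
    lines = [ln for ln in text.split("\n") if ln.strip()]
    header = lines[0].split("\t")
    # All columns need non-empty names, including the first one.
    if any(not name.strip() for name in header):
        raise ValueError("every column needs a non-empty name")
    matrix = {}
    for ln in lines[1:]:
        cells = ln.split("\t")
        # The first field is the symbol; the rest are feature values.
        matrix[cells[0]] = dict(zip(header[1:], cells[1:]))
    if "empty" not in matrix:
        raise ValueError('a row for the symbol "empty" is required')
    return matrix


sample = "symbol\tsyllabic\tback\nempty\t0\t0\nɑ\t+\t+\nt\t-\t-\n"
print(parse_feature_matrix(sample))
```

Running a check like this before loading a feature file can catch a missing header name or a missing "empty" row early.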

If you don't supply a feature file, the default feature file will be used instead. This file is based on Hayes' 2009 textbook, with minor modifications.

Configuration file

This is a text file in JSON format. JSON is a rather strict format: for example, strings must use double quotes, and trailing commas are not allowed. If you don't supply a configuration file, the default configuration file will be used instead.

You can use this file to give a name and description for your data, even if all of the parameters have their default values.

See below for more info about using this file.

Simulation Parameters

General

  • Minimal hypothesis size: 1
  • Minimal sublexicon size: 1
  • Mutation type: both
  • Mutation orientation: product
  • Deletion orientation: product
  • Metathesis orientation: product
  • Learning data size: all
  • Use Grammars Proper: true
  • Skip wug-testing: false
  • Nucleus feature: syllabic

MaxEnt

  • Use Gaussian priors: true
  • Default μ: 0
  • Default σ: 100000
  • Iteration count: 10000
  • Learning rate: 1
  • No positive weights: true
  • Initial weight: 0
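
The Gaussian prior settings above can be read through the formula commonly used in MaxEnt grammar learning, in which each weight w is penalized by (w - μ)² / (2σ²). The Python sketch below assumes that this standard formula is the one the learner applies; the function name and the example weights are hypothetical.

```python
def gaussian_prior_penalty(weights, mu=0.0, sigma=100000.0):
    """Sum of (w - mu)^2 / (2 * sigma^2) over all constraint weights."""
    return sum((w - mu) ** 2 / (2 * sigma ** 2) for w in weights)


example_weights = [-3.0, -1.5, 0.0]

# With the default sigma of 100000, the penalty is close to zero,
# so the prior barely restricts the weights.
print(gaussian_prior_penalty(example_weights))

# A small sigma penalizes the same weights far more strongly.
print(gaussian_prior_penalty(example_weights, sigma=1.0))
```

Under this reading, the default σ leaves the weights almost unconstrained, and a smaller σ pulls them toward μ.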

Aligner

  • Use features: true
  • Insert/delete penalty: 1.5
  • Substitution penalty: 2
  • Metathesis penalty: 1.5
  • Tolerance: 0

Troubleshooting

The "console"?

The details vary from browser to browser, so use Google to find specific instructions for your browser. For example, if you are using Firefox, Google "firefox how to open the web console".

Opening the web console (also called the JavaScript console) is highly recommended: it shows potentially useful information about any errors, and more information about the learner's progress.

"Warning: unresponsive script"

Ask your browser to allow the script to continue. Some simulations can run for many hours.

The browser crashed.

Yes, this happens sometimes. The cause is usually not the amount of data, but the number of patterns in it (that is, too many hypotheses). Try increasing the minimal hypothesis size.

"Error: your browser is not supported"

We try to support any browser that has support for promises. In particular, we often do our testing with Chrome and Firefox, so those are your best bets.