This database contains the local and global co-occurrences for 66,000 English words. These values were obtained by scanning more than 300 million words of English text and applying a series of mathematical formulas that remove the influence of word frequency. The local co-occurrences can be likened to conditional probabilities that one word will appear with others (cat occurs near whisker), whereas the global co-occurrences tell us that one word is found in the same contexts as another (cat and kitten have similar local co-occurrence vectors). The values here are simply the counts for the neighbours of the target items (either local or global). The frequency counts here and the orthographic neighbourhood counts were derived from the same corpus.

A paper describing the corpus and the methods for deriving these values will soon be posted here, as will more extensive neighbourhood information.

Substantial contributions to this work came from Jon Casey, Kevin Durda, Rick Caron and Chris Westbury. Jon and Kevin are math graduate students at the University of Windsor, Rick Caron is a professor of Math at the University of Windsor, and Chris Westbury is a professor of Psychology at the University of Alberta.
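To make the distinction between local and global co-occurrence concrete, here is a minimal Python sketch on a three-sentence toy corpus. It is an illustration only and is not the procedure used to build this database: it uses raw window counts with no correction for word frequency, and the function names (`local_cooccurrence`, `cosine`), the window size, and the example sentences are all assumptions made up for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def local_cooccurrence(sentences, window=4):
    """Count, for each word, the words found within `window` positions of it.

    Hypothetical helper: raw counts only, no frequency correction.
    Windows do not cross sentence boundaries.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration.
sentences = [
    "the cat licked its whisker",
    "the kitten licked its whisker",
    "the dog chewed its bone",
]

local = local_cooccurrence(sentences)

# Local co-occurrence: "whisker" appears near "cat".
cat_whisker = local["cat"]["whisker"]

# Global co-occurrence (approximated here by cosine similarity):
# "cat" and "kitten" share contexts more than "cat" and "dog" do.
cat_kitten = cosine(local["cat"], local["kitten"])
cat_dog = cosine(local["cat"], local["dog"])

print(cat_whisker, cat_kitten, cat_dog)
```

In this toy setting, "cat" and "kitten" never appear together, yet their local co-occurrence vectors are similar because both occur with the same neighbours. That is the sense in which a global relationship can exist without a local one.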
If you would like more information regarding these values, please contact Lori Buchanan in the Psychology Department at the University of Windsor.
The citation for this website is:
Durda, K., & Buchanan, L. (2006). WordMine2 [Online] Available: http://web2.uwindsor.ca/wordmine
CATSCAN and Lori Buchanan's lab are funded by SSHRC, NSERC, CFI, OIT, CRC, PREA, SHARCNET and the University of Windsor.