Medical treebanking home

(Biomedical treebankers' base page)

Annotators' home


Where to find it

TreeEditor

Information about the annotation tool, TreeEditor: what the buttons do, what the keyboard shortcuts are, how to save, etc.

Using TreeEditor on unagi through a Windows machine. [2003-03-28]

Inserting traces and empty words [2003-04-02]

Treeprint

[2003-06-25] If you need to save a tree or node, or send one by email, you can use treeprint.pl. This Perl script reads the output of TreeEditor's PRINT button and prints it as a tree, using ASCII characters to simulate TreeEditor's display format. For example, the first training example

  (Paragraph (S (NP-SBJ I) (VP read (NP the Treebank manual)) .))
comes out as this:
1.
Paragraph
+-- S
    +-- NP-SBJ
    |   +-- I
    +-- VP
    |   +-- read
    |   +-- NP
    |       +-- the
    |       +-- Treebank
    |       +-- manual
    +-- .
You will probably want to save the script in the directory where you do your treebanking. Your browser should open this link as text, which you can copy and paste where you want. Alternatively, in your unagi login account go to that directory and execute this command:
cp ~mamandel/ph/annotators/medtree/treeprint.pl .
Either way, you should then make it executable with the command
chmod u+x treeprint.pl
To read the directions, just type "treeprint.pl" at the command line in the directory it's installed in.
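
To make the conversion concrete, here is a minimal Python sketch of the same idea. It is not treeprint.pl and does not claim to match how that script works. It assumes the input is a single bracketed tree like the training example above, and it omits the tree number ("1.") that appears in the sample output. The function names parse_tree and render_tree are made up for this illustration.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(node())
            pos += 1                      # skip ")"
            return (label, children)
        word = tokens[pos]                # a bare word is a leaf
        pos += 1
        return (word, [])

    return node()


def render_tree(tree):
    """Draw the tree with '+--' branches and '|' continuation bars."""
    label, children = tree
    lines = [label]

    def walk(kids, prefix):
        for i, (kid_label, grandkids) in enumerate(kids):
            lines.append(prefix + "+-- " + kid_label)
            is_last = i == len(kids) - 1
            # A bar continues downward only if more siblings follow.
            walk(grandkids, prefix + ("    " if is_last else "|   "))

    walk(children, "")
    return "\n".join(lines)


example = "(Paragraph (S (NP-SBJ I) (VP read (NP the Treebank manual)) .))"
print(render_tree(parse_tree(example)))
```

Run on the training example, this sketch prints the same branch layout as the sample above, starting from the "Paragraph" line.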

Meeting notes

Assignment from 2003-04-02

Meeting summary of 2003-04-02

Email from May 2003 that didn't get to the list but should have

Annotators' blog, June 2003 [2004-01-12]

Syntax text

Beatrice Santorini's introductory syntax text is a good introduction or refresher. The first four chapters are probably the most directly relevant; beyond that it goes into more detailed theory than we generally use in treebank annotation, although the issues are the same. (But it can't hurt if you feel like looking at the whole thing!)

(Here's a version with frames if you want to see the top-level table of contents at the same time.)

Treebank Guidelines

A detailed description of the Penn Treebank style, with many examples, in PDF format. To search these guidelines, use Acrobat's "Find" button (the one with the binoculars icon), not your web browser's "Find". This is the same text that is printed in your manuals.

There is also a version of the manual in the LaTeX markup language on unagi at /mnt/unagi/nldb/manual/current/manual.glommed . This format can be hard to read, so be warned, but if you're up to it, it can be useful. It begins with the following note:

This file contains the LaTeX source for the chapters of the (January 1995) Treebank parsing manual, to let annotators search the manual electronically. Some junk (esp. {verbatim}) has been removed for a modicum of legibility. The rest of the manual should be in /mnt/unagi/nldb/manual/current .

Recent decisions described in minutes (but not the manual) are also appended to the end.

For your convenience, I've put two copies of the manual on the open shelves in the IRCS suite for your use while working there. The shelves are straight ahead of you as you pass the Fishbowl, and the manuals are in the left-hand section of the shelves, about the second shelf from the top. They're labeled "Penn Treebank II Guidelines" on the spine. Please return them to the same shelf when finished with them. Thanks. (MM)

Reference Tools

Database of tags. In the "file set" field, select the domain you are interested in, either "cyp450 entity a" or "oncology entity a". In the "annotation type" field (really the label of a tag), the drop-down menu begins with the POS tags; the entity and pretagging tags are at the bottom simply because they all begin with lowercase letters, which come after capital letters and most punctuation marks in the ASCII sort order. The "annotation text" field is optional: if you leave it blank, you'll get every string in the specified file set that is tagged with the specified label; if you enter text, the search returns only those tagged strings that include the contents of the field. [2004-05-17]
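
The ordering of that drop-down menu is easy to reproduce. The short Python sketch below sorts a handful of labels by character code, which is what an ASCII sort does. The POS tags are real Penn Treebank tags, but "gene" and "substance" are made-up stand-ins for entity labels, not names taken from the database.

```python
# Hypothetical mix of labels: POS tags plus two invented entity labels.
labels = ["gene", "NN", ".", "VBZ", "substance", ",", "-LRB-"]

# Python compares strings by code point, which matches ASCII order here.
ordered = sorted(labels)
print(ordered)
# Punctuation-initial tags come first, then capitals, then lowercase:
# [',', '-LRB-', '.', 'NN', 'VBZ', 'gene', 'substance']
```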

[2004-05-17]: NOT WORKING: Marty McCormick has written a concordance program that will find up to 100 occurrences of a term in Medline, displaying each hit with some context. This can help you find other uses of the word or phrase you are trying to annotate.

Training files

The training files are available as (nearly-)plain* text here.
* They have HTML paragraph markings in them.
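
If the paragraph markings get in your way, something like the following Python sketch can strip them. It assumes the markings are ordinary HTML `<p>` and `</p>` tags, which this page does not actually confirm, and the function name strip_paragraph_tags is invented for the example.

```python
import re


def strip_paragraph_tags(text):
    """Remove <p ...> and </p> tags (any letter case), keeping the text."""
    return re.sub(r"</?p\b[^>]*>", "", text, flags=re.IGNORECASE)


print(strip_paragraph_tags("<P>I read the Treebank manual.</p>"))
```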



2004-05-21