WordFreak User's Guide
From BioIE Wiki
Main Page : General Annotation Guidelines : WordFreak User's Guide
This page describes the mechanics of using WordFreak for the various stages of text annotation. What to do is detailed in the guidelines for annotation in general, pretagging, and entity annotation in the individual domains.
Introduction
Our tool for named entity annotation is WordFreak, originally developed by Tom Morton and since then maintained by Eric Pancoast, Seth Kulick, and Shawn Medero. WordFreak requires the Java Runtime Environment appropriate to your machine and operating system. The different domains that we annotate require somewhat different builds of WordFreak; each domain's guidelines include a link to the appropriate build.
Before you begin
Make a directory specifically for your annotation. When you take a file to work on, you will check it out (procedures have changed over the course of the project; you'll be told what's in current use) and put a copy in your annotation directory.
When you start WordFreak, the first window says "Enter your username and launch WordFreak." All your annotations will be identified with this username, so use some form of your real name, like Mandel or mamandel or MarkM, and use the same username each time. Don't use something generic like annotator or test, or a nickname like dairysnake473. Then click "Launch".
Starting annotation
WordFreak opens with one tab, "Untitled Project"; you don't need to create a WordFreak project. Select Text from the Viewer menu, either at the bottom of the project screen or in the menu bar at the top. The resulting Text tab won't show any text until you select an Annotation (see below).
Select a file to annotate with the Add button (icon is a document with an orange plus sign) in the WordFreak toolbar or the one on the side of the project tab. It will open a Browse dialog window.
- If you are starting on a new source file that hasn't been pretagged or annotated, choose Files of Type "Text Files (*.txt, *.sgm)"; this will work even though we use different extensions for the source files. WordFreak will ask you if you want to create an annotation file; answer Yes.
- If you are annotating a file that has already been pretagged and possibly annotated, you also have the option of choosing WordFreak files (*.ann).
Click Load. A green checkmark will appear on the icon of the annotation file in the project view.
NOTE: WordFreak allows you to have multiple files loaded at the same time, but this can lead to problems. After annotating and saving a file, you can click Remove in the project tab to clear the project before adding another file.
Pre-tagging
(WordFreak uses the name "tagging" to refer to work done automatically by programs that it calls, and "annotation" to refer to work done by a human annotator. That distinction isn't always necessary or made in other contexts.)
Tag the text for paragraphs, sentences (where the domain calls for them), and tokens, in that order. If this has not yet been automated for the domain you are working in, you will have to do it semi-manually, telling WordFreak to use its taggers for each of these types of tagging. In the Human Diseases and Office Letters domains we are pretagging only paragraphs and tokens, not sentences.
First, select the file in the project view if it isn't already selected.
Paragraph tagging
Paragraph tagging isn't a complex task, but most of the operation is the same for all types of tagging, so I'll go into considerable detail here. (NOTE: WordFreak will only run a tagger if a matching Annotation is selected. Some builds have taggers for Bio Paragraph and Bio Sentence without an Annotation that WordFreak recognizes as matching, so if the Tag button is grayed out after you have selected a tagger and an annotation, you may need to select just plain Paragraph, etc.)
- Under "Tagger" select "Paragraph".
- Do the same under "Annotation". A second window will appear, labeled "Chooser". At the top it will have two rows of buttons with simple icons, and then a single wide button labeled "paragraph". (As you switch between WF and other applications, you may find that the main WF window is on top but the Chooser window is hidden behind other application windows. You can bring it forward with Window | Bring all to front.)
- Click the Tag button in the toolbar; its icon is a document with an orange lightning bolt. The tagging should take place too fast to notice, but a dialog window showing the tagging progress may appear briefly.
- Switch to the Text tab. A paragraph of the text should be highlighted in light gray.
- At the top of the Chooser window there are two rows of four buttons that you can use to correct the tagging. Since you are annotating paragraphs at the moment, these buttons refer to paragraphs:
| icon | function | tooltip |
| first row | ||
| < | previous tagged entity | left |
| > | next tagged entity | right |
| + | tag selection as entity | add |
| – | untag selected entity | remove |
| second row | ||
| <=| | extend beginning of selection | grow left |
| >=| | contract beginning of selection | shrink left |
| |=< | contract end of selection leftwards | shrink right |
| |=> | extend end of selection rightwards | grow right |
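One way to picture the second row of buttons is as four small operations on a selection, where a selection is just a start offset and an end offset into the text. The sketch below is only an illustration of that idea, not WordFreak's own code: the `Selection` class, the function names, and the one-character default step are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    start: int  # offset of the first selected character
    end: int    # offset just past the last selected character


def extend_beginning(sel, step=1):
    # Move the left edge leftwards, never before the start of the text.
    return Selection(max(0, sel.start - step), sel.end)


def contract_beginning(sel, step=1):
    # Move the left edge rightwards, keeping at least one character selected.
    return Selection(min(sel.end - 1, sel.start + step), sel.end)


def contract_end(sel, step=1):
    # Move the right edge leftwards, keeping at least one character selected.
    return Selection(sel.start, max(sel.start + 1, sel.end - step))


def extend_end(sel, text_length, step=1):
    # Move the right edge rightwards, never past the end of the text.
    return Selection(sel.start, min(text_length, sel.end + step))


text = "First paragraph.\n\nSecond paragraph."
sel = Selection(2, 14)                    # a drag that missed both ends
sel = extend_beginning(sel, step=2)       # pull the left edge back to offset 0
sel = extend_end(sel, len(text), step=2)  # push the right edge past the period
print(text[sel.start:sel.end])            # prints "First paragraph."
```

Starting the drag a few characters inside the paragraph and then adjusting both edges corresponds to the two `extend_*` calls in the example.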
Use the > and < buttons to show each tagged paragraph in turn. There should be no problem with the paragraph tagging; it's a pretty straightforward task for the tagging program (except for list paragraphs). The text may include some XML labels in angle brackets, like "<ABSTRACT>", and the highlighting may not include those; that's all right. The highlighting may or may not also include the blank line between paragraphs, and that's all right too.
(When you have more than one file loaded, if you're at the beginning or end of one of them, the Chooser > and < buttons will move you to the previous or next file. You can also move between them directly with Annotation | Go To.)
What to do if the tagging is wrong? Suppose two paragraphs are highlighted together as a single paragraph. The easiest way to fix this is in two steps:
- Remove Tag: With the mistagged section highlighted, click the – button in the Chooser. The highlighting will disappear.
- Add Tags:
- Drag the cursor over the first of the mistakenly combined paragraphs. Be sure to get it all, including the period and any other punctuation at the end. WordFreak will not let you extend a selection into a part of the text that is already tagged. (This is also true for sentences and tokens, but not named entities; we'll discuss that in its place.) Again, it's OK if you catch an extra blank line.
- If you have trouble getting the selection to work at the beginning or end of a paragraph, start dragging from the second or third character in, and after you've selected most of the paragraph you can use the second row of buttons (shrink and grow) to adjust the ends of the selection. You may also find it helpful to increase the font size with the Font menu.
- When the paragraph is selected, click the "paragraph" button in the Chooser. (The + button would also work here, but the situation is more complicated with other types of tagging, so it's best to make a habit of using the labeled button.)
- Do the same for the second paragraph.
Check your work by clicking < and > to be sure that the highlighting is correct. If it's off by just a little, you can use shrink and grow, and, as always, you can ignore the space between paragraphs. When the paragraphs are correct, save your work, return to the project view, and go on to sentences.
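The rule that a paragraph selection can't run into text that is already tagged explains why the fix takes two steps: the old tag has to go before the two new ones will fit. Here is a minimal sketch of that check, assuming tags are stored as half-open (start, end) character offsets. The function names and the offsets are made up for illustration and say nothing about how WordFreak stores its annotations.

```python
def overlaps(a, b):
    """True if two half-open (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def can_add(new_span, tagged_spans):
    """A new tag is allowed only if it touches no existing tag of the same type."""
    return not any(overlaps(new_span, tagged) for tagged in tagged_spans)


# Two paragraphs were wrongly tagged together as the single span (0, 200).
paragraphs = [(0, 200), (202, 350)]

print(can_add((0, 95), paragraphs))    # False: the bad tag is still in the way

paragraphs.remove((0, 200))            # step 1: remove the wrong tag
print(can_add((0, 95), paragraphs))    # True: step 2 can now add the first half
paragraphs.append((0, 95))
print(can_add((97, 200), paragraphs))  # True: and then the second half
```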
Note on clicking vs. dragging
You're probably used to applications in which a mouse click in text sets an insertion cursor so you can start typing or editing at that point. But in WordFreak you can't type or edit, so there is no insertion cursor. Instead, a mouse click selects the nearest tagged entity of any of the types currently shown in the Chooser window. In order to select any text in WordFreak you have to drag the mouse at least a little bit.
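To make the click behavior concrete, here is a rough sketch of "select the nearest tagged entity". It assumes that "nearest" means fewest characters between the click position and the tagged span, and that tags are (start, end) offsets; both are assumptions for the example, and WordFreak's actual rule (including how it breaks ties) may differ.

```python
def distance(offset, span):
    """Number of characters between a click offset and a half-open span."""
    start, end = span
    if start <= offset < end:
        return 0                   # the click landed inside the span
    if offset < start:
        return start - offset      # the click is to the left of the span
    return offset - (end - 1)      # the click is to the right of the span


def entity_at_click(offset, tagged_spans):
    """Return the tagged span closest to the click, or None if nothing is tagged."""
    if not tagged_spans:
        return None
    return min(tagged_spans, key=lambda span: distance(offset, span))


tagged = [(0, 10), (20, 30)]
print(entity_at_click(5, tagged))   # (0, 10): the click is inside the first span
print(entity_at_click(17, tagged))  # (20, 30): the second span is closer
```

A click therefore never produces an empty selection when something is tagged, which is why you have to drag to select untagged text.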
Sentence tagging
- Set "Tagger" to "Open Sentence" if it is available or "Sentence" otherwise, and "Annotation" to "Sentence". The Chooser will now show two tag buttons, "sentence" and "section". The "sentence" tag is reserved for real biomedical text, while "section" is meant for parts of the text like authors' names, PubMed classification information, and other ancillary material. (This distinction is important in annotating PubMed abstracts; we don't bother with it in office letters.)
- Click the Tag button in the main window.
- Switch to the text view. The first tagged sentence or section will be highlighted. These two tags are exclusive: A piece of the text can be tagged as a section or a sentence, but not both.
- Check the sentence tagging the same way as you checked the paragraph tagging.
- Save your work.
Token tagging
Roughly speaking, a token is a single word, number, or punctuation mark.
- Use the "PTB Token" tagger if it is available in your build of WordFreak (PTB = Penn TreeBank), and set "Annotation" to "Token".
- Click Tag.
- You may want to look at the tokenization of one or two files to get a feel for it, but you don't need to check it all the way through. For one thing, it's pretty reliable, and for another, it would take too long to check.
- Save your work.
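If you want a feel for what "a single word, number, or punctuation mark" looks like in practice, the following sketch splits a string along those rough lines. It is deliberately crude and is not the PTB Token tagger: the regular expression, the function name, and the example sentence are all assumptions for illustration, and real Penn Treebank tokenization handles many cases (contractions, hyphens, abbreviations) that this does not.

```python
import re

# A run of letters/digits is one token; any other non-space character is its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def rough_tokens(text):
    """Return (start, end, token) triples for a rough word/number/punctuation split."""
    return [(m.start(), m.end(), m.group()) for m in TOKEN_RE.finditer(text)]


for start, end, token in rough_tokens("Mutations, found in 5 patients."):
    print(start, end, token)
```

Each token comes with its start and end offsets, which is the same kind of information a token tag attaches to the text.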
And that finishes the pre-tagging.
Biomedical named entity annotation
Everything you've done so far on this file has been to prepare it for the named entity annotation. Your work here will provide the training material for automatic taggers for subsequent annotation and research.
The guidelines for named entity annotation are organized by domain, as listed under Entity on the main page of the wiki. The mechanics of entity annotation are the same as for the other kinds, except that in certain situations one entity-tagged string of text can be part of another or have two different tags. In the latter case, you must be careful to add the second tag instead of replacing the first one with it:
- Move the highlight off the string by
- using the Chooser < or > button (or the arrow keys while the cursor is on the text window) to switch it to a previous or subsequent tagged string, or
- clicking the mouse (without dragging it) on another tagged string, selecting it, or
- dragging the mouse across another part of the text
- Now drag the mouse across the string you want to double-tag. This will select the token(s) but not the tag(s) already there. (See note on clicking vs. dragging.) Check the status line to be sure.
- Apply the tag.
- Check by using the Chooser < and > buttons (or the arrow keys while the cursor is on the text window). The highlight in the text window should stay on the same string, while the status line and the Chooser window should change to show the different tags.
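The difference between adding a second tag and replacing the first one can be sketched with a small data structure in which each tagged string keeps a list of tags. This is only a model for thinking about the steps above, not WordFreak's internal representation; the `EntityLayer` class and the tag names "type-A" and "type-B" are hypothetical.

```python
from collections import defaultdict


class EntityLayer:
    """Maps a (start, end) span of text to the list of entity tags on it."""

    def __init__(self):
        self._tags = defaultdict(list)

    def add_tag(self, span, tag):
        # Adding keeps whatever tags are already on the span.
        if tag not in self._tags[span]:
            self._tags[span].append(tag)

    def replace_tag(self, span, old_tag, new_tag):
        # Replacing overwrites one existing tag; the old one is lost.
        tags = self._tags[span]
        tags[tags.index(old_tag)] = new_tag

    def tags_at(self, span):
        return list(self._tags.get(span, []))


span = (10, 15)

double_tagged = EntityLayer()
double_tagged.add_tag(span, "type-A")
double_tagged.add_tag(span, "type-B")           # what you want: both tags survive
print(double_tagged.tags_at(span))              # ['type-A', 'type-B']

overwritten = EntityLayer()
overwritten.add_tag(span, "type-A")
overwritten.replace_tag(span, "type-A", "type-B")  # what to avoid: the first tag is gone
print(overwritten.tags_at(span))                # ['type-B']
```

Stepping through the tags on the same string with the Chooser < and > buttons corresponds to walking the list returned by `tags_at`.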
