WordFreak how-to

(draft...)

Annotators' home


(The name: You'll see WordFreak, Wordfreak, and wordfreak. The designer doesn't seem worried by it, so I won't be either. A lot of the time I just save my fingers and type WF.)

Setup and startup

Files

Decide where you want to keep the files you're annotating. If you're running WordFreak on your own machine, two reasonable places are the directory that holds WordFreak itself or a subdirectory of it. If you're running WordFreak on a machine in the IRCS suite, you'll need an account of your own to save the files in; the local files on these machines are considered temporary and may be wiped at night. When you take a file to work on, check it out and put a copy in your files directory.

Start up WordFreak

If you already have WF, start it up. If you're connected to the Internet, WF will check the appropriate URL for a newer version; if there is one, it will update itself and then start up. Once WF has started you can safely disconnect and run offline.

If you don't have WF on your machine, get it from the appropriate URL for your group ("Getting WordFreak").

Add a file

WF will open a file browser window. Navigate if necessary to the directory where the file is; or you can open a Windows Explorer window on that directory, copy the directory pathname from the Address field, and paste it into the WF browser's address field.

Specify the type of file you are looking for. If you are starting a fresh file, choose either "Text files" (if the file's extension is .txt or .sgm) or "All files". Select the file you want and click OPEN. The file will appear in the project view. WordFreak will ask if you want to create an annotation file; answer Yes.

WF never modifies the text file; it saves its annotations in an annotation file, whose name is the name of the text file but with .ann added at the end. To work on a file with existing annotations you can either open the text file or open the .ann file; WF will look for the corresponding file name and open them both. They must be in the same directory.
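
The naming rule is easy to check from outside WF. Here's a minimal Python sketch (not part of WordFreak; the function names are my own, and the file names in the usage note are made up) that works out the partner file's name and checks that both halves of the pair are present in the same directory:

```python
from pathlib import Path

def annotation_path(text_file):
    """Return the expected annotation file: the text file's full name plus '.ann'."""
    text_file = Path(text_file)
    return text_file.with_name(text_file.name + ".ann")

def text_path(annotation_file):
    """Return the text file that an annotation file belongs to."""
    annotation_file = Path(annotation_file)
    if annotation_file.suffix != ".ann":
        raise ValueError(f"not an annotation file: {annotation_file}")
    # Drop only the trailing '.ann'; the text file's own extension stays.
    return annotation_file.with_name(annotation_file.name[: -len(".ann")])

def pair_is_complete(text_file):
    """True if the text file and its .ann file both exist in the same directory."""
    text_file = Path(text_file)
    return text_file.is_file() and annotation_path(text_file).is_file()
```

For a text file called abstract01.txt, annotation_path gives abstract01.txt.ann in the same directory, and text_path goes the other way.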

Load the file

A green checkmark will appear on the icon of the annotation file in the project view.

If you have more than one file loaded, switch to this one with

Tagging and annotating

WordFreak uses the name "tagging" to refer to work done automatically by programs that it calls, and "annotation" to refer to work done by a human annotator. That distinction isn't always necessary or made in other contexts.

About pre-tagging

Before annotating the text, you must tag it for paragraphs, sentences, and tokens, in that order. We intend to automate this task, but for now you will have to do it semi-manually, telling WordFreak to use the tagger plug-ins for these types of tagging.

First, select the file in the project view if it isn't already selected.

Paragraphs

Paragraphs aren't complex, but most of the operation is the same for all three types of tagging, so I'll go into considerable detail here.

Use the > and < buttons to show each tagged paragraph in turn. (You can also do this with the arrow keys, when the text window is highlighted: left or up for the previous tag, right or down for the next. [2003-07-23]) There should be no problem with the paragraph tagging; it's a pretty straightforward task for the tagging program. The text may include some XML labels in angle brackets, like "<ABSTRACT>", and the highlighting may not include those; that's all right. The highlighting may or may not also include the blank line between paragraphs, and that's all right too.

(When you have more than one file loaded, if you're at the beginning or end of one of them, the Chooser > and < buttons will move you to the previous or next file. You can also move between them directly with Annotation | Go To.)

Annotating

What to do if the tagging is wrong? Then you have to fix it. The process for that is the same as for the manual annotation that constitutes your main job, so I'll describe the mechanics here.

Suppose two paragraphs are highlighted together as a single paragraph. The easiest way to fix this is in two steps: remove and add. (I'm talking about removing and adding tags in the Chooser, not removing and adding files in the main WordFreak window!)

Remove Tag:

In the Text view, click (don't drag) anywhere in the paragraph whose tag you want to remove. It will highlight in purple. (In entity annotation, there may sometimes be tags within tags. Move between them with the Chooser < and > or the keyboard arrow keys.) With the mistagged section highlighted, click the Chooser's button for removing a tag. The highlighting will disappear.

Add Tags:

This is going to be a long explanation because I am folding a lot of information about selecting and annotating into it. Here goes:

  1. Selecting text
    1. Drag the cursor over the first of the mistakenly combined paragraphs. (See Clicking vs. dragging below.) Be sure to get it all, including the period at the end. You may notice that WordFreak will not let you extend a paragraph selection into a part of the text that is already tagged. (This is also true for sentences and tokens, but not named entities; we'll discuss that in its place.) Again, it's OK if you catch an extra blank line.
    2. If you have trouble getting the selection to work at the beginning of a paragraph, try dragging from left to right instead of right to left. Or if you're just missing a few characters at the beginning or end you can use the second row of buttons (shrink and grow) to adjust the ends of the selection.
    3. [2003-07-23] The status line
      Look at the bottom of the text window. You'll see
      1. the name of the .ann file
      2. the name of the annotator who assigned the currently highlighted tag; here that will be the username with which you launched WordFreak, but for automatically assigned tags it will be tagger
      3. the type of tag -- here, paragraph -- in parentheses
      4. the starting and ending byte offsets of the tagged text, in the form number..number
      5. a measure of confidence in the correctness of the tag, represented as
        • a square, which may be partially or completely filled with color
        • a number describing how much of the square is colored in; this will be 1 for human annotations and some smaller number for automatic annotations, or 0 or 1 for some taggers that don't assign confidence measures
      6. how far along in the text the current tag is, represented as
        • two numbers separated by a slash, x/y, meaning that this is tag number x out of y tags in this text. (These refer to the kind of annotation currently selected and shown in the Chooser window. Other types of tag aren't counted here.)
        • a horizontal bar partially or completely colored in from left to right
      This display is a more reliable guide to the current selection than the highlighting in the Chooser window. It may not make much difference for paragraphs, but it does for more complicated types of tagging.
  2. When the paragraph is selected, tag it as a paragraph. You can either
    1. click the "paragraph" button in the Chooser, or
    2. [2003-07-23] right-click the selected text. A "label>" field will appear next to the mouse pointer. Move the pointer onto it, and a selection menu will pop up. This menu has only one item on it, "paragraph", but it's more useful with other kinds of tagging. Left-click on "paragraph" to tag the selection.
    3. (The Chooser's + button would also work here, but the situation is more complicated with other types of tagging, so it's best to avoid it.)
  3. Do the same for the second paragraph.
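
If you want to double-check what a pair of status-line offsets covers, you can look at the same bytes of the text file outside WF. This is only a sketch: I'm assuming that the ending offset points just past the last byte of the tag (check that against a tag whose text you know), and the function names and sample text are my own.

```python
def parse_offsets(field):
    """Split a status-line offset field such as '120..245' into two integers."""
    start_text, end_text = field.split("..")
    start, end = int(start_text), int(end_text)
    if start < 0 or end < start:
        raise ValueError(f"bad offset range: {field!r}")
    return start, end

def tagged_text(raw_bytes, field, encoding="ascii"):
    """Return the text covered by an offset field.

    Assumption: the end offset is exclusive. If WF turns out to count the
    last byte inclusively, slice to end + 1 instead.
    """
    start, end = parse_offsets(field)
    return raw_bytes[start:end].decode(encoding, errors="replace")

# Made-up sample standing in for the bytes of a text file.
sample = b"First paragraph.\n\nSecond paragraph.\n"
print(tagged_text(sample, "18..35"))
```

Read the file in binary mode (open(name, "rb").read()) so that the offsets count bytes rather than characters.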

Check your work by clicking < and > to be sure that the highlighting is correct. If it's off by just a little, you can use shrink and grow; and, as always, you can ignore the space between paragraphs. When the paragraphs are correct, return to the project view and go on to sentences.

Sentence Tagging

  1. Set "Annotation" to "Sentence". The Chooser will now show two tag buttons, "sentence" and "section". Set "Tagger" to "Bio Sentence". (You may see more than one sentence tagger; be sure to choose "Bio Sentence".)
  2. Click the Tag button in the main window.
  3. Switch to the text view. The first tagged sentence or section will be highlighted. Note: These two tags are mutually exclusive: a piece of the text can be tagged as a section or as a sentence, but not both. "Section" here is meant for parts of the text like titles and other header material; if the tagger doesn't see a period or other sentence-ending punctuation at the end of a piece of text, it calls it a section.
  4. Check the sentence tagging the same way as you checked the paragraph tagging.
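
To get a feel for the sentence-versus-section rule described in step 3, here is a toy Python version of it. This is not the Bio Sentence tagger's code; the list of sentence-ending punctuation and the handling of closing quotes and brackets are my guesses, there only to illustrate the idea.

```python
# Assumed set of sentence-ending punctuation; the real tagger's set may differ.
SENTENCE_ENDINGS = (".", "?", "!")
# Closing quotes and brackets that may follow the final punctuation mark.
CLOSERS = "\"')]"

def classify_span(text):
    """Label a piece of text 'sentence' if it ends like one, else 'section'."""
    stripped = text.rstrip().rstrip(CLOSERS)
    return "sentence" if stripped.endswith(SENTENCE_ENDINGS) else "section"

print(classify_span("Materials and Methods"))
print(classify_span("The protein was purified as described."))
```

A title such as "Materials and Methods" comes out as a section because nothing sentence-like ends it, which is the kind of result to expect when you check the tagger's output.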

Token Tagging

  1. Set "Annotation" to "Token". Set "Tagger" to "Bio Token". (You may see more than one token tagger; be sure to choose "Bio Token".)
  2. Click Tag.
  3. You may want to look at the tokenization of one or two files to get a feel for it, but you don't need to check it all the way through. For one thing, it's pretty reliable, and for another, it would take too long. ("Tokenization" means the way the text is divided into tokens. "Tokenizer" is another term for "token tagger".)

POS Tagging

  1. [2004-02-24] Set "Annotation" to "POS". Set "Tagger" to "Bio POS". (You may see more than one POS tagger; be sure to choose "Bio POS".)
  2. Click Tag.

And that finishes the pre-tagging. Now you can get to the meat of your work.


Biomedical annotation

We don't have taggers yet for the entity categories, or good POS taggers for biomedical text, which is why we need your work.

...


Tips

Adding, loading, and saving files

[2003-08-19] WordFreak has shown various kinds of instability if you have more than one file at a time loaded into it, or even added. In fact, some serious bugs seem to show up even if you close and remove a file before adding and loading a new one. So...

  1. Save your file after each type of correction: tag, annotate, save; tag, annotate, save. See Saving your work, below.
  2. After finishing your work on a file and saving it, remove it from the Project view with the "Remove" button on the right of the pane, before adding and loading another file.
  3. Even following these procedures, you may find that to avoid tagging problems you have to close WordFreak and start it up again after working on two files. That can be a real time-eater, especially if WordFreak refuses to start up if not connected to the Internet; but it's still better than having to do all the tagging by hand.

Clicking vs. dragging

[2003-07-23] You're probably used to applications, like word processors, in which a mouse click in text sets an insertion cursor so you can start typing or editing at that point. But in WordFreak you can't type or edit, so there is no insertion cursor.

Instead, a mouse click selects the nearest tagged entity (of any of the types currently shown in the Chooser window). In order to select a token or a chunk of text in WordFreak you have to drag the mouse at least a little bit; even one pixel will do. This extreme motion sensitivity can make it hard to select a tagged string. Check the status line to be sure you've actually got what you wanted.

[2003-12-02] Eric Pancoast writes: "It is possible to overlap tags in annotation tasks like entity tagging. If an annotator selects by clicking and dragging (which selects tokens rather than an annotation) it will allow them to tag something twice. So the annotators should know that if they want to switch an annotation from one type to another, they should click instead of click-drag, use the arrow-keys, or use the next-previous annotation buttons to select the annotation."
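
If you can get your tags into a simple list, accidental double tags of this kind are easy to spot. The sketch below assumes a list of (start, end, label) tuples that you have built yourself; it does not read the .ann format, and the labels in the example are invented.

```python
from collections import defaultdict

def find_double_tags(tags):
    """Return {(start, end): [labels]} for every span that carries more than one tag.

    tags is an iterable of (start, end, label) tuples.
    """
    by_span = defaultdict(list)
    for start, end, label in tags:
        by_span[(start, end)].append(label)
    return {span: labels for span, labels in by_span.items() if len(labels) > 1}

# Invented example: the span 0..5 was tagged twice.
example = [(0, 5, "gene"), (0, 5, "protein"), (10, 14, "gene")]
print(find_double_tags(example))
```

Treat the result as a list of places to review, since some double tags are deliberate (see Double tagging).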

Setting Annotation and Tagging

[2004-01-11] For each level of automatic tagging -- paragraph, sentence, token, and POS -- set the WF settings in the following order:

  1. Choose the tagger.
  2. Choose the annotation type.
  3. Invoke the tagger.

Don't delete the first sentence tag in a paragraph

[2003-10-02] There is a known bug in WF's handling of the first sentence of a paragraph. If you delete the sentence tag on the first sentence of a paragraph, WF moves the left edge of the paragraph tag, apparently to where the first remaining sentence tag begins. Then, when you tag the first sentence correctly (or try to), either the tagging doesn't take, or it seems to take, but subsequent tagging is messed up or impossible.

This bug is on Eric's list, but it isn't fixed yet. The workaround is a pain but is doable: Never delete the first sentence tag in a paragraph. Instead, "shrink-right" or "grow-right"* that tag in the Chooser, one boring click at a time (since Java doesn't understand holding down the mouse button), till it ends at the right place. Then adjust other sentence tags as needed.

* (Of course, if the sentence tag ends somewhere in the middle of the real first sentence, you must first delete the tag on the SECOND sentence. That does not cause problems.)

Entering comments

[2003-07-23] For some purposes you may need or want to enter a comment in the comment field at the bottom of the chooser window. We've had some problems keeping comments in the right place once they're entered, but this seems to work:
  1. Be sure the reference you want is tagged and selected.
  2. Click in the comment field, and be sure not to move the mouse pointer back into the main window.
  3. Type your comment.
  4. Use the "<" or ">" button at the top of the chooser window (or the arrow keys when the text window is selected) to move the highlight to the next or previous tag.

Saving your work

[2003-08-04] In annotation as in all other computer work, it's wise to "Save early and often".

  1. WordFreak's File menu begins with several "Project" operations, including "Save Project" and "Save Project As...". These will not save your work! The WordFreak concept of project includes the names of files that you are working on and what viewer, tagger, and annotation to start up with; saving the project just means saving that list of information. So skip the project commands. Instead...
  2. To save your work, choose "Save >". That will open a submenu that begins with "Save All". This option is known to be buggy. Instead, choose the name of the file from that submenu and save that. If you've got several files open, save them individually.

Problems with saving files

[2003-04-10] If WF says it can't save your file:

  1. exit WF
  2. delete the .ann file
  3. rename the .save-temp file by removing the ".save-temp" from the end of the name. That's your .ann file.
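
The same recovery can be scripted, once WF has been closed. This Python sketch differs from the steps above in one way: instead of deleting the old .ann file it sets it aside with ".bak" added to the name, in case you want to look at it later. The function name and the .bak convention are my own, not WordFreak's.

```python
from pathlib import Path

def recover_annotation(ann_file):
    """Put the .save-temp file in place of the .ann file it belongs to."""
    ann_file = Path(ann_file)
    # The temporary file has the .ann file's name with '.save-temp' on the end.
    temp_file = ann_file.with_name(ann_file.name + ".save-temp")
    if not temp_file.is_file():
        raise FileNotFoundError(f"no temporary save file: {temp_file}")
    if ann_file.exists():
        # Deviation from the manual steps: keep the old file as a backup.
        ann_file.replace(ann_file.with_name(ann_file.name + ".bak"))
    temp_file.rename(ann_file)
    return ann_file
```

Call it with the path of the .ann file, not the .save-temp file. Run it only after exiting WF.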

Double tagging

[2003-07-23] In some special situations you may have to tag the same string of text with two different tags. Be careful to add the second tag instead of replacing the first one with it:

  1. Move the highlight off the string by
    1. using the Chooser "<" or ">" button (or the arrow keys while the cursor is on the text window) to switch it to a previous or subsequent tag, or
    2. clicking the mouse (without dragging it) on another tag, selecting it, or
    3. dragging the mouse across another part of the text
  2. Now drag the mouse across the string you want to double-tag. This will select the token(s) but not the tag(s) already there. (See Clicking vs. dragging.) Check the status line to be sure.
  3. Apply the tag.
  4. Check by using the Chooser "<" and ">" buttons (or the arrow keys while the cursor is on the text window). The highlight in the text window should stay on the same string, while the status line and the Chooser window should change to show the different tags.



2004-02-24