Email Contact: bioie@ldc.upenn.edu
This material is based upon work supported by the National Science Foundation under Grant No.: EIA-0205448
Mining the Bibliome
WordFreak
 
WordFreak was developed by Tom Morton and was maintained by programmer-analysts Jeremy Lacivita and Eric Pancoast. Seth Kulick currently assists with any WordFreak issues. For more information on using WordFreak, refer to the WordFreak User's Guide. In any case, get Java first.

Contents

  1. Setup and Startup
  2. Getting WordFreak
  3. Shortcuts
  4. WordFreak How-to
  5. Tagging and Annotating
  6. Tips




Setup and Startup

Files

Decide where you want to keep the files you're annotating. If you're running WordFreak on your own machine, two reasonable options are in the same directory as WordFreak itself, or in a subdirectory of that directory. If you're running WordFreak on a machine in the IRCS suite, you'll need to have an account of your own to save the files on; the local files on these machines are considered temporary and may be wiped at night. When you take a file to work on, check it out and put a copy in your files directory.

Java

WordFreak requires the Java Runtime Environment appropriate to your machine and operating system.

[2004-05-18] Go to www.java.com and click the "Free Download" link in the upper right corner.
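
Before downloading, you may want to check whether a Java runtime is already installed. The Python sketch below is not part of WordFreak; the function names are made up for illustration, and it assumes only that the runtime's launcher is called `java` and is on your PATH.

```python
import shutil
import subprocess


def find_java():
    """Return the path of the `java` launcher on PATH, or None if there is none."""
    return shutil.which("java")


def java_version_banner(java_path):
    """Run `java -version` and return whatever banner it prints.

    Java traditionally writes the version banner to stderr, so both
    output streams are collected.
    """
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stderr + result.stdout).strip()


if __name__ == "__main__":
    path = find_java()
    if path is None:
        print("No Java runtime found on PATH; install one before launching WordFreak.")
    else:
        print(java_version_banner(path))
```

If the script reports that no runtime was found, follow the download instructions in this section first.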

For Macintosh OS X (requires MacOS X 10.2.3 or later):

  • Java 1.4.2 is most easily acquired through the "Software Update" control panel in "System Preferences" on Mac OS X as long as you have a relatively fast internet connection. If you don't have a fast connection, I would go to apple.com (see next bullet point) to download the latest Java distribution from a fast connection, then get it to your machine by CD or zip disk. You may also need to download the latest OS update: http://www.info.apple.com/support/downloads.html
    -- All of these updates are no cost to users of Mac OS 10.2 and higher. If you need to update to 10.2 let me (Mark Mandel) know.
  • Also available at http://www.apple.com/downloads/macosx/apple/java.html.

Getting WordFreak

Download WordFreak from here. The window says "Enter your username and launch WordFreak." All your annotations will be identified with this username, so use some form of your real name, like Mandel or mamandel or MarkM, not something arbitrary or generic like annotator or test or foobar. Click "Launch", and Wordfreak will install itself and start up. You may get a warning that your system doesn't know if this program is safe to install and you are "strongly advised against it"; ignore that warning and proceed. Wait for WordFreak to finish bringing up the main window. It may flash once or twice while doing so.

Shortcuts

The second time you run the same version of WordFreak from the Web (at least on Windows machines) Java WebStart will ask you if you want to install shortcuts. Say yes. After that, you can start WordFreak directly from the shortcut instead of going to the web site. If you are connected to the Internet when you start WordFreak, it will automatically check the web site for a new version, and download that if it finds one.

The shortcuts are set up with the name that you use when you launch WordFreak from the web site. That means that all the annotations you make or change in a WordFreak session launched from a shortcut will be labeled with that name. Publicly accessible computers, such as the ones in the IRCS office suite, may already have shortcuts on their Windows desktops. Do not use anyone else's shortcut.

(So how can you tell if a shortcut is someone else's? At the bottom of the main WordFreak window is a dark bar labeled "Filters" with an arrowhead (a triangle) at the right-hand end, pointing right. Click on the arrowhead and another section will appear below the dark bar. In the right-hand portion of this window is a heading "Include these Annotators". If you don't see your own ID in that list, this isn't your shortcut.) [2004-07-15]

In case of problems, contact Seth Kulick, who currently helps support WF, or the BioIE manager.

WordFreak How-to

(The name: You'll see WordFreak, Wordfreak, and wordfreak. The designer doesn't seem worried by it, so I won't be either. A lot of the time I just save my fingers and type WF.)

Start up WordFreak

If you already have WF, start it up. If you're connected to the Web, WF will access the appropriate URL and see if there's a newer version, and if there is, it will update itself and start up. Once WF has started you can safely disconnect from the Web and run offline.

If you don't have WF on your machine, get it from the appropriate URL for your group ("Getting WordFreak").

Add a File

  • Click an Add button (icon is a document with an orange plus sign), either the one in the WordFreak toolbar or the one on the side (in the project view)
    or
  • Project | Add
WF will open a browser window. Navigate if necessary to the directory where the file is; or you can open a Windows Explorer window on that directory, copy the directory pathname from the Address field, and paste it into the WF browser's address field.

Specify the type of file you are looking for. If you are starting a fresh file, choose either "Text files" (if the file's extension is .txt, .src, or .sgm) or "All files". Select the file you want and click OPEN. The file will appear in the project view. WordFreak will ask if you want to create an annotation file; answer Yes.

WF never modifies the text file; it saves its annotations in an annotation file, whose name is the name of the text file but with .ann added at the end. To work on a file with existing annotations you can either open the text file or open the .ann file; WF will look for the corresponding file name and open them both. They must be in the same directory.
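
The naming rule can be sketched in Python as follows. This is an illustration of the convention described above, not WordFreak's own code; the function names and the file name `abstract01.txt` are hypothetical.

```python
from pathlib import Path


def annotation_path(text_file):
    """Return the annotation file path: the text file's full name plus '.ann'."""
    text_file = Path(text_file)
    return text_file.with_name(text_file.name + ".ann")


def text_path(annotation_file):
    """Return the text file path for an annotation file by dropping '.ann'."""
    annotation_file = Path(annotation_file)
    if annotation_file.suffix != ".ann":
        raise ValueError(f"not an annotation file: {annotation_file}")
    return annotation_file.with_suffix("")


if __name__ == "__main__":
    # Hypothetical file name, used only to show the pairing.
    print(annotation_path("files/abstract01.txt"))
    print(text_path("files/abstract01.txt.ann"))
```

Both functions keep the directory unchanged, which matches the requirement that the two files sit in the same directory.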

Load the File

  • Select it in the project view and click the Load button
    or
  • Project | Load

A green checkmark will appear on the icon of the annotation file in the project view.

If you have more than one file loaded, switch to this one with

  • Annotation | Go To

Tagging and Annotating

WordFreak uses the name "tagging" to refer to work done automatically by programs that it calls, and "annotation" to refer to work done by a human annotator. That distinction isn't always necessary or made in other contexts.

About Pre-tagging

Before annotating the text, you must tag it for paragraphs, sentences, and tokens, in that order. We intend to automate this task, but for now you will have to do it semi-manually, telling WordFreak to use the tagger plug-ins for these types of tagging.

First, select the file in the project view if it isn't already selected.

Paragraphs

Paragraphs aren't complex, but most of the operation is the same for all three types of tagging, so I'll go into considerable detail here.

  • Under "Annotation" select "Paragraph". Use Annotation | Set Annotation | Paragraph, not the "Tagger" menu below the main pane. A second window will appear, labeled "Chooser". At the top it will have two rows of buttons with simple icons, and then a single wide button labeled "paragraph".
  • Do the same under "Tagger".

    (NOTE: As you switch between WF and other applications, you may find that the main WF window is on top but the Chooser window is hidden behind other application windows. You can bring it forward with Window | Bring all to front.)

  • Click the Tag button. It's below the word "Tagger" in the top menu; its icon is a document with an orange lightning bolt. The tagging should take place too fast to notice, but a dialogue window showing the tagging progress may appear briefly. You'll know that the tagger has done its job by seeing the "paragraph" button in the Chooser window become highlighted in purple.
  • Switch to the Text view (click the tab at the top of the main pane). A paragraph of the text should be highlighted in purple.
  • At the top of the Chooser window there are two rows of four buttons that you can use to correct the tagging. Since you are annotating paragraphs at the moment, these buttons refer to paragraphs:

    icon   function                                tooltip
    first row
    <      previous tagged entity                  left
    >      next tagged entity                      right
    +      tag selection as entity                 add
           untag selected entity                   remove
    second row
    <=|    extend beginning of selection           grow left
    >=|    contract beginning of selection         shrink left
    |=<    contract end of selection leftwards     shrink right
    |=>    extend end of selection rightwards      grow right

Use the > and < buttons to show each tagged paragraph in turn. (You can also do this with the arrow keys, when the text window is highlighted: left or up for the previous tag, right or down for the next. [2003-07-23]) There should be no problem with the paragraph tagging; it's a pretty straightforward task for the tagging program. The text may include some XML labels in angle brackets, like "<ABSTRACT>", and the highlighting may not include those; that's all right. The highlighting may or may not also include the blank line between paragraphs, and that's all right too.

(When you have more than one file loaded, if you're at the beginning or end of one of them, the Chooser > and < buttons will move you to the previous or next file. You can also move between them directly with Annotation | Go To.)

Annotating

What to do if the tagging is wrong? Then you have to fix it. The process for that is the same as for the manual annotation that constitutes your main job, so I'll describe the mechanics here.

Suppose two paragraphs are highlighted together as a single paragraph. The easiest way to fix this is in two steps: remove and add. (I'm talking about removing and adding tags in the Chooser, not removing and adding files in the main WordFreak window!)

Remove Tag

In the Text view, click (don't drag) anywhere in the mistagged paragraph you want to remove. It will highlight in purple. (In entity annotation, there may sometimes be tags within tags. Move between them with the Chooser < and > buttons or the keyboard arrow keys.) Then click the "remove" button in the Chooser. The highlighting will disappear.

Add Tags

This is going to be a long explanation because I am folding a lot of information about selecting and annotating into it. Here goes:

  1. Selecting text
    1. Drag the cursor over the first of the mistakenly combined paragraphs. (See Clicking vs. dragging below.) Be sure to get it all, including the period at the end. You may notice that WordFreak will not let you extend a paragraph selection into a part of the text that already is tagged. (This is also true for sentences and tokens, but not named entities; we'll discuss that in its place). Again, it's OK if you catch an extra blank line.
    2. If you have trouble getting the selection to work at the beginning of a paragraph, try dragging from left to right instead of right to left. Or if you're just missing a few characters at the beginning or end you can use the second row of buttons (shrink and grow) to adjust the ends of the selection.
    3. [2003-07-23] The status line
      Look at the bottom of the text window. You'll see
      1. the name of the .ann file
      2. the name of the annotator who assigned the currently highlighted tag; here that will be the username with which you launched WordFreak, but for automatically assigned tags it will be tagger
      3. the type of tag -- here, paragraph -- in parentheses
      4. the starting and ending byte offsets of the tagged text, in the form number..number
      5. a measure of confidence in the correctness of the tag, represented as
        • a square, which may be partially or completely filled with color, indicating a measure of confidence
        • a number describing how much of the square is colored in; this will be 1 for human annotations and some smaller number for automatic annotations, or 0 or 1 for some taggers that don't assign confidence measures
      6. how far along in the text the current tag is, represented as
        • two numbers separated by a slash, x/y, meaning that this is tag number x out of y tags in this text. (These refer to the kind of annotation currently selected and shown in the Chooser window. Other types of tag aren't counted here.)
        • a horizontal bar partially or completely colored in from left to right
      This display is more reliable than the highlighting in the Chooser window to tell you about the current selection. It may not make much difference for paragraphs, but it does for more complicated types of tagging.
  2. When the paragraph is selected, tag it as a paragraph. You can either
    1. click the "paragraph" button in the Chooser, or
    2. [2003-07-23] right-click the selected text. A "label>" field will appear next to the mouse pointer. Move the pointer onto it, and a selection menu will pop up. This menu has only one item on it, "paragraph", but it's more useful with other kinds of tagging. Left-click on "paragraph" to tag the selection.
    3. (The Chooser's + button would also work here, but the situation is more complicated with other types of tagging, so it's best to avoid it.)
  3. Do the same for the second paragraph.

Check your work by clicking < and > to be sure that the highlighting is correct. If it's off by just a little, you can use shrink and grow; and, as always, you can ignore the space between paragraphs. When the paragraphs are correct, return to the project view and go on to sentences.

Sentence Tagging

  1. Set "Annotation" to "Sentence". The Chooser will now show two tag buttons, "sentence" and "section". Set "Tagger" to "Bio Sentence". (You may see more than one sentence tagger; be sure to choose "Bio Sentence".)
  2. Click the Tag button in the main window.
  3. Switch to the text view. The first tagged sentence or section will be highlighted. Note: These two tags are exclusive: A piece of the text can be tagged as a section or a sentence, but not both. "Section" here is meant for parts of the text like titles and other header material; if the tagger doesn't see a period or other sentence-like punctuation at the end of a piece of text, it calls it a section.
  4. Check the sentence tagging the same way as you checked the paragraph tagging.
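
The sentence/section rule in step 3 can be pictured with a small Python sketch. It is only an illustration of the rule as described here, not the Bio Sentence tagger itself; the set of "sentence-like" punctuation marks and the sample strings are assumptions.

```python
# Assumed set of sentence-final punctuation; the real tagger's list may differ.
SENTENCE_ENDINGS = (".", "?", "!")


def classify_segment(segment):
    """Label a piece of text 'sentence' if it ends in sentence-like
    punctuation, otherwise 'section' (titles and other header material)."""
    stripped = segment.rstrip()
    return "sentence" if stripped.endswith(SENTENCE_ENDINGS) else "section"


if __name__ == "__main__":
    # Made-up sample strings, not taken from the corpus.
    for text in ["Materials and Methods", "The enzyme was inhibited."]:
        print(classify_segment(text), "<-", text)
```

A heading that happens to end in a period would come out as a sentence under this rule, which is the kind of case worth watching for when you check the tagging.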

Token Tagging

  1. Set "Annotation" to "Token". Set "Tagger" to "Bio Token". (You may see more than one token tagger; be sure to choose "Bio Token".)
  2. Click Tag.
  3. You may want to look at the tokenization of one or two files to get a feel for it, but you don't want to check it through. For one thing, it's pretty reliable, and for another it would take too long to check. -- "Tokenization" means the way the text is divided into tokens. "Tokenizer" is another term for "token tagger".

POS Tagging

  1. [2004-02-24] Set "Annotation" to "POS". Set "Tagger" to "Bio POS". (You may see more than one POS tagger; be sure to choose "Bio POS".)
  2. Click Tag.

And that finishes the pre-tagging. Now you can get to the meat of your work.

Biomedical Annotation

We don't have taggers yet for the entity categories, or good POS taggers for biomedical text, which is why we need your work.

  • Set "Annotation" to the appropriate choice for your project -- Oncology or CYP450 (POS annotators have already done so) -- and go to work as you've learned.

Tips

Adding, Loading, and Saving Files

[2003-08-19] WordFreak has shown various kinds of instability if you have more than one file at a time loaded into it, or even added. In fact, some serious bugs seem to show up even if you close and remove a file before adding and loading a new one. So...

  1. Save your file after each type of correction: tag, annotate, save; tag, annotate, save. See Saving your work, below.
  2. After finishing your work on a file and saving it, remove it from the Project view with the "Remove" button on the right of the pane, before adding and loading another file.
  3. Even following these procedures, you may find that to avoid tagging problems you have to close WordFreak and start it up again after working on two files. That can be a real time-eater, especially if WordFreak refuses to start up if not connected to the Internet; but it's still better than having to do all the tagging by hand.

Clicking vs. Dragging

[2003-07-23] You're probably used to applications, like word processors, in which a mouse click in text sets an insertion cursor so you can start typing or editing at that point. But in WordFreak you can't type or edit, so there is no insertion cursor.

Instead, a mouse click selects the nearest tagged entity (of any of the types currently shown in the Chooser window). In order to select a token or a chunk of text in WordFreak you have to drag the mouse at least a little bit; even one pixel will do. This extreme motion sensitivity can make it hard to select a tagged string. Check the status line to be sure you've actually got what you wanted.

[2003-12-02] Eric Pancoast writes: "It is possible to overlap tags in annotation tasks like entity tagging. If an annotator selects by clicking and dragging (which selects tokens rather than an annotation) it will allow them to tag something twice. So the annotators should know that if they want to switch an annotation from one type to another, they should click instead of click-drag, use the arrow-keys, or use the next-previous annotation buttons to select the annotation."
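
The difference between the two kinds of selection can be sketched with a toy model in Python. This is not WordFreak's internal data model; the `Annotation` class, the function names, and the labels `label_a` and `label_b` are hypothetical and serve only to show why click-then-tag changes a tag while drag-then-tag adds a second one.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int   # starting offset of the tagged text
    end: int     # ending offset of the tagged text
    label: str   # tag type


def retag_clicked(annotations, index, new_label):
    """A click selects an existing annotation, so tagging changes its label."""
    annotations[index].label = new_label


def tag_dragged(annotations, start, end, label):
    """A drag selects text, so tagging adds a new annotation on that span."""
    annotations.append(Annotation(start, end, label))


def double_tagged_spans(annotations):
    """Return the (start, end) spans that carry more than one annotation."""
    counts = {}
    for ann in annotations:
        key = (ann.start, ann.end)
        counts[key] = counts.get(key, 0) + 1
    return sorted(span for span, n in counts.items() if n > 1)


if __name__ == "__main__":
    anns = [Annotation(10, 16, "label_a")]
    retag_clicked(anns, 0, "label_b")      # still one annotation, new label
    tag_dragged(anns, 10, 16, "label_a")   # now two annotations on one span
    print(double_tagged_spans(anns))
```

In this model an accidental double tag shows up as a span listed by `double_tagged_spans`; in WordFreak itself, the status line and the Chooser arrows are how you spot it.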

Setting Annotation and Tagging

[2004-01-11] For each level of automatic tagging -- paragraph, sentence, token, and POS -- set the WF settings in the following order:

  1. Choose the tagger.
  2. Choose the annotation type.
  3. Invoke the tagger.

Don't Delete the First Sentence Tag in a Paragraph

[2003-10-02] There is a known bug in WF's handling of the first sentence of a paragraph. If you delete the sentence tag on the first sentence of a paragraph, WF moves the left edge of the paragraph tag, apparently to where the first remaining sentence tag begins. Then, when you tag the first sentence correctly (or try to), either the tagging doesn't take, or it seems to take, but subsequent tagging is messed up or impossible.

This bug is on Eric's list, but it isn't fixed yet. The workaround is a pain but is doable: Never delete the first sentence tag in a paragraph. Instead, "shrink-right" or "grow-right"* that tag in the Chooser, one boring click at a time (since Java doesn't understand holding down the mouse button), till it ends at the right place. Then adjust other sentence tags as needed.

* (Of course, if the sentence tag ends somewhere in the middle of the real first sentence, you must first delete the tag on the SECOND sentence. That does not cause problems.)

Entering Comments

[2003-07-23] For some purposes you may need or want to enter a comment in the comment field at the bottom of the chooser window. We've had some problems keeping comments in the right place once they're entered, but this seems to work:
  1. Be sure the reference you want is tagged and selected.
  2. Click in the comment field, and be sure not to move the mouse pointer back into the main window.
  3. Type your comment.
  4. Use the "<" or ">" button at the top of the chooser window (or the arrow keys when the text window is selected) to move the highlight to the next or previous tag.

Saving Your Work

[2003-08-04] In annotation as in all other computer work, it's wise to "Save early and often".

  1. WordFreak's File menu begins with several "Project" operations, including "Save Project" and "Save Project As...". These will not save your work! The WordFreak concept of project includes the names of files that you are working on and what viewer, tagger, and annotation to start up with; saving the project just means saving that list of information. So skip the project commands. Instead...
  2. To save your work, choose "Save >". That will open a submenu that begins with "Save All". This option is known to be buggy. Instead, choose the name of the file from that submenu and save that. If you've got several files open, save them individually.

Problems with Saving Files

[2003-04-10] If WF says it can't save your file:

  1. exit WF
  2. delete the .ann file
  3. rename the .save-temp file by removing the ".save-temp" from the end of the name. That's your .ann file.
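
If you would rather script steps 2 and 3 than do them by hand, a minimal Python sketch follows. It assumes the temporary file is named exactly like the .ann file with ".save-temp" appended, as described above; the function name and the example file name are hypothetical. Run it only after exiting WF.

```python
from pathlib import Path

TEMP_SUFFIX = ".save-temp"


def recover_annotation_file(temp_file):
    """Replace the .ann file with the contents of its .save-temp file.

    Deletes the existing .ann file (step 2), then renames the .save-temp
    file by stripping the suffix (step 3). Returns the .ann path.
    """
    temp_file = Path(temp_file)
    if not temp_file.name.endswith(TEMP_SUFFIX):
        raise ValueError(f"not a {TEMP_SUFFIX} file: {temp_file}")
    ann_file = temp_file.with_name(temp_file.name[: -len(TEMP_SUFFIX)])
    if ann_file.exists():
        ann_file.unlink()        # step 2: delete the old .ann file
    temp_file.rename(ann_file)   # step 3: drop ".save-temp" from the name
    return ann_file


if __name__ == "__main__":
    import sys

    # Usage: python recover.py path/to/abstract01.txt.ann.save-temp
    if len(sys.argv) == 2:
        print("Recovered", recover_annotation_file(sys.argv[1]))
```

Because the old .ann file is deleted, you may want to copy both files somewhere safe before running this.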

Double Tagging

[2003-07-23] In some special situations you may have to tag the same string of text with two different tags. Be careful to add the second tag instead of replacing the first one with it:

  1. Move the highlight off the string by
    1. using the Chooser "<" or ">" button (or the arrow keys while the cursor is on the text window) to switch it to a previous or subsequent tag, or
    2. clicking the mouse (without dragging it) on another tag, selecting it, or
    3. dragging the mouse across another part of the text
  2. Now drag the mouse across the string you want to double-tag. This will select the token(s) but not the tag(s) already there. (See Clicking vs. Dragging.) Check the status line to be sure.
  3. Apply the tag.
  4. Check by using the Chooser "<" and ">" buttons (or the arrow keys while the cursor is on the text window). The highlight in the text window should stay on the same string, while the status line and the Chooser window should change to show the different tags.