WordFreak was developed by Tom Morton and was maintained by programmer-analysts Jeremy Lacivita and Eric Pancoast. Seth Kulick currently assists with any WordFreak issues. For more information on using WordFreak, refer to the
WordFreak User's Guide.
Before anything else, get Java (see Java, below).
Contents
- Setup and Startup
- Getting WordFreak
- Shortcuts
- WordFreak How-to
- Tagging and Annotating
- Tips
Setup and Startup
Files
Decide where you want to keep the files you're annotating. If
you're running WordFreak on your own machine, two reasonable options
are in the same directory as WordFreak itself, or in a subdirectory of
that directory. If you're running WordFreak on a machine in the IRCS
suite, you'll need to have an account of your own to save the files
on; the local files on these machines are considered temporary and may
be wiped at night. When you take a file to work on, check it out and put a copy in your
files directory.
Java
WordFreak requires the Java Runtime Environment appropriate to your
machine and operating system.
[2004-05-18] Go to www.java.com and click the "Free Download" link
in the upper right corner.
For Mac OS X (requires Mac OS X 10.2.3 or later):
- Java 1.4.2 is most easily acquired through the "Software Update"
control panel in "System Preferences" on Mac OS X, as long as you have
a relatively fast Internet connection. If you don't have a fast
connection, I would download the latest Java distribution from apple.com
(see the next bullet point) on a machine with a fast connection, then
transfer it to your machine by CD or Zip disk. You may
also need to download the latest OS update: http://www.info.apple.com/support/downloads.html
-- All of these updates are free for users of Mac OS X 10.2 and
higher. If you need to update to 10.2, let me (Mark Mandel) know.
- Also available at http://www.apple.com/downloads/macosx/apple/java.html.
Getting WordFreak
Download WordFreak from the appropriate URL for your group.
The window says "Enter your username and launch WordFreak." All your
annotations will be identified with this username, so use some form of
your real name, like Mandel or mamandel or MarkM,
not something arbitrary or generic like annotator or
test or foobar. Click "Launch", and Wordfreak will
install itself and start up. You may get a warning that your system
doesn't know if this program is safe to install and you are "strongly
advised against it"; ignore that warning and proceed. Wait for WordFreak to
finish bringing up the main window. It may flash once or twice while
doing so.
Shortcuts
The second time you run the same version of WordFreak from the Web
(at least on Windows machines), Java Web Start will ask you if you want
to install shortcuts. Say yes. After that, you can start WordFreak
directly from the shortcut instead of going to the web site. If you
are connected to the Internet when you start WordFreak, it will
automatically check the web site for a new version, and download that
if it finds one.
The shortcuts are set up with the name that you use when you
launch WordFreak from the web site. That means that all the
annotations you make or change in a WordFreak session launched from a
shortcut will be labeled with that name. Publicly accessible
computers, such as the ones in the IRCS office suite, may already have
shortcuts on their Windows desktops. Do not use anyone else's
shortcut.
(So how can you tell if a shortcut is someone else's? At the
bottom of the main WordFreak window is a dark bar labeled "Filters"
with an arrowhead (a triangle) at the right-hand end, pointing
right. Click on the arrowhead and another section will appear below
the dark bar. In the right-hand portion of this new section is a heading
"Include these Annotators". If you don't see your own ID in that list,
this isn't your shortcut.) [2004-07-15]
In case of problems, contact Seth Kulick,
who currently helps support WF, or the BioIE manager.
WordFreak How-to
(The name: You'll see WordFreak, Wordfreak, and
wordfreak. The designer doesn't seem worried by it, so I won't
be either. A lot of the time I just save my fingers and type
WF.)
Start up WordFreak
If you already have WF, start it up. If you're connected to the
Web, WF will access the appropriate URL and see if there's a newer
version, and if there is, it will update itself and start up. Once WF
has started you can safely disconnect from the Web and run offline.
If you don't have WF on your machine, get it from the appropriate
URL for your group (see Getting WordFreak).
Add a File
- Click an Add button (icon is a document with an orange plus
sign), either the one in the WordFreak toolbar or the one on the side
(in the project view)
or
- Project | Add
WF will open a file browser window. Navigate if necessary to the directory
where the file is; or you can open a Windows Explorer window on that
directory, copy the directory pathname from the Address field, and
paste it into the WF browser's address field.
Specify the type of file you are looking for. If you are starting a
fresh file, choose either "Text files" (if the file's extension is
.txt, .src, or .sgm) or "All files". Select the file you want and click OPEN.
The file will appear in the project view. WordFreak will ask if you
want to create an annotation file; answer Yes.
WF never modifies the text file; it saves its annotations in an
annotation file, whose name is the name of the text file but
with .ann added at the end. To work on a file with existing
annotations you can either open the text file or open the .ann file;
WF will look for the corresponding file name and open them both. They
must be in the same directory.
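The pairing rule above is simple enough to express in a few lines. The Python sketch below assumes only what the text states: the annotation file's name is the text file's full name with .ann appended, and both files sit in the same directory. The helper names ann_path_for, text_path_for, and pair_is_complete are hypothetical illustrations, not part of WordFreak.

```python
from pathlib import Path


def ann_path_for(text_path: str) -> Path:
    """Return the annotation file WF would pair with a text file.

    Assumes the rule described above: same directory, same name, '.ann' appended.
    """
    p = Path(text_path)
    return p.with_name(p.name + ".ann")


def text_path_for(ann_path: str) -> Path:
    """Return the text file that an .ann file annotates."""
    p = Path(ann_path)
    if p.suffix != ".ann":
        raise ValueError(f"not an annotation file: {p}")
    # Strip only the trailing '.ann'; the text file's own extension stays.
    return p.with_name(p.name[: -len(".ann")])


def pair_is_complete(text_path: str) -> bool:
    """True if both the text file and its .ann file exist side by side."""
    return Path(text_path).exists() and ann_path_for(text_path).exists()
```

For example, ann_path_for("abstracts/paper1.txt") gives Path("abstracts/paper1.txt.ann"). Note that the text file's own extension (.txt, .src, .sgm) is kept, not replaced.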
Load the File
- Select it in the project view and click the Load button
or
- Project | Load
A green checkmark will appear on the icon of the annotation file in
the project view.
If you have more than one file loaded, switch to this one with
Annotation | Go To.
Tagging and Annotating
WordFreak uses the name "tagging" to refer to work done
automatically by programs that it calls, and "annotation" to refer to
work done by a human annotator. That distinction isn't always
necessary or made in other contexts.
About Pre-tagging
Before annotating the text, you must tag it for paragraphs,
sentences, tokens, and parts of speech (POS), in that order. We intend to automate
this task, but for now you will have to do it semi-manually, telling
WordFreak to use the tagger plug-ins for these types of tagging.
First, select the file in the project view if it isn't already
selected.
Paragraphs
Paragraphs aren't complex, but most of the operation is the same
for all the types of tagging, so I'll go into considerable detail
here.
Use the > and < buttons to
show each tagged paragraph in turn. (You can also do this with the
arrow keys, when the text window is highlighted: left or up for the
previous tag, right or down for the next. [2003-07-23]) There should be no problem with the
paragraph tagging; it's a pretty straightforward task for the tagging
program. The text may include some XML labels in angle brackets, like
"<ABSTRACT>", and the highlighting may not include
those; that's all right. The highlighting may or may not also include
the blank line between paragraphs, and that's all right too.
(When you have more than one file loaded, if you're at the
beginning or end of one of them, the Chooser > and
< buttons will move you to the previous or next file. You
can also move between them directly with
Annotation | Go To.)
Annotating
What if the tagging is wrong? Then you have to fix it. The
process for that is the same as for the manual annotation that
constitutes your main job, so I'll describe the mechanics here.
Suppose two paragraphs are highlighted together as a single
paragraph. The easiest way to fix this is in two steps: remove and
add. (I'm talking about removing and adding tags in the Chooser, not
removing and adding files in the main WordFreak window!)
Remove Tag
Click (not drag) in the Text view.
(In entity annotation, there may sometimes be tags within tags. Move between them with the Chooser < and > or the
keyboard arrow keys.) With the mistagged section highlighted, click
anywhere in the paragraph you want to remove. It will highlight in
purple. Click the
– button in the Chooser. The highlighting will disappear.
Add Tags
This is going to be a long explanation because I am folding a lot
of information about selecting and annotating into it. Here goes:
- Selecting text
- Drag the cursor over the first of the mistakenly combined
paragraphs. (See Clicking vs. dragging
below.) Be sure to get it all, including the period at the end. You
may notice that WordFreak will not let you extend a paragraph
selection into a part of the text that already is tagged. (This is
also true for sentences and tokens, but not named entities; we'll
discuss that in its place.) Again, it's OK if you catch an extra blank
line.
- If you have trouble getting the selection to work at the beginning
of a paragraph, try dragging from left to right instead of right to left.
Or if you're just missing a few characters at the beginning or end you can
use the second row of buttons (shrink and grow) to adjust the ends of
the selection.
- [2003-07-23]
The status line
Look at the bottom of the text window. You'll see
- the name of the .ann file
- the name of the annotator who assigned the currently highlighted
tag; here that will be the username with which you launched WordFreak,
but for automatically assigned tags it will be tagger
- the type of tag -- here, paragraph -- in parentheses
- the starting and ending byte offsets of the tagged text, in the
form number..number
- a measure of confidence in the correctness of the tag, represented as
- a square, which may be partially or completely filled with color
- a number describing how much of the square is colored in; this
will be 1 for human annotations and some smaller number for
automatic annotations, or 0 or 1 for some taggers
that don't assign confidence measures
- how far along in the text the current tag is, represented as
- two numbers separated by a slash, x/y, meaning that
this is tag number x out of y tags in this text. (These
refer to the kind of annotation currently selected and shown in the
Chooser window. Other types of tag aren't counted here.)
- a horizontal bar partially or completely colored in from left to
right
This display is more reliable than the highlighting in the Chooser
window to tell you about the current selection. It may not make much
difference for paragraphs, but it does for more complicated types of
tagging.
- When the paragraph is selected, tag it as a paragraph. You can either
- click the "paragraph" button in the Chooser, or
- [2003-07-23]
right-click the selected text. A "label>" field will appear next
to the mouse pointer. Move the pointer onto it, and a selection menu will pop
up. This menu has only one item on it, "paragraph", but it's more useful with
other kinds of tagging. Left-click on "paragraph" to tag the selection.
- (The Chooser's + button would also work here, but the
situation is more complicated with other types of tagging, so it's
best to avoid it.)
- Do the same for the second paragraph.
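The status line reports a tag's extent as start..end byte offsets, and WordFreak will not let you extend a paragraph, sentence, or token selection into text that is already tagged at that level. The Python sketch below models that rule with plain offset ranges. It is an illustration under assumptions, not WordFreak code: the parser handles only the number..number field, the hypothetical helpers would_overlap and can_tag treat the end offset as exclusive (the guide does not say whether it is inclusive), and named entities are exempt, as noted above.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]

# Matches the offset field of the status line, e.g. "120..348".
_OFFSETS = re.compile(r"^\s*(\d+)\.\.(\d+)\s*$")


def parse_offsets(field: str) -> Span:
    """Parse a 'start..end' status-line field into a (start, end) pair."""
    m = _OFFSETS.match(field)
    if m is None:
        raise ValueError(f"not a start..end field: {field!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if end < start:
        raise ValueError(f"end before start: {field!r}")
    return start, end


def would_overlap(selection: Span, existing: List[Span]) -> bool:
    """True if the selection intrudes on any existing tag of the same level.

    Assumption: end offsets are exclusive, so spans that merely touch
    (one ends where the next starts) do not count as overlapping.
    """
    s_start, s_end = selection
    return any(s_start < e_end and e_start < s_end for e_start, e_end in existing)


def can_tag(level: str, selection: Span, existing: List[Span]) -> bool:
    """Apply the no-overlap rule to the structural levels only."""
    if level in ("paragraph", "sentence", "token"):
        return not would_overlap(selection, existing)
    # Named entities may nest or overlap, so the rule is not applied to them.
    return True
```

For example, with an existing paragraph tag at 0..150, a selection of 100..200 is rejected, while 150..200 is allowed under the exclusive-end assumption because the two spans only touch.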
Check your work by clicking < and > to be sure
that the highlighting is correct. If it's off by just a little, you
can use shrink and grow; and, as always, you can ignore the space
between paragraphs. When the paragraphs are correct, return to the
project view and go on to sentences.
Sentence Tagging
- Set "Annotation" to "Sentence". The Chooser will now show two tag
buttons, "sentence" and "section". Set "Tagger" to "Bio
Sentence". (You may see more than one sentence tagger; be sure to
choose "Bio Sentence".)
- Click the Tag button in the main window.
- Switch to the text view. The first tagged sentence or section will
be highlighted. Note that these two tags are mutually exclusive: a piece of the
text can be tagged as a section or a sentence, but not both. "Section"
here is meant for parts of the text like titles and other header
material; if the tagger doesn't see a period or other sentence-like
punctuation at the end of a piece of text, it calls it a section.
- Check the sentence tagging the same way as you checked the
paragraph tagging.
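The sentence/section rule described above can be sketched as a small heuristic. This only illustrates the rule as stated (no sentence-final punctuation at the end means "section"); the real Bio Sentence tagger's logic is not described in this guide and will not match this sketch in every case. The function name classify_span and the exact punctuation sets are assumptions.

```python
# Punctuation treated as sentence-final. The guide says "a period or other
# sentence-like punctuation"; this particular set is an assumption.
SENTENCE_FINAL = (".", "?", "!")

# Closing quotes/brackets that may follow the final punctuation.
TRAILING_CLOSERS = "\"')]"


def classify_span(text: str) -> str:
    """Label a piece of text 'sentence' or 'section' (never both)."""
    # Ignore trailing whitespace and any closing quotes or brackets.
    stripped = text.rstrip().rstrip(TRAILING_CLOSERS).rstrip()
    if stripped.endswith(SENTENCE_FINAL):
        return "sentence"
    # Titles, headers, and other material without final punctuation.
    return "section"
```

So a title like "Cytochrome P450 Enzymes" comes out as a section, while "The enzyme was inhibited." comes out as a sentence.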
Token Tagging
- Set "Annotation" to "Token". Set "Tagger" to "Bio
Token". (You may see more than one token tagger; be sure to choose
"Bio Token".)
- Click Tag.
- You may want to look at the tokenization of one or two files to
get a feel for it, but you don't need to check it all the way through. For one
thing, it's pretty reliable, and for another it would take too long to
check. -- "Tokenization" means the way the text is divided into
tokens. "Tokenizer" is another term for "token tagger".
POS Tagging
- [2004-02-24]
Set "Annotation" to "POS". Set "Tagger" to "Bio POS". (You may
see more than one POS tagger; be sure to choose "Bio POS".)
- Click Tag.
And that finishes the pre-tagging. Now you can get to the meat of
your work.
Biomedical Annotation
We don't have taggers yet for the entity categories, or good
POS taggers for biomedical text, which is why we need your work.
- Set "Annotation" to the appropriate choice for your project --
Oncology or CYP450 (POS annotators have already done so) -- and go to
work as you've learned.
Tips
Adding, Loading, and Saving Files
[2003-08-19]
WordFreak has shown various kinds of instability if you have more than
one file at a time loaded into it, or even added. In fact, some
serious bugs seem to show up even if you close and remove a file
before adding and loading a new one. So...
- Save your file after each type of correction: tag, annotate, save;
tag, annotate, save. See Saving your work, below.
- After finishing your work on a file and saving it, remove it from
the Project view with the "Remove" button on the right of the pane,
before adding and loading another file.
- Even following these procedures, you may find that to avoid
tagging problems you have to close WordFreak and start it up again
after working on two files. That can be a real time-eater, especially
if WordFreak refuses to start up when you're not connected to the Internet; but
it's still better than having to do all the tagging by hand.
Clicking vs. Dragging
[2003-07-23] You're probably used to
applications, like word processors, in which a mouse click in text
sets an insertion cursor so you can start typing or editing at that
point. But in WordFreak you can't type or edit, so there is no
insertion cursor.
Instead, a mouse click selects
the nearest tagged entity (of any of the types currently shown in the
Chooser window). In order to select a token or a chunk of text in
WordFreak you have to drag the mouse at least a little bit;
even one pixel will do. This extreme motion sensitivity can make it
hard to select a tagged string. Check the status
line to be sure you've actually got what you wanted.
[2003-12-02]
Eric Pancoast writes: "It is
possible to overlap tags in annotation tasks like entity tagging. If
an annotator selects by clicking and dragging (which selects tokens
rather than an annotation) it will allow them to tag something twice.
So the annotators should know that if they want to switch an annotation
from one type to another, they should click instead of click-drag, use
the arrow-keys, or use the next-previous annotation buttons to select
the annotation."
Setting Annotation and Tagging
[2004-01-11]
For each level of automatic tagging -- paragraph, sentence, token, and
POS -- set the WF settings in the following order:
- Choose the tagger.
- Choose the annotation type.
- Invoke the tagger.
Don't Delete the First Sentence Tag in a Paragraph
[2003-10-02]
There is a known bug in WF's handling of the first sentence of a
paragraph. If you delete the sentence tag on the first sentence of a
paragraph, WF moves the left edge of the paragraph tag, apparently to
where the first remaining sentence tag begins. Then, when you tag the
first sentence correctly (or try to), either the tagging doesn't take,
or it seems to take, but subsequent tagging is messed up or
impossible.
This bug is on Eric's list, but it isn't fixed yet. The workaround
is a pain but is doable: Never delete the first sentence tag in a
paragraph. Instead, "shrink-right" or "grow-right"* that tag in the
Chooser, one boring click at a time (since Java doesn't understand
holding down the mouse button), till it ends at the right place. Then
adjust other sentence tags as needed.
* (Of course, if the sentence tag ends somewhere in the middle of
the real first sentence, you must first delete the tag on the SECOND
sentence. That does not cause problems.)
Entering Comments
[2003-07-23]
For some purposes you may need or want to enter a comment in the
comment field at the bottom of the chooser window. We've had some
problems keeping comments in the right place once they're entered, but
this seems to work:
- Be sure the reference you want is tagged and selected.
- Click in the comment field, and be sure not to move the
mouse pointer back into the main window.
- Type your comment.
- Use the "<" or ">" button at the top of the chooser
window (or the arrow keys when the text window is selected)
to move the highlight to the next or previous tag.
Saving Your Work
[2003-08-04]
In annotation as in all other computer work, it's wise to "Save early
and often".
- WordFreak's File menu begins with several "Project" operations,
including "Save Project" and "Save Project As...". These will not save
your work! The WordFreak concept of project includes the names
of files that you are working on and what viewer, tagger, and
annotation to start up with; saving the project just means saving that
list of information. So skip the project commands. Instead...
- To save your work, choose "Save >". That will
open a submenu that begins with "Save All". This option is known to be
buggy. Instead, choose the name of the file from that submenu and save
that. If you've got several files open, save them individually.
Problems with Saving Files
[2003-04-10]
If WF says it can't save your file:
- exit WF
- delete the .ann file
- rename the .save-temp file by removing the ".save-temp" from the end
of the name. That's your .ann file.
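If you have to do this often, the last two steps can be scripted. The Python sketch below assumes, as the steps above imply, that the temporary file is named like the .ann file with .save-temp appended (for example paper1.txt.ann.save-temp); check the actual name in your directory before relying on it, and run it only after exiting WF. The function name recover_ann is hypothetical.

```python
from pathlib import Path


def recover_ann(ann_path: str) -> Path:
    """Replace a .ann file with its .save-temp copy.

    Assumes the temp file is '<name>.ann.save-temp' in the same directory.
    Exit WordFreak before calling this.
    """
    ann = Path(ann_path)
    temp = ann.with_name(ann.name + ".save-temp")
    if not temp.exists():
        # Without the temp file there is nothing to recover; keep the .ann.
        raise FileNotFoundError(f"no temp file next to {ann}: {temp}")
    # Delete the .ann file (missing_ok in case it was never written).
    ann.unlink(missing_ok=True)
    # Drop '.save-temp' from the name; the result is your .ann file.
    temp.rename(ann)
    return ann
```

The sketch checks for the .save-temp file before deleting anything, so a wrong path leaves your existing .ann file untouched. It needs Python 3.8 or later for missing_ok.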
Double Tagging
[2003-07-23]
In some special situations you may have to tag the same string of
text with two different tags. Be careful to add the second tag
instead of replacing the first one with it:
- Move the highlight off the string by
- using the Chooser "<" or ">" button (or the arrow
keys while the cursor is on the text window) to switch it to a
previous or subsequent tag, or
- clicking the mouse (without dragging it) on another tag, selecting
it, or
- dragging the mouse across another part of the text
- Now drag the mouse across the string you want to double-tag. This
will select the token(s) but not the tag(s) already there. (See Clicking vs. Dragging.) Check the status line to be sure.
- Apply the tag.
- Check by using the Chooser "<" and ">" buttons (or the arrow keys
while the cursor is on the text window). The highlight in the text
window should stay on the same string, while the status line and the
Chooser window should change to show the different tags.