This is the Miner: Automatic Text Processing guidelines page. The Miner tool uses automatic taggers to tag and annotate selected pieces of text that the user provides in the tool.
- User's Guide
- Defining Listed Tagger Types
- Output Files
User's Guide
To begin, users may either copy and paste a piece of text into the text area or submit a zipped file by clicking the "Browse..." button and identifying where the file is located. From the 'Annotator List' located above the text area, users may select one or more taggers at a time, then press the " >> add >> " button to add the selected taggers to the 'Annotator Pipeline' list. The taggers in the 'Annotator Pipeline' list may be rearranged by using the "Up" and "Down" buttons. The order in which the taggers are arranged, from top to bottom, is the order in which each tagger goes through the text to tag and annotate it. A tagger may be removed from the 'Annotator Pipeline' list by selecting it and pressing the " << rem << " button. Once the required taggers are in the 'Annotator Pipeline' list and the text to be tagged has been supplied, the "Submit Job" button may be pressed to run the specified tagger(s), creating output files of the automatically tagged text.
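The pipeline behaviour described above (add, remove, reorder, then run from top to bottom) can be illustrated with a minimal sketch. This is not Miner's actual implementation: the class `AnnotatorPipeline`, its method names, and the `(label, start, end)` annotation tuples are all hypothetical, chosen only to show how list order determines run order.

```python
from typing import Callable, List, Tuple

# Hypothetical annotation record: (label, start offset, end offset).
Annotation = Tuple[str, int, int]
# A tagger is modelled as any function from text to a list of annotations.
Tagger = Callable[[str], List[Annotation]]


class AnnotatorPipeline:
    """Ordered list of named taggers, run from top to bottom."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Tagger]] = []

    def add(self, name: str, tagger: Tagger) -> None:
        # Corresponds to the " >> add >> " button: append at the bottom.
        self.steps.append((name, tagger))

    def remove(self, name: str) -> None:
        # Corresponds to the " << rem << " button.
        self.steps = [step for step in self.steps if step[0] != name]

    def _index(self, name: str) -> int:
        for i, (step_name, _) in enumerate(self.steps):
            if step_name == name:
                return i
        raise ValueError("tagger not in pipeline: " + name)

    def move_up(self, name: str) -> None:
        # Corresponds to the "Up" button; no effect on the top entry.
        i = self._index(name)
        if i > 0:
            self.steps[i - 1], self.steps[i] = self.steps[i], self.steps[i - 1]

    def move_down(self, name: str) -> None:
        # Corresponds to the "Down" button; no effect on the bottom entry.
        i = self._index(name)
        if i < len(self.steps) - 1:
            self.steps[i + 1], self.steps[i] = self.steps[i], self.steps[i + 1]

    def run(self, text: str) -> List[Annotation]:
        # Corresponds to "Submit Job": each tagger sees the text in list order.
        annotations: List[Annotation] = []
        for _, tagger in self.steps:
            annotations.extend(tagger(text))
        return annotations
```

In this sketch each tagger receives only the raw text. Whether a real Miner tagger also consumes the annotations produced by the taggers above it is not stated on this page.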
Defining Listed Tagger Types
The tagger types listed in Miner are 'Bio_POS', 'Bio_Paragraph', 'Bio_Sentence', 'Bio_Token', 'Deliminated_Sentence', 'Deliminated_Tokenizer', 'Paragraph', 'RegEx', and 'Simple_Token'. POS (part-of-speech) taggers automatically tag parts of speech in a text. Paragraph taggers tag paragraphs, sentence taggers tag sentences or sections, and token taggers tag each character and space in the text. All taggers with the "Bio" prefix take into account statistical biological models; the remaining ("non-Bio") taggers are not based upon these models. 'Deliminated_Sentence' taggers tag return characters, and 'Deliminated_Tokenizer' taggers tag spaces within the text.
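As a rough illustration of delimiter-based tagging, the sketch below splits text on return characters (sentence level) and on spaces (token level) and reports the character offsets of each piece. It is one possible reading of the description above, not Miner's code: the function names and the `(start, end, text)` span format are assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical span record: (start offset, end offset, covered text).
Span = Tuple[int, int, str]


def split_spans(text: str, delimiter_pattern: str) -> List[Span]:
    """Return the non-empty stretches of text between delimiter matches."""
    spans: List[Span] = []
    start = 0
    for match in re.finditer(delimiter_pattern, text):
        if match.start() > start:
            spans.append((start, match.start(), text[start:match.start()]))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text), text[start:]))
    return spans


def delimited_sentences(text: str) -> List[Span]:
    # Assumption: a "sentence" is whatever lies between return characters.
    return split_spans(text, r"[\r\n]+")


def delimited_tokens(text: str) -> List[Span]:
    # Assumption: a "token" is whatever lies between runs of spaces.
    return split_spans(text, r" +")
```

For example, `delimited_tokens("a b  c")` yields three spans covering `a`, `b`, and `c`, with the double space treated as a single delimiter.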
Output Files
Once the Miner tool has run, it displays an output directory listing that shows each filename and its size, along with the date and time at which the file was last modified. Clicking a filename opens that directory, which lists the created files, including the source text files. To see the annotated text in HTML, the user may click on "sourcetext.txt.html" and hover the mouse over the text to see the highlighted tagged sections and terms. There is also a "legend.html" file that explains what the highlighted sections and terms represent.
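The hover-to-inspect output can be pictured with a small sketch that wraps each tagged span in an HTML element carrying its label. The markup Miner actually writes to "sourcetext.txt.html" is not documented here, so the `span` element, the `class` and `title` attributes, and the function name `render_html` are all assumptions made for illustration.

```python
from html import escape
from typing import List, Tuple

# Hypothetical annotation record: (start offset, end offset, label).
Annotation = Tuple[int, int, str]


def render_html(text: str, annotations: List[Annotation]) -> str:
    """Wrap each annotated span in a <span> whose title shows on hover.

    Assumes the annotations do not overlap one another.
    """
    pieces: List[str] = []
    position = 0
    for start, end, label in sorted(annotations):
        # Untagged text before the span is escaped and copied through.
        pieces.append(escape(text[position:start]))
        safe_label = escape(label)
        pieces.append(
            '<span class="{0}" title="{0}">{1}</span>'.format(
                safe_label, escape(text[start:end])
            )
        )
        position = end
    pieces.append(escape(text[position:]))
    return "".join(pieces)
```

A stylesheet could then colour each `class`, and a separate legend page could map those colours to tagger names, in the spirit of "legend.html".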
Miner (Automatic Text Processing):
http://bioie.ldc.upenn.edu/_miner/