Database of Tags

From BioIE Wiki

The Database of Tags lets you search all the annotations in the v0.9 release (December 2004). The fields described below restrict the search. Shift-click to select multiple contiguous list entries; control-click to select multiple individual entries or to deselect a selected entry.

Each annotation that satisfies the search requirements is displayed on a separate line. The display is headed with the range of results on the page and the number of matches. There can be more matches than results if a matching annotation contains more than one occurrence of the search text. E.g., the first page of sentences containing "cancer" in oncology release 12-04, with 50 results per page, reports "1-50 of 765 Results" and "55 Search String Matches".
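
The difference between results and matches can be sketched in a few lines of Python. This is only an illustration, not the database's actual code: the function name and sample strings are hypothetical, and it assumes that matches are counted as non-overlapping, case-insensitive occurrences of the Search Text.

```python
def count_results_and_matches(annotations, search_text):
    """Return (results, matches) for a case-insensitive substring search.

    results: number of annotations containing the text at least once
    matches: total number of non-overlapping occurrences across them
    (Assumed counting rule; the real database may count differently.)
    """
    needle = search_text.lower()
    if not needle:
        # A blank Search Text returns every annotation; nothing to match.
        return len(annotations), 0
    results = 0
    matches = 0
    for text in annotations:
        occurrences = text.lower().count(needle)
        if occurrences:
            results += 1
            matches += occurrences
    return results, matches


# Made-up annotation strings, for illustration only.
sample = [
    "breast cancer",
    "cancer of the colon and rectal cancer",
    "Cancer",
    "neuroblastoma",
]
results, matches = count_results_and_matches(sample, "cancer")
print(f"{results} Results, {matches} Search String Matches")
```

Here three annotations contain "cancer", but the second contains it twice, so the sketch reports 3 results and 4 matches.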

  • File Set: Select the domain(s) you are interested in.
  • Annotation Type: (= tag) The drop-down menu lists all the tag labels in ASCII sort order, beginning with the POS tags, which all consist entirely of capital letters and punctuation marks. The entity and pretagging tags, which comprise only lowercase letters, digits, and hyphens, follow.
  • Search Text: (optional) Return all strings including the specified text. If this field is left blank, the query will return all annotations with the specified label(s) in the specified file set(s). Wildcards are not permitted and the search is not case-sensitive.
  • Results per Page: all, 10, 25, 50, 100, 200, 500, 1000 (default 100)
  • Concordance View: If Search Text is specified and Concordance View is checked, the returned strings will be displayed in a monospace font, with the Search Text underlined and centered, and with up to Context width characters (default 35) of the annotation (see Note) on each side. For example:
    • A search of oncology release 12-04 for Annotation Type malignancy-developmental-state with Search Text "age" and Concordance View checked produces 14 hits, aligned with enough space to accommodate the longest left and right contexts, which in this case are both less than 35 characters: "infants less than 1 year of age" and "age of 6 months".
    • Changing Context width to 10 narrows the display, truncating these strings to "1 year of age" and "age of 6 mont".
    • Note: The context is limited to the annotation with the specified tag, not the surrounding text. If you ask for a concordance view of all instances of the AFX (affix) tag with the text "pre" in both domains, you will get 42 mentions of "pre" with no context, because there are no AFX mentions which include "pre" as a proper substring. A concordance view of AFX "pr" produces all of those along with 1 "supra" and many instances of "proto", all aligned on "pr".
  • The Export to Plain Text option preserves the newlines of the source text, but this can split a single hit over two or more lines, and in the exported text the concordance view does not center the target string properly. Also, the Search Text is displayed with a tab character on each side, so that a search for the string "cancer" will show "GB cancers" as GB    cancer  s.
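
The Concordance View layout described above can be approximated with a short Python sketch. It is not the site's implementation: `split_hit` and `format_concordance` are hypothetical names, only the first occurrence in each annotation is used, and underlining is omitted because the output is plain text. As the Note says, context is taken from within the annotation only.

```python
import re


def split_hit(annotation, search_text, width=35):
    """Split an annotation into (left context, matched text, right context).

    The search is case-insensitive and literal (no wildcards). Each context
    is cut to at most `width` characters of the annotation itself.
    """
    match = re.search(re.escape(search_text), annotation, re.IGNORECASE)
    if match is None:
        return None
    left = annotation[max(0, match.start() - width):match.start()]
    right = annotation[match.end():match.end() + width]
    return left, match.group(0), right


def format_concordance(annotations, search_text, width=35):
    """Return one line per hit, with the matched text aligned in a column."""
    hits = [h for h in (split_hit(a, search_text, width) for a in annotations) if h]
    # Pad every left context to the longest one so the targets line up.
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [left.rjust(pad) + target + right for left, target, right in hits]


# Strings modeled on the "age" example above.
hits = ["infants less than 1 year of age", "age of 6 months"]
for line in format_concordance(hits, "age", width=10):
    print(line)
```

With a width of 10 the two lines come out as "1 year of age" and, after ten spaces of padding, "age of 6 mont", which matches the truncation described for a Context width of 10. An annotation that consists only of the Search Text, such as an AFX "pre", yields empty contexts on both sides.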

