BioC-formatted corpora and BioC-compliant tools
BioC is a simple XML format for sharing text documents and annotations. It can represent a wide variety of annotation types.
We provide simple code to hold this data, read it and write it back to XML files, and perform basic text processing tasks.
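As a rough illustration of what reading such a file involves, here is a minimal sketch that uses only the Python standard library rather than any of the BioC libraries listed on this page. The sample document, its contents, and the helper name `read_annotations` are made up for illustration. The element names (`collection`, `document`, `passage`, `annotation`, `infon`, `location`) are assumed to follow the BioC DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written BioC-style collection (not taken from any corpus).
SAMPLE = """<collection>
  <source>example</source>
  <date>20130101</date>
  <key>example.key</key>
  <document>
    <id>1</id>
    <passage>
      <offset>0</offset>
      <text>Hidden Markov model (HMM) tagging.</text>
      <annotation id="A1">
        <infon key="type">ABBR</infon>
        <location offset="21" length="3"/>
        <text>HMM</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation id, type, covered text) tuples."""
    root = ET.fromstring(xml_text)
    found = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            # Annotation offsets are document-wide, so subtract the passage offset.
            base = int(passage.findtext("offset"))
            text = passage.findtext("text")
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                start = int(loc.get("offset")) - base
                end = start + int(loc.get("length"))
                infons = {i.get("key"): i.text for i in ann.findall("infon")}
                found.append((doc_id, ann.get("id"), infons.get("type"), text[start:end]))
    return found


annotations = read_annotations(SAMPLE)
print(annotations)
```

Slicing the passage text with the `location` offsets, instead of trusting the annotation's own `text` element, is a simple way to check that the offsets and the text agree.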
BioC-formatted corpora
Abbreviation detection in the biomedical domain.
These collections of PubMed abstracts, manually annotated for abbreviated terms in biomedical text,
have been converted to BioC format and re-evaluated by four annotators to improve their consistency and quality.
- Schwartz and Hearst Corpus
1000 PubMed abstracts, as presented in the original paper. The BioC-compliant tool,
the Schwartz and Hearst Algorithm, is included in the BioC-Java package.
- Ab3P corpus
1250 PubMed abstracts, as presented in the original paper.
The BioC-compliant Ab3P Algorithm is included in the BioC-C++ package.
- BIOADI corpus
1200 PubMed abstracts, as presented in the original paper.
- MEDSTRACT corpus
199 PubMed citations, the old version of the corpus presented in the original paper.
- Other ...
BioC-compliant tools
- BioC Implementations
To help developers and improve interoperability between systems, BioC libraries have been implemented in several programming languages:
- BioC-C++
- BioC-Java
- BioC-SWIG for Python and Perl
- PyBioC
- BioC NLP Pipeline
- NCBI Text Mining tools
- Nactem BioC resources
- NICTA Brat2BioC
- iSimp