I provide help and consultation in biomedical informatics, high-throughput
genomics and proteomics bioinformatics, research design, and statistical
analysis. Here you can read in more detail about the services I provide.
The ReleMed search engine
Encountering extraneous articles in response to a query submitted to
MEDLINE/PubMed is not uncommon. Yet every one of the retrieved
articles contains all of the query words. This led us to conclude
that the presence of the query words in an article is a necessary,
but not a sufficient, condition for the article to be relevant to the
user's query. About 83% of queries sent to PubMed, NLM's search engine
for MEDLINE, are multi-word queries. When submitting a query with
multiple words, the user is usually interested in some type of
relationship between the words, so the presence of such a relationship
between the query words in the article also becomes a necessary
condition for relevance. We hypothesized that, when two words occur
in the same article, the probability that the article describes a
relationship between them is higher when the words occur within the
same sentence (or in adjacent sentences) than when they occur in
distant sentences.
We have developed ReleMed, www.relemed.com, a search engine for
MEDLINE. ReleMed increases the specificity and precision of retrieval
by searching for the query words within sentences rather than across
the whole article. It uses sentence-level co-occurrence as a
statistical surrogate for the existence of a relationship between the
words. It also estimates a relevance score and sorts the results by
that score, which moves irrelevant articles lower down the list. We
used a distributed parallel search architecture to keep the response
time short despite the heavy natural language processing required.
You can learn more here.
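The idea of sentence-level co-occurrence can be illustrated with a toy scoring function. This is a minimal sketch, not ReleMed's actual relevance score, whose formula is not given here. It assumes a naive regular-expression sentence splitter, lowercase word matching without stemming or synonyms, and a hypothetical weighting in which each sentence containing every query word counts 1 and each pair of adjacent sentences that contains them only jointly counts 0.5. The function names and the abstracts are illustrative.

```python
import re

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(text):
    # Lowercase alphanumeric tokens; no stemming, synonyms or MeSH mapping.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_relevance(text, query):
    """Hypothetical sentence-level score: 1 per sentence containing every
    query word, 0.5 per adjacent pair that contains them only jointly."""
    q = words(query)
    sents = [words(s) for s in split_sentences(text)]
    same = sum(1 for s in sents if q <= s)
    adjacent = sum(
        1
        for a, b in zip(sents, sents[1:])
        if q <= (a | b) and not q <= a and not q <= b
    )
    return same + 0.5 * adjacent

# Hypothetical abstracts; each one contains both query words somewhere.
abstracts = {
    "A": "Aspirin reduces the risk of stroke. It is widely used.",
    "B": "Aspirin was given to all patients. Years later, one patient had a stroke.",
    "C": "Aspirin is cheap. The study enrolled adults. Stroke incidence was recorded.",
}
query = "aspirin stroke"
ranked = sorted(abstracts, key=lambda k: toy_relevance(abstracts[k], query), reverse=True)
for doc_id in ranked:
    print(doc_id, toy_relevance(abstracts[doc_id], query))
```

All three hypothetical abstracts would match an article-level search, so that search cannot separate them; the sentence-level score ranks first the abstract that states the relationship within a single sentence and last the one where the words are far apart.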
The Dual-mining methodology
Data mining can automate the analysis of the substantial amounts of
data produced in many organizations. However, it also produces large
numbers of rules and patterns, many of which are not useful. Existing
methods for pruning uninteresting patterns have only begun to automate
the knowledge acquisition step required by subjective measures of
interestingness, which leaves a serious bottleneck. In this project we
proposed a method for automatic knowledge acquisition that shortens the
pattern list by locating the novel and interesting patterns.
The dual-mining method is based on automatically comparing the strength
of patterns mined from a database with the strength of equivalent
patterns mined from a relevant knowledgebase. When these two estimates
of pattern strength do not match, the pattern is assigned a high
"surprise score", which flags it as potentially interesting. The
surprise score captures the magnitude of the novelty or interestingness
of the mined pattern. In addition, we show how to compute a p value for
each surprise score, which attaches a measure of statistical
significance and helps filter out noise.
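The comparison of pattern strengths can be sketched for the simplest case, an association between two items. The description above does not give the published formulas, so everything here is an assumption: pattern strength is taken to be a log odds ratio from a 2x2 table of counts (patient records in the database, citations in the knowledgebase), the surprise score is the absolute difference between the two log odds ratios, and the p value comes from a two-sided z-test that treats the two estimates as independent, using Woolf's variance with a 0.5 correction per cell. The function names and counts are hypothetical.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and Woolf variance for the 2x2 table [[a, b], [c, d]].

    0.5 is added to every cell (Haldane-Anscombe correction) so that
    zero counts do not cause division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def surprise_score(db_table, kb_table):
    """Hypothetical surprise score: |difference of log ORs| and a two-sided
    p value, assuming the two estimates are independent."""
    lor_db, var_db = log_odds_ratio(*db_table)
    lor_kb, var_kb = log_odds_ratio(*kb_table)
    diff = lor_db - lor_kb
    z = diff / math.sqrt(var_db + var_kb)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return abs(diff), p

# Hypothetical counts (a, b, c, d): rows = item X present/absent,
# columns = item Y present/absent.
patients = (120, 380, 40, 460)   # from a patient database
citations = (15, 985, 14, 986)   # from a literature knowledgebase
score, p = surprise_score(patients, citations)
print(f"surprise = {score:.2f}, p = {p:.4f}")
```

In this made-up example the association is strong among patients but nearly absent in the literature, so the pattern receives a large surprise score with a small p value; two tables that agree give a score of 0 and a p value of 1.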
We have implemented the dual-mining method using scripts written in
Perl and R. We applied the method to a large patient database
(University of Virginia's Clinical Data Repository) and a biomedical
literature citation knowledgebase (MEDLINE). Learn more here.
Results of biomedical research are ultimately expressed and archived in
natural human languages such as English. For example, PubMed is an index
of about 16 million published papers. I am interested in the design and
implementation of analysis methods that can extract and process the data
and knowledge contained in such repositories.
In the Science Discovery System (SDS) research project, we are pooling
two large bodies of information: biomedical knowledge (for example,
PubMed, which indexes over 15 million published papers) and patient
data (for example, the UVa Clinical Data Repository, which holds
digitized data on over 750,000 patients). The goal is to discover novel
regularities and to generate new hypotheses worthy of focused research.
Ultimately, we aim to provide a tool that could lead to new basic and
applied discoveries that advance research and clinical care and improve
human health.
The Proportional Odds Ratio model
Meta-analysis is one of the fields in which I am interested in
collaborating, and I have published a few papers on it.
A meta-analysis combines the results of several studies that address a
set of related research hypotheses. Meta-analysis is widely used in
evidence-based medicine today.
We have developed the Proportional Odds Ratio (POR) model. It relaxes
the assumption that odds ratios (ORs) are homogeneous. Furthermore, it
generalizes meta-analysis to more complex scenarios, including
dependent studies, multiple outcomes, multiple thresholds,
multi-category or continuous tests, and individual-level data.
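The homogeneity assumption that the POR model relaxes is visible in the conventional fixed-effect approach, which pools study-level log odds ratios by inverse-variance weighting as if every study estimated one common OR; Cochran's Q measures how strongly the studies disagree with that assumption. The sketch below shows only that standard textbook method, not the POR model, and its inputs are hypothetical (log OR, variance) pairs.

```python
import math

def fixed_effect_pool(studies):
    """Inverse-variance fixed-effect pooling of log odds ratios.

    studies: list of (log_or, variance) pairs, one per study.
    Returns the pooled log OR and its standard error.
    """
    weights = [1.0 / var for _, var in studies]
    total = sum(weights)
    pooled = sum(w * lor for w, (lor, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)

def cochran_q(studies, pooled):
    """Cochran's Q: weighted squared deviations from the pooled log OR.
    Under homogeneity it is approximately chi-square with k - 1 df."""
    return sum((lor - pooled) ** 2 / var for lor, var in studies)

# Hypothetical study results: (log OR, variance of log OR).
studies = [(0.2, 0.1), (0.6, 0.1)]
pooled, se = fixed_effect_pool(studies)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled OR = {math.exp(pooled):.2f}, 95% CI {low:.2f} to {high:.2f}")
print(f"Cochran's Q = {cochran_q(studies, pooled):.2f}")
```

With equal variances the pooled log OR is simply the average of the two study estimates. A large Q relative to its degrees of freedom would suggest that a single common OR is a poor summary.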
You can see a current list of my papers
in PubMed.
To contact me:
(434) 982 4436
(434) 924 8437 (fax)
MirSiadaty@virginia.edu
Postal Address:
UVA School of Medicine, DHES,
Hospital West Complex, Room 3181 (Box 800717)
Charlottesville, VA 22908