


   Mir S. Siadaty, M.D. M.S.
    Clinical Informatics
    UVA School of Medicine





This is the web page of Mir Said Siadaty at the University of Virginia. I am on the faculty of the School of Medicine, Division of Clinical Informatics.

I provide help and consultation in biomedical informatics, high-throughput genomics and proteomics bioinformatics, research design, and statistical analysis. Here you can read about the services I provide in more detail.

The ReleMed search engine
Queries submitted to MEDLINE/PubMed often return extraneous articles, even though every retrieved article contains all of the query words. This led us to conclude that the presence of the query words in an article is a necessary but not a sufficient condition for the article to be relevant to the user's query. About 83% of queries sent to PubMed, NLM's search engine for MEDLINE, are multi-word queries. When submitting a query with multiple words, the user is usually interested in some type of relationship between the words, so the presence of a relationship between the query words in the article also becomes a necessary condition for relevance. We proposed that if two words occur within an article, the probability that the article explains a relation between them is higher when the words occur within the same sentence (or adjacent sentences) than when they occur in remote sentences.
We have developed ReleMed (www.relemed.com), a search engine for MEDLINE. ReleMed increases the specificity and precision of retrieval by searching for query words within sentences rather than across the whole article. It uses sentence-level co-occurrence as a statistical surrogate for the existence of a relationship between the words. It also estimates a relevance score and sorts the results on this basis, thus shifting irrelevant articles lower down the list. We used a distributed parallel search architecture to keep the response time short despite the heavy natural language processing required. You can learn more here.
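The sentence-level idea can be illustrated with a minimal Python sketch. This is not ReleMed's actual algorithm or scoring: the function names, the naive sentence splitter, and the three-level score (same sentence, adjacent sentences, anywhere in the text) are assumptions made purely for illustration.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercased set of alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def relevance_level(text, query_words):
    """Hypothetical score: 3 if all query words share one sentence,
    2 if two adjacent sentences together cover them,
    1 if they only appear somewhere in the text, 0 if any word is missing."""
    words = {w.lower() for w in query_words}
    sents = [tokenize(s) for s in split_sentences(text)]
    if not sents or not words <= set().union(*sents):
        return 0
    if any(words <= s for s in sents):
        return 3
    if any(words <= (a | b) for a, b in zip(sents, sents[1:])):
        return 2
    return 1

def rank(articles, query_words):
    """Return article ids sorted by descending level, dropping level-0 articles."""
    scored = [(relevance_level(text, query_words), aid)
              for aid, text in articles.items()]
    return [aid for level, aid in sorted(scored, key=lambda t: (-t[0], t[1]))
            if level > 0]
```

An article that mentions both query words only in distant sentences still matches, but it is sorted below articles where the words share a sentence, which mirrors the "shifting irrelevant articles lower down the list" behavior described above.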

The Dual-mining methodology
Data mining can automate the analysis of the substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), leaving a serious bottleneck. In this project we proposed a method that acquires this knowledge automatically and uses it to shorten the pattern list to the novel and interesting patterns.
The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying it as potentially interesting. The surprise score captures the magnitude of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance.
We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database (University of Virginia's Clinical Data Repository) and a biomedical literature citation knowledgebase (MEDLINE). Learn more here.
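As a rough illustration of comparing pattern strength across two sources, here is a minimal Python sketch. It is not the published dual-mining definition: measuring pattern strength as a log odds ratio from a 2x2 co-occurrence table, taking the surprise score as the absolute z statistic of the difference, and using a normal approximation for the p value are all assumptions chosen for the example, as are the function names.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its approximate variance for a 2x2 table.
    a: both items present, b: first only, c: second only, d: neither.
    Adds 0.5 to every cell (Haldane-Anscombe correction) to avoid zeros."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def surprise(db_table, kb_table):
    """Hypothetical surprise score: |z| for the difference between the pattern
    strength in the database and in the knowledgebase, with a two-sided p value."""
    lor_db, var_db = log_odds_ratio(*db_table)
    lor_kb, var_kb = log_odds_ratio(*kb_table)
    z = (lor_db - lor_kb) / math.sqrt(var_db + var_kb)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return abs(z), p
```

For example, `surprise((200, 800, 100, 8900), (10, 990, 100, 8900))` compares a pair of items that co-occur strongly in the patient data but not in the literature counts; under these assumptions such a pattern receives a large score and a small p value.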

The results of biomedical research are eventually expressed and archived in natural human languages, such as English. For example, PubMed is an index of about 16 million published papers. I am interested in the design and implementation of analysis methods that can extract and process the data and knowledge inherent in such repositories.
In the Science Discovery System (SDS) research project, we are pooling two huge bodies of information: biomedical knowledge (an instance of which is PubMed, with 15 million published papers indexed) and patient data (such as the UVa Clinical Data Repository, with digitized data on over 750,000 patients). The goal is to discover novel regularities and generate new hypotheses worthy of focused research. The ultimate goal would be to provide a tool that could lead to new basic and applied discoveries that advance research, improve clinical care, and improve human health.

The Proportional Odds Ratio model
One of the fields in which I am interested in collaborating is meta-analysis.
I have published a few papers on meta-analysis. A meta-analysis combines the results of several studies that address a set of related research hypotheses. Meta-analysis is widely used in evidence-based medicine today.
We have developed the Proportional Odds Ratio (POR) model. It relaxes the homogeneity-of-ORs assumption. Furthermore, it generalizes the method of meta-analysis to more complex scenarios, including dependent studies, multiple outcomes, multiple thresholds, multi-category or continuous tests, and individual-level data.
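For context, the conventional approach can be sketched in a few lines of Python. This is the standard fixed-effect, inverse-variance pooling of log odds ratios, which assumes a single common odds ratio across studies (the homogeneity assumption mentioned above); it is not the POR model. The function name and input layout are assumptions for illustration, and every cell count must be nonzero.

```python
import math

def pooled_odds_ratio(studies):
    """Fixed-effect (inverse-variance) pooled odds ratio with a 95% interval.
    studies: list of 2x2 tables (a, b, c, d) with all cells > 0, where the
    odds ratio of each study is (a * d) / (b * c)."""
    num = den = 0.0
    for a, b, c, d in studies:
        log_or = math.log((a * d) / (b * c))
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)  # inverse of var(log OR)
        num += weight * log_or
        den += weight
    pooled = num / den            # weighted mean of the log odds ratios
    se = math.sqrt(1.0 / den)     # standard error of the pooled log OR
    ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
    return math.exp(pooled), ci
```

Because each study is weighted only by its precision, this estimator is most defensible when the study-level odds ratios are similar; heterogeneous studies are the case that motivates relaxing the assumption.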


You can see a current list of my papers in PubMed.


To contact me:
(434) 982 4436
(434) 924 8437 (fax)
MirSiadaty@virginia.edu
Postal Address:
UVA School of Medicine, DHES,
Hospital West Complex, Room 3181 (Box 800717)
Charlottesville, VA 22908






Maintained by MirSiadaty@virginia.edu
Last Modified: 11 April 07