Laboratory 6: Mining information from PubMed

In this laboratory you will mine PubMed for publications that explicitly mention your gene or the pathways your gene is involved in. The R package you will use for this is RISmed.

Part 1: Write an R script to do the following:
  1. Find the PubMed IDs for documents that mention your gene or one of its aliases anywhere in the PubMed title or the MeSH terms. Save the list as a text file and output the number of publications that met that criterion.
  2. Find the PubMed IDs for documents that mention your gene anywhere in the PubMed entry. Save the list as a text file and output the number of publications that met that criterion.
  3. Find the PubMed IDs for documents that mention your pathway anywhere in the PubMed entry. Save the list as a text file and output the number of publications.
  4. Find the PubMed IDs for documents that mention the top three cancers associated with your gene in the title of a PubMed article. (Use the terms used in the Lawrence Nature paper.) Save the list as a text file and output the number of publications.
  5. Find the PubMed IDs for documents that have any keywords matching MeSH terms for the top three cancers associated with your gene. Use entry terms from MeSH (http://www.ncbi.nlm.nih.gov/mesh) when you do the search. Save the list as a text file and output the number of publications.
  6. Find the intersection of the PubMed IDs from items 1-3. How many are there?
  7. Find the intersection of the PubMed IDs from items 4-5. How many are there?
  8. Find the intersection of the PubMed IDs from items 6 and 7. How many are there?
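
The query-building and intersection steps in the items above can be sketched in base R as follows. This is only an illustration, not the function developed in class: build_query and intersect_all are hypothetical helper names, "TP53" and "p53" are placeholder terms to be replaced with your own gene and aliases, and the ID vectors are made up so the example runs without contacting PubMed. The field tags [ti] (title) and [mh] (MeSH terms) are standard PubMed search tags.

```r
# Hypothetical helper: combine every term with every field tag into one
# PubMed query string, e.g. "TP53"[ti] OR "p53"[ti] OR ...
build_query <- function(terms, fields) {
  pairs <- expand.grid(term = terms, field = fields, stringsAsFactors = FALSE)
  clauses <- sprintf('"%s"[%s]', pairs$term, pairs$field)
  paste(clauses, collapse = " OR ")
}

# Placeholder gene symbol and alias; substitute your own.
gene_terms <- c("TP53", "p53")
q1 <- build_query(gene_terms, c("ti", "mh"))  # item 1: title or MeSH terms
cat("Query for item 1:", q1, "\n")

# Hypothetical helper: intersect any number of ID vectors (items 6-8).
intersect_all <- function(...) Reduce(intersect, list(...))

# Made-up IDs standing in for the results of items 1-3.
ids1 <- c("101", "102", "103", "104")
ids2 <- c("102", "103", "104", "105")
ids3 <- c("103", "104", "106")

common <- intersect_all(ids1, ids2, ids3)
cat("IDs common to items 1-3:", length(common), "\n")

# Save one ID per line. A temporary directory is used here only so the
# sketch leaves no files behind; use your own output path in the lab.
out_file <- file.path(tempdir(), "intersection_items_1_3.txt")
writeLines(common, out_file)
```

In a real script the ID vectors would come from the search results (for example via QueryId in RISmed) rather than being typed in. Keeping the IDs as character vectors avoids surprises when comparing them across lists.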
Note: By default, the functions return up to 1000 results. Although you can set the parameters to retrieve more, it is better to fetch results in batches of 1000, so write your code for the items above to fetch 1000 at a time. Fetch no more than 10,000 results in total. If a search returns more than that, write code to pick 10,000 from the last 5 years. Adapt the function developed in class to do this.
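
One possible shape for the batching described in the note is sketched below. It is not the function developed in class. fetch_pmids, fake_batch, and rismed_batch are hypothetical names. The batch fetcher is passed in as an argument so the loop can be tried offline with made-up IDs. rismed_batch assumes that EUtilsSummary forwards retstart, retmax, mindate, and maxdate to the NCBI search, as the RISmed documentation describes; check this against your installed version before relying on it.

```r
# Fetch IDs in batches of `batch_size`, stopping at `max_ids` in total.
# `fetch_batch(query, retstart, retmax, ...)` must return a character vector.
fetch_pmids <- function(query, fetch_batch, total,
                        batch_size = 1000, max_ids = 10000, ...) {
  n_wanted <- min(total, max_ids)
  ids <- character(0)
  start <- 0
  while (start < n_wanted) {
    n <- min(batch_size, n_wanted - start)
    batch <- fetch_batch(query, retstart = start, retmax = n, ...)
    if (length(batch) == 0) break  # nothing more to fetch
    ids <- c(ids, batch)
    start <- start + n
  }
  unique(ids)
}

# Offline stand-in for PubMed: 2500 made-up IDs served in slices.
fake_ids <- as.character(seq_len(2500))
fake_batch <- function(query, retstart, retmax, ...) {
  idx <- seq(retstart + 1, length.out = retmax)
  fake_ids[idx[idx <= length(fake_ids)]]
}

ids <- fetch_pmids("any query", fake_batch, total = length(fake_ids))
cat("Fetched", length(ids), "IDs in batches of 1000\n")

# RISmed-backed fetcher (assumption: these arguments are passed through to
# the search). It is defined but not called here because it needs network
# access and the RISmed package.
rismed_batch <- function(query, retstart, retmax, ...) {
  res <- RISmed::EUtilsSummary(query, type = "esearch", db = "pubmed",
                               retstart = retstart, retmax = retmax, ...)
  RISmed::QueryId(res)
}

# Possible real usage, restricted to the last 5 years:
#   this_year <- as.integer(format(Sys.Date(), "%Y"))
#   n_total <- RISmed::QueryCount(RISmed::EUtilsSummary(my_query, retmax = 1))
#   ids <- fetch_pmids(my_query, rismed_batch, total = n_total,
#                      mindate = this_year - 5, maxdate = this_year)
```

Passing the fetcher as an argument keeps the loop logic separate from the network call, which makes it easier to check the batch arithmetic before spending requests on PubMed.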

Part 2: From the PubMed IDs above, pick an article to summarize. Try to pick the one that is most relevant to cancer. Which lists from Part 1 does it appear in? Answer the following questions:
  1. Why did you pick this article in particular?
  2. What is the issue addressed in the paper (i.e., the gap in knowledge)?
  3. Why is this issue important (i.e., the grab)?
  4. Why was this paper written (i.e., what will you get out of it)?
  5. How was the study conducted (i.e., who were the subjects, what was measured, where was it done, when was it done)?
  6. How were the data collected?
  7. What outcomes were measured?
  8. What were the results?
  9. How did this answer a research question (or what about the research question did the paper answer)?
Note: The "gap-grab-get" concept originated with Bill Hendricson and Carolina Livi of UTHSCSA.

Some useful resources: