NIH Clinical Center Radio
Transcript

NIH CLINICAL CENTER GRAND ROUNDS
Episode 2009-003
Time:  1:07:07
Recorded January 28, 2009

The Emerging Paradigm of Clinical Genomics: Technologic Developments
Presented by:  Eric Green, MD, PhD.
Scientific Director, NHGRI

The Emerging Paradigm of Clinical Genomics: Clinical Implementation
Presented by: Leslie G. Biesecker, MD
Chief, Genetic Disease Research Branch, NHGRI

ANNOUNCER:  Discussing Outstanding Science of the Past, Present and Future – this is NIH Clinical Center Grand Rounds.

(Music establishes, goes under VO)

ANNOUNCER:  Greetings and welcome to NIH Clinical Center Grand Rounds.  On this edition, two researchers from the National Human Genome Research Institute at the NIH will address the emerging paradigm of clinical genomics. Dr. Eric Green, scientific director, will focus on technologic developments, and Dr. Leslie G. Biesecker, chief, Genetic Disease Research Branch, will present on clinical implementation.  If you would like to see a closed-captioned videocast of today's subject, log on to http://videocast.nih.gov and click the "Past Events" link.  We take you to the Lipsett Amphitheater in the NIH Clinical Center in Bethesda, Maryland, where Dr. John Gallin, director of the NIH Clinical Center, will introduce our first speaker.

(Music fades)


GALLIN:  Good afternoon, welcome to Clinical Center Grand Rounds.  I’m glad you all survived the weather and the ice and nobody got hurt.  Today we have a special presentation on the emerging paradigm of clinical genomics by two of the best speakers not only at NIH but anywhere.  I’m going to introduce both of them and they will give their presentations and we'll have questions as appropriate. 

So covering the technological development aspects is going to be Dr. Eric Green, scientific director of the National Human Genome Research Institute since 2002.  Eric also serves as chief of the institute’s Genome Technology Branch and director of the NIH Intramural Sequencing Center.  He has been significantly involved in all phases of the Human Genome Project, and most recently he has led a number of efforts that use contemporary strategies for large-scale DNA sequencing to study genomic variation among humans and to examine the microbial communities that exist on and in the human body.  Dr. Green received his undergraduate degree from the University of Wisconsin-Madison and his MD and PhD degrees from Washington University.  He completed residencies in pathology and internal medicine and was a postdoctoral research fellow at the University of Washington School of Medicine.  In 1994, he joined the newly established intramural program of the National Center for Human Genome Research at the NIH.  He has many memberships, including the Association of American Physicians, and he is the founding editor of the journal Genome Research and a series editor of Genome Analysis: A Laboratory Manual.

Our second speaker, who will cover the clinical implications, is Dr. Leslie Biesecker, chief of the National Human Genome Research Institute’s Genetic Disease Research Branch.  His current research focuses on genotype-phenotype correlations of disorders associated with rare genomic variants, and his laboratory is working on two classes of disorders: multiple congenital anomaly syndromes and atherosclerosis.  The goals are to study the patients affected by these disorders to better understand basic mechanisms of disease and development.  Dr. Biesecker received his BS degree from the University of California, Riverside, and his MD from the University of Illinois.  He then completed a clinical fellowship in pediatrics at the University of Michigan, and he later was a research fellow in pediatrics.  He came to the NIH in 1995, has been an associate professor at the Johns Hopkins University School of Public Health since 2002, and has held his current position as branch chief since 2006.  He is a member of many prestigious societies, including the ASCI, the IP, and the American Society of Human Genetics.  He's a fellow of the American College, and his honors include the NIH Director's Award in 2002 and an NIH Director's Award for developing the DNA testing procedure for the families of World Trade Center victims.  He also helped with the Hurricane Katrina Initiative.  Welcome to both of you; we look forward to the presentations.

GREEN:  Thank you, John.  Let me start off by showing this.  Thank you for braving the ice and making it here.  I'd rather spend time teaching than reading objectives.  This is an image that many are using, but I think it accurately captures the situation we have right now: waves of data, technologies, and opportunities keep hitting the shore one after the other.  Like a great wave, this can be exciting and exhilarating if you're a surfer, and at the same time, if you're in a boat, it can be quite daunting and at times almost terrifying.  That very much captures what I think is going on in genomics right now.
The origins of this have to do with the Human Genome Project, which completed a finished version of the human genome sequence in 2003.  But as many of us involved in this initiative recognized, that was just the beginning of a journey, a beginning we hoped would eventually lead us down a path that would change the way we practice medicine in ways small or great.  There are many ways of describing this.  I like to use the phrase genomic medicine, and its realization as the ultimate goal of why we sequenced the genome in the first place.  By genomic medicine, I mean healthcare tailored to the individual based on genomic information about that individual.  We recognize that going from basic sequence to actually changing the way some aspects of medicine are practiced is going to be a difficult path with multiple steps, many of which we can't yet define.  Investigators like myself and many others in this arena are very passionate about taking us down the path from the basic human genome sequence to changing aspects of the way we practice medicine.

We feel quite good about having succeeded in the initial goal of sequencing the genome.  We recognize we don't know what this will look like at the other end, but we are motivated by the basic concept that we have to make this journey if we are to fulfill the promise of why we sequenced the genome in the first place.

What I’m going to do is give you three major areas that I believe are important steps in that journey toward genomic medicine.  Each of these steps involves new approaches, and because of time I'll describe each of them only as a very broad landscape.

Lastly, we'll drill down and introduce a specific ongoing project that, as you will see, takes advantage of all these advances: elucidating function, empowering genetic studies, and, most exciting, implementing new DNA sequencing technologies.

So let me start with the first, elucidating human genome function, reminding you that the goal of the Human Genome Project was to sequence the human genome.  Shown here is 0.001% of the sequence, to remind you that the goal was simply to order the roughly 3 billion letters.  We have learned a lot about how it functions, but we have a long way to go.  The reason we went on to sequence the rat, the dog, the cow, and so forth after completing the human genome was that comparing them would give us great insight into which of these 3 billion letters are actually important.  It turns out probably not all of them are.  Based on these initial studies comparing our genome to others, it is now thought that something like 5% of the letters within our blueprint are directly functionally important.  We have to figure out which ones they are, so, like a good first-year medical student, what do we do?  Pull out highlighters and start highlighting the important stuff.  We pull out one highlighter and highlight the sequence we now know directly codes for protein: protein-coding sequences, genes.

But one of the things we have learned from our initial studies of other genomes is that there is actually a larger amount of functional sequence that we need to highlight in a different color, because it's functional and evolutionarily conserved but doesn't directly code for protein.  About 5% of these letters we believe are functionally important.  Of that, only about a third directly codes for protein; the other two-thirds is non-coding DNA doing other things.  We are trying to learn the language of the sequence.

There is a lot to be learned: grammar, syntax, understanding the words, understanding how they flow together to create a language.  We know a lot about coding sequences, genes, that yellow stuff.  We understand that you can put different combinations of exons together through alternative splicing, and we get a lot of complexity from this.  We know the language of genes because, once you get into coding sequence, we have a lookup table: the genetic code.  We understand how the triplets work and how each encodes a specific amino acid.  We have a tremendous amount of knowledge about the language of genes.  This is not where the great challenge resides.
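The "lookup table" described here is the standard genetic code. As a minimal Python sketch (the `translate` helper and the example sequence are illustrative, not from the talk), the table can be built from the 64 codons in T, C, A, G order and applied triplet by triplet until a stop codon:

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# T, C, A, G at each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA coding sequence, reading triplets until a stop codon."""
    seq = coding_sequence.upper()
    protein = []
    # Step through complete triplets only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # ATG GCC TGG TAA -> Met, Ala, Trp, stop
```

Nothing comparable exists for non-coding sequence, which is the point the talk turns to next.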

The great challenge now resides in the non-coding functional sequences.  What are these?  What is that purple stuff?  Well, we know about some of it, and we're learning more and more, but it's complicated.  We know it involves elements that regulate genes, turning them on and off in development and in specific tissues, though that doesn't necessarily mean we know how all the regulatory elements work.  Increasingly, we're learning that non-coding DNA functions to help package chromosomes correctly, to help segregate them during cell division, and to replicate them appropriately and accurately.  And particularly relevant over the last 5 or 10 years has been the recognition that there is a whole RNA world: non-coding RNA is involved in all sorts of processes, especially gene regulation, which is very important.

So DNA encodes RNA, but that RNA does not necessarily go on to encode a protein: macro RNAs, microRNAs, and so forth.  This is a world we have more to learn about, and it's getting increasingly relevant.  I don't have time to describe all the initiatives.

A very major project is called ENCODE, which looked at an initial one percent of the human genome, doing comparative sequence analysis to understand evolutionary constraint and then applying a whole lot of other laboratory-based methods to hone in and accurately highlight the functional elements.  It has now scaled up to the remaining 99% of the human genome.  The ENCODE project itself will last a small number of years, but I think our interpretation, particularly of the non-coding portions, will probably take decades longer.
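Comparative sequence analysis rests on a simple idea: positions that stay the same across species over long evolutionary time are likely under constraint. Real constraint scores (for example phastCons or GERP) use phylogenetic models; the Python sketch below is a deliberately simplified stand-in that only counts matching bases in a hypothetical, gap-free alignment and flags windows whose mean identity meets an assumed threshold.

```python
def column_identity(alignment: dict[str, str], reference: str = "human") -> list[float]:
    """Fraction of the other species sharing the reference base at each column."""
    ref = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    return [
        sum(seq[i] == base for seq in others) / len(others)
        for i, base in enumerate(ref)
    ]


def conserved_windows(scores: list[float], window: int = 4, threshold: float = 0.9) -> list[int]:
    """Start positions of windows whose mean identity meets the (assumed) threshold."""
    return [
        start
        for start in range(len(scores) - window + 1)
        if sum(scores[start:start + window]) / window >= threshold
    ]


# Hypothetical, gap-free alignment of a short stretch in four species.
ALIGNMENT = {
    "human": "ACGTACGTTTGA",
    "mouse": "ACGTACGAATCA",
    "dog":   "ACGTACGTATGA",
    "cow":   "ACGTTCGTAAGA",
}

scores = column_identity(ALIGNMENT)
print(conserved_windows(scores))
```

Window size and threshold are arbitrary choices here; flagged windows are candidates for the laboratory follow-up the talk describes, not proof of function.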

So I just want to give you a brief sense that increasingly we're gaining a detailed view of all the human chromosomes and all their functional elements, but this is an ongoing process.  We're on a trajectory to slowly but surely understand the language.

The second area is another major wave of data acquisition and knowledge.  That data acquisition and knowledge is focused on understanding not how all of us are the same, since we're all roughly 99.9% identical at the sequence level, but where we are different, where there are variants at certain base positions accumulated across our genome.  Any two of us probably have 3 to 5 million differences if you look at our precise genome sequences.  The vast, vast majority of those variants are inconsequential and have no impact.  But a small percent of them have an impact, represented here by bombs.  Sometimes it's positive, sometimes it's negative.  What is relevant to the enterprise going on here is those variants that have consequences relevant to human disease.  Understanding them is of course a major, important step en route to full-fledged genomic medicine.
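The link between "99.9% identical" and "millions of differences" is simple arithmetic. The Python sketch below (helper names are illustrative) counts mismatches between two already-aligned sequences and scales the talk's identity figure to a roughly 3-billion-base genome; 0.1% of 3 billion is about 3 million, the low end of the range quoted.

```python
GENOME_BASES = 3_000_000_000  # roughly 3 billion bases


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two already-aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences agree."""
    return 100 * (1 - count_differences(seq_a, seq_b) / len(seq_a))


# Scaling "roughly 99.9% identical" up to a whole genome.
expected_differences = round(GENOME_BASES * (1 - 0.999))
print(f"~{expected_differences:,} single-base differences")
```

This treats every difference as a single-base substitution; insertions, deletions, and structural variants would need a real alignment.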

There is a dichotomy worth pointing out in understanding the architecture of genetic disease, because I think it's relevant to understanding the current wave.  You can oversimplify this into two major categories.  One would be rare, or genetically simple, disorders; these terms are used interchangeably.  These are single-gene disorders, if you will, caused by a mutation at a precise position in a gene that is the dominant cause of the disease.

There might be other confounding factors, such as environmental exposures or other changes, but the dominant cause is a single mutation in a defined part of the genome.  But these are the rare disorders.

Far more relevant to the healthcare burden in the world are common, otherwise known as complex, disorders.  Here there is an interplay of multiple little genetic variants, those little bombs, each contributing, along with influences of the environment, that all conspire together to yield a clinical situation like hypertension or asthma or mental illness, et cetera, the very common diseases we see every day in our healthcare systems.

How has this played out in understanding the genomic contributions to situations like these?  Well, in the case of rare diseases, it's been a great triumph.  Shown here are single-gene disorders for which the gene was identified in each year, and, to remind you, the Genome Project began there.  Year after year, the technologies improved and allowed the genetic basis of more single-gene disorders to be identified.  We stopped graphing in 2005 because it made the point.  But this does not represent the dominant healthcare burden in the world; that is represented by the more common, complex genetic disorders.  To fully engage and really attack that problem was going to require a much more complicated suite of technologies and information to deal with the nuances of genetic changes, each contributing a small percent to the overall genetic risk, that all add up to yield a disease.  Recognizing that this would be far more complicated to untangle statistically, we needed a better catalog of genetic variants to tackle it.
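One classical way to picture many small genetic contributions "adding up" together with the environment is the liability-threshold model from quantitative genetics. It is not described in the talk; the Python sketch below uses it only as an illustration, and every effect size, exposure value, and threshold in it is hypothetical.

```python
def liability(risk_allele_counts, effect_sizes, environment=0.0):
    """Additive liability: per-allele effects times risk-allele counts, plus an environmental term."""
    if len(risk_allele_counts) != len(effect_sizes):
        raise ValueError("need one effect size per variant")
    genetic = sum(beta * count for beta, count in zip(effect_sizes, risk_allele_counts))
    return genetic + environment


def is_affected(score, threshold=1.0):
    # Disease appears only once total liability crosses the hypothetical threshold.
    return score > threshold


# Hypothetical small effects at five variants, and one person's risk-allele counts (0, 1, or 2).
EFFECTS = [0.2, 0.15, 0.1, 0.1, 0.05]
PERSON = [2, 1, 1, 0, 2]

genes_only = liability(PERSON, EFFECTS)                # about 0.75
genes_plus_exposure = liability(PERSON, EFFECTS, 0.5)  # about 1.25
print(is_affected(genes_only), is_affected(genes_plus_exposure))
```

No single variant here is decisive; the same genotype falls below or above the threshold depending on the environmental term, which is the interplay described for complex disorders.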

This was the basis for the project called the HapMap project, published in the first of three publications.  I’m going to oversimplify this for the purpose of time.  The bottom line is that it turns out the 3 to 5 million variants that each of us has actually come together in little groups, neighborhoods.  They are not totally random all over the place.  These blocks of variants pass from one generation to the next, and the whole problem of understanding genetic variation can be partially simplified by just understanding what that architecture looks like across human populations.  That architecture was mapped through a project done for a lot of common variants, many millions that exist across the human population.  That, then, led to a situation where instead of having to worry about all 3 to 5 million variants, you could focus on a subset of them, because each would represent a neighborhood, if you will, that has other variants associated with it.  You don't have to worry about the other ones.  That set up a situation that would allow groups of individuals with complex genetic diseases to be studied in an affordable fashion, by taking sets of these representative variants and typing them across large groups of people with diseases such as asthma and various other ailments, always in large groups.  You then analyze the data to try to develop an association between the inheritance of a specific part of the genome and the disease.  And just by developing those statistical correlations, you can zoom in on parts of the genome that might harbor one of the variants that contributes to the disease.
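Choosing one representative variant per "neighborhood" is usually called tag SNP selection, and the neighborhood relationship is measured with the linkage-disequilibrium statistic r². The Python sketch below is a simplified greedy scheme under stated assumptions (phased 0/1 haplotypes, an assumed r² ≥ 0.8 cutoff, hypothetical SNP names and data); it is not the HapMap project's own pipeline.

```python
def r_squared(a: list[int], b: list[int]) -> float:
    """LD r^2 between two biallelic SNPs coded 0/1 on the same phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_tag_snps(haplotypes: dict[str, list[int]], threshold: float = 0.8) -> list[str]:
    ids = list(haplotypes)
    # Each SNP "covers" itself plus every SNP in LD with it at r^2 >= threshold.
    covers = {
        s: {t for t in ids if r_squared(haplotypes[s], haplotypes[t]) >= threshold} | {s}
        for s in ids
    }
    untagged = set(ids)
    tags = []
    while untagged:
        # Pick the SNP covering the most still-untagged SNPs (ties: first in input order).
        best = max(ids, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical alleles at five SNPs across six phased haplotypes.
HAPLOTYPES = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1],  # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0],  # mirror image of snp1, still perfectly correlated
    "snp4": [0, 1, 0, 1, 0, 1],
    "snp5": [0, 1, 0, 1, 0, 1],  # identical to snp4
}
print(greedy_tag_snps(HAPLOTYPES))
```

Here two tags stand in for five variants, which is the economy the HapMap made possible at genome scale.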

This is known as a genome-wide association study.  You're looking across the whole genome and asking the question: out of a thousand people that I have with this type of disease, is there any correlation between inheriting any one part of the genome and actually getting the disease?  And if you study enough people, we thought as a community that you would be able to see those correlations.
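At each typed variant, the "correlation" question is often answered with a simple allelic test: compare how often the risk allele appears in cases versus controls. The Python sketch below computes the 1-degree-of-freedom chi-square statistic for a 2x2 table of allele counts; the counts are hypothetical, and the 5 × 10⁻⁸ cutoff is the conventional genome-wide significance level that accounts for testing very many variants, not a figure from the talk.

```python
import math


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Allelic association test: 1-df chi-square on a 2x2 table of allele counts."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    # Assumes no row or column total is zero.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


GENOME_WIDE_THRESHOLD = 5e-8  # conventional GWAS significance level

# Hypothetical SNP: 1,000 cases (2,000 alleles) and 1,000 controls (2,000 alleles).
chi2, p = allelic_chi_square(case_risk=1200, case_other=800,
                             control_risk=1000, control_other=1000)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}, genome-wide significant: {p < GENOME_WIDE_THRESHOLD}")
```

A risk-allele frequency of 60% in cases versus 50% in controls gives a chi-square of about 40.4, comfortably past the threshold; a GWAS repeats this test at every typed variant.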

Did it work?  The back-of-the-envelope calculations said it should.  Did it?  Well, the first data started to emerge in about 2005 and, lo and behold, a study found a region on chromosome 1, over there, that was genetically associated with the inheritance of macular degeneration.  That was exciting.  It worked.

2006 brought a couple of other successes.  In 2007, all hell broke loose.  The first quarter, you saw some.  Every one of these is a major publication: second quarter 2007, third quarter 2007, fourth quarter 2007.  Did it let up in 2008?  No.  First quarter 2008, second quarter 2008, third quarter, fourth quarter; by then the graphic was getting too crowded and we needed a different way to represent it.  200 came in during the last quarter of 2008.  This was remarkable.

Another wave: information about where to look in the genome for regions that harbor variants relevant to common diseases.  This was a major success, and in fact in 2007 it was recognized by Science magazine as the Breakthrough of the Year.  All of this work on genetic variation, the HapMap, and genome-wide association studies is giving us clues about where to look in the genome to try to find those variants.  This is where it becomes daunting, because going to the next step is still a huge mountain.

What is emerging, and this is an important nuance that will carry through in thinking about what the future is going to bring, is that with these complex genetic diseases we are beginning to get clues that for many of them, probably most of them, perhaps the vast majority of them, the variants are not going to reside in coding sequences.  They're going to reside in the really tough stuff, the non-coding DNA, where it is going to be more difficult for us to understand how it functions in the normal state and how it leads to disease.

I’m going to give you a brief tour, because it really illustrates the point.  One of the early genome-wide association studies, of Crohn's disease, found a region with a gene here and a gene here and no known genes in between, right in the middle of a desert with no known coding sequences.

Here are two studies of coronary artery disease, both confirming the same region, on chromosome 9 in this case: the associated variant resides here, but the nearest gene is here and another is here.  This is non-coding DNA.  This hits home locally for a graduate student in my laboratory who is studying dyslexia.  Genetic studies have indicated that the critical region for the associated variant sits in this interval, with one gene here and one here, and we think we have found it: a gene regulatory element.  It's not always between genes.

Here is an example of work from a couple of groups looking at type 2 diabetes.  Here is a gene, IGF2BP2, and that's the interval right there.  Non-coding DNA once again.  With hundreds and hundreds of studies being performed, groups are cataloging these results, which also gives them a chance to analyze where the genetic association peaks sit relative to annotations across the human genome.  That's fueling the view that a lot of the causative variants are going to be in non-coding DNA.  I'll show a quick summary of their work.  Bottom line: the graph in blue is the index variant, the one that found the interval in the association studies, and in pink are its neighbors, the linked variants close by, simply cataloging where they sit.  Are they sitting in coding regions?  Only a small minority of them fall in coding regions; you can see the vast, vast majority fall in non-coding regions.  These aren't necessarily the causative ones.  In some cases there might be undetected coding sequences harboring variants next to those used for the association.  It's all possible, but that's unlikely in the large majority of instances.
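The cataloging step amounts to asking, for each associated variant, whether its position falls inside an annotated coding interval. The Python sketch below does this with binary search over sorted, non-overlapping intervals; the exon coordinates and variant positions are hypothetical, on a single chromosome with 1-based inclusive coordinates, whereas real analyses draw on genome-wide annotation sets.

```python
from bisect import bisect_right


def build_index(intervals):
    """Sort non-overlapping, inclusive (start, end) intervals for binary search."""
    ordered = sorted(intervals)
    return [start for start, _ in ordered], [end for _, end in ordered]


def is_coding(position, starts, ends):
    # Find the last interval that starts at or before this position.
    i = bisect_right(starts, position) - 1
    return i >= 0 and position <= ends[i]


def tally(variant_positions, coding_intervals):
    starts, ends = build_index(coding_intervals)
    counts = {"coding": 0, "non-coding": 0}
    for position in variant_positions:
        key = "coding" if is_coding(position, starts, ends) else "non-coding"
        counts[key] += 1
    return counts


# Hypothetical exon coordinates and associated-variant positions on one chromosome.
CODING_EXONS = [(500, 650), (100, 200)]
VARIANTS = [150, 300, 400, 600, 1000]
print(tally(VARIANTS, CODING_EXONS))
```

As the talk cautions, a "non-coding" label only reflects current annotation; an unannotated coding sequence nearby would be misclassified.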

The bottom line is that what we are now looking at with respect to complex genetic diseases is likely a long road of trying to understand variants in non-coding DNA, whose language we still have a tremendous amount to learn.  This is going to be a grand challenge in genetics for decades ahead.

So, to remind you what I've told you as we've tried to go down this path toward genomic medicine: I told you how we are interpreting the human genome sequence and all its functional elements, and how genetic studies are being empowered through the HapMap, giving us a tidal wave of information about regions that harbor contributing variants.

The last area I want to review is another major wave in terms of technology advances.  This is a photograph of one of the major laboratories responsible for sequencing the first human genome sequence.  If we are going to do clinical research and eventually use genome sequences as part of clinical research or certainly clinical care, we can't have factories taking weeks and months to produce a sequence.  We need to make this a diagnostic kind of endeavor.  We need to be able to sequence human genomes on devices like little chips or microsomething-or-others.  It needs to be a major technological advance.

The good news is that we are about to enter, and in fact have entered, a major phase of that advance with what are known as next generation DNA sequencing technologies.  Again, I'm showing you only iconic views of a series of new technologies that are light years ahead, in terms of efficiency, scale, and cost, of the technologies that were used for sequencing the human genome.

If you are interested in reading about this, you can hardly open a journal these days without finding a review article about it.  There was a whole special issue about these new technologies, and there are many articles to read in detail.  Are these hypothetical technologies?  No.  These are real.  Shown here are three platforms that are already being sold.  All three exist within NIH laboratories, maybe within this building.
Are these the only three platforms?  Oh, my gosh, no.  There are multiple others coming over the next few years; these are just the companies that have announced they have instruments.  Each one in principle leapfrogs the previous one in terms of throughput, cost, and so forth.  As with all new technologies, there are lots of challenges.

This is a technological wave unlike anything I personally have seen in my 20 years of involvement in genomics, remarkable not only for what each of these platforms can do but for the fact that they're different, and each one seems to be getting incrementally better and better.  By the way, there are other models for implementing such sequencing technologies, such as this company, Complete Genomics, which last year announced that it will soon offer a service of sequencing a human genome for $5,000.  They don't sell an instrument; they sell the service.  I can't yet tell you whether I believe them, but even if it's $10,000, that is a bargain.  These are the things to look for: whether as services or instruments, various models are being taken out for a test ride, and we're seeing them play out now.  Once again, it's exhilarating, but it's daunting.  The amount of data that spews out of these sequencing machines is overwhelming.  Every genome center and every other genomics group working with these machines will tell you so, and the bottleneck now becomes analysis of the data and efforts to figure out how to analyze large amounts of sequence data.
So the implication for our journey is that overlaying all of these steps, the first two I introduced you to and many others, are nifty methods that allow us to read DNA sequence at low cost.  This has absolute potential for changing the way we think about using sequencing as a tool en route to genomic medicine.  In fact, a recent issue of Nature featured the fact that we are now at a point where sequencing an individual's genome is not yet routine, but it does get published in Nature.  Published in this issue of Nature are three individuals who had their genomes sequenced using these technologies.

Before I tell you about the examples, let's pause.  Why would we want to sequence somebody's entire genome?  Why not sequence just the important stuff, the highlights, the genes?  Well, let me remind you of things I told you about why this is so relevant.  Less than a third of the functional sequence in the human genome encodes protein.  I told you 5% of the genome is conserved and thought to be functionally important.  Only about 1.5% directly codes for protein.  That leaves about 3.5% that is functional but non-coding.  That was the purple stuff.  And the problem is, we don't know where to put that highlighter down.  We don't have a complete inventory.
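The percentages in this argument translate into base counts with simple arithmetic. The Python sketch below uses only the rough figures quoted in the talk (3 billion bases, about 5% functional, about 1.5% coding, and the 0.001% excerpt shown on the slide) and integer arithmetic so the results are exact.

```python
GENOME_BASES = 3_000_000_000  # "roughly 3 billion letters"

shown_on_slide = GENOME_BASES * 1 // 100_000  # 0.001% of the sequence
functional = GENOME_BASES * 5 // 100          # ~5% conserved, thought to be functional
coding = GENOME_BASES * 15 // 1000            # ~1.5% directly codes for protein
noncoding_functional = functional - coding    # the "purple stuff", ~3.5%

# Share of the functional sequence a genes-only approach would read, and miss.
coding_share = coding / functional
missed_share = noncoding_functional / functional

print(f"slide excerpt: {shown_on_slide:,} bases")
print(f"functional: {functional:,}  coding: {coding:,}  non-coding functional: {noncoding_functional:,}")
print(f"genes only: reads {coding_share:.0%} of functional sequence, misses {missed_share:.0%}")
```

Under these rough figures, about 150 million bases are functional, and sequencing only the roughly 45 million coding bases would leave about 70% of that functional sequence unread.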

Don't think for a minute that non-coding functional sequence is irrelevant; I told you it's very relevant for disease, especially common diseases.  Non-coding DNA may, I think, in general be less important for single-gene disorders, but everything we are now starting to see about complex disorders says it's very important.  That's where many of the variants are harbored.  The situation actually gets more complicated: it's not just about single-base variants.

There is also structural genomic variation, such as copy number variation, and these are the abbreviations here.  Some of those are very relevant.  The only way to really understand the genomic basis of a particular disease might be to see the structural differences relative to other individuals' genomes.  In general, I think many people would say you want to aim for comprehensiveness.  That is why there has been a series of demonstration projects in which individuals have had their genomes sequenced.
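The talk does not say how copy-number differences are detected. One common approach with sequencing data is read depth: a region present in one copy instead of two yields roughly half the expected reads, and a duplication yields more. The Python sketch below shows that idea under loud assumptions: hypothetical per-window depths, uniform coverage, a diploid genome, and an arbitrary tolerance. Real callers also correct for GC content and mappability and segment adjacent windows.

```python
def estimate_copy_number(window_depths, expected_depth, ploidy=2):
    """Scale each window's read depth to a copy-number estimate."""
    return [ploidy * depth / expected_depth for depth in window_depths]


def call_windows(copy_numbers, ploidy=2, tolerance=0.5):
    """Label each window as a loss, gain, or normal relative to the expected ploidy."""
    calls = []
    for cn in copy_numbers:
        if cn <= ploidy - tolerance:
            calls.append("loss")
        elif cn >= ploidy + tolerance:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


# Hypothetical mean read depth in consecutive windows, for a genome sequenced to ~30x.
DEPTHS = [30, 29, 31, 15, 14, 45, 30]
copy_numbers = estimate_copy_number(DEPTHS, expected_depth=30)
print(call_windows(copy_numbers))
```

The two half-depth windows read as a one-copy deletion and the 45x window as a three-copy gain.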

The first one was this guy, Craig Venter.  The second was this guy, Jim Watson.  You might have concluded after the first two, well, never mind.  The fact of the matter is, they decided to go for more diversity, and as a result, in that special issue there were three more: individuals of African and Asian ancestry, and a tumor specimen, all sequenced for different reasons.

So there are now publications of five individuals who have been sequenced.  How can we progress?  These are demonstration projects, multimillion-dollar projects.  They are not the kind of projects one would apply in a clinical research setting to sequence the genomes of 100 people.  Nothing like that.  But we're on a progression toward getting there.

And that really is the goal: to reduce individual genome sequencing to something you could do in a clinical research setting, and maybe even eventually in clinical practice.  What will that progression look like, in my opinion?  Just to set the stage, in particular for the second talk you're going to hear: en route to routine, and by routine I mean cheap, for research and for clinical practice, I would say that right now we are here.  Right now, we pretty much do genes: gather a bunch of genes and sequence the coding regions and maybe some associated regulatory elements.  But we are months, and I mean months, away from being able to sequence every exon, all the coding regions in the human genome, using these new technologies.  After that there will be other stuff.

Maybe we will have a good enough catalog of the non-coding functional sequences that we can sequence the 5% we know, the yellow stuff and the purple stuff, and we may be only a year or two away from that.  But eventually, as costs come down and technologies improve, there will be whole genome sequencing.  Watch this play out.  I think we're here now.  Within a year, I predict we will be here.
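Each step in this progression multiplies the data a sequencing run must produce. The Python sketch below estimates raw sequence needed per target at an assumed mean depth of 30x (an illustrative choice, not a figure from the talk). It uses the idealized Poisson (Lander-Waterman) expectation that a fraction 1 − e^(−c) of bases is covered at least once at mean coverage c. Target sizes come from the talk's rough percentages of a 3-billion-base genome.

```python
import math

GENOME_BASES = 3_000_000_000  # roughly 3 billion bases

# Approximate target sizes for the progression described in the talk.
TARGETS = {
    "coding exons (~1.5%)": GENOME_BASES * 15 // 1000,
    "known functional sequence (~5%)": GENOME_BASES * 5 // 100,
    "whole genome": GENOME_BASES,
}


def raw_bases_needed(target_bases: int, mean_coverage: int) -> int:
    """Total sequenced bases needed to reach a mean per-base coverage of the target."""
    return target_bases * mean_coverage


def fraction_covered(mean_coverage: float) -> float:
    """Expected fraction of target bases hit by at least one read (Poisson model)."""
    return 1 - math.exp(-mean_coverage)


COVERAGE = 30  # assumed depth for illustration, not a figure from the talk
for name, size in TARGETS.items():
    gigabases = raw_bases_needed(size, COVERAGE) / 1e9
    print(f"{name}: {gigabases:,.2f} Gb of sequence at {COVERAGE}x")
```

At 30x this works out to roughly 1.35, 4.5, and 90 gigabases respectively, which is why data analysis becomes the bottleneck. Real coverage is uneven, so the Poisson figure is a best case.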

What's going on at NIH?  This is my transition to our second speaker.

Twelve years ago, I founded a sequencing center called the NIH Intramural Sequencing Center.  If you look at pictures of it today, the truth of the matter is that it's a state-of-the-art sequencing center, smaller than the big centers but using the same equipment and the same programs.  We have been involved in various things.  Increasingly, we want to use it as an engine to help facilitate clinical research and push us down the path toward genomic medicine.  My personal passion, and what I want to see happen here and elsewhere, is to take an enterprise like this, where we are importing these next generation sequencing technologies and trying to push the envelope as best we can en route to whole genome sequencing, and marry it to this fantastic place we're in now.  If we're going to make this journey, these groups should be interacting.

So this has been a major goal of mine and of our institute.  We're very motivated to do this.  We have launched a number of projects, some large and some small, but the flagship project is one that, with the landscape now set, I’m going to turn over to Les, who is going to describe how it fits into the journey from sequencing the human genome to doing something about genomic medicine.

So, Les.

[applause].

BIESECKER:  Okay.  Now we need to transition.  Over to the Mac, if you would.

I will display this, and while you're gazing or not at this, I want to pose a question for you to think about during this talk: if you, as a research investigator in the clinic, had the opportunity to interrogate the entire genome of a human subject, what question would you ask, and how would you organize your research to take advantage of that ability?  And like that tsunami picture that Eric uses, this can be an overwhelming thing to consider.

I want to give you background on how we're beginning to build the infrastructure needed to accomplish these research projects and how we think about the questions.  Eric introduced you to a couple of the pilot sequencing projects that are out there, and there are all different kinds of flavors of these pilot projects.  The way we think about them is how they fit into what we call space.
Three key variables to think about.  The first: we're in a clinical center, so the first thing is the patients.  What kinds of patients and diseases, and how do we approach them as research subjects?  The next is clinical data, the data associated with those patients, which is, I think, the key to understanding human genome function.  And, of course, we're greedy, so we want a lot of clinical data and we want high-quality data, just as we want lots of subjects.  Then the new key variable we're throwing into the mix is the breadth of interrogation of the genome.  You can interrogate one gene at a time, a dozen, or, because of this technology, all 30,000 at a time now, or you can interrogate the entire 3 gigabases of the genome.

So the work we've done in the past, which a number of people are comfortable with, is single-gene studies: you look at one gene in a number of subjects and you understand how that relates.  The individual genome projects he mentioned are here.  These projects have subject Ns of one at this time.  They have full genomic data and minimal clinical data, other than what we know publicly about these individuals.  They have some things to offer us, but not a lot.

A project not mentioned is the 1000 Genomes Project, an effort to assay a thousand genomes by complete sequencing to understand the entire spectrum of genetic variation.  None of those samples is associated with clinical data, so that project sits on the floor of this space.  It tries to occupy the middle of the space as a pathway toward where we want to be, which is here: we want complete genome interrogation, we want it in a lot of subjects, and we want really deep clinical data.  We can't do that practically now.  We start here, and our goal is to work toward this ideal point.

Now, following on the description of rare and common variation, I think about clinical phenotypes this way: the phenotypes that we see in our patients arise from a complex mixture of genomic variation, some of which is common and some of which is rare.  And that is independent of whether the phenotype itself is common or rare.  Phenotypes are admixtures, different phenotypes have different admixtures, and different patients with the same phenotype can have different admixtures.

Our task is to understand this and, within individual patients, to begin to dissect the pathways causing the phenotypes and to develop novel, focused, personalized ways to approach those phenotypes therapeutically.  The only way to assess all of this variation is through sequencing.  When you are interested in common variants, there are multiple techniques available to assess them, such as common SNP chips.  You cannot assess rare variation with those technologies; sequencing is the only way to do that.  Besides sequence variation, which is our focus today, we should realize there are other structural variants that we're developing methods to approach as well.  Those will be in the mix as we go along.

Okay.  As I mentioned, we wanted to start building the infrastructure for a research pipeline that can accommodate these kinds of datasets.  What we are setting out to do in this project is develop a robust infrastructure that allows for the generation, interpretation, and use of whole genome datasets to understand human disease.  To pilot that, we want to start with a phenotype of appropriate complexity, dissect it, and learn how to do that.

It is essential that we not forget the patients when we interrogate their entire genome for all variants.  Trying to understand the relationship of those variants to all phenotypes is a concept that is novel and intriguing, and we have to develop ways to interact with the patients and protect their interests.

What is this?  This is where we're starting.  Our target is to bring 1,000 patients into the NIH Clinical Center, recruited from the local area, starting with the phenotype of atherosclerosis, because it has the attributes that allow us to pilot this.  What's important is that they are not only consented to have genes studied; they are consented to allow follow-up sequencing of all genes, the entire genome.  The patients are also consented to allow re-contact for iterative phenotypic studies, which lets us understand the relationship between the variants we discover and the phenotypes in that patient.

To model whole genome acquisition, we're starting with 400 candidate genes, although the patients are consented for whole genome sequencing.  Out of respect for the patients and our interest in working with them, we are returning results to them, and in that process we are learning how that interaction should take place, what kinds of results the subjects want back, and the best way to do that.  Now, progress.

We started this just about 2 years ago, and we have been enrolling the cohort.  We have had a hard time keeping up with the public's interest; we're limited by our clinical capacity.  567 people have come through and signed the consent form.  We have interrogated 219 of the 400 genes.  Those are divided into pieces, so it's 3,500 genomic targets.  We just did a data freeze, and the dataset comprises 825 million base pairs of sequence.  We're just short of a gigabase of sequence in this clinical cohort to date.

Here are the genes we're starting out interrogating.  It's an interesting mix of genes proposed by people collaborating with us: genes known to cause human cardiovascular diseases, and some that cause it in mice.  That range of candidate genes gives us a feel for the landscape of genomic variation in this phenotype and allows us to begin to develop ways to deal with different kinds of variants.

Here is the scary data flood.  This slide is both scary and exciting for the following reason.  If I take all of the genetic variants we have detected in these subjects to date, which is again about a third of the genes, and about a quarter of the patients, only part way through, the frequency of the variation looks like this.

This is a log scale for the count of the number of variants, and this axis is how often a particular variant was found in our study.  What you see is rather remarkable: the most common situation, accounting for the majority of the variants, is a variant found one time in one patient.  That means that half of the genomic variation between individuals in this study is attributable to variants that are unique.  That poses all kinds of challenges for untangling the relationship of these many unique variants to phenotype, and for combining those results to move forward and understand pathophysiology.
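The plot described here tabulates, for each count k, how many distinct variants were seen exactly k times. As a rough illustration only, the Python sketch below builds that tabulation and the share of variants seen once ("singletons"). It assumes each observation is simply a variant identifier recorded once per patient carrying it; the function names and toy data are hypothetical, not study data.

```python
from collections import Counter


def count_spectrum(observations):
    """Return {k: number of distinct variants observed exactly k times}."""
    per_variant = Counter(observations)   # variant ID -> times observed
    return Counter(per_variant.values())  # k -> how many variants have count k


def singleton_fraction(observations):
    """Fraction of distinct variants that were seen only once (singletons)."""
    spectrum = count_spectrum(observations)
    total = sum(spectrum.values())
    # Counter returns 0 for a missing key, so no singletons gives 0.0
    return spectrum[1] / total if total else 0.0


# Hypothetical toy data: one entry per (patient, variant) observation.
toy = ["var_a", "var_b", "var_b", "var_c", "var_d", "var_d", "var_d", "var_e"]
print(count_spectrum(toy))      # Counter({1: 3, 2: 1, 3: 1})
print(singleton_fraction(toy))  # 0.6
```

Plotting the values of `count_spectrum` on a log-scaled y-axis would give the kind of curve shown on the slide, with the tallest bar at k = 1.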

Another key concept is the relationship between a variant's frequency and its penetrance.  This is a very vague curve; it doesn't have any units on it, because we don't understand the shape of the graph yet.  We don't have enough data.  In general, variants that are rare are more often highly penetrant.

The work that Eric described with the association studies is exciting, but the lesson learned is this.  Take a trait with an overall genetic contribution of, say, 60 or 65%, meaning that 60 to 65% of whether or not a person has the trait is determined by their genes.  If you do whole genome association with common SNP chips, you find only about 10% of that genetic contribution.  That means that much more of the heritability of these common traits is due to an aggregate, or mixture, of many rarer variants, and we have to interrogate those by sequencing.
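The arithmetic behind this "missing heritability" point can be made explicit with the round numbers quoted. This is a minimal sketch, assuming the 10% figure means 10% of the genetic contribution rather than 10% of all trait variation; `heritability_gap` is a hypothetical helper, not a published method.

```python
def heritability_gap(h2, fraction_found):
    """Split a trait's genetic contribution into the part captured by
    common-SNP association hits and the part left unexplained.

    h2: fraction of trait variation attributed to genes (e.g. 0.60).
    fraction_found: share of h2 captured by SNP-chip hits (e.g. 0.10).
    Both returned values are fractions of total trait variation.
    """
    explained = h2 * fraction_found
    missing = h2 - explained
    return explained, missing


for h2 in (0.60, 0.65):
    explained, missing = heritability_gap(h2, 0.10)
    print(f"genetic contribution {h2:.0%}: "
          f"explained by SNP chips {explained:.1%}, unexplained {missing:.1%}")
```

With a 60% genetic contribution, roughly 54% of the trait's variation would be genetic yet unexplained by common-SNP hits, which is the gap the talk proposes to address by sequencing for rarer variants.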

When we think about how to do this work, how to use the data, and how to interact with the subjects, this is all very new territory.  So why don't we start from what we know how to do?  In medical and clinical genetics research, we understand how to work with this: variants that are rare in the population and cause highly penetrant phenotypes are something that medical genetics teams work with every day.  We know how to do this work.  We know how to analyze the data.

Why don't we start working with our huge, overwhelming datasets using the kinds of variants we understand how to interpret and know how to report to subjects, and then begin to explore this curve: see what its shape is, see what kinds of variants are here, and build in this direction to understand how to expand this work to more and more variants of different types and different frequencies.  Now, again, from a clinical genetics standpoint, the common-variant SNP data work is probably not even really genetics.  When we see variants that are common in the population, present in as many as 40% of people, and that have a tiny genetic effect on whether or not you have an attribute, that is really not living in the realm of clinical genetics.

I don't want to be generating clinical test reports, giving them to an individual, and saying that you're one or two percent more likely than the person sitting next to you to have a phenotype.  That's not useful.  This is genetics up here.  We have to figure out how to build down this way, and how to interact with the subjects and use the data.

Now, the other challenging part about this paradigm is this.  Genomics has revolutionized biology by forcing it to recognize the power of hypothesis-generating research.  We're used to studies where we come up with a hypothesis first, apply an assay to test it, and determine whether the answer is yes or no.  Genomics turns that completely on its head.  You generate the data first.  You look at the data, and the data help you develop hypotheses that you then retest, or more specific single hypotheses to test going forward.

In the clinic, this is how we do research in general, in this place and all around the world.  We generate a hypothesis, find patients who have the particular phenotype we want to study, apply an assay, and correlate: we decide whether the result of that assay explains the phenotype or whatever else we're looking at.  Clinical genomics is going to turn this on its head.  We're instead going to do something that looks more like this.  We will apply our assay first to a group of subjects we don't know very much about.  Then we will take their genomic data and sort our subjects by their genotypes, not by their phenotypes.  You can generate a hypothesis from that: you can look at the general clinical effects in that group, then sort out the phenotypes, understand them, and make correlations in this manner.
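The genotype-first ordering can be sketched in a few lines of Python: group subjects by their genotype at one variant, and only then summarize a phenotype within each group. This is an illustrative sketch only; the cohort, the field names (`genotypes`, `ldl_mg_dl`), the variant ID, and the values are hypothetical, and treating a missing call as reference homozygous is a simplifying assumption.

```python
from collections import defaultdict
from statistics import mean


def genotype_first(subjects, variant, trait):
    """Sort subjects by genotype at `variant` first, then summarize `trait`
    within each genotype group (the reverse of a phenotype-first study).

    Subjects with no call at `variant` are treated as "ref/ref"
    (a simplifying assumption for this sketch).
    """
    groups = defaultdict(list)
    for subject in subjects:
        genotype = subject["genotypes"].get(variant, "ref/ref")
        groups[genotype].append(subject[trait])
    return {genotype: mean(values) for genotype, values in groups.items()}


# Hypothetical cohort: field names, variant ID, and LDL values are made up.
cohort = [
    {"id": 1, "genotypes": {"variant_x": "ref/alt"}, "ldl_mg_dl": 240},
    {"id": 2, "genotypes": {}, "ldl_mg_dl": 120},
    {"id": 3, "genotypes": {"variant_x": "ref/alt"}, "ldl_mg_dl": 260},
    {"id": 4, "genotypes": {}, "ldl_mg_dl": 110},
]
print(genotype_first(cohort, "variant_x", "ldl_mg_dl"))
```

A large difference between groups, as in this toy cohort, would be the starting point for a hypothesis to be tested by targeted follow-up phenotyping, not a conclusion in itself.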

This will be a hard thing to do in clinical research.  As physicians, we are generally trained to apply tests to patients only when we know how to interpret the results, and we're saying that this is going to be turned on its head for this research.  It's going to require new ways of thinking about things.  The data flow for this kind of project is complex; this is a simplified diagram.  Samples are collected and analyzed, and we determine whether the variants found are associated with the phenotype.

Two things to note from this.  Early in the process, we separate diagnostic samples from research samples.  And the subjects participate in the decision about whether, and what kinds of, results are returned to them, in a way that protects their confidentiality and their ability to control the kinds of information that they want and can use.  As an example, culling from this dataset of 567 patients, we already have 6 families with previously undiagnosed genetic forms of high cholesterol.  This is a woman we saw early on in the Clinical Center, very well treated for her disease.  She has a very high burden of atherosclerosis but, again, well-controlled lipids.  One of the genes we study is the low-density lipoprotein receptor, and she was found to have a mutation in that gene, one that has been found before in other families.

We are in the process of implementing the medical genetics approach.  We're comfortable with the notion of taking a result, working with the individual, and working our way through the family to identify other family members who carry it.  This in effect amplifies the results: by identifying this one patient, we can identify 5 to 10 more people who have this highly lethal but highly treatable phenotype, and we can work with them.  We're working with a number of collaborators from a number of institutes around the campus, and we have a number of subprojects under way.
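Why one diagnosis can lead to several more can be illustrated with a back-of-the-envelope calculation. The sketch below assumes autosomal dominant inheritance, the usual pattern for LDL receptor mutations in familial hypercholesterolemia, under which a relative shares a given heterozygous allele with probability 1/2 (first-degree), 1/4 (second-degree), or 1/8 (third-degree). The family composition is hypothetical, and real cascade-testing yields depend on which relatives can be reached and agree to testing.

```python
# Approximate probability that a relative carries the proband's heterozygous
# allele under autosomal dominant inheritance, keyed by degree of relationship.
# Ignores de novo mutations, incomplete ascertainment, and other complications.
SHARE_PROBABILITY = {1: 0.5, 2: 0.25, 3: 0.125}


def expected_carriers(relatives_by_degree):
    """Expected number of additional carriers found by cascade testing.

    relatives_by_degree: {degree: number of relatives of that degree}.
    """
    return sum(SHARE_PROBABILITY[degree] * count
               for degree, count in relatives_by_degree.items())


# Hypothetical family: 6 first-degree and 8 second-degree relatives.
print(expected_carriers({1: 6, 2: 8}))  # 5.0
```

For this hypothetical family, 3 + 2 = 5 expected carriers falls at the low end of the 5 to 10 range mentioned in the talk; larger families push the number up.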
There is no way the investigators in our institute can possibly use all the data we're going to generate.  So this is a partial list of ongoing projects with groups working with us to use these data productively.

So far, what we've learned is that large numbers of human subjects, patients, are interested in this kind of research and want to have their entire genomes interrogated.  We can consent them, and the patients can comprehend the implications of consenting to whole genome sequencing.  We can generate clinically relevant results.  The subjects can receive those results, interpret them, act upon them, and make medical changes, modeling some of the attributes that we're all so hopeful we can develop in the future.  Going forward, we're going to expand the studies in breadth and depth.  We're going to go from the gene list I showed you on to exome sequencing, and then soon to whole genome sequencing.  We're going to broaden our population and reach out to different ethnic groups to understand the totality of human genomic variation and to make associations between variants and phenotype.  We also want to understand how the subjects view this kind of research and how useful it is to them.

So I have a long list of collaborators here.  People have been very enthusiastic about working with us to help develop this project, and, as I mentioned, our recruitment has been very successful.  Our aim is to begin to develop, and be a model for, personalized genomic medicine in research and eventually in patient care.

I'll stop there and take any questions.

Thank you.

GALLIN: Thanks to both of you for a great presentation.  I will start off with a question.

Eric, you talked about the relationship between the environment and the genome, and Les, you didn't bring the environment into your talk.  Is there an opportunity now to start doing that?

BIESECKER:  Within the context --

GALLIN:  Yes.

BIESECKER:  Yes.  It's an important variable.  I think a good thing to remember is that, as geneticists and genomicists, we are not necessarily less interested in the environment than in the genetic contribution to phenotypes.  But understanding the genetic component is a more tractable problem.  The tools we have to understand environmental contributions to phenotype are very, very poor.  That's an area of research a number of people are working hard on, and as it becomes a more tractable problem, I'm sure we'll integrate it.  We do include environment in our assessments of our patients: exercise, how many cigarettes, alcohol.  All of those are environmental factors that we are capturing.  The main problem is that the tools to understand environmental exposure are not very sophisticated yet.

QUESTION:  That was basically the question that I was going to ask.  I was amazed there was so little mention of environment in that whole presentation; maybe it's a dirty word in the genome institute.  There was a very interesting article in the New York Times Magazine two or three weeks ago on the results of many, for example, psychiatric studies looking for a genetic basis.  The disappointments in many of these have made us question whether environmental influences are much stronger than is generally acknowledged, whether at the level of known epigenetic modifications or of diffuse unknown unknowns, as somebody might call them.  It seems clear that forward movement of this approach needs a strong environmental study component.

GREEN:  I completely agree, and I would echo what Les said.  As the technologies for doing that develop, environment should be incorporated.  One of the advantages here is at least being able to follow the subjects and contact them again.  There are a lot of large cohorts being studied these days where it's impossible to get back to the human being.  One advantage of this, we hope, is that if in 5 years we have the ability to monitor such things in a more sophisticated way, we can go back and talk to people who have already been studied.

QUESTION:  NCI Frederick.  Two excellent talks, doctors.  But I have a little hesitation, Dr. Biesecker, about your comment that some of the CDCVs are not genetic per se, in the sense that -- what?

BIESECKER:  Common disease common variant in your slide.  So --

QUESTION:  Are not genetic diseases.  So C282Y in hemochromatosis, c1 for coagulation testing at this point; you know, I think there is a series of these common disease, common variant examples that are now reaching quick clinical application and are really germane.  But Eric, I wanted to ask you one thing also.  The issue came to mind when I was reading the piece: is this kind of whole genome sequencing a toy, or is it a medical test?  So, first question first.

BIESECKER:  So we don't disagree about this.  I was trying to be provocative, so thank you.  When you have a genetic variant that is present in, let's say, 40% of the population and the relative risk of that variant is 1.06, I don't think that's genetics.  That gets to the first question: I don't think that's genetics because I think environment is actually more important than that particular variant in understanding what's going on with the patient.  And returning such a variant without integrating environmental and other data is, I think, not only not genetics but not good genetics.  You have to think about models: incorporate the variables we understand, mix in the genetic variant, and give an aggregate risk to understand what's going on with that patient.

QUESTION:  Except in a case where people are applying the genetics in the lab right now.

BIESECKER:  And you could also say for example sickle cell disease.  That is a common variant.  There are exceptions.  I’m trying to get your juices going a little.
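To see why such a variant says little about an individual, the figures Dr. Biesecker cites above (40% carrier frequency, relative risk 1.06) can be plugged into two standard epidemiologic quantities: Levin's population attributable fraction, p(RR - 1) / (1 + p(RR - 1)), and a carrier's absolute risk increase, baseline risk × (RR - 1). This is a minimal sketch; the 20% baseline risk is a hypothetical figure chosen for illustration, and the function names are illustrative.

```python
def levin_paf(p, rr):
    """Levin's population attributable fraction for an exposure (here,
    carrying a variant) with prevalence p and relative risk rr."""
    excess = p * (rr - 1)
    return excess / (1 + excess)


def carrier_risk_increase(baseline_risk, rr):
    """Absolute risk increase for a carrier, given the non-carrier risk."""
    return baseline_risk * (rr - 1)


# Figures quoted in the discussion: 40% carrier frequency, relative risk 1.06.
print(f"Population attributable fraction: {levin_paf(0.40, 1.06):.1%}")
# Hypothetical baseline (non-carrier) risk of 20%.
print(f"Absolute risk increase for a carrier: "
      f"{carrier_risk_increase(0.20, 1.06):.1%}")
```

With these inputs the attributable fraction is about 2.3%, and a carrier's absolute risk rises by about 1.2 percentage points, the "one or two percent" scale of difference that the talk argues is not useful to report on its own.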

GREEN:  So the question is whether these sequencing technologies are a toy or a medical test.  You gave us examples where it is a medical test, where it's a single nucleotide that you already know about.  I don't mean to imply that I think whole genome sequencing is going to be the standard of care any time soon, or ever.  Maybe it will be simplified into something we use in a clinical workup that is more sophisticated than what we're doing now.  In general, the technologies are certainly not a toy as we explore all of these things.  What that looks like in clinical practice, I readily admit, I don't know yet.

QUESTION:  Ken Crammer, NCI.  Wonderful talks.  In your project, you found a high frequency of very rare variants.  How do you know that these are not somatic mutations in the individuals' blood or skin?  Are you looking to see whether they are heritable in the families?

BIESECKER:  Because we're sampling blood and using fresh DNA, with the technology we're using we can conclude they are unlikely to be low-frequency somatic variation.  As for family studies, we are not setting out to separate those, but somatic variation is clearly an important component of phenotype, cancer being an example, as you know.  I think understanding that relationship will be best pursued through our colleagues at NCI, who have a huge study, which I'm sure you're aware of, to understand the role of somatic variation.

QUESTION:  One other thought.  Very, very interesting presentations; thank you.  Drug response was mentioned as an environmental influence, and, of course, it can be looked at that way.  But drug response is itself a very complex phenotype.  I wonder whether this project, or others that you're involved with, incorporates pharmacogenomics into the kinds of analyses you're pursuing.

GREEN:  I may have misspoken.  I meant that the environmental exposure was the drug used by the patient: whatever medical treatment they are currently under, I consider an environmental variable.  I completely agree with you; pharmacogenetic response is a phenotype.  I think we'll have fantastic opportunities here, and we're working on pilot studies to understand the relationship between genetic variation and response to drugs in the subjects in these studies.  We're with you on that.

GALLIN: Thank you.  Thanks to our speakers and for the questions.  It was a great session.

[applause]

(Music fades in, under VO)

ANNOUNCER:  You’ve been listening to a pair of lectures discussing the emerging paradigm of clinical genomics. Eric Green, MD, PhD, scientific director, talked about technologic developments, while Dr. Leslie G. Biesecker discussed clinical implementation.  Again, you can see a closed-captioned videocast of this lecture by logging onto http://videocast.nih.gov -- click the "Past Events" link.  The NIH CLINICAL CENTER GRAND ROUNDS podcast is a presentation of the NIH Clinical Center, Office of Communications, Patient Recruitment and Public Liaison.  For more information about clinical research going on every day at the NIH Clinical Center, log on to http://clinicalcenter.nih.gov. From America’s Clinical Research Hospital, this has been NIH CLINICAL CENTER GRAND ROUNDS.  In Bethesda, Maryland, I’m Bill Schmalfeldt at the National Institutes of Health, an agency of the United States Department of Health and Human Services.


This page last reviewed on 05/4/09


