(http://lagan.stanford.edu/lagan_web/index.shtml). Enter the sequences, enter the
species names below, and then submit.
a. Draw out a rough sketch or screenshot of the tree (use organism names)
b. Open the VISTA Browser link for chimp. Which species has
by far the longest conservation with chimp?
Humans have by far the longest conservation with chimp.
c. Which species has the least conservation with chimp?
Zebrafish have the least amount of conservation with chimp.
d. Run a Clustal-W2 alignment of the DNAs ( ).
Copy and paste the Clustal guide tree here. Does the tree agree with the
LAGAN tree?
The trees agree in some parts but differ in many others. I built this tree
with a maximum likelihood method, and it shows that humans are closer to
cows than chimps, which is represented in the LAGAN tree.
[Tree figures not recoverable from extraction. Taxon labels, first tree: Zebrafish, Frog, Mouse, Cow, Chimp, Human. Taxon labels, second tree: cow genomic, human genomic, mouse genomic, chimp genomic, frog, zebrafish genomic. Scale bar: 0.20]
This study source was downloaded by 100000817883310 from CourseHero.com on 05-12-2022 12:07:13 GMT -05:00
https://www.coursehero.com/file/22009348/Ibtx-Lab4/
2. Using the sequences with gi numbers 513174699 and 345782223 (or
XM_003640935.2 and XM_540154, respectively), run pairwise LAGAN, which can
be found at http://lagan.stanford.edu/lagan_web/index.shtml. Click the Vista Point
link for 345782223.
a. Give the approximate nucleotide positions of an exon.
b. Under Tools, click the CNS link. Where are the reported conserved regions?
c. What is the listed percent identity in those regions?
d. Why is there no phylogenetic tree produced by pairwise-LAGAN?
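Questions 2c and 3a both ask for a percent identity. As a rough illustration of what that number means, the sketch below computes percent identity from two already-aligned sequences. The function name `percent_identity` and the choice to skip gapped columns are assumptions made for this example; LAGAN, VISTA, and BLAT may each use a different denominator, so this is not a reproduction of their calculations.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded; tools differ here
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment, not taken from the lab sequences: 8 matches in 9 ungapped columns.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 1))
```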
3. Retrieve the problem 3 cow sequence from Blackboard. Run a BLAT search
(http://genome.ucsc.edu/cgi-bin/hgBlat) of the human genome (Feb. 2009).
a. What is the percent identity of the match with the longest span?
The percent identity is 89.9%.
b. Click the browser link for that hit. What is the chromosomal location?
It is located on chromosome 11.
c. What is the gene name under RefSeq Genes (you may have to click to the left
to expand the RefSeq region)?
The gene is CSRP3.
d. Set the Flagged SNPs track to pack. How many SNPs in this region are
flagged as having a clinical association?
A total of 27 SNPs are flagged as having a clinical association.
e. How many of those are non-synonymous SNPs (red)?
There are a total of 22 non-synonymous SNPs.
f. What is a non-synonymous SNP?
A non-synonymous SNP is a single-nucleotide change that alters the amino acid sequence of the protein.
g. Turn the Genscan Genes track to full. Are there any genes predicted in this
region?
Yes, Genscan predicts a gene in this region.
h. How well does the pattern of exons from Genscan match the actual exons?
The Genscan exons match the actual exons almost exactly, except for two exons.
i. Click the gene name (under UCSC Genes). Under Orthologous Genes in
Other Species, list the three species in the table with a known ortholog.
The three species are mouse, rat, and zebrafish.
j. Under Other Names for this Gene, click the RefSeq accession link. According
to the summary, what are the consequences of mutations in this gene?
Mutations in this gene can cause hereditary hypertrophic cardiomyopathy
(HCM) and dilated cardiomyopathy (DCM).
k. According to the summary, are there alternatively spliced transcripts?
Yes, there are alternatively spliced transcripts, but they encode the same protein.
l. Where does the difference occur in the alternate transcripts?
The transcripts differ in the 5’ UTR.
m. What effect does that variation have on the protein product?
It has no effect; the protein product is unchanged.
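The definition in 3f can be made concrete with a small sketch that translates a reference codon and a variant codon using the standard genetic code and compares the resulting amino acids. The names `CODON_TABLE` and `classify_snp` are hypothetical, and the codons in the usage lines are toy examples, not variants from the CSRP3 region.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous' if both codons encode the same residue, else 'non-synonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


print(classify_snp("GAA", "GAG"))  # both codons encode glutamate
print(classify_snp("GAA", "GTA"))  # glutamate changes to valine
```

A change to a stop codon is reported here as non-synonymous as well; a fuller classifier might label that case separately as nonsense.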
4. Go to the VISTA Point (http://pipeline.lbl.gov/cgi-bin/gateway2) and search the
mouse genome (Dec 2011 assembly) for APEX2.
a. What is the position listed in the Position window?
The position of this gene is chrX:150,571,761-150,572,744.
b. Looking at the human conservation, how many exons are shown?
There are six exons shown.
c. Add rat and dog tracks. What species has the highest conservation with mouse
in the introns?
The rat has the highest conservation of the introns.
d. Which exon is the longest (from right)?
The longest exon from the right is 182 bp.
e. Go back to VISTA Point and enter APEX2 for human (Feb. 2009). What is the
longest exon (from left this time)? Be careful of wrap-around.
The longest exon is 182 bp.
f. In which intron does mouse have a long region that is not well conserved?
55031692 (147008848) to 55031802 (147008742) = 111 bp at 72.1% (intron)
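The lengths quoted in question 4 follow from inclusive coordinate arithmetic: a region from 55031692 to 55031802 covers 55031802 - 55031692 + 1 = 111 bp. The sketch below shows that calculation and a small parser for browser-style position strings. The helper names are hypothetical, and the code assumes the coordinates are 1-based and inclusive at both ends.

```python
def span_length(start: int, end: int) -> int:
    """Length of a region whose start and end coordinates are both included."""
    return abs(end - start) + 1


def parse_position(position: str) -> tuple[str, int, int]:
    """Split a string such as 'chrX:150,571,761-150,572,744' into its parts."""
    chrom, coords = position.split(":")
    start_text, end_text = coords.replace(",", "").split("-")
    return chrom, int(start_text), int(end_text)


print(span_length(55031692, 55031802))  # 111, matching the answer to 4f

chrom, start, end = parse_position("chrX:150,571,761-150,572,744")
print(chrom, span_length(start, end))  # chrX 984
```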
5. Go to the Caenorhabditis elegans BLAST page
(http://www.wormbase.org/tools/blast_blat) and enter the protein sequence with gi
#147903687. Make sure you use blastp and the Database is C. elegans (WS238)
proteins. (Could not find WS238. The databases jumped from WS230 to WS240.)
a. What is the E-value of the best match?
The best E-value is 9e-15.
b. Open the Gene Summary link of the best match. What is the gene name?
The gene name is lin-28.
c. How many transcripts (gene models) are shown for this gene in the Location
graphic? Write their identity.
There are a total of 2 transcripts: LIN-28 isoform a and LIN-28 isoform b.
Isoform a is 227 aa and isoform b is 196 aa.
d. What are the protein lengths?
Isoform a: 227 aa
Isoform b: 196 aa
e. The exon boundaries suggest how many coding exons for each?
The exon boundaries suggest that there are a total of 2 coding exons.
6. Use PubMed to locate a paper from Neurology by Y.J. Wu and C.C. Lin (first two
authors).
a. Follow the protein (RefSeq) link and write the gi# and protein name for the
human sodium channel.
Protein name: Sodium channel protein type 4 subunit alpha
Gi#: 40255316
b. Do a blastp search of the Reference proteins vs. mammals. Find the horse
(Equus caballus) protein. Name the protein and its gi#.
Protein: sodium channel protein type 4 subunit alpha
Gi#: 149723244
c. Who is the last author of the Wang et al. 1992 reference?
The last author of the reference is E.P. Hoffman.
d. Use that author’s name in PubMed to find a paper titled “Voltage-Gated
Ion…” What journal?
The journal is Annual Review of Medicine from 1995.
The investigation into these genes/proteins began when the author (a genetic screener
specializing in family histories) was asked to investigate why top quarter horses were
dying for no apparent reason. The breeding industry wanted to keep this phenomenon
secret (it stood to lose a lot of money). He found that many of the dead horses were
descendants of a horse called “Impressive” and had a defect in that protein, caused by a
single amino acid change.
e. What specific amino acid causes the horse defect (look at figure legends)?
A phenylalanine-to-leucine substitution causes this disease.
f. What is the analogous disease in humans?
The analogous disease in humans is hyperkalemic periodic paralysis.
g. According to the paper, how was the mouse model for this disease identified?
The mouse model was identified by the mouse’s inability to stand upright after
being turned on its side.
7. In the NCBI UniGene database, use the query zebrafish [orgn] AND “lactase-like a”
(lactase-like a is in quotes) and open the UniGene record.
a. Click on the EST profile link to see the distribution of these transcripts in
various EST library categories. Which tissue/body site has the highest
expression of this gene?
The eye has the highest expression of this gene.
b. What two developmental stages have significant expression?
The two stages with significant expression are the pharyngula and adult stages of
development.
c. Go back to the UniGene record. Click the “show more entries with profiles
like this” link after EST Profile. You will see expression neighbors. How
many of the first 20 results have “crystallin” in the title?
A total of 9 of the first 20 entries have crystallin in the title.
d. Back at the original Lactase record, under Links (upper right), click
Homologous UniGene (not HomoloGene). You should see sequence similar
clusters in other organisms. How many of the results are from mammals?
There are 6 results that are from mammals.
e. Find the mouse (Mus musculus) lactase-like in the mammalian result. Open
the record and click EST Profile to verify a similar gene expression pattern in
mice. What is the number in the column (left of the spot) for eye?
The number in the column for eye is 53.
f. From the mouse lactase-like record, follow the GEO Profiles link. Look at the
record whose ID ends in -23 (should be first result). What is the Data Set
type?
The data set type is expression profiling by array.
g. Click the graph associated with that GDS record. What two time points are
being compared for adipose?
The two time points are day 5 and 14.
h. Go back to the expression profile. Which time point clearly has higher
expression? Briefly explain.
Day 14 clearly has higher expression. Its log ratio is about 0.6, whereas the
log ratio for day 5 is below 0, so expression increases substantially between
the two time points.
i. Use the GDS number to search Data Sets. What is the platform accession
number?
The accession number is GPL891.
j. Open the platform link. Look at the description. What is the title of the
Platform?
The title is Agilent-011978 Mouse microarray G4121A.
k. What company manufactured the platform?
Agilent Technologies.
l. What is the technology type?
The technology type is in situ oligonucleotide.
m. How many total rows (probes) are in the table?
There are a total of 215 rows in the table.
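The comparison in 7h rests on log ratios, so a difference between two log values corresponds to a fold change on the linear scale. The sketch below converts a pair of log ratios into that fold change. The function name `fold_change` is hypothetical, the logarithm base is an assumption (the worksheet does not state whether the GDS values are log10 or log2), and the day 5 value of -0.1 is a placeholder standing in for "below 0".

```python
def fold_change(log_ratio_a: float, log_ratio_b: float, base: float = 10.0) -> float:
    """Linear-scale ratio of A to B, given log ratios taken in the same base."""
    return base ** (log_ratio_a - log_ratio_b)


day14 = 0.6   # log ratio read from the expression profile
day5 = -0.1   # hypothetical placeholder; the worksheet only says "below 0"

# The base is an assumption, so both common choices are shown.
print(fold_change(day14, day5, base=10.0))
print(fold_change(day14, day5, base=2.0))
```

Whichever base applies, a positive difference between the log ratios means the first time point has the higher expression.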
8. Do a search of the NCBI Gene database for HTT limited to the gene symbol field and
human limited to the organism field. Open the appropriate record.
a. In what disease is huntingtin implicated? An increase in the length of what
type of repeat causes the disease?
The huntingtin gene is implicated in Huntington’s disease. The disease is caused
by an expansion of a CAG trinucleotide repeat, which encodes a polyglutamine tract.
b. How many kilobases are in the huntingtin locus and how many exons? Follow
the Map Viewer link. In what cytogenetic location of the chromosome is
huntingtin (e.g. 1p11.1)?
The huntingtin locus spans 180 kb and contains 67 exons. Huntingtin is
located at 4p16.3.
c. Open the protein (pr) link for the HTT. How many proteins come up?
41 proteins come up.
Look at the other links on the genome record (go back to the chromosome).
d. How many STSs are in this region?
There is only one STS in this region.
e. At the chromosome, follow the OMIM link. Open the HTT link and click
Table of Contents. Look at Molecular Genetics. According to Duyao et al.,
what is the range of CAG repeats in HD patients?
HD patients have between 37 and 86 CAG repeats.
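To make the repeat count in 8e concrete, the sketch below finds the longest uninterrupted run of CAG triplets in a DNA string. The function names are hypothetical, the 37 to 86 bounds are simply the range quoted in the answer above, and the sequence in the usage line is a toy string rather than HTT sequence. It is an illustration of counting repeats, not a diagnostic tool.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Number of CAG units in the longest uninterrupted CAG run (0 if none)."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def within_quoted_range(repeat_count: int, low: int = 37, high: int = 86) -> bool:
    """True if the count falls inside the range quoted in the answer to 8e."""
    return low <= repeat_count <= high


toy = "ATG" + "CAG" * 40 + "CCGCCA"  # toy sequence with 40 repeats
count = longest_cag_run(toy)
print(count, within_quoted_range(count))
```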
9. At the NCBI Genome database, find the genome of Thermomonospora curvata.
Click the graphics link. Go to Tools/Search and enter “Tcur_3981.”
a. Find the record in the protein database. What is protein (enzyme) name and
accession number?
The enzyme name is propanoyl-CoA C-acyltransferase and the accession
number is ACY99510.
b. Write the length and EC number of the protein.
EC number: 2.3.1.176
Length: 385 aa
c. Follow the BLink link for BLAST neighbors. How many hits in archaea,
bacteria and fungi?
There are 1598 hits in archaea, 27047 hits in bacteria, and 555 in fungi.
d. Limit the output to fungal species. What organism contains the best matches?
The best match is from Lichtheimia corymbifera.
e. Go back to showing all BLink results (reset selection) and click Taxonomy
Report. Find the best match in Actinomadura atramentaria. What is its gi
number?
The gi number is 518459692.
f. Open the gi link. Write the length and calculated MW of the enzyme.
The length is 385 aa and the calculated MW is 40716.
g. What role does sterol carrier protein (SCP) play?
Its role is in multiple interactions of intracellular lipid circulation and
metabolism.
h. Find and list three BioSystems pathways associated with this enzyme.
The three pathways associated with this protein are fatty acid metabolism,
biosynthesis of unsaturated fatty acids, and butanoate metabolism.