Commit a4fd034b authored by Kosmas Hench's avatar Kosmas Hench

update suppl fig documentation

parent 5ddd61bf
ID rID population
28383 R28383 H. unicolor
28384 R28384 H. puella
28385 R28385 H. nigricans
28387 R28387 H. nigricans
28390 R28390 H. nigricans
28391 R28391 H. unicolor
28394 R28394 H. nigricans
28399 R28399 H. nigricans
AG9RX46 RAG9RX46 H. nigricans
AG9RX47 RAG9RX47 H. puella
AG9RX48 RAG9RX48 H. puella
AG9RX49 RAG9RX49 H. nigricans
AG9RX50 RAG9RX50 H. nigricans
AG9RX51 RAG9RX51 H. puella
AG9RX52 RAG9RX52 H. nigricans
AG9RX53 RAG9RX53 H. puella
AG9RX54 RAG9RX54 H. unicolor
AG9RX55 RAG9RX55 H. unicolor
PL17_01 RPL17_01 H. unicolor
PL17_02 RPL17_02 H. puella
PL17_04 RPL17_04 H. puella
PL17_16 RPL17_16 H. puella
PL17_17 RPL17_17 H. puella
PL17_18 RPL17_18 H. puella
28386 R28386
ids population
R28383 uni
R28384 pue
R28385 nig
R28387 nig
R28390 nig
R28391 uni
R28394 nig
R28399 nig
RAG9RX46 nig
RAG9RX47 pue
RAG9RX48 pue
RAG9RX49 nig
RAG9RX50 nig
RAG9RX51 pue
RAG9RX52 nig
RAG9RX53 pue
RAG9RX54 uni
RAG9RX55 uni
RPL17_01 uni
RPL17_02 pue
RPL17_04 pue
RPL17_16 pue
RPL17_17 pue
RPL17_18 pue
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -13,7 +13,7 @@ knitr::opts_knit$set(root.dir = './F_scripts')
## Summary
This is the accessory documentation of Figure 1.
The Figure can be recreated by runing the **R** script F1.R:
The Figure can be recreated by running the **R** script F1.R:
```sh
cd $WORK/3_figures/F_scripts
......@@ -25,7 +25,7 @@ rm Rplots.pdf
## Details of F1.R
In the following, the individual steps of the R script are documented. Is an executable R script that depends on a variety of image manipulation and geodata packages.
Aditionally, the supporting R script (**F1.functions.R**) in needed:
Additionally, the supporting R script (**F1.functions.R**) in needed:
```{r head,results='hide',message=FALSE}
......@@ -51,10 +51,10 @@ source("../../0_data/0_scripts/F1.functions.R")
### Fig. 1 a)
First, we load the data sheet containing the sampling location as well as the population specifics for each sample and subsample it to include only the samples included in the resequencing study.
First, we load the data sheet containing the sampling location as well as the population specifics for each sample and sub sample it to include only the samples included in the re-sequencing study.
The Coordinates need to be transformed from (deg min sec) to decimal degrees.
The sample inventory is summarised over populations and merged with the coordinates later used for the sample pointers.
The sample inventory is summarized over populations and merged with the coordinates later used for the sample pointers.
```{r mapSampleData, results='hide', message=FALSE, warning=FALSE}
......@@ -84,7 +84,7 @@ worldmap = clipPolys(worldmap, xlim=xlimW,ylim=ylimW, keepExtra=TRUE)
```
External images are loaded for annotation.
This includes the hamlet images, the contry flags and legend.
This includes the hamlet images, the country flags and legend.
```{r mapAnnotationImgs, results='hide', message=FALSE, warning=FALSE}
# compas and legend
......@@ -117,7 +117,7 @@ sampSize <- data %>% group_by(spec,loc) %>% summarise(n=length(id)) %>% filter(s
rep(-98.75,3)+xpos)) # honduras
```
Next, we read in the shappe files containg the distribution of the halet species.
Next, we read in the shape files containing the distribution of the hamlet species.
One file contains the overlap of the distribution ranges of *H. nigricans*, *H. puella* and *H. unicolor*, while the other contains the distribution range of the whole genus (*Hypoplectrus*).
These distributions rely on the data provided by the [*Smithsonian Tropical Research Institute*](http://biogeodb.stri.si.edu/caribbean/en/research/index/range) that were processed using [`gimp`](https://www.gimp.org/) and [`qgis`](https://qgis.org/en/site/) (including the [`Georeferencer Plugin`](https://docs.qgis.org/2.8/en/docs/user_manual/plugins/plugins_georeferencer.html))
......@@ -180,7 +180,7 @@ p1
### Fig. 1 b)
Next we create the subfigure containing the *F~ST~* information (Fig. 1 b).
Next we create the sub figure containing the *F~ST~* information (Fig. 1 b).
We first read in the data and define the color palette.
```{r fstData, results='hide', message=FALSE, warning=FALSE}
......@@ -232,8 +232,8 @@ p5
### Fig. 1 c)
The last subfigure contains the PCA output for each location.
For those, we define the colorpalette for hamlet species and read in the PCA data generated by `pcadapt`.
The last sub figure contains the PCA output for each location.
For those, we define the color palette for hamlet species and read in the PCA data generated by `pcadapt`.
```{r pcaData, results='hide', message=FALSE, warning=FALSE}
......@@ -376,7 +376,9 @@ F1 <- ggdraw()+
y = c(.995,.42,.42),
label =letters[1:3])
```
It is then exported using `ggsave()`.
```{r exportF1, eval=FALSE}
ggsave(plot = F1,filename = '../output/F1.pdf',width = 183,height = 145,units = 'mm',device = cairo_pdf)
......
......@@ -14,7 +14,7 @@ knitr::opts_knit$set(root.dir = './F_scripts')
This is the accessory documentation of Figure 2.
The Figure can be recreated by runing the **R** script F2.R:
The Figure can be recreated by running the **R** script F2.R:
```sh
cd $WORK/3_figures/F_scripts
......@@ -100,8 +100,8 @@ nhuh <- read.csv('../../2_output/08_popGen/05_fst/nighon-unihon.50kb.5kb.windowe
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='NU',RUN='NHUH');
```
The individual spcies comparisons are combined into a single data set.
Then the order of the species comparisons and of the coparison types are set.
The individual species comparisons are combined into a single data set.
Then the order of the species comparisons and of the comparison types are set.
```{r combineData, results='hide', message=FALSE, warning=FALSE}
data <- rbind(pn,pu,nu,ppnp,ppup,npup,pbnb,pbub,nbub,phnh,phuh,nhuh)
......@@ -110,7 +110,7 @@ data$RUN <- factor(as.character(data$RUN),
data$COL <- factor(as.character(data$COL),levels=c('PN','PU','NU'))
```
Then, the geonme wide weighted mean of *F~ST~* is loaded.
Then, the genome wide weighted mean of *F~ST~* is loaded.
```{r gwData, results='hide', message=FALSE, warning=FALSE}
gwFST <- read.csv('../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_fst.txt',sep='\t')
......@@ -119,8 +119,8 @@ gwFST$RUN <- factor(gwFST$run,levels=levels(data$RUN))
### Computing 99.98 *F~ST~* percentiles
The upper 99.98 *F~ST~* percintile is computed per run.
Then windows above this threshold are indentified for each global run.
The upper 99.98 *F~ST~* percentile is computed per run.
Then windows above this threshold are identified for each global run.
```{r fstThresholds, results='hide', message=FALSE, warning=FALSE}
threshs <- data %>% group_by(RUN) %>% summarise(thresh=quantile(WEIGHTED_FST,.9998))
......@@ -166,7 +166,7 @@ yl <- expression(italic('F'[ST]))
secScale <- 20
```
The baseplot is done with `ggplot()`.
The base plot is done with `ggplot()`.
```{r basePlot, results='hide', message=FALSE, warning=FALSE}
p1 <- ggplot()+
......@@ -259,7 +259,15 @@ F2 <- ggdraw(p1)+
draw_grob(panGrob, 0.954, boxes$y[3]+.78*yd, .045, .045)
```
It is then exported using `ggsave()`.
The *F~ST~* outlier windows are then exported to be used as reference in other figures.
```{r exportMusks, eval=FALSE}
write.table(x = data2, file = 'fst_outlier_windows.txt',sep='\t',quote = F,row.names = F)
write.table(x = musks, file = 'fst_outlier_ID.txt',sep='\t',quote = F,row.names = F)
```
The figure is then exported using `ggsave()`.
```{r, eval=FALSE}
ggsave(plot = F2,filename = '../output/F2.png',width = 183,height = 183,dpi = 200,units = 'mm')
......@@ -272,4 +280,5 @@ ggsave(plot = F2,filename = '../output/F2.png',width = 183,height = 183,dpi = 20
F2
```
</center>
---
\ No newline at end of file
......@@ -44,11 +44,11 @@ source('../../0_data/0_scripts/F3.functions.R');
source('../../0_data/0_scripts/F3.plot_fun.R')
```
The script `F3.plot_fun.R`contains a function (`create_K_plot()`) that plots a single genomic range (a single subfigure) as seen in Figure 3.
The Details of this scriptare explained below.
The script `F3.plot_fun.R`contains a function (`create_K_plot()`) that plots a single genomic range (a single sub figure) as seen in Figure 3.
The Details of this script are explained below.
Here we apply the `create_K_plot()` function one for every subplot (a,b,c & d).
Within the function we specify the curent LG, the annotation file, the extand of the region in question, candiate genes, genes of secondary interest, SNP positions of interest, and the outlier region ID as defined in Figure 2.
Within the function we specify the current LG, the annotation file, the extend of the region in question, candidate genes, genes of secondary interest, SNP positions of interest, and the outlier region ID as defined in Figure 2.
```{r subPlots,results='hide',message=FALSE, warning=FALSE}
p1 <- create_K_plot(searchLG = "LG09",gfffile = '../../1-output/09_gff_from_IKMB/HP.annotation.named.gff',xr = c(17800000,18000000),
......@@ -158,7 +158,7 @@ Then the different data types are imported using the previously loaded import fu
data_dxy<-dxy_list$data_dxy
```
Then the color themes, axsis labels further global plotting settings are set.
Then the color themes, axis labels further global plotting settings are set.
```{r, eval=FALSE}
clr <- c('#fb8620','#1b519c','#d93327')
......@@ -230,8 +230,8 @@ The first data track contains the gene models from the annotation file as well a
panel.border = element_blank());
```
To achieve propper formatting, the gene labels are adjusted independently depending on the outlier region ID.
This is done after ploting to be able to represent special formats within the plot labels (eg. greek letters *AND* italics)
To achieve proper formatting, the gene labels are adjusted independently depending on the outlier region ID.
This is done after plotting to be able to represent special formats within the plot labels (eg. Greek letters *AND* italics)
```{r, eval=FALSE}
if(muskID=='A'){
......@@ -257,7 +257,7 @@ This is done after ploting to be able to represent special formats within the pl
}
```
The second data track contans the *F~ST~* values.
The second data track contains the *F~ST~* values.
```{r, eval=FALSE}
p12 <- ggplot()+
......@@ -285,7 +285,7 @@ The second data track contans the *F~ST~* values.
plotSET
```
The third data track contains th GxP p values.
The third data track contains the GxP p values.
```{r, eval=FALSE}
p13 <- ggplot()+
......@@ -330,7 +330,7 @@ The fourth data track contains the d~XY~ values.
```
Finally the individual tracks are stacked and aligned using `plot_grid()` (`cowplot` package).
The complete subfigure is then returned by the function.
The complete sub figure is then returned by the function.
```{r, eval=FALSE}
p2 <- plot_grid(p11,p12,p13,p14,
......
......@@ -213,7 +213,7 @@ The position of some of the gene labels need shifting to avoid overlapping.
genes$labPOS[genes$name %in% c("sws2a\u03B1","sws2a\u03B2","sws2b","lws")] <- genes$labPOS[genes$name %in% c("sws2a\u03B1","sws2a\u03B2","sws2b","lws")]+c(-16,-1,12,20)
```
Then we define a data frame for the poligons that zoom onto the gene position within the annotation/legend of subfigure a.
Then we define a data frame for the polygons that zoom onto the gene position within the annotation/legend of sub figure a.
```{r eval=FALSE}
GZrS <- 40000;GZrS2 <- 20000; #gene zoom offset
......@@ -249,15 +249,15 @@ Some parameters are set for plotting (base colors for the LD color gradient, gen
GLABoffset <- 8
```
At this point, the data set selection is checked. If the current selection is the global data set, the additional annotaion is included.
At this point, the data set selection is checked. If the current selection is the global data set, the additional annotation is included.
```{r eval=FALSE}
if(sel %in% c(1,8)){
```
For the additional annotaion, several helpers are needed and defined as data frames.
The construction of these is obscurred by the fact that in the base plot, the later horizontal bands are plottet tilted by an angle of 45 degrees.
Therfore, a simple y offset affects both x and y axsis in the base plot.
For the additional annotation, several helpers are needed and defined as data frames.
The construction of these is obscured by the fact that in the base plot, the later horizontal bands are plotted tilted by an angle of 45 degrees.
Therefore, a simple y offset affects both x and y axis in the base plot.
```{r eval=FALSE}
rS <- 100000 # width of grey annotation band
......@@ -419,7 +419,7 @@ Finally, the function returns the current base plot stored in `p1`.
Within this script, the ILD data of the genome wide subsets of SNPs is loaded and and plotted.
First we read in the global data set, then we loop over the individual population subsets and append the data to the gobal data set.
First we read in the global data set, then we loop over the individual population subsets and append the data to the global data set.
Since we're only interested in the distribution of r^2, we select only this column and create extra columns for the run ID, and the run type (global, subset by species, subset by location)
```{r eval=FALSE}
......@@ -450,7 +450,7 @@ BW$run <- factor(BW$run,levels=c('Global',"Panana","Belize","Honduras",
"H. nigricans","H. puella","H. unicolor"))
```
To be able to include the mean values to the boxplots we create a small summary table of the LD data.
To be able to include the mean values to the box plots we create a small summary table of the LD data.
```{r eval=FALSE}
dt2 <- BW %>%
......@@ -525,7 +525,7 @@ dataBoxGenes$xS <- factor(dataBoxGenes$run,
levels = c("global","boc","bel","hon","nig","pue","uni"))
```
To be able to include the mean values to the boxplots we create a small summary table of the LD data.
To be able to include the mean values to the box plots we create a small summary table of the LD data.
```{r eval=FALSE}
BoxGenes_summary <- dataBoxGenes %>%
......
......@@ -14,7 +14,7 @@ knitr::opts_knit$set(root.dir = './F_scripts')
This is the accessory documentation of Figure 5.
The Figure can be recreated by runing the **R** script F5.R:
The Figure can be recreated by running the **R** script F5.R:
```sh
cd $WORK/3_figures/F_scripts
......@@ -30,7 +30,7 @@ Is an executable R script that depends on a variety of image manipulation and da
It Furthermore depends on the R scripts `F4.functions.R` (located under `$WORK/0_data/0_scripts`).
This script is a modification of sript F4.R. It uses the same funtions and just differs in the data sets and settings
This script is a modification of script F4.R. It uses the same functions and just differs in the data sets and settings
```{r head,results='hide',message=FALSE}
library(tidyverse)
......
......@@ -130,6 +130,9 @@ p1 <- ggplot()+
)+
facet_grid(RUN~.)
write.table(x = data2, file = 'fst_outlier_windows.txt',sep='\t',quote = F,row.names = F)
write.table(x = musks, file = 'fst_outlier_ID.txt',sep='\t',quote = F,row.names = F)
legGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/legend-pw-cairo.svg"))))
carGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/caribbean-cairo.svg"))))
panGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/panama-cairo.svg"))))
......
......@@ -9,8 +9,8 @@ source('../../0_data/0_scripts/F4.functions.R')
plts <- list()
for(k in 1:10){
plts[[k]] <- trplot(k)
for(k in 1:7){
plts[[k]] <- trplot(k)
}
tN <- theme(legend.position = 'none')
......@@ -72,5 +72,4 @@ F4 <- ggdraw(pGr1)+
y=c(.99,.81,.81,.57,.57),
size = 14)
#ggsave(plot = F4,filename = '../output/F4..pdf',width = 183,height = 230,units = 'mm',device = cairo_pdf)
ggsave(plot = F4,filename = '../output/F4.png',width = 183,height = 235,units = 'mm',dpi = 150)
#!/usr/bin/env Rscript
library(grid)
library(gridSVG)
library(grImport2)
library(grConvert)
library(tidyverse)
library(cowplot)
library(hrbrthemes)
karyo <- read.csv('../../0_data/0_resources/F2.karyo.txt',sep='\t') %>%
mutate(GSTART=lag(cumsum(END),n = 1,default = 0),
GEND=GSTART+END,GROUP=rep(letters[1:2],12)) %>%
select(CHROM,GSTART,GEND,GROUP)
# global -------------
pn <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/pue-nig-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PN',RUN='PN');
pu <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/pue-uni-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PU',RUN='PU');
nu <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/nig-uni-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='NU',RUN='NU');
# PAN -------------
ppnp <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/pueboc-nigboc-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PN',RUN='PPNP');
ppup <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/pueboc-uniboc-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PU',RUN='PPUP');
npup <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/nigboc-uniboc-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='NU',RUN='NPUP');
# Bel -------------
pbnb <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/puebel-nigbel-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PN',RUN='PBNB');
pbub <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/puebel-unibel-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PU',RUN='PBUB');
nbub <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/nigbel-unibel-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='NU',RUN='NBUB');
# Hon -------------
phnh <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/puehon-nighon-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PN',RUN='PHNH');
phuh <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/puehon-unihon-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='PU',RUN='PHUH');
nhuh <- read.csv('../../2_output/08_popGen/06_GxP/smoothed/50kb/nighon-unihon-gemma.lm.50k.5k.smooth',sep='\t') %>%
merge(.,(karyo %>% select(-GEND,-GROUP)),by='CHROM',allx=T) %>%
mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL='NU',RUN='NHUH');
# ------------------------------------------
data <- rbind(pn,pu,nu,ppnp,ppup,npup,pbnb,pbub,nbub,phnh,phuh,nhuh)
data$RUN <- factor(as.character(data$RUN),
levels=c('PN','NU','PU','PPNP','NPUP','PPUP','PBNB','NBUB','PBUB','PHNH','NHUH','PHUH'))
data$COL <- factor(as.character(data$COL),levels=c('PN','PU','NU'))
data2 <- read.csv('fst_outlier_windows.txt', sep='\t')
musks <- read.csv('fst_outlier_ID.txt', sep='\t')
clr <- c('#fb8620','#d93327','#1b519c')
annoclr <- rgb(.4,.4,.4)
lgclr <- rgb(.9,.9,.9)
yl <- expression(paste('-log'[10],' (',italic('p'),'value)',sep=''))
p1 <- ggplot()+
geom_rect(inherit.aes = F,data=karyo,aes(xmin=GSTART,xmax=GEND,
ymin=-Inf,ymax=Inf,fill=GROUP))+
geom_vline(data=data2,aes(xintercept=xmean),col=annoclr,lwd=.2)+
geom_point(data=data,aes(x=GPOS,y=avgp_wald,col=COL),size=.01)+
geom_text(data=musks,aes(x=muskX,y=9.5,label=musk))+
scale_y_continuous(name = yl,breaks=0:4*2.5,labels = c(0,'',5,'',10))+
scale_x_continuous(expand = c(0,0),breaks = (karyo$GSTART+karyo$GEND)/2,labels = 1:24,position = "top")+
scale_color_manual(name='comparison',values=clr)+
scale_fill_manual(values = c(NA,lgclr),guide=F)+
theme_ipsum(grid = F,
axis_title_just = 'c',
axis_title_family = 'bold',
strip_text_size = 9)+
theme(plot.margin = unit(c(2,25,40,0),'pt'),
panel.spacing.y = unit(3,'pt'),
legend.position = 'none',
axis.title.x = element_blank(),
axis.ticks.y = element_line(color = annoclr),
axis.line.y = element_line(color = annoclr),
axis.text.y = element_text(margin = unit(c(0,3,0,0),'pt')),
axis.title.y = element_text(vjust = -1),
strip.background = element_blank(),
strip.text = element_blank()
)+
facet_grid(RUN~.)
legGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/legend-pw-cairo.svg"))))
carGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/caribbean-cairo.svg"))))
panGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/panama-cairo.svg"))))
belGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/belize-cairo.svg"))))
honGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/honduras-cairo.svg"))))
ys <- .0765
ysc <- .0057
yd <- .2195
boxes = data.frame(x=rep(.9675,4),y=c(ys,ys+yd+ysc,ys+(yd+ysc)*2,ys+(yd+ysc)*3))
S02 <- ggdraw(p1)+
geom_rect(data = boxes, aes(xmin = x, xmax = x + .02, ymin = y, ymax = y + yd),
colour = NA, fill = c(rep(lgclr,3),annoclr))+
draw_label('Global', x = .9775, y = boxes$y[4]+.08,
size = 9, angle=-90,col='white')+
draw_label('Belize', x = .9775, y = boxes$y[2]+.08,
size = 9, angle=-90)+
draw_label('Honduras', x = .9775, y = boxes$y[1]+.08,
size = 9, angle=-90)+
draw_label('Panama', x = .9775, y = boxes$y[3]+.08,
size = 9, angle=-90)+
draw_grob(legGrob, 0.01, 0, 1, 0.08)+
draw_grob(carGrob, 0.954, boxes$y[4]+.78*yd, .045, .045)+
draw_grob(belGrob, 0.954, boxes$y[2]+.78*yd, .045, .045)+
draw_grob(honGrob, 0.954, boxes$y[1]+.78*yd, .045, .045)+
draw_grob(panGrob, 0.954, boxes$y[3]+.78*yd, .045, .045)
ggsave(plot = S02,filename = '../output/S02.png',width = 183,height = 183,dpi = 200,units = 'mm')
#ggsave('fst_mac02_weight_99.98.eps',width = 183,height = 183,dpi = 200,units = 'mm')
#!/usr/bin/env Rscript
library(tidyverse)
library(cowplot)
library(tximport)
library(DESeq2)
library(vsn)
library(gplots)
library(ggforce)
library(grid)
library(gridSVG)
library(grImport2)
library(grConvert)
clp <- function(x){y<-substr(x,2,nchar(x));y}
base_dir <- "../../2_output/09_expression/kallisto/"
samples <- substr(dir(base_dir),10,nchar(dir(base_dir)))
kal_dirs <- sapply(samples, function(id) file.path(base_dir, paste("kallisto_",id,sep='')))
pops <- read.csv('../../0_data/0_resources/retina_samples_ID.txt',sep='\t')[1:24,] %>% mutate(pop=substr(population,4,6))
s2c <- data.frame(path=kal_dirs, sample=samples, population = pops$pop, stringsAsFactors=FALSE)
t2g <- read.csv('../../0_data/0_resources/transcripts2genname.txt',sep='\t',stringsAsFactors = F)
retDesign <- read.csv('../../0_data/0_resources/retina_samples_ID_altVersion.csv',sep='\t',row.names=1)
txi.kallisto <- tximport(paste(kal_dirs,"/abundance.tsv",sep=''),
type = "kallisto", txOut=T,tx2gene = t2g,
geneIdCol="transcripts2genname.txt")
testdf <- data.frame(abund = row.names(txi.kallisto$abundance),
count = row.names(txi.kallisto$counts),
length=row.names(txi.kallisto$length),
t1=t2g$target_id,
t2=t2g$HPgene) %>% mutate(check = (abund!=count)+(abund!=length)+(abund!=t1))
row.names(txi.kallisto$abundance) <- t2g$HPgene
row.names(txi.kallisto$counts) <- t2g$HPgene
row.names(txi.kallisto$length) <- t2g$HPgene
samples <- pops%>%select(rID,pop);row.names(samples) <- samples$rID
dds <- DESeqDataSetFromTximport(txi.kallisto, samples, ~pop)
dds <- DESeq(dds)
res <- list()
res[[1]] <- results(dds,contrast = c("pop","pue","nig"))
res[[1]] <- res[[1]][order(res[[1]]$padj),]
res[[2]] <- results(dds,contrast = c("pop","pue","uni"))
res[[2]] <- res[[2]][order(res[[2]]$padj),]
res[[3]] <- results(dds,contrast = c("pop","nig","uni"))
res[[3]] <- res[[3]][order(res[[3]]$padj),]
selGene <- c(rownames(res[[1]])[1:3],
rownames(res[[3]])[1:3],
rownames(res[[2]])[1:3],
"SWS2abeta","SWS2aalpha","SWS2b","LWS",
"Casz1_2-1","Casz1_2-2","Casz1_2-3","invs",
"RORB")
select <- (counts(dds,normalized=TRUE) %>% rownames()) %in% selGene
hmcol <- colorRampPalette(viridis::plasma(9))(100)
annodat <- data.frame(xmin=c(.5,9.5,19.5),
xmax=c(9.5,19.5,24.5),
ymin=rep(.1,3),
ymax=rep(.4,3),
clr=c('black',"#d45500",'black'),
fll=c('black',"#d45500",'white'))
rld <- rlogTransformation(dds, blind=TRUE)
testdat <- as.data.frame(assay(rld)[select,order(retDesign$population)])%>%
tibble::rownames_to_column(var = 'gene') %>%
gather(.,key = 'sample',value = 'counts',2:25)
ordS <- retDesign %>% tibble::rownames_to_column(var = 'Sample') %>% arrange(population)
testdat$sample <- factor(as.character(testdat$sample),levels = ordS$Sample)
testdat$gene <- factor(as.character(testdat$gene),levels =rev(selGene[!duplicated(selGene)]))
d2 <- data.frame(name = factor(rev(selGene[!duplicated(selGene)]),
levels=rev(selGene[!duplicated(selGene)])))
d3 <- left_join(data.frame(name=selGene,comp=c(rep('np',3),rep('nu',3),
rep('pu',3),rep('vision',9))),d2,by='name') %>%
group_by(name) %>% mutate(n = 1/length(comp),s=(row_number()-1)/2*2*pi,e=s+n*2*pi) %>%
ungroup() %>% mutate(name = as.character(name),clr=c('#fb8620','#d93327','#1b519c','#333333')[as.numeric(comp)])
d3$name <- factor(d3$name,levels=rev(selGene[!duplicated(selGene)]))
testdat <- testdat %>% mutate(lab = tolower(gene))
YL <- expression(italic("rorb"),italic("invs"),
italic("casz1"^3),
italic("casz1"^2),italic("casz1"^1),italic("lws"),
italic("sws2b"),italic(paste(sws2a,"\u03B1")),
italic(paste(sws2a,"\u03B2")),italic("hpv1g3179"),
italic("wdfy3_1"),italic("hmgxb3"),italic("tuba1a"),
bolditalic("cyp3a27"),bolditalic("sema4b"),italic("fam96b"),bolditalic("chac1_1"))
testdat$sample <- factor(clp(as.character(testdat$sample)),levels=clp(levels(testdat$sample)))
p2 <- ggplot(testdat,aes(x=sample,y=gene,fill=counts))+
geom_tile()+coord_equal(xlim = c(0.5,26.1))+
geom_rect(inherit.aes = F,data=annodat,
aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax),col='black',fill=annodat$fll)+
geom_circle(inherit.aes = F,data=d2, aes(x0 = 25.2,y0=as.numeric(name),r=.4),fill=rgb(.9,.9,.9))+
geom_arc(inherit.aes = F,
aes(x0 = 25.2,y0 = as.numeric(as.factor(name)), r = .4, start = s, end = e,col=comp),size=1.5,
data = d3)+
scale_x_discrete(name='',expand = c(0,0))+
scale_y_discrete(name='Gene',labels=YL)+
viridis::scale_fill_viridis(name="Expression: ",option = 'C',
guide = guide_colorbar(direction = "horizontal",title.position = "top"))+
scale_color_manual(name="",values=c('#fb8620','#d93327','#1b519c','#666666'),guide=F)+
theme_minimal(base_size = 13)+
theme(axis.text.x = element_text(angle=90),
plot.margin = unit(c(1,1,50,1),'pt'),
axis.text.y = element_text(face = 'italic'),
legend.position = c(.05,-.28),panel.grid = element_blank(),legend.key.width = unit(30,'pt'))
legGrob <- gTree(children=gList(pictureGrob(readPicture("../../0_data/0_img/legend-expression-cairo.svg"))))
Cdf3 <- as.data.frame(assay(rld)[,order(retDesign$population)]) %>%
tibble::rownames_to_column(var = 'gene') %>%
gather(key = "Sample",value = "Count",R28385:RPL17_01)
selGene2 <- c("SWS2abeta","SWS2aalpha","SWS2b","LWS",
"Casz1_2-1","Casz1_2-2","Casz1_2-3",
"invs","RORB")
Cgenes2 <- as.data.frame(assay(rld)[,order(retDesign$population)]) %>%
tibble::rownames_to_column(var = 'gene') %>%
filter(gene %in% selGene2) %>%
gather(key = "Sample",value = "Count",R28385:RPL17_01)
Flab <- expression("All",italic("casz1"^1),italic("casz1"^2),italic("casz1"^3),
italic("invs"),italic("lws"),italic("rorb"),
italic(paste(" sws2a","\u03B1")),italic(paste(" sws2a","\u03B2")),italic("sws2b"))
p3 <- ggplot(Cdf3,aes(x=0,y=Count))+
geom_boxplot(aes(fill='all'))+
geom_boxplot(inherit.aes = F,
data=Cgenes2,
aes(x=as.numeric(as.factor(gene)),y=Count,
group=gene,fill=gene))+
scale_x_continuous('Gene',expand=c(0,0),breaks=0:9,labels=Flab)+
scale_y_continuous(expression(Count~(italic(rld(n)))),expand=c(0,0))+
scale_fill_manual('Gene',values = c('lightgray',viridis::viridis(9)),
labels=Flab)+
theme_minimal(base_size = 13)+
theme(legend.position = 'none',
axis.text.x = element_text(angle=90))
S04 <- ggdraw()+
draw_plot(p2,0,0,.6,1)+
draw_plot(p3,.61,.17,.39,.775)+
draw_grob(legGrob, 0.2, 0, 0.8, 0.17)+
draw_plot_label(x=c(0,.6),y=c(.99,.99),label = c('a','b'))
ggsave(plot = S04, filename = '../output/S04.pdf',width=12,height=6,device = cairo_pdf)
#!/usr/bin/env Rscript
library(grid)
library(gridSVG)
library(grImport2)
library(grConvert)
library(tidyverse)
library(cowplot)
library(hrbrthemes)
karyo <- read.csv('../../0_data/0_resources/F2.karyo.txt',sep='\t') %>%
mutate(GSTART=lag(cumsum(END),n = 1,default = 0),
GEND=GSTART+END,GROUP=rep(letters[1:2],12)) %>%
select(CHROM,