Commit 24f7697f authored by Kosmas Hench's avatar Kosmas Hench

Figgure one corrected

parent 489dd18f
......@@ -196,7 +196,7 @@ fstdata <- read.csv('../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_f
pairclass=paste(pop1,pop2,sep='-'))
# defining the pairwise color palette
clrP <- c('#fb8620','#1b519c','#d93327')
clrP <- c('#1b519c','#fb8620','#d93327')
```
And than generate the bar plot:
......
......@@ -92,7 +92,7 @@ fstdata <- read.csv('../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_f
pop1 = substr(as.character(pair),1,1),
pop2 = substr(as.character(pair),3,3),
pairclass=paste(pop1,pop2,sep='-'))
clrP <- c('#fb8620','#1b519c','#d93327')
clrP <- c('#1b519c','#fb8620','#d93327')
p5 <- ggplot(fstdata,aes(x=x,y=fst,fill=pairclass))+
geom_bar(stat='identity')+
......
This source diff could not be displayed because it is too large. You can view the blob instead.
No preview for this file type
......@@ -201,7 +201,7 @@ fstdata <- read.csv('../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_f
pairclass=paste(pop1,pop2,sep='-'))
# defining the pairwise color palette
clrP <- c('#fb8620','#1b519c','#d93327')
clrP <- c('#1b519c','#fb8620','#d93327')
```
And than generate the bar plot:
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......@@ -380,7 +380,7 @@ ccc&lt;-<span class="kw">rgb</span>(<span class="dv">0</span>,.<span class="dv">
<span class="dt">pairclass=</span><span class="kw">paste</span>(pop1,pop2,<span class="dt">sep=</span><span class="st">&#39;-&#39;</span>))
<span class="co"># defining the pairwise color palette</span>
clrP &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&#39;#fb8620&#39;</span>,<span class="st">&#39;#1b519c&#39;</span>,<span class="st">&#39;#d93327&#39;</span>)</code></pre></div>
clrP &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&#39;#1b519c&#39;</span>,<span class="st">&#39;#fb8620&#39;</span>,<span class="st">&#39;#d93327&#39;</span>)</code></pre></div>
<p>And than generate the bar plot:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">p5 &lt;-<span class="st"> </span><span class="kw">ggplot</span>(fstdata,<span class="kw">aes</span>(<span class="dt">x=</span>x,<span class="dt">y=</span>fst,<span class="dt">fill=</span>pairclass))<span class="op">+</span>
<span class="st"> </span><span class="co"># bar layer</span>
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......@@ -251,7 +251,7 @@ code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Inf
<h1 class="title">Script repository</h1>
<h3 class="subtitle"><em>(Hench <em>et al.</em> supplement)</em></h3>
<h4 class="author"><em>Kosmas Hench</em></h4>
<h4 class="date"><em>2018-12-10</em></h4>
<h4 class="date"><em>2018-12-12</em></h4>
</div>
<div id="intro" class="section level1">
<h1><span class="header-section-number">1</span> Intro</h1>
......
......@@ -2,7 +2,7 @@
title: "Script repository"
subtitle: "(Hench *et al.* supplement)"
author: "Kosmas Hench"
date: "2018-12-10"
date: "2018-12-12"
documentclass: book
bibliography: [book.bib]
biblio-style: apalike
......
[
["index.html", "Script repository 1 Intro 1.1 Data 1.2 Figures 1.3 Background", " Script repository (Hench et al. supplement) Kosmas Hench 2018-12-10 1 Intro This repository contains a cleaned up version of the scripts used in the paper “Association between vision and pigmentation genes during genomic divergence”. It documents the entire progression from raw data to the final manuscript figures. A visual overview of the process is given in Workflow. 1.1 Data The raw data used within the study is stored at the European Nucleotide Archive (ENA). It can be retrieved using the project accesion number PRJEB27858. This includes the raw data used for the genome assembly, the resequencing data used for the population genetic analysis as well as the RNA sequencing data. External data that is used within the scripts can not be provided (eg. the stickleback reference genome) and needs to be accessed independently. 1.2 Figures A more detailed documentation exists for all the figures of the manuscript: F1, F2, F3 &amp; F4 as well as for all the supplementary figures: S01, S02, S03, S05, S06, S07, S08, S09, S10, S11, S12, S13, S14, S15, S16 &amp; S17 The only exception to this is the supplementary figure S04. This figure is a byproduct of the anchoring step during the assembly and was produced by the Allmaps software. Afterwards, Inkscape was used to adjust the coloration and labels of the linkage maps. 1.3 Background All scripts assume two variables to be set within the bash environment: $WORK is assumed to point to the base folder of this repository $SFTWR is a folder that contains all the software dependencies that are used within the scripts The dependencies need to be downloaded and installed separately. The scripts are organized/ numbered in chronological order. Multiple scripts with equal numbers (eg. 2.2.4.pca_bel.sh, 2.2.4.pca_hon.sh &amp; 2.2.4.pca_pan.sh) usually work on parallel branches of the process and can be executed in parallel. In contrast to this, scripts with higher numbers usually depend on the output of scripts with lower numbers and should therefore be executed afterwards. Most of the scripts start with a comment block that defines the requested resources for the used computer cluster: #PBS -l elapstim_req=&lt;runtime&gt; #PBS -l memsz_job=&lt;memory&gt; #PBS -b &lt;threads&gt; #PBS -l cpunum_job=&lt;cores&gt; #PBS -N &lt;job-name&gt; #PBS -q &lt;job-que&gt; #PBS -o &lt;stdout-log&gt;.stdout #PBS -e &lt;stderr-log&gt;.stderr "],
["index.html", "Script repository 1 Intro 1.1 Data 1.2 Figures 1.3 Background", " Script repository (Hench et al. supplement) Kosmas Hench 2018-12-12 1 Intro This repository contains a cleaned up version of the scripts used in the paper “Association between vision and pigmentation genes during genomic divergence”. It documents the entire progression from raw data to the final manuscript figures. A visual overview of the process is given in Workflow. 1.1 Data The raw data used within the study is stored at the European Nucleotide Archive (ENA). It can be retrieved using the project accesion number PRJEB27858. This includes the raw data used for the genome assembly, the resequencing data used for the population genetic analysis as well as the RNA sequencing data. External data that is used within the scripts can not be provided (eg. the stickleback reference genome) and needs to be accessed independently. 1.2 Figures A more detailed documentation exists for all the figures of the manuscript: F1, F2, F3 &amp; F4 as well as for all the supplementary figures: S01, S02, S03, S05, S06, S07, S08, S09, S10, S11, S12, S13, S14, S15, S16 &amp; S17 The only exception to this is the supplementary figure S04. This figure is a byproduct of the anchoring step during the assembly and was produced by the Allmaps software. Afterwards, Inkscape was used to adjust the coloration and labels of the linkage maps. 1.3 Background All scripts assume two variables to be set within the bash environment: $WORK is assumed to point to the base folder of this repository $SFTWR is a folder that contains all the software dependencies that are used within the scripts The dependencies need to be downloaded and installed separately. The scripts are organized/ numbered in chronological order. Multiple scripts with equal numbers (eg. 2.2.4.pca_bel.sh, 2.2.4.pca_hon.sh &amp; 2.2.4.pca_pan.sh) usually work on parallel branches of the process and can be executed in parallel. In contrast to this, scripts with higher numbers usually depend on the output of scripts with lower numbers and should therefore be executed afterwards. Most of the scripts start with a comment block that defines the requested resources for the used computer cluster: #PBS -l elapstim_req=&lt;runtime&gt; #PBS -l memsz_job=&lt;memory&gt; #PBS -b &lt;threads&gt; #PBS -l cpunum_job=&lt;cores&gt; #PBS -N &lt;job-name&gt; #PBS -q &lt;job-que&gt; #PBS -o &lt;stdout-log&gt;.stdout #PBS -e &lt;stderr-log&gt;.stderr "],
["workflow.html", "2 Workflow 2.1 Info", " 2 Workflow 2.1 Info In the following, I give an overview over the individual steps done in during de novo assembly of the barred hamlet (Hypoplectrus puella) genome. Within the scheme, each arrow represents a script (or class of scripts) connecting individual steps of the assembly/ analysis. The respective scripts are store in the folders $WORK/1_assembly_scripts &amp; $WORK/2_reSeq-scripts. For the final visualization (the green squares in panel 2.2.Popgen), additional documentation is provided in the following pages. "],
["figure-1.html", "3 Figure 1 3.1 Summary 3.2 Details of F1.R", " 3 Figure 1 3.1 Summary This is the accessory documentation of Figure 1. The Figure can be recreated by running the R script F1.R: cd $WORK/3_figures/F_scripts Rscript --vanilla F1.R rm Rplots.pdf 3.2 Details of F1.R In the following, the individual steps of the R script are documented. Is an executable R script that depends on a variety of image manipulation and geodata packages. Additionally, the supporting R script (F1.functions.R) in needed: library(sp) library(maptools) library(scales) library(PBSmapping) library(RColorBrewer) library(tidyverse) library(ggforce) library(cowplot) library(scatterpie) library(hrbrthemes) library(cowplot) library(grid) library(gridSVG) library(grImport2) library(grConvert) library(ggmap) library(marmap) source(&quot;../../0_data/0_scripts/F1.functions.R&quot;) 3.2.1 Fig. 1 A) First, we load the data sheet containing the sampling location as well as the population specifics for each sample and sub sample it to include only the samples included in the re-sequencing study. The Coordinates need to be transformed from (deg min sec) to decimal degrees. The sample inventory is summarized over populations and merged with the coordinates later used for the sample pointers. dataAll &lt;- read.csv(&#39;../../0_data/0_resources/F1.sample.txt&#39;,sep=&#39;\\t&#39;) %&gt;% mutate(loc=substrRight(as.character(id),3)) data &lt;- dataAll %&gt;% filter(sample==&quot;sample&quot;) dataSum &lt;- data%&gt;% rowwise()%&gt;% mutate(nn = sum(as.numeric(strsplit(as.character(`Latittude.N`),&#39; &#39;)[[1]])*c(1,1/60,1/3600)), ww = sum(as.numeric(strsplit(as.character(`Longitude.W`),&#39; &#39;)[[1]])*c(1,1/60,1/3600))) %&gt;% group_by(loc) %&gt;% summarise(n=mean(nn,na.rm = T),w=-mean(ww,na.rm = T)) %&gt;% as.data.frame() %&gt;% cbind(.,read.csv(&#39;../../0_data/0_resources/F1.pointer.csv&#39;)) Then, we read in the coordinates of helper points that will be used to draw the bezier curves of the sample pointers. We also set the spacial range of the map. crvs &lt;- read.csv(&#39;../../0_data/0_resources/F1.curve.csv&#39;) xlimW = c(-100,-53); ylimW = c(7,30.8) The base map relies on the world data from the maps package. This data is loaded and clipped to the extend of the plotting area. worldmap = map_data(&quot;world&quot;) names(worldmap) &lt;- c(&quot;X&quot;,&quot;Y&quot;,&quot;PID&quot;,&quot;POS&quot;,&quot;region&quot;,&quot;subregion&quot;) worldmap = clipPolys(worldmap, xlim=xlimW,ylim=ylimW, keepExtra=TRUE) External images are loaded for annotation. This includes the hamlet images, the country flags and legend. # compas and legend compGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/north-cairo.svg&quot;)))) legGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/legend-map-cairo.svg&quot;)))) # hamlets nigGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/nigricans-cairo.svg&quot;)))) pueGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/puella-cairo.svg&quot;)))) uniGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/unicolor-cairo.svg&quot;)))) # flags belGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/belize-cairo.svg&quot;)))) honGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/honduras-cairo.svg&quot;)))) panGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/panama-cairo.svg&quot;)))) # globe and pairwise legend globGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/caribbean-cairo.svg&quot;)))) legPGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/legend_pairs-cairo.svg&quot;)))) We create a summary data table, containing the sample size of each population and the position for the annotation within the map. ypos = c(2,1,0)*1.6; xpos=c(1,0,1)*4.57 sampSize &lt;- data %&gt;% group_by(spec,loc) %&gt;% summarise(n=length(id)) %&gt;% filter(spec!=&#39;gum&#39;) %&gt;% arrange(loc) %&gt;% cbind(y=c(rep(24.4,3)+ypos, # belize rep(13.6,3)+ypos, # panama rep(9.2,3)+ypos), # honduras x=c(rep(-94.7,3)+xpos, # belize rep(-78,3)+xpos, # panama rep(-98.75,3)+xpos)) # honduras Next, we read in the shape files containing the distribution of the hamlet species. One file contains the overlap of the distribution ranges of H. nigricans, H. puella and H. unicolor, while the other contains the distribution range of the whole genus (Hypoplectrus). These distributions rely on the data provided by the Smithsonian Tropical Research Institute that were processed using gimp and qgis (including the Georeferencer Plugin) uS &lt;- readShapeSpatial(&#39;../../0_data/0_resources/F1_shps/F1.npu_distribution.shp&#39;); uS_f &lt;- fortify(uS,region=&quot;DN&quot;); names(uS_f) &lt;- c(&quot;X&quot;,&quot;Y&quot;,&quot;POS&quot;,&quot;hole&quot;,&quot;piece&quot;,&quot;id&quot;,&quot;PID&quot;); uS_f$PID &lt;- as.numeric(uS_f$PID) uS_f &lt;- clipPolys(uS_f, xlim=xlimW,ylim=ylimW, keepExtra=TRUE) hS &lt;- readShapeSpatial(&#39;../../0_data/0_resources/F1_shps/F1.hypoplectrus_distribution.shp&#39;); hS_f &lt;- fortify(hS,region=&quot;DN&quot;); names(hS_f) &lt;- c(&quot;X&quot;,&quot;Y&quot;,&quot;POS&quot;,&quot;hole&quot;,&quot;piece&quot;,&quot;id&quot;,&quot;PID&quot;); hS_f$PID &lt;- as.numeric(hS_f$PID) hS_f &lt;- clipPolys(hS_f, xlim=xlimW,ylim=ylimW, keepExtra=TRUE) We set some colors used for plotting: clrShps &lt;- c(&quot;#ffc097&quot;,&quot;#b1d4f8&quot;) # hamlet distributions cFILL &lt;- rgb(.7,.7,.7) # landmass ccc&lt;-rgb(0,.4,.8) # sample pointers Now, we got everything needed for the base map (Fig. 1 a): p1 &lt;- ggplot()+ # set coordinates coord_map(projection = &#39;mercator&#39;,xlim=xlimW,ylim=ylimW)+ # axes labels labs(y=&quot;Latitude&quot;,x=&quot;Longitude&quot;) + # preset theme theme_mapK + # landmass layer geom_polygon(data=worldmap,aes(X,Y,group=PID),fill = cFILL, col=rgb(0,0,0),lwd=.2)+ # genus distribution layer geom_polygon(data=hS_f,aes(X,Y,group=PID),fill=clrShps[2],col=rgb(.3,.3,.3),lwd=.2)+ # species distribution layer geom_polygon(data=uS_f,aes(X,Y,group=PID),fill=clrShps[1],col=clrShps[1],lwd=.2)+ # point background geom_point(data=sampSize,aes(x=x,y=y),col=&#39;white&#39;,size=7)+ # sample size annotation geom_text(data=sampSize,aes(x=x,y=y,label=n),size=3)+ # sample pointer curves geom_bezier(data = crvs,aes(x= x, y = y, group = loc),col=ccc,lwd=.4) + # sample pointer points geom_point(data = crvs[c(3,6,9),],aes(x= x, y = y, group = loc),size=5,col=ccc,shape=21,fill=&#39;white&#39;)+ # sample pointer labels geom_text(data = crvs[c(3,6,9),],aes(x= x, y = y, label=c(&#39;B&#39;,&#39;P&#39;,&#39;H&#39;)),size=3)+ # sample point anchors geom_point(data=dataSum,aes(x=w,y=n),size=1.5,shape=21,fill=&#39;white&#39;) 3.2.2 Fig. 1 B) Next we create the sub figure containing the FST information (Fig. 1 B). We first read in the data and define the color palette. fstdata &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_fst.txt&#39;, sep=&#39;\\t&#39;,skip = 1,head=F,col.names = c(&#39;pair&#39;,&#39;fst&#39;)) %&gt;% filter(!pair %in% c(&#39;NU&#39;,&#39;PN&#39;,&#39;PU&#39;)) %&gt;% arrange(fst) %&gt;% mutate(x=row_number()) %&gt;% rowwise() %&gt;% mutate(loc = substr(as.character(pair),2,2), pop1 = substr(as.character(pair),1,1), pop2 = substr(as.character(pair),3,3), pairclass=paste(pop1,pop2,sep=&#39;-&#39;)) # defining the pairwise color palette clrP &lt;- c(&#39;#fb8620&#39;,&#39;#1b519c&#39;,&#39;#d93327&#39;) And than generate the bar plot: p5 &lt;- ggplot(fstdata,aes(x=x,y=fst,fill=pairclass))+ # bar layer geom_bar(stat=&#39;identity&#39;)+ # adding annotations annotation_custom(legPGrob,xmin =0.15,xmax =5,ymin =.008,ymax = .035)+ # set color palette scale_fill_manual(values = clrP)+ # y label and axis scale_y_continuous(name=expression(italic(&quot;F&quot;[&#39;ST&#39;])), expand = c(0,0))+ # x labels and axis scale_x_continuous(expand = c(.01,0),breaks = 1:9, labels = c(&#39;H&#39;,&#39;H&#39;,&#39;B&#39;,&#39;H&#39;,&#39;P&#39;,&#39;B&#39;,&#39;B&#39;,&#39;P&#39;,&#39;P&#39;))+ # preset theme theme_mapK+ # theme adjustments theme(legend.position = &#39;none&#39;, panel.grid.major.y = element_line(color=rgb(.9,.9,.9)), axis.title.x = element_blank(), axis.text.x = element_text(size = 8)); 3.2.3 Fig. 1 C) The last sub figure contains the PCA output for each location. For those, we define the color palette for hamlet species and read in the PCA data generated by pcadapt. clr&lt;-c(&#39;#000000&#39;,&#39;#d45500&#39;,&#39;#000000&#39;) fll&lt;-c(&#39;#000000&#39;,&#39;#d45500&#39;,&#39;#ffffff&#39;) eVclr &lt;- c(rep(&#39;darkred&#39;,2),rep(cFILL,8)) # Belize data belPCA &lt;- readRDS(&#39;../../2_output/08_popGen/04_pca/bel/belpca.Rds&#39;) # Honduras data honPCA &lt;- readRDS(&#39;../../2_output/08_popGen/04_pca/hon/honpca.Rds&#39;) # Panama data panPCA &lt;- readRDS(&#39;../../2_output/08_popGen/04_pca/pan/panpca.Rds&#39;) The data for each PCA is merged with metadata about the samples and than plotted. Further, the explained variance of each PC is extracted for the axis labels. # Honduras PCA (data) dataHON &lt;- cbind(dataAll %&gt;% filter(sample==&#39;sample&#39;,loc==&#39;hon&#39;)%&gt;% select(id,spec), honPCA$scores); names(dataHON)[3:12]&lt;- paste(&#39;PC&#39;,1:10,sep=&#39;&#39;) exp_varHON &lt;- (honPCA$singular.values[1:10])^2/length(honPCA$maf) xlabHON &lt;- paste(&#39;PC1 (&#39;,sprintf(&quot;%.1f&quot;,exp_varHON[1]*100),&#39;%)&#39;)# explained varinace)&#39;) ylabHON &lt;- paste(&#39;PC2 (&#39;,sprintf(&quot;%.1f&quot;,exp_varHON[2]*100),&#39;%)&#39;)# explained varinace)&#39;) # Honduras PCA (plot) p2 &lt;- ggplot(dataHON,aes(x=PC1,y=PC2,col=spec,fill=spec))+geom_point(size=1.1,shape=21)+ scale_color_manual(values=clr,guide=F)+ ggtitle(&#39;Honduras&#39;)+ scale_fill_manual(values=fll,guide=F)+theme_mapK+ theme(legend.position=&#39;bottom&#39;,plot.title = element_text(size = 11,hjust = .45), panel.border = element_rect(color=rgb(.9,.9,.9),fill=rgb(1,1,1,0)), axis.title.y = element_text(vjust = -7), axis.title.x = element_text(vjust = 6), plot.margin = unit(rep(-10,4),&#39;pt&#39;))+#coord_equal()+ scale_x_continuous(name=xlabHON,breaks = c(-.2,.5))+ scale_y_continuous(name=ylabHON,breaks = c(-.4,.2)) # Belize PCA (data) dataBEL &lt;- cbind(dataAll %&gt;% filter(sample==&#39;sample&#39;,loc==&#39;bel&#39;) %&gt;% select(id,spec), belPCA$scores); names(dataBEL)[3:12]&lt;- paste(&#39;PC&#39;,1:10,sep=&#39;&#39;) exp_varBEL &lt;- (belPCA$singular.values[1:10])^2/length(belPCA$maf) xlabBEL &lt;- paste(&#39;PC1 (&#39;,sprintf(&quot;%.1f&quot;,exp_varBEL[1]*100),&#39;%)&#39;) ylabBEL &lt;- paste(&#39;PC2 (&#39;,sprintf(&quot;%.1f&quot;,exp_varBEL[2]*100),&#39;%)&#39;) # Belize PCA (plot) p3 &lt;- ggplot(dataBEL,aes(x=PC1,y=PC2,col=spec,fill=spec))+geom_point(size=1.1,shape=21)+ scale_color_manual(values=clr,guide=F)+ ggtitle(&#39;Belize&#39;)+ scale_fill_manual(values=fll,guide=F)+theme_mapK+ theme(legend.position=&#39;bottom&#39;,plot.title = element_text(size = 11,hjust = .45), panel.border = element_rect(color=rgb(.9,.9,.9),fill=rgb(1,1,1,0)), axis.title.y = element_text(vjust = -7), axis.title.x = element_text(vjust = 6), plot.margin = unit(rep(-10,4),&#39;pt&#39;))+ scale_x_continuous(name=xlabBEL,breaks = c(-.15,.25))+ scale_y_continuous(name=ylabBEL,breaks = c(-.3,.2)) # Panama PCA (data) dataPAN &lt;- cbind(dataAll %&gt;% filter(sample==&#39;sample&#39;,loc==&#39;boc&#39;,spec!=&#39;gum&#39;)%&gt;% select(id,spec), panPCA$scores); names(dataPAN)[3:12]&lt;- paste(&#39;PC&#39;,1:10,sep=&#39;&#39;) exp_varPAN &lt;- (panPCA$singular.values[1:10])^2/length(panPCA$maf) xlabPAN &lt;- paste(&#39;PC1 (&#39;,sprintf(&quot;%.1f&quot;,exp_varPAN[1]*100),&#39;%)&#39;) ylabPAN &lt;- paste(&#39;PC2 (&#39;,sprintf(&quot;%.1f&quot;,exp_varPAN[2]*100),&#39;%)&#39;) # Panama PCA (plot) p4 &lt;- ggplot(dataPAN,aes(x=PC1,y=PC2,col=spec,fill=spec))+geom_point(size=1.1,shape=21)+ scale_color_manual(values=clr,guide=F)+ ggtitle(&#39;Panama&#39;)+ scale_fill_manual(values=fll,guide=F)+theme_mapK+ theme(legend.position=&#39;bottom&#39;,plot.title = element_text(size = 11,hjust = .45), panel.border = element_rect(color=rgb(.9,.9,.9),fill=rgb(1,1,1,0)), axis.title.y = element_text(vjust = -7), axis.title.x = element_text(vjust = 6), plot.margin = unit(rep(-10,4),&#39;pt&#39;))+ scale_x_continuous(name=xlabPAN,breaks = c(-.15,.2))+ scale_y_continuous(name=ylabPAN,breaks = c(-.2,.2)) 3.2.4 Fig 1 (complete) Finally, we have all the pieces and can put together Fig. 1. We need a few helper variables for the placing and scaling of the annotations: # scaling (x,y) hSC &lt;- c(.07,.07) # hamlet placing (within group) yf &lt;- c(0.07,0.035,0.00);xf&lt;-c(0.07,0,0.07) # hamlet group placing yS &lt;- c(.485,.577,.805);xS&lt;-c(.09,.417,.156) Using ggdraw() form the cowplot package, the figure is put together. F1 &lt;- ggdraw()+ # include subplots draw_plot(plot = p1,0,.38,.81,.64)+ draw_plot(plot = p5,0,.01,.24,.35)+ draw_plot(plot = p2,.25,.01,.23,.37)+ draw_plot(plot = p3,.5,.01,.23,.37)+ draw_plot(plot = p4,.75,.01,.23,.37)+ # annotations (legend) draw_grob(legGrob, .375,.395,.65,.65)+ # annotations (compas) draw_grob(compGrob, 0.605, 0.482, 0.06, 0.06)+ # annotations (hamlets) draw_grob(nigGrob, xf[1]+xS[1], yf[1]+yS[1], hSC[1], hSC[2])+ # Honduras draw_grob(pueGrob, xf[2]+xS[1], yf[2]+yS[1], hSC[1], hSC[2])+ # Honduras draw_grob(uniGrob, xf[3]+xS[1], yf[3]+yS[1], hSC[1], hSC[2])+ # Honduras draw_grob(nigGrob, xf[1]+xS[2], yf[1]+yS[2], hSC[1], hSC[2])+ # Panama draw_grob(pueGrob, xf[2]+xS[2], yf[2]+yS[2], hSC[1], hSC[2])+ # Panama draw_grob(uniGrob, xf[3]+xS[2], yf[3]+yS[2], hSC[1], hSC[2])+ # Panama draw_grob(nigGrob, xf[1]+xS[3], yf[1]+yS[3], hSC[1], hSC[2])+ # Belize draw_grob(pueGrob, xf[2]+xS[3], yf[2]+yS[3], hSC[1], hSC[2])+ # Belize draw_grob(uniGrob, xf[3]+xS[3], yf[3]+yS[3], hSC[1], hSC[2])+ # Belize # annotations (flags) draw_grob(panGrob, 0.3, 0.485, 0.042, 0.042)+ draw_grob(belGrob, 0.2, 0.67, 0.042, 0.042)+ draw_grob(honGrob, 0.28, 0.61, 0.042, 0.042)+ draw_grob(honGrob, 0.285, 0.37, 0.042, 0.042)+ draw_grob(belGrob, 0.535, 0.37, 0.042, 0.042)+ draw_grob(panGrob, 0.785, 0.37, 0.042, 0.042)+ # annotations (subfigure labels) draw_plot_label(x = c(.001,.001,.24), y = c(.99,.42,.42), label = letters[1:3]) It is then exported using ggsave(). ggsave(plot = F1,filename = &#39;../output/F1.pdf&#39;,width = 183,height = 145,units = &#39;mm&#39;,device = cairo_pdf) "],
["figure-1.html", "3 Figure 1 3.1 Summary 3.2 Details of F1.R", " 3 Figure 1 3.1 Summary This is the accessory documentation of Figure 1. The Figure can be recreated by running the R script F1.R: cd $WORK/3_figures/F_scripts Rscript --vanilla F1.R rm Rplots.pdf 3.2 Details of F1.R In the following, the individual steps of the R script are documented. Is an executable R script that depends on a variety of image manipulation and geodata packages. Additionally, the supporting R script (F1.functions.R) in needed: library(sp) library(maptools) library(scales) library(PBSmapping) library(RColorBrewer) library(tidyverse) library(ggforce) library(cowplot) library(scatterpie) library(hrbrthemes) library(cowplot) library(grid) library(gridSVG) library(grImport2) library(grConvert) library(ggmap) library(marmap) source(&quot;../../0_data/0_scripts/F1.functions.R&quot;) 3.2.1 Fig. 1 A) First, we load the data sheet containing the sampling location as well as the population specifics for each sample and sub sample it to include only the samples included in the re-sequencing study. The Coordinates need to be transformed from (deg min sec) to decimal degrees. The sample inventory is summarized over populations and merged with the coordinates later used for the sample pointers. dataAll &lt;- read.csv(&#39;../../0_data/0_resources/F1.sample.txt&#39;,sep=&#39;\\t&#39;) %&gt;% mutate(loc=substrRight(as.character(id),3)) data &lt;- dataAll %&gt;% filter(sample==&quot;sample&quot;) dataSum &lt;- data%&gt;% rowwise()%&gt;% mutate(nn = sum(as.numeric(strsplit(as.character(`Latittude.N`),&#39; &#39;)[[1]])*c(1,1/60,1/3600)), ww = sum(as.numeric(strsplit(as.character(`Longitude.W`),&#39; &#39;)[[1]])*c(1,1/60,1/3600))) %&gt;% group_by(loc) %&gt;% summarise(n=mean(nn,na.rm = T),w=-mean(ww,na.rm = T)) %&gt;% as.data.frame() %&gt;% cbind(.,read.csv(&#39;../../0_data/0_resources/F1.pointer.csv&#39;)) Then, we read in the coordinates of helper points that will be used to draw the bezier curves of the sample pointers. We also set the spacial range of the map. crvs &lt;- read.csv(&#39;../../0_data/0_resources/F1.curve.csv&#39;) xlimW = c(-100,-53); ylimW = c(7,30.8) The base map relies on the world data from the maps package. This data is loaded and clipped to the extend of the plotting area. worldmap = map_data(&quot;world&quot;) names(worldmap) &lt;- c(&quot;X&quot;,&quot;Y&quot;,&quot;PID&quot;,&quot;POS&quot;,&quot;region&quot;,&quot;subregion&quot;) worldmap = clipPolys(worldmap, xlim=xlimW,ylim=ylimW, keepExtra=TRUE) External images are loaded for annotation. This includes the hamlet images, the country flags and legend. # compas and legend compGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/north-cairo.svg&quot;)))) legGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/legend-map-cairo.svg&quot;)))) # hamlets nigGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/nigricans-cairo.svg&quot;)))) pueGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/puella-cairo.svg&quot;)))) uniGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/unicolor-cairo.svg&quot;)))) # flags belGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/belize-cairo.svg&quot;)))) honGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/honduras-cairo.svg&quot;)))) panGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/panama-cairo.svg&quot;)))) # globe and pairwise legend globGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/caribbean-cairo.svg&quot;)))) legPGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/legend_pairs-cairo.svg&quot;)))) We create a summary data table, containing the sample size of each population and the position for the annotation within the map. ypos = c(2,1,0)*1.6; xpos=c(1,0,1)*4.57 sampSize &lt;- data %&gt;% group_by(spec,loc) %&gt;% summarise(n=length(id)) %&gt;% filter(spec!=&#39;gum&#39;) %&gt;% arrange(loc) %&gt;% cbind(y=c(rep(24.4,3)+ypos, # belize rep(13.6,3)+ypos, # panama rep(9.2,3)+ypos), # honduras x=c(rep(-94.7,3)+xpos, # belize rep(-78,3)+xpos, # panama rep(-98.75,3)+xpos)) # honduras Next, we read in the shape files containing the distribution of the hamlet species. One file contains the overlap of the distribution ranges of H. nigricans, H. puella and H. unicolor, while the other contains the distribution range of the whole genus (Hypoplectrus). These distributions rely on the data provided by the Smithsonian Tropical Research Institute that were processed using gimp and qgis (including the Georeferencer Plugin) uS &lt;- readShapeSpatial(&#39;../../0_data/0_resources/F1_shps/F1.npu_distribution.shp&#39;); uS_f &lt;- fortify(uS,region=&quot;DN&quot;); names(uS_f) &lt;- c(&quot;X&quot;,&quot;Y&quot;,&quot;POS&quot;,&quot;hole&quot;,&quot;piece&quot;,&quot;id&quot;,&quot;PID&quot;); uS_f$PID &lt;- as.numeric(uS_f$PID) uS_f &lt;- clipPolys(uS_f, xlim=xlimW,ylim=ylimW, keepExtra=TRUE) hS &lt;- readShapeSpatial(&#39;../../0_data/0_resources/F1_shps/F1.hypoplectrus_distribution.shp&#39;); hS_f &lt;- fortify(hS,region=&quot;DN&quot;); names(hS_f) &lt;- c(&quot;X&quot;,&quot;Y&quot;,&quot;POS&quot;,&quot;hole&quot;,&quot;piece&quot;,&quot;id&quot;,&quot;PID&quot;); hS_f$PID &lt;- as.numeric(hS_f$PID) hS_f &lt;- clipPolys(hS_f, xlim=xlimW,ylim=ylimW, keepExtra=TRUE) We set some colors used for plotting: clrShps &lt;- c(&quot;#ffc097&quot;,&quot;#b1d4f8&quot;) # hamlet distributions cFILL &lt;- rgb(.7,.7,.7) # landmass ccc&lt;-rgb(0,.4,.8) # sample pointers Now, we got everything needed for the base map (Fig. 1 a): p1 &lt;- ggplot()+ # set coordinates coord_map(projection = &#39;mercator&#39;,xlim=xlimW,ylim=ylimW)+ # axes labels labs(y=&quot;Latitude&quot;,x=&quot;Longitude&quot;) + # preset theme theme_mapK + # landmass layer geom_polygon(data=worldmap,aes(X,Y,group=PID),fill = cFILL, col=rgb(0,0,0),lwd=.2)+ # genus distribution layer geom_polygon(data=hS_f,aes(X,Y,group=PID),fill=clrShps[2],col=rgb(.3,.3,.3),lwd=.2)+ # species distribution layer geom_polygon(data=uS_f,aes(X,Y,group=PID),fill=clrShps[1],col=clrShps[1],lwd=.2)+ # point background geom_point(data=sampSize,aes(x=x,y=y),col=&#39;white&#39;,size=7)+ # sample size annotation geom_text(data=sampSize,aes(x=x,y=y,label=n),size=3)+ # sample pointer curves geom_bezier(data = crvs,aes(x= x, y = y, group = loc),col=ccc,lwd=.4) + # sample pointer points geom_point(data = crvs[c(3,6,9),],aes(x= x, y = y, group = loc),size=5,col=ccc,shape=21,fill=&#39;white&#39;)+ # sample pointer labels geom_text(data = crvs[c(3,6,9),],aes(x= x, y = y, label=c(&#39;B&#39;,&#39;P&#39;,&#39;H&#39;)),size=3)+ # sample point anchors geom_point(data=dataSum,aes(x=w,y=n),size=1.5,shape=21,fill=&#39;white&#39;) 3.2.2 Fig. 1 B) Next we create the sub figure containing the FST information (Fig. 1 B). We first read in the data and define the color palette. fstdata &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_fst.txt&#39;, sep=&#39;\\t&#39;,skip = 1,head=F,col.names = c(&#39;pair&#39;,&#39;fst&#39;)) %&gt;% filter(!pair %in% c(&#39;NU&#39;,&#39;PN&#39;,&#39;PU&#39;)) %&gt;% arrange(fst) %&gt;% mutate(x=row_number()) %&gt;% rowwise() %&gt;% mutate(loc = substr(as.character(pair),2,2), pop1 = substr(as.character(pair),1,1), pop2 = substr(as.character(pair),3,3), pairclass=paste(pop1,pop2,sep=&#39;-&#39;)) # defining the pairwise color palette clrP &lt;- c(&#39;#1b519c&#39;,&#39;#fb8620&#39;,&#39;#d93327&#39;) And than generate the bar plot: p5 &lt;- ggplot(fstdata,aes(x=x,y=fst,fill=pairclass))+ # bar layer geom_bar(stat=&#39;identity&#39;)+ # adding annotations annotation_custom(legPGrob,xmin =0.15,xmax =5,ymin =.008,ymax = .035)+ # set color palette scale_fill_manual(values = clrP)+ # y label and axis scale_y_continuous(name=expression(italic(&quot;F&quot;[&#39;ST&#39;])), expand = c(0,0))+ # x labels and axis scale_x_continuous(expand = c(.01,0),breaks = 1:9, labels = c(&#39;H&#39;,&#39;H&#39;,&#39;B&#39;,&#39;H&#39;,&#39;P&#39;,&#39;B&#39;,&#39;B&#39;,&#39;P&#39;,&#39;P&#39;))+ # preset theme theme_mapK+ # theme adjustments theme(legend.position = &#39;none&#39;, panel.grid.major.y = element_line(color=rgb(.9,.9,.9)), axis.title.x = element_blank(), axis.text.x = element_text(size = 8)); 3.2.3 Fig. 1 C) The last sub figure contains the PCA output for each location. For those, we define the color palette for hamlet species and read in the PCA data generated by pcadapt. clr&lt;-c(&#39;#000000&#39;,&#39;#d45500&#39;,&#39;#000000&#39;) fll&lt;-c(&#39;#000000&#39;,&#39;#d45500&#39;,&#39;#ffffff&#39;) eVclr &lt;- c(rep(&#39;darkred&#39;,2),rep(cFILL,8)) # Belize data belPCA &lt;- readRDS(&#39;../../2_output/08_popGen/04_pca/bel/belpca.Rds&#39;) # Honduras data honPCA &lt;- readRDS(&#39;../../2_output/08_popGen/04_pca/hon/honpca.Rds&#39;) # Panama data panPCA &lt;- readRDS(&#39;../../2_output/08_popGen/04_pca/pan/panpca.Rds&#39;) The data for each PCA is merged with metadata about the samples and than plotted. Further, the explained variance of each PC is extracted for the axis labels. # Honduras PCA (data) dataHON &lt;- cbind(dataAll %&gt;% filter(sample==&#39;sample&#39;,loc==&#39;hon&#39;)%&gt;% select(id,spec), honPCA$scores); names(dataHON)[3:12]&lt;- paste(&#39;PC&#39;,1:10,sep=&#39;&#39;) exp_varHON &lt;- (honPCA$singular.values[1:10])^2/length(honPCA$maf) xlabHON &lt;- paste(&#39;PC1 (&#39;,sprintf(&quot;%.1f&quot;,exp_varHON[1]*100),&#39;%)&#39;)# explained varinace)&#39;) ylabHON &lt;- paste(&#39;PC2 (&#39;,sprintf(&quot;%.1f&quot;,exp_varHON[2]*100),&#39;%)&#39;)# explained varinace)&#39;) # Honduras PCA (plot) p2 &lt;- ggplot(dataHON,aes(x=PC1,y=PC2,col=spec,fill=spec))+geom_point(size=1.1,shape=21)+ scale_color_manual(values=clr,guide=F)+ ggtitle(&#39;Honduras&#39;)+ scale_fill_manual(values=fll,guide=F)+theme_mapK+ theme(legend.position=&#39;bottom&#39;,plot.title = element_text(size = 11,hjust = .45), panel.border = element_rect(color=rgb(.9,.9,.9),fill=rgb(1,1,1,0)), axis.title.y = element_text(vjust = -7), axis.title.x = element_text(vjust = 6), plot.margin = unit(rep(-10,4),&#39;pt&#39;))+#coord_equal()+ scale_x_continuous(name=xlabHON,breaks = c(-.2,.5))+ scale_y_continuous(name=ylabHON,breaks = c(-.4,.2)) # Belize PCA (data) dataBEL &lt;- cbind(dataAll %&gt;% filter(sample==&#39;sample&#39;,loc==&#39;bel&#39;) %&gt;% select(id,spec), belPCA$scores); names(dataBEL)[3:12]&lt;- paste(&#39;PC&#39;,1:10,sep=&#39;&#39;) exp_varBEL &lt;- (belPCA$singular.values[1:10])^2/length(belPCA$maf) xlabBEL &lt;- paste(&#39;PC1 (&#39;,sprintf(&quot;%.1f&quot;,exp_varBEL[1]*100),&#39;%)&#39;) ylabBEL &lt;- paste(&#39;PC2 (&#39;,sprintf(&quot;%.1f&quot;,exp_varBEL[2]*100),&#39;%)&#39;) # Belize PCA (plot) p3 &lt;- ggplot(dataBEL,aes(x=PC1,y=PC2,col=spec,fill=spec))+geom_point(size=1.1,shape=21)+ scale_color_manual(values=clr,guide=F)+ ggtitle(&#39;Belize&#39;)+ scale_fill_manual(values=fll,guide=F)+theme_mapK+ theme(legend.position=&#39;bottom&#39;,plot.title = element_text(size = 11,hjust = .45), panel.border = element_rect(color=rgb(.9,.9,.9),fill=rgb(1,1,1,0)), axis.title.y = element_text(vjust = -7), axis.title.x = element_text(vjust = 6), plot.margin = unit(rep(-10,4),&#39;pt&#39;))+ scale_x_continuous(name=xlabBEL,breaks = c(-.15,.25))+ scale_y_continuous(name=ylabBEL,breaks = c(-.3,.2)) # Panama PCA (data) dataPAN &lt;- cbind(dataAll %&gt;% filter(sample==&#39;sample&#39;,loc==&#39;boc&#39;,spec!=&#39;gum&#39;)%&gt;% select(id,spec), panPCA$scores); names(dataPAN)[3:12]&lt;- paste(&#39;PC&#39;,1:10,sep=&#39;&#39;) exp_varPAN &lt;- (panPCA$singular.values[1:10])^2/length(panPCA$maf) xlabPAN &lt;- paste(&#39;PC1 (&#39;,sprintf(&quot;%.1f&quot;,exp_varPAN[1]*100),&#39;%)&#39;) ylabPAN &lt;- paste(&#39;PC2 (&#39;,sprintf(&quot;%.1f&quot;,exp_varPAN[2]*100),&#39;%)&#39;) # Panama PCA (plot) p4 &lt;- ggplot(dataPAN,aes(x=PC1,y=PC2,col=spec,fill=spec))+geom_point(size=1.1,shape=21)+ scale_color_manual(values=clr,guide=F)+ ggtitle(&#39;Panama&#39;)+ scale_fill_manual(values=fll,guide=F)+theme_mapK+ theme(legend.position=&#39;bottom&#39;,plot.title = element_text(size = 11,hjust = .45), panel.border = element_rect(color=rgb(.9,.9,.9),fill=rgb(1,1,1,0)), axis.title.y = element_text(vjust = -7), axis.title.x = element_text(vjust = 6), plot.margin = unit(rep(-10,4),&#39;pt&#39;))+ scale_x_continuous(name=xlabPAN,breaks = c(-.15,.2))+ scale_y_continuous(name=ylabPAN,breaks = c(-.2,.2)) 3.2.4 Fig 1 (complete) Finally, we have all the pieces and can put together Fig. 1. We need a few helper variables for the placing and scaling of the annotations: # scaling (x,y) hSC &lt;- c(.07,.07) # hamlet placing (within group) yf &lt;- c(0.07,0.035,0.00);xf&lt;-c(0.07,0,0.07) # hamlet group placing yS &lt;- c(.485,.577,.805);xS&lt;-c(.09,.417,.156) Using ggdraw() form the cowplot package, the figure is put together. F1 &lt;- ggdraw()+ # include subplots draw_plot(plot = p1,0,.38,.81,.64)+ draw_plot(plot = p5,0,.01,.24,.35)+ draw_plot(plot = p2,.25,.01,.23,.37)+ draw_plot(plot = p3,.5,.01,.23,.37)+ draw_plot(plot = p4,.75,.01,.23,.37)+ # annotations (legend) draw_grob(legGrob, .375,.395,.65,.65)+ # annotations (compas) draw_grob(compGrob, 0.605, 0.482, 0.06, 0.06)+ # annotations (hamlets) draw_grob(nigGrob, xf[1]+xS[1], yf[1]+yS[1], hSC[1], hSC[2])+ # Honduras draw_grob(pueGrob, xf[2]+xS[1], yf[2]+yS[1], hSC[1], hSC[2])+ # Honduras draw_grob(uniGrob, xf[3]+xS[1], yf[3]+yS[1], hSC[1], hSC[2])+ # Honduras draw_grob(nigGrob, xf[1]+xS[2], yf[1]+yS[2], hSC[1], hSC[2])+ # Panama draw_grob(pueGrob, xf[2]+xS[2], yf[2]+yS[2], hSC[1], hSC[2])+ # Panama draw_grob(uniGrob, xf[3]+xS[2], yf[3]+yS[2], hSC[1], hSC[2])+ # Panama draw_grob(nigGrob, xf[1]+xS[3], yf[1]+yS[3], hSC[1], hSC[2])+ # Belize draw_grob(pueGrob, xf[2]+xS[3], yf[2]+yS[3], hSC[1], hSC[2])+ # Belize draw_grob(uniGrob, xf[3]+xS[3], yf[3]+yS[3], hSC[1], hSC[2])+ # Belize # annotations (flags) draw_grob(panGrob, 0.3, 0.485, 0.042, 0.042)+ draw_grob(belGrob, 0.2, 0.67, 0.042, 0.042)+ draw_grob(honGrob, 0.28, 0.61, 0.042, 0.042)+ draw_grob(honGrob, 0.285, 0.37, 0.042, 0.042)+ draw_grob(belGrob, 0.535, 0.37, 0.042, 0.042)+ draw_grob(panGrob, 0.785, 0.37, 0.042, 0.042)+ # annotations (subfigure labels) draw_plot_label(x = c(.001,.001,.24), y = c(.99,.42,.42), label = letters[1:3]) It is then exported using ggsave(). ggsave(plot = F1,filename = &#39;../output/F1.pdf&#39;,width = 183,height = 145,units = &#39;mm&#39;,device = cairo_pdf) "],
["figure-2.html", "4 Figure 2 4.1 Summary 4.2 Details of F2.R", " 4 Figure 2 4.1 Summary This is the accessory documentation of Figure 2. The Figure can be recreated by running the R script F2.R: cd $WORK/3_figures/F_scripts Rscript --vanilla F2.R rm Rplots.pdf 4.2 Details of F2.R In the following, the individual steps of the R script are documented. Is an executable R script that depends on a variety of image manipulation and data managing packages. library(grid) library(gridSVG) library(grImport2) library(grConvert) library(tidyverse) library(cowplot) library(hrbrthemes) 4.2.1 Reading in Data Figure 2 depends on the results of the FST computation by VCFtools in broad resolution (50 kb windows, 5 kb increments). The results of the individual species pairs are loaded individually, the genomic positions added to the loci and details about the species comparisons added. Further more, the extent of the individual LGs in loaded. karyo &lt;- read.csv(&#39;../../0_data/0_resources/F2.karyo.txt&#39;,sep=&#39;\\t&#39;) %&gt;% mutate(GSTART=lag(cumsum(END),n = 1,default = 0), GEND=GSTART+END,GROUP=rep(letters[1:2],12)) %&gt;% select(CHROM,GSTART,GEND,GROUP) # Global ------------- pn &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/pue-nig.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PN&#39;,RUN=&#39;PN&#39;); pu &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/pue-uni.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PU&#39;,RUN=&#39;PU&#39;); nu &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/nig-uni.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;NU&#39;,RUN=&#39;NU&#39;); # Panama ------------- ppnp &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/pueboc-nigboc.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PN&#39;,RUN=&#39;PPNP&#39;); ppup &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/pueboc-uniboc.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PU&#39;,RUN=&#39;PPUP&#39;); npup &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/nigboc-uniboc.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;NU&#39;,RUN=&#39;NPUP&#39;); # Belize ------------- pbnb &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/puebel-nigbel.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PN&#39;,RUN=&#39;PBNB&#39;); pbub &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/puebel-unibel.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PU&#39;,RUN=&#39;PBUB&#39;); nbub &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/nigbel-unibel.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;NU&#39;,RUN=&#39;NBUB&#39;); # Honduras ------------- phnh &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/puehon-nighon.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PN&#39;,RUN=&#39;PHNH&#39;); phuh &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/puehon-unihon.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;PU&#39;,RUN=&#39;PHUH&#39;); nhuh &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/nighon-unihon.50kb.5kb.windowed.weir.fst&#39;,sep=&#39;\\t&#39;) %&gt;% merge(.,(karyo %&gt;% select(-GEND,-GROUP)),by=&#39;CHROM&#39;,allx=T) %&gt;% mutate(POS=(BIN_START+BIN_END)/2,GPOS=POS+GSTART,COL=&#39;NU&#39;,RUN=&#39;NHUH&#39;); The individual species comparisons are combined into a single data set. Then the order of the species comparisons and of the comparison types are set. data &lt;- rbind(pn,pu,nu,ppnp,ppup,npup,pbnb,pbub,nbub,phnh,phuh,nhuh) data$RUN &lt;- factor(as.character(data$RUN), levels=c(&#39;PN&#39;,&#39;NU&#39;,&#39;PU&#39;,&#39;PPNP&#39;,&#39;NPUP&#39;,&#39;PPUP&#39;,&#39;PBNB&#39;,&#39;NBUB&#39;,&#39;PBUB&#39;,&#39;PHNH&#39;,&#39;NHUH&#39;,&#39;PHUH&#39;)) data$COL &lt;- factor(as.character(data$COL),levels=c(&#39;PN&#39;,&#39;PU&#39;,&#39;NU&#39;)) Then, the genome wide weighted mean of FST is loaded. gwFST &lt;- read.csv(&#39;../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_fst.txt&#39;,sep=&#39;\\t&#39;) gwFST$RUN &lt;- factor(gwFST$run,levels=levels(data$RUN)) 4.2.2 Computing 99.98 FST percentiles The upper 99.98 FST percentile is computed per run. Then windows above this threshold are identified for each global run. threshs &lt;- data %&gt;% group_by(RUN) %&gt;% summarise(thresh=quantile(WEIGHTED_FST,.9998)) data2 &lt;- data %&gt;% filter(RUN %in% levels(data$RUN)[1:3]) %&gt;% merge(.,threshs,by=&#39;RUN&#39;,all.x=T) %&gt;% mutate(OUTL = (WEIGHTED_FST&gt;thresh)) %&gt;% filter(OUTL) %&gt;% group_by(RUN) %&gt;% mutate(CHECK=cumsum(1-(BIN_START-lag(BIN_START,default = 0)==5000)),ID=paste(RUN,&#39;-&#39;,CHECK,sep=&#39;&#39;)) %&gt;% ungroup() %&gt;% group_by(ID) %&gt;% summarise(CHROM=CHROM[1], xmin = min(BIN_START+GSTART), xmax=max(BIN_END+GSTART), xmean=mean(GPOS), LGmin=min(BIN_START), LGmax=min(BIN_END), LGmean=min(POS), RUN=RUN[1],COL=COL[1]) %&gt;% ungroup() %&gt;% mutate(muskS = letters[as.numeric(as.factor(xmin))], musk=LETTERS[c(1,3,4,4,1,2,2,3)]) musks &lt;- data2 %&gt;% group_by(musk) %&gt;% summarise(MUSKmin=min(xmin), MUSKmax=max(xmax), MUSKminLG=min(LGmin), MUSKmaxLG=max(LGmax)) %&gt;% merge(data2,.,by=&#39;musk&#39;) %&gt;% mutate(muskX = ifelse(musk %in% c(&quot;A&quot;,&quot;B&quot;), MUSKmin-1e+07,MUSKmax+1e+07)) 4.2.3 Plotting For plotting, colors,labels and the scaling of the secondary axis are pre-defined. clr &lt;- c(&#39;#fb8620&#39;,&#39;#d93327&#39;,&#39;#1b519c&#39;) annoclr &lt;- rgb(.4,.4,.4) lgclr &lt;- rgb(.9,.9,.9) yl &lt;- expression(italic(&#39;F&#39;[ST])) secScale &lt;- 20 The base plot is done with ggplot(). p1 &lt;- ggplot()+ # background shading for LGs geom_rect(inherit.aes = F,data=karyo,aes(xmin=GSTART,xmax=GEND, ymin=-Inf,ymax=Inf,fill=GROUP))+ # vertical lines for outlier windows geom_vline(data=data2,aes(xintercept=xmean),col=annoclr,lwd=.2)+ # Fst values geom_point(data=data,aes(x=GPOS,y=WEIGHTED_FST,col=COL),size=.01)+ # labels for the outlier windows geom_text(data=musks,aes(x=muskX,y=.65,label=musk))+ # vertical line separating sliding Fst from genome wide Fst geom_vline(data = data.frame(x=567000000),aes(xintercept=x),col=annoclr,lwd=.2)+ # genome wide Fst geom_rect(data=gwFST,aes(xmin=570*10^6,xmax=573*10^6,ymin=0,ymax=gwFST*secScale))+ # layout of y axis and secondary y axis scale_y_continuous(name = yl,limits=c(-.03,.83), breaks=0:4*0.2,labels = c(0,&#39;&#39;,0.4,&#39;&#39;,0.8), sec.axis = sec_axis(~./secScale,labels = c(0,&#39;&#39;,.02,&#39;&#39;,.04)))+ # layout of x axis scale_x_continuous(expand = c(0,0),limits = c(0,577*10^6), breaks = c((karyo$GSTART+karyo$GEND)/2,571*10^6), labels = c(1:24,&quot;g.w.&quot;),position = &quot;top&quot;)+ # set color map scale_color_manual(name=&#39;comparison&#39;,values=clr)+ # set fill of LG backgrounds scale_fill_manual(values = c(NA,lgclr),guide=F)+ # pre-defined plot theme theme_ipsum(grid = F, axis_title_just = &#39;c&#39;, axis_title_family = &#39;bold&#39;, strip_text_size = 9)+ # tweaks of the plot theme theme(plot.margin = unit(c(2,25,40,0),&#39;pt&#39;), panel.spacing.y = unit(3,&#39;pt&#39;), legend.position = &#39;none&#39;, axis.title.x = element_blank(), axis.ticks.y = element_line(color = annoclr), axis.line.y = element_line(color = annoclr), axis.text.y = element_text(margin = unit(c(0,3,0,0),&#39;pt&#39;)), axis.title.y = element_text(vjust = -1), strip.background = element_blank(), strip.text = element_blank())+ # separate the plots of the individual species pairs facet_grid(RUN~.) External images are loaded for annotation. legGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/legend-pw-cairo.svg&quot;)))) carGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/caribbean-cairo.svg&quot;)))) panGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/panama-cairo.svg&quot;)))) belGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/belize-cairo.svg&quot;)))) honGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/honduras-cairo.svg&quot;)))) # variables for annotation placement ys &lt;- .0765 ysc &lt;- .0057 yd &lt;- .2195 boxes = data.frame(x=rep(.9675,4),y=c(ys,ys+yd+ysc,ys+(yd+ysc)*2,ys+(yd+ysc)*3)) 4.2.4 Composing the final figure And finally the base plot is combined with the external annotations. F2 &lt;- ggdraw(p1)+ geom_rect(data = boxes, aes(xmin = x, xmax = x + .02, ymin = y, ymax = y + yd), colour = NA, fill = c(rep(lgclr,3),annoclr))+ draw_label(&#39;Global&#39;, x = .9775, y = boxes$y[4]+.08, size = 9, angle=-90,col=&#39;white&#39;)+ draw_label(&#39;Belize&#39;, x = .9775, y = boxes$y[2]+.08, size = 9, angle=-90)+ draw_label(&#39;Honduras&#39;, x = .9775, y = boxes$y[1]+.08, size = 9, angle=-90)+ draw_label(&#39;Panama&#39;, x = .9775, y = boxes$y[3]+.08, size = 9, angle=-90)+ draw_grob(legGrob, 0.01, 0, 1, 0.08)+ draw_grob(carGrob, 0.954, boxes$y[4]+.78*yd, .045, .045)+ draw_grob(belGrob, 0.954, boxes$y[2]+.78*yd, .045, .045)+ draw_grob(honGrob, 0.954, boxes$y[1]+.78*yd, .045, .045)+ draw_grob(panGrob, 0.954, boxes$y[3]+.78*yd, .045, .045) The FST outlier windows are then exported to be used as reference in other figures. write.table(x = data2, file = &#39;../../0_data/0_resources/fst_outlier_windows.txt&#39;,sep=&#39;\\t&#39;,quote = F,row.names = F) write.table(x = musks, file = &#39;../../0_data/0_resources/fst_outlier_ID.txt&#39;,sep=&#39;\\t&#39;,quote = F,row.names = F) The figure is then exported using ggsave(). ggsave(plot = F2,filename = &#39;../output/F2.png&#39;,width = 183,height = 183,dpi = 200,units = &#39;mm&#39;) #ggsave(&#39;fst_mac02_weight_99.98.eps&#39;,width = 183,height = 183,dpi = 200,units = &#39;mm&#39;) "],
["figure-3.html", "5 Figure 3 5.1 Summary 5.2 Details of F3.R 5.3 Details of F3.plot_fun.R", " 5 Figure 3 5.1 Summary This is the accessory documentation of Figure 3. The Figure can be recreated by running the R script F3.R: cd $WORK/3_figures/F_scripts Rscript --vanilla F3.R rm Rplots.pdf 5.2 Details of F3.R In the following, the individual steps of the R script are documented. Is an executable R script that depends on a variety of image manipulation and data managing and genomic data packages. It Furthermore depends on the R scripts F3.functions.R,F3.plot_fun.R which themselves depend on F3.getDXY.R and F3.getFSTs.R (both located under $WORK/0_data/0_scripts). library(grid) library(gridSVG) library(grImport2) library(grConvert) library(tidyverse) library(rtracklayer) library(hrbrthemes) library(cowplot) require(rtracklayer) source(&#39;../../0_data/0_scripts/F3.functions.R&#39;); source(&#39;../../0_data/0_scripts/F3.plot_fun.R&#39;) The script F3.plot_fun.Rcontains a function (create_K_plot()) that plots a single genomic range (a single sub figure) as seen in Figure 3. The Details of this script are explained below. Here we apply the create_K_plot() function one for every subplot (a,b,c &amp; d). Within the function we specify the current LG, the annotation file, the extend of the region in question, candidate genes, genes of secondary interest, SNP positions of interest, and the outlier region ID as defined in Figure 2. p1 &lt;- create_K_plot(searchLG = &quot;LG09&quot;,gfffile = &#39;../../1-output/09_gff_from_IKMB/HP.annotation.named.gff&#39;,xr = c(17800000,18000000), searchgene = c(&quot;SOX10_1&quot;),secondary_genes = c(&quot;RNASEH2A&quot;),searchsnp = c(17871737,17872597,17873443), muskID = &#39;A&#39;) p2 &lt;- create_K_plot(searchLG = &quot;LG12&quot;,gfffile = &#39;../../1-output/09_gff_from_IKMB/HP.annotation.named.gff&#39;,xr = c(20080000,20500000), searchgene = c(&quot;Casz1_2&quot;),secondary_genes = c(),searchsnp = c(20316944,20317120,20323661,20323670,20333895,20347263), muskID = &#39;B&#39;) p3 &lt;- create_K_plot(searchLG = &quot;LG12&quot;,gfffile = &#39;../../1-output/09_gff_from_IKMB/HP.annotation.named.gff&#39;,xr = c(22100000,22350000), searchgene = c(&quot;hoxc13a&quot;),secondary_genes = c(&#39;hoxc10a&#39;,&quot;hoxc11a&quot;,&quot;hoxc12a&quot;,&quot;calcoco1_1&quot;), searchsnp = c(), muskID = &#39;C&#39;) p4 &lt;- create_K_plot(searchLG = &quot;LG17&quot;,gfffile = &#39;../../1-output/09_gff_from_IKMB/HP.annotation.named.gff&#39;,xr = c(22500000,22670000), searchgene = c(&#39;LWS&#39;,&quot;SWS2abeta&quot;,&quot;SWS2aalpha&quot;,&quot;SWS2b&quot;),searchsnp = c(22553970,22561903,22566254), secondary_genes = c(&quot;Hcfc1&quot;,&quot;HCFC1_2&quot;,&quot;HCFC1_1&quot;,&quot;GNL3L&quot;,&quot;TFE3_0&quot;,&quot;MDFIC2_1&quot;,&quot;CXXC1_3&quot;,&quot;CXXC1_1&quot;,&#39;Mbd1&#39;,&#39;CCDC120&#39;), muskID = &#39;D&#39;) An individual result will look like this (p1): We read in the external legend: legGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/legend-pw-cairo.svg&quot;)))) And combine the individual subplots with the legend: F3 &lt;- plot_grid(NULL,NULL,NULL,NULL, p1,NULL,p2,NULL, NULL,NULL,NULL,NULL, p3,NULL,p4,NULL, NULL,NULL,NULL,NULL, ncol=4,rel_heights = c(.03,1,.03,1,.1),rel_widths = c(1,.025,1,.02), labels = c(&#39;a&#39;,&#39;&#39;,&#39;b&#39;,&#39;&#39;, &#39;&#39;,&#39;&#39;,&#39;&#39;,&#39;&#39;, &#39;c&#39;,&#39;&#39;,&#39;d&#39;,&#39;&#39;, &#39;&#39;,&#39;&#39;,&#39;&#39;,&#39;&#39;, &#39;&#39;,&#39;&#39;,&#39;&#39;,&#39;&#39;),label_size = 10)+ draw_grob(legGrob, 0.1, 0, .8, 0.04) The final figure is then exported using ggsave(). ggsave(plot = F3, filename = &#39;../output/F3.pdf&#39;,width = 183,height = 155,units = &#39;mm&#39;,device = cairo_pdf) 5.3 Details of F3.plot_fun.R The actual work for figure 3 is done within the create_K_plot() function. This function loads the data and does the plotting - the F3.R script is basically just a wrapper that selects the setting for each subplot and combines the results. The function starts by importing the functions need to import the different data types (gene models, FST values, GxP p values, dXY) create_K_plot &lt;- function(searchLG,gfffile,xr,searchgene,secondary_genes,searchsnp,muskID){ source(&#39;../../0_data/0_scripts/F3.functions.R&#39;); source(&#39;../../0_data/0_scripts/F3.getFSTs.R&#39;) source(&#39;../../0_data/0_scripts/F3.getDXY.R&#39;) It then sets some global options for the plotting and throws an error if the genomic range is negative. highclr &lt;- &#39;#3bb33b&#39; theme_set(theme_minimal(base_size = 6)) if(xr[1]&gt;xr[2]){print(&#39;error: ill conditioned x-range&#39;)} wdth &lt;- .3 Then the different data types are imported using the previously loaded import functions. # get gene models df_list &lt;- get_anno_df_single_line(searchLG=searchLG,gfffile=gfffile,xrange=xr, genes_of_interest = searchgene,genes_of_sec_interest=secondary_genes,anno_rown=3) # get fst values fst_list &lt;- getFSTS(searchLG,xr,searchsnp,highclr) global_fst&lt;-fst_list$global_fst; data_fst&lt;-fst_list$data_fst # get dxy values dxy_list &lt;- getDXY(searchLG,xr) data_dxy&lt;-dxy_list$data_dxy Then the color themes, axis labels further global plotting settings are set. clr &lt;- c(&#39;#fb8620&#39;,&#39;#1b519c&#39;,&#39;#d93327&#39;) annoclr &lt;- c(&#39;lightgray&#39;,highclr,rgb(.3,.3,.3))[1:3] df_list[[1]] &lt;- df_list[[1]] %&gt;% mutate(label=paste(&quot;italic(&quot;,tolower(Parentgenename),&quot;)&quot;,sep=&#39;&#39;) ) LW &lt;- .3;lS &lt;- 9;tS &lt;- 6 plotSET &lt;- theme(rect = element_blank(), text=element_text(size=tS,color=&#39;black&#39;), panel.grid.major = element_line(colour = rgb(.92,.92,.92)), panel.grid.minor = element_line(colour = rgb(.95,.95,.95)), plot.background = element_blank(), plot.margin = unit(c(3,7,3,3),&#39;pt&#39;), panel.background = element_blank(),#element_rect(colour = &#39;gray&#39;,fill=NULL), legend.position = &#39;none&#39;,#c(.8,.3), legend.direction = &#39;horizontal&#39;, axis.title.y = element_blank(), axis.title.x = element_blank(),#element_text(size=13,face=&#39;bold&#39;,color=&#39;black&#39;), strip.text.y = element_text(size=lS,color=&#39;black&#39;), axis.text.x = element_blank(),#element_text(size=11,color=&#39;black&#39;), strip.placement = &quot;outside&quot;, panel.border = element_blank() ) The individual data tracks are then plotted individually. Later the single tracks are aligned. The first data track contains the gene models from the annotation file as well as the x axis labels (the position on the current LG). p11 &lt;- ggplot()+ # setting the genomic range coord_cartesian(xlim=xr/1000)+ # including the exons as blocks geom_rect(data=df_list[[2]],aes(xmin=ps/1000,xmax=pe/1000,ymin=yl-(wdth/2),ymax=yl+(wdth/2), fill=as.factor(clr),col=as.factor(clr),group=Parent),lwd=.1)+ # including the extend and direction of each mRNA geom_segment(data=(df_list[[1]]%&gt;%filter(strand%in%c(&#39;+&#39;,&#39;-&#39;))), aes(x=ps/1000,xend=pe/1000,y=yl,yend=yl,group=Parent),lwd=.1, arrow=arrow(length=unit(2,&quot;pt&quot;),type=&#39;closed&#39;),color=&#39;black&#39;)+ # including the extend of mRNA (if direction is unknown) geom_segment(data=(df_list[[1]]%&gt;%filter(!strand%in%c(&#39;+&#39;,&#39;-&#39;))), aes(x=ps/1000,xend=pe/1000,y=yl,yend=yl,group=Parent),lwd=.2,color=&#39;black&#39;)+ # include gene name as label geom_text(data=(df_list[[1]]%&gt;%filter(Parentgenename%in%c(searchgene,secondary_genes))), aes(x=labelx/1000,label=label,y=yl-.5),size=1.8,parse=T)+ # this inhereted from ealier versions but kept to allow propper alignment with the other data tracks facet_grid(window~.,scales=&#39;free_y&#39;, switch = &#39;y&#39;,labeller = label_parsed,as.table = T)+ # color settings scale_color_manual(values=annoclr,breaks=c(&quot;x&quot;,&quot;y&quot;,&quot;z&quot;),guide=F)+ scale_fill_manual(values=annoclr,guide=F)+ # axes labels and settings scale_x_continuous(name=paste0(searchLG,&#39; (&#39;,muskID,&#39;, kb)&#39;),expand=c(0,0),position = &#39;top&#39;)+ scale_y_continuous(breaks = seq(0,.75,length.out = 4))+ # plot format adjustments theme(rect = element_blank(), text=element_text(size=tS,color=&#39;black&#39;), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank(), panel.background = element_blank(), legend.position = c(.8,.3), legend.direction = &#39;horizontal&#39;, axis.title.y = element_blank(), axis.title.x = element_text(size=lS,face=&#39;bold&#39;,color=&#39;black&#39;), strip.text.y = element_text(size=lS,color=&#39;black&#39;), axis.text.x = element_text(size=tS,color=&#39;black&#39;), strip.placement = &quot;outside&quot;, panel.border = element_blank()); To achieve proper formatting, the gene labels are adjusted independently depending on the outlier region ID. This is done after plotting to be able to represent special formats within the plot labels (eg. Greek letters AND italics) if(muskID==&#39;a&#39;){ p11$layers[[4]]$data$label &lt;- c(&quot;italic(sox10)&quot;,&quot;italic(rnaseh2a)&quot;) } else if(muskID==&#39;b&#39;){ p11$layers[[4]]$data$label &lt;- c(&quot;italic(casz1^3)&quot;,&quot;italic(casz1^2)&quot;,&quot;italic(casz1^1)&quot;) } else if(muskID==&#39;c&#39;){ p11$layers[[4]]$data$label &lt;- c(&#39;italic(hoxc10a)&#39;,&quot;italic(hoxc11a)&quot;,&quot;italic(hoxc12a)&quot;,&quot;italic(hoxc13a)&quot;,&quot;italic(calcoco1)&quot;) }else if(muskID==&#39;e&#39;){ p11$layers[[4]]$data$label &lt;- c(&quot;italic(polr1d)&quot;,&quot;italic(ednrb)&quot;) } else if(muskID==&#39;f&#39;){ p11$layers[[4]]$data$label &lt;- c(&quot;italic(foxd3)&quot;,&quot;italic(alg6)&quot;,&quot;italic(efcab7)&quot;) } else if(muskID==&#39;d&#39;){ p11$layers[[4]]$data$label &lt;- c(&quot;italic(hcfc1)&quot;,&quot;italic(hcfc1[2])&quot;,&quot;italic(hcfc1[1])&quot;,&#39;italic(paste(sws2a,&quot;\\u03B2&quot;))&#39;, &#39;italic(paste(sws2a,&quot;\\u03B1&quot;))&#39;,&quot;italic(sw2b)&quot;,&quot;italic(lws)&quot;, &quot;italic(gnl3l)&quot;,&quot;italic(tfe3)&quot;,&quot;italic(mdfic2)&quot;,&quot;italic(cxxc1[3])&quot;, &quot;italic(cxxc1[1])&quot;,&#39;italic(mbd1)&#39;,&#39;italic(ccdc120)&#39;) }else if(muskID==&#39;g&#39;){ p11$layers[[4]]$data$label &lt;- c(&quot;italic(lgals3bpb)&quot;,&quot;italic(mpnd)&quot;,&quot;italic(sh3gl1)&quot;,&quot;italic(rorb)&quot;) }else if(muskID==&#39;h&#39;){ p11$layers[[4]]$data$label &lt;- c(&quot;italic(alg2)&quot;,&quot;italic(sec61b)&quot;,&quot;italic(nr4a3)&quot;, &quot;italic(stx17)&quot;,&quot;italic(erp44)&quot;,&quot;italic(invs)&quot;) } The second data track contains the FST values. p12 &lt;- ggplot()+ # setting the genomic range coord_cartesian(xlim = xr)+ # including the global Fst values (SNP) basis geom_point(data = global_fst, aes(x = POS, y = WEIR_AND_COCKERHAM_FST), col = global_fst$clr, size = .2)+ # highlight the SNPs of interest geom_point(data = global_fst, aes(x = POS, y = WEIR_AND_COCKERHAM_FST), col = global_fst$clr2, size = .3)+ # add pairwise fsts as lines (10 kb / 1kb) geom_line(data = (data_fst %&gt;% filter(POS &gt; xr[1],POS&lt;xr[2])) ,aes(x = POS, y = WEIGHTED_FST, col = run), lwd = LW)+ # set color theme scale_color_manual(values = c(clr, annoclr), breaks = c(&quot;nig-pue&quot;, &quot;nig-uni&quot;, &quot;pue-uni&quot;), guide = F)+ scale_fill_manual(values = annoclr, guide = F)+ # this inhereted from ealier versions but kept to allow propper alignment with the other data tracks facet_grid(window~., scales = &#39;free_y&#39;, switch = &#39;y&#39;, labeller = label_parsed, as.table = T)+ # axes settings scale_x_continuous(name = searchLG, expand = c(0, 0), position = &#39;top&#39;)+ scale_y_continuous(breaks = seq(0, .75, length.out = 4))+ scale_linetype(name = &#39;location&#39;, label = c(&#39;Belize&#39;, &#39;Honduras&#39;,&#39;Panama&#39;))+ guides(linetype= guide_legend(override.aes = list(color = &#39;black&#39;)))+ # include pre-set plot theme plotSET The third data track contains the dXY values. p13 &lt;- ggplot()+ # setting the genomic range coord_cartesian(xlim=xr)+ geom_line(data=(data_dxy %&gt;% filter(POS &gt; xr[1],POS&lt;xr[2])) ,aes(x=POS,y=dxy,col=run),lwd=LW)+ # geom_line(data=(data_dxy_pw %&gt;% filter(POS &gt; xr[1],POS&lt;xr[2])) # ,aes(x=POS,y=dxy,col=run,linetype=group),lwd=1)+ scale_color_manual(values=c(clr,annoclr),breaks=c(&quot;nig-pue&quot;,&quot;nig-uni&quot;,&quot;pue-uni&quot;,&quot;x&quot;,&quot;y&quot;,&quot;z&quot;),guide=F)+ # this inhereted from ealier versions but kept to allow propper alignment with the other data tracks facet_grid(window~.,scales=&#39;free_y&#39;, switch = &#39;y&#39;,labeller = label_parsed,as.table = T)+ # axes settings scale_x_continuous(name=searchLG,expand=c(0,0),position = &#39;top&#39;)+ scale_y_continuous(breaks = seq(0,.01,length.out = 5))+ scale_linetype(name=&#39;location&#39;,label=c(&#39;Belize&#39;,&#39;Honduras&#39;,&#39;Panama&#39;))+ guides(linetype= guide_legend(override.aes = list(color = &#39;black&#39;)))+ # include pre-set plot theme plotSET Finally the individual tracks are stacked and aligned using plot_grid() (cowplot package). The complete sub figure is then returned by the function. p2 &lt;- plot_grid(p11,p12,p13, ncol = 1,align = &#39;v&#39;,axis = &#39;r&#39;,rel_heights = c(1.3,rep(1,2))) return(p2)} "],
["figure-4.html", "6 Figure 4 6.1 Summary 6.2 Details of F4.R 6.3 Details of F4.functions.R 6.4 Details of F4.genomeWide_box.R 6.5 Details of F4.peakArea_box.R", " 6 Figure 4 6.1 Summary This is the accessory documentation of Figure 4. The Figure can be recreated by running the R script F4.R: cd $WORK/3_figures/F_scripts Rscript --vanilla F4.R rm Rplots.pdf 6.2 Details of F4.R In the following, the individual steps of the R script are documented. Is an executable R script that depends on a variety of image manipulation and data managing and genomic data packages. It Furthermore depends on the R scripts F4.functions.R,F4.genomeWide_box.R and F4.peakArea_box.R (all located under $WORK/0_data/0_scripts). library(tidyverse) library(scales) library(cowplot) library(grid) library(gridSVG) library(grImport2) source(&#39;../../0_data/0_scripts/F4.functions.R&#39;) The script F4.functions.Rcontains a function (trplot()) that plots a single LD triangle as seen in Figure 4. The Details of this script are explained below. The output depends on the data set plotted (sub figure a contains additional annotations). We create an empty list that is then filled with the subplots using the trplot() function. plts &lt;- list() for(k in 1:7){ plts[[k]] &lt;- trplot(k) } An individual result will look like this (plts[[1]] &amp; plts[[2]]): The basic plots are transformed into grid obgects. These are afterwards rotated by 45 degrees. tN &lt;- theme(legend.position = &#39;none&#39;) pG1 &lt;- ggplotGrob(plts[[1]]+tN) pG2 &lt;- ggplotGrob(plts[[2]]+tN) pG3 &lt;- ggplotGrob(plts[[3]]+tN) pG4 &lt;- ggplotGrob(plts[[4]]+tN) pG5 &lt;- ggplotGrob(plts[[5]]+tN) pG6 &lt;- ggplotGrob(plts[[6]]+tN) pG7 &lt;- ggplotGrob(plts[[7]]+tN) pGr1 &lt;- editGrob(pG1, vp=viewport(x=0.5, y=0.91, angle=45,width = .7), name=&quot;pG1&quot;) pGr2 &lt;- editGrob(pG2, vp=viewport(x=0.25, y=0.535, angle=45,width = .28), name=&quot;pG2&quot;) pGr3 &lt;- editGrob(pG3, vp=viewport(x=0.25, y=0.352, angle=45,width = .28), name=&quot;pG3&quot;) pGr4 &lt;- editGrob(pG4, vp=viewport(x=0.25, y=0.17, angle=45,width = .28), name=&quot;pG4&quot;) pGr5 &lt;- editGrob(pG5, vp=viewport(x=0.75, y=0.535, angle=45,width = .28), name=&quot;pG5&quot;) pGr6 &lt;- editGrob(pG6, vp=viewport(x=0.75, y=0.352, angle=45,width = .28), name=&quot;pG5&quot;) pGr7 &lt;- editGrob(pG7, vp=viewport(x=0.75, y=0.17, angle=45,width = .28), name=&quot;pG5&quot;) Then, the data for the box plots are loaded and plotted within external scripts. source(&#39;../../0_data/0_scripts/F4.genomeWide_box.R&#39;) source(&#39;../../0_data/0_scripts/F4.peakArea_box.R&#39;) Additional annotations are generated and external annotations loaded. Then, variables used for placing the annotations are generated. pGRAD &lt;- ggplot(data = data.frame(x=c(0,1,-1,0), y=c(0,15,15,0)), aes(x,y))+geom_polygon(fill=&#39;lightgray&#39;)+coord_equal()+theme_void() nigGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/nigricans-cairo.svg&quot;)))) pueGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/puella-cairo.svg&quot;)))) uniGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/unicolor-cairo.svg&quot;)))) belGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/belize-cairo.svg&quot;)))) honGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/honduras-cairo.svg&quot;)))) panGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/panama-cairo.svg&quot;)))) globGrob &lt;- gTree(children=gList(pictureGrob(readPicture(&quot;../../0_data/0_img/caribbean-cairo.svg&quot;)))) leg &lt;- get_legend(plts[[1]]+theme(legend.direction = &#39;horizontal&#39;,legend.key.width = unit(60,&#39;pt&#39;))) sY &lt;- .035; sX &lt;- -.03; Finally, the complete Figure 4 is put together. F4 &lt;- ggdraw(pGr1)+ draw_plot(pBOX,-.1,.54,.45,.21)+ draw_plot(boxGenes,.65,.54,.45,.21)+ draw_plot(pGRAD,.307,-.018,.16,.53)+ draw_grob(pGr2,0,0,1,1)+ draw_grob(leg,-.33,.22,1,1)+ draw_grob(pGr3,0,0,1,1)+ draw_grob(pGr4,0,0,1,1)+ draw_grob(pGr5,0,0,1,1)+ draw_grob(pGr6,0,0,1,1)+ draw_grob(pGr7,0,0,1,1)+ draw_grob(nigGrob, 0.85, 0.38, 0.12, 0.095)+ draw_grob(pueGrob, 0.85, 0.2, 0.12, 0.095)+ draw_grob(uniGrob, 0.85, 0, 0.12, 0.095)+ draw_grob(panGrob, 0.35+sX+.01, 0.37+sY, 0.12, 0.045)+ draw_grob(belGrob, 0.35+sX+.01, 0.19+sY, 0.12, 0.045)+ draw_grob(honGrob, 0.35+sX, 0+sY, 0.14, 0.045)+ draw_grob(globGrob, 0.06, .925, 0.08, 0.08)+ draw_plot_label(letters[1:5], x = c(rep(0.01,2),.87,0.01,.54), y=c(.99,.81,.81,.57,.57), size = 14) The final figure is then exported using ggsave(). ggsave(plot = F4,filename = &#39;../output/F4.png&#39;,width = 183,height = 235,units = &#39;mm&#39;,dpi = 150) 6.3 Details of F4.functions.R The script first defines a function for standard error se &lt;- function(x){ x2 &lt;- x[!is.na(x)] return(sd(x2)/sqrt(length(x2)))} And then the plotting function for the LD triangles is defined. Within the trplot() function the sel parameter specifies the selected data set. Then, a lot of annotation elements are generated, data is transformed to match the triangle layout of the plot and the base plot is created. trplot &lt;- function(sel){ # the inventory of LD data files files &lt;- c(&quot;global&quot;,&quot;boc&quot;,&quot;bel&quot;,&quot;hon&quot;,&quot;pue&quot;,&quot;nig&quot;,&quot;uni&quot;,&quot;nig-pue&quot;,&quot;nig-uni&quot;,&quot;pue-uni&quot;) # reading in the LD data data &lt;- read.csv(paste(&#39;../../2_output/08_popGen/07_LD/&#39;,files[sel],&#39;-10000-bins.txt&#39;,sep=&#39;&#39;),sep=&#39;\\t&#39;) # confirm selection message(files[sel]) The genomic positions (start, end, length and spacing between) of the ranges of interest are stored. s1=17620000;s2=19910000;s3=21960000;s4=22320000; e2=20660000;e3=22460000; stp=20000; l1=500000;l2=750000;l3=500000;l4=500000 Then we define a vector containing the x-shift needed to transform the positions of the candidate genes on the respective LGs into the transformed x-axis of the LD triangles scling &lt;- c(-s1,-s2+(l1+stp),-s3+(l1+l2+2*stp),-s4+(l1+l2+l3+(stp*3))) The LD data is read in and positions are transformed. dt &lt;- data %&gt;% mutate(miX = floor(Mx/10000),miY=floor(My/10000)) We generate a data frame that includes the gene positions (taken from the annotation file) the respective entry in our scale transformation vector, the LG and gene name. The genomic positions are then transformed. genes &lt;- data.frame(start=c(17871610,20186151,22225149,22553187,22556763,22561894,22573388), end=c(17873915,20347811,22228342,22555052,22559742,22566321,22575503), sclr=c(1,2,3,4,4,4,4), LG=c(&quot;LG09&quot;,&quot;LG12-1&quot;,&quot;LG12-2&quot;,&quot;LG17&quot;,&quot;LG17&quot;,&quot;LG17&quot;,&quot;LG17&quot;), name=c(&quot;sox10&quot;,&#39;casz1&#39;,&quot;hoxc13a&quot;,&quot;sws2a\\u03B1&quot;,&quot;sws2a\\u03B2&quot;,&quot;sws2b&quot;,&quot;lws&quot;)) genes &lt;- genes %&gt;% mutate(Nx1 = (start+scling[sclr])/10000,Nx2 = (end+scling[sclr])/10000, labPOS = (Nx1+Nx2)/2) The position of some of the gene labels need shifting to avoid overlapping. genes$labPOS[genes$name %in% c(&quot;sws2a\\u03B1&quot;,&quot;sws2a\\u03B2&quot;,&quot;sws2b&quot;,&quot;lws&quot;)] &lt;- genes$labPOS[genes$name %in% c(&quot;sws2a\\u03B1&quot;,&quot;sws2a\\u03B2&quot;,&quot;sws2b&quot;,&quot;lws&quot;)]+c(-16,-1,12,20) Then we define a data frame for the polygons that zoom onto the gene position within the annotation/legend of sub figure a. GZrS &lt;- 40000;GZrS2 &lt;- 20000; #gene zoom offset BS &lt;- c(-5,5,-4,3,-8,8,-20,-10,-6,5,7,16,17,22)*10000 # Backshifter for gene zoom GZdf &lt;- data.frame(x=c(genes$start[1],genes$end[1],genes$end[1]+GZrS+BS[1],genes$start[1]+GZrS+BS[2],genes$start[1], genes$start[2],genes$end[2],genes$end[2]+GZrS+BS[3],genes$start[2]+GZrS+BS[4],genes$start[2], genes$start[3],genes$end[3],genes$end[3]+GZrS+BS[5],genes$start[3]+GZrS+BS[6],genes$start[3], genes$start[4],genes$end[4],genes$end[4]+GZrS+BS[7],genes$start[4]+GZrS+BS[8],genes$start[4], genes$start[5],genes$end[5],genes$end[5]+GZrS+BS[9],genes$start[5]+GZrS+BS[10],genes$start[5], genes$start[6],genes$end[6],genes$end[6]+GZrS+BS[11],genes$start[6]+GZrS+BS[12],genes$start[6], genes$start[7],genes$end[7],genes$end[7]+GZrS+BS[13],genes$start[7]+GZrS+BS[14],genes$start[7])+GZrS2, y=c(genes$start[1],genes$end[1],genes$end[1]-GZrS+BS[1],genes$start[1]-GZrS+BS[2],genes$start[1], genes$start[2],genes$end[2],genes$end[2]-GZrS+BS[3],genes$start[2]-GZrS+BS[4],genes$start[2], genes$start[3],genes$end[3],genes$end[3]-GZrS+BS[5],genes$start[3]-GZrS+BS[6],genes$start[3], genes$start[4],genes$end[4],genes$end[4]-GZrS+BS[7],genes$start[4]-GZrS+BS[8],genes$start[4], genes$start[5],genes$end[5],genes$end[5]-GZrS+BS[9],genes$start[5]-GZrS+BS[10],genes$start[5], genes$start[6],genes$end[6],genes$end[6]-GZrS+BS[11],genes$start[6]-GZrS+BS[12],genes$start[6], genes$start[7],genes$end[7],genes$end[7]-GZrS+BS[13],genes$start[7]-GZrS+BS[14],genes$start[7])-GZrS2, grp=rep(letters[1:7],each=5)) %&gt;% mutate(sclr=rep(c(1,2,3,4,4,4,4),each=5), Nx1 = (x+scling[sclr])/10000, Nx2 = (y+scling[sclr])/10000) Some parameters are set for plotting (base colors for the LD color gradient, gene annotation color, zoom color and y offsets for LG labels, LG background and gene-axis). clr = c(viridis::inferno(5)[c(1,1:5)]) Gcol &lt;- &#39;#3bb33b&#39; Zcol = rgb(.94,.94,.94) DG &lt;- rgb(.4,.4,.4) LGoffset &lt;- 15 GLABoffset &lt;- 8 At this point, the data set selection is checked. If the current selection is the global data set, the additional annotation is included. if(sel %in% c(1,8)){ For the additional annotation, several helpers are needed and defined as data frames. The construction of these is obscured by the fact that in the base plot, the later horizontal bands are plotted tilted by an angle of 45 degrees. Therefore, a simple y offset affects both x and y axis in the base plot. rS &lt;- 100000 # width of grey annotation band zmRange &lt;- data.frame(x=c(s1,s1+l1,s1+l1+rS,s1+rS,s1, s2,s2+l2,s2+l2+rS,s2+rS,s2, s3,s3+l3,s3+l3+rS,s3+rS,s3, s4,s4+l4,s4+l4+rS,s4+rS,s4), y=c(s1,s1+l1,s1+l1-rS,s1-rS,s1, s2,s2+l2,s2+l2-rS,s2-rS,s2, s3,s3+l3,s3+l3-rS,s3-rS,s3, s4,s4+l4,s4+l4-rS,s4-rS,s4), grp=rep(letters[1:4],each=5)) %&gt;% mutate(sclr=rep(1:4,each=5), Nx1 = (x+scling[sclr])/10000-1, Nx2 = (y+scling[sclr])/10000) rS2 &lt;- .75*rS # width of dark grey LG label band zmRange2 &lt;- data.frame(x=c(s1,s1+l1,s1+l1+rS2,s1+rS2,s1, s2,s2+l2,s2+l2+rS2,s2+rS2,s2, s3,s3+l3,s3+l3+rS2,s3+rS2,s3, s4,s4+l4,s4+l4+rS2,s4+rS2,s4), y=c(s1,s1+l1,s1+l1-rS2,s1-rS2,s1, s2,s2+l2,s2+l2-rS2,s2-rS2,s2, s3,s3+l3,s3+l3-rS2,s3-rS2,s3, s4,s4+l4,s4+l4-rS2,s4-rS2,s4), grp=rep(letters[1:4],each=5)) %&gt;% mutate(sclr=rep(1:4,each=5), Nx1 = (x+scling[sclr])/10000-1, Nx2 = (y+scling[sclr])/10000) rS3 &lt;- .07*rS # width of dark grey gene axis zmRange3 &lt;- data.frame(x=c(s1,s1+l1,s1+l1+rS3,s1+rS3,s1, s2,s2+l2,s2+l2+rS3,s2+rS3,s2, s3,s3+l3,s3+l3+rS3,s3+rS3,s3, s4,s4+l4,s4+l4+rS3,s4+rS3,s4), y=c(s1,s1+l1,s1+l1-rS3,s1-rS3,s1, s2,s2+l2,s2+l2-rS3,s2-rS3,s2, s3,s3+l3,s3+l3-rS3,s3-rS3,s3, s4,s4+l4,s4+l4-rS3,s4-rS3,s4), grp=rep(letters[1:4],each=5)) %&gt;% mutate(sclr=rep(1:4,each=5), Nx1 = (x+scling[sclr])/10000-1, Nx2 = (y+scling[sclr])/10000) Then one data frame is defined containing the LG labels and positions and another one containing the borders of the genomic regions. zmLab &lt;- data.frame(x=c(s1+.5*l1,s2+.5*l2,s3+.5*l3,s4+.5*l4), label=c(&#39;LG09 (A)&#39;,&#39;LG12 (B)&#39;,&#39;LG12 (C)&#39;,&#39;LG17 (D)&#39;)) %&gt;% mutate(sclr=1:4, Nx= (x+scling[sclr])/10000) zmEND &lt;- data.frame(x=c(s1,s1+l1, s2,s2+l2, s3,s3+l3, s4,s4+l4)) %&gt;% mutate(sclr=rep(1:4,each=2), Nx = ((x+scling[sclr])/10000)+rep(c(2.5,-4),4), label=round((x/1000000),2)) Three vectors containing text sizes (adjusted to Figure 4 &amp; Suppl. Figure 12) are created. textSCALE1 &lt;- c(1.8,rep(NA,6),.7) textSCALE2 &lt;- c(2,rep(NA,6),1) textSCALE3 &lt;- c(3.5,rep(NA,6),1.75) Then, the base plot is created and stored in p1. # the data set is filtered to remove missing values p1 &lt;- ggplot(dt %&gt;% filter(!is.na(Mval)),aes(fill=Mval))+ # the aspect ratio of x and y scale needs to be fixed to 1:1 coord_equal()+ # adding the light gray band highlighting the genomic ganges geom_polygon(inherit.aes = F,data=zmRange, aes(x=Nx1+1,y=Nx2-1,group=grp),fill=&#39;lightgray&#39;)+ # adding the dark gray background of the LG labels geom_polygon(inherit.aes = F,data=zmRange2, aes(x=Nx1+11,y=Nx2-11,group=grp),fill=DG)+ # adding the dark gray gene axis geom_polygon(inherit.aes = F,data=zmRange3, aes(x=Nx1+1.1,y=Nx2-1.1,group=grp),fill=DG)+ # adding the light gene zoom polygons geom_polygon(inherit.aes = F,data=GZdf, aes(x=Nx1,y=Nx2,group=grp),fill=Zcol)+ # adding green gene highlights geom_segment(inherit.aes = F,data=genes, aes(x=Nx1+1,y=Nx1-1,xend=Nx2+1,yend=Nx2-1), col=Gcol,size=1.5)+ # adding LD data geom_tile(aes(x=miX,y=miY))+ # adding labels for genomic range labels geom_text(inherit.aes = F,data=zmEND, aes(x=Nx+LGoffset,y=Nx-LGoffset,label=paste(label,&#39;\\n(Mb)&#39;)), angle=45,size=textSCALE1[sel])+ # adding gene labels geom_text(inherit.aes = F,data=genes, aes(x=labPOS+GLABoffset,y=labPOS-GLABoffset,label=name), angle=-45,fontface=&#39;italic&#39;,size=textSCALE2[sel])+ # adding LG labels geom_text(inherit.aes = F,data=zmLab, aes(x=Nx+LGoffset-.8,y=Nx-LGoffset+.8,label=label), angle=-45,fontface=&#39;bold&#39;,size=textSCALE3[sel],col=&#39;white&#39;)+ # format x and y axis scale_x_continuous(expand=c(0,0))+ scale_y_continuous(expand=c(0,0), trans = &#39;reverse&#39;)+ # plot layout theme theme_void()+ theme(legend.position = c(.7,.75),legend.direction = &#39;vertical&#39;) If the current selection is not the global data set, the only the base triangle is plotted. } else { # the data set is filtered to remove missing values p1 &lt;- ggplot(dt %&gt;% filter(!is.na(Mval)),aes(fill=Mval))+ # the aspect ratio of x and y scale needs to be fixed to 1:1 coord_equal()+ # adding green gene highlights geom_segment(inherit.aes = F,data=genes, aes(x=Nx1+1.5,y=Nx1-1.5,xend=Nx2+1.5,yend=Nx2-1.5), col=Gcol,size=1)+ # adding LD data geom_tile(aes(x=miX,y=miY))+ # format x and y axis scale_x_continuous(expand=c(0,0))+ scale_y_continuous(expand=c(0,0), trans = &#39;reverse&#39;)+ # plotting layout theme theme_void()+ theme(legend.position = c(.7,.75),legend.direction = &#39;vertical&#39;) } Finally, the genreal color paletee is set and the function returns the current base plot stored in p1. cs &lt;- scale_fill_viridis_c(name=expression(bar(r^2)), limits = c(0,0.55),values = c(0,.08,1), option = &#39;B&#39;,direction = -1) return(p1+cs) } The script F4.functions.R contains a second helper function which is used for the box plots. This function retuns a set of summary statistics for the r^2 distributions: run_one_sample_test &lt;- function(pop,n,data){ df &lt;- data %&gt;% filter(run == pop) x &lt;- df$R.2 %&gt;% na.omit() %&gt;% as.vector() exp_mu &lt;- 1/(2*n) test_result &lt;- wilcox.test(x,mu = exp_mu) return(tibble(population = pop, mean = x %&gt;% mean, sd = x %&gt;% sd, se = x %&gt;% se, var = x %&gt;% var, n = length(x), exp_mu = exp_mu) %&gt;% bind_cols( test_result %&gt;% broom::tidy())) } 6.4 Details of F4.genomeWide_box.R Within this script, the ILD data of the genome wide subsets of SNPs is loaded and and plotted. First we read in the global data set, then we loop over the individual population subsets and append the data to the global data set. Since we’re only interested in the distribution of r[2], we select only this column and create extra columns for the run ID, and the run type (global, subset by species, subset by location) BW &lt;- read.csv(&#39;../../2_output/08_popGen/07_LD/subsets/glob_between.interchrom.hap.ld&#39;,sep=&#39;\\t&#39;) %&gt;% select(R.2) %&gt;% mutate(type=&#39;Global&#39;,run=&#39;Global&#39;) for(j in 1:6){ flS &lt;- c(&quot;boc&quot;,&quot;bel&quot;,&quot;hon&quot;,&quot;pue&quot;,&quot;nig&quot;,&quot;uni&quot;) flL &lt;- c(&quot;Panana&quot;,&quot;Belize&quot;,&quot;Honduras&quot;,&quot;H. puella&quot;,&quot;H. nigricans&quot;,&quot;H. unicolor&quot;) flT &lt;- c(rep(&#39;Geo&#39;,3),rep(&#39;Spec&#39;,3)) k &lt;- flS[j] q &lt;- flL[j] u &lt;- flT[j] print(j) BW &lt;- BW %&gt;% rbind(.,(read.csv(paste(&#39;../../2_output/08_popGen/07_LD/subsets/glob_between.&#39;,k,&#39;.interchrom.hap.ld&#39;,sep=&#39;&#39;), sep=&#39;\\t&#39;) %&gt;% select(R.2) %&gt;% mutate(type=u,run=q))) } Then, we arrange the order of the runs. BW$run &lt;- factor(BW$run,levels=c(&#39;Global&#39;,&quot;Panana&quot;,&quot;Belize&quot;,&quot;Honduras&quot;, &quot;H. nigricans&quot;,&quot;H. puella&quot;,&quot;H. unicolor&quot;)) To be able to include the mean values to the box plots we create a small summary table of the LD data. dt2 &lt;- BW %&gt;% group_by(run) %&gt;% summarise(meanR2=mean(R.2,na.rm = T),medR2=median(R.2,na.rm = T)) %&gt;% gather(key = &#39;type&#39;,value = &#39;val&#39;,2:3) The the colors for the plot are set. BC &lt;-rgb(.7,.7,.7) clr &lt;-colorRampPalette(colors = c(&#39;white&#39;,BC,&#39;black&#39;))(8) We run the summary statistics on the data (basically only to generate the expected mean r^2). # tests locations &lt;- c(&#39;Global&#39;,&quot;Panana&quot;,&quot;Belize&quot;,&quot;Honduras&quot;,&quot;H. puella&quot;,&quot;H. nigricans&quot;,&quot;H. unicolor&quot;) sample_sizes &lt;- c(110,39,36,35,37,37,36) test_gw &lt;- purrr::map2(locations,sample_sizes,run_one_sample_test,data=BW) %&gt;% bind_rows() test_gw$xS &lt;- factor(test_gw$population, levels = c(&#39;Global&#39;,&quot;Panana&quot;,&quot;Belize&quot;,&quot;Honduras&quot;, &quot;H. nigricans&quot;,&quot;H. puella&quot;,&quot;H. unicolor&quot;)) %&gt;% as.numeric() And finally , the data is plotted. # initializing the plot pBOX &lt;- ggplot(BW,aes(x=run,y=R.2))+ # adding box plots geom_boxplot(fill=BC,width=.7,outlier.size = .1)+ # set a fixed aspect ratio coord_fixed(ylim=c(0,.031),ratio = 250)+ # adding expected mean value geom_segment(inherit.aes = FALSE,data=test_gw, aes(x=xS-.38,xend=xS+.38,y=exp_mu,yend=exp_mu), col=&#39;red&#39;,size=.5)+ # adding mean and median values geom_point(inherit.aes = F, data=dt2,aes(x=run,y=val,fill=type),shape=23,size=1)+ # settting the axis and color labels scale_x_discrete(labels = expression(Global,Panama,Belize,Honduas, italic(&quot;H. nigricans&quot;),italic(&quot;H. puella&quot;),italic(&quot;H. unicolor&quot;)))+ scale_y_continuous(name=expression(genome~wide~ILD~(italic(r)^2)))+ scale_fill_manual(&#39;&#39;,values = clr[c(6,1)],labels=c(&#39;mean&#39;,&#39;median&#39;))+ # formatting the legend guides(shape = guide_legend(ncol = 1))+ # adjusting the plot theme theme(legend.position = c(-.4,1.27), text = element_text(size=10), axis.title.x = element_blank(), axis.text.y = element_text(size=7), axis.text.x = element_text(size=7,angle=45,hjust = 1), panel.background = element_blank(), plot.background = element_blank()) 6.5 Details of F4.peakArea_box.R Within this script, the ILD data of the peak area subsets of SNPs is loaded and and plotted. First, we loop over the individual runs and combine them into a single data set. Since we’re only interested in the distribution of r^2, we select only this column and create extra columns for the run ID, and the run type (global, subset by species, subset by location) dataBoxGenes &lt;- data.frame(R.2=c(),run=c(),stringsAsFactors = F) for (file in dir(&quot;../../2_output/08_popGen/07_LD/&quot;,pattern = &quot;interchrom.hap.ld.gz&quot;)) { nm &lt;- str_remove_all(file, &quot;spotlight.&quot;) %&gt;% str_remove_all(.,&quot;.interchrom.hap.ld.gz&quot;) if(nm %in% c(&quot;global&quot;,&quot;uni&quot;,&quot;pue&quot;,&quot;nig&quot;,&quot;hon&quot;,&quot;bel&quot;,&quot;boc&quot;)){ dataBoxGenes &lt;- read.csv(gzfile(paste(&#39;../../2_output/08_popGen/07_LD/&#39;,file,sep=&#39;&#39;)),sep=&#39;\\t&#39;) %&gt;% mutate(run = nm) %&gt;% select(R.2,run) %&gt;% rbind(dataBoxGenes,.) } } Then, we arrange the order of the runs. dataBoxGenes$xS &lt;- factor(dataBoxGenes$run, levels = c(&quot;global&quot;,&quot;boc&quot;,&quot;bel&quot;,&quot;hon&quot;,&quot;nig&quot;,&quot;pue&quot;,&quot;uni&quot;)) To be able to include the mean values to the box plots we create a small summary table of the LD data. BoxGenes_summary &lt;- dataBoxGenes %&gt;% group_by(xS) %&gt;% summarise(run = run[1], meanR2 = mean(R.2,na.rm = T), medR2 = median(R.2,na.rm = T), sdR2 = sd(R.2,na.rm=T), seR2 = se(R.2), nanr = sum(is.na(R.2)), minR2 = min(R.2,na.rm = T), maxR2 = max(R.2,na.rm = T), lengthR2 = length(R.2)) %&gt;% select(xS,meanR2,medR2) %&gt;% gather(key = &#39;type&#39;,value = &#39;val&#39;,2:3) We run the summary statistics on the data (basically only to generate the expected mean r^2). # tests locations &lt;- c(&quot;global&quot;,&quot;boc&quot;,&quot;bel&quot;,&quot;hon&quot;,&quot;nig&quot;,&quot;pue&quot;,&quot;uni&quot;) sample_sizes &lt;- c(110,39,36,35,37,37,36) test_area &lt;- purrr::map2(locations,sample_sizes,run_one_sample_test,data=dataBoxGenes) %&gt;% bind_rows() test_area$xS &lt;- factor(test_area$population, levels = c(&quot;global&quot;,&quot;boc&quot;,&quot;bel&quot;,&quot;hon&quot;,&quot;nig&quot;,&quot;pue&quot;,&quot;uni&quot;)) %&gt;% as.numeric() And finally , the data is plotted. # initializing the plot boxGenes &lt;- ggplot(dataBoxGenes,aes(x=xS))+ # adding box plots geom_boxplot(aes(y=R.2),fill=BC,width=.7,outlier.size = .1) + # adding mean and median values geom_point(data=BoxGenes_summary,aes(y=val,fill=type),shape=23,size=1)+ # set a fixed aspect ratio coord_fixed(ylim=c(0,.031),ratio = 133)+ # adding expected mean value geom_segment(inherit.aes = FALSE,data=test_gw, aes(x=xS-.38,xend=xS+.38,y=exp_mu,yend=exp_mu), col=&#39;red&#39;,size=.5)+ # settting the axis and color labels scale_y_continuous(expression(candidate~gene~ILD~(italic(r)^2)))+ scale_x_discrete(labels = expression(Global,Panama,Belize,Honduas, italic(&quot;H. nigricans&quot;),italic(&quot;H. puella&quot;),italic(&quot;H. unicolor&quot;)))+ scale_fill_manual(&#39;&#39;,values = clr[c(6,1)],labels=c(&#39;mean&#39;,&#39;median&#39;))+ # formatting the legend guides(shape = guide_legend(ncol = 1))+ # adjusting the plot theme theme(legend.position = c(.35,1.27), text = element_text(size=10), axis.title.x = element_blank(), axis.title.y = element_text(hjust=1.25), axis.text.y = element_text(size=7), axis.text.x = element_text(size=7,angle=45,hjust = 1), panel.background = element_blank(), plot.background = element_blank()) "],
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
......@@ -25,7 +25,7 @@
<meta name="author" content="Kosmas Hench">
<meta name="date" content="2018-12-10">
<meta name="date" content="2018-12-12">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment