---
webr:
  packages: ["ggplot2", "tidyverse"]
filters:
  - webr
---

# Reporter Gene Expression

```{webr-r}
#| context: setup

# Download reporter data
download.file('https://raw.githubusercontent.com/sipbs-compbiol/BM214-Workshop-3/main/assets/data/reporter_curves.csv', 'reporter_curves.csv')

library(ggplot2)
library(tidyverse)
```

## Introduction

In this exercise you will use absorbance ratio data obtained from cloning your reporter gene downstream of a set of nine candidate kanamycin-responsive promoter regions.

The product of your reporter gene absorbs light at 700nm so, by using a spectrophotometer to measure the attenuation of light at 700nm (OD700), you can estimate how much of the reporter is produced.

::: { .callout-caution }
It is not enough to only measure the absorbance due to the reporter gene. We are interested in how much reporter gene is produced _by each cell_. So we must take into account how many cells are present in the medium.

To see what we mean:

If _500 cells each produced 10 units_ of reporter gene, we'd expect around 5000 units of absorbance ($500 \times 10 = 5000$). But if there are only _100 cells producing 50 units_ of reporter gene each, we'd still expect around 5000 units of absorbance ($100 \times 50 = 5000$). **A single measurement of OD700 alone could not tell the difference between these results.**

We must also take the number of cells into account, so we also measure absorbance at 600nm (OD600) as a measure of cell density. Then, by taking the _ratio_ of OD700 to OD600 ($\frac{\textrm{OD700}}{\textrm{OD600}}$) we can _normalise_ the measurement of reporter gene by the amount of the organism that is present.

With our example above, assuming that one cell gives one unit of absorbance at 600nm:

- the situation with 500 cells producing 10 units of reporter gene each would have an OD700/OD600 ratio of $5000/500 = 10$
- the case of 100 cells producing 50 units of reporter gene would have an OD700/OD600 ratio of $5000/100 = 50$

and the resulting value represents the level of expression of the reporter per cell.
:::
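
The arithmetic in the callout above can be checked directly in `R`. This is only a sketch using the illustrative numbers from the text (they are not real measurements), and the names `culture_A` and `culture_B` are made up for this example.

```r
# Worked example of normalising reporter signal by cell density.
# All numbers are the illustrative values from the text, not real data.
cells          <- c(culture_A = 500, culture_B = 100)  # number of cells
units_per_cell <- c(culture_A = 10,  culture_B = 50)   # reporter units per cell

# Total reporter signal (stands in for OD700): the same for both cultures
od700 <- cells * units_per_cell

# Cell density (stands in for OD600), assuming one cell gives one unit at 600nm
od600 <- cells * 1

# Dividing one by the other recovers the per-cell expression level
abs_ratio <- od700 / od600

print(od700)      # 5000 for both cultures
print(abs_ratio)  # 10 for culture_A, 50 for culture_B
```

The two cultures cannot be told apart by `od700` alone, but `abs_ratio` separates them.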

In this experiment, we are seeking a reporter system that responds to high concentrations of kanamycin by expressing, or _switching on_, the reporter gene. So we are looking for plasmids (here named `pABS1.01` to `pABS1.09`) that have a strong expression response at high concentrations of kanamycin, but a weaker expression response at lower concentrations.

To find good candidate reporter systems, we plot the OD700/OD600 ratio (_dependent_ variable) against kanamycin concentration (_independent_ variable), to visualise which systems appear to have the reporter characteristics we are looking for.

::: { .callout-important }
In this part of the workshop, you will plot the ratio of your reporter absorbance (OD700) to your organism growth (OD600) against the concentration of kanamycin applied, using `R`, in order to identify good candidate reporters.
:::

## Load and inspect your data

Your data is in the file `reporter_curves.csv`, so load it into `R` using the `read.csv()` function, and inspect the format of your data, just as you did for the yeast growth data in @sec-yeast-expt.

::: { .callout-important title="Task" }
Use the `WebR` cell below to load your data.
:::

```{webr-r}
# Use read.csv() to load your data in this cell
# Use glimpse() or head() to inspect the format of your data

```

::: { .callout-warning collapse="true" }
## Help! I'm stuck!

- Check back with @sec-yeast-expt to see if you can use anything you've already learned

Use the `R` code below to load your data

```r
data <- read.csv("reporter_curves.csv")
glimpse(data)
```
:::

Your data contains three columns:

- `sample`: this indicates which sample was measured (control, or plasmid ID)
- `conc`: the concentration of kanamycin that was applied
- `abs_ratio`: the measured $\frac{\textrm{OD700}}{\textrm{OD600}}$ ratio
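
If you would like to see what this layout looks like before inspecting the real file, the sketch below builds a small toy data frame with the same three column names. The values, and the object name `toy`, are invented for illustration and are not taken from `reporter_curves.csv`.

```r
# Toy data with the same three columns as reporter_curves.csv.
# The values are invented for illustration only.
toy <- data.frame(
  sample    = rep(c("control", "pABS1.01"), each = 3),
  conc      = rep(c(0, 10, 100), times = 2),
  abs_ratio = c(0.10, 0.11, 0.10, 0.12, 0.40, 1.50)
)

str(toy)            # column names and types
unique(toy$sample)  # which samples are present
table(toy$sample)   # number of measurements for each sample
```

The same three calls should work on your own `data` object once it is loaded, and are a quick way to confirm that every sample was measured at every concentration.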

### Make a basic `ggplot2` figure of your reporter data

You have loaded absorbance ratio data for nine candidate kanamycin reporters and a control sample. You're going to plot these in the same way as you plotted the yeast growth data in @sec-yeast-expt.

::: { .callout-important title="Task" }
Use the `WebR` cell below to make a scatterplot of your data, showing absorbance ratio against kanamycin concentration.
:::

::: { .callout-tip collapse="true" }
## I need a hint!

- Use the `ggplot()` and `aes()` functions to create your base layer, specifying the data and how you want to group it.
- Use a `geom_point()` layer to visualise the datapoints
- You're plotting the `abs_ratio` column against `conc`, and grouping data by `sample`
- Don't forget to include a line that shows your figure!
:::

```{webr-r}
# Make a basic plot of your reporter curve data in this cell
# Use the ggplot(), aes(), and geom_point() functions to visualise your data.

```

::: { .callout-warning collapse="true" }
## Help! I'm stuck!

- Check back with @sec-yeast-expt to see if you can use anything you've already learned

Use the `R` code below to plot your data

```r
fig <- ggplot(data, aes(x=conc, y=abs_ratio, color=sample)) +
         geom_point()
fig
```

:::

::: { .callout-caution collapse="true" }
## Result

The figure output shows the datapoints, but there are a lot of reporters, so there are a lot of colours. It's difficult to track any single reporter because of the overlap between points, and confusion of colours.

![A `ggplot()` graph of reporter absorbance ratios against kanamycin concentration.](assets/images/reporter-01.png){#fig-reporter-01 width=80%}

:::

### Make a lineplot to help with visualisation

One of the advantages of `ggplot2` is that it is easy to add and swap _layers_. We don't only have to make a scatterplot, we can add a lineplot to our figure as well. We do this by adding a `geom_line()` layer.
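
To illustrate what "adding a layer" means, here is a minimal sketch on toy data. It assumes `ggplot2` is installed; the values and the names `toy` and `toy_fig` are invented for this example. A plot stored in a variable can have further layers added to it with `+`, and the layers it already has are kept.

```r
library(ggplot2)

# Toy data: invented values for two hypothetical samples
toy <- data.frame(
  sample    = rep(c("control", "pABS1.01"), each = 3),
  conc      = rep(c(0, 10, 100), times = 2),
  abs_ratio = c(0.10, 0.11, 0.10, 0.12, 0.40, 1.50)
)

# Start with a scatterplot stored in a variable
toy_fig <- ggplot(toy, aes(x = conc, y = abs_ratio, colour = sample)) +
  geom_point()

# Add a second layer to the stored plot; the points are kept
toy_fig <- toy_fig + geom_line()

length(toy_fig$layers)  # number of layers now in the plot
toy_fig                 # show the figure
```

Because `colour = sample` is set in `aes()`, each layer draws one colour per sample, so the line layer joins the points belonging to the same sample.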

::: { .callout-important title="Task" }
Use the `WebR` cell below to add a lineplot to your data.
:::

::: { .callout-tip collapse="true" }
## I need a hint!

- Use a `geom_line()` layer to join the datapoints for each sample with a line
- Don't forget to use `+` to add the layer!
:::

```{webr-r}
# Add a line plot to your figure in this cell
# Use the geom_line() function to visualise your data.

```

::: { .callout-warning collapse="true" }
## Help! I'm stuck!

- Check back with @sec-yeast-expt to see if you can use anything you've already learned

Use the `R` code below to plot your data

```r
fig <- ggplot(data, aes(x=conc, y=abs_ratio, color=sample)) +
         geom_point() +
         geom_line()
fig
```

:::

::: { .callout-caution collapse="true" }
## Result

The lines help to follow individual candidate reporters, but the plot is still jumbled up in the middle, and the similarities between some of the colours make it difficult to follow.

![A `ggplot()` graph of reporter absorbance ratios against kanamycin concentration, with lines to aid tracking data.](assets/images/reporter-02.png){#fig-reporter-02 width=80%}

:::

### Use **facets** to make the visualisation clearer

Another advantage of `ggplot2` is that we can quickly make major changes to the layout of a plot, in order to improve visualisation. Here, you will use _facets_ to plot each sample separately in its own smaller subplot (called a _facet_). This is a common way to present data for multiple factors of interest, and will avoid the visualisation problems caused by overlapping lines with similar colours.

To do this, we use the `facet_wrap()` styling layer. We need to tell `facet_wrap()` what variable should be plotted in each separate _facet_. If we want to place each sample in its own facet, we would use `facet_wrap(~sample)` - **NOTE: the variable `sample` is preceded by a _tilde_ (`~`)**.
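
The tilde matters because `~sample` is an `R` _formula_ that names the column to split by, rather than a value to look up. The sketch below shows this on toy data; it assumes `ggplot2` is installed, and the values and the names `toy` and `toy_fig` are invented for illustration.

```r
library(ggplot2)

# Toy data: invented values for two hypothetical samples
toy <- data.frame(
  sample    = rep(c("control", "pABS1.01"), each = 3),
  conc      = rep(c(0, 10, 100), times = 2),
  abs_ratio = c(0.10, 0.11, 0.10, 0.12, 0.40, 1.50)
)

# The tilde creates a formula object that refers to the column by name
class(~sample)

# One facet is drawn for each distinct value in the sample column
toy_fig <- ggplot(toy, aes(x = conc, y = abs_ratio)) +
  geom_point() +
  facet_wrap(~sample)
toy_fig
```

With this toy data the figure has two facets, one for each value of `sample`.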

::: { .callout-important title="Task" }
Use the `WebR` cell below to plot your figure with a separate facet for each sample.
:::

::: { .callout-tip collapse="true" }
## I need a hint!

- Use `facet_wrap(~sample)` to make a separate subplot for each sample.
:::

```{webr-r}
# Make a facet plot in this cell
# Use the facet_wrap() function to visualise your data.

```

::: { .callout-warning collapse="true" }
## Help! I'm stuck!

- Check back with @sec-yeast-expt to see if you can use anything you've already learned

Use the `R` code below to plot your data

```r
fig <- ggplot(data, aes(x=conc, y=abs_ratio, color=sample)) +
         geom_point() +
         geom_line() +
         facet_wrap(~sample)
fig
```

:::

::: { .callout-caution collapse="true" }
## Result

Now that we have a separate plot for each sample, it is easy to see which candidate reporters look like they might be worth taking forward. Notice that the $x$- and $y$-axis scales are the same in each _facet_.

![A `ggplot2` facet plot of reporter absorbance ratios against kanamycin concentration](assets/images/reporter-03.png){#fig-reporter-03 width=80%}

:::

### Tidying up your figure

The current axis labelling of the figure could be improved. You can change the axis labels to something more meaningful by using the `labs()` styling layer. To change the $x$- and $y$-axis labels you might use a layer like `labs(x="X-axis title", y="Y-axis title")`.
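
As a minimal sketch of how `labs()` behaves, the example below relabels a toy plot. It assumes `ggplot2` is installed; the data values, the label text, and the names `toy` and `toy_fig` are placeholders chosen for illustration.

```r
library(ggplot2)

# Toy data: invented values for one hypothetical sample
toy <- data.frame(
  conc      = c(0, 10, 100),
  abs_ratio = c(0.12, 0.40, 1.50)
)

# Without labs(), the axis titles default to the column names
toy_fig <- ggplot(toy, aes(x = conc, y = abs_ratio)) +
  geom_point()

# labs() replaces the axis titles; the data and layers are unchanged
toy_fig <- toy_fig + labs(x = "Concentration", y = "Absorbance ratio")
toy_fig
```

Only the text shown on the axes changes; the columns in the data frame keep their original names.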

::: { .callout-important title="Task" }
Use the `WebR` cell below to change the $x$-axis label to "[kanamycin]" and the $y$-axis label to "OD700/OD600".
:::

::: { .callout-tip collapse="true" }
## I need a hint!

- Use `labs()` with the `x=` and `y=` arguments to change the axis labels for your plot
:::

```{webr-r}
# Change the x- and y-axis labels in this cell
# Use the labs() function to change the labels

```

::: { .callout-warning collapse="true" }
## Help! I'm stuck!

- Check back with @sec-yeast-expt to see if you can use anything you've already learned

Use the `R` code below to plot your data

```r
fig <- ggplot(data, aes(x=conc, y=abs_ratio, color=sample)) +
         geom_point() +
         geom_line() +
         facet_wrap(~sample) +
         labs(x="[kanamycin]", y="OD700/OD600")
fig
```

:::

::: { .callout-caution collapse="true" }
## Result

![A `ggplot2` facet plot of reporter absorbance ratios against kanamycin concentration](assets/images/reporter-04.png){#fig-reporter-04 width=80%}

:::

### Make a monochrome plot

You can change the presentation of your plot using the functions you learned in @sec-yeast-expt, to generate a monochrome plot ready for publication.

::: { .callout-important title="Task" }
Use the `WebR` cell below to convert your plot to monochrome.
:::


::: { .callout-tip collapse="true" }
## I need a hint!

- use `scale_colour_grey()` to convert colours to greyscale
- use `theme_bw()` to make the theme black and white
- use `theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())` to remove grid lines
:::

```{webr-r}
# Make a monochrome version of your plot in this cell

```

::: { .callout-warning collapse="true" }
## Help! I'm stuck!

- Check back with @sec-yeast-expt to see if you can use anything you've already learned

Use the `R` code below to plot your data

```r
fig <- ggplot(data, aes(x=conc, y=abs_ratio, color=sample)) +
         geom_point() +
         geom_line() +
         facet_wrap(~sample) +
         labs(x="[kanamycin]", y="OD700/OD600") +
         scale_colour_grey() +
         theme_bw() +
         theme(panel.grid.major = element_blank(),
               panel.grid.minor = element_blank())
fig
```

:::

::: { .callout-caution collapse="true" }
## Result

![A monochrome `ggplot2` facet plot of reporter absorbance ratios against kanamycin concentration](assets/images/reporter-05.png){#fig-reporter-05 width=80%}

:::

## Summary

::: { .callout-note title="Well Done!"}
After successfully working through this section you should be able to:

- import reporter gene expression/absorbance data into `R`
- use `R` and `ggplot2` to visualise expression/absorbance data
- interpret the meaning of expression/absorbance data
:::

::: { .callout-important }
**Please answer the questions below in the formative quiz on MyPlace**

- [MyPlace formative quiz]({{< var myplace.quiz1 >}})
:::