## Monday, February 19, 2018

### A Bird's View of Color

Most birds have much better color vision than mammals. In general, they have four distinct types of color-sensing cones in their eyes, compared to the usual three for us and two for most other mammals. The fourth cone that birds have is sensitive to ultraviolet (UV), letting them perceive colors we can only imagine. The other three cones don't precisely match up with our three, but they cover basically the same range of frequencies.

To get an idea of what things look like to birds, we have to incorporate that UV information we can't see. Photographing the UV world can take some special equipment, but even consumer-grade cameras can be altered to better capture UV light. I've been interested in photography, and UV photography in particular, for a while, but I haven't yet invested in the equipment I'd need to take UV photos. For now I have to rely on the occasional UV photos other people post to get an idea of what things look like. (For a good selection of photos in UV and other frequency bands, go take a look at: photographyoftheinvisibleworld.blogspot.com)

You can look at UV photos next to visible-light photos to get an idea of what things look like to birds, but I decided to see if I could go one step better. I wrote a script which takes four image channels (Red, Green, Blue, Ultraviolet) and compresses them into the three we can see (RGB). The math for this is pretty simple, so it doesn't match in detail what really happens in a bird's eye, but it might help us get an idea of what things look like to birds.

$$R_h = R_b + \frac{G_b}{3}$$
$$G_h = \frac{2(G_b+B_b)}{3}$$
$$B_h = \frac{B_b}{3} + U_b$$

 RGBU (bird vision) -> RGB (human vision).
This math is represented visually in the diagram at right. At the top are the four image channels that birds can see and at bottom are the three we can see. All the information that birds can see in their red channel goes into our red channel, along with a third of what birds see in their green channel. The other channels of a bird's vision are similarly apportioned into the channels we can see by moving from top to bottom in the figure.
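As a rough illustration, the three equations above can be sketched with NumPy. This is not the original script; the function name, the assumption that channels are floats in the range 0 to 1, and the final rescaling step are all my own choices for the example.

```python
import numpy as np

def compress_rgbu_to_rgb(rgbu):
    """Compress an (..., 4) array of R, G, B, U channels into (..., 3) RGB."""
    rgbu = np.asarray(rgbu, dtype=float)
    # Split the last axis into the four bird-vision channels.
    r_b, g_b, b_b, u_b = np.moveaxis(rgbu, -1, 0)

    r_h = r_b + g_b / 3.0            # R_h = R_b + G_b/3
    g_h = 2.0 * (g_b + b_b) / 3.0    # G_h = 2(G_b + B_b)/3
    b_h = b_b / 3.0 + u_b            # B_h = B_b/3 + U_b

    rgb = np.stack([r_h, g_h, b_h], axis=-1)
    # Each output channel can reach 4/3, so rescale back into [0, 1].
    # Assumption: the post does not say how overflow was handled.
    return rgb / (4.0 / 3.0)
```

With this rescaling, a pixel that is fully bright in all four channels comes out as plain white, and a pixel that reflects only UV comes out as a three-quarter-strength blue.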

Conceptually, this is similar to drawing a 3D cube on a 2D sheet of paper. Some information is lost in the transformation, but much of it comes through the process and allows us to visualize something that otherwise can't be shown in 2D. In this case, we're transforming a 4D data structure into a 3D one that just happens to be presentable as a color photo.

To show what this math means for a photo, I found a nice example paired set of visual and UV images from photographyoftheinvisiblew... to work with.

 RGBU image channels along top; RGB and compressed RGBU images at bottom. Original photos from [link].

In UV this flower of the Marsh Marigold (Caltha palustris) is dramatically marked, but in our composite RGBU image it really doesn't stand out that much. Flowers rarely utilize birds for their pollination, so it shouldn't be any surprise that they might not look too dramatic to birds. (Bees can't see red, but can see UV, so they'd have no problem seeing the marks on this flower.)

Can we find some nice UV imagery of something that birds would care about? Well, it's a bit harder to make paired visual and UV photos of a creature which is suspicious of your intentions. Recently, I came across a post by Twitter user @JamieDunning illustrating some dramatic fluorescence on the beak of a Puffin specimen he was examining. It occurred to me that something strongly fluorescent should also be UV-dark, since the UV energy is being absorbed and released at visual frequencies instead of being reflected. I realized from this that I could construct a simulated UV channel by inverting the fluorescence image. This also took mapping the images together to correct for the different camera positions, and a bit of artistry to clean up the fluorescence image. Performing the same image channel compression as I did earlier, we get the lower-right portion of the next figure.
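The inversion step on its own is simple enough to sketch. The function below is a hypothetical stand-in, assuming the fluorescence image has already been aligned to the visible-light photo and reduced to a single brightness channel of floats between 0 and 1; the alignment and hand cleanup are not shown.

```python
import numpy as np

def simulated_uv_channel(fluorescence):
    """Estimate UV reflectance as the inverse of fluorescence brightness."""
    # Assumption: bright fluorescence means UV was absorbed, so UV looks dark.
    fluorescence = np.clip(np.asarray(fluorescence, dtype=float), 0.0, 1.0)
    return 1.0 - fluorescence
```

The result can then be used as the fourth (U) channel alongside the ordinary red, green, and blue channels.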

 RGBU image channels along top; RGB and compressed RGBU images at bottom. Original photos from [link].

It would be nice to have a comparable real UV photo for this comparison. Still, the simulated bird-vision of the Puffin's beak shows a much greater color contrast than we saw with the Marsh Marigold, and both results align with what we'd expect evolutionarily, which suggests this might be a useful approach.

My wish-list, money-is-no-object sort of data for this kind of image analysis would come from hyper-spectral imaging. A hyper-spectral camera takes images across a large number of narrow frequency bands. We could map that data (against the color sensitivity spectra illustrated in the figure at the top of this post) to what either birds or humans can see, and then apply something like the transformation I've described here to illustrate in human vision what a bird could see.
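To make the idea concrete, here is a minimal sketch of mapping a hyper-spectral image onto a set of cone types. Everything specific in it is an assumption for illustration: the Gaussian sensitivity curves, the width, and any peak wavelengths you pass in are placeholders, not measured bird or human cone spectra.

```python
import numpy as np

def cone_responses(cube, wavelengths_nm, peaks_nm, width_nm=40.0):
    """Weight each band of an (H, W, bands) cube by per-cone sensitivities."""
    cube = np.asarray(cube, dtype=float)
    wavelengths = np.asarray(wavelengths_nm, dtype=float)
    peaks = np.asarray(peaks_nm, dtype=float)

    # Placeholder sensitivity curves with shape (cones, bands).
    sensitivity = np.exp(
        -0.5 * ((wavelengths[None, :] - peaks[:, None]) / width_nm) ** 2
    )
    # Normalize each curve so a uniformly bright scene gives a response of 1.
    sensitivity /= sensitivity.sum(axis=1, keepdims=True)

    # (H, W, bands) @ (bands, cones) -> (H, W, cones)
    return cube @ sensitivity.T
```

Passing four peaks would give a four-channel "bird" image that could then be compressed to RGB, while passing three would give a "human" image from the same data.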

I'm not sure anyone would provide sufficient funding for me to explore this, however.

@JamieDunning recently submitted a paper comparing the original photo with spectrophotometer examination of regions of the bill. I'm looking forward to their paper to see how the results compare to the predictions from my playing around with the math.


## Monday, February 12, 2018

### Chromosome Painting

The figure at left shows the metaphase chromosomes of a pepper root-tip, in all their squiggly false-color glory. In it you can count the number of chromosomes and (with a little background research) determine the overall ploidy of the source plant. (It has 24 chromosomes, so is a diploid.)

The original image had all the same information, but it was much harder to look at and learn from. This is a fundamental lesson of, and reason for, data visualization.

 Step 0.
The original image comes from Twitter user @ChaoticGenetics. They're studying chile genetics and routinely post cool photos derived from their work. The question paired with this image was, "How many chromosomes does everyone see?" I figured I'd take a stab at it.

Let's dive into the details of how I made my figure. I use GIMP for essentially all my image editing needs. With each step I'll include the menu options for each command I use in brackets, so others can repeat the procedure.

0) Load the image with GIMP. Open "Tool Options" [Control-B] and "Layers" [Control-L] windows.

 Step 1.
 Step 3.
1) Select a rectangular region around the interesting looking chromosomes, then crop [Image > Crop to Selection] the image.

 Step 2.
2) Select the "Eraser Tool" and erase all the background color and spots that don't appear as chromosomes.

3) Right-click on the image in the layer window. Select, "Add Alpha Channel". Discard the color information in the image [Colors > Desaturate]. Remove the background color [Colors > Color to Alpha (Set "From:" color to white.)].

 Step 4.
4) From the layer window, make a new image layer filled in white. Move this layer beneath the image layer. Select the image layer.

 Step 5.
5) Using the "Free Select Tool", draw around a visually distinct chromosome. Invert the color of the selection [Colors > Invert]. Change the color of the selection [Colors > Components > Channel Mixer... (red=50,0,0; green=0,0,0; blue=0,0,50)].

 Step 6.
6) Many of the chromosomes in this example are adjacent to or overlapping with another. For these, we have to use some knowledge about chromosomes and some artistry. Let's have a look at the cluster here highlighted in green.

 Step 7.
7) At this scale, chromosomes are essentially linear structures. They don't branch and they don't loop. From this we can tell the green feature in step 6 is actually three chromosomes. I cut each chromosome out of the image and pasted it into a new layer. From there I could clean up their shapes a little before changing the colors and recombining them.
 Step 8.

8) Going progressively through the image, isolating and coloring the most apparent chromosomes at each stage, we come to 16 chromosomes that we can be confident about. (So, our cell isn't a haploid with 12 chromosomes.)

We're left with the region at left that I've highlighted in pink. This region would need to account for a further 8 chromosomes to reach the expected diploid count of 24 in total. Though there are probably a few chromosomes in this region that we can confidently separate, much of it is down to guesswork.

It is possible for this specific pepper plant to have fewer chromosomes. Though it is unlikely for a chromosome pair to be lost, since each has been conserved over a long time period and likely contains critical genes, it is common enough evolutionarily for chromosomes to fuse. That pink mess could hypothetically be 6 or 4 chromosomes, though this one image isn't sufficient evidence to make me think it is likely. If the same pattern is shown in a few more images from the same plant, especially if the chromosomes are better spread, then I'd start to consider that as increasingly likely.

For now, the balance of the evidence leads me to think there are 24 chromosomes and they're just not perfectly isolated. So, I divided the uncertain pile of chromosomes into the number that I expect are remaining. Any figure you make will invariably include your assumptions. The key is to try and make those assumptions reasonable or at least apparent to the reader (though this may require some nice caption-writing).

Interestingly, there's a protocol which can experimentally produce the sorts of painted chromosomes we're simulating here. Fluorescence in situ hybridization (FISH) relies on making DNA probes which are stained a unique color for each chromosome. When the probes are applied to a chromosome spread, the result helps visualize chromosome crossovers, deletions, and other large-scale alterations that can be important in diagnosing cancer and other disorders. The setup work for this is pretty intense, so it's probably not going to be used for the simple task of seeing how many chromosomes a plant has.

While I was in grad school, I routinely modified figures from papers I was reviewing for in-class (or in-lab) presentations, usually by highlighting different components of the figure in different colors (like here) to make them stand out more when displayed. I was doing the hard work of figuring out the important parts of the figures so students watching my presentation didn't have to. My goal was for them to focus on what I was saying about the figures and see what I wanted them to see at a glance.

Using colors to present different partitions of a larger dataset ended up being central to my last large graduate project (YMAP) as well as an important part of my current [non-academic] job. While using colors for data presentation, it is important to keep in mind that not everyone has the same ability to see color. The most common forms of color-blindness are often called Red-Green-colorblindness. From this, it is a good idea to try and avoid the commonly used Red-Green color scheme seen so often in biology research figures. (Blue-Yellow is a good alternative, but there are subtleties I'll have to go into later.) Being conscious of the issues means they will inform your decisions, even if you're not fully aware of the topic.

This post was inspired by a conversation over on Twitter. (You can follow me there as @thebiologistisn.)

The original picture of the chromosome spread was made by @ChaoticGenetics, who gave permission for me to use it in this post.


## Monday, February 5, 2018

### Speculative Biology: 3-Way Reproduction

The other day I found myself thinking about what would be the fundamental biological characteristics of a species having a system which depended on three individuals, instead of the two or one we're used to, for each reproduction event. (This is a distinctly different concept from a species having multiple sexes or genders. See references at end for examples of these.)

To simplify the discussion, I'm going to start with a big assumption that the hypothetical organism only differs from what we see here in that it has a 3-way reproduction system. It is carbon based, it has DNA organized into chromosomes, etc. Breaching that huge assumption would introduce far more variables into the discussion, when I'm only interested in the basics of sexual reproduction for this discussion.

There are different ways for Earth biology to control the sex of individuals. Sex in some species is driven by genetic differences (mammals and birds). In some it is driven by the number of chromosomes (bees and wasps). In some it is driven by temperature differences (reptiles and amphibians). In others the sex changes with age or social situations (some fishes). In large groups, there are almost always exceptions to the general patterns.

Almost all of these cases involve some chromosomes being contributed to an offspring by both a female and a male. Haplodiploidy in bees/wasps is an interesting exception. (Males grow from unfertilized eggs, while females grow from fertilized eggs.)

For our 3-way reproduction discussion, let's start by assuming each of the three individuals contributes chromosomes equally to each offspring. (Later we'll examine a more complicated case.)

We can easily enough abstract the concept of a Punnett square into higher dimensions, though it does get difficult to simply convey the results in a 2D format. In the regular version, the possibilities for a single chromosome contributed from each parent are aligned along the top and side edge of the table. For a 3D version we'll do the same, but split the contributions from the third parent into three sub-tables (left to right) with the third parent contribution at the upper-left corner of each. (I've added some color highlights to help visualize the contributing parent for each chromosome as well as the sexes of the potential offspring.)

 Punnett square (and "cube") for 2-way and 3-way crosses.

The first observation that stands out from this is the difference in predicted sex ratios of the offspring. In our 2-way system, the calculations imply a 1:1 ratio. In the 3-way system, the calculations imply a 1:2.5:1 ratio between the three sexes of offspring.
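The counting behind a Punnett square (or "cube") can be sketched by enumerating every combination of one chromosome from each parent. The parent genotypes used for the 3-way cross below (XXX, XXY, XYY) are my own hypothetical choice for the example, picked because they reproduce the 1:2.5:1 ratio quoted above; the figure itself may use different labels.

```python
from collections import Counter
from itertools import product

def cross(*parents):
    """Count offspring genotypes when each parent contributes one chromosome."""
    counts = Counter()
    for combo in product(*parents):
        # Sort so that, for example, "XY" and "YX" count as the same genotype.
        counts["".join(sorted(combo))] += 1
    return counts

two_way = cross("XX", "XY")             # XX: 2, XY: 2
three_way = cross("XXX", "XXY", "XYY")  # XXX: 6, XXY: 15, XYY: 6
```

The 2-way cross gives 2:2, or 1:1. The hypothetical 3-way cross gives 6:15:6 across its 27 cells, which reduces to 1:2.5:1.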

Next, let's discuss something analogous to the haplodiploidy of bees/wasps, where males only contribute chromosomes to their daughters.

 Punnett square (and "cube") for 2-way and 3-way crosses, with haplodiploidy.

This shows the same possibilities for sex ratios of offspring. Since we're using bees as a model here, it's a good time to introduce the idea that a creature doesn't have to produce offspring at the ratios suggested by such simple calculations. Bees produce very few males, and only when needed for fertilization of new queens. Similarly, a hypothetical 3-way reproducing species could easily adjust the sex ratio of its offspring to be different from what the above calculations suggest.

An abstraction from Fisher's Principle (http://the-biologist-is-in.blogspot.com/2015/12/evolutionary-battle-of-sexes.html) suggests most species would evolve towards a 1:1:1 ratio between the three sexes. Species where this didn't hold would be interesting.

I imagine one sex evolving into an approximation of female (with a large immobile gamete), while the other two sexes evolve into an approximation of male (with smaller, more mobile gametes). It gets much more difficult to make predictions beyond this point, though a couple of examples inspired by fiction and biology come to mind.

Maybe the two male-equivalents would actively court each other and then seek out a female-equivalent together as a pair. This seems to be the pattern described for the fictional Pierson's Puppeteers (though their sexual biology is rarely detailed in the author's stories about them).

Maybe the male-equivalents would independently seek out the female-equivalent. I imagine something similar to the Deep-sea Anglerfish, with smaller male-equivalent individuals fusing to a larger female-equivalent and waiting for the opportunity to contribute to offspring when all three sexes have joined the party.

In the grand scheme of things, I expect biological systems requiring three individuals for reproduction would be rare in the cosmos. At an early evolutionary stage, any organisms which only required two partners to reproduce would probably out-compete those requiring three partners simply because it would be easier to arrange appropriate matings. This isn't to say it wouldn't happen, since all sorts of strange things happen in biology.

The Red Queen hypothesis (http://the-biologist-is-in.blogspot.com/2014/04/oxalis-and-red-queen.html) suggests why larger species don't simply have one sex. Yet, we do see this from time to time.

References

## Monday, January 29, 2018

### Astrobiology: Life on the Martian Surface

The surface conditions of Mars are, to put it politely, somewhat extreme. The air pressure, temperature, and chemical conditions at the surface would individually be fatal to our species. All three together mean we could only live there with some pretty advanced technology.

The average air pressure at the surface of Mars is ~600 pascals (about 0.6% of Earth's average at sea level). At the lowest altitude of Mars, in parts of Hellas Planitia, the air is twice as thick at ~1200 pascals. This is about 3.4% of the air pressure found at the peak of Mount Everest, and far below the pressure at which we start to require pressure suits to survive.
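For a quick check of the sea-level comparison, here is the arithmetic as a small sketch. The only outside number is the standard atmosphere of 101,325 pascals; the function name is just for illustration.

```python
EARTH_SEA_LEVEL_PA = 101_325.0  # standard atmosphere, in pascals

def percent_of_earth_sea_level(pressure_pa):
    """Express a pressure as a percentage of Earth's sea-level average."""
    return 100.0 * pressure_pa / EARTH_SEA_LEVEL_PA

mars_average = percent_of_earth_sea_level(600.0)   # about 0.59
hellas_floor = percent_of_earth_sea_level(1200.0)  # about 1.18
```

Even at the bottom of the deepest basin, the air is only a little over 1% as thick as what we breathe at sea level.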

The temperature at the surface of Mars is... cold. The experience of the rovers we've had exploring near the equator there for the last several years gives us some idea of what the temperatures are like. The good news is that temperatures routinely reach above freezing, except in winter. Temperatures that we'd consider quite comfortable are even likely during a summer day. The bad news is the temperature drops precipitously at night. A routine temperature swing of a hundred degrees Celsius would be expected. Going from 50 °F to -125 °F and back across the day-night cycle would be awkward to deal with. The further we go from the equator, the colder it is going to be.
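The Fahrenheit figures and the "hundred degrees Celsius" swing line up, as a short conversion sketch shows. The function name is only for this example.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0

day_c = fahrenheit_to_celsius(50.0)      # 10.0
night_c = fahrenheit_to_celsius(-125.0)  # about -87.2
swing_c = day_c - night_c                # about 97.2
```

So the quoted day and night temperatures are about 97 °C apart, which rounds comfortably to a hundred-degree swing.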

The chemistry of the Martian surface is interesting. The thin atmosphere can't absorb UV the way ours (with its ozone layer) does, so much more UV reaches the surface (even though the sun is distant enough to reduce its intensity by more than half). That higher-energy UV light is able to drive chemical reactions in the surface layer of soil. It can break water and generate highly oxidizing compounds, making the soil surface highly reactive to organic material. (This led to the confounding results from the Viking lander life detection experiments.)
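The effect of distance on sunlight follows the inverse-square law, which is easy to sketch. The value of about 1.524 astronomical units for Mars's average distance from the sun is the one number brought in from outside the post.

```python
MARS_MEAN_DISTANCE_AU = 1.524  # average Sun-Mars distance, in Earth-Sun units

def relative_sunlight(distance_au):
    """Sunlight intensity relative to Earth's, by the inverse-square law."""
    return 1.0 / distance_au ** 2

mars_fraction = relative_sunlight(MARS_MEAN_DISTANCE_AU)  # about 0.43
```

By this estimate, sunlight arriving at Mars carries roughly 43% of the intensity it has at Earth, before any absorption by the atmosphere.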

Years of biology research on Earth, however, have shown life can prosper in a far wider range of conditions than we find comfortable.

Even if the surface of Mars were entirely uninhabitable, there could still be plenty of living organisms within Mars. Life on Earth extends as far into the rock as we've examined, with active cells living kilometers down into the crust. If there was life on the once warmer and wetter Mars (and I'm pretty certain there was), then there is certainly life still there hiding in the depths.

How about life on the surface? People have been pondering this for a while. The air pressure, temperature, and even surface chemistry are all compatible with some potential forms of microbial life. (The reactive chemicals in the soil could be consumed by microbes to fuel their own growth.) Hypothetically, even some life forms from here on Earth might be able to survive in the nicer locations on Mars.

Some researchers have been exploring this by sampling organisms from the colder parts of our planet and exposing them to atmosphere, pressure, and UV conditions similar to what is seen on Mars. Their work showed that some lichens could not only survive the conditions there, but would actually become used to the conditions and increase their metabolism. Over time, such lichens might be able to slowly grow and spread on Mars.

That slow growth could be their downfall. Mars periodically experiences planet-spanning dust storms. Their interval isn't known precisely, but the consequence for a slow-growing lichen on the surface could be dramatic. A photosynthetic lichen which got covered in dust for too long would cease to live.

 Valles Marineris circled. Image edited from [link].
The lichens could probably persevere on vertical or slightly overhanging rock surfaces at low altitudes near the equator. (Perhaps in Valles Marineris.) They might be able to get sufficient sunlight and be protected from dust accumulation. From that foot-hold, I suspect lichens could evolve to handle the conditions and spread further. There are a few strategies that might work.

1) They could go into stasis and wait for local winds to clear away the dust. This is the strategy used by encrusting lichens here on Earth. (This is also how the Opportunity rover has managed to keep going so long.) Too much dust would still kill them, but rock surfaces exposed to wind would probably get cleared fast enough.

2) They could grow a surface which was smooth enough to not hold on to any dust, so the winds could shift it away more easily.

3) They could grow into small, pointed spikes with a surface smooth enough that dust couldn't adhere. This wouldn't save them from being overcome by a traveling dune, but it might let them survive heavier dust accumulation. During clear-sky years they'd have to spend much of their energy growing upwards.

 Dr. Ian Malcolm: "Life, uh, finds a way." Dr. Me: "Unless it goes extinct first."
4) They could sidestep the issue by frequently shedding their resilient spores into the wind. This would ensure that some would land on the surface when the storm settled down. Even if existing colonies were killed with each storm season, some new colonies could be formed. Some would have to be able to grow from a spore to a size where they could make more spores in the intervals between killing dust storms.

Any combination of strategies could turn up given sufficient time, if they were able to survive in some protected niches initially. This process could take thousands of years or more. If we wanted to terraform Mars on any sort of human-scale timeframe, we'd have to be much more involved in adapting living things to live there.


## Tuesday, October 31, 2017

### Sex Chromosomes of the Triturus Newt

Salamanders and newts are an interesting group of animals. They're amphibians, like the frogs and toads you're probably familiar with, but they have an elongated body form that looks more like a lizard. They're generally rather small and tend to live in places where you don't, so you probably won't see one unless you go a bit out of your way in search of them. Both salamanders and newts start out life as a submerged egg that hatches into a swimming tadpole. As they grow up, they both generally metamorphose into a form adapted for crawling around on land (though they do prefer moist places). Salamanders typically live out the rest of their lives in this stage, while newts return to an aquatic life once they reach full maturity. Adult newts develop a flattened tail that helps them swim and then they go about their quiet lives underwater.

 Fig1. Adapted from Grossen et al. and photos by @Blackmudpuppy.
Hidden within these shy critters is some really interesting biology. One group, the Triturus newts (the marbled and crested newts), have some peculiar chromosome weirdness going on that results in the death of 50% of their eggs. Evolutionarily, this is a very strange situation. You'd expect any trait that resulted in such a high rate of offspring loss would quickly disappear from a population. You definitely wouldn't expect the trait to become a permanent fixture of most species in the genus, as is observed.

 Fig 2. Triturus babies. Photos by @Blackmudpuppy
My initial thought was that maybe the dead eggs were fed upon by their surviving siblings. Kin selection could then explain why such an apparently "wasteful" trait would stick around. If the additional nutrition gained from eating a sibling resulted in at least a 2x increase in genetic fitness (survival and offspring), then the trait could be maintained by this mechanism.

The problem with this idea is that the newts lay their eggs individually, folding a bit of underwater leaf over them for protection from predators. Any given hatchling wouldn't be expected to find a separately stashed egg, so the dead eggs would probably be consumed by other organisms. If the eggs were laid in small clusters, the story might be different, but for now we have to abandon this hypothesis.

 Fig 3. Hypothetical lethal male.
The next thought I had was about the 50% loss. That specific number implies a limited set of possible genetic patterns. The first I thought of is the way our biology (generally) uses chromosomes to determine our sex. The male gametes come in X and Y versions, while the female gametes are all X. The result is that 50% of our kids are XX and 50% are XY. (There are lots of subtleties and complications to this story, but for now I'm just going to use this simple model.) If the Triturus newts were doing basically this (without the sex-determination thing the way we do it), but one version of the sperm always resulted in dead embryos... Well, that chromosome would very quickly disappear from a population. This is another hypothesis we have to abandon.

 Fig 4. Balanced lethal chr1 in Triturus.
At this point I dug up a 2012 paper by Grossen et al., which described what was going on and proposed a model for how it might have come to be. It turns out every one of these newts has two distinct versions of their chromosome 1 (the largest chromosome). The two versions have a region of sequence with inversions and deletions relative to the other. These differences mean the two regions don't recombine during meiosis like the comparable regions of most chromosomes. This is significant because each version of this region has a recessive lethal allele that is paired to a functional allele on the other version. (Probably some of those deletions I mentioned earlier.) If any egg is fertilized with a sperm carrying the same version of this chromosome, one of the recessive lethal traits is expressed. This results in the death of 50% of embryos, and leaves the survivors all heterozygous for chromosome 1. This is a pattern that can continue through generations.
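The 50% figure falls straight out of a simple cross between two heterozygous parents. Here is a minimal sketch, using "1a" and "1b" as labels for the two versions of chromosome 1; the function name and the survives/dies bookkeeping are my own for this example.

```python
from collections import Counter
from itertools import product

def balanced_lethal_cross(mother=("1a", "1b"), father=("1a", "1b")):
    """Tally embryo fates when matching chromosome 1 versions are lethal."""
    outcomes = Counter()
    for egg, sperm in product(mother, father):
        # Two copies of the same version expose its recessive lethal allele.
        fate = "dies" if egg == sperm else "survives"
        outcomes[fate] += 1
    return outcomes

result = balanced_lethal_cross()
death_rate = result["dies"] / sum(result.values())  # 0.5
```

Every survivor carries one copy of each version, so the next generation starts from exactly the same cross.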

The difficulty arises when we consider how this arrangement could come to be. If either recessive lethal allele was present without the other, then it would be selected out of the population. The chromosome version without a lethal allele would quickly dominate in the population (and we wouldn't see a 50% death rate among the babies).

The Grossen et al. paper goes into some really cool details about how sex-determination works in these newts. They have an XY sex-chromosome system kind of like ours, but they're also strongly impacted by temperature. If baby newts are raised up in water that's too warm, they'll become functionally male. If they're raised up in water that's too cold, they'll become functionally female. If the temperatures are extreme enough, they'll all grow into one sex without respect to what their sex-chromosomes look like. (This happens with lots of cold-blooded creatures, though what sex is produced at what temperature varies by species.)

In this context, they propose that the 1a/1b chromosomes that are causing so much trouble started out as two versions of an ancestral Y-chromosome. Y-chromosomes tend to collect recessive lethal mutations (deletions and such) and different lineages of Y-chromosome will end up with different mutations. If a population of ancestral Triturus newts experienced a significant cold spell, some of the chromosomally male newts would have grown up female. They could then breed with more typical males to produce offspring with two Y-chromosomes. If the two Y-chromosomes have the same mutations, the offspring would die. But if they had sufficiently different versions, they could survive. (This has been shown experimentally in a few species, as described in Haskins et al. 1970.) Grossen et al. go into some detail simulating how this initial case could lead to the chromosome dynamics now seen.

 Fig 5. Model for evolution of balanced lethal Ys.
While I was reading the Grossen et al. paper, I was thinking of a slightly different version of the model. In this version, we see a female-promoting mutation develop instead of the male-promoting one that was modeled. In the associated figures to the right, I've included Punnett squares for all the possible chromosome combinations involved in matings at each stage of the model. The color of the progeny squares represents their sex, as determined by the interaction between genetics and temperature. (Red=female; blue=male; purple=either; black=dead.)
1. The population begins with XY males and XX females. Multiple Y lineages coexist.
2. As the temperature drops, the offspring of all XY*XX crosses develop as female. Sooner or later a XY female of the newer generation meets up with an XY male from the previous generation. If the Y-chromosome versions are the same, every baby again develops as female because the double-Y babies die. If the Y-chromosome versions are different, a fourth of the babies will develop as YaYb males.
3. The older males eventually die off, leaving only YaYb males. There are still XX females around from the last generation, but all of their offspring will now be XYa or XYb.
4. The XX females eventually die off, leaving only XYa and XYb females.
5. The temperature drops a bit further. Now YaYb embryos can develop as either male or female.
6. Some YaYb females meet up with some YaYb males and the first clutch of eggs is laid that experiences a 50% chromosomal-induced fatality rate.
7. The newt population has been experiencing a major catastrophe and has dropped to very small numbers. The females carrying an X chromosome die out, due to either random chance or some minor benefit the YaYb females have.
 Fig 6. Model for evolution of new sex chromosomes.
At this stage in the story, there are no further X chromosomes and half of every clutch dies due to incompatible Y chromosomes. We shouldn't really call them Y chromosomes anymore, so let's call them chromosome 1s for simplicity. The population is now in a stable evolutionary configuration, but only for as long as the temperatures remain consistent.
1. The temperature starts to rise back up and new 1a1b babies start developing as males. Some very few of the females happen to have a mutation on another chromosome (2*) that encourages female development. The offspring carrying this mutation develop female at these warmer temperatures.
2. All the older females die off, leaving only those carrying the mutation. The temperatures are still rising and some newts carrying the mutation start developing male.
3. Some males with the mutation meet up with females also carrying the mutation. The first newts with two copies of the mutation are born and develop female.
4. The temperatures are still rising. Newts born with only one copy of the mutation start developing male. The only newts that develop female have two copies of the mutation.
5. The older females die off, leaving only those with two copies of the mutation. The only new males are those born with one copy of the mutation. The older males die off, leaving only those with one copy of the mutation.
At this stage in the story, a new pair of sex chromosomes has evolved. The copy with the mutation is now an X chromosome and the copy without the mutation is now a Y chromosome. The residual chromosomes from the original sex chromosome pair are still causing 50% of babies to die in early development.

I could extend my model using a simulation-based approach similar to Grossen et al. to make a better comparison, but I don't expect I will. With complex evolutionary history like this, it might never be possible to ascertain exactly how the process unfolded. There could be many equally probable historical scenarios that would have led to the evolution of the situation we see today. It's educational to study how these systems could potentially have evolved, even if we can't sort out the exact path they took to get where they now are.

I don't often write about published papers, but when I do I prefer to carry the discussion past where the paper ended. Discussing an alternate model is in no way an attack against the one proposed in Grossen et al. It's an honest reflection of what I was thinking of while reading and is part of an exploration of the ideas discussed in the paper. Frankly, if their writing didn't give me any new ideas to play with, I wouldn't find it anywhere near as interesting to read. Good science answers a question. Great science answers a question and then draws you in to ask your own new questions.

This post was inspired by an interesting conversation over on Twitter. (You can follow me there as @thebiologistisn.)
The photos of the Triturus newts were loaned to me for this blog post by photographer (and newt wrangler) @Blackmudpuppy.

References:

## Saturday, August 19, 2017

### Significantly Fuzzy and Uncertain Math

I was always a very smart student, but I wasn't always a very good student. During lessons over the years, there would occasionally be little pieces that I would miss. Well, I either missed them or they simply weren't taught. One of the earliest ones was about what the point of remainders was in doing division. I can't remember a math teacher ever saying the remainder was the numerator and the divisor was the denominator. When the schoolwork moved past remainders, I had to basically learn the math all over again because there was no apparent connection between what we were doing and what I had been taught before. Years later I was puzzling over what the point of that early math had been and I made the connection, filling in the gap in what I was taught. If someone is trying to teach me something and I can't integrate it into the knowledge I already have, it has always been extra difficult.

In high school, I was taught about significant figures. Our pre-calculus teacher got in an argument with a student (not me) one day. She was adamant that "0 was not the same as 0.000", but she didn't explain why. I always had the hardest time keeping the rules for significant figures straight during calculations. It was only in college that I finally understood that significant figures represent the level of uncertainty in a measurement. The idea that a numerical measurement was a distinct concept from the number that described the measurement was something of a novelty to me.

Those significant figures rules?
1. For addition & subtraction, the last significant figure of the calculated result should sit at the leftmost position occupied by the last significant figure of any of the measured numbers. Only the position of the last significant figure matters. [10.0 + 1.234 ≈ 11.2]
2. For multiplication & division, the significant figures for the calculated result should be the same as the measured number with the least significant figures. Only the number of significant figures matters. [1.234 × 2.0 ≈ 2.5]
3. For a base 10 logarithm, the result should have as many decimal places as the starting number has significant figures in scientific notation. [log10(3.000×10^4) ≈ 4.4771]
4. For an exponentiation, the result should have the same number of significant figures as the fractional part of the starting number in scientific notation. [10^2.07918 ≈ 120.0]
5. Don't round to significant figures until the entire calculation is complete.
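
The bracketed examples in the rules above can be reproduced with a few lines of Python. This is only a sketch: `round_sig` is a hypothetical helper written for this illustration, and the number of figures or decimal places to keep is supplied by hand in each call.

```python
import math

def round_sig(value, figures):
    """Round value to the given number of significant figures (hypothetical helper)."""
    return float(f"{value:.{figures - 1}e}")

# Rule 1: addition keeps the coarsest decimal position (tenths, from 10.0).
sum_result = round(10.0 + 1.234, 1)
# Rule 2: multiplication keeps the fewest significant figures (two, from 2.0).
product_result = round_sig(1.234 * 2.0, 2)
# Rule 3: the logarithm is kept to four decimal places, matching the example.
log_result = round(math.log10(3.000e4), 4)
# Rule 4: the exponentiation example, kept to four significant figures.
power_result = round_sig(10 ** 2.07918, 4)

print(sum_result, product_result, log_result, power_result)
```

Following rule 5, each rounding here is applied once, to the final value of its own small calculation.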

Let's see if we can convert these basic rules into something with a more statistical flavor. First we should define a way of writing uncertain numbers. Let's define an example number 'x', which has a measured value of '2' and an uncertainty of ±1. If we consider the measurement to fit the Gaussian assumption, then that uncertainty would be the standard deviation.

x = (2±1)

If we add two such measurements together, with all their uncertainty, we'd expect an average value of 4 with some unknown standard deviation.

(2±1) + (2±1) = (4±[?])

We'll need to take a step back at this point. If you go explore the topic of "fuzzy mathematics" on Wikipedia, you'll find some abstract discussion of set theory rather than something that seems like what we've been talking about here. If you do some searches for "fuzzy arithmetic", you'll get into a realm of math that is between the abstract set theory and something closer to what I'm looking for.

If you dig even further, you'll find Gaussian Fuzzy Numbers (GFN). This sounds very much like the sort of math I want. Two GFNs are added together to generate a new GFN in a two step process. The means of the two numbers are added to make the new mean. The standard deviations are added to make the new standard deviation. In the above notation, this would be:

(2±1) + (2±1) = (4±2)

This is a pretty straightforward rule, but it doesn't feel like it has the statistical flavor that I'm looking for.

 Method 1
How can we derive the standard deviation produced by adding two uncertain measurements? After thinking about it a bit, I thought of two methods to estimate what the value would be.

My first method basically simulates two uncertain measurements. I created a set of several thousand random samples within each initial Gaussian distribution, then iterated every possible pairwise addition between the two sets. I then calculated mean and standard deviation estimates from the set of pairwise additions. I repeated this estimation process a few thousand times and calculated the average values for the mean and standard deviation. With enough repetitions of this process, the estimates began to converge.

(2±1) + (2±1) = (3.9998±1.4146) ≈ (4±sqrt(2))
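
A minimal sketch of this first method, using only the Python standard library, might look like the following. The sample size and the fixed seed are assumptions made for illustration, and the sketch does a single pass rather than averaging over thousands of repetitions.

```python
import math
import random

random.seed(0)   # fixed seed so the sketch is repeatable (an assumption)
SAMPLES = 500    # samples drawn from each distribution (an assumption)

# Two simulated uncertain measurements, each (2±1).
a = [random.gauss(2.0, 1.0) for _ in range(SAMPLES)]
b = [random.gauss(2.0, 1.0) for _ in range(SAMPLES)]

# Every possible pairwise addition between the two sets.
sums = [x + y for x in a for y in b]

# Mean and standard deviation estimated from the set of pairwise sums.
mean_sum = sum(sums) / len(sums)
sd_sum = math.sqrt(sum((s - mean_sum) ** 2 for s in sums) / len(sums))

print(f"({mean_sum:.4f}±{sd_sum:.4f})")
```

A single run like this lands near (4±sqrt(2)) but wobbles from seed to seed. Repeating it and averaging the estimates is what tightens the result.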

 Method 2
That approach to estimating the new standard deviation takes a lot of calculations. My second method is much more efficient and converges faster. I started with two Gaussian curves, sampled at some high density. I then iterated through every combination of one point from the first curve and one point from the second. For each combination, the two x-values were added to make a new x-value. The two y-values were multiplied to make a new y-value. (The y-values are probabilities. Multiplying the two probabilities calculates the probability of both happening at once.) Plot all those x/y value pairs (in light blue at left) and the envelope (or outline, roughly) of those points (shown in red) describes the same curve we calculated more roughly with my first method. I fitted the Gaussian distribution function to this curve to get the numerical estimate for its standard deviation.

(1±1) + (1±1) = (2±1.4142) ≈ (2±sqrt(2))
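
The second method can be sketched in plain Python as well. The grid spacing and range are assumptions. In place of a least-squares fit, the sketch estimates the standard deviation from the weighted moments of the envelope, a simpler swapped-in technique that gives the same answer when the envelope is Gaussian.

```python
import math

def gaussian(x, mean, sd):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

MEAN, SD = 1.0, 1.0
STEP = 0.02      # sampling density (an assumption)
POINTS = 601     # grid spans MEAN - 6*SD to MEAN + 6*SD

xs = [MEAN - 6 * SD + i * STEP for i in range(POINTS)]
ys = [gaussian(x, MEAN, SD) for x in xs]

# On a uniform grid the summed x-value depends only on i + j, so
# envelope[i + j] holds the largest y-product seen for that x-sum.
envelope = [0.0] * (2 * POINTS - 1)
for i, yi in enumerate(ys):
    for j, yj in enumerate(ys):
        product = yi * yj
        if product > envelope[i + j]:
            envelope[i + j] = product

# Treat the envelope as an unnormalized curve and take its moments.
sum_xs = [2 * xs[0] + k * STEP for k in range(2 * POINTS - 1)]
total = sum(envelope)
env_mean = sum(x * y for x, y in zip(sum_xs, envelope)) / total
env_sd = math.sqrt(
    sum(y * (x - env_mean) ** 2 for x, y in zip(sum_xs, envelope)) / total
)

print(f"({env_mean:.4f}±{env_sd:.4f})")
```

The largest product for any given sum comes from splitting that sum evenly between the two curves, which is why the outline of the point cloud is itself a Gaussian, wider than the originals by a factor of sqrt(2).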

That seems a nice and simple relationship, but it is distinctly different from what the Gaussian Fuzzy Number calculation described previously would indicate. It took some further digging before I found a document on the topic of "propagation of uncertainties". The document included a nice table with a series of very useful relationships, describing how Gaussian uncertainties are combined by various different basic mathematical operations.

From these relationships, we can short-circuit around all the iterative calculations I've been playing with. If we have measurements with a non-Gaussian distribution, it might still be necessary to use the numerical estimation methods I came up with.
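
As an illustration, two of the standard textbook relationships for independent Gaussian uncertainties can be written as small Python functions. These are the commonly quoted forms, not a reproduction of the table from that document. The function names and the `(value, sd)` tuple layout are assumptions made for this sketch, and the product rule is a first-order approximation that needs nonzero values.

```python
import math

def add_uncertain(a, b):
    """Add two (value, sd) pairs: standard deviations combine in quadrature."""
    return (a[0] + b[0], math.hypot(a[1], b[1]))

def mul_uncertain(a, b):
    """Multiply two (value, sd) pairs: relative uncertainties combine in
    quadrature (first-order approximation, values must be nonzero)."""
    value = a[0] * b[0]
    relative = math.hypot(a[1] / a[0], b[1] / b[0])
    return (value, abs(value) * relative)

total = add_uncertain((2.0, 1.0), (2.0, 1.0))
scaled = mul_uncertain((10.0, 0.1), (5.0, 0.1))
print(total)
print(scaled)
```

The addition case gives the same (4±sqrt(2)) result that the two numerical methods converged toward, with no iteration at all.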

Let's compare the three methods for tracking uncertainty through calculations.

Significant figures: (1±0.5) + (1±0.5) = (2±0.5)
Gaussian fuzzy numbers: (1±0.5) + (1±0.5) = (2±1.0)
Propagation of uncertainties: (1±0.5) + (1±0.5) = (2±0.70711)

The significant figures method underestimates the uncertainty through the calculation, while the Gaussian fuzzy numbers approach overestimates the uncertainty. Both these methods do have the advantage of being simple to apply without requiring any detailed computation. However, the errors would probably accumulate through more extensive calculations. I'll have to play around with a few test cases later to illustrate this.
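
One simple test case is adding up many copies of the same measurement, (1±0.5). The sketch below assumes the significant-figures uncertainty stays fixed at ±0.5 (the last significant figure stays in the ones place), that Gaussian fuzzy numbers add their standard deviations directly, and that propagation of uncertainties adds them in quadrature. The function name is hypothetical.

```python
import math

def accumulated_uncertainty(count, sd=0.5):
    """Uncertainty after summing `count` copies of a (1±sd) measurement."""
    return {
        "significant figures": sd,
        "Gaussian fuzzy numbers": sd * count,
        "propagation of uncertainties": sd * math.sqrt(count),
    }

for count in (2, 10, 100):
    print(count, accumulated_uncertainty(count))
```

Under these assumptions the gap between the three methods widens steadily: after 100 additions the propagated uncertainty is ±5, while the other two methods report ±0.5 and ±50.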

I didn't like significant figures when I was first taught about them. The rules struck me as somewhat arbitrary and the results didn't fit at all with my expectations of how numbers should behave. The lessons were always a stumbling point for me because of this disconnect.

Over the years since, I had occasionally played around with how to do it better. It was only recently that I figured out how to derive the solutions I described above and realized propagation of uncertainties was what I had been searching for. Those high-school lessons would have been so much more effective had they included the real math instead of assuming I couldn't handle the concepts.

References:

## Tuesday, August 1, 2017

### A Cross by Any Other Name

I've been involved in a few discussions online lately about different types of crosses that can be used in plant breeding. There has been some mild confusion about basic terms, as well as about the implications of different types of crosses. A few years ago I wrote about backcrossing. Though that post is somewhat hard for me to read, as I imagine early writings are for most authors, it has some useful information. Here I'm going to try and do a more general overview. Let's see how this little ride goes.

Some of that basic terminology and common abbreviations:
• P : Parental. An initial variety used in a cross. Multiple parents can be numbered, like in "p1 x p2".
• F : Filial, relating to progeny generations after an initial cross. F1 is the initial hybrid. F2 is the result of crossing two F1s. F3 is the result of crossing two F2s, etc.
• Self Cross : Crossing the male and female parts of the same plant.
• BC : Back cross. Crossing a filial generation back to one of the parents.
• CC : Complex cross. A cross involving more than two parents.

P : To simplify things, we usually use highly stable varieties as initial parents in a hybridization project. This means that several generations of each parent variety have been grown out without any visible variation appearing. At the basic genomic level, this means the varieties are highly homozygous. In theoretical cases we consider the parents to be absolutely homozygous, though reality is never quite so clear-cut.

F1 : Our initial hybrid between two parents can be written out in a bit longer form like "p1 x p2", or just referred to as an F1 between the two parents. In our idealized scenario, every F1 produced by crossing the same two parents will be identical. F1 stands for "first filial generation".

If a group of F1s aren't identical, this says one or both of the parents weren't entirely homozygous. (Or new mutations were introduced, or epigenetic effects are at play, and so on. It can get complicated.) Because they're (more or less) identical, selection usually isn't very important at this stage.

F2 : Our second filial generation is produced by crossing two F1s together. For those plants that can self cross (like peppers and tomatoes), the F2s would generally be produced by crossing one F1 to itself. For those that can't (like tomatillos), the F2s would be produced by crossing two separate F1 siblings.

The F2 generation is where the different alleles from each parent are recombined. Almost any combination of traits from each parent can turn up in an individual among the F2s. This is where the magic in a plant breeding project really happens. This generation is where selection is most important.

F3...Fn : Subsequent filial generations would be produced in a similar way to the F2s. If you produced F3s by selfing an F2, each F3 will have about 50% of the heterozygosity of the F2. Selfing another generation will result in another 50% loss of heterozygosity. Continue this process for enough generations and you will have a new stable variety, with an essentially homozygous genome.
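
The halving under repeated selfing is easy to tabulate. This is a sketch of the expected (average) value only, measured relative to the F2. The function name is hypothetical, and individual plants will scatter around these figures.

```python
def relative_heterozygosity(filial_generation):
    """Expected heterozygosity of a selfed Fn line, relative to the F2."""
    return 0.5 ** (filial_generation - 2)

for n in range(2, 9):
    print(f"F{n}: {relative_heterozygosity(n):.2%} of the F2's heterozygosity")
```

By F8 the expected figure is under 2% of what the F2 carried, which is why a handful of selfed generations is usually enough to call a line stable.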

If you produced F3s by crossing random F2s, you'll keep mixing up the genetics instead of automatically losing 50% of the heterozygosity each generation. If you do this with relatively few plants, you will still be losing heterozygosity each generation, though calculating exactly how much becomes a bit complicated.

If you produced F3s by crossing specific F2s that had a trait you liked, you'll keep mixing up all the other genetics while selecting for that specific trait. You would be losing heterozygosity near the genes responsible for the trait of interest, but the rest of the genome would still be maintaining heterozygosity through generations.

BC : In basic back crossing, each subsequent generation past F1 is crossed back to one of the parents. BC1 would be diagrammed something like, "[p1 x p2] x p1" (or "F1 x p1"). For one hypothetical mutation found in the first parent, a BC1 individual would have a 50% chance of having two copies (and a 0% chance of having no copies) since it is assured of inheriting one copy from the parental strain used in the backcross.
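
The BC1 odds for a single gene can be checked by listing every combination of one allele from each plant. In this sketch `M` stands for the allele found in the first parent and `m` for the alternative from the second parent. The allele labels and the `cross` helper are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def cross(parent_a, parent_b):
    """Genotype probabilities from crossing two diploid genotypes at one gene."""
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}

f1 = "Mm"   # hybrid between p1 (MM) and p2 (mm)
p1 = "MM"   # recurrent parent

bc1 = cross(f1, p1)
f2 = cross(f1, f1)
print("BC1:", bc1)
print("F2: ", f2)
```

The BC1 result has no `mm` entry at all, matching the 0% chance of having no copies, while the F2 from the same hybrid spreads across all three genotypes.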

Through each generation of back-crossing the resulting plants will lose 50% of their heterozygosity, but it will be replaced with whatever mutations are found in the parental strain. The result will end up more and more like the recurrent parent strain over the generations. If you do this randomly, you will end up with essentially a genetic clone of the recurrent parent. To get anything different, you have to persistently select for a trait that was originally only in the second parental variety. Doing this will eventually produce something almost exactly like the recurrent parent, but with the one trait that was originally in the other parent variety. (That's all detailed in the link I mentioned in the intro.)
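
The drift toward the recurrent parent can be put into numbers. This sketch gives expected values for unselected backcrossing only and ignores linkage. The heterozygosity is measured relative to the F1, and the function name is hypothetical.

```python
def backcross_expectation(generation):
    """Expected values for BCn without selection: (heterozygosity relative
    to the F1, fraction of the genome from the recurrent parent)."""
    heterozygosity = 0.5 ** generation
    recurrent_fraction = 1 - 0.5 ** (generation + 1)
    return heterozygosity, recurrent_fraction

for n in range(1, 7):
    het, recurrent = backcross_expectation(n)
    print(f"BC{n}: heterozygosity {het:.2%}, recurrent parent genome {recurrent:.2%}")
```

Selecting for a trait from the other parent each generation holds on to the region around that gene, so real lines keep a little more donor genome than these averages suggest.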

CC : A complex cross involves three or more parental varieties. A simple case would be taking an F1 and crossing it to an independent F1, "[p1 x p2] x [p3 x p4]". In these scenarios you would get a very diverse population, just like with F2s, but the mutations contributed to the population can come from all four parent varieties.

A mutation that was found in only one of the parental strains would be found, in a single copy, in 50% of this mixed up population (assuming that parent was homozygous for it), so it makes up 25% of the copies of that gene in the population. If a randomly chosen one of these plants was selfed, the chance of a plant being homozygous for the mutation in the next generation is 12.5%.
If the plants were allowed to cross randomly, the chance of a plant being homozygous in the next generation drops to only 6.25%. You would need to be working with large numbers of plants to routinely recover double-recessives using this strategy. I strongly advise you not use this strategy.
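
The arithmetic behind these percentages can be sketched as a function of the fraction of plants carrying one copy of the mutation, so different starting assumptions can be compared. The function names are hypothetical. The sketch assumes no plant in the mixed population carries two copies, that the selfed plant is picked at random, and that random crossing means every plant is equally likely to contribute pollen or ovules.

```python
def homozygous_after_selfing(carrier_fraction):
    """Chance of a homozygous seedling when a randomly chosen plant is selfed.
    Only carriers can produce one, and a selfed carrier does so 1/4 of the time."""
    return carrier_fraction * 0.25

def homozygous_after_random_mating(carrier_fraction):
    """Chance of a homozygous seedling when the population crosses at random.
    Each carrier holds one copy out of two, so the allele frequency is half
    the carrier fraction, and both gametes must carry the mutation."""
    allele_frequency = carrier_fraction / 2
    return allele_frequency ** 2

for carriers in (0.25, 0.5):
    print(
        f"carriers {carriers:.0%}: "
        f"selfing {homozygous_after_selfing(carriers):.4%}, "
        f"random mating {homozygous_after_random_mating(carriers):.4%}"
    )
```

Whichever carrier fraction applies, random crossing recovers homozygous plants less often than selfing does.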

References: