The Biologist Is In

Saturday, August 19, 2017

Significantly Fuzzy and Uncertain Math

I was always a very smart student, but I wasn't always a very good student. During lessons over the years, there would occasionally be little pieces that I would miss. Well, I either missed them or they simply weren't taught. One of the earliest was the point of remainders in division. I don't remember a math teacher ever saying that the remainder is the numerator and the divisor is the denominator of the leftover fraction. When the schoolwork moved past remainders, I basically had to learn the math all over again because there was no apparent connection between what we were doing and what I had been taught before. Years later I was puzzling over what the point of that early math had been and I made the connection, filling in the gap in what I was taught. If someone is trying to teach me something and I can't integrate it into the knowledge I already have, it has always been extra difficult.

In high-school, I was taught about significant figures. Our pre-calculus teacher got in an argument with a student (not me) one day. She was adamant that, "0 was not the same as 0.000", but she didn't explain why. I always had the hardest time keeping the rules for significant figures straight during calculations. It was only in college that I finally understood that significant figures represent the level of uncertainty in a measurement. The idea that a numerical measurement was a distinct concept from the number that described the measurement was something of a novelty to me.

Those significant figures rules?
  1. For addition & subtraction, the result is rounded to the leftmost position among the last significant figures of all the measured numbers. Only the position of the last significant figure matters. [10.0 + 1.234 ≈ 11.2]
  2. For multiplication & division, the significant figures for the calculated result should be the same as the measured number with the least significant figures. Only the number of significant figures matters. [1.234 × 2.0 ≈ 2.5]
  3. For a base 10 logarithm, the fractional part of the result should have as many digits as the starting number has significant figures in scientific notation. [log10(3.000×10^4) ≈ 4.4771]
  4. For an exponentiation, the result should have as many significant figures as there are digits in the fractional part of the exponent. [10^2.07918 ≈ 120.00]
  5. Don't round to significant figures until the entire calculation is complete.

Let's see if we can convert these basic rules into something with a more statistical flavor. First we should define a way of writing uncertain numbers. Let's define an example number 'x', which has a measured value of '2' and an uncertainty of ±1. If we consider the measurement to fit the Gaussian assumption, then that uncertainty would be the standard deviation.

x = (2±1)

If we add two such measurements together, with all their uncertainty, we'd expect an average value of 4 with some unknown standard deviation.

(2±1) + (2±1) = (4±[?])

We'll need to take a step back at this point. If you go explore the topic of "fuzzy mathematics" on Wikipedia, you'll find some abstract discussion of set theory rather than something that seems like what we've been talking about here. If you do some searches for "fuzzy arithmetic", you'll get into a realm of math that is between the abstract set theory and something closer to what I'm looking for.

If you dig even further, you'll find Gaussian Fuzzy Numbers (GFN). This sounds very much like the sort of math I want. Two GFNs are added together to generate a new GFN in a two step process. The means of the two numbers are added to make the new mean. The standard deviations are added to make the new standard deviation. In the above notation, this would be:

(2±1) + (2±1) = (4±2)

This is a pretty straightforward rule, but it doesn't feel like it has the statistical flavor that I'm looking for.

Method 1
How can we derive the standard deviation produced by adding two uncertain measurements? After thinking about it a bit, I thought of two methods to estimate what the value would be.

My first method basically simulates two uncertain measurements. I created a set of several thousand random samples within each initial Gaussian distribution, then iterated every possible pairwise addition between the two sets. I then calculated mean and standard deviation estimates from the set of pairwise additions. I repeated this estimation process a few thousand times and calculated the average values for the mean and standard deviation. With enough repetitions of this process, the estimates began to converge.

(2±1) + (2±1) = (3.9998±1.4146) ≈ (4±sqrt(2))
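A rough sketch of that first method, assuming NumPy. The sample sizes, the number of repetitions, and the function name `simulate_sum` are placeholders picked for illustration (smaller than "several thousand" so it runs quickly), not the values used for the estimate above.

```python
import numpy as np

def simulate_sum(mu_a, sd_a, mu_b, sd_b, n_samples=1000, n_repeats=20, seed=0):
    """Estimate the mean and standard deviation of a sum of two Gaussian measurements."""
    rng = np.random.default_rng(seed)
    means, sds = [], []
    for _ in range(n_repeats):
        a = rng.normal(mu_a, sd_a, n_samples)   # samples of the first measurement
        b = rng.normal(mu_b, sd_b, n_samples)   # samples of the second measurement
        sums = a[:, None] + b[None, :]          # every possible pairwise addition
        means.append(sums.mean())
        sds.append(sums.std())
    # Average the estimates over all repetitions.
    return float(np.mean(means)), float(np.mean(sds))

mean, sd = simulate_sum(2, 1, 2, 1)
print(f"(2±1) + (2±1) ≈ ({mean:.3f}±{sd:.3f})")
```

With more samples or repetitions the printed values should settle closer to 4 and sqrt(2).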

Method 2
That approach to estimating the new standard deviation takes a lot of calculations. My second method is much more efficient and converges faster. I started with two Gaussian curves, sampled at some high density. I then iterated through every combination of one point from the first curve and one from the second. For each combination, the two x-values were added to make a new x-value. The two y-values were multiplied to make a new y-value. (The y-values are probabilities. Multiplying the two probabilities calculates the probability for both happening at once.) Plot all those x/y value pairs (in light blue at left) and the envelope (or outline, roughly) of those points (shown in red) describes the same curve we calculated more roughly with my first method. I fitted the Gaussian distribution function to this curve to get the numerical estimate for its standard deviation.

(1±1) + (1±1) = (2±1.4142) ≈ (2±sqrt(2))
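Here is one way the second method might be sketched, again assuming NumPy. The grid spacing, the grid width, and the function names are my own choices for illustration. In place of a general curve fit, the sketch fits a parabola to the logarithm of the envelope, which is equivalent for a Gaussian shape.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Gaussian probability density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def envelope_sum(mu1, s1, mu2, s2, step=0.02, width=6.0):
    """Fit a Gaussian to the envelope of (x1 + x2, y1 * y2) over all grid pairs."""
    m = int(round(width * max(s1, s2) / step))
    k = np.arange(-m, m + 1)
    # Both curves share one spacing, so every summed x-value lands on a regular grid.
    x1, x2 = mu1 + k * step, mu2 + k * step
    y1, y2 = gaussian(x1, mu1, s1), gaussian(x2, mu2, s2)
    idx = np.arange(2 * m + 1)
    sum_idx = (idx[:, None] + idx[None, :]).ravel()   # grid index of x1 + x2
    prod = np.outer(y1, y2).ravel()                   # y1 * y2 for every pair
    env = np.zeros(4 * m + 1)
    np.maximum.at(env, sum_idx, prod)                 # largest product at each summed x
    x_sum = mu1 + mu2 + (np.arange(4 * m + 1) - 2 * m) * step
    keep = env > env.max() * 1e-6                     # ignore the far tails
    # log of a Gaussian is a parabola: a*x^2 + b*x + c with a = -1 / (2 * sigma^2)
    a, b, _ = np.polyfit(x_sum[keep], np.log(env[keep]), 2)
    return float(-b / (2 * a)), float(np.sqrt(-1 / (2 * a)))

mu, sigma = envelope_sum(1, 1, 1, 1)
print(f"(1±1) + (1±1) ≈ ({mu:.4f}±{sigma:.4f})")
```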

That seems a nice and simple relationship, but it is distinctly different from what the Gaussian Fuzzy Number calculation described previously would indicate. It took some further digging before I found a document on the topic of "propagation of uncertainties". The document included a nice table with a series of very useful relationships, describing how Gaussian uncertainties are combined by various basic mathematical operations.

From these relationships, we can short-circuit around all the iterative calculations I've been playing with. If we have measurements with a non-Gaussian distribution, it might still be necessary to use the numerical estimation methods I came up with.
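As an illustration of what such relationships look like, here are the standard textbook rules for sums and products of independent Gaussian uncertainties, written as a small sketch. The function names are hypothetical, and the product rule is the usual first-order approximation, so it is only trustworthy when the uncertainties are small relative to the values.

```python
import math

def add(a, sigma_a, b, sigma_b):
    """Sum of two independent measurements: uncertainties add in quadrature."""
    return a + b, math.hypot(sigma_a, sigma_b)

def multiply(a, sigma_a, b, sigma_b):
    """Product of two independent measurements: relative uncertainties add in quadrature."""
    value = a * b
    return value, abs(value) * math.hypot(sigma_a / a, sigma_b / b)

total, total_sigma = add(2, 1, 2, 1)
print(f"(2±1) + (2±1) = ({total}±{total_sigma:.4f})")
```

Subtraction uses the same quadrature rule as addition, and division uses the same relative rule as multiplication.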

Let's compare the three methods for tracking uncertainty through calculations.

Significant figures: (1±0.5) + (1±0.5) = (2±0.5)
Gaussian fuzzy numbers: (1±0.5) + (1±0.5) = (2±1.0)
Propagation of uncertainties: (1±0.5) + (1±0.5) = (2±0.70711)

The significant figures method underestimates the uncertainty through the calculation, while the Gaussian fuzzy numbers approach overestimates the uncertainty. Both these methods do have the advantage of being simple to apply without requiring any detailed computation. However, the errors would probably accumulate through more extensive calculations. I'll have to play around with a few test cases later to illustrate this.
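To get a feel for how the three approaches drift apart, here is a small sketch that adds up n copies of (1±0.5). It assumes the significant-figures uncertainty simply stays at ±0.5 (as in the comparison above), that fuzzy-number uncertainties add linearly, and that propagated uncertainties add in quadrature. The function name is made up for this example.

```python
import math

def summed_uncertainty(n_terms, sigma=0.5):
    """Uncertainty on a sum of n_terms measurements, each with uncertainty sigma."""
    return {
        "significant figures": sigma,                                # position of last digit is unchanged
        "Gaussian fuzzy numbers": n_terms * sigma,                   # standard deviations add
        "propagation of uncertainties": math.sqrt(n_terms) * sigma,  # variances add
    }

for n in (2, 10, 100):
    print(n, summed_uncertainty(n))
```

Under these assumptions the gap between the three grows with every extra term in the sum.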

I didn't like significant figures when I was first taught about them. The rules struck me as somewhat arbitrary and the results didn't fit at all with my expectations of how numbers should behave. The lessons were always a stumbling point for me because of this disconnect.

Over the years since, I had occasionally played around with how to do it better. It was only recently that I figured out how to derive the solutions I described above and realized propagation of uncertainties was what I had been searching for. Those high-school lessons would have been so much more effective had they included the real math instead of assuming I couldn't handle the concepts.


Tuesday, August 1, 2017

A Cross by Any Other Name

I've been involved in a few discussions online lately about different types of crosses that can be used in plant breeding. There has been some mild confusion about basic terms, as well as about the implications of different types of crosses. A few years ago I wrote about backcrossing. Though that post is somewhat hard for me to read, as I imagine early writings are for most authors, it has some useful information. Here I'm going to try and do a more general overview. Let's see how this little ride goes.

Some of that basic terminology and common abbreviations:
  • P : Parental. An initial variety used in a cross. Multiple parents can be numbered, like in "p1 x p2".
  • F : Filial, relating to progeny generations after an initial cross. F1 is the initial hybrid. F2 is the result of crossing two F1s. F3 is the result of crossing two F2s, etc.
  • Self Cross : Crossing the male and female parts of the same plant.
  • BC : Back cross. Crossing a filial generation back to one of the parents.
  • CC : Complex cross. A cross involving more than two parents.

P : To simplify things, we usually use highly stable varieties as initial parents in a hybridization project. This means that several generations of each parent variety have been grown out without any visible variation appearing. At the basic genomic level, this means the varieties are highly homozygous. In theoretical cases we consider the parents to be absolutely homozygous, though reality is never quite so clear-cut.

F1 : Our initial hybrid between two parents can be written out in a bit longer form like "p1 x p2", or just referred to as an F1 between the two parents. In our idealized scenario, every F1 produced by crossing the same two parents will be identical. F1 stands for "first filial generation".

If a group of F1s aren't identical, this says one or both of the parents weren't entirely homozygous. (Or new mutations were introduced, or epigenetic effects are at play, etc. It can get complicated.) Because they're (more or less) identical, selection usually isn't very important at this stage.

F2 : Our second filial generation is produced by crossing two F1s together. For those plants that can self cross (like peppers and tomatoes), the F2s would generally be produced by crossing one F1 to itself. For those that can't (like tomatillos), the F2s would be produced by crossing two separate F1 siblings.

The F2 generation is where the different alleles from each parent are recombined. Almost any combination of traits from each parent can turn up in an individual among the F2s. This is where the magic in a plant breeding project really happens. This generation is where selection is most important.

F3...Fn : Subsequent filial generations would be produced in a similar way to the F2s. If you produced F3s by selfing an F2, each F3 will have about 50% of the heterozygosity of the F2. Selfing another generation will result in another 50% loss of heterozygosity. Continue this process for enough generations and you will have a new stable variety, with an essentially homozygous genome.
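The halving under repeated selfing can be tabulated with a tiny sketch. It assumes the idealized case from above (an F1 that is heterozygous at every locus where the parents differ, no selection) and gives expected averages only. The function name is mine.

```python
def expected_heterozygosity(filial_generation):
    """Expected fraction of the F1's heterozygosity left in generation F<n> under repeated selfing."""
    return 0.5 ** (filial_generation - 1)

for n in (1, 2, 3, 6, 8):
    print(f"F{n}: {expected_heterozygosity(n):.2%}")
```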

If you produced F3s by crossing random F2s, you'll keep mixing up the genetics instead of automatically losing 50% of the heterozygosity each generation. If you do this with relatively few plants, you will still be losing heterozygosity each generation, though calculating exactly how much becomes a bit complicated.

If you produced F3s by crossing specific F2s that had a trait you liked, you'll keep mixing up all the other genetics while selecting for that specific trait. You would be losing heterozygosity near the genes responsible for the trait of interest, but the rest of the genome would still be maintaining heterozygosity through generations.

BC : In basic back crossing, each subsequent generation past F1 is crossed back to one of the parents. BC1 would be diagrammed something like, "[p1 x p2] x p1" (or "F1 x p1"). For one hypothetical mutation found in the first parent, a BC1 individual would have a 50% chance of having two copies (and a 0% chance of having no copies) since it is assured of inheriting one copy from the parental strain used in the backcross.

Through each generation of back-crossing the resulting plants will lose 50% of their heterozygosity, but it will be replaced with whatever mutations are found in the parental strain. The result will end up more and more like the recurrent parent strain over the generations. If you do this randomly, you will end up with essentially a genetic clone of the recurrent parent. To get anything different, you have to persistently select for a trait that was originally only in the second parental variety. Doing this will eventually produce something almost exactly like the recurrent parent, but with the one trait that was originally in the other parent variety. (That's all detailed in the link I mentioned in the intro.)
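The drift toward the recurrent parent can be sketched the same way. This assumes no selection and ignores linkage, so the numbers are expected genome-wide averages rather than what any single plant will show. The function name is hypothetical.

```python
def recurrent_parent_fraction(backcross_generation):
    """Expected share of the genome from the recurrent parent in BC<n> (the F1 counts as generation 0)."""
    return 1 - 0.5 ** (backcross_generation + 1)

for n in range(0, 6):
    label = "F1" if n == 0 else f"BC{n}"
    print(f"{label}: {recurrent_parent_fraction(n):.1%}")
```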

CC : A complex cross involves three or more parental varieties. A simple case would be taking an F1 and crossing it to an independent F1, "[p1 x p2] x [p3 x p4]". In these scenarios you would get a very diverse population, just like with F2s, but the mutations contributed to the population can come from all four parent varieties.

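A mutation that was found in only one of the (homozygous) parental strains would be present, as a single copy, in 50% of this mixed-up population: every plant inherits one allele from either p1 or p2, and half the time it comes from the parent carrying the mutation. That puts the mutation at 25% of all the alleles in the population. If one of these plants was picked at random and selfed, the chance of a plant being homozygous in the next generation is 12.5% (a 50% chance the selfed plant is a carrier, times a 25% chance that a carrier's offspring gets two copies).
If the plants were allowed to cross randomly, the chance of a plant being homozygous in the next generation drops to only 6.25% (25% × 25%). You would need to be working with large numbers of plants to routinely recover double-recessives using this strategy. I strongly advise you not use this strategy.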


Wednesday, July 19, 2017


For a while I kept up with the tempo of writing one blog post a week. Occasionally I'd pull ahead and have a few posts written and queued up to be automatically published. Occasionally I'd fall behind and go a few weeks between posts. I haven't been writing at all for a while now, since pretty much exactly when I started using Twitter. I post there as @thebiologistisn. (Twitter handles are limited to fifteen characters, so I had to make do.) I only have a certain amount of time in a day to play online and lately that hasn't been writing for the blog.

I've been accumulating ideas and half-baked concepts for posts, but I just haven't found the motivation to sit for the few to several hours it takes to write a full post. It doesn't help that my after-work time has been pretty full with house and yard tasks.

Two years ago I built an effective deer fence. It kept them out and let me garden in peace. Last year our vegetable gardens were nearly wiped out by rabbits that ran right through the deer fence. They hadn't been an issue the year before, probably because we had a family of Cooper's Hawks in the yard to keep them under control. We had lots of rabbits around this year, so I couldn't plant anything they would eat until I had built some fences that would keep them out.

The garden at right got its fence done first. I then planted a nice patch of carrots and strawberries. The garden already had onions and Siberian irises I'd planted the fall before. Rabbits don't like those, so those plants survived even without protection. The onions are potato onions I grew from seed. Two of the seedlings thrived (at left), while several others either died through the winter or didn't thrive this year. The carrots are all from breeding projects. The near half are third generation plants and the far half are second generation plants.

The rabbit fence for the second garden took a while longer to get built. It is now populated with a diverse collection of tomatoes from various breeding projects I'm working on. There's also a small group of tomatillos that I've been selecting for intense purple pigment. All these plants were put in the ground much later than would be ideal, so hopefully they will mature sufficiently to produce fruit this season. Next year I won't have to build fences, so things will get moving sooner.

One of the central rules I started this blog with was that I wouldn't write about my job. While I was in grad school, I didn't talk about my research. Since I've been out of grad school, I haven't talked about whatever work it is I'm doing now. This rule was intended to make it clear this blog is entirely my own and doesn't represent anyone else or any organization.

Now that I've got some more free time, there might just be blog posts coming at a slightly higher rate. Since I'm no longer in grad school, nor working in academia, I will probably start to have some posts about computational biology projects I've been working on. I might even have a few posts by guest bloggers. We shall see how this goes.

Friday, May 26, 2017

Return of the Sunflower

A few years ago I made a cross between a sunflower (var. "Russian Mammoth") and a sunchoke (probably var. "Stampede"). My goal was to eventually get a plant that produced tubers like the sunchoke, but was super-charged with the giant growth of the sunflower. The two species have different chromosome counts, which complicates things a bit.

That first cross resulted in three F1 plants. I should have grown more, but it didn't work out that way. One of the plants grew to 10ft tall, with relatively large flowers, while the other two looked more or less like the sunchoke parent. I moved by the end of that season, but I was able to go back and recover root material from the plants. I then stored the roots in a fridge over the first winter and planted them at our new place when spring came around. Only the tubers produced from the largest plant survived this process.

Our current place has routine visitations by deer who seem to find sunflower leaves delectable. The first several shoots produced by the tubers were neatly trimmed to the ground. When I finally put together some protection for them, they succeeded in sending up one final shoot. That shoot topped out at about 2ft tall and produced a single small flower. This was a far cry from the 10ft skyscraper the tuber had come from. I was so disappointed that I didn't even take any pictures of the little plant. I assumed the repeated early-season trimmings had dwarfed it and hoped it would do better the following year.

I was better prepared for 2016. I made a 7ft tall chicken-wire cylinder to place around the growing plant. This kept the plant protected for most of the season. The cylinder was knocked over a couple times (either by storms or aggressive deer), but this only exposed the lower leaves to the hungry mammals. The plant grew to about as tall as the cage and bloomed, still well short of the 10 ft of the first year. Though the plant branched much more than the first year, it did so much less than the sunchoke parent.

Our new yard has lots of animal traffic besides the deer. Turkeys are commonly seen in the neighborhood, though we don't often see them in our yard.

I'm hoping the plant will do even better in 2017. Three clusters of new shoots have been coming up near where the plant was last year and I have a slightly wider protective cage put in place. The shoots are more widely spaced than they were last year. I'm hoping this means each individual stem will be larger/taller.

The plant produced abundant seeds in the first year, but has since then produced absolutely none. I'm pretty sure this means the plant inherited the self-incompatibility mechanism from its sunchoke mother. The first year there was plenty of pollen from its siblings, but there has been none the last couple of years.

There may be a few oilseed sunflowers around this year to contribute pollen, grown from scattered birdseed. My perennial sunflower is tetraploid, so any crosses to these diploids would produce highly sterile triploid offspring. These might be interesting to grow, but they wouldn't contribute to my overall goal.

For my breeding goals to move forward, I'll need to produce more tetraploid F1s by crossing sunchoke and another giant sunflower. I don't have any sunchoke planted right now, so I probably won't be able to get flowers even if I did find some tubers soon. There are several giant sunflower varieties I could use to help make more F1s, but I'll probably keep using "Russian Mammoth" to simplify the overall genetics of the future F2s.


Sunday, May 21, 2017

Calculations in the Woods

A. tricoccum in local woods.
Wild foods are available most times of the year in Minnesota, but one species that attracts the most interest in spring is Allium tricoccum (known as "Ramps" or "Wild Leeks"). This slow growing plant is a close relative of the onions and chives that are routinely available and has a similar flavor, though aficionados will argue it has a flavor all of its own. Ramps are distinct from the commonly available onion types in that they grow broad, flat leaves and in their habit of growing in the moist shade of wooded areas.

Over-harvesting of A. tricoccum has led to the species disappearing from many areas where it used to be common. The plants grow very slowly, taking several years to grow from seed to a mature plant. The plants are also sensitive to physical disruption because their fragile roots grow close to the surface. If all the plants in an area are pulled out (or accidentally killed), then it could be decades before some seeds find their way back and start towards reestablishing a population.

At this time of year, the local foraging groups are filled with people posting pictures of their (often outrageous) harvests as well as people responding with ideas about sustainable practices of harvest. Advice to, "take no more than half" or, "only take 10%" are pretty common. There doesn't seem to be any standard number. I think some mathematical analysis can maybe help clarify what might be a good rule.

[1] Let's start with a very simple model. We have a population of plants and a whole bunch of people interested in harvesting them.

If everyone harvests 1/2 of the plants...

\(\lim \limits_{n\to\infty} \left(\frac{1}{2}\right)^n = 0\)

...or 1/4 of the plants (thus 3/4 remain after each person harvests)...

\(\lim \limits_{n\to\infty} \left(\frac{3}{4}\right)^n = 0\)

...then the population still dwindles towards extinction.

In this simplified model it doesn't matter what fraction each person takes, the population will always dwindle away towards extinction. This isn't realistic, since we didn't factor in the ability of the plants to reproduce.

[2] A slightly more complicated (and realistic) model factors in how fast the plant is able to replicate itself. Let's assume a fraction of the adult plants are able to produce another adult plant each year. This is still a pretty big simplifying assumption (and a highly optimistic one, since it is quite biologically wrong), but it's a starting point to work from. Let's start by defining some terms.

\(\begin{array}{ll}
R_y & \text{Population of Ramps in year 'y'.} \\
r_i & \text{Total increase rate per year.} \\
r_h & \text{Total harvest rate per year.}
\end{array}\)

The population of next year is calculated from the current year population and the total rate of increase.

\(R_y(1+r_i) = R_{y+1} \)

Then we add in a term for losses due to people harvesting a percentage of the plants.

\(R_y(1+r_i)(1-r_h) = R_{y+1} \)

If we want the population to remain stable over time...

\(R_y = R_{y+1} \)

\(R_y(1+r_i)(1-r_h) = R_{y+1} \)
\((1+r_i)(1-r_h) = \frac{R_{y+1}}{R_y} \)
\((1+r_i)(1-r_h) = 1 \)
\(1-r_h = \frac{1}{1+r_i} \)
\(r_h = 1-\frac{1}{1+r_i} \)

...and we assume a third of the plants produce a second plant each year,

\(r_i = \frac{1}{3}\)

\(r_h = 1-\frac{1}{1+\frac{1}{3}} \)
\(r_h = 1-\frac{1}{\frac{4}{3}} \)
\(r_h = 1-\frac{3}{4} \)
\(r_h = \frac{1}{4} \)

...then a cumulative total of 25% of the plants could be harvested each year. If any more were harvested, then the population would be declining like in our first model.
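The same result can be checked numerically with a short sketch of the second model. The function names and the starting population of 1000 are placeholders. Exact fractions are used so the stable case comes out exactly stable.

```python
from fractions import Fraction

def sustainable_harvest_rate(r_i):
    """Total harvest rate r_h that keeps the population constant: r_h = 1 - 1/(1 + r_i)."""
    return 1 - 1 / (1 + r_i)

def project(population, r_i, r_h, years):
    """Apply R_{y+1} = R_y (1 + r_i)(1 - r_h) for the given number of years."""
    for _ in range(years):
        population = population * (1 + r_i) * (1 - r_h)
    return population

r_i = Fraction(1, 3)
r_h = sustainable_harvest_rate(r_i)
print("sustainable total harvest rate:", r_h)
print("after 10 years at that rate:", float(project(1000, r_i, r_h, 10)))
print("after 10 years at a 50% rate:", float(project(1000, r_i, Fraction(1, 2), 10)))
```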

Remember, this is the cumulative total harvest rate. This could be just one person harvesting Ramps, or it could be several people harvesting separately through the season. If two or more people come across the patch and decide to harvest some, then each of them would have to harvest less than the 25% we calculated for the population to remain stable. We have to define some new terms...

\(\begin{array}{ll}
n & \text{Number of people harvesting in a year.} \\
r_{hi} & \text{Harvest rate per individual per year.}
\end{array}\)

The relationship between the number of individuals harvesting and the cumulative total harvest rate is pretty simple.

\((1-r_{hi})^n = (1-r_h) \)

\(\begin{array}{r|l}
n & r_{hi} = 1-\sqrt[n]{\frac{3}{4}} \\
\hline
1 & r_{hi} = 1-\frac{3}{4} = 0.25 \\
2 & r_{hi} = 1-\sqrt{\frac{3}{4}} \approx 0.13397 \\
3 & r_{hi} = 1-\sqrt[3]{\frac{3}{4}} \approx 0.09144 \\
4 & r_{hi} = 1-\sqrt[4]{\frac{3}{4}} \approx 0.06940 \\
5 & r_{hi} = 1-\sqrt[5]{\frac{3}{4}} \approx 0.05591 \\
\vdots & \vdots \\
10 & r_{hi} = 1-\sqrt[10]{\frac{3}{4}} \approx 0.02836 \\
\vdots & \vdots \\
100 & r_{hi} = 1-\sqrt[100]{\frac{3}{4}} \approx 0.00287
\end{array}\)

The main lesson we can take from this second model is the more people that have access to a patch of Ramps, the smaller the fraction each person can harvest for the population to remain sustainable.
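For other numbers of harvesters, the per-person rate follows from solving \((1-r_{hi})^n = (1-r_h)\) for \(r_{hi}\). A minimal sketch, with a function name of my own choosing and the 25% total rate from the example above as the default:

```python
def per_person_rate(n_people, total_rate=0.25):
    """Per-person harvest rate r_hi such that n people together remove total_rate."""
    return 1 - (1 - total_rate) ** (1 / n_people)

for n in (1, 2, 3, 4, 5, 10, 100):
    print(f"{n:>3} harvesters: {per_person_rate(n):.5f} each")
```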


[3] Mathematically, a more ideal model would be somewhere between the discrete series function I used above and a set of continuous differential equations expressing the same concepts as well as accounting for stochasticity in the rates. Biologically, a more ideal model would include each life stage shown in the figure at right (encompassing sexual and vegetative reproduction) as well as realistic rates for each step.

It would be a relatively simple task to construct this sort of more detailed model, but properly determining all the rates would require extensive (presumably years-long) fieldwork. Thus, I'll leave this as an exercise for the reader.

Even though the models we discussed here are incomplete, they are informative. The big lesson is that the harvesting of Ramps from publicly accessible places is a nice example of a tragedy of the commons. There really isn't a harvesting percentage that can be used as a rule of thumb to tell people in the various forums.

If you have a large patch on your own land, then you can probably harvest a decent amount each year and the patch will never be at risk. Our hypothetical model [3] above might be able to tell us precisely how much of a population could be sustainably harvested, but without all the additional information it isn't worth worrying over. You can simply pay attention to how much you harvest and notice if the patch is dwindling or not from year to year. As it is your own patch, which you find valuable, you will adjust your personal harvest rate to allow the patch to prosper.

Is there anything we can encourage foragers to do, aside from simply advising them to leave the plants alone? If you harvest only one leaf from each mature plant (never the last leaf, or from small plants), without disturbing the bulb and roots, then the plants will survive and spread each year. If everyone followed this rule, large patches of Ramps could be maintained in woodlands close to or even within large cities. Convincing people to do this will be a difficult task.


Tuesday, April 4, 2017

Pepper Permutations

Whenever I interact with a plant breeder, I first want to know what their goals are with their projects. Then I want to know about how they're approaching the problem and what results they've had so far. There are other interesting conversations to be had with breeders (What got you started in plant breeding? What got you interested in this crop? etc.), but these are the ones I keep asking when a breeder mentions their projects.

I've got a few pepper breeding projects I'm working on. I figured I'd answer my own questions regarding them, though I may ramble on a bit.

Upright fruit on plant from 2nd generation.
Upper-right: Overlay of fruit ripening brown.
My first pepper breeding project started with a simple observation. I was growing a batch of plants from seed I had saved from a tasty small brown bell pepper I found at the grocer. One of the plants had fruit which pointed upwards. I decided I liked the look of the plant, so it was the only one I saved seed from.

The next year, I grew several plants and found only 2 of the 7 had the upright fruit posture I liked. I was disappointed, but this told me something important about the genetics of the trait. Since the female parent had the trait, but not all of the kids did, the trait didn't have a dominant inheritance. This is consistent with the description of the trait in a review of published research, indicating there are two recessive genes that interact to produce the trait.

At the end of the last growing season, I dug up the plants for this project and moved them into my basement under lights. This lets me ensure the next batch of seeds will only be from selfing the selected plants. This should help ensure the next generation of plants will all have the upright fruit trait.

This project was pretty simple to plan and is moving forward nicely. I'm hoping the plants I grow out this year will show the project is essentially complete. 

I have other projects that aren't so simple, in that they will require one or more directed hybridizations. These projects will take several years to accomplish, that is if I ever complete them.

What are my breeding goals?
  1. White Habanero: A Habanero with very little of any pigment, so the fruit appear "white".
  2. Black Habanero: A Habanero with high levels of anthocyanins, so the fruit appear "black".
  3. Fancy Jalapeno: A Jalapeno which ripens to red with brown stripes; bonus points for having dark purple marks too when ripe.
  4. Floral peppers: Arbitrary fruit, with large and colorful flowers. Ideally with flowers presented above leaves.
How am I approaching these goals?
  1. White Habanero: A review of the primary color genes for peppers (in an earlier post) indicates I can get a "white" chile with the genotype [y/y;c1/c1;c2/c2]. I have pale-orange (genotype [+/+;c1/c1;+/+]) and yellow (genotype [y/y;+/+;c2/c2]) habanero varieties. They both have the shape I'm looking for, so it is just the color genes I'll have to worry about. The F1 formed by crossing the two strains will be red and have the genotype [+/y;+/c1;+/c2]. Among the F2s, 1/64 should be homozygous for all three recessive traits needed to make a "white" chile.
    Image caption: [+;c1;+] and [y;+;c2]
    • All the other red/orange/yellow colors should also turn up in the F2s, so this will be an interesting cross to play with.
    • There is a variety called "White Habanero", but it doesn't have the shape I think of when I think of a Habanero pepper.
    • I've even thought of a name for the final variety: "Pale Horse".
  2. Black Habanero: I already have a "black" chile called "Pimenta da Neyde" (genotype [A;MoA;an]). It doesn't have the habanero shape, but I can cross it to a habanero for those traits. The boxy shape is dominant to the elongated shape, so the F1 should be boxy (and have some arbitrary color). Among the F2s, 1/64 should be homozygous for the three recessive traits needed to make the "black" color. 3/4 should have the boxy habanero shape, with 1/3 of those being homozygous for the dominant trait, so 1/4 of all the F2s breed true for the boxy shape. All together, 1/256 (1/64 × 1/4) of the F2s should have the ideal combination of traits.
    1. Depending on what color genes are hidden beneath the black of "Pimenta da Neyde", as well as what traits are brought to the party by the habanero, lots of other colors are likely to turn up in the F2s.
    2. There is also a variety called "Black Habanero", but it has neither the black color nor the shape I'm looking for.
    3. I've also thought of a name for the final variety: "Black Death".
  3. Fancy Jalapeno: This concept will take combining traits from several varieties. "Fish" has a recessive striped trait. I have a nice Jalapeno with a black top when unripe (which is probably related to sun exposure). I have a bell pepper which ripens brown because the chlorophyll isn't degraded upon ripening. "Pimenta da Neyde" has the trait to retain anthocyanin when the fruit is ripe. Getting all these traits, due to at least 6 different genes, into one plant is going to be a challenge. My plan so far is to make crosses between two pairs of the four strains. Once I've selected F2s from each cross that have their parents' traits, I'll then cross them, then select among the new F2s.
    Image caption: Hypothetical ripe and immature.
    1. I don't yet have a name in mind to go with this project.
    2. I may give up before this project is done. We shall see.
    Floral variations.
  4. Floral Peppers: I have a bell pepper line with very large, white flowers. I have a pepper with relatively large intense-magenta flowers. I also have a pepper with relatively tiny, but greenish-yellow flowers. The plan is to cross each of the two colored-flower chiles to the bell pepper, then screen the resulting F2s for larger flowers with improved color. The fruit characteristics are entirely arbitrary.
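The F2 fractions for the Black Habanero cross can be checked by multiplying per-gene probabilities. This is only a minimal sketch: it assumes the three color genes and the shape gene each follow simple Mendelian inheritance and assort independently, which may not hold for real peppers. The variable names are made up for illustration.

```python
from fractions import Fraction

# Assumption: each gene segregates 1 : 2 : 1 (AA : Aa : aa) in the F2
# and all genes assort independently of one another.
HOMOZYGOUS_RECESSIVE = Fraction(1, 4)  # aa
HOMOZYGOUS_DOMINANT = Fraction(1, 4)   # AA
DOMINANT_PHENOTYPE = Fraction(3, 4)    # AA or Aa

# The "black" color needs three genes homozygous recessive at once.
black = HOMOZYGOUS_RECESSIVE ** 3

# Black fruit on any boxy plant vs. on a true-breeding boxy plant.
black_and_boxy = black * DOMINANT_PHENOTYPE
black_and_true_breeding_boxy = black * HOMOZYGOUS_DOMINANT

print(black)                         # 1/64
print(black_and_boxy)                # 3/256
print(black_and_true_breeding_boxy)  # 1/256
```

In practice this means growing out a few hundred F2 plants to have a reasonable chance of seeing even one with the full combination.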
What have I accomplished towards these goals?
    Left: Chile with magenta flowers crossed to bell. Right: Chile with greenish flowers crossed to bell.
  1. White Habanero: I have a mature habanero plant with yellow fruit. I am about to plant seeds for the one pale-orange fruit. Later in the season I'll be able to do the initial cross.
  2. Black Habanero: I have a mature habanero. I have several plants of "Pimenta da Neyde", but they're all very slow growers and have yet to flower. Hopefully later in the season I'll be able to do the initial cross with them.
  3. Fancy Jalapeno: I have got basically nothing done towards this goal. I have all the seeds and will be planting them shortly.
  4. Floral Peppers: I've made each of the initial crosses between the chiles with colored flowers and the bell pepper with extra-large white flowers.

It took me a while to figure out what sort of peppers I might want to breed. I've got a few other projects in mind, but I figured this post was getting rather long already.

It helps to keep an open mind when working on breeding projects, since you will find variations and combinations of traits that you didn't expect from the outset. So long as you're doing the breeding project for your own purposes (as I am), there is no harm in changing direction along the way. I expect the F2s from the habanero crossed to the bell pepper to be all sorts of interesting. Even if my ideas about floral peppers never come to fruition, something useful will come out of it.

Among pepper plants I've grown, I've already found a couple really interesting traits that have caused me to change course. I'm not willing to discuss them publicly yet. I'm still trying to decide if they're the sort of thing that would have value to a larger market, so to speak.


Thursday, February 23, 2017

Biology of the Enjoya Pepper

"Enjoya" pepper; marketing
photo from the TwitterVerse.
A few years ago a new pepper turned up in markets of Europe and then in the USA (and elsewhere). The bell peppers were a dramatic yellow splashed with red flames and were sold as "Enjoya" or "Flame" peppers.

There was no information available about the genetics of the trait, as there had been no academic literature published on the new variety. Gardeners with the habit of growing their own plants from seed took this as a challenge. People around the globe independently said, "Can I grow seeds from that pepper and get striped fruit in my garden?" Seeds were collected by those who found the peppers in their grocers and then shared via online forums with those who had not yet found them. Soon after, there were many little green seedlings being tended to around the world.
Typical flowers and fruit.

Months later, the first reports on the plants started coming in. The plants were producing large bell peppers, but they were all ripening yellow. (I have reports of 11 plants maturing to produce yellow fruit.) As these reports were posted to the forums, interest in the plants waned. (Dreams of crossing the trait into jalapenos and other hot peppers quietly died.) If the amazing red flames weren't going to reappear, then why would anyone want to be growing these plants?

Where did these peppers come from?

The marketing site for the pepper says:
Now, 30 years later, nature has once again surprised us with a natural variation: the red/yellow striped pepper. In 2013, Wilfred van den Berg found this beautiful variety in his greenhouse in Est.
But the US patent application for the pepper says:
[0011] `E20B3751` was discovered in a screening trial of mutants of pepper variety `Maduro` conducted at Est, Netherlands. The mutant `E20B3751` was selected based on its vertical red and yellow stripes color and propagated vegetatively (i.e., asexually).
I strongly suspect those responsible for writing the marketing site didn't want to say the variety was the result of a mutation breeding project in a high-tech lab, as such things tend to get a lot of people suspicious about their foods. This is only a slight fib, since the mutated variety is a variation of the natural pepper.

What draws my attention more is that the patent doesn't say anything at all about how the pepper plant was produced (aside from the general concept of a mutagenesis screen). The entirety of the patent starting on line [0046] is simply a rehashing of general plant biology and breeding. None of that tells us anything at all about the origin of the striped peppers. This is strongly counter to the basic idea of what patents are supposed to be. The earlier paragraphs of the patent do give a concise description of what the pepper is, as well as a listing of specific traits associated with it, so it isn't entirely a useless document.

Since there isn't any academic research published on the pepper and neither the patent nor the marketing information provides any biological details, we're going to have to see what we can figure out from basic principles.

Mutations in genes typically produce traits which are either dominant or recessive. (There are a few other scenarios, but we're not going to worry about them for now.) If the striped trait is recessive, then essentially all of the next generation would also have the trait.

If the striped trait was dominant, then [with perfect selfing] the next generation might all have the trait, but there are other scenarios. If the Enjoya pepper plant (remember, from the patent they are propagated asexually and so are all from the same genetic plant) was heterozygous for the dominant trait, then half of the next generation would remain heterozygous and have the trait. Another quarter would be homozygous for the no-stripes trait and the remaining quarter would be homozygous for the striped trait. Dominant traits can sometimes also have a recessive lethal characteristic, though it is rare. In that case the homozygous striped plants would die, leaving two striped plants for every unstriped one. All together, at the very least 66.6% of the next generation should have stripes if the trait was due to a dominant nuclear mutation.
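The ratios for the dominant scenarios can be worked out by enumerating the gametes of a selfed heterozygote. This is a sketch under the simple one-gene model described here; the allele symbols `S` (striped, dominant) and `s` (no stripes) are made up for illustration and do not come from the patent.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_cross(genotype):
    """Return offspring genotype fractions from selfing a one-gene genotype."""
    # Pair every possible pollen allele with every possible egg allele.
    counts = Counter("".join(sorted(pair)) for pair in product(genotype, repeat=2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# Hypothetical symbols: S = striped (dominant), s = no stripes.
offspring = self_cross("Ss")
print(offspring["SS"], offspring["Ss"], offspring["ss"])  # 1/4 1/2 1/4

# Ordinary dominant trait: both SS and Ss plants show stripes.
striped = offspring["SS"] + offspring["Ss"]
print(striped)  # 3/4

# Dominant trait that is lethal when homozygous: SS seedlings never survive.
survivors = offspring["Ss"] + offspring["ss"]
striped_if_lethal = offspring["Ss"] / survivors
print(striped_if_lethal)  # 2/3
```

The 2/3 from the lethal case is the lowest striped fraction any of these dominant scenarios allows, which is where the 66.6% floor comes from.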

In either scenario, we should have the majority of the next generation with stripes. What do we see? Between my plants and those reported by other growers, we have 16 plants that have ripened fruit, all of which matured to yellow with no red stripes. This would be a very unexpected result for either model discussed above.
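To put a rough number on "very unexpected", here is a small sketch of the chance of seeing 16 unstriped plants out of 16 under each model. It assumes every seedling is an independent draw and uses the idealized fractions from the simple Mendelian models, so treat the results as order-of-magnitude estimates only.

```python
from fractions import Fraction

plants = 16  # ripened plants reported, none with stripes

# Expected fraction of UNSTRIPED seedlings under each idealized model.
unstriped_fraction = {
    "recessive trait": Fraction(0),
    "dominant, heterozygous parent": Fraction(1, 4),
    "dominant, lethal when homozygous": Fraction(1, 3),
}

# Probability that all 16 independent seedlings come up unstriped.
p_all_unstriped = {name: p ** plants for name, p in unstriped_fraction.items()}

for name, p in p_all_unstriped.items():
    print(f"{name}: {float(p):.2e}")
```

Even the most forgiving model gives odds of about 1 in 43 million (3^16 = 43,046,721), so chance alone is not a plausible explanation.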

Meristem figure from Wikipedia.
There is another scenario that might be important. A growing meristem of a plant includes multiple tissue layers which replicate independently. A mutation in one layer generally won't transfer to the other layers. As the plant grows, the mutated and non-mutated tissues will be maintained separately. As leaves or other organs develop, the different meristem layers contribute to different parts and so would result in visible variegation if the mutation had a visible impact.

Photo cropped from one at link.
After looking around a bit, I found a photo which might provide some clarity to the situation. In the cropped close-up at right, it is clear that all the seeds are attached directly to yellow tissue. There is red tissue in the core of the seed mass, but none at the surface where the eggs (and then seeds) developed.

It looks like some of the red core cells are able to migrate to the surface of the fruit during early development. This results in the red stripes as the fruit then expands in size.

Since the red color is carried in tissue which isn't made into eggs or seeds, it appears unlikely that the seed-grown progeny of an Enjoya pepper would produce red or striped fruit.

Sorry folks, I think the game is up. We probably won't be able to breed flame-colored jalapenos. At least we've learned something about the biology of these peppers.

That the striped trait can't be passed down through seeds tells us something about the experiments which led to the Enjoya pepper. The patent indicates it came from a mutagenesis experiment, but gives no details. One of the easiest ways to do it would have been to soak a large batch of seeds in a chemical mutagen (like EMS) and then grow them out after treatment. EMS is relatively easy to work with and it would produce point mutations all over the nuclear and cytoplasmic genomes. I bet when that first plant matured its first fruit, there were amazed expressions all around.

The classical story of pepper color genetics (described at the-biologist-is-in.blogspot.ca/2015/11/the-color-of-peppers-2.html) suggests it would take two separate mutations to produce the rich yellow color seen in the Enjoya pepper. However, there are a lot of mutations which impact pepper color that don't really seem to fit the classical story. I strongly suspect the visible difference between the red and yellow fruit tissues is down to one mutation.

However, EMS is not something that would be used to make a single point mutation. It would instead create hundreds or thousands of point mutations per seed in this sort of mutagenesis experiment. Selection of the resulting progeny, as well as backcrossing to the parent type, would normally be used to clean up any unwanted deleterious mutations... but the striped trait would not have survived this process.

This means that the genome of the Enjoya pepper is probably chock-full of other potentially interesting mutations. Many of those mutations will be recessive and so only become visible in the second generation after treatment. The plants we've been growing from saved seed represent that second generation (referred to in shorthand as M2).

Enjoya-M2 with a transient anthocyanin shoulder.
One of my seven M2 plants produced a dark shoulder of anthocyanin pigments on the unripe fruit. These anthocyanins were later broken down as the fruit matured to its [now] expected yellow. Dark shoulders are pretty common in peppers, so I'm still trying to decide if I want to save any seeds from this plant.

Enjoya-M2 with color-marked flowers.
Interesting stripes on the unripe fruit.
Another of my seven plants produced flowers with distinctive purple highlights. The fruit on this plant later showed a distinctive green striping on the shoulder while unripe. (The fruit of every other plant was solidly dark green.) I'm still expecting this one to mature to a solid yellow, but there remains the slim chance that a red cell fought its way into the seed. (The pepper has since ripened to the expected yellow.)

Two of my seven plants turned out distinctively different from the rest. This suggests there are indeed numerous hidden recessive mutations in the Enjoya pepper. The relatively large fruit I've been getting from these plants and the potential to find other novelty mutations means I'll probably be growing quite a few of these M2 plants in the coming years.
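As a rough guide to how many M2 plants are worth growing, the sketch below estimates the chance that any one hidden recessive mutation shows up in a small batch. It assumes the mutation is heterozygous in the tissue that forms the seeds and that the fruit was self-pollinated, so a quarter of the M2 seedlings would show it. Neither assumption has been verified for the Enjoya pepper.

```python
from fractions import Fraction

# Assumption: selfing a heterozygote gives 1/4 homozygous-recessive seedlings.
P_SHOWS_TRAIT = Fraction(1, 4)


def chance_at_least_one(n_plants, p=P_SHOWS_TRAIT):
    """Probability that at least one of n_plants shows a given recessive trait."""
    return 1 - (1 - p) ** n_plants


print(chance_at_least_one(7))         # 14197/16384
print(float(chance_at_least_one(7)))  # roughly 0.87
```

So seven plants is already enough to have a good chance of exposing any single recessive mutation that made it into the seeds, and growing more mostly helps with catching combinations of them.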