Model Recapitation and Mutation¶
As introduced in the previous tutorial, shadie
utilizes recapitation to increase the efficiency of complex simulations that use alternation-of-generations models. Without it, these models are not as feasible to run.
This means that, by deafult, shadie
will run only the selected part of a simulation with SLiM when you use the function model.run()
, and then the resulting model.trees
file will need to be recapitated and mutated before the simulation is truly complete.
The use of alternation-of-generations (altgen) models can complicate the recapitation process because time in SLiM and time in msprime, already confusing at the best of times, is even more mismatched by the altgen model architecture.
import shadie
import pyslim
print("pyslim", pyslim.__version__)
pyslim 1.0.4
#set up and run a toy model
#make the chromosome
chrom = shadie.chromosome.explicit({
(0, 200_000): shadie.NONCDS,
(200_001, 400_000): shadie.EXON,
(400_001, 600_000): shadie.INTRON,
(600_001, 800_000): shadie.EXON,
(800_001, 1_000_000): shadie.NONCDS,
})
chrom.draw();
#write the model
with shadie.Model() as tut5:
tut5.initialize(
chromosome=chrom,
sim_time=100,
mutation_rate=1e-8,
recomb_rate=1e-9,
file_out="tut5.trees",
)
tut5.reproduction.pteridophyte_homosporous(
spo_pop_size=500,
gam_pop_size=1000,
)
tut5.run()
There are multiple functions in the postsim
module to account for the different kinds of simulations you might run. Most users will likely utilize the OneSim()
module which, as its name suggests, helps you process a single shadie
simulation. We will focus on this function in this tutorial.
However, if you ran a specialized simulation you may need one of the other functions. A short description of the available functions and links to other tutorials:
.postsim.OneSim()
: process and analyze a "normal"shadie
simulation. You did not use theskip_neutral_mutations
parameter, or set the parameter toskip_neutral_mutations=True
. You did not include a burn-in period in your simulation..postsim.TwoSims()
: this is a specific function that helps you merge two separate simulations into one ancestral population at the beginning of the simulation then recapitate and mutate it. Conditions should be the same as above. This is most often used to simulate a population split, allowing you to perform certain analyses, such as the MK test. See the demography tutorial for more detail and the Quick Demo for another example of this..postsim.PureSlim()
: process and analyze an alteredshadie
simulation in which you simulated neutral mutations and simulated a burn-in (or loaded in a burn-in before continuing the simulation withshadie
- an example of that can be found in --). Usually, this means you utilized the parameter settingskip_neutral_mutations=False
. In this case, you are loading the simulation purely to analyze the results.shadie
will take care of the time unit mismatch that occurs due to the use of altgen simulations..postsim.PureSlim_TwoPops()
: a combination ofPureSlim()
andTwoSims()
, this handles the case where multiple simulations were started from the same burn-in file and need to be merged so that they can be analyzed together.
The postsim functions only need two parameters: the .trees file and the chromosome object. The chromosome is not necessary for running analyses, it is only used for plotting functions.
#load the simulation into the post-sim module
post = shadie.postsim.OneSim("tut5.trees", chrom)
🌿 INFO | one_sim.py | shadie assumes sims were run without a burn-in and without neutral mutations and will recapitate (add ancestry simulation) and mutate (add neutral mutations) any loaded sims by default. Current settings - Recapitate: True Add neutral mutations: True 🌿 INFO | one_sim.py | Simulation currently has 7 existing mutations of type(s) {2, 3}. 🌿 INFO | one_sim.py | Mutating using mutation rate: 1e-08. Keeping 57 existing mutations of type(s) {0, 2, 3}.
post.tree_sequence.population(2)
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) Cell In[10], line 1 ----> 1 post.tree_sequence.population(2) File ~/miniconda3/envs/py3_12_2/lib/python3.12/site-packages/tskit/trees.py:6084, in TreeSequence.population(self, id_) 6076 def population(self, id_): 6077 """ 6078 Returns the :ref:`population <sec_population_table_definition>` 6079 in this tree sequence with the specified ID. As with python lists, negative (...) 6082 :rtype: :class:`Population` 6083 """ -> 6084 id_ = self.check_index(id_, self.num_populations) 6085 (metadata,) = self._ll_tree_sequence.get_population(id_) 6086 return Population( 6087 id=id_, 6088 metadata=metadata, 6089 metadata_decoder=self.table_metadata_schemas.population.decode_row, 6090 ) File ~/miniconda3/envs/py3_12_2/lib/python3.12/site-packages/tskit/trees.py:5951, in TreeSequence.check_index(index, length) 5949 index += length 5950 if index < 0 or index >= length: -> 5951 raise IndexError("Index out of bounds") 5952 return index IndexError: Index out of bounds
As you can see from the logger output, shadie
will automatically recapitate and mutate the simulation and will report how many mutations it added. Mutations of type 0
are automatically defined as neutral by shadie, so you should always see no mutations of this type in the original simulation, and then the additional mutation type = 0 added after mutation.
If you do not input any parameters, as in the example above, the function will attempt to recapitate and mutate the simulation using information from the metadata that shadie
writes to the .trees file. If you did not use shadie
to generate your model, or you altered it yourself in the SLiMgui, you may find that some of this metadata is missing (or worse, incorrect!).
If metadata is missing you are able to enter it manually:
#load the simulation into the post-sim module
post_manual = shadie.postsim.OneSim(
trees_file="tut5.trees",
chromosome=chrom,
generations=500, #this is full lifecycles
gens_per_lifecycle=2, #this indicates whether sim is altgen (=2) or not (=1)
ancestral_Ne=500, #usually the same as the diploid population size
mutation_rate=1e-8,
recomb_rate=1e-9,
)
As you can see, the results of the recapitation and mutation are very similar. shadie
uses a random seed for the coalescent ancestry simulation and the mutation function, so every time it is run you will see slightly different results. If you want control over the seed you will need to run recapitate() and mutate() manually, then you can use the parameter random_seed
.
You can also control the behavior of the OneSim() function, for example you may not want it to recapitate (perform a coalescent ancestry simulation until a single ancestor is reached) or you may not want it to mutate (overlay neutral mutations that were not part of the original simulation).
#load the simulation into the post-sim module
post_norecap = shadie.postsim.OneSim("tut5.trees",
chrom,
recapitate=False, #turn off recapitation
add_neutral_mutations=False) #turn off neutral mutations
🌿 INFO | one_sim.py | shadie assumes sims were run without a burn-in and without neutral mutations and will recapitate (add ancestry simulation) and mutate (add neutral mutations) any loaded sims by default. Current settings - Recapitate: False Add neutral mutations: False
Note: If you set both recapitate=False
and add_neutral_mutations=False
, you are essentially creating the PureSlim()
function. PureSlim()
is a utility function, but even if you are running full SLiM simulations you can still keep using the OneSim()
function to analyze those results if you are already comfortable with it.
You can use the function draw_tree_sequence()
to plot the treesequence if desired. If you do so, you will see that the original .trees file (loaded and not recapitated or mutated as post_norecap
) only has the non-neutral mutations, while the recapitated and mutated object (loaded as post
) has many additional neutral mutations.
post_norecap.draw_tree_sequence(sample=25, seed=333);
post.draw_tree_sequence(sample=25, seed=333);
Sometimes a non-recapitated simulation cannot be plotted because there is no root - this indicates that there is more than one ancestor at the beginning of the simulation. If you instead plot the recapitated and mutated object (loaded as post
) you will see a tree with a single root because the ancestry simulation has been performed.
# the more samples you include, the more likely it is that multiple roots will exist in the tree
post_norecap.draw_tree_sequence(sample=125, seed=333);
⚠️ toytree | toytree_sequence:_get_toytree | tree has multiple (3) roots.
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[14], line 2 1 # the more samples you include, the more likely it is that multiple roots will exist in the tree ----> 2 post_norecap.draw_tree_sequence(sample=125, seed=333); File ~/code/git/hacks/shadie/shadie/postsim/src/one_sim.py:503, in OneSim.draw_tree_sequence(self, start, max_trees, sample, seed, height, width, show_generation_line, scrollable, axes, **kwargs) 499 height = height if height is not None else 325 500 height += 100 501 width = max(300, ( 502 width if width is not None else --> 503 15 * tts.at_index(0).ntips * min(max_trees, len(tts)) 504 )) 506 # create an axis for the chromosome. +100 height for chrom. 507 if scrollable: File ~/miniconda3/envs/py3_12_2/lib/python3.12/site-packages/toytree/utils/src/toytree_sequence.py:133, in ToyTreeSequence.at_index(self, index) 131 """Return a ToyTree from the indexed order in the tree_sequence.""" 132 tree = self.tree_sequence.at_index(index) --> 133 return self._get_toytree(tree) File ~/miniconda3/envs/py3_12_2/lib/python3.12/site-packages/toytree/utils/src/toytree_sequence.py:216, in ToyTreeSequence._get_toytree(self, tree, site) 213 pnode._add_child(cnode) 215 # wrap TreeNode as a toytree --> 216 ttree = toytree.ToyTree(pseudo_root)#root_idx])#.ladderize() 218 # add mutations as metadata 219 if site is not None: File ~/miniconda3/envs/py3_12_2/lib/python3.12/site-packages/toytree/core/tree.py:117, in ToyTree.__init__(self, treenode) 114 self.annotate = AnnotationAPI(self) 116 # update Node idxs, _idx_dict, nnodes, ntips, and Node heights --> 117 self._update() File ~/miniconda3/envs/py3_12_2/lib/python3.12/site-packages/toytree/core/tree.py:362, in ToyTree._update(self) 359 node = queue.pop() 361 # set depth of this node from the root --> 362 depths[node] = depths[node._up] + node._dist 364 # if leaf add to output stack and update farthest depth 365 if node.is_leaf(): KeyError: None
See the next tutorial for more functionality of the postsim
module for analysis