Post-Simulation Analyses¶

This tutorial explores some of the functionality of the postsim module. This aspect of shadie is still under development.

Loading a simulation¶

In [5]:

Copied!





import shadie

#re-make the chromosome object
chrom = shadie.chromosome.explicit({
    (0, 200_000): shadie.NONCDS,
    (200_001, 400_000): shadie.EXON,
    (400_001, 600_000): shadie.INTRON,
    (600_001, 800_000): shadie.EXON,
    (800_001, 1_000_000): shadie.NONCDS,
})

#load the simulation from tutorial #5
post = shadie.postsim.OneSim("tut5.trees", chrom)
import shadie

#re-make the chromosome object
chrom = shadie.chromosome.explicit({
    (0, 200_000): shadie.NONCDS,
    (200_001, 400_000): shadie.EXON,
    (400_001, 600_000): shadie.INTRON,
    (600_001, 800_000): shadie.EXON,
    (800_001, 1_000_000): shadie.NONCDS,
})

#load the simulation from tutorial #5
post = shadie.postsim.OneSim("tut5.trees", chrom)

🌿 INFO | one_sim.py | shadie assumes sims were run without a burn-in and without neutral mutations and will recapitate (add ancestry simulation) and mutate (add neutral mutations) any loaded sims by default.
Current settings -
Recapitate: True
Add neutral mutations: True

🌿 INFO | one_sim.py | Simulation currently has 7 existing mutations of type(s) {2, 3}.
🌿 INFO | one_sim.py | Mutating using mutation rate: 1e-08. Keeping 67 existing mutations of type(s) {0, 2, 3}.

Once you have recapitated and mutated your tree sequence, it may make sense to save it so that you don't have to re-do that process every time you come back. It can be helpful to keep the original simulation, so it is recommended to save as a new file:

In [6]:

Copied!

#save using the treesequence method dump()
post.tree_sequence.dump("tut5_recapitated.trees")
#save using the treesequence method dump()
post.tree_sequence.dump("tut5_recapitated.trees")

And if you wanted to load a postsim analysis object in shadie from that recapitated file, you could do so using the method below. Remember that you need to suppress recapitation and mutation, which is the automatic behavior of the OneSim() function.

In [7]:

Copied!





post_recapitated = shadie.postsim.OneSim("tut5.trees", chrom,
                             recapitate=False,
                             add_neutral_mutations=False
                            )
post_recapitated = shadie.postsim.OneSim("tut5.trees", chrom,
                             recapitate=False,
                             add_neutral_mutations=False
                            )

🌿 INFO | one_sim.py | shadie assumes sims were run without a burn-in and without neutral mutations and will recapitate (add ancestry simulation) and mutate (add neutral mutations) any loaded sims by default.
Current settings -
Recapitate: False
Add neutral mutations: False

Analyzing the simulation results¶

Working with the tree sequence¶

Now that you have your recapitated tree sequence, you can access any of the metadata or methods provided by tskit.

For example, you can check for the presence of the ancestry simulation by inspecting the populations:

In [8]:

Copied!

post.tree_sequence.population(0), #the SLiM simulation
post.tree_sequence.population(0), #the SLiM simulation

Out[8]:

(Population(id=0, metadata={'name': 'p0', 'slim_id': 0}),)

In [9]:

Copied!

post.tree_sequence.population(1) #the msprime ancestry simulation
post.tree_sequence.population(1) #the msprime ancestry simulation

Out[9]:

Population(id=1, metadata={'description': 'ancestral population simulated by msprime', 'name': 'ancestral'})

You can also look up any of the individuals in the simulation and see what kind of data the tree sequence has stored for it:

In [13]:

Copied!

post.tree_sequence.individual(15)
post.tree_sequence.individual(15)

Out[13]:

Individual(id=15, flags=196608, location=array([0., 0., 0.]), parents=array([-1, -1], dtype=int32), nodes=array([29, 30], dtype=int32), metadata={'pedigree_id': 19933821, 'pedigree_p1': 19904772, 'pedigree_p2': 19904772, 'age': 0, 'subpopulation': 1, 'sex': -1, 'flags': 0})

The pyslim and tskit docs will provide a more in depth idea of how this functionality may come in handy.

Visualizations¶

shadie has a number of plotting functions that utilize toyplot and allow you to visualize the results of the simulation.

draw_tree_sequence allows you to sample haplotypes from the end of the simulation and visualize the tree sequence. You will see the chromosome structure at the top (if you hover over the genomic elements you will see some information about what they are) and then below is a representation of the tree sequence for the sampled haplotypes. The chromosome is broke up into chunks due to recombination, so each chunk of the chromosome has a different topology. Usually a random seed is generated by shadie for the haplotype sampling, but if you wanted to sample the same individuals every time you can specify a number for the seed and as long as the seed doesn't change, the same haplotypes will be sampled every time.

In [17]:

Copied!

post.draw_tree_sequence(sample=10, seed=123);
post.draw_tree_sequence(sample=10, seed=123);

You can also plot the individual trees using the draw_tree function. You will need to specify the idx, which corresponds to the chunk of chromosome that you want to see the topology for. If you use the same sample number and seed as above, you will see that you can access the same trees shown in the tree sequence:

In [22]:

Copied!

post.draw_tree(idx=2, sample=10, seed=123);
post.draw_tree(idx=2, sample=10, seed=123);

Calculating Statistics¶

shadie can calculate statistics like theta (diversity) and Tajima's D on your postsim object using the function stats(). You can specify how many haplotypes are sampled for each calculation and how many replicate calculations are performed. These calculations utilise the built-in theta and Tajima's D calculations from tskit.

In [23]:

Copied!

post.stats(sample=15, reps=50)
post.stats(sample=15, reps=50)

Out[23]:

	mean	CI_5%	CI_95%
theta	0.000001	0.000001	0.000001
D_Taj	-1.654312	-1.740551	-1.568073

Access Metadata¶

If you forget the simulations settings, you can always access the parameters settings from SLiM in the tree sequence metadata:

In [24]:

Copied!

post.tree_sequence.metadata["SLiM"]["user_metadata"]
post.tree_sequence.metadata["SLiM"]["user_metadata"]

Out[24]:

{'file_in': ['None'],
 'file_out': ['tut5.trees'],
 'full_lifecycles': ['100'],
 'gam_archegonia_per': ['1'],
 'gam_ceiling': ['10000'],
 'gam_clone_rate': ['0.0'],
 'gam_clones_per': ['1'],
 'gam_female_to_male_ratio': ['0.6666666666666666'],
 'gam_maternal_effect': ['0'],
 'gam_mutation_rate': ['5e-09'],
 'gam_pop_size': ['1000'],
 'gam_random_death_chance': ['0'],
 'gam_self_rate': ['0.0'],
 'gam_self_rate_per_egg': ['0.1'],
 'gens_per_lifecycle': ['2'],
 'lineage': ['Pteridophyte'],
 'mode': ['homosporous'],
 'model_source': ['shadie'],
 'mutation_rate': ['1e-08'],
 'recomb_rate': ['1e-09'],
 'slim_gens': ['201'],
 'spo_clone_rate': ['0.0'],
 'spo_clones_per': ['1'],
 'spo_maternal_effect': ['0'],
 'spo_mutation_rate': ['5e-09'],
 'spo_pop_size': ['500'],
 'spo_random_death_chance': ['0'],
 'spo_self_rate': ['0.0'],
 'spo_self_rate_per_egg': ['0.1'],
 'spo_spores_per': ['100']}

In [ ]: