PyMsBayes

A multi-processing Python wrapper and API for approximate-Bayesian phylogeographical inference

6. PyMsBayes Change History

6.1. Version 0.3.5

6.1.1. Changes

  • Updating versions of the bundled msbayes.pl and dpp-msbayes.pl Perl scripts from the dpp-msbayes package. These no longer use the deprecated POSIX::tmpnam. Instead, they use File::Temp::tempfile.

6.1.2. Bug Fixes

  • Fixing a bug that occurred in two-stage analyses (i.e., Stage 1: generate all prior samples, and Stage 2: then do rejection). When the ‘--compress’ option was used in the Stage 1 command, the prior sample file was compressed, which caused eureject (and ultimately dmc.py) to crash during rejection (Stage 2). This bug is fixed in this release.
  • Fixing issue where the reporting frequency could be greater than the total number of prior samples.
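The change log does not say how the reporting-frequency issue was fixed. One plausible approach is to cap the reporting interval at the total number of prior samples. The sketch below illustrates that idea only; the function and parameter names are hypothetical and are not taken from the package.

```python
def effective_report_frequency(report_frequency, num_prior_samples):
    """Return a reporting interval that never exceeds the number of samples.

    Hypothetical helper for illustration; not part of the PyMsBayes API.
    """
    if num_prior_samples < 1:
        raise ValueError("num_prior_samples must be at least 1")
    if report_frequency < 1:
        raise ValueError("report_frequency must be at least 1")
    # Cap the interval so that at least one report is always produced.
    return min(report_frequency, num_prior_samples)
```

With this cap, a request to report every 10000 samples in a run of 500 prior samples would report once, at the end of the run.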

6.2. Version 0.3.4

6.2.1. Changes

  • Making the ‘pretty’ summary output of the dmc_posterior_probs.py script more robust. The regex used by ListConditionEvaluator to put the names of the taxon pairs into the expression was easily fooled by numbers included in the names. This did not affect the results, but led to confusing summary output. The substitution is now more robust, and the output messages should now help rather than confuse.
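The kind of failure described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the expression syntax, the function names, and the taxon-pair names are hypothetical, and this is not the code used by ListConditionEvaluator. The naive version substitutes one index at a time, so digits inside a name that was already inserted get replaced again; the more robust version makes a single pass over standalone integer tokens.

```python
import re


def naive_label(expression, names):
    """Replace each index with its name, one index at a time (fragile)."""
    for index, name in enumerate(names):
        # Digits inside names inserted on earlier iterations are also replaced.
        expression = expression.replace(str(index), name)
    return expression


def robust_label(expression, names):
    """Replace each standalone integer token exactly once, in a single pass."""
    def lookup(match):
        return names[int(match.group(0))]
    return re.sub(r"\b\d+\b", lookup, expression)


names = ["pair1", "sp0"]  # hypothetical taxon-pair names containing digits
naive = naive_label("0 < 1", names)
robust = robust_label("0 < 1", names)
```

In this sketch, the naive version turns the "1" inside "pair1" into "sp0", while the single-pass version leaves the inserted names untouched.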

6.3. Version 0.3.3

6.3.1. Changes

  • If ‘nan’ observed summary statistics are produced, an error is raised to crash the analysis early on. This can happen for some summary statistics when there are populations with only a single sequence sampled.
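A minimal sketch of this kind of early check is shown below. It assumes the observed summary statistics are available as a mapping from statistic name to float; the function name and that data layout are hypothetical, not the package's actual implementation.

```python
import math


def check_observed_stats(stats):
    """Raise early if any observed summary statistic is nan.

    `stats` is assumed to map statistic names to floats (hypothetical layout).
    """
    bad = sorted(name for name, value in stats.items() if math.isnan(value))
    if bad:
        raise ValueError(
            "observed summary statistics are nan: " + ", ".join(bad))
    return stats
```

Failing at this point avoids spending time on prior simulations that cannot be compared against the observed data.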

6.4. Version 0.3.2

6.4.1. New Features

  • New documentation.
  • New tutorials.
  • A new option timeInSubsPerSite is now supported for dpp-msbayes and msbayes analyses.
  • The coefficient of variation of divergence times is now reported.
  • The dmc_dpp_summary.py CLI script now estimates and reports the probabilities of the number of categories.
  • New dmc_sum_sims.py CLI script for summarizing and plotting the results of simulation-based analyses.
  • New “-m/--mu” option added to dmc_plot_results.py script for specifying the mutation rate to rescale time to generations.
  • New “--extension” option added to dmc_plot_results.py script to allow the file format of the plots to be specified.
  • Adding new worker classes to the API for simulating data via Joseph Heled’s biopy package.
  • Adding DppSimWorker and DppSimTeam to the API.
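For readers unfamiliar with the coefficient of variation mentioned in the list above, it is the standard deviation divided by the mean. The sketch below computes it for a list of divergence times. It assumes the population standard deviation and an unweighted list of times; the change log does not state which estimator the package uses, so treat both choices, and the function name, as assumptions.

```python
import statistics


def coefficient_of_variation(divergence_times):
    """Return standard deviation / mean for a sequence of divergence times.

    Uses the population standard deviation (an assumption; the package may
    use a different estimator).
    """
    mean = statistics.mean(divergence_times)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return statistics.pstdev(divergence_times) / mean
```

A value of zero means all taxon pairs share the same divergence time; larger values indicate more variation among divergence times.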

6.4.2. Changes

  • More rigorous checks added to dmc.py to make sure all the sample tables are the same within an analysis.
  • The logging frequency is now in units of prior samples rather than batch iterations.
  • Help menus of CLI scripts have been updated.
  • New dpp-msbayes executables were added that accommodate the timeInSubsPerSite option and reporting the coefficient of variation of divergence times.
  • The package has been updated to handle these new options.
  • For the dmc_plot_results.py CLI script, changing the --iteration-index option to --sample-index to reflect the new logging behavior (units of prior samples rather than batch iterations).

6.5. Version 0.3.1

6.5.1. Changes

  • Cleaning up intra-package imports and rearranging some of the code.

6.6. Version 0.3.0

6.6.1. Changes

  • Updated the version of the dpp-msprior binary (for both Mac and Linux) that is bundled with the package. This version of dpp-msprior is from version 0.2 of the dpp-msbayes package and was updated to create a lower limit (0.000000000001) for values of theta. In rare cases, a theta value of zero would cause the coalescent simulator msDQH to crash. Any previous analyses that did not crash should not be affected by this change.
  • The newer version of dpp-msprior also changes the weirdness from the original msBayes where there was a check for small (i.e., 0.0001) divergence times. In such simulations, the div time was set to this arbitrary lower bound, and the bottleneck time was set to 0.5 of this. I am guessing this was to prevent unrealistic (and numerically unstable?) changes in pop size. However, 0.0001 can be thousands of generations, which is not trivial. Also, rather than this weird hack of the bottleneck time, it seems much better to simply have no bottleneck if the div time is essentially zero. Accordingly, I lowered the threshold and simply “turn off” the bottleneck if the time is below it (I no longer adjust the div time or bottleneck time). This change should have very little effect on most analyses.
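The two changes above can be sketched in Python, although the real logic lives in the dpp-msprior binary. This is a rough illustration under stated assumptions: the function names are hypothetical, and the new threshold value (1e-9 here) is a placeholder because the change log says only that the threshold was lowered, not what it was lowered to. The theta floor (1e-12) and the old lower bound (0.0001) are the values given above.

```python
THETA_FLOOR = 1e-12        # lower limit for theta stated in the change log
OLD_MIN_DIV_TIME = 0.0001  # lower bound described for the original msBayes


def floor_theta(theta):
    """Keep theta at or above the floor so the simulator never sees zero."""
    return max(theta, THETA_FLOOR)


def old_behavior(div_time, bottleneck_time):
    """Original behavior: overwrite small div times and the bottleneck time."""
    if div_time < OLD_MIN_DIV_TIME:
        div_time = OLD_MIN_DIV_TIME
        bottleneck_time = 0.5 * OLD_MIN_DIV_TIME
    return div_time, bottleneck_time, True  # the bottleneck always happens


def new_behavior(div_time, bottleneck_time, threshold=1e-9):
    """New behavior: leave both times alone; skip the bottleneck if too recent.

    The default threshold is a placeholder, not the value used by dpp-msprior.
    """
    has_bottleneck = div_time >= threshold
    return div_time, bottleneck_time, has_bottleneck
```

The key difference is that the new behavior never alters the sampled times; it only decides whether a bottleneck is simulated.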

6.6.2. Bug Fixes

  • As mentioned in the change above, in rare cases, theta values of zero drawn from the prior via dpp-msprior would cause msDQH to crash. The new version of the dpp-msprior binaries should prevent this. Any previous analyses that did not crash were not affected by this bug.

6.7. Version 0.2.2

6.7.1. Changes

  • End-user CLI scripts log additional information.
  • dmc.py will now set the number of standardizing samples equal to the number of prior samples, if the former is larger than the latter.

6.7.2. New Features

  • New management of paths to dpp-msbayes and msBayes executables. Previously, the package only used the executables bundled with the package, which required the package to be installed via python setup.py develop. Now, the package will also look for executables on the system.
  • Updating setup.py so that the bundled executables will be installed along with end-user scripts via python setup.py install after checking that all the executables work on the system.
  • 32-bit Linux versions of the dpp-msbayes/msBayes executables are now bundled with the package.
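The executable lookup described above could work along the following lines. This is a sketch under assumptions, not the package's implementation: the function name and the `bundled_dir` argument are hypothetical, and the change log does not say whether bundled or system executables take precedence (this sketch tries the bundled copy first).

```python
import os
import shutil


def find_executable(name, bundled_dir):
    """Return a path to `name`, or None if it cannot be found.

    Checks a bundled directory first, then falls back to the system PATH.
    The search order is an assumption made for this sketch.
    """
    bundled = os.path.join(bundled_dir, name)
    if os.path.isfile(bundled) and os.access(bundled, os.X_OK):
        return bundled
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(name)
```

Falling back to the system PATH is what removes the need for a development-mode install when the executables are already available on the system.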

6.7.3. Bug Fixes

  • Fixed a bug where MsBayesWorker instances with a sample size of zero would hang indefinitely. A ValueError is now raised if an MsBayesWorker is instantiated with an integer less than one for the sample size.

6.8. Version 0.2.1

6.8.1. Changes

  • Adding working version of documentation.
  • Overhauled package-wide and CLI-script logging management.

6.8.2. Bug Fixes

  • Fixing bug in dmc.py script, where the wrong value for the number of standardizing samples was being written to the run summary file.

6.9. Version 0.2.0

6.9.1. New Features

  • New options added to package and scripts for the sort index. These allow summary statistics to be grouped by taxon, but without re-sorting.

6.9.2. Changes

  • The new default option for the sort index for the entire package and for end-user scripts is to not group or re-sort the summary statistics of each alignment. The re-sorting that was done prior to this was NOT VALID, because the summary statistics calculated from the different alignments are NOT exchangeable.