Release History

v1.1.1

  • Minor edits to readme and removing methylcheck import, because it is not used anywhere.
  • Note: methylprep is only imported for reading Manifest files and handling ArrayType.

v1.1.0

  • We found that diff_meth_pos results were not accurate in prior versions and have fixed the regression optimization.
  • diff_meth_pos function kwargs changed to provide more flexibility in how the model is optimized.
    • Added support for COVARIATES in logistic regression. Provide a dataframe with both the phenotype and covariates, and specify which columns are phenotype or covariates. It will rearrange and normalize to ensure the model works best.
    • Use the new ‘solver’ kwarg in diff_meth_pos to specify which form of linear or logistic regression to run. There are two flavors of each, and both give nearly identical results.
    • Auto-detects logistic or linear based on input: if non-numeric inputs in phenotype of exactly two values, it assumes logistic.
  • Upgraded manhattan and volcano plots with many more options. Default settings should mirror most R EWAS packages now, with a “suggestive” and “significant” threshold line on manhattan plots.
  • Unit test coverage added.

v1.0.1

  • Fixed option to use Differentially methylated regions (DMR) via cached local copy of UCSC database (via fetch_genes) without using the internet. Previously, it would still contact the internet database even if user told it not to.
  • Added testing via github actions, and increased speed
  • updated documentation

v1.0.0

  • fixed bug in fetch_genes() from UCSC browser; function will now accept either the filepath or the DMR dataframe output.

v0.9.9

  • Added a differentially methylated regions (DMR) functions that takes the output of the diff_meth_pos (DMP) function.
    • DMP maps differences to chromosomes; DMR maps differences to specific genomic locii, and requires more processing.
    • upgraded methylprep manifests to support both old and new genomic build mappings for all array types. In general, you can supply a keyword argument (genome_build='OLD') to change from the new build back to the old one.
    • Illumina 27k arrays are still not supported, but mouse, epic, epic+, and 450k ARE supported. (Genome annotation won’t work with mouse array, only human builds.)
    • DMP integrates the combined-pvalues package (https://pubmed.ncbi.nlm.nih.gov/22954632/)
    • DMP integrates with UCSC Genome (refGene) and annotates the genes near CpG regions.
    • Annotation includes column(s) showing the tissue specific expression levels of relevant genes (e.g. filter=blood) this function is also available with extended options as methylize.filter_genes()
    • provides output BED and CSV files for each export into other genomic analysis tools
  • methylize.to_BED will convert the diff_meth_pos() stats output into a standard BED file (a tab separated CSV format with standardized, ordered column names)

v0.9.8

  • fixed methylize diff_meth_pos linear regression. upgraded features too
    • Fixed bug in diff_meth_pos using linear regression - was not calculating p-values correctly. Switched from statsmodels OLS to scipy linregress to fix, but you can use either one with kwargs. They appear to give exactly the same results now after testing.
    • The “CHR-” prefix is omitted from manhattan plots by default now
    • dotted manhattan sig line is Bonferoni corrected (pass in post_test=None to leave uncorrected)
    • added a probe_corr_plot() undocumented function, a scatterplot of probe confidence intervals vs pvalue
    • sorts probes by MAPINFO (chromosome location) instead of FDR_QValue on manhattan plots now
  • Support for including/excluding sex chromosomes from DMP (probe2chr map)

v0.9.5

  • Added imputation to diff_meth_pos() function, because methylprep output contains missing values by default and cannot be used in this function.
    • This can be disabled, and it will throw a warning if NaNs present.
    • Default is to delete probes that have any missing values before running analysis.
    • if ‘auto’: If there are less than 30 samples, it will delete the missing rows.
    • if ‘auto’: If there are >= 30 samples in the batch analyzed, it will replace NaNs with the average value for that probe across all samples.
    • User may override the default using: True (‘auto’), ‘delete’, ‘average’, and False (disable)
    • diff_meth_pos() now support mouse array, with multiple copies of the same probe names.

v0.9.4

  • Fixed bug where methylize could not find a data file in path, causing ImportError
  • Improved diff_meth_pos() function and added support for all array types. Now user must specify the array_type when calling the function, as the input data are stats, not probe betas, so it cannot infer the array type from this information.