Package cstag

Licence Test Python PyPI Bioconda JOSS DOI

cstag

cstag is a Python library tailored for manipulating and visualizing minimap2's cs tags.

[!NOTE] To add cs tags to SAM/BAM files, check out cstag-cli.

🌟 Features

For comprehensive documentation, please visit our docs.

🛠 Installation

Using PyPI:

pip install cstag

Using Bioconda:

conda install -c bioconda cstag

💡 Usage

Generating cs tags

import cstag

cigar = "8M2D4M2I3N1M"
md = "2A5^AG7"
seq = "ACGTACGTACGTACG"

print(cstag.call(cigar, md, seq))
# :2*ag:5-ag:4+ac~nn3nn:1

print(cstag.call(cigar, md, seq, long=True))
# =AC*ag=TACGT-ag=ACGT+ac~nn3nn=G

Shortening or Lengthening cs tags

import cstag

# Convert a cs tag from long to short
cs_tag = "=ACGT*ag=CGT"

print(cstag.shorten(cs_tag))
# :4*ag:3


# Convert a cs tag from short to long
cs_tag = ":4*ag:3"
cigar = "8M"
seq = "ACGTACGT"

print(cstag.lengthen(cs_tag, cigar, seq))
# =ACGT*ag=CGT

Creating a Consensus

import cstag

cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=C*gt=T", "=ACT+ccc=T"]
positions = [1, 1, 2, 2, 1]

print(cstag.consensus(cs_tags, positions))
# =AC*gt=T

Masking Low-Quality Bases

import cstag

cs_tag = "=ACGT*ac+gg-cc=T"
cigar = "5M2I2D1M"
qual = "AA!!!!AA"
phred_threshold = 10
print(cstag.mask(cs_tag, cigar, qual, phred_threshold))
# =ACNN*an+ng-cc=T

Splitting a cs tag

import cstag

cs_tag = "=ACGT*ac+gg-cc=T"
print(cstag.split(cs_tag))
# ['=ACGT', '*ac', '+gg', '-cc', '=T']

Reverse Complement of a cs tag

import cstag

cs_tag = "=ACGT*ac+gg-cc=T"
print(cstag.revcomp(cs_tag))
# =A-gg+cc*tg=ACGT

Reconstructing the Reference Subsequence

import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
print(cstag.to_sequence(cs_tag))
# ACTTCTTA

Generating a VCF Report

import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
chrom = "chr1"
pos = 1
print(cstag.to_vcf(cs_tag, chrom, pos))
"""
##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    3   .   G   T   .   .   .
chr1    4   .   TGG T   .   .   .
chr1    5   .   C   CTT .   .   .
"""

The multiple cs tags enable reporting of the variant allele frequency (VAF).

import cstag
cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=ACGT", "=AC*gt=T"]
chroms = ["chr1", "chr1", "chr1", "chr2", "chr2"]
positions = [2, 2, 3, 10, 100]
print(cstag.to_vcf(cs_tags, chroms, positions))
"""
##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=RD,Number=1,Type=Integer,Description="Depth of Ref allele">
##INFO=<ID=AD,Number=1,Type=Integer,Description="Depth of Alt allele">
##INFO=<ID=VAF,Number=1,Type=Float,Description="Variant allele frequency (AD/DP)">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    4   .   G   T   .   .   DP=3;RD=1;AD=2;VAF=0.667
chr2    102 .   G   T   .   .   DP=1;RD=0;AD=1;VAF=1.0
"""

Generating an HTML Report

import cstag
from pathlib import Path

cs_tag = "=AC+ggg=T-acgt*at~gt10ag=GNNN"
description = "Example"

cs_tag_html = cstag.to_html(cs_tag, description)
Path("report.html").write_text(cs_tag_html)
# Output "report.html"

You can visualize mutations indicated by the cs tag using the generated report.html file as shown below:

image

Generating a PDF Report

import cstag

cs_tag = "=AC+ggg=T-acgt*at~gt10ag=GNNN"
description = "Example"
path_out = "report.pdf"

cstag.to_pdf(cs_tag, description, path_out)
# Output "report.pdf"

You can obtain the same images of cstag.to_html as a PDF file.

📣 Feedback and Support

For questions, bug reports, or other forms of feedback, we'd love to hear from you!
Please use GitHub Issues for all reporting purposes.

Please refer to CONTRIBUTING for how to contribute and how to verify your contributions.

🤝 Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.

📄 Citation

Expand source code
"""
.. include:: ../../README.md
"""

from cstag.call import call
from cstag.shorten import shorten
from cstag.lengthen import lengthen
from cstag.consensus import consensus
from cstag.mask import mask
from cstag.split import split
from cstag.revcomp import revcomp
from cstag.to_html import to_html
from cstag.to_pdf import to_pdf
from cstag.to_vcf import to_vcf
from cstag.to_sequence import to_sequence

Sub-modules

cstag.call
cstag.consensus
cstag.lengthen
cstag.mask
cstag.revcomp
cstag.shorten
cstag.split
cstag.to_html
cstag.to_pdf
cstag.to_sequence
cstag.to_vcf
cstag.utils