
Sunday, 31 March 2019

Phi X 174

What is it?

    1. A single-stranded DNA (ssDNA) virus (bacteriophage) that infects Escherichia coli
    2. The first DNA-based genome to be sequenced (1977)
    3. A well-defined, small (5,386 bp), and diverse (about 45% GC, 55% AT) genome
    4. FASTA file download links:
      1. PhiX_from_Illumina
      2. PhiX_from_NCBI
    5. Used as a positive control in Illumina NGS
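The genome size and GC/AT split above can be checked directly against a downloaded reference. This is a minimal Python sketch, assuming a single-record FASTA file; the function names and the script/file names in the usage line are hypothetical placeholders, not anything shipped by Illumina or NCBI.

```python
import sys
from collections import Counter
from typing import Dict, Iterable


def read_fasta_sequence(lines: Iterable[str]) -> str:
    """Concatenate the sequence lines of a single-record FASTA (header skipped)."""
    return "".join(l.strip() for l in lines if l.strip() and not l.startswith(">")).upper()


def base_composition(seq: str) -> Dict[str, float]:
    """Length plus GC% and AT% computed over unambiguous A/C/G/T bases."""
    counts = Counter(seq)
    acgt = sum(counts[b] for b in "ACGT")
    gc = counts["G"] + counts["C"]
    return {
        "length": len(seq),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
        "at_percent": 100.0 * (acgt - gc) / acgt if acgt else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python phix_stats.py phix.fa
    with open(sys.argv[1]) as fasta:
        print(base_composition(read_fasta_sequence(fasta)))
```

Ambiguous characters such as N are counted in the length but left out of the percentages.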



What are the benefits of using a PhiX control?

    1. Calibration control: PhiX can be run alone and serves as a calibration control for:
      1. Cluster generation: PhiX can be used as a positive control in the clustering process

        Platform | Mode/Reagents                            | Optimal raw cluster density
        HiSeq    | High Output, TruSeq v3                   | 750-850 K/mm²
        HiSeq    | High Output, HiSeq v4 (required upgrade) | 950-1,050 K/mm²
        HiSeq    | Rapid v2                                 | 850-1,000 K/mm²
        MiSeq    | v2                                       | 1,000-1,200 K/mm²
        MiSeq    | v3                                       | 1,200-1,400 K/mm²
        MiniSeq  | Mid and High Output                      | 170-220 K/mm²
        NextSeq  | Mid and High Output, v2                  | 170-220 K/mm²
        [table 1] Cluster density guidelines for Illumina sequencing platforms

      2. Cross-talk matrix generation
        1. During an Illumina sequencing run, cross-talk caused by spectral overlap between the four fluorescently labeled nucleotides is calculated during template generation in cycles 1-5
        2. https://www.slideshare.net/idtdna/unique-dualmatched-adapters-mitigate-index-hopping-between-ngs-samples
      3. Phasing and prephasing
        1. During sequencing by synthesis, each DNA strand in a cluster extends by one base per cycle
        2. A small proportion of strands may fall out of phase with the current cycle, either falling a base behind (phasing) or jumping a base ahead (prephasing)
        3. For best results, use a PhiX spike-in as a control with any library that does not have a balanced base composition
        4. High-GC samples (≥ 60% GC) typically show higher phasing rates; in this case a PhiX control is required

    2. Run quality monitoring: because of its small size and balanced nucleotide composition, PhiX is an ideal in-run control (typically spiked in at ≥ 1%) for monitoring run quality

      Platform                              | PhiX aligned (%)
      iSeq 100                              | minimum 5%
      MiniSeq                               | 10-50%
      MiSeq (MCS 2.2 or higher)             | minimum 5%
      NextSeq                               | 10-50%
      HiSeq 2500 (HCS 2.2.38 or higher)     | minimum 10%
      HiSeq 3000/4000 (HCS 3.3.76 or lower) | 10-50%
      HiSeq 3000/4000 (HCS 3.4.0 or higher) | 5-20%
      NovaSeq                               | minimum 10%
      [table 2] PhiX Control v3 spike-in (% aligned) that Illumina recommends when running low-diversity libraries

    3. Color balancing
      1. For low-diversity libraries, the PhiX Control v3 library provides balanced fluorescent signals at each cycle, which improves overall run quality
      2. You can find out why nucleotide diversity is important here
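The effect of a PhiX spike-in on base balance can be estimated with a simple mixing calculation. This Python sketch is a rough model, not Illumina's method: it assumes the base fractions at a given cycle are the cluster-weighted average of the library and PhiX, and it approximates PhiX at every cycle by its genome-wide composition (about 45% GC, so roughly 22.5% each of C and G and 27.5% each of A and T). The names are illustrative.

```python
from typing import Dict

# Assumption: PhiX reads contribute bases in proportion to the genome-wide
# composition (~45% GC), not measured per-cycle values.
PHIX_FRACTIONS: Dict[str, float] = {"A": 0.275, "C": 0.225, "G": 0.225, "T": 0.275}


def blended_composition(library: Dict[str, float], phix_fraction: float) -> Dict[str, float]:
    """Base fractions at one cycle after spiking PhiX into a library.

    `library` maps bases to fractions (summing to 1) for the library alone;
    `phix_fraction` is the share of clusters that are PhiX (0.10 = 10%).
    """
    if not 0.0 <= phix_fraction <= 1.0:
        raise ValueError("phix_fraction must be between 0 and 1")
    return {
        base: phix_fraction * PHIX_FRACTIONS[base] + (1.0 - phix_fraction) * library.get(base, 0.0)
        for base in "ACGT"
    }


if __name__ == "__main__":
    # Hypothetical low-diversity cycle: every library read has an A here
    monotemplate = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
    for spike in (0.01, 0.10, 0.50):
        mix = blended_composition(monotemplate, spike)
        print(f"{spike:.0%} PhiX:", {b: f"{v:.1%}" for b, v in mix.items()})
```

Under this model, a cycle where every library read carries the same base stays at about 99% of that base with a 1% spike-in and drops to about 64% with 50% PhiX, which is one way to see why table 2 lists spike-ins of 10-50% for low-diversity libraries on several platforms.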

How to remove PhiX reads from the FASTQ
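A common approach is to align reads to the PhiX reference (for example with Bowtie2) and keep only the unaligned reads, or to use a k-mer filtering tool such as BBDuk. The Python sketch below illustrates only the k-mer idea; it is not BBDuk's implementation, and the function names, k = 31 and the one-hit threshold are illustrative assumptions. It indexes every k-mer of the circular PhiX genome on both strands and drops any read that shares at least `min_hits` k-mers with it.

```python
import sys
from typing import Iterable, Iterator, Set, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
FastqRecord = Tuple[str, str, str, str]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta_sequence(lines: Iterable[str]) -> str:
    """Concatenate the sequence lines of a single-record FASTA."""
    return "".join(l.strip() for l in lines if l.strip() and not l.startswith(">")).upper()


def build_kmer_index(reference: str, k: int = 31) -> Set[str]:
    """All k-mers of a circular reference, on both strands."""
    ref = reference.upper()
    circular = ref + ref[: k - 1]  # PhiX is circular: also index k-mers spanning the origin
    kmers: Set[str] = set()
    for strand in (circular, reverse_complement(circular)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i : i + k])
    return kmers


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, plus, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator groups consecutive lines in fours
    yield from zip(it, it, it, it)


def is_phix(seq: str, kmers: Set[str], k: int = 31, min_hits: int = 1) -> bool:
    """True if the read shares at least `min_hits` k-mers with the reference."""
    seq = seq.upper()
    hits = 0
    for i in range(len(seq) - k + 1):
        if seq[i : i + k] in kmers:
            hits += 1
            if hits >= min_hits:
                return True
    return False


def filter_phix(fastq_lines, reference, k=31, min_hits=1) -> Iterator[FastqRecord]:
    """Yield only the FASTQ records that do not look like PhiX."""
    kmers = build_kmer_index(reference, k)
    for record in read_fastq(fastq_lines):
        if not is_phix(record[1], kmers, k, min_hits):
            yield record


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_phix.py phix.fa reads.fastq > reads.nophix.fastq
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as fastq:
        phix = read_fasta_sequence(fasta)
        for record in filter_phix(fastq, phix):
            sys.stdout.write("\n".join(record) + "\n")
```

For paired-end data, a pair should be dropped when either mate matches PhiX; this single-file sketch does not handle that.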


Nucleotide Diversity

What is nucleotide diversity?

    1. High nucleotide diversity: a library has roughly equal proportions of all four nucleotides in every cycle of the run
    2. The diagram below illustrates the diversity and base balance of well-balanced and unbalanced libraries, and how this is reflected in the % base plot of Sequencing Analysis Viewer (SAV)
    [fig 1] Illustration of diversity and base balance
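The % base plot is essentially the percentage of each base at each cycle across all clusters. The Python sketch below computes the same kind of table from read sequences (for example, a sample of reads from a FASTQ) and flags cycles where one base dominates. The function names, the 50% threshold and the 25-cycle window (borrowed from the next section) are illustrative assumptions, not Illumina cutoffs.

```python
from collections import Counter
from typing import Dict, List, Sequence

BASES = "ACGT"


def percent_base_by_cycle(reads: Sequence[str]) -> List[Dict[str, float]]:
    """Percentage of A/C/G/T at each cycle (read position), like a % base plot."""
    max_len = max((len(r) for r in reads), default=0)
    table: List[Dict[str, float]] = []
    for cycle in range(max_len):
        # Count the base called at this cycle in every read long enough to have one
        counts = Counter(r[cycle].upper() for r in reads if len(r) > cycle)
        total = sum(counts[b] for b in BASES)  # N calls are ignored
        table.append({b: 100.0 * counts[b] / total if total else 0.0 for b in BASES})
    return table


def unbalanced_cycles(table: List[Dict[str, float]], max_percent: float = 50.0,
                      first_n: int = 25) -> List[int]:
    """1-based cycles among the first `first_n` where any single base exceeds `max_percent`."""
    return [i + 1 for i, row in enumerate(table[:first_n]) if max(row.values()) > max_percent]


if __name__ == "__main__":
    # Hypothetical amplicon-like reads that share the same first two bases
    reads = ["ACGTAC", "ACTGCA", "ACGATG", "ACTCGT"]
    for cycle, row in enumerate(percent_base_by_cycle(reads), start=1):
        print(cycle, {b: round(p, 1) for b, p in row.items()})
    print("unbalanced cycles:", unbalanced_cycles(percent_base_by_cycle(reads)))
```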

Why is nucleotide diversity important?

    1. Nucleotide diversity is required for effective template generation and is important for generating high-quality data
    2. Diversity is especially important during the first 4-7 cycles of the first sequencing read on MiniSeq, MiSeq, NextSeq, and HiSeq 1000-2500 systems. The sequencing software uses images from these early cycles to identify the location of each cluster, a process called template generation
    3. Diversity also matters throughout the first 25 cycles, because this is when phasing/prephasing, color matrix correction, and pass-filter calculations take place
    4. For low-diversity libraries, the Real-Time Analysis (RTA) software needs an appropriate PhiX spike-in. You can find more specific data here
      ref)
        https://support.illumina.com/bulletins/2016/07/what-is-nucleotide-diversity-and-why-is-it-important.html

Thursday, 28 March 2019

[A6000 + 30.4] Piazzale Michelangelo6


2019. 03
from Piazzale Michelangelo, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Piazzale Michelangelo4


2019. 03
from Piazzale Michelangelo, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Piazzale Michelangelo5


2019. 03
from Piazzale Michelangelo, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Piazzale Michelangelo3


2019. 03
from Piazzale Michelangelo, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Piazzale Michelangelo2


2019. 03
from Piazzale Michelangelo, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Piazzale Michelangelo1


2019. 03
from Piazzale Michelangelo, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

Wednesday, 27 March 2019

[A6000 + 30.4] Battistero di San Giovanni


2019. 02
from Battistero di San Giovanni, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Ponte Vecchio


2019. 02
from Ponte Vecchio, Florence, Italy
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Ttukseom Hangang Park5


2018. 06
from Ttukseom Hangang Park
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Ttukseom Hangang Park4


2018. 06
from Ttukseom Hangang Park
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Ttukseom Hangang Park3


2018. 06
from Ttukseom Hangang Park
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Ttukseom Hangang Park2


2018. 06
from Ttukseom Hangang Park
Sony A6000 + Sigma 30mm f1.4

[A6000 + 30.4] Ttukseom Hangang Park1


2018. 06
from Ttukseom Hangang Park
Sony A6000 + Sigma 30mm f1.4

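Estimating standard deviation: divide by n-1

Two reasons for that

1. To reduce the gap between the sample variance and the population variance
  (empirical reason)
  1. The "1/n" version is the maximum likelihood estimate of the population variance (for normally distributed data), but it is biased
  2. Because deviations are measured from the sample mean rather than the unknown population mean, the "1/n" sample variance tends to be smaller than the population variance
    → it systematically underestimates the population variance
  3. Dividing by n-1 instead of n (the "1/(n-1)" convention) enlarges the estimate just enough to remove this bias, giving an unbiased estimate
  4. Why not n-2?
    1. The correction is tied to the degrees of freedom: once the sample mean is fixed, only n-1 of the n deviations are free to vary, so the divisor is n-1
2. To match the expected value of the sample variance to the population variance
  (mathematical reason)
  1. let,
    $n$: sample size
    $x_1, \dots, x_n$: independent observations drawn from the population
    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$: sample mean
    $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$: sample variance
    $\mu$: population mean
    $\sigma^2$: population variance
  2. then, show that the following is true
    $$E[s^2] = \sigma^2$$
  3. first, expand the sum of squared deviations around $\mu$, using $\sum_{i=1}^{n}(x_i-\mu) = n(\bar{x}-\mu)$
    $$\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}\big[(x_i-\mu)-(\bar{x}-\mu)\big]^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - 2(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\mu) + n(\bar{x}-\mu)^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2$$
    then take expectations, using $E[(x_i-\mu)^2] = \sigma^2$ and $E[(\bar{x}-\mu)^2] = \mathrm{Var}(\bar{x}) = \sigma^2/n$
    $$E\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2$$
  4. therefore,
    $$E[s^2] = \frac{1}{n-1}E\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] = \sigma^2$$
    whereas dividing by n gives an expected value of $\frac{n-1}{n}\sigma^2$, which underestimates $\sigma^2$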



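A quick simulation makes the bias visible. This Python sketch averages both estimators over many samples; the normal distribution, n = 5, the trial count and the function names are arbitrary illustrative choices. By the derivation, the 1/n version should settle near (n-1)/n · σ² = 0.8 and the 1/(n-1) version near σ² = 1.

```python
import random
from typing import List, Tuple


def variance_n(xs: List[float]) -> float:
    """Divide by n: maximum likelihood estimate (biased)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_n_minus_1(xs: List[float]) -> float:
    """Divide by n-1: unbiased estimate of the population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate(n: int = 5, trials: int = 20_000, sigma: float = 1.0,
             seed: int = 0) -> Tuple[float, float]:
    """Average both estimators over many samples of size n drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    total_n = total_n1 = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_n += variance_n(xs)
        total_n1 += variance_n_minus_1(xs)
    return total_n / trials, total_n1 / trials


if __name__ == "__main__":
    avg_n, avg_n1 = simulate()
    print(f"divide by n:   {avg_n:.3f}  (theory: (n-1)/n * sigma^2 = 0.8)")
    print(f"divide by n-1: {avg_n1:.3f}  (theory: sigma^2 = 1.0)")
```

Python's standard library exposes both conventions as well: `statistics.pvariance` divides by n and `statistics.variance` divides by n-1.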