1 Load the package
2 Define a helper function to save the raw dataset as a temporary text file
3 Design an experiment
- 3.1 Examples for single indexing
- 3.2 Examples for dual indexing
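4 Build your own workflow
- 4.1 Load and check a dataset of barcodes
- 4.2 Examples of an exhaustive search of compatible barcode combinations
- 4.3 Examples of a random search of compatible barcode combinations
- 4.4 Constrain barcodes to be robust against one substitution error
- 4.5 Optimize the set of compatible combinations to reduce barcode redundancy
- 4.6 The optimized result isn’t an optimum when filtering out too many barcodes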

This document gives an overview of the DNABarcodeCompatibility R package and briefly describes the tools it contains. The package includes six main functions, each described below with examples. Together, they allow one to load a list of DNA barcodes (such as those of the Illumina TruSeq small RNA kits), to filter these barcodes according to distance and nucleotide-content criteria, to generate sets of compatible barcode combinations from the filtered list, and finally to select an optimized set of barcode combinations for a multiplex sequencing experiment. In particular, the package provides an optimizer that favours compatible combinations in which the DNA barcodes are used with frequencies as uniform as possible. It can also restrict the selection to barcodes that are robust against substitution and insertion/deletion errors, which facilitates the demultiplexing step.

The DNABarcodeCompatibility package also contains:

- one workflow, experiment_design(), which performs all steps in one go;
- two data sets, IlluminaIndexesRaw and IlluminaIndexes, for running and testing the examples;
- a set of API functions for building your own workflow.

The package supports the three sequencing-by-synthesis (SBS) chemistries used on Illumina platforms. The chemistry is selected with the platform argument:

- Four-channel SBS chemistry (platform = 4): MiSeq and HiSeq systems
- Two-channel SBS chemistry (platform = 2): MiniSeq, NextSeq and NovaSeq systems
- One-channel SBS chemistry (platform = 1): iSeq system
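The platform argument matters because each chemistry imposes its own colour-balance constraint on the barcodes pooled in a lane. The sketch below illustrates this constraint with a hypothetical helper, is_colour_balanced(), which is not part of the package. It assumes the commonly cited Illumina rules. On four-channel systems, every cycle needs at least one base read by the red laser (A or C) and one read by the green laser (G or T). On one- and two-channel systems, G carries no dye, so no cycle may read G for every barcode. The package's own compatibility criteria may be stricter than this simplification.

```r
# Hypothetical helper, NOT part of DNABarcodeCompatibility: a simplified
# colour-balance check based on commonly cited Illumina rules (assumption).
is_colour_balanced <- function(barcodes, platform = 4) {
  # One row per barcode, one column per sequencing cycle (equal lengths assumed)
  m <- do.call(rbind, strsplit(barcodes, ""))
  cycle_ok <- apply(m, 2, function(bases) {
    if (platform == 4) {
      # Four-channel: A/C are imaged with the red laser, G/T with the green
      # laser, so every cycle needs at least one base from each group
      any(bases %in% c("A", "C")) && any(bases %in% c("G", "T"))
    } else {
      # One- and two-channel: G is unlabelled ("dark"), so a cycle where
      # every barcode reads G yields no signal
      any(bases != "G")
    }
  })
  all(cycle_ok)
}

# Lane 1 chosen by experiment_design() in the HiSeq example of section 3.1
is_colour_balanced(c("GTCCGC", "ATTCCT", "CACTCA"), platform = 4)  # TRUE
# RPI01 + RPI02: cycle 1 reads only A and C, with no green-laser base
is_colour_balanced(c("ATCACG", "CGATGT"), platform = 4)            # FALSE
# RPI24 + RPI23: cycle 1 reads G for both barcodes on a two-channel system
is_colour_balanced(c("GGTAGC", "GAGTGG"), platform = 2)            # FALSE
```

All sequences are taken from the IlluminaIndexes dataset.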

1 Load the package

library("DNABarcodeCompatibility")

2 Define a helper function to save the raw dataset as a temporary text file

# Helper created for the purpose of this documentation: it writes a barcode
# data.frame to a temporary text file and returns the file path
export_dataset_to_file <-
    function(dataset = DNABarcodeCompatibility::IlluminaIndexesRaw) {
        if (!is.data.frame(dataset)) {
            stop("The input dataset isn't a data.frame: NOT exported into file")
        }
        textfile <- tempfile()
        # Space-separated, without header, row names or quotes
        write.table(dataset, textfile,
                    row.names = FALSE, col.names = FALSE, quote = FALSE)
        return(textfile)
    }

3 Design an experiment

The function experiment_design() uses a Shannon-entropy maximization approach to identify a set of compatible barcode combinations in which the frequencies of occurrence of the various DNA barcodes are as uniform as possible. The optimization can be performed for both single and dual indexing. Depending on the size of the DNA-barcode set and on the number of samples to be multiplexed, it performs either an exhaustive or a random search of compatible DNA-barcode combinations.
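To make the printed entropies concrete: the Shannon entropy of a design is -sum(p_i * log(p_i)), where p_i is the fraction of barcode slots filled by barcode i. The values printed in the examples below are consistent with natural logarithms. When all 12 samples receive distinct barcodes, the entropy reaches log(12), about 2.48491. The helper barcode_entropy() is a hypothetical illustration, not a package function.

```r
# Hypothetical helper, NOT a package function: Shannon entropy (natural log)
# of how often each barcode Id is used in a design.
barcode_entropy <- function(ids) {
  p <- as.vector(table(ids)) / length(ids)  # relative frequency of each Id
  -sum(p * log(p))
}

# The 12 barcodes assigned to the 12 samples in the first HiSeq example below
hiseq_ids <- c("RPI18", "RPI27", "RPI32", "RPI07", "RPI17", "RPI39",
               "RPI21", "RPI29", "RPI33", "RPI24", "RPI31", "RPI40")

# Every barcode used exactly once: the entropy reaches log(12)
round(barcode_entropy(hiseq_ids), 5)  # 2.48491

# Reusing RPI18 instead of RPI40 lowers the entropy below log(12)
barcode_entropy(c(hiseq_ids[1:11], "RPI18")) < log(12)  # TRUE
```

With 12 lanes of 3 samples and 48 available barcodes, as in section 4.5, the same formula gives log(36), about 3.58352, when all 36 slots hold distinct barcodes.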

3.1 Examples for single indexing

12 libraries sequenced in multiplexes of 3 on a HiSeq (4-channel) platform

txtfile <- export_dataset_to_file (
    dataset = DNABarcodeCompatibility::IlluminaIndexesRaw
)
experiment_design(file1=txtfile,
                    sample_number=12,
                    mplex_level=3,
                    platform=4)
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##    sample Lane    Id sequence
## 1       1    1 RPI18   GTCCGC
## 2       2    1 RPI27   ATTCCT
## 3       3    1 RPI32   CACTCA
## 4       4    2 RPI07   CAGATC
## 5       5    2 RPI17   GTAGAG
## 6       6    2 RPI39   CTATAC
## 7       7    3 RPI21   GTTTCG
## 8       8    3 RPI29   CAACTA
## 9       9    3 RPI33   CAGGCG
## 10     10    4 RPI24   GGTAGC
## 11     11    4 RPI31   CACGAT
## 12     12    4 RPI40   CTCAGA

12 libraries sequenced in multiplexes of 3 on a NextSeq (2-channel) platform

txtfile <- export_dataset_to_file (
    dataset = DNABarcodeCompatibility::IlluminaIndexesRaw
)
experiment_design(file1=txtfile,
                    sample_number=12,
                    mplex_level=3,
                    platform=2)
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##    sample Lane    Id sequence
## 1       1    1 RPI05   ACAGTG
## 2       2    1 RPI08   ACTTGA
## 3       3    1 RPI27   ATTCCT
## 4       4    2 RPI03   TTAGGC
## 5       5    2 RPI22   CGTACG
## 6       6    2 RPI44   TATAAT
## 7       7    3 RPI13   AGTCAA
## 8       8    3 RPI26   ATGAGC
## 9       9    3 RPI36   CCAACA
## 10     10    4 RPI24   GGTAGC
## 11     11    4 RPI31   CACGAT
## 12     12    4 RPI48   TCGGCA

12 libraries sequenced in multiplexes of 3 on an iSeq (1-channel) platform

txtfile <- export_dataset_to_file (
    dataset = DNABarcodeCompatibility::IlluminaIndexesRaw
)
experiment_design(file1=txtfile,
                    sample_number=12,
                    mplex_level=3,
                    platform=1)
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##    sample Lane    Id sequence
## 1       1    1 RPI01   ATCACG
## 2       2    1 RPI27   ATTCCT
## 3       3    1 RPI44   TATAAT
## 4       4    2 RPI04   TGACCA
## 5       5    2 RPI23   GAGTGG
## 6       6    2 RPI25   ACTGAT
## 7       7    3 RPI40   CTCAGA
## 8       8    3 RPI41   GACGAC
## 9       9    3 RPI42   TAATCG
## 10     10    4 RPI07   CAGATC
## 11     11    4 RPI08   ACTTGA
## 12     12    4 RPI11   GGCTAC

12 libraries sequenced in multiplexes of 3 on a HiSeq platform, using barcodes robust against one substitution error

txtfile <- export_dataset_to_file (
    dataset = DNABarcodeCompatibility::IlluminaIndexesRaw
)
experiment_design(file1=txtfile,
                sample_number=12,
                mplex_level=3,
                platform=4,
                metric = "hamming",
                d = 3)
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##    sample Lane    Id sequence
## 1       1    1 RPI05   ACAGTG
## 2       2    1 RPI20   GTGGCC
## 3       3    1 RPI26   ATGAGC
## 4       4    2 RPI16   CCGTCC
## 5       5    2 RPI17   GTAGAG
## 6       6    2 RPI18   GTCCGC
## 7       7    3 RPI24   GGTAGC
## 8       8    3 RPI39   CTATAC
## 9       9    3 RPI47   TCGAAG
## 10     10    4 RPI02   CGATGT
## 11     11    4 RPI10   TAGCTT
## 12     12    4 RPI36   CCAACA
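Why d = 3? If every pair of barcodes pooled in a lane differs at 3 or more positions, a read carrying a single substitution remains closer to its true barcode than to any other one. In general, a minimum Hamming distance d allows floor((d - 1) / 2) substitutions to be corrected. The hypothetical helpers below, which are not package functions, compute the smallest pairwise Hamming distance within lane 1 of the output above.

```r
# Hypothetical helpers, NOT the package's distance_filter()
hamming <- function(a, b) {
  # Number of positions at which two equal-length barcodes differ
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}
min_pairwise_hamming <- function(barcodes) {
  # Smallest Hamming distance over all pairs of barcodes in one lane
  min(combn(barcodes, 2, function(pair) hamming(pair[1], pair[2])))
}

# Lane 1 of the output above: RPI05, RPI20, RPI26
min_pairwise_hamming(c("ACAGTG", "GTGGCC", "ATGAGC"))  # 3

# Substitutions that a minimum distance d = 3 can correct
floor((3 - 1) / 2)  # 1
```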

3.2 Examples for dual indexing

12 libraries sequenced in multiplexes of 3 on a HiSeq platform

# Select the first half of barcodes from the dataset
txtfile1 <- export_dataset_to_file (
    DNABarcodeCompatibility::IlluminaIndexesRaw[1:24,]
)

# Select the second half of barcodes from the dataset
txtfile2 <- export_dataset_to_file (
    DNABarcodeCompatibility::IlluminaIndexesRaw[25:48,]
)

# Get compatible combinations of the least redundant barcodes
experiment_design(file1=txtfile1,
                sample_number=12,
                mplex_level=3,
                platform=4,
                file2=txtfile2)
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##       Id Lane
## 1  RPI06    1
## 2  RPI14    1
## 3  RPI23    1
## 4  RPI04    2
## 5  RPI07    2
## 6  RPI21    2
## 7  RPI01    3
## 8  RPI10    3
## 9  RPI11    3
## 10 RPI05    4
## 11 RPI09    4
## 12 RPI15    4
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##       Id Lane
## 1  RPI26    1
## 2  RPI34    1
## 3  RPI42    1
## 4  RPI33    2
## 5  RPI40    2
## 6  RPI43    2
## 7  RPI37    3
## 8  RPI39    3
## 9  RPI45    3
## 10 RPI27    4
## 11 RPI38    4
## 12 RPI46    4
##       Id Lane sequence
## 1  RPI06    1   GCCAAT
## 2  RPI14    1   AGTTCC
## 3  RPI23    1   GAGTGG
## 4  RPI04    2   TGACCA
## 5  RPI07    2   CAGATC
## 6  RPI21    2   GTTTCG
## 7  RPI01    3   ATCACG
## 8  RPI10    3   TAGCTT
## 9  RPI11    3   GGCTAC
## 10 RPI05    4   ACAGTG
## 11 RPI09    4   GATCAG
## 12 RPI15    4   ATGTCA
##       Id Lane sequence
## 1  RPI26    1   ATGAGC
## 2  RPI34    1   CATGGC
## 3  RPI42    1   TAATCG
## 4  RPI33    2   CAGGCG
## 5  RPI40    2   CTCAGA
## 6  RPI43    2   TACAGC
## 7  RPI37    3   CGGAAT
## 8  RPI39    3   CTATAC
## 9  RPI45    3   TCATTC
## 10 RPI27    4   ATTCCT
## 11 RPI38    4   CTAGCT
## 12 RPI46    4   TCCCGA
##    sample Lane   Id1 sequence1   Id2 sequence2
## 1       1    1 RPI06    GCCAAT RPI26    ATGAGC
## 2       2    1 RPI14    AGTTCC RPI34    CATGGC
## 3       3    1 RPI23    GAGTGG RPI42    TAATCG
## 4       4    2 RPI04    TGACCA RPI33    CAGGCG
## 5       5    2 RPI07    CAGATC RPI40    CTCAGA
## 6       6    2 RPI21    GTTTCG RPI43    TACAGC
## 7       7    3 RPI01    ATCACG RPI37    CGGAAT
## 8       8    3 RPI10    TAGCTT RPI39    CTATAC
## 9       9    3 RPI11    GGCTAC RPI45    TCATTC
## 10     10    4 RPI05    ACAGTG RPI27    ATTCCT
## 11     11    4 RPI09    GATCAG RPI38    CTAGCT
## 12     12    4 RPI15    ATGTCA RPI46    TCCCGA

12 libraries sequenced in multiplexes of 3 on a HiSeq platform, using barcodes robust against one substitution error

# Select the first half of barcodes from the dataset
txtfile1 <- export_dataset_to_file (
    DNABarcodeCompatibility::IlluminaIndexesRaw[1:24,]
)

# Select the second half of barcodes from the dataset
txtfile2 <- export_dataset_to_file (
    DNABarcodeCompatibility::IlluminaIndexesRaw[25:48,]
)

# Get compatible combinations of the least redundant barcodes
experiment_design(file1=txtfile1, sample_number=12, mplex_level=3, platform=4,
                    file2=txtfile2, metric="hamming", d=3)
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##       Id Lane
## 1  RPI02    1
## 2  RPI06    1
## 3  RPI13    1
## 4  RPI03    2
## 5  RPI09    2
## 6  RPI12    2
## 7  RPI10    3
## 8  RPI11    3
## 9  RPI16    3
## 10 RPI04    4
## 11 RPI15    4
## 12 RPI23    4
## [1] "Theoretical max entropy: 2.48491"
## [1] "Entropy of the optimized set: 2.48491"
##       Id Lane
## 1  RPI27    1
## 2  RPI41    1
## 3  RPI43    1
## 4  RPI26    2
## 5  RPI30    2
## 6  RPI48    2
## 7  RPI34    3
## 8  RPI40    3
## 9  RPI44    3
## 10 RPI37    4
## 11 RPI42    4
## 12 RPI45    4
##       Id Lane sequence
## 1  RPI02    1   CGATGT
## 2  RPI06    1   GCCAAT
## 3  RPI13    1   AGTCAA
## 4  RPI03    2   TTAGGC
## 5  RPI09    2   GATCAG
## 6  RPI12    2   CTTGTA
## 7  RPI10    3   TAGCTT
## 8  RPI11    3   GGCTAC
## 9  RPI16    3   CCGTCC
## 10 RPI04    4   TGACCA
## 11 RPI15    4   ATGTCA
## 12 RPI23    4   GAGTGG
##       Id Lane sequence
## 1  RPI27    1   ATTCCT
## 2  RPI41    1   GACGAC
## 3  RPI43    1   TACAGC
## 4  RPI26    2   ATGAGC
## 5  RPI30    2   CACCGG
## 6  RPI48    2   TCGGCA
## 7  RPI34    3   CATGGC
## 8  RPI40    3   CTCAGA
## 9  RPI44    3   TATAAT
## 10 RPI37    4   CGGAAT
## 11 RPI42    4   TAATCG
## 12 RPI45    4   TCATTC
##    sample Lane   Id1 sequence1   Id2 sequence2
## 1       1    1 RPI02    CGATGT RPI27    ATTCCT
## 2       2    1 RPI06    GCCAAT RPI41    GACGAC
## 3       3    1 RPI13    AGTCAA RPI43    TACAGC
## 4       4    2 RPI03    TTAGGC RPI26    ATGAGC
## 5       5    2 RPI09    GATCAG RPI30    CACCGG
## 6       6    2 RPI12    CTTGTA RPI48    TCGGCA
## 7       7    3 RPI10    TAGCTT RPI34    CATGGC
## 8       8    3 RPI11    GGCTAC RPI40    CTCAGA
## 9       9    3 RPI16    CCGTCC RPI44    TATAAT
## 10     10    4 RPI04    TGACCA RPI37    CGGAAT
## 11     11    4 RPI15    ATGTCA RPI42    TAATCG
## 12     12    4 RPI23    GAGTGG RPI45    TCATTC

4 Build your own workflow

This section walks through the package API in detail to help you build your own workflow. The package is designed to be flexible and should adapt easily to most experimental contexts, either by using the experiment_design() function as a template or by building a workflow from scratch.

4.1 Load and check a dataset of barcodes

The file_loading_and_checking() function loads the file containing the DNA barcode set and analyzes its content. In particular, it checks that each barcode in the set is unique and uniquely identified (removing any repetition), checks that all barcodes have the same length, calculates their GC content and detects homopolymers of length 3 or more.

file_loading_and_checking(
    file = export_dataset_to_file(
        dataset = DNABarcodeCompatibility::IlluminaIndexesRaw
    )
)
##       Id sequence GC_content homopolymer
## 1  RPI01   ATCACG      50.00       FALSE
## 2  RPI02   CGATGT      50.00       FALSE
## 3  RPI03   TTAGGC      50.00       FALSE
## 4  RPI04   TGACCA      50.00       FALSE
## 5  RPI05   ACAGTG      50.00       FALSE
## 6  RPI06   GCCAAT      50.00       FALSE
## 7  RPI07   CAGATC      50.00       FALSE
## 8  RPI08   ACTTGA      33.33       FALSE
## 9  RPI09   GATCAG      50.00       FALSE
## 10 RPI10   TAGCTT      33.33       FALSE
## 11 RPI11   GGCTAC      66.67       FALSE
## 12 RPI12   CTTGTA      33.33       FALSE
## 13 RPI13   AGTCAA      33.33       FALSE
## 14 RPI14   AGTTCC      50.00       FALSE
## 15 RPI15   ATGTCA      33.33       FALSE
## 16 RPI16   CCGTCC      83.33       FALSE
## 17 RPI17   GTAGAG      50.00       FALSE
## 18 RPI18   GTCCGC      83.33       FALSE
## 19 RPI19   GTGAAA      33.33        TRUE
## 20 RPI20   GTGGCC      83.33       FALSE
## 21 RPI21   GTTTCG      50.00        TRUE
## 22 RPI22   CGTACG      66.67       FALSE
## 23 RPI23   GAGTGG      66.67       FALSE
## 24 RPI24   GGTAGC      66.67       FALSE
## 25 RPI25   ACTGAT      33.33       FALSE
## 26 RPI26   ATGAGC      50.00       FALSE
## 27 RPI27   ATTCCT      33.33       FALSE
## 28 RPI28   CAAAAG      33.33        TRUE
## 29 RPI29   CAACTA      33.33       FALSE
## 30 RPI30   CACCGG      83.33       FALSE
## 31 RPI31   CACGAT      50.00       FALSE
## 32 RPI32   CACTCA      50.00       FALSE
## 33 RPI33   CAGGCG      83.33       FALSE
## 34 RPI34   CATGGC      66.67       FALSE
## 35 RPI35   CATTTT      16.67        TRUE
## 36 RPI36   CCAACA      50.00       FALSE
## 37 RPI37   CGGAAT      50.00       FALSE
## 38 RPI38   CTAGCT      50.00       FALSE
## 39 RPI39   CTATAC      33.33       FALSE
## 40 RPI40   CTCAGA      50.00       FALSE
## 41 RPI41   GACGAC      66.67       FALSE
## 42 RPI42   TAATCG      33.33       FALSE
## 43 RPI43   TACAGC      50.00       FALSE
## 44 RPI44   TATAAT       0.00       FALSE
## 45 RPI45   TCATTC      33.33       FALSE
## 46 RPI46   TCCCGA      66.67        TRUE
## 47 RPI47   TCGAAG      50.00       FALSE
## 48 RPI48   TCGGCA      66.67       FALSE
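Two of these checks are simple to reproduce. The sketch below is a hypothetical re-implementation in base R. gc_content() and has_homopolymer() are not package functions, and the package's code may differ. Its results agree with the rows of the table above for RPI16, RPI35 and RPI44.

```r
# Hypothetical re-implementation, NOT the package's code
gc_content <- function(seqs) {
  # Percentage of G and C bases in each barcode, rounded to 2 decimals
  vapply(strsplit(seqs, ""),
         function(bases) round(100 * mean(bases %in% c("G", "C")), 2),
         numeric(1))
}
has_homopolymer <- function(seqs, min_len = 3L) {
  # TRUE when one base is repeated at least min_len times in a row
  pattern <- sprintf("([ACGT])\\1{%d,}", min_len - 1L)
  grepl(pattern, seqs, perl = TRUE)
}

# RPI16, RPI35 and RPI44 from the table above
seqs <- c("CCGTCC", "CATTTT", "TATAAT")
gc_content(seqs)       # 83.33 16.67  0.00
has_homopolymer(seqs)  # FALSE  TRUE FALSE
```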

4.2 Examples of an exhaustive search of compatible barcode combinations

The total number of combinations depends on the number of available barcodes and on the multiplex level. For 48 barcodes and a multiplex level of 3, the total number of combinations (compatible or not) is given by choose(48,3), i.e. 17296 combinations. In many cases the total number of combinations becomes far larger, and an exhaustive search is no longer feasible (see get_random_combinations() below).
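The growth with the multiplex level is easy to tabulate with choose(). An exhaustive search amounts to enumerating every k-subset with combn() and testing each one. The function exhaustive_search() below is a hypothetical sketch, not the package's get_all_combinations(). It uses a simplified four-channel colour-balance rule (at every cycle, at least one A or C and at least one G or T) as an assumed compatibility test.

```r
# Number of possible combinations of k barcodes out of 48
sapply(2:6, function(k) choose(48, k))
# 1128 17296 194580 1712304 12271512

# Hypothetical sketch, NOT the package's get_all_combinations()
exhaustive_search <- function(seqs, k, is_compatible) {
  all_combs <- combn(seqs, k)                 # one candidate per column
  keep <- apply(all_combs, 2, is_compatible)  # test every candidate
  t(all_combs[, keep, drop = FALSE])          # one compatible combination per row
}

# Assumed compatibility test: simplified four-channel colour balance
# (at every cycle, at least one A or C and at least one G or T)
four_channel_ok <- function(barcodes) {
  m <- do.call(rbind, strsplit(barcodes, ""))
  all(apply(m, 2, function(b) {
    any(b %in% c("A", "C")) && any(b %in% c("G", "T"))
  }))
}

# RPI04, RPI35, RPI05 and RPI19
exhaustive_search(c("TGACCA", "CATTTT", "ACAGTG", "GTGAAA"), 2, four_channel_ok)
```

On these four barcodes, the sketch keeps RPI04/RPI35 and RPI05/RPI19, the same two pairs that open the output of get_all_combinations() below. Because combn() materializes all choose(n, k) subsets in memory, this approach only scales to small problems.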

48 barcodes, multiplex level of 2, HiSeq platform

# Total number of combinations
choose(48,2)
## [1] 1128

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Time for an exhaustive search
system.time(m <- get_all_combinations(index_df = barcodes,
                                    mplex_level = 2,
                                    platform = 4))
##    user  system elapsed 
##   0.407   0.006   0.416

# Each line represents a compatible combination of barcodes
head(m)
##      [,1]    [,2]   
## [1,] "RPI04" "RPI35"
## [2,] "RPI05" "RPI19"
## [3,] "RPI06" "RPI12"
## [4,] "RPI07" "RPI17"
## [5,] "RPI10" "RPI39"
## [6,] "RPI18" "RPI25"

48 barcodes, multiplex level of 3, HiSeq platform

# Total number of combinations
choose(48,3)
## [1] 17296

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Time for an exhaustive search
system.time(m <- get_all_combinations(index_df = barcodes,
                                    mplex_level = 3,
                                    platform = 4))
##    user  system elapsed 
##   6.514   0.053   6.649

# Each line represents a compatible combination of barcodes
head(m)
##      [,1]    [,2]    [,3]   
## [1,] "RPI01" "RPI02" "RPI48"
## [2,] "RPI01" "RPI03" "RPI07"
## [3,] "RPI01" "RPI03" "RPI08"
## [4,] "RPI01" "RPI03" "RPI09"
## [5,] "RPI01" "RPI03" "RPI10"
## [6,] "RPI01" "RPI03" "RPI16"

4.3 Examples of a random search of compatible barcode combinations

When the total number of combinations is too large for an exhaustive search, it is recommended to draw combinations at random and keep only those that are compatible.
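A random search trades completeness for speed. Instead of enumerating all subsets, it draws a fixed number of them, removes repeated draws and filters the rest. The random_search() function below is a hypothetical sketch of that idea, not the package's get_random_combinations(). It uses a simplified four-channel colour-balance rule (at every cycle, at least one A or C and at least one G or T) as an assumed compatibility test.

```r
# Hypothetical sketch, NOT the package's get_random_combinations()
random_search <- function(seqs, k, is_compatible, n_draws = 1000) {
  # Draw n_draws random k-subsets (sorted so that repeats can be detected);
  # k >= 2 is assumed so that replicate() returns a matrix
  draws <- t(replicate(n_draws, sort(sample(seqs, k))))
  draws <- unique(draws)                  # drop repeated draws
  keep <- apply(draws, 1, is_compatible)  # test the remaining ones
  draws[keep, , drop = FALSE]
}

# Assumed compatibility test: simplified four-channel colour balance
four_channel_ok <- function(barcodes) {
  m <- do.call(rbind, strsplit(barcodes, ""))
  all(apply(m, 2, function(b) {
    any(b %in% c("A", "C")) && any(b %in% c("G", "T"))
  }))
}

# Sequences of RPI01 to RPI12
seqs <- c("ATCACG", "CGATGT", "TTAGGC", "TGACCA", "ACAGTG", "GCCAAT",
          "CAGATC", "ACTTGA", "GATCAG", "TAGCTT", "GGCTAC", "CTTGTA")
set.seed(1)
m <- random_search(seqs, 3, four_channel_ok, n_draws = 500)
head(m)
```

The number of candidates examined is bounded by n_draws rather than by choose(n, k). This bound is what makes the approach usable when an exhaustive search is not, at the price of possibly missing some compatible combinations.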

48 barcodes, multiplex level of 2, HiSeq platform

# Total number of combinations
choose(48,2)
## [1] 1128

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Time for a random search
system.time(m <- get_random_combinations(index_df = barcodes,
                                        mplex_level = 2,
                                        platform = 4))
##    user  system elapsed 
##   0.202   0.001   0.206

# Each line represents a compatible combination of barcodes
head(m)
##      [,1]    [,2]   
## [1,] "RPI04" "RPI35"
## [2,] "RPI06" "RPI12"
## [3,] "RPI10" "RPI39"
## [4,] "RPI18" "RPI25"
## [5,] "RPI26" "RPI42"
## [6,] "RPI27" "RPI45"

48 barcodes, multiplex level of 4, HiSeq platform

# Total number of combinations
choose(48,4)
## [1] 194580

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Time for a random search
system.time(m <- get_random_combinations(index_df = barcodes,
                                        mplex_level = 4,
                                        platform = 4))
##    user  system elapsed 
##   0.572   0.005   0.585

# Each line represents a compatible combination of barcodes
head(m)
##      [,1]    [,2]    [,3]    [,4]   
## [1,] "RPI01" "RPI14" "RPI22" "RPI23"
## [2,] "RPI01" "RPI06" "RPI12" "RPI17"
## [3,] "RPI01" "RPI03" "RPI04" "RPI33"
## [4,] "RPI01" "RPI35" "RPI39" "RPI42"
## [5,] "RPI01" "RPI14" "RPI24" "RPI41"
## [6,] "RPI01" "RPI03" "RPI04" "RPI10"

48 barcodes, multiplex level of 6, HiSeq platform

# Total number of combinations
choose(48,6)
## [1] 12271512

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Time for a random search
system.time(m <- get_random_combinations(index_df = barcodes,
                                        mplex_level = 6,
                                        platform = 4))
##    user  system elapsed 
##   0.827   0.009   0.844

# Each line represents a compatible combination of barcodes
head(m)
##      [,1]    [,2]    [,3]    [,4]    [,5]    [,6]   
## [1,] "RPI01" "RPI06" "RPI26" "RPI29" "RPI38" "RPI47"
## [2,] "RPI01" "RPI03" "RPI17" "RPI18" "RPI27" "RPI35"
## [3,] "RPI01" "RPI03" "RPI08" "RPI15" "RPI22" "RPI31"
## [4,] "RPI01" "RPI05" "RPI09" "RPI12" "RPI45" "RPI48"
## [5,] "RPI01" "RPI06" "RPI14" "RPI23" "RPI30" "RPI45"
## [6,] "RPI01" "RPI16" "RPI19" "RPI21" "RPI35" "RPI45"

4.4 Constrain barcodes to be robust against one substitution error

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Perform a random search of compatible combinations
m <- get_random_combinations(index_df = barcodes,
                            mplex_level = 3,
                            platform = 4)

# Keep only the combinations whose barcodes are robust against one substitution
# error (Hamming distance of at least d = 3)
filtered_m <- distance_filter(index_df = barcodes,
                            combinations_m = m,
                            metric = "hamming",
                            d = 3)

# Each line represents a compatible combination of barcodes
head(filtered_m)
##      V1      V2      V3     
## [1,] "RPI01" "RPI05" "RPI48"
## [2,] "RPI01" "RPI24" "RPI45"
## [3,] "RPI01" "RPI26" "RPI45"
## [4,] "RPI01" "RPI23" "RPI41"
## [5,] "RPI01" "RPI06" "RPI08"
## [6,] "RPI01" "RPI26" "RPI48"
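The "hamming" metric only counts substitutions. An insertion or deletion shifts the barcode read by one base, so two barcodes can look far apart in Hamming distance while being only two edits apart. The base-R example below illustrates this with utils::adist(), which computes the plain Levenshtein distance. The second barcode is a hypothetical sequence obtained by shifting RPI01 by one base. Plain Levenshtein distance is used here only as an illustration. The package's "seqlev" metric (used in section 4.6) is, to our understanding, the sequence-Levenshtein distance, a variant adapted to barcodes followed by other sequence.

```r
# Base R only; "TCACGA" is a hypothetical barcode (RPI01 shifted by one base)
a <- "ATCACG"  # RPI01
b <- "TCACGA"

# Hamming distance: mismatching positions (substitutions only)
hamming <- sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

# Plain Levenshtein distance from utils::adist(): substitutions, insertions
# and deletions (this is NOT the package's "seqlev" metric)
levenshtein <- adist(a, b)[1, 1]

hamming      # 6: every position differs
levenshtein  # 2: delete the leading A, then append an A
```

Suppose the first base of RPI01 is deleted during sequencing. The six bases read are then TCACG followed by whatever comes next in the read. When that base is A, the read matches the second barcode exactly, even though the two barcodes are 6 substitutions apart.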

4.5 Optimize the set of compatible combinations to reduce barcode redundancy

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Keep compatible combinations whose barcodes are robust against one
# substitution error
filtered_m <- distance_filter(
    index_df = barcodes,
    combinations_m = get_random_combinations(index_df = barcodes,
                                            mplex_level = 3,
                                            platform = 4),
    metric = "hamming", d = 3)

# Use a Shannon-entropy maximization approach to reduce barcode redundancy
df <- optimize_combinations(combination_m = filtered_m,
                            nb_lane = 12,
                            index_number = 48)
## [1] "Theoretical max entropy: 3.58352"
## [1] "Entropy of the optimized set: 3.58352"

# Each row is a compatible combination of barcodes, assigned to one lane of
# the flow cell
df
##       V1      V2      V3     
##  [1,] "RPI06" "RPI12" "RPI21"
##  [2,] "RPI05" "RPI10" "RPI39"
##  [3,] "RPI03" "RPI13" "RPI47"
##  [4,] "RPI18" "RPI27" "RPI32"
##  [5,] "RPI20" "RPI30" "RPI41"
##  [6,] "RPI24" "RPI28" "RPI45"
##  [7,] "RPI08" "RPI40" "RPI44"
##  [8,] "RPI01" "RPI02" "RPI48"
##  [9,] "RPI19" "RPI35" "RPI42"
## [10,] "RPI04" "RPI33" "RPI46"
## [11,] "RPI11" "RPI22" "RPI29"
## [12,] "RPI07" "RPI14" "RPI17"
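As an illustration of what "Shannon-entropy maximization" means in practice, the sketch below uses a simple greedy heuristic. This is a swapped-in technique, not necessarily what optimize_combinations() does. Lanes are filled one at a time with the candidate combination that yields the highest entropy of barcode usage. greedy_select() and the toy matrix are hypothetical.

```r
# Greedy heuristic (illustration only, NOT the package's optimize_combinations())
entropy <- function(ids) {
  # Shannon entropy (natural log) of barcode usage frequencies
  p <- as.vector(table(ids)) / length(ids)
  -sum(p * log(p))
}
greedy_select <- function(combination_m, nb_lane) {
  chosen <- integer(0)
  for (step in seq_len(nb_lane)) {
    # Entropy reached if candidate row i were added to the current selection
    scores <- vapply(seq_len(nrow(combination_m)), function(i) {
      entropy(c(combination_m[chosen, ], combination_m[i, ]))
    }, numeric(1))
    chosen <- c(chosen, which.max(scores))  # rows may be picked more than once
  }
  combination_m[chosen, , drop = FALSE]     # one lane per row
}

# Hypothetical toy matrix of three compatible combinations
toy_m <- rbind(c("RPI01", "RPI02", "RPI03"),
               c("RPI01", "RPI04", "RPI05"),
               c("RPI06", "RPI07", "RPI08"))
greedy_select(toy_m, nb_lane = 2)  # rows 1 and 3 of toy_m
```

On the toy matrix, the first pick is arbitrary because all rows score log(3). The second pick skips row 2, which would reuse RPI01, and takes row 3. A greedy choice is fast but can get stuck in a local optimum, which a dedicated optimizer should guard against.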

4.6 The optimized result isn’t an optimum when filtering out too many barcodes

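Increasing the required distance between barcode sequences removes many combinations, so some barcode redundancy may become inevitable. In the example below, the entropy of the optimized set (2.71161) falls short of the theoretical maximum (3.58352).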

# Load barcodes
barcodes <- DNABarcodeCompatibility::IlluminaIndexes

# Keep compatible combinations whose barcodes are robust against substitution
# and insertion/deletion errors, using a stricter distance threshold
filtered_m <- distance_filter(
    index_df = barcodes,
    combinations_m = get_random_combinations(index_df = barcodes,
                                            mplex_level = 3,
                                            platform = 4),
    metric = "seqlev", d = 4)

# Use a Shannon-entropy maximization approach to reduce barcode redundancy
df <- optimize_combinations(combination_m = filtered_m,
                            nb_lane = 12,
                            index_number = 48)
## [1] "Theoretical max entropy: 3.58352"
## [1] "Entropy of the optimized set: 2.71161"

# Each row is a compatible combination of barcodes, assigned to one lane of
# the flow cell
df
##       V1      V2      V3     
##  [1,] "RPI15" "RPI28" "RPI46"
##  [2,] "RPI17" "RPI27" "RPI29"
##  [3,] "RPI20" "RPI30" "RPI35"
##  [4,] "RPI12" "RPI20" "RPI30"
##  [5,] "RPI03" "RPI37" "RPI41"
##  [6,] "RPI12" "RPI20" "RPI30"
##  [7,] "RPI17" "RPI27" "RPI29"
##  [8,] "RPI04" "RPI28" "RPI35"
##  [9,] "RPI15" "RPI28" "RPI46"
## [10,] "RPI03" "RPI37" "RPI41"
## [11,] "RPI37" "RPI41" "RPI45"
## [12,] "RPI23" "RPI27" "RPI29"
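The shortfall can be checked by hand. Flatten the 36 barcode Ids printed above and compute the natural-log Shannon entropy of their usage frequencies. The result reproduces the value reported by optimize_combinations(). If all 36 slots held distinct barcodes, the entropy would reach log(36).

```r
# Barcode Ids of the optimized set printed above (12 lanes x 3 barcodes)
ids <- c("RPI15", "RPI28", "RPI46", "RPI17", "RPI27", "RPI29",
         "RPI20", "RPI30", "RPI35", "RPI12", "RPI20", "RPI30",
         "RPI03", "RPI37", "RPI41", "RPI12", "RPI20", "RPI30",
         "RPI17", "RPI27", "RPI29", "RPI04", "RPI28", "RPI35",
         "RPI15", "RPI28", "RPI46", "RPI03", "RPI37", "RPI41",
         "RPI37", "RPI41", "RPI45", "RPI23", "RPI27", "RPI29")

p <- as.vector(table(ids)) / length(ids)  # relative usage of each barcode
round(-sum(p * log(p)), 5)                # 2.71161, as reported above
round(log(length(ids)), 5)                # 3.58352, if all 36 were distinct
table(table(ids))                         # 3 barcodes used once, 6 twice, 7 three times
```

Seven barcodes are used three times, six twice and three only once, which is why the entropy stays well below the maximum.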

Introduction to DNABarcodeCompatibility

2018-11-16
