Title: | Processes Binary Data Obtained from Fragment Analysis (Such as AFLPs, ISSRs, and RFLPs) |
---|---|
Description: | A molecular genetics tool that processes binary data from fragment analysis. It consolidates replicate sample pairs, outputs summary statistics, and produces hierarchical clustering trees and nMDS plots. This package was developed from the publication available here: <https://www.sciencedirect.com/science/article/pii/S1049964420306538>. The GUI version of this package is available on the R Shiny online server at <https://clarkevansteenderen.shinyapps.io/BINMAT/>, or via GitHub by typing shiny::runGitHub("BinMat", "clarkevansteenderen") into the R console. Two real-world datasets accompany the package: an AFLP dataset of Bunias orientalis samples from Tewes et al. (2017) <https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2745.12869>, and an ISSR dataset of Nymphaea specimens from Reid et al. (2021) <https://www.sciencedirect.com/science/article/pii/S0304377021000218>. The authors of these publications are thanked for allowing the use of their data. |
Authors: | Clarke van Steenderen [aut, cre] |
Maintainer: | Clarke van Steenderen <[email protected]> |
License: | GPL-3 |
Version: | 0.1.5 |
Built: | 2025-02-26 04:01:18 UTC |
Source: | https://github.com/cran/BinMat |
Example input data containing a consolidated binary matrix with groups
data(BinMatInput_ordination)
A dataframe with columns for loci, and rows of replicate pairs. Grouping information is in the second column.
data(BinMatInput_ordination)
mat = BinMatInput_ordination
group.names(mat)
scree(mat)
shepard(mat)
clrs = c("red", "green", "black")
nmds(mat, colours = clrs, labs = TRUE)
Example input data containing a binary matrix comprising replicate pairs
data(BinMatInput_reps)
A dataframe with columns for loci, and rows of replicate pairs.
data(BinMatInput_reps)
mat = BinMatInput_reps
check.data(mat)
cons = consolidate(mat)
pks = peaks.consolidated(cons)
err = errors(cons)
rem = peak.remove(cons, 4)
clust = upgma(cons)
Example input file of Bunias orientalis AFLP data, taken from Tewes et al. (2017). This dataset has already been consolidated, and can be used as input for the generation of an nMDS plot. The paper can be found here: <https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2745.12869>
data(bunias_orientalis)
A dataframe with columns for loci, and rows of replicate pairs. Grouping information is in the second column.
Checks for unwanted values (other than 1, 0, and ?).
check.data(x)
x | A CSV file containing replicate pairs of binary data. |
Index positions where unwanted values occur (row, column).
data(BinMatInput_reps)
mat = BinMatInput_reps
check.data(mat)
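The kind of check described above can be approximated in base R. The following is a minimal sketch, assuming the data are held as a character matrix; the toy matrix `toy` and the helper name `find_unwanted` are hypothetical and not part of BinMat.

```r
# Hypothetical helper (not a BinMat function): return the (row, col)
# positions of every cell whose value is not "1", "0", or "?".
find_unwanted <- function(m) {
  allowed <- c("0", "1", "?")
  # %in% drops the matrix dimensions, so restore them before which()
  bad <- array(!(m %in% allowed), dim = dim(m))
  which(bad, arr.ind = TRUE)
}

# Toy data: two samples, three loci, with one stray "2"
toy <- matrix(c("1", "0", "?",
                "2", "1", "0"),
              nrow = 2, byrow = TRUE)

res <- find_unwanted(toy)
print(res)
```

An empty result (zero rows) would indicate that every cell holds an accepted value.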
Reads in a binary matrix comprising replicate pairs and consolidates each pair into a consensus read. For each replicate pair at each locus, 1 & 1 -> 1 (shared presence), 0 & 0 -> 0 (shared absence), 0 & 1 -> ? (ambiguity).
consolidate(x)
x | A CSV file containing replicate pairs of binary data. See the example input file "BinMatInput_reps". |
Consolidated binary matrix.
data(BinMatInput_reps)
mat = BinMatInput_reps
cons = consolidate(mat)
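The consolidation rule stated above (1 & 1 -> 1, 0 & 0 -> 0, 0 & 1 -> ?) can be illustrated for a single replicate pair in base R. This is only a sketch of the rule, not BinMat's implementation; the helper name `consolidate_pair` is hypothetical.

```r
# Hypothetical helper (not a BinMat function): build a consensus read
# from two replicate reads of the same sample, locus by locus.
consolidate_pair <- function(rep1, rep2) {
  # Agreement keeps the shared value; disagreement becomes "?"
  ifelse(rep1 == rep2, as.character(rep1), "?")
}

rep1 <- c(0, 0, 1, 0, 1)
rep2 <- c(0, 1, 1, 0, 0)

consensus <- consolidate_pair(rep1, rep2)
print(consensus)
```

Loci 2 and 5 differ between the two replicates, so only those positions are marked as ambiguous.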
Calculates the Jaccard and Euclidean error rates for the dataset. Unlike the Euclidean error, the Jaccard error does not treat shared absences of bands as biologically meaningful. JE = (f10 + f01)/(f10 + f01 + f11) and EE = (f10 + f01)/(f10 + f01 + f11 + f00). At each locus, f01 and f10 indicate a mismatch, where a 0 was present in one replicate and a 1 in the other; f11 indicates the shared presence of a band in both replicates, and f00 a shared absence. For example, if a replicate pair comprises Rep1 = 00101 and Rep2 = 01100, then f10 = 1, f01 = 1, f11 = 1, and f00 = 2, so JE = (1+1)/(1+1+1) = 2/3 = 0.67 and EE = (1+1)/(1+1+1+2) = 2/5 = 0.4.
errors(x)
x | Consolidated binary matrix. |
JE (Jaccard Error), EE (Euclidean Error), and standard deviations.
data(BinMatInput_reps)
mat = BinMatInput_reps
cons = consolidate(mat)
errors(cons)
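The worked example in the description (Rep1 = 00101, Rep2 = 01100) can be reproduced by hand in base R. The sketch below works directly from one replicate pair and is not how errors() is implemented; the variable names are illustrative only.

```r
# One replicate pair, taken from the worked example in the description
rep1 <- c(0, 0, 1, 0, 1)
rep2 <- c(0, 1, 1, 0, 0)

# Count the four possible outcomes across loci
f11 <- sum(rep1 == 1 & rep2 == 1)  # shared presence
f00 <- sum(rep1 == 0 & rep2 == 0)  # shared absence
f10 <- sum(rep1 == 1 & rep2 == 0)  # mismatch
f01 <- sum(rep1 == 0 & rep2 == 1)  # mismatch

je <- (f10 + f01) / (f10 + f01 + f11)        # Jaccard error
ee <- (f10 + f01) / (f10 + f01 + f11 + f00)  # Euclidean error

# For a single pair, JE equals the "binary" distance from stats::dist()
jaccard_dist <- as.numeric(dist(rbind(rep1, rep2), method = "binary"))

print(round(c(JE = je, EE = ee, dist_binary = jaccard_dist), 2))
```

The agreement between `je` and `jaccard_dist` reflects that the "binary" method of dist() ignores positions where both vectors are 0, just as the Jaccard error ignores shared absences.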
Returns group names in the uploaded consolidated binary data. This will help in knowing which colours are assigned to which group name.
group.names(x)
x | Consolidated binary matrix with grouping information in column 2. |
Group names.
mat = BinMatInput_ordination
group.names(mat)
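To illustrate why the group names are useful when choosing colours, the sketch below extracts the distinct labels from column 2 of a toy data frame and pairs them with a colour vector. The data frame `toy` is invented, and the assumption that groups are listed in order of first appearance is not confirmed by the documentation, so the order returned by group.names() should be checked before assigning colours.

```r
# Toy consolidated data: sample name, group label (column 2), then loci
toy <- data.frame(
  sample = c("a1", "a2", "b1", "c1"),
  group  = c("A", "A", "B", "C"),
  L1     = c(1, 0, 1, 1),
  L2     = c(0, 0, 1, 0)
)

# Distinct group labels, in order of first appearance (assumed order)
groups <- unique(as.character(toy[[2]]))

# One colour per group, matched by position
clrs <- c("red", "green", "black")
colour_key <- setNames(clrs, groups)
print(colour_key)
```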
Creates an nMDS plot from a consolidated binary matrix with grouping information. Colours and shapes of plotted points need to be specified. For example, if there are two groups, then: clrs = c("red", "blue"), sh = c(16, 16). This assigns red to the first group name, and blue to the second. Both will have a pch shape of 16 (round dot). These two vectors are then passed to the function nmds() as: colours = clrs, shapes = sh.
nmds(
  x,
  dist_meth = "binary",
  k_val = 2,
  pt_size = 1,
  colours = c("dodgerblue", "black", "red", "green3", "orange", "darkblue",
    "gold2", "darkgreen", "darkred", "grey", "darkgrey", "magenta",
    "darkorchid", "purple", "brown", "coral3", "turquoise", "deeppink",
    "lawngreen", "deepskyblue", "tomato", "yellow", "yellowgreen",
    "royalblue", "olivedrab", "midnightblue", "indianred1", "darkturquoise"),
  labs = FALSE,
  legend_pos = "right",
  include_ellipse = FALSE,
  ellipse_type = "norm",
  dimension1 = 1,
  dimension2 = 2
)
x | Consolidated binary matrix with grouping information in the second column. |
dist_meth | Distance method. Set to "binary" by default. Other options are "euclidean", "maximum", "manhattan", "canberra", or "minkowski". |
k_val | Number of dimensions for the nMDS plot. Set to 2 by default. |
pt_size | Point size for symbols on the plot. Set to 1 by default. |
colours | Vector containing colours to be assigned to groups. This can be changed to the options available in the RColorBrewer palette set (e.g. "Set1"). See <http://applied-r.com/rcolorbrewer-palettes/> for more palette options. Alternatively, the colours can be set manually using, for example, c("red", "green", "blue"), thereby setting a colour for each group in your dataset. There are 28 default colours that will be set automatically to your groups. |
labs | Indicate whether labels should appear on the graph or not (TRUE or FALSE). Default = FALSE. |
legend_pos | Indicate the position of the legend. Default = "right", but other options are "left", "bottom", "top", or "none". |
include_ellipse | Indicate whether ellipses should be included around groups. Default = FALSE. |
ellipse_type | Select the type of ellipses to include around groups. Options are "convex", "confidence", "t", "norm", and "euclid". See the ggpubr::ggscatter() function documentation for more details. |
dimension1 | Indicate the first dimension to plot (1, 2, or 3) for the x axis. If k_val = 2, the first two dimensions are plotted automatically. If k_val = 3, select between the three. |
dimension2 | Indicate the second dimension to plot (1, 2, or 3) for the y axis. |
nMDS plot.
mat = BinMatInput_ordination
group.names(mat)
clrs = c("red", "green", "black")
nmds(mat, colours = clrs, labs = TRUE, include_ellipse = TRUE)
Example input file of Nymphaea ISSR data, taken from Reid et al. (2021). This dataset has already been consolidated, and can be used as input for the generation of an nMDS plot. The paper can be found here: <https://www.sciencedirect.com/science/article/pii/S0304377021000218>
data(nymphaea)
A dataframe with columns for loci, and rows of replicate pairs. Grouping information is in the second column.
Removes samples with a peak number less than a specified value.
peak.remove(x, thresh)
x | Binary matrix - consolidated or original. |
thresh | Peak threshold value for removal. |
Filtered dataset, and either the row name/s or row number/s of samples that were removed.
mat = BinMatInput_ordination
new = peak.remove(mat, 4)
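The filtering step described above can be sketched in base R. This is a rough illustration rather than BinMat's code: it assumes a peak is a cell equal to "1" (so "?" is not counted), and the toy matrix and variable names are hypothetical.

```r
# Toy binary matrix: rows are samples, columns are loci
toy <- rbind(
  s1 = c("1", "0", "?", "1"),
  s2 = c("0", "0", "1", "0"),
  s3 = c("1", "1", "1", "?")
)

thresh <- 2

# Number of peaks (cells equal to "1") per sample
peaks_per_sample <- rowSums(toy == "1")

# Keep samples with at least `thresh` peaks; record the ones dropped
keep     <- peaks_per_sample >= thresh
filtered <- toy[keep, , drop = FALSE]
removed  <- rownames(toy)[!keep]

print(filtered)
print(removed)
```

Here sample s2 has a single peak, which is below the threshold of 2, so it is the only row removed.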
Returns the total, maximum, and minimum number of peaks in the consolidated binary matrix.
peaks.consolidated(x)
x | Consolidated binary matrix. |
Peak information.
data(BinMatInput_reps)
mat = BinMatInput_reps
cons = consolidate(mat)
peaks.consolidated(cons)
Returns the total, maximum, and minimum number of peaks in the original (unconsolidated) binary matrix.
peaks.original(x)
x | Binary matrix comprising replicate pairs. |
Peak information.
data(BinMatInput_reps)
mat = BinMatInput_reps
peaks.original(mat)
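A peak summary of this kind (total, maximum, and minimum) can be approximated in base R by counting the 1s in each row. The sketch below assumes that only cells equal to "1" count as peaks; the toy matrix and names are hypothetical, and the exact output format of the BinMat functions may differ.

```r
# Toy binary matrix: rows are samples, columns are loci
toy <- rbind(
  s1 = c("1", "0", "?", "1"),
  s2 = c("0", "0", "1", "0"),
  s3 = c("1", "1", "1", "?")
)

# Peaks per sample, then the three summary values
peaks_per_sample <- rowSums(toy == "1")
peak_summary <- c(
  total = sum(peaks_per_sample),
  max   = max(peaks_per_sample),
  min   = min(peaks_per_sample)
)
print(peak_summary)
```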
Creates a scree plot for the nMDS. This indicates the optimum number of dimensions to use to minimise the stress value. A stress threshold of 0.15 is marked by a red dotted line; stress values equal to or below this are considered acceptable.
scree(x, dimensions = 4, dist_meth = "binary")
x | Consolidated binary matrix with grouping information in column 2. |
dimensions | Number of dimensions to plot. Set to 4 by default. |
dist_meth | Distance method. Set to "binary" by default. Other options are "euclidean", "maximum", "manhattan", "canberra", or "minkowski". |
Scree plot.
mat = BinMatInput_ordination
scree(mat)
Creates a Shepard plot for the nMDS. This indicates the 'goodness of fit' of the original distance matrix vs the ordination representation. A high R-squared value is favourable.
shepard(x, k_val = 2, dist_meth = "binary")
x | Consolidated binary matrix. |
k_val | Number of dimensions. Set to 2 by default. |
dist_meth | Distance method. Set to "binary" by default. Other options are "euclidean", "maximum", "manhattan", "canberra", or "minkowski". |
Shepard plot.
mat = BinMatInput_ordination
shepard(mat)
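The idea behind a Shepard plot, comparing the original distances with the distances in the ordination, can be sketched in base R. To avoid extra dependencies, this example swaps in classical (metric) MDS from stats::cmdscale() in place of nMDS, so it illustrates only the goodness-of-fit comparison and not BinMat's ordination. The toy matrix is invented.

```r
# Toy binary matrix: six samples, eight loci
m <- rbind(
  s1 = c(1, 1, 0, 0, 1, 0, 1, 0),
  s2 = c(1, 1, 0, 0, 1, 0, 0, 0),
  s3 = c(0, 1, 1, 0, 0, 1, 0, 1),
  s4 = c(0, 0, 1, 1, 0, 1, 0, 1),
  s5 = c(1, 0, 0, 1, 1, 0, 1, 1),
  s6 = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Original distances ("binary" is the Jaccard distance)
d_orig <- dist(m, method = "binary")

# Stand-in ordination: classical MDS in two dimensions (not nMDS)
ord   <- cmdscale(d_orig, k = 2)
d_ord <- dist(ord)

# R-squared between original and ordination distances
r2 <- cor(as.vector(d_orig), as.vector(d_ord))^2
print(r2)

# plot(as.vector(d_orig), as.vector(d_ord)) would draw the scatter itself
```

An R-squared close to 1 would mean the two-dimensional configuration preserves the original pairwise distances well.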
Creates a UPGMA hierarchical clustering tree, with a specified number of bootstrap repetitions.
upgma(x, bts = 10, size = 0.55, lab_size = 0.55, method = "binary", hclust = "average", fromFile = FALSE)
x | Consolidated binary matrix. |
bts | Bootstrap replications. Set to 10 by default. |
size | Size of plot. Set to 0.55 by default. |
lab_size | Size of label text. Set to 0.55 by default. |
method | Distance method. Set to 'binary' (=Jaccard distance) by default. |
hclust | Clustering method. Set to 'average' (=UPGMA) by default. |
fromFile | Indicates whether the binary data passed to the function were consolidated by BinMat or come from the user's own file. Set to FALSE by default, on the assumption that the object passed to the function was consolidated by BinMat. |
UPGMA tree.
data(BinMatInput_reps)
mat = BinMatInput_reps
cons = consolidate(mat)
clust = upgma(cons)
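The defaults named above ('binary' distance and 'average' linkage) correspond to a Jaccard distance matrix clustered by UPGMA, which can be sketched with stats::dist() and stats::hclust(). This is a minimal base-R illustration on invented data; it omits the bootstrapping that upgma() performs and is not the package's implementation.

```r
# Toy binary matrix: six samples, eight loci
m <- rbind(
  s1 = c(1, 1, 0, 0, 1, 0, 1, 0),
  s2 = c(1, 1, 0, 0, 1, 0, 0, 0),
  s3 = c(0, 1, 1, 0, 0, 1, 0, 1),
  s4 = c(0, 0, 1, 1, 0, 1, 0, 1),
  s5 = c(1, 0, 0, 1, 1, 0, 1, 1),
  s6 = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Jaccard distances between samples
d <- dist(m, method = "binary")

# Average linkage, i.e. UPGMA
tree <- hclust(d, method = "average")
print(tree)

# plot(tree) would draw the dendrogram
```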