
Creates the theoretical distribution of GC content needed for "Per sequence GC Content", using the parameters of the current GC content data.

This function calculates the weighted mean and standard deviation from a data frame containing values ('gc_content') and their frequencies ('count'). It then generates a new data frame with a normally distributed sample based on these statistics.
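The weighted statistics described above can be sketched in a few lines of R. This is only an illustration, not the package's source: `weighted_stats` is a hypothetical helper, and the `n - 1` denominator is an assumption chosen because it reproduces the standard deviation printed in the Examples section for the same input.

```r
# Hypothetical helper (not part of the package): frequency-weighted mean and SD.
weighted_stats <- function(data) {
  n <- sum(data$count)                              # total number of observations
  w_mean <- sum(data$gc_content * data$count) / n   # frequency-weighted mean
  # Assumed sample-style variance: count-weighted squared deviations over n - 1
  w_var <- sum(data$count * (data$gc_content - w_mean)^2) / (n - 1)
  list(mean = w_mean, sd = sqrt(w_var))
}

gc_table <- data.frame(
  gc_content = c(40, 50, 60),
  count = c(200, 500, 300)
)
stats <- weighted_stats(gc_table)
stats$mean  # 51
stats$sd    # approximately 7.0035
```

Each 'gc_content' value is treated as if it had been observed 'count' times, so the result matches what `mean()` and `sd()` would return on the fully expanded vector.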

Usage

process_gc_data(data)

Arguments

data

A data frame with two numeric columns: 'gc_content' and 'count'.

Value

A list containing three elements:
- mean: The calculated weighted mean.
- sd: The calculated weighted standard deviation.
- normal_distribution_df: A new data frame with two columns: 'gc_content' (integers from 1 to 100) and 'count' (the frequency of each value based on the generated normal distribution (mode, sd)).
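One way a data frame shaped like 'normal_distribution_df' could be produced is shown below. This is a sketch under stated assumptions: `theoretical_gc` is a hypothetical name, and evaluating `dnorm()` at each integer and rescaling to the observed total is an assumed approach. The function itself may centre the curve differently, round the counts, or draw a random sample instead.

```r
# Hypothetical sketch (not the package's code): expected counts per integer
# GC value under a normal curve, rescaled to a given total.
theoretical_gc <- function(mean, sd, total) {
  gc <- 1:100
  density <- dnorm(gc, mean = mean, sd = sd)   # normal density at each integer
  data.frame(
    gc_content = gc,
    count = total * density / sum(density)     # rescale so counts sum to `total`
  )
}

theo <- theoretical_gc(mean = 51, sd = 7, total = 1000)
head(theo)
sum(theo$count)
```

Rescaling by `sum(density)` keeps the total count of the theoretical curve equal to the observed total, which makes the two distributions directly comparable. The counts in this sketch are fractional rather than whole numbers.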

Examples

# Create a sample data frame
gc_table <- data.frame(
  gc_content = c(40, 50, 60),
  count = c(200, 500, 300)
)

# Run the function
results <- process_gc_data(gc_table)

# View the results
print(paste("Calculated Mean:", results$mean))
#> [1] "Calculated Mean: 51"
print(paste("Calculated SD:", results$sd))
#> [1] "Calculated SD: 7.00350262718942"

# Check the new data frame
print("Head of the new normally distributed data frame:")
#> [1] "Head of the new normally distributed data frame:"
print(head(results$normal_distribution_df))
#>   gc_content count
#> 1          1     0
#> 2          2     0
#> 3          3     0
#> 4          4     0
#> 5          5     0
#> 6          6     0

print(paste("Total sum of counts in new df:", sum(results$normal_distribution_df$count)))
#> [1] "Total sum of counts in new df: 1000"