The "Per sequence GC Content" analysis requires a theoretical distribution of GC content, which is created from the parameters of the observed GC content data.
This function calculates the weighted mean and standard deviation from a data frame containing values ('gc_content') and their frequencies ('count'). It then generates a new data frame with a normally distributed sample based on these statistics.
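The weighted statistics described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `weighted_stats` is a hypothetical helper, and the sample (n - 1) denominator is an assumption, chosen because it reproduces the standard deviation printed in the Examples.

```r
# Hypothetical helper: weighted mean and SD from values and their frequencies.
weighted_stats <- function(df) {
  n <- sum(df$count)                        # total number of observations
  m <- sum(df$gc_content * df$count) / n    # weighted mean
  # Assumption: sample variance with an (n - 1) denominator.
  v <- sum(df$count * (df$gc_content - m)^2) / (n - 1)
  list(mean = m, sd = sqrt(v))
}

gc_table <- data.frame(
  gc_content = c(40, 50, 60),
  count = c(200, 500, 300)
)
stats <- weighted_stats(gc_table)
stats$mean  # 51
stats$sd    # approximately 7.0035
```

With a population (n) denominator the standard deviation for this table would be exactly 7, so the choice of denominator only matters noticeably when the total count is small.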
Value
A list containing three elements:
- mean: The calculated weighted mean.
- sd: The calculated weighted standard deviation.
- normal_distribution_df: A new data frame with two columns: 'gc_content' (integers from 1 to 100) and 'count' (the frequency of each value under the normal distribution generated from the calculated mean and sd).
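One way to build a data frame of this shape is to evaluate the normal density at each integer GC value and scale it to the observed total. The sketch below is an assumption about the approach, not the function's actual code: `theoretical_counts` is a hypothetical helper, and whether the real function evaluates the density, rounds the result, or draws a random sample is not stated here.

```r
# Hypothetical helper: expected counts under a normal distribution,
# scaled so that they sum to the observed total.
theoretical_counts <- function(mean, sd, total, gc = 1:100) {
  dens <- dnorm(gc, mean = mean, sd = sd)   # density at each GC value
  data.frame(
    gc_content = gc,
    count = total * dens / sum(dens)        # unrounded expected counts
  )
}

# Mean, SD, and total taken from the example table in the Examples section.
theo <- theoretical_counts(mean = 51, sd = 7.0035, total = 1000)
sum(theo$count)                        # 1000, up to floating-point error
theo$gc_content[which.max(theo$count)] # 51
```

Because these counts are left unrounded, the values in the tails are tiny positive numbers rather than exact zeros.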
Examples
# Create a sample data frame
gc_table <- data.frame(
gc_content = c(40, 50, 60),
count = c(200, 500, 300)
)
# Run the function
results <- process_gc_data(gc_table)
# View the results
print(paste("Calculated Mean:", results$mean))
#> [1] "Calculated Mean: 51"
print(paste("Calculated SD:", results$sd))
#> [1] "Calculated SD: 7.00350262718942"
# Check the new data frame
print("Head of the new normally distributed data frame:")
#> [1] "Head of the new normally distributed data frame:"
print(head(results$normal_distribution_df))
#>   gc_content count
#> 1          1     0
#> 2          2     0
#> 3          3     0
#> 4          4     0
#> 5          5     0
#> 6          6     0
print(paste("Total sum of counts in new df:", sum(results$normal_distribution_df$count)))
#> [1] "Total sum of counts in new df: 1000"