James Backstrom, Author


Improve your Writing with Statistical Sampling

What in hell does statistical sampling have to do with writing? And what is statistical sampling, for that matter? Statistical sampling is a method for choosing a small group from a larger one so you can run an experiment or test a hypothesis about the larger group. On the surface, this doesn’t have much to do with writing. After all, writing is not an experiment. And testing a hypothesis in writing can cost a lot of time if the hypothesis turns out to be wrong (up to a year for a finished but unsellable book). That’s part of what this post is about: how to avoid wasting your time.

The way we’ll approach this is to show how statistical sampling can give you a picture of the market and of the tropes currently in fashion. But first we have to explain how sampling works, the assumptions we’ll be working under, and a realistic method for doing it.

First, statistical sampling is about selecting a subset of a given population at random in order to test hypotheses about that population. If that sounds a bit technical, don’t worry. Basically, if you have some number of people, you can pick a smaller group of them at random (the subset) and reasonably assume that it represents the whole population. This works because every member has the same chance of being picked, so no particular kind of person ends up over- or under-represented. It also helps that many traits among humankind (and in the physical world, for that matter) are roughly normally distributed. That means about two-thirds (roughly 68%) of the population lies within one standard deviation of the average for a given trait. Standard deviation is a measure of how variable (spread out) the trait is.

For example, the average height of women worldwide is 5 ft. 4.5 in., with a standard deviation of 2.5 inches. That means about two-thirds of women worldwide are between 5 ft. 2 in. and 5 ft. 7 in. The remaining third is split roughly evenly between women taller and shorter than that range.
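If you’d like to check that arithmetic yourself, here is a short Python sketch using the standard library’s statistics.NormalDist. It assumes heights follow a normal curve with the mean and standard deviation from the example above, and computes the one-standard-deviation range and the share of women who fall inside it.

```python
from statistics import NormalDist

# Assumption: heights are normally distributed with the example's figures,
# a mean of 64.5 inches (5 ft. 4.5 in.) and a standard deviation of 2.5 inches.
heights = NormalDist(mu=64.5, sigma=2.5)

# One standard deviation below and above the mean.
low = heights.mean - heights.stdev
high = heights.mean + heights.stdev

# Share of the population between those two heights.
share_within = heights.cdf(high) - heights.cdf(low)


def feet_inches(inches):
    """Format a height in inches as 'X ft. Y in.'."""
    feet, rest = divmod(inches, 12)
    return f"{int(feet)} ft. {rest:g} in."


print(f"One SD range: {feet_inches(low)} to {feet_inches(high)}")
print(f"Share within one SD: {share_within:.1%}")
```

It prints a range of 5 ft. 2 in. to 5 ft. 7 in. and a share of 68.3%, which is where the "two-thirds" figure comes from.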

We’re not going to get into the math of calculating standard deviations here because it truly does have little to do with writing. However, understanding the principle is important for a number of reasons, not least of which is understanding how to judge groups of things or people.

For averages and standard deviations to tell you anything, you have to make sure your sample truly represents the whole. For example, if you sampled only rich women and measured their heights, you would likely find them slightly taller on average, with a smaller standard deviation. That is because the sample is skewed, in this case by income.

To combat this, we use random sampling. To take a proper random sample, you must have access to the entire population in question and be able to pull anyone from it at random. In most cases this is not possible, but with enough demographic data you can control for variables like income and sample from a smaller subset of the population. It’s not perfect, but it’s what we’ve got to work with.
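To see why a skewed sample misleads and a random one doesn’t, here is a small Python simulation. Everything in it is made up for illustration: it builds a hypothetical population of 10,000 women using the height figures above and assumes, purely for the sake of the demo, that income is loosely tied to height. It then compares the average height of the 500 highest earners with that of 500 women picked at random.

```python
import random
import statistics


def make_population(n, seed=0):
    """Build a hypothetical population of (height_in_inches, income) pairs.

    Assumption for illustration only: income rises loosely with height,
    plus a lot of random noise, so picking by income skews the heights.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        height = rng.gauss(64.5, 2.5)
        income = 50_000 + 1_000 * (height - 64.5) + rng.gauss(0, 5_000)
        people.append((height, income))
    return people


def mean_height(people):
    """Average height of a group of (height, income) pairs."""
    return statistics.fmean(height for height, _ in people)


population = make_population(10_000)

# Skewed sample: only the 500 highest earners.
richest = sorted(population, key=lambda p: p[1], reverse=True)[:500]

# Random sample: 500 people, each with the same chance of being picked.
fair = random.Random(1).sample(population, 500)

print(f"Whole population: {mean_height(population):.2f} in")
print(f"Random sample:    {mean_height(fair):.2f} in")
print(f"Richest 500:      {mean_height(richest):.2f} in")
```

The random sample’s average lands close to the population’s, while the richest 500 come out noticeably taller, which is exactly the skew the income example describes.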

How it applies: This matters for writing because it’s reasonable to assume that writing skill, like other human traits, is roughly normally distributed. That means you can sample a given population of books and get a fairly accurate picture of the state of the business. The caveat is that you must choose the population very carefully and select from within it at random. The best way to do this is to narrow your population. If you’re a fantasy writer like me, you don’t need to sample every book ever written to understand your market. You just need to sample fantasy books. You could even go a step further and work with a sub-genre, such as dark fantasy.

Say that’s what we’re doing: looking at dark fantasy. First, decide the purpose of your search. Are you trying to understand the current market, the historical market, or something in between? Most books are published around two years after they’re written, so even if you sample only books published in the last year, you are still two to three years behind current trends.

For our purposes (assessing the current market), you’ll want to look at recently published books. I’d recommend a five-year window, since that is the time frame most agents recommend for finding comparable titles for your work. Now that we’ve narrowed the population to dark fantasy published in the last five years, we have to assemble it. You can get this information from Publishers Marketplace or by asking to see a catalog of titles at your local bookstore: not the list of what they carry, but the catalog of titles they have access to from their distributor.

Once you have that, compile the list, including each book’s author, title, and publication date. Microsoft Excel or another spreadsheet program is ideal for this. This is not a perfect process. It will miss most self-published works, since they do not appear in distributor catalogs, and it can also miss some smaller presses. I don’t know of a way to close this gap accurately. If anyone does, drop me a line.
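If you’d rather script this step than do it by hand in a spreadsheet, the sketch below shows the idea in Python. It assumes you’ve exported the catalog to CSV with hypothetical columns named author, title, subgenre, and published (an ISO date); real catalog exports will use their own layouts, and the rows here are placeholders. It keeps only dark fantasy titles published within the last five years.

```python
import csv
import io
from datetime import date

# Placeholder catalog rows; the column names are assumptions, not a real
# Publishers Marketplace or distributor export format.
CATALOG_CSV = """author,title,subgenre,published
Author A,Title One,dark fantasy,2021-03-02
Author B,Title Two,epic fantasy,2022-07-15
Author C,Title Three,dark fantasy,2015-10-01
Author D,Title Four,dark fantasy,2023-01-20
"""


def years_before(day, years):
    """Return the same calendar date `years` years earlier."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:  # Feb 29 has no match in a non-leap year
        return day.replace(year=day.year - years, day=28)


def build_population(csv_text, subgenre, years, today):
    """Keep rows in `subgenre` published no earlier than `years` years before `today`."""
    cutoff = years_before(today, years)
    population = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        published = date.fromisoformat(row["published"])
        if row["subgenre"] == subgenre and published >= cutoff:
            population.append(row)
    return population


population = build_population(CATALOG_CSV, "dark fantasy", 5, today=date(2024, 6, 1))
for book in population:
    print(book["author"], "-", book["title"], "-", book["published"])
```

With a real export you would read the file instead, for example by replacing io.StringIO(csv_text) with open(path, newline="").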

Once you have your list, randomly select 30 titles from it. I won’t get into the math. Just understand that, as a common rule of thumb, 30 randomly chosen examples from a truly all-inclusive population are enough to get a reasonable gauge of that population.
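Picking the 30 titles is the one step that really must be random, so it’s worth letting the computer do it rather than eyeballing the list. Here is a minimal Python sketch using the standard library’s random.sample, which draws without replacement so no book appears twice. The 200 placeholder titles stand in for your real list, and the seed argument is an optional extra that lets you reproduce the same draw later.

```python
import random


def pick_reading_list(titles, k=30, seed=None):
    """Draw a simple random sample of k titles, without replacement."""
    if len(titles) < k:
        raise ValueError(f"only {len(titles)} titles in the population; need {k}")
    rng = random.Random(seed)
    return rng.sample(titles, k)


# Placeholder population standing in for your compiled list.
population = [f"Dark Fantasy Title {i}" for i in range(1, 201)]

reading_list = pick_reading_list(population, k=30, seed=42)
for number, title in enumerate(reading_list, start=1):
    print(number, title)
```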

Congratulations! You now have a 30-book reading list specific to your market. Try to finish it within a year, noting the tropes each book uses (TV Tropes can help with this). You should end up with a spread of data that clues you in to which tropes are being used (and which aren’t).

If you’re truly industrious, record detailed data on anything you can think of (such as character ethnicity, age, tropes, moral compass, etc.). Then, using Excel or some other program, work out how often each trope appears across your sample and its standard deviation. This part is optional.
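For the optional math, a spreadsheet works fine, but here is the same idea as a short Python sketch. It assumes you’ve recorded, for each book, the set of tropes you spotted (the book and trope names below are placeholders). Each trope is scored 1 if a book uses it and 0 if not; the average of those scores is the share of books that use the trope, and statistics.stdev gives the sample standard deviation of those scores.

```python
import statistics

# Placeholder reading notes: each book maps to the tropes spotted in it.
notes = {
    "Book 1": {"chosen one", "morally grey hero"},
    "Book 2": {"morally grey hero", "found family"},
    "Book 3": {"morally grey hero"},
    "Book 4": {"found family", "chosen one"},
}


def trope_stats(notes):
    """Return {trope: (share_of_books, sample_std_dev)} from 1/0 presence scores."""
    all_tropes = sorted(set().union(*notes.values()))
    stats = {}
    for trope in all_tropes:
        present = [1 if trope in tropes else 0 for tropes in notes.values()]
        stats[trope] = (statistics.fmean(present), statistics.stdev(present))
    return stats


for trope, (share, sd) in trope_stats(notes).items():
    print(f"{trope:<20} used in {share:.0%} of books (sd {sd:.2f})")
```

With the four placeholder books, "morally grey hero" shows up in 75% of them; with your real 30-book list, the same table shows at a glance which tropes dominate and which are rare.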

If you skipped the math above, don’t worry. You’ve still built a thorough understanding of what the market looks like right now. You’ve also likely read a good number of novels you never would have picked up otherwise, gained a better sense of the range of talent among published authors, and should be full of ideas for your own work.

Congratulations! Now, get back to writing.