Storing Bible Passage References as Numbers

If you want to build a Bible-related app, you probably need to store references to Bible passages like these in a database:

  • John 3:16
  • Eze 21
  • SNG
  • 2TI 1:2,3
  • Romans 5:8-6:4

Storing these references as strings would be both inconsistent and inefficient. We can save space and keep our data uniform if we store them using only numbers.

I'm going to share the solution I used to build My Bible Log, a free app that allows people to track their Bible reading. I'll explain the problems I encountered and how I solved them to create the Verse ID format used behind the scenes to store passage references.

Passages as Text

To allow users to keep track of exactly which verses they read, I had to find a way to represent passages like these in a database:

  • Genesis 1
  • John 3:16
  • 1 Peter
  • 2 Peter 2-3
  • Titus 2:11-3:11

Since the format is consistent, I could have just stored the text you see above. However, I wanted my solution to be as efficient as possible so it could scale. Comparing, sorting, and indexing integers is generally faster than doing the same work on strings.

I also wanted to provide several features that would be much easier to implement if I could find a way to represent each passage as a single number. For example, I wanted to be able to run a report on which passages have been read already, and how many more verses the user had left to read.

The Obvious Solution

There is one solution that might seem obvious. Why don't we start at the first verse of the Bible, call it verse 1, and then count our way to the last verse of the Bible?

Well, we could do that, but different translations of the Bible actually include or exclude just a few verses (based on the prevalence of those verses in the original manuscripts), leaving different translations with different numbers of verses:

Translation | Verse Count
KJV | 31,102
NASB | 31,102
NIV | 31,173
ESV | 31,102

If we want our numbering system to be compatible with multiple Bible translations, we need to find another way to represent each verse as a unique number.

Also remember that some translations include verses in brackets with a footnote explaining why - often because they do not appear in the earliest available manuscripts. Since there is some ambiguity about which verses are included, even within a single translation, we can't use a rigid system of consecutive numbers to represent each individual verse.

Passages as Numbers

My first step to storing passages as numbers was breaking each passage down into a set of variables that I could use to build a unique Verse ID number for a Bible verse.

These are the possible variables in any passage reference:

  • Book
  • Start Chapter
  • Start Verse
  • End Chapter
  • End Verse

Not all passages have all of the parts above. For example, "Matthew" is a valid passage, referring to the whole book of Matthew. Similarly, "Mark 1" is valid, referring to the first chapter of Mark. Again, "Luke 4-7" is valid, referring to all verses within those four chapters of Luke.

You'll notice that the "start chapter" and "start verse" are implied to be 1 if they are not provided. If we don't state that we are starting in the middle of a book or chapter, we understand that we are starting at the beginning.

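Similarly, if no chapter is given at all, the "end chapter" is implied to be the final chapter of the book, and if no verse is given, the "end verse" is implied to be the final verse of the end chapter. When only a single chapter or verse is given, as in "Mark 1" or "John 3:16", it serves as both the start and the end. Without a specific stopping point, we read each book or chapter in its entirety.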

So, even if we don't see a specific chapter or verse number in a passage, we can still determine those numbers and use them to build our Verse ID.
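
Here is a minimal sketch of those defaulting rules. It is an illustration rather than the app's actual code: `fillDefaults`, `lastChapterOf`, and `lastVerseOf` are hypothetical names, and the sketch assumes that a lone chapter or verse (as in "Mark 1" or "John 3:16") acts as both the start and the end. The sample lookups only cover 2 Peter (book 61 in the standard order).

```javascript
// Hypothetical helper: fill in the implied parts of a passage reference.
// lastChapterOf(book) and lastVerseOf(book, chapter) are assumed lookups
// supplied by the caller; they are not part of the original article.
const fillDefaults = (passage, lastChapterOf, lastVerseOf) => {
  const { book } = passage;
  const startChapter = passage.startChapter ?? 1;
  // No chapter at all means the whole book; a lone start chapter is also the end.
  const endChapter =
    passage.endChapter ?? passage.startChapter ?? lastChapterOf(book);
  const startVerse = passage.startVerse ?? 1;
  // No verse at all means reading to the end of the end chapter;
  // a lone start verse is also the end.
  const endVerse =
    passage.endVerse ?? passage.startVerse ?? lastVerseOf(book, endChapter);
  return { book, startChapter, startVerse, endChapter, endVerse };
};

// Sample data for 2 Peter only: 3 chapters with 21, 22, and 18 verses.
const lastChapterOf = () => 3;
const lastVerseOf = (book, chapter) => [21, 22, 18][chapter - 1];

// "2 Peter 2-3"
const chapters2to3 = fillDefaults(
  { book: 61, startChapter: 2, endChapter: 3 },
  lastChapterOf,
  lastVerseOf
);
console.log(chapters2to3);
// { book: 61, startChapter: 2, startVerse: 1, endChapter: 3, endVerse: 18 }
```

Passing the lookups in as arguments keeps the sketch independent of any particular translation's chapter and verse counts.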

Books as Numbers

You may have noticed that chapters and verses are already numbers, but books are known by their titles, which means we are still working with strings. We need a way to represent books like "Judges" and "Proverbs" as numbers.

This is simple enough - we just need to assign each book an arbitrary number. I started by assigning 1 to Genesis, and going through the standard order of Bible books until I reached book 66, Revelation. I chose not to start at zero because I wanted these numbers to be at least somewhat decipherable at a glance, so I needed book 1 to refer to the first book of the Bible instead of the second.
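
One simple way to sketch that mapping is an ordered array, where a book's number is its index plus one. The names `BOOK_NAMES`, `bookNumberOf`, and `bookNameOf` are hypothetical, and only the first five books are listed here to keep the example short.

```javascript
// Truncated for illustration: a full table would continue in the standard
// order through Revelation (book 66).
const BOOK_NAMES = ["Genesis", "Exodus", "Leviticus", "Numbers", "Deuteronomy"];

// Book numbers start at 1, so add one to the array index.
const bookNumberOf = (name) => {
  const index = BOOK_NAMES.indexOf(name);
  return index === -1 ? null : index + 1;
};

// Reverse lookup: subtract one to get back to the array index.
const bookNameOf = (number) => BOOK_NAMES[number - 1] ?? null;

console.log(bookNumberOf("Genesis")); // 1
console.log(bookNameOf(5)); // "Deuteronomy"
```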

Verses as Numbers

If we have a verse like "Genesis 1:1", we can represent it with these variables:

Variable | Value
book | 1
chapter | 1
verse | 1

Now, how could I represent this verse as a single number?

If I concatenated them, I would get 111. The next verse would be 112, and so on. However, it would be very unclear how to deconstruct this number and turn it back into three separate numbers to get the book, chapter, and verse.

What happens when we get to numbers like 5217? Which option below would be correct?

Option | Book | Chapter | Verse
Option 1 | 5 | 21 | 7
Option 2 | 5 | 2 | 17
Option 3 | 52 | 1 | 7
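
The collision is easy to reproduce. In this small sketch, `naiveId` is a hypothetical helper that simply concatenates the three values, and all three options from the table end up as the same number:

```javascript
// Naive approach: glue the digits together with no fixed widths.
const naiveId = (book, chapter, verse) => Number(`${book}${chapter}${verse}`);

const options = [
  [5, 21, 7],
  [5, 2, 17],
  [52, 1, 7],
];

const ids = options.map(([book, chapter, verse]) => naiveId(book, chapter, verse));
console.log(ids); // [ 5217, 5217, 5217 ]
```

Since three different verses map to one ID, there is no way to recover the original book, chapter, and verse from it.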

Consistent Deconstruction

In order to reliably deconstruct a Verse ID number, we need to give each variable in the verse its own "space" in the number - a specific number of digits that always represent that variable.

As an example, if we want to represent two numbers, each with a range of 1-99, we need to give each number two digits (even if it sometimes only uses one).

We can represent two numbers with a max length of two digits each in a single four digit number like this:

First Variable | Second Variable | Combined
1 | 1 | 0101
1 | 99 | 0199
55 | 7 | 5507

Now we can be sure that 0199 means 01 and 99, rather than 19 and 9. Giving each number its own digits removes any ambiguity about what each digit represents.

We can easily construct and deconstruct these combination numbers in code:

/**
 * Represents where the ones digit of the first number is,
 * in this case in the hundreds position of the combo number.
 * 
 * _ _ _ _
 *   ^
 */
const FIRST_OFFSET = 100;

const makeComboNumber = (first, second) => {
  return first * FIRST_OFFSET + second;
};

const parseComboNumber = (comboNumber) => {
  const first = Math.floor(comboNumber / FIRST_OFFSET);
  // Remove the first number's contribution to leave the second number.
  const second = comboNumber - first * FIRST_OFFSET;
  return { first, second };
};

const comboNumber = makeComboNumber(1, 2); // 102 (0102 with its leading zero written out)
const { first, second } = parseComboNumber(comboNumber); // { first: 1, second: 2 }

Just Enough Space

It's important that each variable (book, chapter, and verse) gets enough space in the Verse ID. I had to determine how many digits each one could possibly need so I could be sure the variables wouldn't risk overlapping each other.

Books were easy - there are only 66 books in the Bible. The book variable gets 2 digits.

Finding the highest number of chapters in a book or verses in a chapter took a bit more work. It turns out Psalms is the longest book in the Bible with 150 chapters. Also, Psalm 119 is the longest chapter, boasting 176 verses. This means both chapter and verse need 3 digits.

I could now represent any individual verse in the Bible with just eight digits. For example, 01 001 001 will represent book 1, chapter 1, verse 1 - we just remove the spaces to get our final Verse ID number.

For example:

Passage | Verse ID
Genesis 1:1 | 01001001
Genesis 22:16 | 01022016
John 3:16 | 43003016

This means each verse can be represented as a 32-bit integer, taking up only 4 bytes of space in memory! In contrast, "Genesis 22:16" would usually take up 14 bytes as a string (13 characters plus a terminator or length byte). Creating Verse IDs has saved us memory and storage space.

Handling Leading Zeroes

You'll notice that any Verse ID from the first 9 books of the Bible will have a leading zero. Leading zeroes can get weird in programming languages.

For example, JavaScript can treat a numeric literal with a leading zero as an octal number (non-strict code reads 010 as 8, while strict mode rejects such literals outright), so it gets interpreted as a different decimal number (decimal numbers, a.k.a. base-10 numbers, are what we use every day):

Octal Number | Decimal (actual) Counterpart
01 | 1
010 | 8
0100 | 64
045 | 37
01001001 | 262657
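
The difference can be checked without writing a leading-zero literal (which strict mode would reject) by parsing the same digits from a string with an explicit radix. This sketch uses only the built-in `parseInt` and `Number`; the variable names are just for illustration.

```javascript
const digits = "01001001"; // Genesis 1:1 written with its leading zero

// Read as base 8, the digits mean something else entirely.
const asOctal = parseInt(digits, 8);
console.log(asOctal); // 262657

// Read as base 10, the leading zero is simply dropped.
const asDecimal = parseInt(digits, 10);
console.log(asDecimal); // 1001001

// Number() on a string also reads plain digits as decimal.
console.log(Number("010")); // 10
```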

To help ensure that we never run into issues with leading zeroes, I decided to add a 1 to the beginning of each Verse ID.

These are the updated Verse ID values from the table above:

Passage | Verse ID
Genesis 1:1 | 101001001
Genesis 22:16 | 101022016
John 3:16 | 143003016

Because they are still within the range of a 32-bit signed int (up to 2,147,483,647), they still only take up 4 bytes of space!

Serializing and Deserializing

This is the code I use to "make" and "parse" Verse ID values:

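// Assumed container object so the snippet runs on its own;
// the app's real Bible module is not shown here.
const Bible = {};

// Layout: 1 BB CCC VVV (leading 1, 2-digit book, 3-digit chapter, 3-digit verse)
Bible.makeVerseId = (book, chapter, verse) => {
  const verseId = 100000000 + book * 1000000 + chapter * 1000 + verse;
  return verseId;
};

Bible.parseVerseId = (verseId) => {
  verseId -= 100000000; // strip the leading 1
  const book = Math.floor(verseId / 1000000);
  verseId -= book * 1000000; // strip the book digits
  const chapter = Math.floor(verseId / 1000);
  verseId -= chapter * 1000; // strip the chapter digits
  const verse = verseId;
  return { book, chapter, verse };
};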

I also have additional code for validating that a Verse ID represents a real verse in the Bible to ensure data integrity.
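
That validation code isn't shown here, but a rough sketch of the idea might look like the following. Everything in it is an assumption for illustration: the `isValidVerseId` name, the shape of the `VERSE_COUNTS` table, and the fact that the table only holds sample data for 2 Peter (book 61) and Jude (book 65), so any other book is reported as invalid.

```javascript
// Sample data only: verses per chapter, keyed by book number.
const VERSE_COUNTS = {
  61: [21, 22, 18], // 2 Peter
  65: [25], // Jude
};

const isValidVerseId = (verseId) => {
  // A Verse ID is a 9-digit integer with a leading 1.
  if (!Number.isInteger(verseId)) return false;
  if (verseId < 100000000 || verseId > 199999999) return false;

  const rest = verseId - 100000000;
  const book = Math.floor(rest / 1000000);
  const chapter = Math.floor((rest % 1000000) / 1000);
  const verse = rest % 1000;

  const chapters = VERSE_COUNTS[book];
  if (!chapters) return false; // unknown book (or not in the sample data)

  const verseCount = chapters[chapter - 1];
  if (verseCount === undefined) return false; // chapter out of range

  return verse >= 1 && verse <= verseCount;
};

console.log(isValidVerseId(165001025)); // true  (Jude 1:25)
console.log(isValidVerseId(165001026)); // false (Jude has only 25 verses)
```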

Verse Ranges as Numbers

So far we've solved the problem of how to store a single verse reference as a number... but most of the time people read multiple verses together. Users want to track the whole passage that they read as a single unit, not as a list of individual verses.

To solve this, I simply store two Verse ID numbers together in the database: startVerseId and endVerseId.
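
Because the book occupies the most significant digits, followed by the chapter and then the verse, Verse IDs sort in the same order as the verses themselves. That makes range checks plain integer comparisons. The sketch below uses "Romans 5:8-6:4" (Romans is book 45); the `containsVerse` helper and the standalone `makeVerseId` are illustrative names, not the app's actual code.

```javascript
// Same layout as the Verse ID format: 1 BB CCC VVV.
const makeVerseId = (book, chapter, verse) =>
  100000000 + book * 1000000 + chapter * 1000 + verse;

// Romans 5:8-6:4 stored as a pair of Verse IDs.
const passage = {
  startVerseId: makeVerseId(45, 5, 8), // 145005008
  endVerseId: makeVerseId(45, 6, 4), // 145006004
};

// A verse is inside the passage if its ID falls between the two endpoints.
const containsVerse = (range, verseId) =>
  verseId >= range.startVerseId && verseId <= range.endVerseId;

console.log(containsVerse(passage, makeVerseId(45, 5, 21))); // true  (Romans 5:21)
console.log(containsVerse(passage, makeVerseId(45, 6, 5))); // false (Romans 6:5)
```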

Technically, we could create a longer "Passage ID" number to represent an entire passage. This would take all 5 variables I mentioned at the beginning of this article:

  • Book (2 digits)
  • Start Chapter (3 digits)
  • Start Verse (3 digits)
  • End Chapter (3 digits)
  • End Verse (3 digits)

However, this comes to a total of 14 digits, plus one if we still want a leading 1 to keep things consistent.

A 32-bit signed integer can only hold values up to 10 digits long (specifically, values up to 2,147,483,647). To store larger numbers, we would need to use more than 4 bytes. In fact, a "bigint" (big integer), the next size up for number storage, takes 8 bytes - the same as two separate integers. So at that point we might as well just store two separate integers and not worry about handling another special number format.
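
To make the size difference concrete, here is a quick sketch that builds such a 15-digit Passage ID and compares it against the 32-bit limit. `makePassageId` is a hypothetical function for the format described above (leading 1, then book, start chapter, start verse, end chapter, end verse); it is not something the app uses.

```javascript
const INT32_MAX = 2 ** 31 - 1; // 2147483647

// Hypothetical layout: 1 BB CCC VVV CCC VVV (15 digits).
const makePassageId = (book, startChapter, startVerse, endChapter, endVerse) =>
  1e14 +
  book * 1e12 +
  startChapter * 1e9 +
  startVerse * 1e6 +
  endChapter * 1e3 +
  endVerse;

// Romans 5:8-6:4
const passageId = makePassageId(45, 5, 8, 6, 4);
console.log(passageId); // 145005008006004

// Too large for a 4-byte signed integer column.
console.log(passageId > INT32_MAX); // true

// A single Verse ID, by comparison, fits comfortably.
console.log(145005008 <= INT32_MAX); // true
```

A JavaScript number can still hold the 15-digit value exactly (it is below `Number.MAX_SAFE_INTEGER`), so the limit that matters here is the 4-byte integer type used for storage.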