Surely, some of you out there have to deal with databases that, over time, have become too large to handle comfortably. It does happen. Recently, an IT forensics customer asked us if there’s a way to evenly split the contents of a large Notes database into multiple NSFs so their metadata extraction software could run multiple threads (for much faster processing of the application’s contents). In today’s post, we’ll look at an easy example of how you can break these super-sized databases up into smaller workable parts.
As part of our v12 release, we developed a search enhancement for scanEZ that allows you to limit the number of search results and thereby grab the first “x amount” of matching documents & designs.
This is the feature that allows us to split database contents into multiple identical databases.
Although it is possible to split an NSF into ANY number of pieces, the easiest way to go is to split an application into 2ⁿ databases. For our simple example, let’s take a database that contains an even total of 400 documents*; we’ll examine how to split it into first two, and then four separate databases.
*Please note that this is a pretty small number of documents. Normally, when this process is preferred (or even necessary), it’s because the database has grown much larger than the one in our example. However, regardless of the time it may take to process a very large number of documents, these steps will let you split your desired databases accurately.
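Before diving into the steps, the arithmetic behind the 2ⁿ approach is worth a quick look: each round of the procedure halves every database, doubling the total count. The Python sketch below is illustrative only; `split_plan` is a hypothetical helper, not anything scanEZ exposes.

```python
def split_plan(total_docs, rounds):
    """Return (number of databases, documents per database) after `rounds` halvings."""
    pieces = 2 ** rounds
    if total_docs % pieces:
        raise ValueError("document count does not divide evenly")
    return pieces, total_docs // pieces

print(split_plan(400, 1))  # one halving: (2, 200) — two databases of 200 documents
print(split_plan(400, 2))  # two halvings: (4, 100) — four databases of 100 documents
```

This is why halving is the path of least resistance: as long as the document count divides evenly, every round produces equal, predictable pieces.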
1) With the initial database already on your local machine, create a second OS-level copy (this will keep the replica ID and note identifiers intact). Disable replication for both.
2) Rename the original NSF ToSplit_1.nsf and the new copy ToSplit_2.nsf (see fig. 1).
3) Open ToSplit_1.nsf in scanEZ, checkbox-select all of its documents, then proceed to Checkbox Selection > Add to > New ‘My Selection’. For this example, let’s name the new My Selection All Documents. Set display titles to use the @NoteID formula (this guarantees that we keep the same order of documents in both databases). Make sure to deselect your Documents folder.
4) Right-click your brand-new All Documents My Selection folder and choose Select first from current category in the contextual menu; specify 200 (see fig. 2).
5) With the first 200 documents selected, go to Checkbox Selection > Remove From > Current My Selection. Then, right-click your All Documents My Selection folder (which should now contain only the second 200 documents) and click Delete All Documents in Category. Make sure you choose NOT to create deletion stubs (see fig. 3). ToSplit_1.nsf now contains only the first half of the documents.
6) Open ToSplit_2.nsf in scanEZ and repeat steps 3 and 4 to checkbox-select the first 200 of the 400 documents.
Tip: On databases containing a very large number of documents, you can skip a step. Instead of creating a new My Selection folder containing all the documents, simply change the display titles in the Documents section using the @NoteID formula, then select the first half of the documents just as we selected the first 200 in our example. This saves you the time of building a new My Selection folder.
7) Go to Checkbox Selection > Delete Document(s) and, again, opt out of creating deletion stubs (see fig. 4).
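Logically, steps 3 through 7 boil down to ordering the documents by NoteID and deleting opposite halves from each copy. Here is a minimal Python model of that logic, assuming a plain list of NoteIDs stands in for the database; these are not scanEZ or Notes API calls.

```python
def split_in_half(note_ids):
    """Order documents by NoteID and split the list at the midpoint."""
    ordered = sorted(note_ids)
    mid = len(ordered) // 2
    # The first half is what stays in ToSplit_1.nsf;
    # the second half is what stays in ToSplit_2.nsf.
    return ordered[:mid], ordered[mid:]

keep_in_1, keep_in_2 = split_in_half([0x912, 0x8FE, 0x90A, 0x906])
print(keep_in_1)  # the two lowest NoteIDs remain in ToSplit_1.nsf
```

Sorting by NoteID is what makes the split deterministic: both copies contain identical notes, so both produce the same ordering and agree on where the midpoint falls.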
At this point, you’ll have two NSF files with identical designs and replica IDs: ToSplit_1.nsf contains the first 200 documents and ToSplit_2.nsf the second 200. If you need to split your original database even further, create OS-level copies of both files and call them ToSplit_3.nsf (identical to ToSplit_1.nsf) and ToSplit_4.nsf (identical to ToSplit_2.nsf). Simply repeat the above steps on ToSplit_1 & 3 and ToSplit_2 & 4 and you’ll end up with four equal databases, each containing 100 documents (see fig. 5).
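Repeating the halving generalizes naturally to any power of two. As a hedged sketch, again operating on plain NoteID lists rather than real NSFs, the whole procedure can be expressed recursively:

```python
def split_evenly(note_ids, pieces):
    """Recursively halve an ordered NoteID list into `pieces` parts (a power of two)."""
    ordered = sorted(note_ids)
    if pieces == 1:
        return [ordered]
    mid = len(ordered) // 2
    return (split_evenly(ordered[:mid], pieces // 2)
            + split_evenly(ordered[mid:], pieces // 2))

parts = split_evenly(list(range(400)), 4)
print([len(p) for p in parts])  # [100, 100, 100, 100]
```

Each recursion level corresponds to one pass of the manual steps above: copy every file, then trim opposite halves from each pair of copies.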
As I stated early on in this post, this example uses a database with a relatively small number of documents. When working with a very large data set (such as millions of documents), these steps may take a while. However, regardless of the size of your project, they will put you on track for breaking even your largest Notes databases down to a workable size.