GridFS

GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB. Instead of storing a file in a single document, GridFS divides the file into chunks and stores each chunk as a separate document. GridFS is useful not only for files larger than 16 MB but also for any file you want to access without loading the whole file into memory. GridFS uses two collections to store files: one stores the file chunks, and the other stores the file metadata.
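As a rough illustration of the chunking step, the following Python sketch splits a byte string into fixed-size pieces. It works on in-memory bytes only and does not talk to MongoDB; the function name split_into_chunks is hypothetical, and the 255 KiB default is an assumption matching the chunk size current MongoDB drivers use unless configured otherwise.

```python
# A sketch of GridFS-style chunking on in-memory bytes, not a driver implementation.
# Assumption: 255 KiB default chunk size, as used by current MongoDB drivers.
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261120 bytes


def split_into_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list:
    """Split data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Every piece is exactly chunk_size bytes except possibly the last one.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    payload = b"x" * 600_000  # roughly 586 KiB, well under 16 MB but still chunked
    chunks = split_into_chunks(payload)
    print(len(chunks), [len(c) for c in chunks])  # 3 [261120, 261120, 77760]
```

Because every chunk but the last is full-size, a file's chunk count is its length divided by the chunk size, rounded up.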

Use GridFS

To store and retrieve files using GridFS, use either of the following:

  • A MongoDB driver, when working from application code.
  • The mongofiles command-line tool.

Mongofiles command-line tool

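If the directory containing the MongoDB binaries is on your PATH, you can run the mongofiles command from anywhere; otherwise, run it from the bin folder where it is installed. In older releases that is the bin folder of your MongoDB installation; starting with MongoDB 4.4, mongofiles ships separately as part of the MongoDB Database Tools.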

mongofiles -d gridfs put "/path/to/the/file"

The first time you run this command, MongoDB creates a new database named gridfs that contains two collections, "fs.chunks" and "fs.files" (fs is the default bucket name).

>use gridfs;
switched to db gridfs
>show collections;
fs.chunks
fs.files

Chunks collection

Each document in the chunks collection represents a distinct chunk of a file.

{
	_id: <ObjectId>,
	files_id: <ObjectId>,
	n: <num>,
	data: <binary>
}
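To read a file back, a driver fetches the chunks whose files_id matches the file's _id and concatenates their data in order of n. The sketch below models that with plain Python dictionaries standing in for fs.chunks documents (the _id field is omitted); make_chunk_docs and reassemble are hypothetical helpers, not driver functions. Real drivers rely on a unique index on { files_id: 1, n: 1 } to make this lookup and ordering efficient.

```python
import random


def make_chunk_docs(files_id, data: bytes, chunk_size: int) -> list:
    """Build dicts shaped like fs.chunks documents (without _id)."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunk_docs: list, files_id) -> bytes:
    """Rebuild one file: keep its chunks, order them by n, join the data."""
    own = sorted((c for c in chunk_docs if c["files_id"] == files_id),
                 key=lambda c: c["n"])
    # Chunk numbers must run 0, 1, 2, ... with no gaps.
    for expected, chunk in enumerate(own):
        if chunk["n"] != expected:
            raise ValueError(f"missing chunk {expected} for {files_id!r}")
    return b"".join(c["data"] for c in own)


if __name__ == "__main__":
    docs = make_chunk_docs("file-1", b"hello, gridfs", 4)
    random.shuffle(docs)  # storage order is not guaranteed, so sort on n
    print(reassemble(docs, "file-1"))  # b'hello, gridfs'
```

Sorting on n rather than trusting storage order is what lets chunks be written or returned in any order.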

Files collection

Each document in the files collection represents a file in the GridFS store.

{
	_id: <ObjectId>,
	length: <num>,
	chunkSize: <num>,
	uploadDate: <timestamp>,
	md5: <hash>,
	filename: <string>,
	contentType: <string>,
	aliases: <string array>,
	metadata: <dataObject>
}
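Most fields in the files document are bookkeeping derived from the upload. The hypothetical make_files_doc helper below builds a dictionary with the same field names; it is a sketch, not what any particular driver writes. md5 is optional here because the field is deprecated in current GridFS specifications and drivers, and expected_chunk_count (also hypothetical) shows how length and chunkSize determine how many chunk documents belong to the file.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional

DEFAULT_CHUNK_SIZE = 255 * 1024  # assumed driver default, in bytes


def expected_chunk_count(length: int, chunk_size: int) -> int:
    """Number of fs.chunks documents a file of `length` bytes needs."""
    return -(-length // chunk_size)  # ceiling division without floats


def make_files_doc(file_id, filename: str, data: bytes,
                   chunk_size: int = DEFAULT_CHUNK_SIZE,
                   content_type: Optional[str] = None,
                   metadata: Optional[dict] = None,
                   include_md5: bool = False) -> dict:
    """Build a dict shaped like an fs.files document (hypothetical helper)."""
    doc = {
        "_id": file_id,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "filename": filename,
    }
    if include_md5:
        # Deprecated field; only computed when explicitly requested.
        doc["md5"] = hashlib.md5(data).hexdigest()
    if content_type is not None:
        doc["contentType"] = content_type
    if metadata is not None:
        doc["metadata"] = metadata
    return doc
```

Fields that were not supplied are simply left out, since every field after filename in the listing above is optional.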

Do not use GridFS if you need to update the content of an entire file atomically. As an alternative, you can store multiple versions of each file and indicate the current version in the file's metadata. After uploading a new version, you can update the metadata field that marks the “latest” version in an atomic update, and later remove previous versions if needed.
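The versioning workaround can be modeled in memory as follows. Each uploaded version gets its own files document, and a hypothetical metadata.latest flag marks the current one; add_version, promote, current_version, and prune are illustrative names, not driver APIs. In a real deployment each flag change is an update on one document, which MongoDB applies atomically; flipping the flag on two documents is two separate writes unless you wrap them in a transaction.

```python
from datetime import datetime, timezone


def add_version(files: list, filename: str, file_id) -> None:
    """Record a newly uploaded version; it does not become current yet."""
    files.append({
        "_id": file_id,
        "filename": filename,
        "uploadDate": datetime.now(timezone.utc),
        "metadata": {"latest": False},
    })


def promote(files: list, filename: str, file_id) -> None:
    """Mark file_id as the latest version of filename and demote the others."""
    for doc in files:
        if doc["filename"] == filename:
            doc["metadata"]["latest"] = doc["_id"] == file_id


def current_version(files: list, filename: str):
    """Return the _id of the version flagged as latest, or None."""
    for doc in files:
        if doc["filename"] == filename and doc["metadata"].get("latest"):
            return doc["_id"]
    return None


def prune(files: list, filename: str) -> None:
    """Drop superseded versions of filename (files documents only)."""
    files[:] = [d for d in files
                if d["filename"] != filename or d["metadata"]["latest"]]
```

A reader that always asks for the flagged version never sees a half-written upload, because the flag moves only after the new version is fully stored. prune here touches only the files documents; with real GridFS the matching fs.chunks documents must be removed too, which a driver's delete operation does.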