Opened 12 years ago

Last modified 12 years ago

#50 closed task (FIXED)

mongodb not scaling down: workaround and documentation

Reported by: Elrond
Owned by: Elrond
Priority: minor
Milestone: 0.0.5
Component: documentation
Keywords:
Cc:
Parent Tickets:


mongodb needs 0.5 GB for a fresh install. Half of this goes to kombu, the messaging library behind celery.

kombu could use redis or some other transport instead (see
`#322 </issues/322>`_), so fixing that ticket will help with
this issue too.

mongodb database files contain long runs of NUL bytes, so one can
easily turn them into sparse files to save disk space:


     # service mongodb stop
     # cd /var/lib/
     # cp -a --sparse=always mongodb mongodb.sparse
     # mv mongodb mongodb.old
     # mv mongodb.sparse mongodb
     # service mongodb start
     # # TEST that mongodb works before deleting the old copy
     # rm -rf mongodb.old

Later versions of mongodb may already do this internally. The
above was needed with the mongodb package from debian/stable.
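A minimal sketch for trying the trick on a throwaway file before touching /var/lib/mongodb. It assumes GNU coreutils (`dd ... status=none`, `cp --sparse=always`, `du --apparent-size`) and a filesystem that supports sparse files; the file names are made up, and `full.db` only imitates a preallocated, NUL-filled mongodb data file.

```shell
#!/bin/sh
# Demo of the "make sparse" trick on a throwaway file; it never touches /var/lib/mongodb.
set -eu
demo_dir=$(mktemp -d)

# A fully allocated 64 MiB file of NUL bytes, imitating a preallocated data file.
dd if=/dev/zero of="$demo_dir/full.db" bs=1M count=64 status=none

# Copy it, letting cp replace runs of NUL bytes with holes.
cp --sparse=always "$demo_dir/full.db" "$demo_dir/sparse.db"

# The apparent size (what ls -l shows) is the same for both files...
apparent_full=$(du -k --apparent-size "$demo_dir/full.db" | cut -f1)
apparent_sparse=$(du -k --apparent-size "$demo_dir/sparse.db" | cut -f1)
# ...but the space actually allocated on disk is not.
used_full=$(du -k "$demo_dir/full.db" | cut -f1)
used_sparse=$(du -k "$demo_dir/sparse.db" | cut -f1)

echo "apparent size: $apparent_full KiB vs $apparent_sparse KiB"
echo "on disk:       $used_full KiB vs $used_sparse KiB"
rm -rf "$demo_dir"
```

Running the same `du -sh` and `du -sh --apparent-size` pair on /var/lib/mongodb before and after the real copy shows how much the conversion saved there.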

Documenting the "make sparse" possibility will also help with this issue.

Change History (8)

comment:1 by Elrond, 12 years ago

Ahh, caring for old bugs...


Using CELERY\_ALWAYS\_EAGER (which is already documented) shrinks
the mongodb size by 50% (on a fresh install, or after removing
kombu\_default), which is already great.

The docs mention that CELERY\_ALWAYS\_EAGER is not meant for production.
But on small systems (freedombox) it might still be acceptable as a
production setting to save space?

That said: This bug can be set to "40% done". :-)
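To see how much of the data directory kombu's database takes (and to confirm it is gone after switching to CELERY\_ALWAYS\_EAGER and removing kombu\_default), a small helper can sum the on-disk usage per database. This is a sketch: it assumes the file layout of the mongodb versions discussed here, where a database NAME is stored as NAME.ns, NAME.0, NAME.1, and so on in the data directory. The function name and the default path /var/lib/mongodb are assumptions, not part of mongodb or GMG.

```shell
#!/bin/sh
# Hypothetical helper: print the allocated disk space of each mongodb database,
# assuming the old per-database file layout NAME.ns, NAME.0, NAME.1, ...
mongo_usage_per_db() {
    dbpath=${1:-/var/lib/mongodb}
    for ns_file in "$dbpath"/*.ns; do
        # If the glob matched nothing, it stays literal; skip it.
        [ -e "$ns_file" ] || continue
        db=$(basename "$ns_file" .ns)
        # du -c adds a grand-total line at the end; -k reports KiB.
        total=$(du -ck "$dbpath/$db".* | tail -n 1 | cut -f1)
        printf '%s\t%s KiB\n' "$db" "$total"
    done
}

# Example: mongo_usage_per_db /var/lib/mongodb
```

Because plain `du` counts allocated blocks, the numbers also reflect any savings from the sparse-file trick.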

mongodb in general

cwebber found
`\_thread/thread/6c32836a742270aa <>`_
which lists some options to shrink mongo's space usage. I will test
them. If they help, the mediagoblin docs should point to this
posting (or to the relevant parts of the mongo docs).
Relevant mongo docs:
` <>`_
` <>`_

If the "sparse files" trick (shown above) has not been documented or
integrated into mongo by now, it should at least be documented.

comment:2 by Caleb Davis, 12 years ago

Elrond says:
1. Those files on disk contain a lot of NUL chunks. If one converts
   them to sparse files, things are already much better (see the bug,
   it's in there).

2. Put "noprealloc = true" in mongodb.conf. By default, mongodb
   preallocates a fresh, empty data file for mmapping, so that it has
   spare space ready, just in case. That takes up a lot of space on disk.

3. "nssize = 1" (MB). This is the namespace file size. I don't actually
   know what the worst effect of this is. It might break everything. ;o)

4. Running GMG/celery in always-eager mode (see bug; this is the
   default for currently) also saves half the mongo size, as kombu
   (the messaging whatnot behind celery) isn't used. Downside: all
   processing is synchronous, so you have to wait for the server to
   process your uploaded media while sitting at your desk and watching
   your browser.
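Items 2 and 3 could be applied with a small helper that appends an option to mongodb.conf only if it is not set yet. This is a sketch, not something from the mongodb packaging: the function name is hypothetical, and it assumes the old INI-style `key = value` config file that the Debian package installs as /etc/mongodb.conf. As this comment notes, nssize = 1 is untested, and most mongo options need to be in place before the "mediagoblin" database is created.

```shell
#!/bin/sh
# Hypothetical helper: append "KEY = VALUE" to an INI-style mongodb.conf,
# unless a line setting KEY is already present.
ensure_conf_option() {
    conf_file=$1
    key=$2
    value=$3
    # Look for "key=..." or "key = ..." at the start of a line.
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf_file" 2>/dev/null; then
        echo "$key is already set in $conf_file; leaving it unchanged" >&2
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Usage sketch (as root, with mongodb stopped, before GMG creates its database):
#   ensure_conf_option /etc/mongodb.conf noprealloc true
#   ensure_conf_option /etc/mongodb.conf nssize 1    # untested, see item 3
#   service mongodb start
```

Checking for an existing line first keeps the helper safe to run twice and avoids silently overriding a value someone already chose.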


-  There seems to be a "smallfiles" option. I have not tested it.
   It should make the first mmapped file much smaller. It only makes
   sense if you intend to keep a small amount of data in your db.
-  This option isn't available in the config file, only as a
   command-line parameter.

-  Most mongo options only take effect if they are set before GMG is
   first started, as they must be in place before the "mediagoblin"
   database is created on the db server.


-  Most options only affect on-disk space: mmapping even a 1 TB file
   (on a 64-bit machine) doesn't load that file into RAM. The OS only
   loads the parts that are actually used into RAM, and it can "swap"
   them out (write them back to the file on disk) anyway, if needed.
   The real problem is a large db, because it will basically either be
   fully in RAM, or any operation on it will swap like mad.
-  celery\_always\_eager will actually save RAM, as it means one
   complete database is never created or used!

Caleb swears to:
-  At least put this in some nice form on the wiki. :-)
-  Include some links to the docs from the bug, etc.
-  docs: See the "mongodb in general" paragraph at
   ` <>`_
   There are some links to official mongo docs. Those should be put in
   the wiki (or the deployment docs).
-  put a link to the wiki page on the bug.
-  [BONUS] Write a cool "Using GMG on limited hardware (read:
   freedombox)" chapter for the non-existent deployment docs.

Elrond concludes:
So, if someone comes in and says "I want to run GMG on my
freedombox, what should I take care of", we can point them to the
wiki and say "What we know, is there. It's not ready for the
official docs, but all the info is there." :-))

comment:3 by Elrond, 12 years ago

Thanks Caleb for taking care of this!!

Also for the "See also" section of the upcoming wiki page:

-  ` <>`_
   (found by cwebber, who also commented on it)

So that people know where to go and help. ;)


-  8. UNTESTED: Backing up and restoring the complete db might save
   some space/RAM, as it might remove some fragmentation.
-  Maybe someone more familiar with mongodb could finally look over
   all of this.
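A sketch of the untested backup-and-restore idea, assuming the mongodump, mongo and mongorestore tools that ship with mongodb of that era, a mongod on the default host and port, and GMG/celery stopped. Restoring over existing collections would likely not shrink the files, since mongodb of that generation reuses the space of dropped collections inside the database files instead of releasing it, so this sketch drops the whole database between dump and restore. The function name and the dump path are hypothetical, and like the idea itself, this has not been tried on a real GMG install.

```shell
#!/bin/sh
# Hypothetical helper (UNTESTED idea): rewrite one database compactly by
# dumping it, dropping it (which deletes its data files) and restoring the dump.
compact_by_dump_restore() {
    db_name=$1
    dump_dir=$2

    # 1. Dump the database to $dump_dir/$db_name.
    mongodump --db "$db_name" --out "$dump_dir" || return 1

    # 2. Refuse to drop anything unless the dump directory really exists.
    if [ ! -d "$dump_dir/$db_name" ]; then
        echo "no dump found in $dump_dir/$db_name, not dropping $db_name" >&2
        return 1
    fi

    # 3. Drop the whole database; dropping single collections would keep
    #    the old, fragmented data files around.
    mongo "$db_name" --eval 'db.dropDatabase()' || return 1

    # 4. Restore from the dump; the data files are recreated from scratch.
    mongorestore --db "$db_name" "$dump_dir/$db_name"
}

# Usage sketch, with GMG and celery stopped:
#   compact_by_dump_restore mediagoblin /var/backups/mongo-dump
```

The explicit check between dump and drop is there because step 3 is destructive: if the dump silently produced nothing, the database must not be dropped.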

comment:4 by Caleb Davis, 12 years ago

Status: New → Closed
I'm closing this ticket because these efforts are now documented in
the wiki!

`\_Down <>`_

comment:5 by Caleb Davis, 12 years ago

Owner: set to Elvenlord Elrond
Status: Closed → Feedback
hm, maybe just let Elrond decide whether to close the ticket.

comment:6 by Elrond, 12 years ago

Component: Documentation
Great! Thanks!

comment:7 by Christopher Allan Webber, 12 years ago

Milestone: 0.0.5
Status: Feedback → Closed
I'm marking this as closed. We can continue to update these docs as
we find more info.

Thanks Caleb / Elrond for your efforts on this ticket!

comment:8 by Will Kahn-Greene, 11 years ago

The original url for this bug was .
#22: related, #49: related
