MarkLogic: Forests

From Glitchdata
  • A forest is a collection of XML, JSON, RDF, text, or binary documents.
    • Forests are created on hosts and attached to databases to appear as a contiguous set of content for query purposes.
    • A forest can only be attached to one database at a time. You cannot load data into a forest that is not attached to a database.

  • A forest contains in-memory and on-disk structures called stands.
    • Each stand is composed of XML, JSON, binary, and/or text fragments, plus index information associated with the fragments.
    • When fragmentation rules are in place, XML documents may span multiple stands.
    • MarkLogic Server periodically merges multiple stands into a single stand to optimize performance.
    • See Understanding and Controlling Database Merges for details on merges.
  • A forest also contains a separate on-disk Large Data Directory for storing large objects such as large binary documents.
    • MarkLogic Server stores large objects separately to optimize memory usage, disk usage, and merge time.
    • A small object is stored directly in a stand as a fragment.
    • A large object is stored in a stand as a small reference fragment, with the full content stored in the Large Data Directory.
    • The size threshold for storing objects in the Large Data Directory and the location of the Large Data Directory are configurable through the Admin Interface and Admin API.
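The small/large object split described above can be pictured with a toy model. This is only an illustrative Python sketch, not MarkLogic code: the `Forest` class, its field names, the 1 MB threshold value, and the choice of a strict "greater than" comparison are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed threshold for illustration only; the real value is configurable.
LARGE_SIZE_THRESHOLD_BYTES = 1024 * 1024


@dataclass
class Forest:
    """Toy model of a forest: one stand plus a Large Data Directory."""
    threshold: int = LARGE_SIZE_THRESHOLD_BYTES
    stand: dict = field(default_factory=dict)           # uri -> fragment
    large_data_dir: dict = field(default_factory=dict)  # uri -> full content

    def insert_binary(self, uri: str, content: bytes) -> str:
        """Store content and report whether it was treated as small or large."""
        if len(content) > self.threshold:  # boundary behaviour is an assumption
            # Large object: full content goes to the Large Data Directory,
            # and the stand keeps only a small reference fragment.
            self.large_data_dir[uri] = content
            self.stand[uri] = {"ref": uri, "size": len(content)}
            return "large"
        # Small object: stored directly in the stand as a fragment.
        self.stand[uri] = content
        return "small"


if __name__ == "__main__":
    forest = Forest()
    print(forest.insert_binary("/img/icon.png", b"\x00" * 2048))
    print(forest.insert_binary("/video/demo.mp4", b"\x00" * (2 * 1024 * 1024)))
```

Because merges in this model only ever touch `stand`, the full content of a large object never has to be rewritten, which is the merge-time benefit the list above mentions.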



Sizing

  • Keep each forest under 200 GB.
  • Allocate 2 CPU cores per forest.
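As a rough worked example of the two rules of thumb above, the following Python sketch estimates how many forests, cores, and hosts a given data volume would call for. The function names and the `cores_per_host` parameter are hypothetical, and the calculation ignores replicas, merge headroom, and growth.

```python
import math

MAX_FOREST_GB = 200    # each forest should stay under this size
CORES_PER_FOREST = 2   # cores to allow for each forest


def forests_needed(data_gb: float) -> int:
    """Smallest forest count that keeps every forest strictly under the limit."""
    return max(1, math.floor(data_gb / MAX_FOREST_GB) + 1)


def plan(data_gb: float, cores_per_host: int) -> dict:
    """Estimate forests, total cores, and hosts for a data volume."""
    forests_per_host = cores_per_host // CORES_PER_FOREST
    if forests_per_host < 1:
        raise ValueError("each host needs at least 2 cores to carry a forest")
    forests = forests_needed(data_gb)
    return {
        "forests": forests,
        "cores": forests * CORES_PER_FOREST,
        "forests_per_host": forests_per_host,
        "hosts": math.ceil(forests / forests_per_host),
    }


if __name__ == "__main__":
    # 1 TB of content on 8-core hosts
    print(plan(1000, cores_per_host=8))
```

For 1000 GB on 8-core hosts this gives 6 forests (about 167 GB each), 12 cores, and 2 hosts.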

Related