
What is a Pharo image and how is it created

Inao edited this page Jun 14, 2023 · 2 revisions

Image format on disk

There are two formats for storing a Pharo image on disk:

  • Classical image, made of a single file
  • Composed image (more recent), a folder containing many files:
    • the files are organized according to the memory segments
    • the metadata of the image is stored in STON format

What is stored inside an image?

There are two main parts inside an image:

  1. A header, which contains a lot of metadata
  2. A dump of the memory, which freezes the current state of the Pharo system (all the living objects, windows, ...)
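A minimal sketch of the classical single-file layout, in Python: a fixed-size header that records its own size and the size of the dump, followed by the raw memory bytes. The field order, the 4-byte little-endian encoding, the 64-byte header size and the names write_image/read_image are all hypothetical simplifications; the real header has many more fields, and some of them are word-sized.

```python
import struct

HEADER_SIZE = 64  # hypothetical fixed header size, in bytes


def write_image(memory: bytes, version: int) -> bytes:
    """Single-file image sketch: a header followed by the raw memory dump."""
    # Hypothetical layout: version, header size, data size (3 x uint32, little-endian).
    header = struct.pack("<III", version, HEADER_SIZE, len(memory))
    header = header.ljust(HEADER_SIZE, b"\0")  # pad the unused header space
    return header + memory


def read_image(image: bytes) -> tuple[int, bytes]:
    """Read the header, then use its sizes to locate the memory dump."""
    version, header_size, data_size = struct.unpack_from("<III", image, 0)
    memory = image[header_size:header_size + data_size]
    return version, memory
```

For example, `read_image(write_image(b"objects", 68021))` returns `(68021, b"objects")`. Because the header records its own size, a reader can skip fields it does not understand and still find the start of the dump.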

How do we dump and restore the memory

When saving an image, the user often wants to keep working. The decision was made to make snapshots as fast as possible, even if loading then takes a bit more time. So how do we dump memory quickly, and how do we restore it when loading an image?

The memory is dumped by storing the new space and the segments of the old space contiguously, without the bridges between segments. Pointers are written as they are: they still refer to where the objects live in the memory of the running image. This minimizes the changes to do (and undo) before the user can continue working in the image. Rewriting the addresses before saving would require going over all the objects to update their references, and then reverting those changes so that the running image could keep working. Metadata (possibly in a C struct) records the size and starting address of each segment at the time of the snapshot, for all segments except the first, whose information is stored in the image header.
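The snapshot step can be modelled in a few lines of Python. This is a simplified sketch, not the VM's code: the hypothetical Segment class holds a segment's address in the running image and its contents with the bridge already excluded, and the list of (old start, size) pairs stands in for the per-segment metadata. The pointers inside each segment's bytes are copied unchanged.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int   # address of the segment in the running image
    data: bytes  # segment contents, bridge excluded


def snapshot(segments: list[Segment]) -> tuple[bytes, list[tuple[int, int]]]:
    """Write segments back to back; pointers inside `data` are not rewritten."""
    dump = b"".join(seg.data for seg in segments)
    # Record where each segment lived and how big it was, for the loader.
    layout = [(seg.start, len(seg.data)) for seg in segments]
    return dump, layout


def split_dump(dump: bytes, layout: list[tuple[int, int]]) -> list[bytes]:
    """Cut the contiguous dump back into segments using the recorded sizes."""
    parts, offset = [], 0
    for _start, size in layout:
        parts.append(dump[offset:offset + size])
        offset += size
    return parts
```

Nothing in the running memory is modified by `snapshot`, which is what lets the user keep working immediately after saving.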

When loading the image, all the segments are read after the header. The recorded segment sizes and previous addresses are then used to remap the memory addresses relative to the new starting addresses of the segments. This is pointer swizzling; see SpurMemoryManager>>adjustAllOopsBy: for the implementation. It iterates over the objects and the addresses stored in their fields, finds the segment each address belongs to, and offsets the address according to both the old and the new position of that segment in memory. When loading, Pharo asks the operating system for the same memory addresses if possible, to limit swizzling.
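Pointer swizzling, as described above, can be sketched in Python. The names make_swizzler, old_layout and new_starts are hypothetical, and the sketch treats every value it receives as a pointer, whereas a real VM only adjusts fields that actually hold object references (immediate values such as SmallIntegers are left alone).

```python
import bisect


def make_swizzler(old_layout: list[tuple[int, int]], new_starts: list[int]):
    """old_layout: (old_start, size) per segment, sorted by old_start.
    new_starts: the start address each segment received at load time."""
    old_starts = [start for start, _size in old_layout]

    def swizzle(addr: int) -> int:
        # Find the last segment whose old start is <= addr.
        i = bisect.bisect_right(old_starts, addr) - 1
        if i < 0 or addr >= old_starts[i] + old_layout[i][1]:
            raise ValueError(f"{addr:#x} is not inside any saved segment")
        # Keep the offset inside the segment, replace the segment's base.
        return addr - old_starts[i] + new_starts[i]

    return swizzle
```

For example, with segments saved at 0x1000 and 0x9000 and loaded at 0x50000 and 0x60000, `swizzle(0x1010)` gives 0x50010. When every segment is loaded at its old address, `swizzle` returns its argument unchanged, which is why getting the same addresses from the operating system avoids the work.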

Header Structure

See SpurImageReader>>readHeaderFrom:startingAt: for the implementation of header reading. Fields do not all have the same size. The methods that read data from the file follow the pattern get...FromFile:swap:. The swap flag carries information about endianness: it tells whether the bytes must be swapped. This is important when moving an image between machines, especially from or to some older machines with a different byte order.
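The role of the swap flag can be illustrated with a small Python sketch that reads a 32-bit word and tries the first header field in both byte orders. The format numbers 6521 (32-bit Spur) and 68021 (64-bit Spur) are assumptions to check against your VM's sources, and get_word32/detect_swap are hypothetical names that only loosely mirror the get...FromFile:swap: pattern.

```python
import sys

# Assumed image format numbers; verify against the VM you target.
KNOWN_VERSIONS = {6521: "32-bit Spur", 68021: "64-bit Spur"}


def get_word32(buf: bytes, offset: int, swap: bool) -> int:
    """Read a 32-bit word in host byte order, reversing the bytes if swap is set."""
    raw = buf[offset:offset + 4]
    if swap:
        raw = raw[::-1]
    return int.from_bytes(raw, sys.byteorder)


def detect_swap(buf: bytes) -> tuple[bool, str]:
    """Try the first field as-is, then byte-swapped; the one that is a known
    version tells whether every following field must be swapped too."""
    for swap in (False, True):
        version = get_word32(buf, 0, swap)
        if version in KNOWN_VERSIONS:
            return swap, KNOWN_VERSIONS[version]
    raise ValueError("unknown image format")
```

This only works because a known version number, read with the wrong byte order, is not itself a known version number.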

  • 1st field - version: is this a 32-bit or a 64-bit image? This first field is also used to determine the endianness of the image. Its encoding is not symmetric under byte swapping (read with the wrong byte order, it is not a valid version number), so you can detect whether you have to swap bytes to read the file properly. This information is used to read the next fields.
  • 2nd field - imageHeaderSize: Size of the header
  • 3rd field - dataSize: Size of the memory dump part
  • 4th field - oldBaseAddress: Base address of the memory when the image was saved. Addresses are absolute positions in memory, and when you load the image the operating system might not give you the same memory area, so you have to move all the addresses accordingly.
  • 5th field - initial SpecialObjectArray: Pointer to the special objects array needed by the VM, which contains objects such as nil, false and true
  • 6th field - headerLastHash: Last hash number assigned to an object. Each object gets a hash (in Spur, the first time its hash is requested rather than at allocation), and the last hash is used to generate the next one. So this field keeps track of the last one used.
  • 7th field: currently unused (it used to store the screen size)
  • 8th field - flags: These flags are passed to the interpreter (see setImageHeaderFlagsFrom:). They specify whether Floats are big- or little-endian, and how processes of the same priority are handled (preemptionYields). See Processes.
  • 9th field - extraVMMemory: A parameter used by the interpreter when loading the image. It specifies how much headroom the stack interpreter adds to the old space, which leaves room for the image to grow without asking the system for more memory right after loading.
  • 10th field - headerNumberOfStackPages: The stack is organized in pages, each made of stack frames, and this field sets how many pages we want. There should be more pages than processes, because each running process should have its own active page; otherwise pages have to be reclaimed and reused, which is very costly for the VM.
  • 11th field - headerCogCodeSize: The space used to hold compiled machine code. Too little space means too much (re)compilation. If the image sets this parameter to 0, the VM uses its default.
  • 12th field - headerEdenBytes: Size of eden, the part of new space where new objects are allocated
  • 13th field - maximum size of the external semaphore table: Semaphores are normal objects. They are used when the VM wants to signal something to the image. One problem is that a Semaphore object inside the image may move in memory. Instead of pointing at it directly, the VM code refers to this external semaphore table, which does not move. The table has a limited size, set by this parameter, so the number of files, sockets, ... that can signal the image is limited too. Increase this parameter if an application needs a lot of semaphores. The image must be reloaded after changing the parameter; alternatively, the value can be overridden on the command line (but then it is not stored in the header).
  • 14th field - imageVersion (new!): The version of Pharo inside the image, for example P11 or P12
  • 15th field - firstSegSize: This stores the size of the first segment of old space memory.
  • 16th field - freeOldSpaceInImage: Amount of free space inside the image (holes inside the currently used memory).
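The indirection behind the external semaphore table (13th field) can be sketched as a simplified Python model with hypothetical class names. The VM side keeps only an integer index, so it does not matter where the Semaphore object currently lives, and the table's fixed size plays the role of the header parameter.

```python
class Semaphore:
    """Toy stand-in for Pharo's Semaphore: it only counts signals."""

    def __init__(self) -> None:
        self.excess_signals = 0

    def signal(self) -> None:
        self.excess_signals += 1


class ExternalSemaphoreTable:
    """Fixed-size table; the VM side only ever holds an integer index."""

    def __init__(self, max_size: int) -> None:
        # max_size plays the role of the 13th header field.
        self.slots: list = [None] * max_size

    def register(self, semaphore: Semaphore) -> int:
        """Store the semaphore in the first free slot and return its index."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = semaphore
                return index
        raise RuntimeError("external semaphore table is full")

    def unregister(self, index: int) -> None:
        self.slots[index] = None

    def signal(self, index: int) -> None:
        # The VM side (e.g. a socket or file plugin) only knows the index;
        # the lookup finds the semaphore wherever it currently is.
        self.slots[index].signal()
```

Once every slot is taken, `register` fails, which is the limit that raising the header parameter lifts.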

Some of these fields are historical.
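To see why the 10th field (headerNumberOfStackPages) matters, here is a simplified Python model of processes sharing stack pages. It assumes that when no page is free, the least recently used page is reclaimed; each reclaim stands for the costly work of moving a process's frames out of its page. The function name and the round-robin schedule are hypothetical.

```python
from collections import OrderedDict


def count_page_evictions(num_pages: int, schedule: list[str]) -> int:
    """Simplified model: a process that runs needs its own stack page.
    When no page is free, the least recently used page is reclaimed."""
    pages: OrderedDict[str, bool] = OrderedDict()  # least -> most recently used
    evictions = 0
    for process in schedule:
        if process in pages:
            pages.move_to_end(process)  # already has a page: just mark it as used
            continue
        if len(pages) == num_pages:
            pages.popitem(last=False)   # reclaim the least recently used page
            evictions += 1
        pages[process] = True
    return evictions
```

With three processes taking turns, `count_page_evictions(3, ["a", "b", "c"] * 4)` returns 0, while `count_page_evictions(2, ["a", "b", "c"] * 4)` returns 10: one page short, and almost every process switch pays for a reclaim.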
