Intermediate Levels

Manifold is often used to display very large, at times truly immense, images.


Like most technologies for displaying very large images, Manifold uses intermediate level images, also known as pyramids, to allow much faster display of images, including zooming in or out and panning. Intermediate levels may be explained as follows.




Consider a very large image, shown above in greatly reduced form. The image has one meter resolution (each pixel covers one meter on the ground) and covers the region of the Montara Mountain USGS quad in the San Francisco Bay area.  The image shows a region of the San Francisco peninsula ranging from San Francisco International Airport, the light region in the upper right corner, down to the harbor at Half Moon Bay at the lower left, where the dark crescent of the harbor is just barely visible.


The image is 11119 x 13929 pixels in size, which requires over 700 megabytes of storage space in most image storage formats.  By modern standards it is not a particularly big image: modern images can easily be tens of gigabytes in size.




If we zoom all the way into this image in the vicinity of the San Francisco airport, so that each image pixel occupies one pixel on the computer monitor, we can see that the image is detailed enough that individual automobiles are easily seen.


Computer monitors usually have resolutions of about 72 pixels per inch. Even though that provides millions of pixels on a large monitor, if displayed at 72 pixels per inch the example image at 11119 x 13929 pixels would require a computer monitor over 12 feet wide (about four meters) and over 16 feet tall (almost five meters) to display the entire image at full resolution.
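The monitor arithmetic above can be checked with a quick sketch; the 72 pixels per inch figure and the image dimensions are taken from the text:

```python
# Physical size of the example image at full resolution, assuming
# a monitor resolution of 72 pixels per inch as stated in the text.
WIDTH_PX, HEIGHT_PX = 11119, 13929
PPI = 72  # pixels per inch

width_in = WIDTH_PX / PPI    # ~154 inches
height_in = HEIGHT_PX / PPI  # ~193 inches

width_ft, height_ft = width_in / 12, height_in / 12
width_m, height_m = width_in * 0.0254, height_in * 0.0254

print(f"{width_ft:.1f} ft x {height_ft:.1f} ft")  # 12.9 ft x 16.1 ft
print(f"{width_m:.1f} m x {height_m:.1f} m")      # 3.9 m x 4.9 m
```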


Obviously, therefore, we will never see this entire image at full resolution on our computer monitor. We will only see parts of the image at full resolution when we zoom in to specific locations. If we zoom far enough out so that the entire image fits, we will not be looking at the image's pixels; instead, we will be looking at an interpolation of the image that averages out millions of pixels per square inch to show us an approximate visual representation of what the real image might look like when viewed at greatly reduced scale.




For example, the image above is roughly 1/100th of the size in width and height of the image seen at full resolution. In rough approximation, instead of being 154 inches wide it is only approximately 1.5 inches wide. Therefore, each pixel in the above image represents all of the pixels in a 100 x 100 region of the full resolution image, that is, 10,000 pixels in the full sized image.  At 72 pixels per inch, each square inch of the above greatly reduced view interpolates almost 52 million pixels of the full image. Just one square inch of the reduced view therefore summarizes tens of millions of pixel values.
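As a quick sketch of that arithmetic, using the 1/100 reduction and the 72 pixels per inch figure from the text:

```python
# Each screen pixel in the reduced view stands in for a 100 x 100 block
# of the full image, so one square inch of the reduced view (at 72
# pixels per inch) summarizes 72 * 72 * 10,000 full-resolution pixels.
reduction = 100                      # linear reduction factor
ppi = 72                             # monitor pixels per inch
full_px_per_view_px = reduction**2   # 10,000 full pixels per screen pixel
full_px_per_sq_inch = ppi * ppi * full_px_per_view_px

print(f"{full_px_per_sq_inch:,}")  # 51,840,000 -- almost 52 million
```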


To calculate what the full image would look like at the above greatly reduced view requires computations on hundreds of millions of pixels. That can take a lot of time even on a fast computer so it is no surprise that truly huge images might appear to be slow whenever we zoom in or out or pan them. Each such step can require a lot of computational power and processor time to compute the necessary view by interpolating millions or hundreds of millions or even many billions of pixels.


The trick to speeding up that process is to make those calculations in advance, at least partially, and to store the results. If we zoom out from the full image so that the entire thing fits on the screen, as in the example above, the resulting view requires only a few tens of thousands of pixels.  That zoomed out, interpolated view is insignificantly small compared to the hundreds of megabytes of the full sized image. We may as well make the computation just once (perhaps when we store or export the image) and then save the result for use whenever anyone wants to see a zoomed out view of the image.


If someone wants to see a zoomed out view, instead of repeating the massive calculation we can simply fetch the pre-built view and display it. Since computer monitors show relatively small numbers of pixels (the number of pixels on even large, high resolution computer monitors is but a tiny fraction of the size of really big images), we can fetch pre-built images to fill up a computer monitor virtually instantaneously. That's a lot faster than computing such zoomed out views on the fly.


The whole idea of intermediate image levels, therefore, is that when a very large image is first created or stored the software automatically computes views of that image at various zoom levels and stores those views along with the image. Programs that display the image can then use those pre-computed, stored views whenever anyone wants to zoom in or out to see different parts of the image at different zoom levels.  A more sophisticated approach, used by Manifold, includes other tricks such as pre-computing indexes and other useful information that speed up the rendering of images.
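As an illustration of the basic idea, here is a minimal sketch of pyramid construction. This is not Manifold's actual implementation (which also pre-computes indexes and other information); it simply downsamples a grayscale image repeatedly by averaging 2 x 2 pixel blocks:

```python
# Minimal pyramid-building sketch: each intermediate level halves the
# width and height of the previous one by averaging 2 x 2 pixel blocks.

def halve(image):
    """Downsample a grayscale image (a list of rows) by averaging 2x2 blocks."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) // 4
            for x in range(w // 2)
        ]
        for y in range(h // 2)
    ]

def build_pyramid(image, min_size=1):
    """Return the full image plus every intermediate level down to min_size."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(halve(levels[-1]))
    return levels

# Usage: a tiny 4x4 image produces levels of 4x4, 2x2 and 1x1 pixels.
tiny = [[(x + y) * 10 for x in range(4)] for y in range(4)]
pyramid = build_pyramid(tiny)
print([len(level) for level in pyramid])  # [4, 2, 1]
```

Real implementations typically also split each level into tiles so that only the tiles covering the visible region need to be fetched.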




In the case of our full resolution example, as seen above, Manifold might also compute views at regular intervals, such as zoomed out to twice the scale, four times the scale, and so on.




The image above shows the same region at lower resolution (1/4th the zoom) so that only 1/16th as many pixels are used to cover the same region. A series of images at this intermediate zoom level covering the entire image could be stored with the full sized image.




Zooming even further out requires only 1/64th as many pixels to cover the same region. Another way of saying the same thing is that when zoomed out like this we can use the same number of pixels to show a region 64 times larger.
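The shrinking pixel counts can be sketched for the example image; the image dimensions are from the text, and the per-level factor of 4 follows from halving each dimension:

```python
# Zooming out by a factor of 2 in each dimension divides the pixel count
# by 4, so successive levels need 1/4, 1/16 and 1/64 as many pixels as
# the full image.
full_pixels = 11119 * 13929   # the example image: ~155 million pixels

for level in range(1, 4):
    factor = 4 ** level
    print(f"level {level}: 1/{factor} of the pixels "
          f"({full_pixels // factor:,} pixels)")
```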


Obviously, computing intermediate levels of zoom and saving extra images takes time. Computing and storing many intermediate levels also increases the storage required for images, because in addition to the image itself there must be space for the intermediate level images that are created.


Different software packages implement intermediate levels in different ways, with different methods of deciding what is a reasonable number of views and how to store them, but the basic idea is the same: compute some reasonable number of intermediate views and save them so that later display of the image at different zoom levels will be very fast.
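As a hypothetical sketch of the display side, a viewer might choose which stored level to fetch like this; the function name and the zoom convention are illustrative assumptions, not any particular product's API:

```python
import math

def pick_level(zoom, num_levels):
    """Pick the stored pyramid level for a requested zoom.

    zoom is screen pixels per image pixel: 1.0 means full resolution,
    0.25 means zoomed out to 1/4 scale.  Level 0 is the full image;
    each deeper level is half the scale of the previous one.  We pick
    the deepest level that still has at least one stored pixel per
    screen pixel, clamped to the levels that actually exist.
    """
    if zoom >= 1.0:
        return 0
    level = int(math.floor(math.log2(1.0 / zoom)))
    return min(level, num_levels - 1)

print(pick_level(1.0, 6))    # 0: full resolution
print(pick_level(0.25, 6))   # 2: the 1/4-scale level
print(pick_level(0.01, 6))   # 5: deepest stored level (clamped)
```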


It turns out that with most such technologies it is not a great burden to take the time to compute intermediate levels in advance. That's because such computation usually is done when an image is being stored or exported in some process that already will take a significant amount of time. Delays in such cases are not as unpleasant as delays when we zoom in with our mouse and are waiting expectantly for something to happen right away with the image.


In addition, such delays when creating or exporting the image are one-time delays: after the intermediate levels are created once, they never again have to be created. We might have a slightly slower, one-time creation process, but then every time we view the image the display will go much faster.


The need for extra storage space also is not usually a burden, since most people would happily see an image increase in storage space by, say, 50% if thereafter display was virtually instantaneous instead of taking minutes for each change in zoom.  Disk storage space is virtually free, while time spent at a keyboard tapping our fingers waiting for a zoom to happen is priceless.
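For a factor-of-2 pyramid the extra storage can be estimated exactly: each level holds 1/4 as many pixels as the one before, so the overhead is a geometric series that converges to 1/3 of the original size, comfortably under the 50% figure cited above as acceptable:

```python
# Extra storage for a factor-of-2 pyramid: the levels hold 1/4, 1/16,
# 1/64, ... of the original pixel count, a geometric series whose sum
# converges to 1/3 -- roughly a 33% increase over the original image.
overhead = sum(0.25 ** k for k in range(1, 20))
print(f"{overhead:.4f}")  # 0.3333
```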


Why are intermediate level images also called pyramids?




If we imagine a sequence of such images stacked up, we see that the reduced intermediate levels form a conceptual pyramid of sorts, in that "higher" images are smaller than "lower" ones. In addition, when designing such software the data structures used for storage often are diagrammed using hierarchical diagrams that look like pyramids, hence the name.


See Also