from SIR_CreateCuts.CutoutCollection import CutoutCollection
Upon construction, a CutoutCollection object does not actually load the contents of the underlying HDF5 files, so construction occurs very quickly:
%time cutouts = CutoutCollection('cutouts.json')
Getting a list of cutout IDs does require a bit of reading, but not much:
%time ids = cutouts.get_ids()
Asking for the same result again does not require the file to be accessed, because the IDs are now stored within the cutouts object:
%time ids = cutouts.get_ids()
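The lazy-construction and ID-caching behaviour described above can be illustrated with a minimal sketch. This is not the actual CutoutCollection implementation: the class name LazyCollection, the reads counter, and the JSON layout with an "ids" key are all assumptions made up for the illustration.

```python
import json
import os
import tempfile


class LazyCollection:
    """Hypothetical stand-in for illustration; not the real CutoutCollection."""

    def __init__(self, path):
        # Only remember where the data lives; nothing is read yet.
        self.path = path
        self._ids = None
        self.reads = 0  # counts how often the file is actually opened

    def get_ids(self):
        # Read the file on the first request, then reuse the stored list.
        if self._ids is None:
            with open(self.path) as fh:
                self._ids = json.load(fh)["ids"]
            self.reads += 1
        return self._ids


with tempfile.TemporaryDirectory() as tmp:
    # Made-up index file with three made-up IDs.
    path = os.path.join(tmp, "index.json")
    with open(path, "w") as fh:
        json.dump({"ids": [101, 102, 103]}, fh)

    collection = LazyCollection(path)
    reads_after_init = collection.reads       # construction touched no file
    first = collection.get_ids()              # triggers the single read
    second = collection.get_ids()             # served from memory
    reads_after_two_calls = collection.reads
```

In this sketch the second call returns the very same list object as the first, which is why it costs essentially nothing.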
Fetching the total flux of each cutout takes a bit longer than getting the IDs:
%time fluxes = [cutout.flux for cutout in cutouts]
When the same task is performed again, there is no disk access overhead because the results are cached within the cutouts object:
%time fluxes = [cutout.flux for cutout in cutouts]
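One common way to get this kind of per-object caching in Python is functools.cached_property, which computes an attribute on first access and stores the result on the instance. Whether CutoutCollection uses this mechanism is not stated here; the FakeCutout class and its computations counter below are hypothetical and only illustrate the pattern.

```python
from functools import cached_property


class FakeCutout:
    """Hypothetical cutout; the real class reads its data from HDF5."""

    def __init__(self, pixels):
        self._pixels = pixels
        self.computations = 0  # counts how often flux is actually computed

    @cached_property
    def flux(self):
        # Runs once per instance; later accesses return the stored value.
        self.computations += 1
        return sum(sum(row) for row in self._pixels)


cutout = FakeCutout([[1, 2], [3, 4]])
first_flux = cutout.flux    # computed here
second_flux = cutout.flux   # served from the instance cache
```

The trade-off is memory: every cached value stays alive as long as its cutout object does.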
That may seem like a long time, but note that there are tens of thousands of objects in the collection:
len(fluxes)
Let's see how long it takes to fetch an object's pixel array...
%time image = cutouts[455954280235691009].pixels
Now, let's try that a second time, with the same object:
%time image = cutouts[455954280235691009].pixels
Let's load ALL of the pixel arrays into a list:
%time images = [cutout.pixels for cutout in cutouts]
You may notice that this operation was faster than loading the fluxes. This is likely due to the operating system's file caching (this was the second time that the HDF5 files were traversed). The caching built into the CutoutCollection vastly improves upon this, though. When the pixel arrays are requested a second time, the operation takes only a few milliseconds:
%time images = [cutout.pixels for cutout in cutouts]
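The %time magic is only available inside IPython and Jupyter. A similar cold-versus-warm comparison can be approximated in a plain Python script with time.perf_counter. The sketch below is an assumption-laden illustration: load_pixels is a hypothetical loader, and its time.sleep call merely stands in for a disk read.

```python
import time


def timed(func):
    """Return (result, elapsed_seconds) for a single call of func."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


_cache = {}


def load_pixels(cutout_id):
    # Hypothetical loader: the sleep imitates the cost of reading from disk.
    if cutout_id not in _cache:
        time.sleep(0.01)
        _cache[cutout_id] = [[0.0] * 4 for _ in range(4)]
    return _cache[cutout_id]


first, cold_seconds = timed(lambda: load_pixels(1))   # pays the simulated read
second, warm_seconds = timed(lambda: load_pixels(1))  # hits the in-memory cache
print(f"cold: {cold_seconds:.4f} s, warm: {warm_seconds:.6f} s")
```

A single measurement like this is noisy; for anything beyond a rough comparison, the standard library's timeit module repeats the call and is the better tool.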
Since the CutoutCollection has a custom __contains__() method, we can test whether a particular ID is in the collection, like this:
477657999814785735 in cutouts
477657999814785734 in cutouts
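To show how the in operator, indexing by ID, and iteration can coexist on one container, here is a minimal sketch. The class IdIndexedCollection, its dictionary storage, and the IDs 1001 and 1002 are made up for the illustration and say nothing about how CutoutCollection is actually implemented.

```python
class IdIndexedCollection:
    """Hypothetical container keyed by ID; not the real CutoutCollection."""

    def __init__(self, items):
        self._items = dict(items)

    def __contains__(self, item_id):
        # Called for "item_id in collection"; a dict lookup, no iteration.
        return item_id in self._items

    def __getitem__(self, item_id):
        # Called for "collection[item_id]".
        return self._items[item_id]

    def __iter__(self):
        # Iteration yields the stored objects, not their IDs.
        return iter(self._items.values())

    def __len__(self):
        return len(self._items)


collection = IdIndexedCollection({1001: "cutout A", 1002: "cutout B"})
present = 1001 in collection
absent = 9999 in collection
names = list(collection)
```

Without a __contains__() method, Python falls back to iterating over the container and comparing each yielded element. Because iteration here yields objects rather than IDs, that fallback would never match an ID, which is one reason a container like this needs its own __contains__().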