from SIR_CreateCuts.CutoutCollection import CutoutCollection

Upon construction, a CutoutCollection object does not actually load the contents of the underlying HDF5 files, so construction occurs very quickly:

%time cutouts = CutoutCollection('cutouts.json')

CPU times: user 1.21 ms, sys: 1.03 ms, total: 2.24 ms
Wall time: 2.43 ms
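
The %time prefix used throughout is an IPython magic, so it only works inside IPython or a notebook. As a rough sketch of how a comparable wall-clock measurement could be taken in a plain Python script, the helper below uses time.perf_counter from the standard library. The name time_call is hypothetical, and the timed function is a stand-in rather than a CutoutCollection call.

```python
import time


def time_call(func):
    """Run func() once and return (result, elapsed wall time in seconds)."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in practice this would be something like
# lambda: CutoutCollection('cutouts.json')
result, elapsed = time_call(lambda: sum(range(1000)))
print(f"Wall time: {elapsed * 1e3:.3f} ms")
```

Unlike %time, this reports wall time only and does not break CPU time down into user and sys.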

Getting a list of cutout IDs requires some reading from disk, but not much:

%time ids = cutouts.get_ids()

CPU times: user 1.34 s, sys: 181 ms, total: 1.53 s
Wall time: 1.53 s

Asking for the same result again does not require the file to be accessed, because the IDs are now stored within the cutouts object:

%time ids = cutouts.get_ids()

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 12.9 µs
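
The pattern behind these timings (cheap construction, one slow read, then near-instant repeats) can be illustrated with a minimal sketch. This is not the actual CutoutCollection implementation. The class LazyIdCollection and the function fake_read_ids are hypothetical, and the "read" is simulated in memory instead of touching HDF5 files.

```python
class LazyIdCollection:
    """Hypothetical stand-in: defers reading IDs until they are requested."""

    def __init__(self, loader):
        # Only remember how to load; nothing is read at construction time.
        self._loader = loader
        self._ids = None

    def get_ids(self):
        # Read on the first call, then reuse the stored list on later calls.
        if self._ids is None:
            self._ids = list(self._loader())
        return self._ids


read_count = 0


def fake_read_ids():
    """Pretend to scan files for IDs; counts how many times it is invoked."""
    global read_count
    read_count += 1
    return [101, 102, 103]


collection = LazyIdCollection(fake_read_ids)
reads_after_construction = read_count   # no read has happened yet
first = collection.get_ids()            # triggers the single read
second = collection.get_ids()           # served from the stored list
reads_after_two_calls = read_count
```

In this sketch the second call returns the very same list object as the first, so no loader work is repeated.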

Fetching the total flux of each cutout takes a bit longer than getting the IDs:

%time fluxes = [cutout.flux for cutout in cutouts]

CPU times: user 10.3 s, sys: 417 ms, total: 10.7 s
Wall time: 10.7 s

When the same task is performed again, there is no disk access overhead because the results are cached within the cutouts object:

%time fluxes = [cutout.flux for cutout in cutouts]

CPU times: user 22.3 ms, sys: 4.65 ms, total: 27 ms
Wall time: 28.7 ms
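
One common way to get this kind of per-object caching in Python is functools.cached_property, which stores the computed value on the instance after the first access. The sketch below assumes nothing about how the real cutout objects are written. FakeCutout and its read_flux argument are hypothetical names used only for illustration.

```python
from functools import cached_property


class FakeCutout:
    """Hypothetical cutout whose flux is read once and then kept in memory."""

    def __init__(self, read_flux):
        self._read_flux = read_flux  # callable standing in for a disk read
        self.read_calls = 0

    @cached_property
    def flux(self):
        # Runs only on first access; the result is then stored on the instance.
        self.read_calls += 1
        return self._read_flux()


cutout = FakeCutout(lambda: 42.5)
first = cutout.flux    # performs the simulated read
second = cutout.flux   # returned from the instance cache
```

The trade-off is memory: every cached value stays alive for as long as its owning object does.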

That may seem like a long time, but note that there are tens of thousands of objects in the collection:

len(fluxes)

33469

Let's see how long it takes to fetch an object's pixel array...

%time image = cutouts[455954280235691009].pixels

CPU times: user 931 µs, sys: 0 ns, total: 931 µs
Wall time: 855 µs

Now, let's try that a second time, with the same object:

%time image = cutouts[455954280235691009].pixels

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 25.3 µs

Let's load ALL of the pixel arrays into a list:

%time images = [cutout.pixels for cutout in cutouts]

CPU times: user 6.56 s, sys: 1.86 s, total: 8.43 s
Wall time: 8.44 s

You may notice that this operation was faster than loading the fluxes. This is likely due to the operating system's file caching (this was the second time that the HDF5 files were traversed). The caching built into the CutoutCollection improves on this considerably, though. When the pixel arrays are requested for the second time, the operation takes only a few tens of milliseconds:

%time images = [cutout.pixels for cutout in cutouts]

CPU times: user 41 ms, sys: 0 ns, total: 41 ms
Wall time: 38.9 ms

Since the CutoutCollection has a custom __contains__() method, we can test whether a particular ID is in the collection, like this:

477657999814785735 in cutouts

True

477657999814785734 in cutouts

False
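
For readers unfamiliar with how the in operator, indexing by ID, iteration, and len() are wired up, here is a minimal sketch of the relevant special methods. It is an illustration under assumptions, not the CutoutCollection source. IdIndexedCollection is a hypothetical class backed by a plain dict, and the stored value is a placeholder string.

```python
class IdIndexedCollection:
    """Hypothetical container keyed by integer ID."""

    def __init__(self, items):
        self._items = dict(items)

    def __contains__(self, item_id):
        # Called for `item_id in collection`.
        return item_id in self._items

    def __getitem__(self, item_id):
        # Called for `collection[item_id]`.
        return self._items[item_id]

    def __iter__(self):
        # Called for `for item in collection`; yields the stored objects.
        return iter(self._items.values())

    def __len__(self):
        # Called for `len(collection)`.
        return len(self._items)


demo = IdIndexedCollection({477657999814785735: "cutout A"})
present = 477657999814785735 in demo
absent = 477657999814785734 in demo
```

Without __contains__, Python would fall back to iterating over the collection and comparing each yielded object with the ID, which is slower and would not match on IDs at all if iteration yields cutout objects.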