With my prospectus completed and approved, I was ready to begin my actual research project at Brigham Young University. I am here on a Research Experiences for Teachers (RET) program funded through the National Science Foundation (Grant # PHY1157078). I’m working with Dr. Eric Hintz to study high-mass X-ray binaries, among other things, in several open star clusters. In my last post I outlined this background research. Not all of it will be relevant to my final analysis, but that’s something you don’t know when you start out.
During my second week I moved on to the next step: learning the software and processes I would need to successfully analyze the star data. Wait a second, you say: why am I jumping right to analysis when I haven’t collected any data yet? What we have is a science Catch-22. You can’t know how best to collect data until you know how that data will be analyzed, but you can’t analyze the data until you’ve collected it. The answer is to learn how to analyze data using a data set that someone else has already collected before you try to collect your own. That way you can check to make sure you know what you’re doing, then apply that process knowledge to planning your research methodology.
Perhaps a personal story will illustrate what I mean. When I was in my undergraduate program at BYU, I was studying psychology with a minor in political science. I wanted to do some independent research even then, and got some other students involved with me. We arranged to work with a professor in the political science department to earn some independent political science credit. We proposed to study the attitudes of members of the Ute Tribe on the Uintah and Ouray Reservation toward tribal self-government (one of us was a tribal member). We created a questionnaire and selected respondents at random from the tribal rolls. One respondent, for example, was in jail in Fort Duchesne at the time, and I had to go in and interview him there. We compiled all the data, ran statistical tests, wrote up the report, and got our credit without too much trouble.
To get rides out to Fort Duchesne, we also worked with the Multicultural Education Department at BYU, which had another project going on at the same time. They had created an extensive questionnaire comparing white students with Native American students in the area’s elementary, middle, and high schools. There was an extremely high dropout rate among Native American students once they reached high school, even though both groups were evenly matched through most of elementary school. The questionnaire asked about their attitudes toward education, their sense of self-confidence, their support systems, and so on. We gave the questionnaire to hundreds of students (with the full cooperation of the school district), and I helped administer it on one of my trips out there.
When we collected all the questionnaires, they formed a stack literally four feet tall. It was then we realized we had a big problem: the students who had designed the questionnaire had never considered how the data would be recorded and analyzed. It would have taken a small army of flunkies to record the data and enter it into a computer program. And this was in the early 1980s, before spreadsheets were readily available; Lotus 1-2-3 hadn’t been invented yet, let alone Excel. So by the end of the semester, no data reduction had been done, and the questionnaires sat in a pile in a corner of our professor’s office. To this day, I don’t know if the study was ever completed.
Moral of the story: When dealing with a mountain of data (such as looking at hundreds of stars in an open cluster over several nights with different filters), it’s essential to know what the data will be like and how to manage it all before collecting it in the first place. Every NASA space probe mission has to plan its data pipeline carefully, including how the instruments on board will store and transmit the data back to Earth, how that data will be collected and recorded and archived here, and how it will be reduced and analyzed. Then and only then do you start designing the instruments. You’ve got to know the end from the beginning.
The only type of data you have to work with in astronomy is light. It can be measured directly, filtered, run through a spectrometer, examined across the entire EM spectrum, and compared over time. I’m amazed at how much we can learn just from the light coming from a star. Much of what we do is measure the intensity of that light at its various wavelengths. This is called photometry, literally “measuring light.”
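To make “measuring light” a little more concrete: astronomers convert measured brightness into the magnitude scale through Pogson’s relation, under which a difference of five magnitudes is defined as a factor of exactly 100 in flux. Here is a minimal sketch in Python (the function name is mine, just for illustration):

```python
import math

def mag_difference(flux1, flux2):
    """Pogson's relation: magnitude of star 1 minus star 2,
    given their measured fluxes (brighter = more negative)."""
    return -2.5 * math.log10(flux1 / flux2)

# A star delivering 100 times the flux of another is exactly
# 5 magnitudes brighter (i.e., its magnitude is 5 lower).
print(mag_difference(100.0, 1.0))  # -5.0
```

The logarithm is why each magnitude step corresponds to a flux ratio of about 2.512, the fifth root of 100.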
The software for photometry (and much else) used by most astronomers is called IRAF (Image Reduction and Analysis Facility), developed by the National Optical Astronomy Observatory (NOAO). It is powerful and handles both image reduction and analysis; I’ll talk about the reduction end in this post and the analysis end in the next. It is also essentially open source, so astronomers can program their own add-ons and tweaks. But it is a pain to use, because it was first developed back before GUI operating systems existed, and to this day it still uses a command line interface. The learning curve is steep and painful, and I must at least partially master it before I can attempt my own astronomical research.
Whenever you collect data for a science project, the data has to be converted into some format that makes sense. This could be as easy as creating a Likert scale in a questionnaire and recording the numbers chosen. But it makes a difference whether the scale has limited, discrete choices (such as only being able to select 1, 2, 3, 4, or 5) or a continuous range (where people could choose 3.7, for example), and it takes experience to know how to set this all up. Once the questionnaires are finished, the data must be entered into a program such as MS Excel, where analysis and comparisons can be made. But can you imagine having to measure and type in all the parameters for a star (its right ascension and declination, its magnitude, its stellar class, and so on) for an image field of hundreds of stars, for each observation and each filter, with several runs per night over many nights? They used to do it that way: the Bonner Durchmusterung (Bonn Survey) cataloged hundreds of thousands of stars in the northern hemisphere in the 1850s without even the use of photography. Now we use IRAF.
But for IRAF solutions to have any meaning, the data must be calibrated before it can be analyzed. Nearly any recent photograph of the sky is taken with electronic sensors called CCDs (Charge Coupled Devices), which were developed at Bell Labs, quickly adopted for astronomy and the space program, and are now found in every cell phone and digital camera. A CCD acts as a grid of photon traps: when a photon from a star hits a sensor pixel, it frees an electron in the silicon, which accumulates in that pixel until readout, when it is shifted to a register and converted to a digital number. By reading the numbers stored in all the pixels, a grid of digital data is built up representing the brightness of the image at each location. We call this a bitmap or a raster. Since I teach computer graphics, I’m very familiar with this aspect of astronomy and photography in general.
Now, every CCD has biases that affect the accuracy of the pixel data. First, when you read out the data, not every electron is successfully pulled out of the photon traps; some get stuck, and you have to account for them by subtracting these trapped electrons from your final image. Second, the electronics of the camera create a background hum of noise that must also be subtracted out. These effects are not very important when taking a regular photograph, where there is so much light (signal) compared to the background noise. But in astronomy, where you leave the shutter open for minutes or even hours (days, in the case of the Hubble Deep Field) and try to trap every photon that hits the sensor, these effects are very noticeable. Finally, the individual sensor pixels in a CCD do not all have the same sensitivity: one pixel may trap 90% of the photons that hit it, whereas another (especially near the edges) may trap only 60%. You have to zero out this sensitivity bias as well.
An analogy would be cleaning up a photograph taken indoors under dim fluorescent lighting: you’ve got to improve the photo’s brightness and adjust the color bias away from yellow toward blue. Likewise, astronomers have to remove the biases caused by trapped electrons, electronic noise, and uneven sensor sensitivity. The cleaned-up images have been “reduced” and are ready for analysis.
The Master Zeros, Darks, and Flats:
I know it sounds like something out of a Dungeons and Dragons game, but these are terms familiar to any professional or aspiring amateur astronomer. To get rid of the biases, calibration frames must be recorded, averaged, and digitally removed (subtracted or divided out) from the original raw images. What remains is the actual light from the sky, affected only by “seeing” (visibility and air conditions).
To get rid of trapped electrons, the camera on the telescope takes an exposure with the cap on at zero time – basically instantaneously. Since the image should be completely black, any counts above zero are electrons trapped in the pixels. To get rid of the electronic hum (called dark current), images are taken with the cap on for 60 or so seconds. Compared with the zeros, any extra electrons showing up were built up by the surrounding electronics. In both cases, ten or so images are taken and the results averaged to create a master zero and a master dark, which are applied to all the images taken on a particular night.
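In software terms, building the masters is just a pixel-by-pixel combine across the stack of calibration frames. Here is a toy numpy sketch (this is not real IRAF code; the simulated numbers and variable names are all mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stacks of ten 100x100 calibration frames (in practice these
# would be read in from FITS files; the levels here are made up).
zero_frames = rng.normal(100.0, 5.0, size=(10, 100, 100))               # bias level
dark_frames = zero_frames + rng.normal(20.0, 2.0, size=(10, 100, 100))  # + dark current

# Median-combining rejects outliers (like cosmic-ray hits) better than a mean.
master_zero = np.median(zero_frames, axis=0)
master_dark = np.median(dark_frames, axis=0) - master_zero  # dark current alone
```

The median (rather than a straight average) is a common choice because a cosmic-ray hit in one frame then has no effect on the combined master.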
To get rid of sensitivity biases, astronomers take a series of ten or so images of an evenly lit, featureless field. Some photograph an evenly illuminated flat gray card taped to a wall; others point the telescope at the zenith at twilight to get a flat field. Since all the pixels should read the same number, any differences reveal each pixel’s relative sensitivity, which is then divided out of the science images. The result should be a nice, even image across the entire field of view of the telescope, provided there aren’t any high-level cirrus clouds or other “seeing” problems.
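Putting all three calibrations together, the standard reduction of a raw frame is: subtract the master zero and master dark, then divide by the normalized flat. A toy numpy sketch under simulated conditions (all names and numbers are mine, not IRAF’s) shows how the sensitivity pattern divides out:

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (100, 100)

# Stand-ins for the master calibration frames described above.
master_zero = np.full(shape, 100.0)   # trapped-electron (bias) level
master_dark = np.full(shape, 20.0)    # dark current over the exposure

# Each pixel traps a different fraction of its photons, say 60% to 100%.
sensitivity = rng.uniform(0.6, 1.0, size=shape)

# Flat frame: a uniform light source (10,000 counts) seen through that
# uneven sensitivity, plus the zero and dark signatures.
flat_raw = 10000.0 * sensitivity + master_zero + master_dark
flat = flat_raw - master_zero - master_dark
flat_normalized = flat / flat.mean()

# Raw science frame: a uniform patch of sky (500 counts) plus the biases.
raw = 500.0 * sensitivity + master_zero + master_dark

# The reduction itself: subtract zero and dark, divide by the normalized flat.
reduced = (raw - master_zero - master_dark) / flat_normalized
```

In this idealized, noiseless model the reduced frame comes out perfectly uniform, which is exactly the point: the pixel-to-pixel sensitivity differences have been divided away, leaving only the sky.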
IRAF is a command line program, but it works in tandem with DS9, an image display package developed by the Smithsonian Astrophysical Observatory (SAO). If you know the names of your individual files, IRAF will load them one at a time, create the masters, apply them to the .fits or .imh files, and save out a final reduced image for each frame taken on a given night or through a given filter; DS9 lets you display and inspect the results.
All of the data I will use for the first few weeks has already been reduced, so I won’t go through this process for a while. I researched it during this second week so I could understand it and get a better feel for what astronomers do. Now what I learned as a SOFIA Airborne Astronomy Ambassador makes more sense. The scientists on board talked frequently about data reduction and the “data pipeline.” For infrared astronomy the process is even harder, because IR detectors measure heat, and the heat of the sensor, the telescope, and the air all interfere with the image to create terrible noise. And that’s not even counting vibrational jitter. That’s why the telescope on SOFIA chops and nods: it is doing much of the heat-noise reduction right at the telescope. The CCD and spectrograph biases still have to be processed out later, and they are still working out the final bugs as SOFIA reaches full operational readiness.
The data we will use for NITARP is already reduced and digitized in the IPAC databases. I am really appreciating that for the first time. All we will have to do is analyze the data, which will be difficult enough.