When we set out to build the rear-facing camera for our phone, our primary goal was to avoid the telltale camera bump and integrate the camera seamlessly into the overall design. However, we were not willing to sacrifice image quality in low light, which is a common point of frustration for many people who rely on their phone’s camera. In a nifty bit of engineering, we were able to accomplish both goals. And now, as we ready Essential Phone for production, I want to give you a behind-the-scenes look at the camera’s components and the process through which we engineered our image capture for photography.
But first, I want to tell you a bit about myself. I am an image quality engineer with an MS in Color & Imaging and a PhD in Human Visual Perception. So you may be thinking that I am an ace photographer too, but that’s not the case; I’m far more focused on making sure images are rendered properly than I am on taking nice pictures. And as you’re about to see, when it comes to the art-meets-science of image rendering, there’s a lot to think about.
The Essential Phone camera is made up of two cameras that work in tandem with one another. The first rear camera is designed for color, and like most cameras, it places a red, green, or blue color filter over each pixel, so each pixel records a value for only one of the three colors. As a result, the camera must interpolate from neighboring pixels to produce the final image. What does this mean? Since each pixel captures only one color value, the camera must infer what the rest of the image should look like, and this often leads to less-than-ideal resolution. That’s why we made our second rear camera a true monochrome camera, which does not require any color filter. The lack of a color filter means that no interpolation is necessary: every pixel records a true luminance value, which enables the camera to produce images with much less noise and much higher resolution, no matter the lighting conditions.
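To make that interpolation concrete, here is a minimal demosaicing sketch in Python. It is my own illustration, not anything from our pipeline, and it assumes an RGGB Bayer layout with plain bilinear averaging; production demosaicing is far smarter about edges and textures.

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw):
    """Bilinear demosaic of a raw mosaic with an RGGB Bayer pattern.

    raw: 2D float array where each pixel holds only its own R, G, or B
    sample. Returns an (H, W, 3) RGB image with the two missing values
    at each pixel interpolated from neighbors.
    """
    h, w = raw.shape
    # Masks marking which color each pixel actually sampled.
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1
    g_mask = 1 - r_mask - b_mask

    # Classic bilinear kernels: green has four direct neighbors;
    # red/blue need a wider kernel because they are twice as sparse.
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

    r = convolve(raw * r_mask, k_rb)
    g = convolve(raw * g_mask, k_g)
    b = convolve(raw * b_mask, k_rb)
    return np.dstack([r, g, b])
```

Notice that two of the three channel values at every pixel are averages of neighbors, which is exactly where the resolution loss comes from.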
When taking a still picture, Essential Phone activates both cameras at once. The monochrome and color images are then fused to create a final photograph with rich, deep clarity.
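Our actual fusion algorithm also has to handle alignment, parallax, and occlusion between the two lenses, so treat the following as a toy sketch of the basic idea only: keep the chroma from the color capture and blend in the cleaner, sharper luminance from the monochrome capture. The blend weight and the assumption of pre-registered frames are mine.

```python
import numpy as np

def fuse_mono_color(rgb, mono, alpha=0.7):
    """Toy luminance fusion of a color and a monochrome capture.

    rgb:   (H, W, 3) float image in [0, 1] from the color camera
    mono:  (H, W) float image in [0, 1] from the monochrome camera,
           assumed already registered to the color frame
    alpha: weight given to the monochrome luma (illustrative value)
    """
    # BT.601 luma/chroma split.
    y  = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    cb = 0.564 * (rgb[..., 2] - y)
    cr = 0.713 * (rgb[..., 0] - y)

    y_fused = alpha * mono + (1 - alpha) * y  # mono carries the detail

    r = y_fused + 1.403 * cr
    g = y_fused - 0.344 * cb - 0.714 * cr
    b = y_fused + 1.773 * cb
    return np.clip(np.dstack([r, g, b]), 0.0, 1.0)
```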
All of this is a lead-up to the main thrust of my work as an image quality engineer. To convert raw sensor data into a final image, the camera must employ a complex Image Signal Processing (ISP) pipeline. Getting this right requires months of tuning, and it has been the focus of my work since October of last year.
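To give a feel for what “pipeline” means here, below is a drastically simplified toy ISP. Real pipelines run dozens of interacting stages in dedicated hardware; this sketch assumes 10-bit input that has already been demosaiced, and every default parameter in it is made up. The point is that each of these numbers is the kind of thing tuning has to decide.

```python
import numpy as np

def toy_isp(rgb_raw, black_level=64, wb_gains=(1.9, 1.0, 1.6), gamma=2.2):
    """A few classic ISP stages on linear 10-bit demosaiced data.

    Omitted (among many others): denoising, sharpening, color
    correction, tone mapping, and the dual-camera fusion step.
    """
    x = np.clip(rgb_raw.astype(np.float64) - black_level, 0, None)
    x /= 1023.0 - black_level                  # normalize to [0, 1]
    x *= np.asarray(wb_gains)                  # white balance
    x = np.clip(x, 0.0, 1.0) ** (1.0 / gamma)  # gamma encoding
    return (x * 255).astype(np.uint8)          # display-ready 8-bit image
```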
When you activate a typical smartphone camera, its sensors immediately evaluate your surroundings, and the ISP compensates accordingly by automatically adjusting different components like focus, exposure, white balance, lens shading correction, and more. Tuning this behavior happens in stages. The first, objective tuning, is meant to ensure that each camera module sent to production operates at an acceptable baseline level. For us, it began with picking the correct golden and limit samples from the factory.
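As one concrete example of the kind of component being adjusted, here is the classic gray-world heuristic for auto white balance. This is purely illustrative; production auto white balance is far more elaborate, and it is exactly the sort of behavior the tuning process calibrates.

```python
import numpy as np

def gray_world_awb_gains(rgb):
    """Gray-world auto white balance: assume the scene averages out to
    neutral gray and compute per-channel gains that make it so,
    normalized to the green channel.
    """
    means = rgb.reshape(-1, 3).mean(axis=0)  # average R, G, B
    return means[1] / means

# Usage: balanced = rgb * gray_world_awb_gains(rgb)
```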
The golden samples are the modules whose characteristics most closely align with the average of our camera modules, and therefore with the experience most of our users will have. Once golden samples were collected, we used them to capture a series of images under various laboratory-controlled test conditions. The images from the golden samples were then used to train the ISP to recognize the unique characteristics of those modules. In other words, we taught the ISP to see the world in a certain way. We also tested other limit and random samples, whose differing characteristics are saved in the factory calibration data, to ensure that they behave like the golden samples in those scenes too. The objective tuning process lasted three months. By the end, all of our cameras were responding to the predefined lab scenes in an accurate and predictable fashion.
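To illustrate the kind of per-module data factory calibration can hold, here is a hypothetical sketch of deriving a lens-shading correction map from a flat-field capture taken under uniform lab illumination. The smoothing value is arbitrary, and this is not a description of our actual calibration format.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def shading_correction_map(flat_field):
    """Per-pixel gain map from a flat-field capture.

    Lens shading makes corners darker than the center; smoothing the
    flat field estimates that falloff, and the reciprocal gains
    (largest in the corners) flatten it back out.
    """
    falloff = gaussian_filter(flat_field.astype(np.float64), sigma=16)
    return falloff.max() / falloff  # gains >= 1

# Usage: corrected = capture * shading_correction_map(flat_field)
```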
But even when a camera can repeat actions in a lab, it still needs to be taken into the field, because in real life a camera must be able to take the right picture in millions of different scenarios. Subjective tuning is what makes this possible. It is a painstaking, iterative process, but also one I find incredibly rewarding.
The key to subjective tuning is capturing all types of pictures in the wild, identifying systematic image quality problems, and adjusting the ISP settings to address them. In order to address an image problem, however, we have to analyze multiple pictures to determine the root cause. Let’s say one of the captured images looked soft; our thought process goes something like this: Is that softness a result of focus failure or camera shake? Or is it an issue we need to address through tuning? Was it observed in this one image only, or is it a recurring issue? And if so, is there a particular type of scene where it’s a problem? For example, does it appear only in a specific gain/lux range? Does it happen more in highlight, midtone, or shadow regions? Does it happen before or after dual-camera fusion? Does it apply only to fine textures with subtle contrasts, or also to well-defined edges? Asking these questions helps us narrow down which components in the ISP we should try to fix. And once a fix is identified, we must test it across all types of pictures again, to ensure that by fixing one thing we didn’t inadvertently break other kinds of images.
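A tiny example of the triage tooling this implies: bucket the captures flagged as soft by their analog gain, and a problem confined to a particular gain/lux range shows up as a spike in one bucket. The metadata field names here are hypothetical, not our actual review database.

```python
import math
from collections import Counter

def soft_capture_gain_buckets(captures):
    """Histogram of 'soft' captures by power-of-two analog gain bucket.

    captures: list of dicts with hypothetical keys, e.g.
              {"analog_gain": 7.3, "issues": {"soft"}}
    """
    buckets = Counter()
    for c in captures:
        if "soft" in c["issues"]:
            lo = 2 ** int(math.log2(c["analog_gain"]))  # e.g. 7.3 -> 4
            buckets[f"{lo}-{2 * lo}x"] += 1
    return buckets
```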
As you can see, the ISP is an extremely complex, interconnected system. The subjective tuning process requires a deep understanding of both the engineering and the artistic nuances of how an image is produced, and of how the different components interact with each other. Even the smallest tweak can have an impact on multiple components, so the work requires a great deal of patience and a willingness to iterate over and over to achieve the best possible image quality, especially when you’re tuning two cameras at once.
Our subjective tuning process began in January 2017, and since then we have gone through 15 major tuning iterations, along with countless smaller tuning patches and bug fixes. We have captured and reviewed more than 20,000 pictures and videos, and we are adding more to our database every day. We’re almost there, but I’m not going to stop tuning the camera on our phone until the last possible minute, to provide the best photographic experience possible.