As has become tradition for Ars at Google I/O, we recently sat down with some of the people who make Android to learn more about Google’s latest OS. For 2019, the talk was all about Android Q and this year’s big engineering effort, Project Mainline. Mainline’s goal is to enable Google (and sometimes OEMs!) to directly update core parts of the OS without pushing out a whole system update. If that sounds technical and challenging, well, it is.
This year running the Ars Android Interview Gauntlet we have veteran interviewee Dave Burke, VP of engineering for Android. As the head of Team Android, Burke is an encyclopedia of Android knowledge and always manages to come up with insightful answers to my grab bag of esoteric questions. And returning for the second year in a row is Iliyan Malchev, principal engineer at Android, the lead of Project Treble, and all-around Linux integration guru.
But to help up the ante for this latest deep dive, Ars was also joined by Anwar Ghuloum, Android’s senior director of engineering and the lead of Project Mainline. Ghuloum’s insight was especially welcome given this year’s I/O headliner: as “The Next Great Android Update Project,” Mainline was easily the biggest news to come out of the conference.
So, buckle up for a long Android Q(&A) if you will—but first, some background on Mainline.
Project Mainline: A “fundamental shift” in Android OS development
For years, we’ve seen Google continually work to chop Android up into more easily updatable pieces. Early on in Android’s life, the Google apps and core system apps were offloaded to the Android app store, allowing Google to pump out new user-facing features whenever it wanted. Google Play Services then took many developer APIs and offloaded those to the Android app store, allowing Google to pump out developer-facing API updates whenever it wanted. More recently Android 8.0 brought us Project Treble, which separated the OS from the hardware support, allowing for easier update development.
With Android Q, the big new modularization effort is “Project Mainline.” Along the same lines as Google’s early-days move to put apps in the Play Store, Mainline modularizes several core system components and moves those to the Play Store. Mainline goes deeper into the system than the surface-level apps, though—these are big chunks of system functionality like the media framework and ART, the Android RunTime.
Traditionally, the Play Store has distributed apps only in the form of APK files, but for many of the components being modularized in Project Mainline, they wouldn’t work if packaged up as an APK. Since the APK system was built for system and user-level apps, there are limitations for things like permissions and when they can turn on in the boot up process. For modularizing these core components, Google came up with something more powerful than an APK: the “APEX” file type. APEX files can have essentially root-level permissions, and they get to start up very early in the boot process, allowing Google (or your OEM) to update many more components. APK files are packages for system- and user-level apps, and APEX files are packages for core system components. This table shows the first batch of them in Android Q:
In the future, we’ll probably see Project Mainline modules grow to encompass more and more of the Android system. For this first Android Q release, though, Google chose to focus on three themes: “Consistency,” “Security,” and “Privacy.” Before our I/O interview, Google provided us with the above table of the Project Mainline components in Android Q, detailing which components are being modularized and what the recommendations are for OEMs. And that brought us to the first question.
What follows is a transcript, with some of the interview lightly edited for clarity. For a fuller perspective, we’ve also included some topical background comments in italics.
Ars: So I have this Project Mainline table, which details which components are recommended or not. How did you go about picking what is and is not mandatory?
Anwar Ghuloum, the head of Project Mainline: Ideally, we’d want everything to be mandatory. The way we worked on these modules was to talk to all our device manufacturers and say, “Hey, we’re doing this, work with us on it.” They upstreamed a bunch of code. They had a bunch of feature requests for things that they were beginning the process of working on, and, for those modules where we could actually meet all those requirements, we made those mandatory. For the modules where there are still gaps, we made them optional for this release, and for the next release they’ll be mandatory. So that gives us time to get to parity, because we don’t want to regress their device experience, but in pushing these modules, we want to make sure their stuff gets in.
Dave Burke, VP of engineering: I think part of this work is upstreaming with our partners. When I say partners, I’m talking about device makers. They add changes into the devices they build, and we want to get them all upstreamed to the mainline code branch, so we have consistency. It just takes time.
Ghuloum: Yeah, I mean, we’ve done a ton of upstreaming. It’s amazing. For some of these packages, we upstreamed more in the last year than we’ve upstreamed in the previous 10 years.
Burke: Yeah, it’s important.
I think part of the background here is that Project Mainline represents Google clawing back final ownership of some core system code from OEMs (aka device manufacturers). If device manufacturers are going to give up ownership of that code, Google wants to make sure all the customizations OEMs used to add are now supported in the normal AOSP (Android Open Source Project) code base that everyone uses.
You can imagine Google going around the ecosystem for each module and asking things like “Do you really need to customize the way the DNS resolver works?” When the answer was “no,” Google’s version was made mandatory. For modules where the answer was “Well actually…,” the plan is to upstream all that into the Google version in AOSP and eventually adopt the Google version. Ghuloum’s declaration that some packages were able to “upstream more in the last year than we’ve upstreamed in the previous 10 years” sounds like they’re making a ton of progress. More code upstreamed into AOSP means less code to keep track of for OEMs, which leads to easier, less complicated system updates.
Ghuloum: What we explained to our teams is that the premise of using a Mainline module is that you will get to release once a month. That you are actively working with the partners, co-developing, planning your roadmap, and stuff like that. People in the team have been pretty compelled by that, and excited about it.
Ars: Oh, is that the plan—a once-a-month release for Mainline modules?
Ghuloum: Well, that’s our train cadence and that’s driven by our security update schedule, because some of the components are security sensitive. The media component in particular comprises primarily codecs and extractors. One of the reasons that’s a module is that we looked at vulnerabilities over the last year, and nearly 40 percent of patched vulnerabilities in our security updates came from those modules. So, we’re like, “Hey, what if we could just push these out to the entire ecosystem, instead of putting the burden on the OEM to take these, test them, and push them out themselves?”
Android’s media playback engine has to load all sorts of scary file types from across the Web, and doing so in a safe manner has always been a security challenge. Android’s media playback engine is called “Stagefright,” a name you might recognize from the news cycle in 2015, when a series of remote code execution vulnerabilities were discovered in the Stagefright engine. The formal monthly security update schedule that Ghuloum is talking about was started up in response to those vulnerabilities, and the media framework hardening continues to this day.
Today, Google produces the AOSP security updates every month and hands them over to OEMs, but that still isn’t a perfect solution. Not every phone supports monthly security updates, not every phone ships these updates every month in a timely manner, and most phones only support them for two years. As Ghuloum says, in addition to Google taking over the testing, integration, and roll out of these fixes, Project Mainline would also (eventually, once everything is on Android Q) let Google roll out these fixes every month to the entire ecosystem instead of just a few flagship phones.
Burke: The other thing is, we often hear from developers on what we could do to make their lives easier on Android. One of the things that comes up often is fragmentation of slightly different behaviors in different parts of the OS, even within the same manufacturer—the media framework is one they bring up. And so more consistency there is good for the developers, too. It reduces errors and the work they have to do, and it increases the quality of apps, which is good for users.
Ghuloum: I was calling this “bug consistency” yesterday.
Burke: (laughing) Bug consistency! Yeah, that’s true.
Ghuloum: There’s this module called “ANGLE;” it’s basically OpenGL implemented on Vulkan. Right now it’s mandatory for OEMs, but developers can opt in to whether they use it or not. The idea is to lean into the kind of support for Vulkan that’s coming on all these devices. Having a consistent GL implementation—not necessarily a bug-free one, because we never ship bug-free software, nobody ever does—but the thing for game devs that they struggle with is, they’re used to bugs in drivers, but all these different bugs and different drivers are super painful. We can make that much more consistent.
Burke: The other way to think about this is: it’s generally good hygiene. You look at the GPS rollover that happened on April 6, for example. It grounded some airplanes because they couldn’t cope with the clock rolling over. There’s always something that’s going to happen in software, and you want to have the ability to get this updated, especially, like, really low-level stuff—like with Conscrypt, which is our secure library, SSL library, and TLS. That’s updatable, as well. And that’s another area where, when certificates expire, or your cert provider suddenly goes out of business, you can fix that.
Ghuloum: Or, the BoringSSL bugs.
Burke: Or, the BoringSSL bugs, yeah exactly. They’re kind of unsexy but fundamental components in the system.
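The GPS rollover Burke mentions comes down to a small counter wrapping around. The legacy GPS navigation message carries its week number in a 10-bit field, so the value returns to zero every 1,024 weeks, counted from the GPS epoch of January 6, 1980. The Python sketch below illustrates the arithmetic only: it ignores leap seconds (which is why it lands on April 7 in GPS time rather than late April 6 in UTC), and the function names are hypothetical, not taken from any receiver firmware.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)   # start of GPS week 0 (GPS time, no leap seconds)
WEEK_BITS = 10                     # width of the legacy broadcast week field
WEEKS_PER_ERA = 1 << WEEK_BITS     # 1024 weeks before the counter wraps


def broadcast_week(moment):
    """Week number as a 10-bit field would carry it (illustrative helper)."""
    full_weeks = (moment - GPS_EPOCH).days // 7
    return full_weeks % WEEKS_PER_ERA


def decode_with_era(week, era):
    """Turn a truncated week number back into a date, given an assumed era."""
    return GPS_EPOCH + timedelta(weeks=era * WEEKS_PER_ERA + week)


before = datetime(2019, 4, 6, 12, 0)
after = datetime(2019, 4, 7, 12, 0)

print(broadcast_week(before))   # last week of the old era
print(broadcast_week(after))    # counter has wrapped

# A receiver that still assumes the previous era decodes week 0 as 1999;
# one that knows the current era gets 2019.
print(decode_with_era(broadcast_week(after), era=1).date())
print(decode_with_era(broadcast_week(after), era=2).date())
```

The broadcast value alone cannot say which 1,024-week era it belongs to, so software that hard-codes the era needs an update to keep producing correct dates. That is the kind of low-level fix Burke is describing.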
Iliyan Malchev, Project Treble lead: Years ago, there was a bug in Bionic (Android’s standard C programming library) that was introduced by one of our partners, who had the sine tables wrong. So, trig functions were randomly failing in a range of the curve, in a way that could break games. So, stuff like this is incredibly hard to catch before you ship.
Ghuloum: And the developers had to live with it for the next few years—unless it ever gets patched. I’ve seen that with some of our own first party apps, [they] have to work around bugs throughout the ecosystem. It’s just spaghetti code.
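To see why a bad lookup table is so hard to catch, consider a toy table-driven sine function. This is a hypothetical Python sketch, not Bionic's actual implementation: the table size, the sign-flip corruption, and every function name are assumptions made for illustration. It shows that a handful of wrong entries only produce wrong answers in a narrow band of angles, so any test that does not sample that band passes.

```python
import math

TABLE_SIZE = 256  # assumed table resolution for this toy example


def build_table(corrupt=()):
    """Build a sine lookup table; entries listed in `corrupt` get the wrong sign."""
    table = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]
    for i in corrupt:
        table[i] = -table[i]  # simulate a bad table entry
    return table


def table_sin(table, x):
    """Nearest-entry table lookup for sin(x)."""
    idx = round(x / (2 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    return table[idx]


def bad_angles(table, samples=1000, tol=0.05):
    """Return the sampled angles where the table disagrees with math.sin."""
    bad = []
    for k in range(samples):
        x = 2 * math.pi * k / samples
        if abs(table_sin(table, x) - math.sin(x)) > tol:
            bad.append(x)
    return bad


good = build_table()
broken = build_table(corrupt=range(40, 48))  # eight bad entries out of 256

print(len(bad_angles(good)))
failures = bad_angles(broken)
print(len(failures), min(failures), max(failures))
```

With these assumed values, every failing angle falls in a band of roughly 0.2 radians, about 3 percent of the circle. A game that happens to rotate through that band misbehaves, while most spot checks see nothing wrong.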
Ars: So, was there a test update that went out to Beta Q users?
Ghuloum: Yes—actually, as of Beta 2, we started pushing updates. There are threads on Reddit about this.
Ars: Right, OK.
Ghuloum: Their devices were rebooting and, yes, we were pushing updates, testing updates, and we were rebooting people’s devices. We’re only doing this in the Beta, actually. During production, when Q ships, all reboots will just be organic reboots from the user. We looked at the numbers, and it looks like over a couple of weeks, that gets us to a reasonable saturation level of people taking the update. Plus, we have monthly security updates, so we’re going to be rebooting at least once a month, anyway. So, you’ll just take it. We don’t want to put UX in the user’s face. If there’s an update waiting, we just wait for them to reboot.
Ars: OK. Do you think that’s what the final version is going to look like—kind of a quiet background thing that won’t be very visible?