Thoughts About Contact Tracing Apps

Don't we have enough location tracking already?

May 15, 2020

The subject of contact tracing is a hot topic in the infosec community. Privacy advocates have lots to say about it, and these days a lot of privacy-centric folks are also highly technical. That intersection produces a lot of really good discussion because it properly encompasses both the social and technical issues surrounding contact tracing. I live in that intersection and I think it’s worth extracting some of the salient points from that discussion into an article to give it more oxygen. Here we go…

Before I start into the topic I want to make a few points very clear. You are not about to read an article about why COVID-19 is not a big deal. You won’t find support for arguments such as “we’re overreacting” in this post. You won’t find sympathy for conspiracy theories that the pandemic is not real, and you won’t find a receptive ear if you think these things. Go somewhere else if you think these things. I don’t want you in my orbit.

Ok, now here we go for real this time…

What is contact tracing?

Let’s start at the beginning. I am not an epidemiologist so I may get some of this wrong. But, I have watched almost every broadcast from our Chief Medical Officer of Health and there have been a lot of those broadcasts. From them, I have gleaned some basics.

Basic #1

The primary job of the public health organization is to ensure the public health care system functions. It does this every day, but certainly, the job is much harder to do during a pandemic. Much like banks do not have enough cash on hand to allow every customer to withdraw their balance all at once, hospitals do not have enough equipment for every citizen all at once. Public Health seeks to maintain a working level of health care based on expected demand.

Basic #2

There are two basic pools of infected people that Public Health focuses on: those who contracted the disease in a known way, and those who contracted the disease in an unknown way. In the case of COVID-19 in my region, the first group became infected due to travel. Every early case in my province was a person who had recently traveled or had close (usually familial) contact with someone who had recently traveled. Now that the disease spread is more advanced, we have the second group to contend with: cases not linked to any known source. That second group is called “community spread”, which refers to the fact that these people got the disease from someone in their community and not someone who had traveled recently.

Basic #3

COVID-19 has an average 11.5-day incubation period. To make things easier, we’re saying that an infection cycle is two weeks long. This means that if I contract COVID-19 today, I will likely remain asymptomatic (meaning, I have no symptoms of the disease) for 11.5 days, perhaps a little longer, perhaps a little shorter. This means that if I tested positive for COVID-19 today, I would have to go back through all my travels during the past two weeks to help Public Health determine where I may have contracted the disease. This means every shopping trip, every gas station, every dog walk, every post office visit. Everything.

Contact tracing deals with the community spread cases in Basic 2, and specifically assists Public Health in tracking the origin of the disease as described in Basic 3.

In Basic 3, Public Health isn’t actually all that concerned about where we have been. Rather, it is most interested in who we had contact with in those 14 days, but of course, those two things are inextricably linked. Because we’re all supposed to be isolating and not having people in our houses who do not live there, the only place we could have come into contact with an infected person is somewhere else. Therefore, tracking the locations a newly positive person has visited is the first step in determining who that person came into contact with. And in that pool of people is going to be one or more infected people.

Once Public Health has compiled the list of places the newly infected person has been, they then need to compare that list with everyone else who has been infected recently and hope they find a match: “they both went to the grocery store that day”, for example. However, even if they luck out and find a smoking gun like that, it does not necessarily mean those two people came within 2 meters of each other to transmit the disease. They may not have even been at the store at the same time. So it’s a good clue, but it’s not the best clue.
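To make the “same store, but maybe not at the same time” problem concrete, here is a minimal sketch of the comparison. The Visit record, the at() helper, and the sample visits are all made up for illustration; this is not how any Public Health system actually stores its data.

```python
from datetime import datetime
from typing import NamedTuple


class Visit(NamedTuple):
    place: str
    start: datetime
    end: datetime


def overlapping_visits(a: list[Visit], b: list[Visit]) -> list[tuple[Visit, Visit]]:
    """Return pairs of visits to the same place whose time ranges overlap."""
    matches = []
    for va in a:
        for vb in b:
            same_place = va.place == vb.place
            # Two intervals overlap when each one starts before the other ends.
            same_time = va.start < vb.end and vb.start < va.end
            if same_place and same_time:
                matches.append((va, vb))
    return matches


def at(hour: int, minute: int = 0) -> datetime:
    # Hypothetical helper: a time on an arbitrary example day.
    return datetime(2020, 5, 12, hour, minute)


person_a = [
    Visit("grocery store", at(10), at(10, 40)),
    Visit("gas station", at(14), at(14, 10)),
]
person_b = [
    Visit("grocery store", at(16), at(16, 30)),   # same store, hours later
    Visit("gas station", at(14, 5), at(14, 20)),  # same place, overlapping time
]

for va, vb in overlapping_visits(person_a, person_b):
    print(va.place)
```

Both people went to the grocery store that day, but only the gas station visits overlap in time. Even then, the overlap says nothing about whether they were within 2 meters of each other, which is why the result is still only a clue.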

Lastly, Public Health will further try to classify the type of contact as Low, Moderate, or High Risk and then take different courses of action for each type of contact. Here’s an infographic explaining that process from the Nova Scotia Health Authority.

So far, I have described a lot of work. Public Health people are doing an immense amount of work to attempt to draw intersections between infected people and in most cases, the best result is a solid “probably”.

What Public Health really wants is to cut through all the guesswork and know, for certain, that you were within 2 meters of an infected person at 2:14 pm last Tuesday afternoon for more than 5 minutes at the gas station on route 4.

That is what contact tracing purports to be able to deliver. And do it by removing a lot of the drudge work Public Health is doing now.

Pros and Cons of the current contact tracing method

The biggest pro of the current method is that there are no false positives. Public Health does not get involved until someone has tested positive for COVID-19.

That may seem like an innocuous benefit because we expect things to work properly, so the fact that a process has no false positives isn’t usually considered a pro. It’s just expected. However, when you consider the massive amount of work that goes into tracking down a single infected source, eliminating false positives is paramount because there are just not enough people and not enough time to go chasing after false positives.

Another pro is that the current system has 100% coverage. Meaning, it is not limited to just people with smartphones.

The biggest con of the current method is that it is primarily reactive. It has very little ability to predict cases before people become symptomatic and have potentially spread the disease around.

Another con is that people just plain old forget stuff. It can be difficult to recall every single person you’ve seen or place you’ve been for the past 14 days and that can lead to unexplainable transmission.

Pros and Cons of mobile app contact tracing

The biggest pro of using a mobile app is that it can predict infections before the person becomes infectious. That means that people who have been in contact with a known infected person can be isolated prior to becoming infectious themselves and endangering others.

The biggest con of using mobile data is that it has the potential for a very high false-positive rate. The apps use Bluetooth to detect other devices in their area. Anyone who has used a BT headset or has BT in their car knows that BT has no problem penetrating walls, windows, and car doors, and has a range far in excess of 2 meters.

Other potential cons of the mobile process include:

It has less coverage because it excludes people who do not have smartphones, or who do not/cannot install the app or, in phase 2, do not activate it in the OS.
It relies heavily on wide-spread testing. The relevancy of this point is highly regional. Some areas have very robust and widespread testing programs, others do not. But without broad testing, the list of known infected people will be less complete, and newly notified people will not be able to get tested quickly. A strong testing program is a lynchpin of any contact tracing, but especially rapid and automated contact tracing.

My first big idea

In my field, we have an understanding that we’re going to “throw the first one away”. Meaning that the first attempt to create a solution for a problem will likely be thrown away. This is because you’ll likely learn things you did not initially know and that will inform your way forward in a different way. This was no exception.

I initially had this super-radical idea that we should use existing data. Facebook, Google, and Apple have been collecting location data on their users for years. Let’s make/ask them to provide this data for Public Health instead of re-inventing the wheel.

Facebook “check-ins” do precisely that - they tell Facebook where you are. Facebook then takes that location data and figures out every other Facebook user that is in that location regardless of whether those people check in or not. And now that at least one person has checked into that location, Facebook can record every Facebook user that goes to that place from now on. That sounded like exactly what we were looking for: “tell me everyone who was in that park at that time.”

On the slightly less seedy side of town, Apple and Google have already published maps on where people are congregating during isolation weekends. Seriously, they already have this data and have published it here and here.

Bingo. I thought this was precisely the data that we could use for contact tracing, so there is no point in re-inventing the wheel. Especially re-inventing it in a way that will allow yet another third party such as a government to track all of its citizens’ movements.

However, as I researched the topic more deeply, it became obvious that the existing location data is collected using cell phone towers, GPS, and Wi-Fi, and that location data is not detailed enough for this purpose. Your phone knows you’re at Chik’n Chik’n Chik’n (come on, you KNOW there has to be a place with that name somewhere), but it doesn’t know that the next nearest person is 200 feet away in the washroom and that you’ll never come into contact with them.

For that level of granularity, the most widely available technology that we have now is Bluetooth. Bluetooth Low Energy (BLE) to be exact.

How does mobile phone contact tracing work?

This is the question that kept me awake at night. I want to be a good citizen and participate in this program. I want to know if I have come into contact with an infected person as early as possible so I can isolate myself and protect others. I am also a privacy pundit and the thought of willingly allowing anyone to track my location is anathema to me, so I was at an impasse. An impasse that would ultimately end in my not participating in the program.

I watch the TWiT (This Week in Tech) podcast which is a weekly foray into technology at a level that people like me want. It’s not fluff produced for easy digestion by the uninformed masses, but it also is not complete gearhead stuff that nobody can understand (that’s what Security Now! is for). It is somewhere in the middle and a good source of information for technical people. As you’d expect from such a show, contact tracing has been a hot topic recently.

When I first heard of mobile phone contact tracing, I envisioned a system where my precise location was sent to some central repository every few minutes along with the locations of everyone else. Then, when someone tested positive for COVID-19, whoever is in charge of that giant pool of location data can dig through it to find other people who have crossed the infected person’s path.

This architecture would be ridiculously unwieldy, utterly privacy invading, exactly what I expect of a government program, and simply would not work for a myriad of reasons, some of which Bruce Schneier listed in a recent post. There is no shortage of opponents to the system and they’re not shy about saying it. Ross Anderson illustrates some comical, but entirely likely scenarios:

Anyone who’s worked on abuse will instantly realise that a voluntary app operated by anonymous actors is wide open to trolling. The performance art people will tie a phone to a dog and let it run around the park; the Russians will use the app to run service-denial attacks and spread panic; and little Johnny will self-report symptoms to get the whole school sent home.

He’s not wrong. That’s all going to happen.

Back to how it works: the current thinking in privacy-centric circles is that the best solution to any privacy problem is to collect less data. Data that does not exist cannot be stolen or abused. That makes sense but the problem is that our applications and devices become more complex over time and therefore need more data to be more useful. The best compromise to that problem is to allow the collection of data but leave that data in the users’ possession instead of carting it off to some giant data lake in the sky.

There is ample precedent that user-possessed data works. LastPass and Proton Mail are two examples where user data is only accessible on the user’s device. Yes, the data does exist on internet servers, but that copy of the data is encrypted and cannot be decrypted on those servers. Only the user’s device can decrypt it, which means that the usable data only resides on the user’s device.

Here’s a somewhat hard to follow infographic from Apple and Google showing the process.

There are many different organizations working on developing contact tracing apps, but the biggest is a collaborative effort by Google and Apple. Except for a very small group of fringe users, every smartphone on the planet runs Google’s Android operating system, or Apple’s iOS operating system. These two companies can tap into pretty much every phone in existence so this is the logical place to develop contact tracing using a mobile phone.

Google has a terrible reputation for privacy. I won’t hijack this post by pointing out yet again that Google makes almost all its money from advertising; a product line it fuels by surgically harvesting our user data through Gmail, Android, and hundreds of other apps we’ve never heard of. Apple has enjoyed a reputation for privacy recently and, to be fair, it is one of the few companies left that does business the old-fashioned way of just selling us stuff we like instead of selling us data harvesters. But, it is suffering from the same technical debt that occurs every time Steve Jobs leaves the building and there is a steadily increasing number of flaws in iOS and Apple hardware these days.

But, the joint architecture declaration of how these two companies intend to develop a contact tracing app checks all the right privacy boxes.

How will our privacy be protected?

I will now go through the high points of the Apple/Google solution. Please keep in mind that many governments are commissioning their own contact tracing apps. In fact, at the time I am writing this, there are at least 30 countries ramping up contact tracing in different ways. Many of them will not use the Apple/Google solution and their apps may be more invasive. Also, when the Apple/Google solution finally becomes available it may differ from what I’ve written here. But as of today, this is the intention of Apple and Google.

Phase One

Phase one of the project is to develop an Application Programming Interface (API) that can be used for contact tracing and make it available to app developers. That will enable developers to write apps and put them in both the Apple and Google stores. Users can then install these apps and activate them to participate in the contact tracing project.

The reason for phase one is to speed development. The API will be used in phase two as well, but it takes much more time to implement phase two, and a simplified phase one allows us to start building the skateboard.

Phase Two

Phase two of the project is to do away with the apps and build the contact tracing functionality directly into the Android or iOS operating system. In both phases, the user will have to deliberately consent to participate in the program which means even in phase two when you have no choice but to accept the software into your phone, you do not need to enable it.

If you’re participating, regardless of which phase, your phone will generate a unique token every few minutes and use the Bluetooth transceiver in your phone to broadcast that unique token out. At the same time, your phone is listening for other people’s tokens. Everyone else’s phone is doing the same thing so essentially we now have a way of recording the tokens of the devices that are in a ~100m bubble around us.
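Here is a minimal sketch of how a phone could produce a new token every few minutes without anyone being able to link the tokens together. To be clear, this is my own simplification: the rotation interval, the token length, the function names, and the use of HMAC-SHA256 are all assumptions for illustration, and the published Apple/Google specification defines its own key schedule that this sketch does not reproduce.

```python
import hashlib
import hmac
import secrets

ROTATION_MINUTES = 10  # assumed rotation interval, not an official value
TOKEN_BYTES = 16       # assumed token length


def new_daily_key() -> bytes:
    """Generate a random secret that stays on the phone."""
    return secrets.token_bytes(32)


def interval_number(minutes_since_midnight: int) -> int:
    """Map a time of day to the rotation interval it falls in."""
    return minutes_since_midnight // ROTATION_MINUTES


def token_for_interval(daily_key: bytes, interval: int) -> bytes:
    """Derive the token broadcast during one rotation interval."""
    message = b"token" + interval.to_bytes(4, "big")
    return hmac.new(daily_key, message, hashlib.sha256).digest()[:TOKEN_BYTES]


key = new_daily_key()
morning = token_for_interval(key, interval_number(9 * 60))      # 9:00
later = token_for_interval(key, interval_number(9 * 60 + 10))   # 9:10
print(morning.hex())
print(later.hex())
```

Without the daily key, the two tokens look like unrelated random values, so someone listening nearby cannot tell that they came from the same phone.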

One of the early criticisms of the Bluetooth solution is that it could not determine how far away a person is, and would not know about things like walls and windows separating people. That is true, but the proposed solution addresses that in two ways:

When accepting a token from some other device, your device will consider the strength of the signal from that other device. Weaker signals mean that the other device is farther away or behind something.
Tokens from other devices need to be seen by your device more than once in a period of time that will be set by Public Health. For example, that may be set to 5 minutes. That means your device will not record every token it sees, just those that it sees more than once in a 5 minute period. This eliminates a ton of “drive-by” noise.
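Those two filters can be sketched in a few lines. The signal strength cutoff, the function name, and the sample sightings below are assumptions I made up for illustration; the real thresholds would be set by Public Health and the platform.

```python
from collections import defaultdict

MIN_RSSI_DBM = -70        # assumed cutoff; weaker signals are treated as too far away
WINDOW_SECONDS = 5 * 60   # the 5 minute example from the text
MIN_SIGHTINGS = 2         # "more than once"


def tokens_to_record(sightings: list[tuple[float, str, int]]) -> set[str]:
    """sightings: (timestamp in seconds, token, signal strength in dBm)."""
    times_by_token: dict[str, list[float]] = defaultdict(list)
    for timestamp, token, rssi in sightings:
        # Filter 1: ignore weak signals (far away or behind something).
        if rssi >= MIN_RSSI_DBM:
            times_by_token[token].append(timestamp)

    recorded = set()
    for token, times in times_by_token.items():
        times.sort()
        # Filter 2: require MIN_SIGHTINGS strong sightings inside one window.
        for i in range(len(times) - MIN_SIGHTINGS + 1):
            if times[i + MIN_SIGHTINGS - 1] - times[i] <= WINDOW_SECONDS:
                recorded.add(token)
                break
    return recorded


sightings = [
    (0, "aaa", -60), (120, "aaa", -62),  # close by, seen twice in 2 minutes
    (30, "bbb", -58),                    # drive-by, seen only once
    (0, "ccc", -90), (60, "ccc", -88),   # seen twice, but the signal is weak
]
print(tokens_to_record(sightings))
```

Only "aaa" survives both filters: "bbb" was seen once, and "ccc" was always too weak to count.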

Those are not perfect solutions to the proximity problem, but they’re the best we have with the technology that most people are carrying in their pockets right now.

Phones will only record the tokens they receive and no other identifying data. Periodically (we assume at least daily) your phone will download a list of tokens belonging to known infected people from Public Health. If any token in that list matches the tokens stored on your device, then you will be notified via the phone that you came into contact with an infected person. What happens next is determined by your local Public Health, but we can assume it will take the form of notifying Public Health and going in for testing.
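The matching step itself is very simple, which is part of its appeal. A minimal sketch, assuming the tokens are short byte strings and using a made-up function name and sample data:

```python
def exposure_detected(seen_tokens: set[bytes], infected_tokens: set[bytes]) -> bool:
    """Compare locally stored tokens with the downloaded list, entirely on the device."""
    return not seen_tokens.isdisjoint(infected_tokens)


# Tokens this phone recorded from nearby devices (stored only on the phone).
seen = {b"\x01\x02", b"\x0a\x0b", b"\x7f\x7e"}

# Hypothetical list downloaded from Public Health.
downloaded = {b"\x99\x98", b"\x0a\x0b"}

if exposure_detected(seen, downloaded):
    print("You may have been in contact with an infected person.")
```

The list of tokens the phone has seen never has to be sent anywhere for this check to work. Only the downloaded list travels, and it travels toward the phone.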

From the Apple/Google contact tracing FAQ:

If a match is detected the user will be notified, and if the user has not already downloaded an official public health authority app they will be prompted to download an official app and advised on next steps. Only public health authorities will have access to this technology and their apps must meet specific criteria around privacy, security, and data control.

The two important pieces that ensure user privacy are:

You do not upload your token to Public Health. It remains on your device within your control.
Only you will be notified if you’ve come into contact with an infected person. Public Health cannot know this on its own because the comparison between the list of tokens you’ve come in contact with and the list of tokens belonging to known infected people happens on your phone, not in “the cloud” somewhere.

There are a few other important concepts in the proposed architecture that I’d like to call out:

The contact tracing function on the phone has a very clear edge. It will alert you if you have been in contact with someone who has tested positive for COVID-19 and then it stops. It does not tell you what to do. It does not tell anyone else. This is the “line in the sand” and it is up to the individual and regional health authorities what to do next.
The same FAQ states that Apple and Google “will disable the exposure notification system on a regional basis when it is no longer needed.” This is what we call a “forward-looking statement” which is polite speak for “that may not happen.” But, both Apple and Google have proven to be formidable foes in legal tussles and they both have more money than almost any country in the world, so I feel there’s a reasonable chance this will be accurate.

Final Thoughts

I have decided that if the final Apple/Google solution is similar to the current proposal, then I will adopt it when it becomes available. While the criticisms of the false positive rates have some merit, I think the solution mitigates them as much as technically possible. I am also pretty sure that as people start using that data, false-positive rates will inform Apple and Google how to tweak the detection better.

My biggest concern was always the privacy aspect. And now that I know my location data isn’t streaming off my phone to the government or some private development company, I feel much better about that part of it.

Jon Watson's Death by Tech Newsletter
