Image of DNA model, with vertical code in green text in the background.

In the second instalment of our Emerging Tech series, we look at the development of commercial genetic testing, and the data protection implications of widespread genetic screening. 

 

“Customers who are genetically similar to you consume 60mg more caffeine a day than average.” 

“You are not likely to be a sprinter/power athlete” 

“Customers like you are more likely to be lactose intolerant” 

“You are less likely to be a deep sleeper” 

These are all reports you can get from commercial genetic testing, from companies such as 23andMe, Ancestry.com, MyHeritage and DNAfit. We’ve talked about the rise of genetic testing before, but recent announcements from Richard Branson have brought the topic back into discussion. 

Earlier this month Richard Branson announced he was investing in 23andMe, and that the company would be going public (meaning shares will be traded on the New York Stock Exchange). This push for growth and investment has reopened the proverbial can of worms, and people are once again considering the privacy implications of genetic testing. 

What is genetic testing?

Genetic testing takes a DNA sample, such as hair or saliva, and identifies variations in your genetic code. These variants can increase or decrease your risk of developing certain conditions. These tests can also identify variations in ‘junk DNA’ that have no impact on your life, but can be used to identify relatives and ancestors. 

Genetic screening first appeared in the 1950s, and researchers developed more detailed DNA profiling in the 1980s for use in crime scene investigation. Technology has come on in leaps and bounds since then. Once an expensive undertaking, testing is now reasonably affordable, and you can buy a kit in many pharmacies or online. In Estonia, the government is offering genetic testing to citizens to screen for predisposition to certain conditions, helping individuals act early with personalised lifestyle plans or preventative medication. 

There have been suggestions to use genetic screening in the education sector as well. In 2006, two years before 23andMe began offering their first testing kits, geneticists suggested schools as the perfect place to carry out widespread screening. Researchers have also investigated the possibility of genetically informed teaching, with teaching style tailored to an individual’s predisposition to certain learning styles. 

For those outside education, the biggest development has been Direct to Consumer (DTC) genetic testing. DTC testing began mostly as a tool for ancestry identification; now there are millions of consumers, and even companies offering tailor-made nutrition plans designed around your genetics. 

I find myself writing this a lot, but it sounds like science fiction. Yet again, the science of today has caught up with the fiction of yesterday. However, if growing up surrounded by shelves of sci-fi has taught me anything, it’s that a cautious approach is often best. This is definitely true of genetic testing. There are many possible advantages, but there are also risks. 

A Breach with Big Implications:

Data breaches are always a possibility when you entrust your information to someone else. However, genetic data is clearly a sensitive type of personal data, particularly if a customer has opted for genetic health screening. 

Companies will put swathes of protective measures in place, but in a world where a cyber-attack occurs approximately once every 39 seconds, there will be breaches. In fact, there already have been. In July last year, hackers targeted the genetic database GEDmatch, and later used the information to target users of MyHeritage. Even without cyber-attacks, breaches occur. When recovering from that hack, GEDmatch reset all user permissions, opening up over a million genetic profiles to police forces, despite users having opted out of visibility to law enforcement. 

If genetic testing is ever to be used in schools or offered nationwide, one key issue will be ensuring that data is held securely. If schools and colleges offered genetically informed teaching, they would have to hold that data too. Adequate security measures for such information can be difficult to manage, particularly if education budgets stay the same. Infrastructure would require radical change before genetic testing could ever be implemented safely. 

Breaches are nothing new, but with such precious data, they can be worrying. 

Secondary Functions and Sources of Discrimination:

Under the Data Protection Act, data controllers must set out what they will use your personal data for. They cannot use that data for unrelated purposes without informing you. However, over recent years, there have been several cases where ambiguity over who can access genetic data has made the news. 

Individuals can opt in to share their data with 23andMe research teams. Many customers were comfortable with researchers using their data for medical advances. It was not until the company’s public deal with GlaxoSmithKline that it became clear genetic data was being passed to pharmaceutical companies for profit. 

This data was anonymised, so the outcry following the announcement was more about ethics than data protection. However, there have been multiple cases where companies have allowed law enforcement to access their databases, despite stating otherwise in their privacy policy. 

Your genetic data reveals a huge amount about you and your characteristics, so it’s important to know exactly who can see it. For example, variations of the MAOA gene have been linked to levels of aggression, as well as conditions such as ADHD. Identification of these types of variants could help employers find individuals more likely to succeed in their field. However, it could just as easily lead to discrimination in hiring. Researchers have also linked other conditions, such as bipolar disorder, to certain genetic variants. Should that information be available to employers, it might lead to workplace discrimination: for example, bosses not promoting individuals they think might later become “unstable.” 

There has been speculation that biological data could be used for identifying terrorist subjects, tracking military personnel, or even rationing out treatment in overstretched health systems. This is all speculation. Even so, there are fears of discrimination based on the possibility of you developing a certain condition or trait. 

The Risk of Re-identification:

The speculation above works on the basis of genetic data being individually identifiable. Companies use anonymisation to reduce the risk of such discrimination, and genetic testing companies go to great lengths to separate genetic data from identifiers: for instance, anonymising data used for research, or storing personal and contact details on a separate server from the genetic data. The view has always been that if you separate personal identifiers from the raw genetic data, the individuals remain anonymous. 
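To make that separation a little more concrete, here is a minimal sketch in Python. It is entirely hypothetical (the names, structures and functions are invented for illustration, not any company’s actual system), but it shows the basic idea: genetic records are keyed by a random pseudonym, and the link back to a real person is held in a separate store.

```python
# A minimal, hypothetical sketch of pseudonymisation: genetic data is keyed
# by a random token, and the mapping back to a real person lives in a
# separate store (in practice, on a separate server). Names and structures
# here are illustrative only.
import secrets

identity_store = {}  # pseudonym -> personal and contact details
genetic_store = {}   # pseudonym -> raw genetic data only

def register_sample(name, email, raw_genome):
    pseudonym = secrets.token_hex(16)  # random link key
    identity_store[pseudonym] = {"name": name, "email": email}
    genetic_store[pseudonym] = {"genome": raw_genome}
    return pseudonym

def export_for_research():
    # Researchers receive genetic records only; without access to the
    # identity store (or a successful re-identification attack), they
    # cannot tell whose data they are looking at.
    return list(genetic_store.values())

register_sample("Jane Doe", "jane@example.com", "ACGT...")
print(export_for_research())
```

The catch, as the next paragraph explores, is that this only keeps people anonymous for as long as the genetic data itself cannot be traced back to them. 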

Unfortunately, research has already shown that it is possible, in principle, to identify an individual’s genomic profile within a large dataset of pooled data. It’s an interesting thought. Companies are often quite willing to share anonymised data for additional purposes, because it is no longer personal data and isn’t protected by the same legal safeguards. But if a data subject can be re-identified, the data requires the same levels of security and legal protection as personal data. Dawn Barry, co-founder of genetic research company LunaDNA, said “we need to prepare for a future in which re-identification is possible”. 

If this data could be re-identified, it raises questions over the definition of anonymity. It also reignites the discussion over who genetic testing companies should be sharing data with. 

Understandable Worries? Or Needless Fear?

Schools and colleges have always been a proving ground for new technologies. It’s worth remembering that fingerprint scanning had been used in UK schools for over ten years before the Protection of Freedoms Act caught up and required parental consent.  

It would be easy to see how a “scientifically based, individualised learning experience” could be presented as an ideal way of helping all students achieve the best outcomes.  

Interestingly, Direct to Consumer genetic testing has only been available for just over a decade, so there is still plenty of room for development. However, we’re still some way from it determining the day-to-day life of students in education. 

Here’s a sobering thought though. Should the worst happen, and something compromises your data, you can change your passwords, you can change your bank details. You can even change your appearance and your name. You can’t change your DNA. We’ve got to keep that in mind as the world of biometrics continues to grow. 

Next time, we’ll look at remote learning and the technologies that are being developed for the virtual classroom. Find previous posts from this series here.

 

Image of face breaking into cubes, representing AI and Machine Learning

Anyone involved in last year’s exam grade saga probably harbours a level of resentment against algorithms. 

The government’s formula was designed to standardise grades across the country. Instead, it affected students disproportionately, raising grades for students in smaller classes and more affluent areas. Conversely, students in poorer-performing schools had their grades reduced, based on their schools’ results in previous years.  

Most of us are well versed in the chaos that followed. Luckily, the government have already confirmed that this year’s results will be mercifully algorithm-free.  

We touched on the increased use of AI in education in an article last year. Simple algorithms are already used to mark work in online learning platforms. Other systems can trawl through the websites people visit and the things that they write, looking for clues about poor mental health or radicalisation. Even these simple systems can create problems, but the future brings machine learning algorithms designed to support detailed decision-making with major impacts on people’s lives. Many see machine learning as an incredible opportunity for efficiency, but it is not without its controversies.  

Image-generation algorithms have been the latest to cause issues. A new study from Carnegie Mellon University and George Washington University found that unsupervised machine learning led to ‘baked-in biases’: namely, the assumption that women simply prefer not to wear clothes. When researchers fed the algorithm pictures of a man cropped below his neck, 43% of the time the image was auto-completed with the man wearing a suit. When they fed it similarly cropped photographs of women, 53% of the time it auto-completed with a woman in a bikini or a low-cut top.  

In a more worrying example of machine-learning bias, a man in Michigan was arrested and held for 30 hours after a false positive facial recognition match. Facial recognition software has been found to be mostly accurate for white males, but for other demographics it is woefully inadequate.  


Where it all goes wrong:

These issues arise because of one simple problem: garbage in, garbage out. Machine learning engines take mountains of previously collected data and trawl through them to identify patterns and trends. They then use those patterns to predict or categorise new data. However, feed an AI biased data, and it will spit out a biased response.

An easy way to understand this is to imagine you take German lessons twice a week and French lessons every other month. Should someone talk to you in German, there’s a good chance you’ll understand and be able to form a sensible reply. However, should someone ask you a question in French, you’re a lot less likely to understand, and your answer is more likely to be wrong. Facial recognition algorithms are often trained on a white-leaning dataset. The lack of diversity means that when the algorithm comes across data from another demographic, it can’t make an accurate prediction.  

Coming back to image generation, the reality of the internet is that images of men are a lot more likely to be ‘safe for work’ than those of women. Feed that to an AI, and it’s easy to see how it would assume women just don’t like clothes.  
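To see how mechanically this happens, here is a toy sketch in Python. The counts are invented for illustration (they are not the study’s data), and the “model” does nothing cleverer than reproduce whichever completion it saw most often for each subject in its training data.

```python
# A toy sketch of "garbage in, garbage out": an auto-complete "model" that
# simply reproduces the most common completion in its training data.
# All counts below are invented for illustration.
from collections import Counter

# Hypothetical scraped training data: (subject, how the image was completed).
training_data = (
    [("man", "suit")] * 430 + [("man", "casual")] * 370 + [("man", "swimwear")] * 200 +
    [("woman", "swimwear")] * 530 + [("woman", "casual")] * 290 + [("woman", "suit")] * 180
)

# "Training": count how often each completion follows each subject.
counts = {}
for subject, completion in training_data:
    counts.setdefault(subject, Counter())[completion] += 1

def autocomplete(subject):
    # Predict the completion seen most often for this subject in training.
    return counts[subject].most_common(1)[0][0]

print(autocomplete("man"))    # -> "suit"
print(autocomplete("woman"))  # -> "swimwear"
```

Nothing here is malicious; the model has faithfully learned exactly the pattern it was shown, and that is precisely the problem.  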

AI in Applications:

While there’s no denying that being wrongfully arrested would have quite an impact on your life, it’s not something you see every day. However, most people will experience the job application process. Algorithms are shaking things up here too.  

Back in 2018, Reuters reported that Amazon’s machine learning specialists scrapped their recruiting engine project. Designed to rank hundreds of applications and spit out the top five or so applicants, the engine was trained to detect patterns in résumés from the previous ten years.  

In an industry dominated by men, most résumés came from male applicants. Amazon’s algorithm therefore copied the pattern, learning to lower the ratings of CVs that included the word “women’s”. Should someone mention they captain a women’s debating team, or play on a women’s football team, their résumé would automatically be downgraded. Amazon ultimately ended the project, but individuals within the company have stated that Amazon recruiters did look at the generated recommendations when hiring new staff. 
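As a rough, hypothetical sketch of how this kind of bias gets learned (this is not Amazon’s actual system, and the CVs and scoring rule below are invented), a simple model trained on historical hiring outcomes can end up penalising a word like “women’s” purely because it rarely appears among past successful applications.

```python
# A hypothetical sketch of a CV-ranking model trained on historical hiring
# outcomes. Because most past hires in this made-up data were men, words
# like "women's" pick up negative weight. Not Amazon's actual system.
from collections import Counter

# (CV text, 1 if hired, 0 if rejected) - invented examples.
history = [
    ("captain of chess club python developer", 1),
    ("men's football team java developer", 1),
    ("python developer hackathon winner", 1),
    ("captain of women's debating team python developer", 0),
    ("women's football team java developer", 0),
]

# "Training": each word scores +1 for every hired CV it appears in
# and -1 for every rejected one.
weights = Counter()
for text, hired in history:
    for word in set(text.split()):
        weights[word] += 1 if hired else -1

def score(cv_text):
    # Rank a new CV by summing the learned word weights.
    return sum(weights[word] for word in cv_text.split())

# Two otherwise identical CVs: the "women's" one scores lower.
print(score("captain of women's debating team python developer"))  # lower
print(score("captain of men's debating team python developer"))    # higher
```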

Image of white robotic hand pointing at a polaroid of a man in a suit, with two other polaroids to the left and one to the right. The robot is selecting the individual in the picture they are pointing at.

Algorithms are already in use for recruitment. Some sift through CVs looking for keywords. Others analyse facial expressions and mannerisms during interviews.

Protection from Automated Processing:

Amazon’s experimental engine clearly illustrated how automated decision making can drastically affect the rights and freedoms of individuals. It’s why the GDPR includes specific safeguards against automated decision-making.  

Article 22 states that (apart from a few exceptions) an individual has the right not to be subject to a decision based solely on automated processing. Individuals have the right to obtain human intervention should they contest the decision made, and in most cases an individual’s explicit consent should be gathered before any automated decision-making is used.  
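In practice, that usually means building systems where the algorithm’s output is treated as a recommendation rather than a verdict. Here is a minimal, hypothetical sketch of what that might look like in a hiring pipeline; the threshold, field names and routing rule are assumptions for illustration, not a statement of what any product does.

```python
# A minimal, hypothetical sketch of a human-in-the-loop safeguard: the
# model's output is only ever a recommendation, never a final decision.
from dataclasses import dataclass

@dataclass
class Application:
    candidate: str
    automated_score: float   # output of whatever model is in use (0-1)
    contested: bool = False  # candidate has asked for human intervention

def decide(app: Application) -> str:
    # The automated system only produces a recommendation...
    recommendation = "advance" if app.automated_score >= 0.7 else "reject"
    # ...and adverse or contested outcomes always go to a human reviewer,
    # so no one is subject to a decision based solely on automated processing.
    if app.contested or recommendation == "reject":
        return f"human review required (model suggests: {recommendation})"
    return f"human sign-off on recommendation: {recommendation}"

print(decide(Application("A. Candidate", automated_score=0.42, contested=True)))
```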

This is becoming increasingly important to remember as technology continues to advance. Amazon’s experiment may have fallen through, but there are still AI-powered hiring products on the market. Companies such as Modern Hire and HireVue provide interview analysis software, automatically generating ratings based on an applicant’s facial expressions and mannerisms. Depending on the datasets these products were trained on, they too may be brimming with biases.  

As data controllers, we must keep assessing the data protection impact of every product and every process. Talking to wired.co.uk, Ivana Bartoletti (Technical Director, Privacy at consultancy firm Deloitte) said she believed the current Covid-19 pandemic would push employers to implement AI-based recruitment processes at “rocket speed”, and that these automated decisions can “lock people out of jobs”.

Battling Bias:

We live in a world where conscious and unconscious bias affects the lives and chances of many individuals. If we teach AI systems based on the world we have now, it’s little wonder that the results end up the same. And with the mystique of a computer-generated answer, people are less likely to question it. 

As sci-fi fantasy meets workplace reality (and it’s going to reach recruitment in schools and colleges first), it is our job to build in safeguards and protections. Building in a human check, informing data subjects, and completing Data Protection Impact Assessments are all tools to protect rights and freedoms in the battle against biased AI.  

Heavy stuff. It seems only right to finish with a machine learning joke: 

A machine learning algorithm walks into a bar… 

The bartender asks, “What will you have?” 

The algorithm immediately responds, “What’s everyone else having?” 

 

The technologies used to process personal data are becoming more sophisticated all the time.

This is the first article of an occasional series where we will examine the impact of emerging technology on Data Protection. Next time, we’ll be looking at new technologies in the area of remote learning.