All you need to know about Nabla Copilot's privacy and security features
ML Product Manager
Security and compliance are the backbone of healthcare. At Nabla, we place privacy at the top of our agenda because it is fundamentally tied to our customers' experience of our products. We are committed by design to securing customer application data, eliminating system vulnerabilities, and ensuring continuity of access.
In practical terms, this means we use a variety of industry-standard technologies, services, and processes to secure data against unauthorized access, disclosure, use, and loss.
We just launched our Copilot, an ML-powered medical note generation product, to allow physicians to focus on what really matters during a consultation: the patient. You can try it for free!
Because we believe transparency is everything, this article details how data is captured, stored, and processed when a physician uses Nabla Copilot. We built this product to ensure the highest level of security and compliance without compromising the final output of the medical note. An approximate note means physicians would have to edit it and divert from care again, which would defeat the purpose: maximum quality with maximum privacy.
Overview of the Nabla Copilot data flow
The short version is this: Nabla Copilot turns a raw medical conversation that occurred during a video consultation into a structured medical note that can be exported directly to the patient's EMR.
The longer version now:
Simply put, Nabla Copilot captures the audio from the browser tab in which the video consultation is taking place. For now, the extension is not restricted to any particular website, but it never captures audio unless the user explicitly clicks the start/pause button.
The audio is then transcribed live using a HIPAA-compliant speech-to-text external application programming interface (or API for connoisseurs).
This API-produced transcript is then processed to generate a SOAP note (the method clinicians use to document patient encounters in a structured way) using a combination of in-house, HIPAA-compliant natural language structuring algorithms and a HIPAA-eligible public Large Language Model (LLM).
Here is a neat graph that sums it up:
[Figure: Overview of the Nabla Copilot data flow]
Data storage and processing
Pure and simple, Nabla does not store any data during this process, be it audio, transcript, or final note. The only storage used to run Copilot is on the physician's own computer, technically in their local Chrome storage. This means that the transcript and suggested note always remain on the device, not on Nabla's servers.
Nabla does not store the data, but it does process it (i.e., runs computing operations on it) to turn a raw audio consultation into a nice, clean (😉) SOAP note.
This data processing is done on Nabla's servers, which are powered by the HIPAA- and GDPR-compliant Google Cloud Platform (GCP), and on HIPAA-compatible LLM servers.
What about device linking?
If you use Nabla Copilot on your phone, you may want to "transfer" the generated note to another device where you actually chart (usually a computer). How does this transfer work? When you link a device, the mobile device receives a public encryption key from the secondary device via the QR code you scan. Since the decryption key is stored only on the secondary device, no one else can decrypt the note that was encrypted on the mobile device. This end-to-end encryption process guarantees the privacy and security of the data.
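To make the key exchange concrete, here is a minimal sketch of the public-key idea behind device linking. It uses textbook RSA with tiny primes purely for illustration: the source does not say which encryption scheme Nabla uses, this toy version is not secure, and the function names (`generate_keypair`, `encrypt_note`, `decrypt_note`) are hypothetical. A real implementation would rely on a vetted cryptography library.

```python
# Toy textbook RSA with tiny primes: illustrative only, NOT secure.
# All names below are hypothetical and not taken from Nabla's code.

def generate_keypair():
    """Runs on the secondary device (the computer where you chart)."""
    p, q = 61, 53                    # tiny primes, for readability only
    n = p * q
    phi = (p - 1) * (q - 1)
    e = 17                           # public exponent, coprime with phi
    d = pow(e, -1, phi)              # private exponent (Python 3.8+)
    return (e, n), (d, n)            # (public key, private key)

def encrypt_note(note, public_key):
    """Runs on the phone, using the public key read from the QR code."""
    e, n = public_key
    return [pow(byte, e, n) for byte in note.encode("utf-8")]

def decrypt_note(ciphertext, private_key):
    """Runs on the secondary device, the only holder of the private key."""
    d, n = private_key
    return bytes(pow(c, d, n) for c in ciphertext).decode("utf-8")

# The secondary device shares only the public key (e.g. inside a QR code).
public_key, private_key = generate_keypair()
ciphertext = encrypt_note("S: cough for 3 days", public_key)
print(decrypt_note(ciphertext, private_key))
```

The point of the design is that the ciphertext can travel through any intermediary: without the private key, which never leaves the secondary device, it cannot be turned back into the note.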
Patient-doctor conversations during a medical consultation are typically full of personally identifiable information. This includes demographic data, medical history, test and lab results, mental health conditions, insurance information and other personal data that a healthcare professional collects to identify an individual and deliver appropriate care. No one wants this data to be loose in the wild.
In addition to opting out of data retention with our LLM vendor, we have implemented an additional layer of security with a pseudonymization algorithm that systematically removes all portions of the transcript that contain personally identifiable information.
In practice, this algorithm masks names, addresses, dates, phone and fax numbers, Social Security numbers, medical record numbers, health plan beneficiary numbers, account, certificate or license numbers, vehicle, device or serial identifiers, URLs and IP addresses. Here's what this pseudonymization process looks like:
"My name is Clément, I was born on 06.16" becomes "My name is [masked_name_001], I was born on [masked_date_001]".
The masked version is the one we end up giving to the LLM. This way, since the LLM has no personal identifiers as input, it cannot return personal identifiers as output either. It is this output that is used in combination with our own algorithms to suggest the final note in the Copilot. At most, the LLM output will include expressions of type [masked_name] or [masked_date], as shown in the example sentence.
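The masking step can be sketched as follows. This is a minimal illustration and not Nabla's algorithm: the two regular expressions are deliberately naive assumptions that only cover the example sentence, and the names `PATTERNS` and `mask_transcript` are hypothetical. A production system would more likely rely on a trained entity recognizer covering all the identifier types listed above.

```python
import re
from typing import Dict, Tuple

# Hypothetical, deliberately simple patterns covering only the example.
PATTERNS = {
    "name": re.compile(r"(?<=My name is )\w+"),
    "date": re.compile(r"\b\d{2}\.\d{2}(?:\.\d{2,4})?\b"),
}

def mask_transcript(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace identifiers with numbered placeholders.

    Returns the masked text and a correspondence table that maps each
    placeholder back to the original value.
    """
    table: Dict[str, str] = {}
    for kind, pattern in PATTERNS.items():
        counter = 0

        def replace(match: "re.Match[str]") -> str:
            nonlocal counter
            counter += 1
            placeholder = f"[masked_{kind}_{counter:03d}]"
            table[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(replace, text)
    return text, table

masked, table = mask_transcript("My name is Clément, I was born on 06.16")
print(masked)
print(table)
```

Only `masked` would be sent to the LLM in this sketch; `table` stays behind so the placeholders can be resolved later.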
Of course, if you've used Nabla Copilot before, you know that the suggested note shows the relevant personally identifiable information, not a confusing [masked_name] or [masked_date] every time you expect an actual name or date in your note.
We mentioned previously that no data is ever stored by Nabla, which is still true. Our pseudonymization algorithm keeps a record of the actual personally identifiable information, linked to the masked versions provided to the LLM. Technically, this correspondence table is held temporarily in the RAM of Nabla's servers, but it is destroyed after each query and there is no way to access it.
Concretely, this means that we replace every masked piece of personally identifiable information in the LLM output with its unmasked version, generating a usable note for doctors while preserving confidentiality in the process.
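The unmasking step might look like the sketch below. It assumes placeholders follow the `[masked_<type>_<number>]` shape from the example sentence; the names `unmask` and `restore_note` are hypothetical, and clearing the in-memory table is only a simplified stand-in for the per-query destruction described above.

```python
import re
from typing import Dict

# Assumed placeholder shape, based on the example sentence in this article.
PLACEHOLDER = re.compile(r"\[masked_[a-z]+_\d{3}\]")

def unmask(llm_output: str, table: Dict[str, str]) -> str:
    """Swap each known placeholder for its original value.

    Placeholders missing from the table are left untouched, not guessed.
    """
    return PLACEHOLDER.sub(
        lambda m: table.get(m.group(0), m.group(0)), llm_output
    )

def restore_note(llm_output: str, table: Dict[str, str]) -> str:
    """Unmask the note, then empty the table so it does not outlive the query."""
    try:
        return unmask(llm_output, table)
    finally:
        table.clear()

table = {"[masked_name_001]": "Clément", "[masked_date_001]": "06.16"}
note = restore_note(
    "Patient [masked_name_001], born [masked_date_001], reports a cough.",
    table,
)
print(note)   # the physician sees real names and dates
print(table)  # empty once the query is done
```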
To wrap up on privacy: since no data is stored on our servers, we guarantee that no one at Nabla, or anywhere else for that matter, is able to access the personal information discussed during the consultation.
That said, data still needs to be secured while in transit and during processing. Nabla leverages multi-level processes and systems to do so.
At the organization and people access level, we put in place an information security program and security awareness training, performed third-party audits and penetration testing, implemented roles, responsibilities, permissions, and authentication features (SSO, 2FA, etc.), as well as the least privilege access principle.
At the cloud level, all of our services are hosted by Google Cloud Platform (GCP), which employs a robust security program with multiple certifications. In addition, we implemented TLS/SSL encryption in transit to ensure end-to-end data security. As the cherry on top of the very safe cake, any unusual activity is reported through vulnerability scanning, logging, monitoring, and alerting features.
At the vendor and risk management level, we conduct annual risk assessments as well as vendor risk management.
In this regard, we are SOC 2 Type 2 compliant and ISO 27001 certified.
On this note
Nabla's goal is to preserve privacy while generating high-quality notes. This very goal leads us to develop and train self-hosted speech-to-text algorithms and large language models, both of which are fine-tuned on medical data.
The perk of self-hosting is that it removes the need for pseudonymization, and with it the risk that the masking step in our current data flow misses some PHI. But current general-purpose language models such as GPT-4 are too large to be self-hosted at a reasonable cost.
However, we expect multiple lighter variants of GPT-4 from various players to surface in the coming weeks and months, making self-hosting possible.
We also anticipate that anyone will be able to fine-tune some of these smaller variants on specialized datasets, beating the general-purpose, non-specialized models in their specific domain. This is the way forward for Nabla.