I first started thinking seriously about AI-powered transcription in oral history when a community archivist handed me a folder of cassette tapes recorded with undocumented migrants—hours of voices that risked being lost, misheard or left unheard because manual transcription was slow, expensive and emotionally heavy. The promise of tools like Otter.ai, Whisper, Trint and cloud speech services felt seductive: suddenly accessible text, searchable interviews, a faster route to analysis. But the more I worked with these technologies in the field, the more I realised that transcription is not a neutral technical step. It reshapes accessibility, authorship and power in projects that involve people whose safety and dignity are at stake.
Why does transcription matter beyond convenience?
Transcription does three things at once. It makes speech searchable and analysable; it creates a text artefact that can be quoted, archived and shared; and it embeds decisions—about punctuation, dialect, code-switching, whether to anonymise—into the record. When the speakers are undocumented migrants, those decisions have practical and ethical consequences. A mis-transcribed place name or a missed negation can misrepresent someone's trajectory. A timestamped phrase can be indexed and retrieved in ways that might expose identity. So the question is not only what technology can do, but how we use it and who controls the resulting texts.
Can AI tools improve accessibility for participants and researchers?
Yes—if used thoughtfully. For researchers and the public, automatic transcription massively lowers barriers: audio becomes readable for deaf users, non-native speakers can follow along, and long interviews become navigable. For participants, having their words transcribed can be empowering—it gives them a tangible product of their storytelling that they can edit, annotate or use for advocacy.
But accessibility isn't automatic. AI systems struggle with accented speech, code-switching (for example, mixing Spanish, Arabic and local vernacular), non-standard orthographies and low-resource languages. Off-the-shelf models trained on mainstream data will often misrecognise names, flatten cultural markers and introduce errors that erase nuance. That’s why human-in-the-loop workflows remain essential: review, correction and community verification are what turn a raw transcription into an accessible, reliable resource.
What are the main ethical and safety risks?
- Privacy and exposure: Transcripts are searchable. If sensitive locations, contacts or legal vulnerabilities are mentioned, they can be discovered and misused.
- Consent complexity: Participants may consent to an audio interview but not to a searchable, indexed transcript that could be stored in cloud servers or archived indefinitely.
- Misrepresentation: Automated systems can change meaning—mishear negations, names or emotional cues—producing a text that falsely attributes statements.
- Data jurisdiction and legal risk: Cloud transcription services route audio through servers that may be subject to different laws, potentially exposing data to subpoenas or government access.
- Authority and authorship: A machine-generated transcript can be treated as an 'authoritative' text even though it reflects algorithmic assumptions rather than the speaker's intended phrasing.
How can projects reduce harm while benefiting from AI transcription?
From my practice, these are practical steps that balance utility and care.
- Explicit, layered consent: Walk participants through the options (audio-only archives, manual transcription, AI transcription with human review, or anonymised transcripts) and let them choose how their words are stored and shared.
- Local-first processing: Whenever possible, run speech models locally (OpenAI Whisper, for example, can run fully offline) to avoid routing audio through third-party servers. This reduces exposure and simplifies data governance.
- Human-in-the-loop review: Never publish AI transcripts without community verification. Allocate budget and time for editors—preferably bilingual or from the same linguistic community—to correct and annotate.
- Contextual annotations: Add metadata fields for code-switching, laughter, pauses, emotional tone and non-verbal cues. This keeps the transcript from flattening performance into plain text.
- Anonymisation and redaction protocols: Use redaction tools to mask names, locations or identifiers before indexing. Keep original audio and raw transcripts in encrypted, limited-access storage separated from public versions.
- Clear archiving policies: Define retention periods and access levels. Community-controlled archives—where participants or community representatives can veto release—are preferable to open public repositories.
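The local-first processing step above can be sketched in code. Assuming the open-source openai-whisper package and its command-line tool are installed on the project machine, a small helper builds the offline invocation; the file name, model size and output directory here are hypothetical, not a prescription:

```python
# Sketch: build a local Whisper CLI invocation so audio never leaves the machine.
# Assumes the open-source `openai-whisper` package is installed; paths are hypothetical.

def build_whisper_cmd(audio_path, model="small", language=None, output_dir="transcripts"):
    """Return the argument list for a fully offline Whisper run."""
    cmd = ["whisper", audio_path, "--model", model, "--output_dir", output_dir]
    if language:
        # Pinning the language can help with accented or code-switched speech.
        cmd += ["--language", language]
    return cmd

# Example: a Spanish-language interview, processed locally.
cmd = build_whisper_cmd("interview_001.wav", model="medium", language="es")
# Run with: subprocess.run(cmd, check=True)
```

Keeping the invocation in a script (rather than typing it ad hoc) also documents the model name, version and settings for later crediting, as discussed below under authorship.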
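The contextual-annotation fields above are easiest to keep consistent when they live in a fixed per-segment schema rather than free-form notes. One possible shape, with illustrative field names (not a standard):

```python
from dataclasses import dataclass, field

# Sketch: a per-segment metadata record; all field names are illustrative.

@dataclass
class Segment:
    start: float                                   # seconds into the recording
    end: float
    speaker: str                                   # pseudonym, never a legal name
    text: str
    languages: list = field(default_factory=list)  # e.g. ["es", "en"] when code-switching
    tone: str = ""                                 # e.g. "hesitant", "laughing"
    nonverbal: list = field(default_factory=list)  # e.g. ["long pause", "laughter"]

seg = Segment(12.4, 19.8, "Participant A",
              "Cruzamos de noche... we walked for hours",
              languages=["es", "en"], tone="quiet", nonverbal=["long pause"])
```

Because the fields are explicit, a reviewer can see at a glance where code-switching or emotional cues were recorded, instead of those details being flattened out of the plain text.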
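The redaction step above can likewise be partly automated before indexing, provided a human still reviews the output. A minimal sketch using only the standard library; the names, places and placeholder tag are invented for illustration:

```python
import re

# Sketch: mask a known list of identifiers before a transcript is indexed.
# The identifiers and placeholder tag here are illustrative, not a real protocol.

def redact(text, identifiers, placeholder="[REDACTED]"):
    """Replace each identifier (whole words, case-insensitively) with a placeholder."""
    for term in sorted(identifiers, key=len, reverse=True):  # longest terms first
        text = re.sub(r"\b" + re.escape(term) + r"\b", placeholder,
                      text, flags=re.IGNORECASE)
    return text

public_text = redact(
    "We left Tapachula with Maria and crossed near the old bridge.",
    ["Tapachula", "Maria"],
)
# public_text: "We left [REDACTED] with [REDACTED] and crossed near the old bridge."
```

A list-based pass like this only catches identifiers you already know about, which is exactly why the original audio and raw transcripts should stay in encrypted, limited-access storage and a human reviewer should check every public version.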
Do AI transcripts change authorship and credit?
Yes. Authorship in oral history has always been negotiated: the interviewer frames questions, the editor shapes the narrative, and the community may claim ownership of stories. When an AI generates the first draft of a transcript, it introduces a new actor into that negotiation. That doesn't mean a machine "wrote" the story, but it does complicate attribution. I argue for transparent crediting: label machine-generated drafts clearly, document the tools used (model name, version, settings), and foreground the speaker as the primary author of content. Include notes on what was corrected during human review and who made those edits.
Which tools are appropriate, and how do they differ?
Here's a simple comparison that I’ve found useful when choosing a tool:
| Tool | Strengths | Concerns |
|---|---|---|
| OpenAI Whisper (local) | Good accuracy for many languages, can run offline | Requires local compute; still needs human review |
| Otter.ai | User-friendly, good collaboration features | Cloud-based; privacy and jurisdiction concerns |
| Trint | Integrated editor, speaker diarisation | Subscription cost, cloud processing |
| Google/Azure Speech-to-Text | Scalable, powerful models | Data routing, corporate access issues |
None of these tools is perfect; the choice depends on language needs, budget, local IT capacity and the project's ethical commitments.
How should power and ownership be redistributed?
In projects with undocumented migrants, technology can either reinforce extractive dynamics or help redistribute control. I prefer workflows that treat participants as co-creators: offer them copies of transcripts, invite corrections, include their preferences about metadata and public access, and where possible, train community members to run transcription tools locally. Community review boards or advisory committees can approve redactions and access levels. These practices slow the research process, but they produce work that is more ethical and more accurate.
What about legal accountability and archival best practice?
Archivists and legal counsel should be part of the planning. Key precautions include encrypting stored audio, applying role-based access controls, documenting chain-of-custody, and avoiding cloud vendors that automatically claim broad rights over uploaded content. For public-facing materials, use tiered access: summaries or anonymised excerpts for the public, and full transcripts/audio accessible only to verified researchers under strict conditions. Keep meticulous logs of who accessed what and when—those administrative details can protect people if legal or political pressure arises.
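The tiered-access and logging precautions above can be prototyped very simply; in practice this belongs in your archive platform, but a sketch makes the logic concrete. The roles, tiers and field names are illustrative assumptions:

```python
import time

# Sketch: tiered access with an append-only audit log.
# Role names, tier ordering and log fields are illustrative.

TIERS = {"public": 0, "researcher": 1, "archivist": 2}

def request_access(user, role, item, item_tier, log):
    """Grant access only if the role's tier covers the item, and log every attempt."""
    granted = TIERS.get(role, -1) >= TIERS[item_tier]
    log.append({"ts": time.time(), "user": user, "role": role,
                "item": item, "granted": granted})
    return granted

log = []
request_access("j.doe", "public", "interview_001_full.txt", "researcher", log)     # denied
request_access("a.lee", "archivist", "interview_001_full.txt", "researcher", log)  # granted
# Each entry can be serialised and appended to an encrypted, append-only log file.
```

Note that the attempt is logged whether or not access is granted: denied requests are often exactly the administrative trail you need if legal or political pressure arises.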
How do I start implementing this tomorrow?
- Map your risks: language, legal environment, likely sensitive content.
- Select a tool that fits your privacy needs (local model vs cloud service).
- Design a consent form that explains transcription options and risks in simple language.
- Budget for human verification and community review time.
- Create a metadata and redaction policy before transcription begins.
- Train at least one community member to manage or review transcripts.
AI-powered transcription can be a powerful amplifier for oral histories—making hidden voices legible, searchable and shareable. But amplification is not always benign. The work we do around undocumented migrants demands not only technical know-how but sustained ethical attention: to consent, context, control and reparative authorship. Done well, transcription becomes a collaborative act of preservation; done poorly, it can expose and distort the very people we seek to honour.