Transcribing Spoken Data

In the study of English Language or linguistics, we often look at how people talk and interact with each other. This means that when we collect data, it is often of spoken language and is what we call spoken data. To get spoken data into a form we can use and analyse, we have to transcribe it.

Last Updated: 04.10.2022

To transcribe something is to put it into a written or printed form.

Once we have transcribed spoken data, we then have a transcription that we can use to analyse the spoken data.

A transcription (or transcript) is a written or printed version of something.

In this article, we’re going to look at why we transcribe spoken data, how we transcribe, how the International Phonetic Alphabet is used in transcription, and then how to cite speech transcription.

Why do we Transcribe Spoken Data?

Due to the nature of spoken language, once we’ve heard it, we generally can’t hear it again.

Spoken data is language data that records how something was actually said. It differs from written language in that it usually shows more informal language features that aren’t present in writing.

To collect spoken data that we can listen to again, we must record it. This can be done either as an audio recording or as an audio-visual recording (video) where we can then listen to the spoken data as many times as we need.

Although having audio recordings is important when analysing spoken data, audio alone isn’t always the most useful way to store it: it is harder to annotate and compare, and it can be difficult to find a specific piece of data quickly.

We transcribe spoken data so that we have a written form of it. This makes it much easier to analyse what has been said and how. Looking at the content of the spoken data (such as topics, words and interruptions) can be useful in areas of linguistics like sociolinguistics where we may need to analyse and compare the language of different speakers.

Language differences can vary among speakers and can be related to social aspects such as age, class, gender, occupation, ethnicity and region.

Another reason why we transcribe spoken data is to look at a person’s accent and pronunciation features. This is done by transcribing data using the International Phonetic Alphabet, which we’ll look at in a bit more detail later. Doing this allows greater and more specific speech analysis in fields such as phonetics and phonology.

Accent and pronunciation features are the aspects of spoken language that can differ between speakers. For example, the vowel in ‘bath’ is pronounced differently across British accents: a short /a/ sound in ‘bath’ is a feature of northern accents.

Fig. 1 - Transcribing data involves writing it out.

Transcription of data in research

Before transcribing, you first need to collect the data. This is done most often through recording spoken language either as an audio recording or recording as a video – having a video may be useful for looking at things such as NVC within a person's speech.

NVC stands for non-verbal communication and is the name given to any sort of gesture, movement or facial expression used to communicate something. NVC is often used in conjunction with verbal communication (speech) but can also be used on its own.

When recording and transcribing data, certain factors need to be considered. These are ethics and the observer’s paradox.

Ethics

In relation to ethics, we need to think about what is the morally right practice as researchers. As spoken language is produced by an individual and is unique to that individual, you need their permission to record them.

If you don’t ask permission before recording someone, it could be considered a breach of that person’s privacy. Every study that requires spoken data has to first go through ethical considerations and make sure that permission has been asked for where it is needed.

The observer’s paradox

The observer’s paradox is the name given to the problem that arises when trying to record natural spoken language. Most natural speech occurs when the speakers are completely at ease and talking casually amongst themselves.

When recording data though, there is usually an observer (the person recording the data) or at the very least a recording device. Due to ethical considerations, the speakers will also know that they are being recorded. As much as people may try to speak naturally, there is always an element of being a bit on edge when you know you’re being recorded or listened to. This may cause the speaker to either consciously or subconsciously alter how they speak.

How to overcome observer’s paradox

When collecting data, you can take steps to reduce the effect of the observer’s paradox. One thing you could do is ask for permission to record someone’s speech in advance and then record them when they’re not expecting it. With this method, you’ll have to let them listen to what you recorded before you use it as data, to make sure they’re happy with you using it.

Another way to try and sidestep the observer’s paradox is to let people know that you are recording them and then lead the conversation through some casual topics before you get to the conversation you want to record.

By doing this, you’ll allow the speakers to get accustomed to being recorded and settle into speaking more naturally by the time it gets to the data you need. This will hopefully encourage more natural speech.

Transcribing Data

Before you start writing out your data into transcript form, you’ll need to write a sentence or two outlining some basic context. This will need to include:

Where and when the interaction is taking place
Who the speakers are
Any contextual information relevant to your study, for example, the gender of the speakers if you’re looking at language and gender

When writing out a transcript, you’ll first need to listen to your recording and write out what was said. It’s a good idea to listen to the recording a few times to make sure you write what you actually hear and not what you expect to hear.

It’s easy to mishear and automatically correct what you hear when you write it down. You’ve got to be careful not to do this when transcribing as you want a true representation of the spoken data.

If something is said that is unusual or of note (this will depend on what you’re looking for), it’s a good idea to annotate this on your transcript and to listen through again to see if it appears anywhere else as well.

Features of communication that can be shown in transcriptions:

Feature	Definition	As it would be shown in a transcript
False start	Where someone starts speaking, pauses, and starts again.	John: I don't think... I didn't really see him.
Micro-pauses	A pause in speech that is less than a tenth of a second.	(.)
Pause	A pause in speech longer than a tenth of a second, showing the length of the pause in seconds.	(0.6)
Interruptions	Where one speaker interrupts another. Two slashes indicate at what point the speaker interrupts.	John: I did see that the game // was on over the weekend. Peter: // The game was amazing!
Simultaneous speech	This is where two speakers are speaking at the same time, indicated with lines on either side of simultaneous speech.	John: Did you see the game? It was amazing, | there was a goal right at the end of the second half! | Peter: | It was so close! I couldn't believe they got in there so quick with that goal. |
Repetition	Where the same word or utterance is repeated.	John: I did see that. I did see that yeah.
Stutter	Where a speaker struggles to keep a flow in speech.	Tom: D d d did you see the g g game?
Filler	A small word inserted by a speaker in-between utterances.	John: I erm, did see uh, that it like, was really sudden.
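The notation in the table is regular enough to be picked out automatically. The following is a minimal Python sketch, not a standard tool: the function name `count_features` and the filler list are assumptions made for illustration, and ‘like’ is left out of the fillers because it is often an ordinary word.

```python
import re

# Patterns for the notation shown in the table (names are illustrative).
MICRO_PAUSE = re.compile(r"\(\.\)")               # (.)
TIMED_PAUSE = re.compile(r"\((\d+(?:\.\d+)?)\)")  # (0.6) or (2)
INTERRUPTION = re.compile(r"//")                  # //
FILLERS = {"erm", "uh"}  # assumption: a deliberately small filler list


def count_features(utterance: str) -> dict:
    """Count the transcription marks found in one transcribed utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return {
        "micro_pauses": len(MICRO_PAUSE.findall(utterance)),
        "timed_pauses": [float(s) for s in TIMED_PAUSE.findall(utterance)],
        "interruptions": len(INTERRUPTION.findall(utterance)),
        "fillers": sum(1 for word in words if word in FILLERS),
    }


print(count_features("Oh yeah (2) Well how about (.) I erm, did see // that"))
```

A count like this only finds the marks that the transcriber has already written down, so it supports analysis rather than replacing careful listening.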

Specific speech sounds, such as phonemes, can be noted using the International Phonetic Alphabet.

What is the International Phonetic Alphabet?

The International Phonetic Alphabet (IPA) was developed in the 19th century as an internationally recognised system of phonetic symbols. Each symbol corresponds to one specific speech sound, removing the confusion caused by having multiple sounds represented by the same letters.

In English, the letter ‘c’ sounds like either ‘k’ or ‘s,’ as in the words ‘cat’ and ‘centipede’. The IPA symbols can help us differentiate between the sounds as there is a different symbol for each different sound, such as /kæt/ for cat and /sɛntɪpiːd/ for centipede.

You can see all of the different symbols in the IPA chart (Fig. 2).

Fig. 2 - IPA chart.

How to use the IPA when Transcribing Spoken Data

Using IPA in transcribing spoken data can make your data much more accurate and can be especially useful if you're looking at accent features such as vowel pronunciation in your spoken data. In A-level English language, you won’t be expected to transcribe whole extracts into IPA, but you will be expected to have a basic understanding of it.

Let's look at an example of how the IPA can be used to show pronunciation features.

A glottal stop is a brief closure in the throat (at the vocal folds) which stops the airflow. Glottal stops usually replace consonants at the end or middle of words in certain languages and dialects. In the IPA, the glottal stop is represented with this symbol: /ʔ/.

Let's look at the glottal stop that appears in the word hat in certain dialects.

If the ‘t’ is pronounced, it would be written as /hat/.

If the ‘t’ isn’t pronounced and is replaced with a glottal stop, it would be written as /haʔ/.
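The substitution in this example can be written as a tiny rule. This is a simplified Python sketch: the function name `glottalise_final_t` is hypothetical, and it assumes a single word in slanted brackets whose last symbol may be ‘t’. Real glottal replacement depends on the dialect and the surrounding sounds, which this does not model.

```python
def glottalise_final_t(ipa: str) -> str:
    """Replace a word-final /t/ with a glottal stop in a /.../ transcription."""
    if not (ipa.startswith("/") and ipa.endswith("/") and len(ipa) > 2):
        raise ValueError("expected a transcription in slanted brackets, e.g. /hat/")
    body = ipa[1:-1]
    if body.endswith("t"):
        # Swap only the final symbol; everything before it is left unchanged.
        body = body[:-1] + "ʔ"
    return f"/{body}/"


print(glottalise_final_t("/hat/"))
```

Words that do not end in /t/ are returned unchanged.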

When you write something using IPA, make sure to put slanted brackets (slashes) on either side of it to indicate your use of IPA. For example, /kæt/ for ‘cat,’ /waʊ/ for ‘wow,’ and /beɪð/ for ‘bathe.’ The slanted brackets are for phonemic transcription (otherwise known as broad transcription), which is language-specific and records just enough detail to show how words differ from one another in a language. Square brackets [ ] are used for narrow transcription, which records as much detail of the sound as possible.

In the IPA chart, there are also diacritics and suprasegmentals which are the small marks placed next to, under, or on top of vowel or consonant symbols and give much greater information about the prosodic features of the speech sounds.

Prosodic features are the extra elements of speech sound, such as tone, intonation, rhythm, and stress.

Suprasegmentals and diacritics can be used to show stress, syllables and the linking of speech so that you can represent in written form exactly how something has been said. When adding diacritics and suprasegmentals into your transcription, you need to use square brackets around the transcribed speech to show that it's narrow transcription.
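The bracket convention can be checked mechanically. Here is a small Python sketch, assuming the transcription is passed in as a single string; the function name `transcription_type` and the labels it returns are chosen for illustration only.

```python
def transcription_type(text: str) -> str:
    """Classify a transcription by the brackets that surround it."""
    text = text.strip()
    if len(text) > 2 and text[0] == "/" and text[-1] == "/":
        return "broad (phonemic)"
    if len(text) > 2 and text[0] == "[" and text[-1] == "]":
        return "narrow (phonetic)"
    return "not marked as IPA"


for sample in ["/haʔ/", "[ˈhaʔ]", "hat"]:
    print(sample, "->", transcription_type(sample))
```

The check looks only at the brackets, so it cannot tell whether the symbols inside them are used correctly.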

Transcript example

This transcript is an extract from a recorded conversation between two friends (Polly and Laura) who are planning a trip. You can spot some of the features from the table earlier.

1 Polly: Well I was thinking that we could all get the train together.
2 Laura: (0.5) Yeah… Yeah well I was going to say I could drive some of (.) four
3 of us.
4 Polly: Oh yeah (2) Well how about (.) | how about girls | in the car and boys
5 on the train.
6 Laura: | How about we |
7 Yeah that sounds okay (1) We’ll have to //
8 Polly: // I mean (.) we’ll have to see (.) Like we’ll have to ask the boys what
9 they think
10 Laura: Yeah yeah

What are we looking at in this example?

Line 1 is an example of an utterance without any notable speech features.
In line 2, we can see that Laura paused for half a second before she started speaking, and then took a micro-pause later on in her utterance.
In line 4, Polly pauses for two seconds and then we see an example of simultaneous speech. In this simultaneous speech, Polly on line 4 says "how about girls" while Laura on line 6 says "how about we." As the lines are around those two sections of utterances, these are the only two sections that are spoken simultaneously.
In lines 7 and 8, we can see an interruption where the double slashes (//) are. Here, Polly interrupts Laura and then carries on speaking.

An utterance is a spoken sound, word or sentence. ‘Utterance’ is often used in relation to transcription instead of ‘sentence.’

Citing speech transcriptions

When you first reference the transcript you’re talking about in your work, it’s usually good to cite the year and to give an overview of the general context, saying briefly who the speakers are and where the conversation is taking place (providing it’s relevant to what you’re discussing). From then on, it’s usually fine to reference a line number (as all transcripts should have numbered lines) and also state who is speaking to make it clear for your reader.
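Because every line of a transcript is numbered, looking up who is speaking on a given line is easy to automate. The Python sketch below uses the example transcript from earlier, lightly tidied; the names `index_lines` and `cite` are hypothetical, and the pattern assumes that a speaker label is a capitalised name followed by a colon. Lines without a label are treated as a continuation of the previous speaker.

```python
import re

TRANSCRIPT = """\
1 Polly: Well I was thinking that we could all get the train together.
2 Laura: (0.5) Yeah... Yeah well I was going to say I could drive some of (.) four
3 of us.
4 Polly: Oh yeah (2) Well how about (.) | how about girls | in the car and boys
5 on the train.
6 Laura: | How about we |
7 Yeah that sounds okay (1) We'll have to //
8 Polly: // I mean (.) we'll have to see (.) Like we'll have to ask the boys what
9 they think
10 Laura: Yeah yeah
"""

# Line number, optional "Name:" label, then the rest of the line.
LINE = re.compile(r"^(\d+)\s+(?:([A-Z][a-z]+):\s*)?(.*)$")


def index_lines(transcript: str) -> dict:
    """Map each line number to (speaker, text), carrying the speaker forward."""
    index = {}
    speaker = None
    for raw in transcript.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        number, name, text = match.groups()
        if name:
            speaker = name  # a new label starts a new speaker's turn
        index[int(number)] = (speaker, text)
    return index


def cite(index: dict, number: int) -> str:
    """Build a short reference giving the line number and the speaker."""
    speaker, text = index[number]
    return f'In line {number}, {speaker} says "{text}"'


lines = index_lines(TRANSCRIPT)
print(cite(lines, 10))
```

Carrying the speaker forward matters for lines such as 3 and 7, which continue an utterance and so have no name at the start.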

Quoting transcriptions

When quoting a short utterance or a word, simply put it in quote marks as you would when quoting a book.

In line 4, Polly pauses for 2 seconds, saying "Oh yeah (2) Well how about."

When you are explaining something with the help of the IPA, make sure to put that part in slanted brackets.

When quoting multiple lines, do it as a separate section underneath your paragraph and then do your explanation underneath, making sure to still reference specific line numbers.

----- Paragraph explaining your point -----

“

Quoted lines from the transcript

”

----- Paragraph discussing the quoted text -----

Transcribing Spoken Data - Key Takeaways

A transcription is a written or printed version of something.
When recording data for transcription, we have to consider ethics and the observer’s paradox.
Transcripts can be used to show features of spoken language such as interruptions, pauses and simultaneous speech.
The International Phonetic Alphabet (IPA) can be used to represent specific sounds of speech.
When citing speech transcripts, you can either quote a short utterance or a longer extract.

References

Fig. 2: IPA chart 2020 (https://commons.wikimedia.org/wiki/File:IPA_chart_2020.svg) by International Phonetic Association (https://www.internationalphoneticassociation.org/IPAcharts/IPA_chart_orig/IPA_charts_E.html) is licensed by CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/deed.en)


Frequently Asked Questions about Transcribing Spoken Data

What is transcription?

Transcription is the process of putting spoken data into a written or printed form so that it can be analysed.

How to cite a transcription of speech?

When first introducing the transcript, give the year and some basic context. Then (throughout your discussion and analysis), reference the line number for what you're discussing. It's also a good idea to state who is speaking for greater clarity in your explanation.

How do you transcribe a speech?

To transcribe speech, you need to record it, then write out what was said in the recording. When you have done this, you need to make clear where any features such as interruptions, pauses and simultaneous speech are.

How should a transcript look?

A transcript should have a sentence or two giving context at the beginning. Then the text should be arranged with a new line for each speaker with the speakers' names down the left of the page. Every line should be numbered.

What should be included in a transcript?

Context of the interaction including anything that's relevant to your area of research.
Line numbers.
Speech features such as pauses, interruptions, simultaneous speech, fillers and false starts.


StudySmarter Editorial Team

Team English Teachers