
Research Methods

Aims & Hypotheses

The aim states the purpose of the investigation and is driven by a theory. The aim is a broad starting point that gets narrowed down into the hypothesis.

A hypothesis (plural: hypotheses) is a precise, testable statement of what the researchers predict will be the outcome of the study. Hypotheses must always include the DV and both levels of the IV.

An experimental hypothesis predicts what change(s) will take place in the dependent variable when the independent variable is manipulated.

The null hypothesis states that there is no difference (or relationship) between the variables being studied e.g. There will be no difference in the number of words recalled between the music and no music conditions.

A directional hypothesis (one-tailed) states the direction of the difference or relationship (e.g. a negative or positive correlation). This is used when previous research findings are available to help make the prediction e.g. Ppts in the no music condition will recall fewer words than ppts in the music condition.

A non-directional hypothesis (two-tailed) states that there will be a difference between the two groups/conditions but it does not state the direction of the difference. This is used when there are no previous research findings available to help make the prediction e.g. There will be a difference in the number of words recalled between the music and no music conditions.

Variables and controls

The IV is the variable that is manipulated and changed between the conditions.

The DV is the data that is collected.

Operationalisation is a clear identification/definition of the observable actions/behaviours to be recorded which enables the behaviour under review to be measured objectively.

A control condition does not involve exposure to the treatment or manipulation of the IV.

An experimental condition does involve exposure to the treatment or manipulation of the IV.

Extraneous variables

Situational extraneous variables are aspects of the environment that might affect the participant's behaviour e.g. noise, temperature, lighting, time of day

Participant extraneous variables are any characteristics of a participant that could affect the DV when it is not intended to e.g. IQ, mood, age, gender

Demand characteristics are cues in the environment that allow participants to guess the aim of the study, causing them to change their behaviour as a result.

Investigator effects are where a researcher (consciously or unconsciously) acts in a way to support their prediction and therefore affects the DV.

Order effects are when participants' responses are affected by the order of the conditions to which they were exposed i.e. fatigue effects or practice effects.

Participant reactivity is when participants alter their behaviour because they know they are part of a study and this can threaten the internal validity of the results.

Extraneous variables are variables other than the IV which could affect the DV and should be controlled.

Confounding variables are variables which have affected the DV and have threatened the validity of the results.

Controlling variables

Random allocation is when each participant has an equal chance of being allocated to either condition. All experiments should use random allocation apart from quasi-experiments which cannot.

Counterbalancing is a way to control for order effects. Half of the participants do condition A then B and the other half do B then A. It means that any order effects will be spread out equally across both conditions.
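The split described above can be sketched in Python (the participant labels are hypothetical):

```python
# Counterbalancing sketch: half the participants do condition A then B,
# the other half do B then A, so order effects are spread across both.
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]

orders = []
for i, p in enumerate(participants):
    if i < len(participants) // 2:
        orders.append((p, ["A", "B"]))  # first half: A then B
    else:
        orders.append((p, ["B", "A"]))  # second half: B then A

# Each order is used by exactly half of the participants.
ab = sum(1 for _, o in orders if o == ["A", "B"])
ba = sum(1 for _, o in orders if o == ["B", "A"])
print(ab, ba)  # → 3 3
```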

Randomisation is used in the presentation of trials in an experiment so that stimuli are presented in a random order and the order of presentation does not have any effect on the DV e.g. words presented in a list.
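A minimal sketch of randomising stimulus order, using Python's standard random module (the word list is illustrative):

```python
import random

# Randomisation sketch: shuffle the word stimuli so that position in the
# list cannot systematically affect recall (the DV).
words = ["cat", "tree", "lamp", "river", "chair", "cloud"]

presentation_order = words[:]        # copy so the original list is untouched
random.shuffle(presentation_order)   # each run gets a random order

# Same words, different order each time.
assert sorted(presentation_order) == sorted(words)
```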

Standardisation is the process in which the procedures used in research are kept exactly the same apart from the IV.


Single blind study is when the participants are deliberately kept ignorant of either the group to which they have been assigned or key information in order to reduce demand characteristics.

Double blind study is when the participants and the researcher are deliberately kept ignorant of either the group to which the participants belong or key information in order to reduce demand characteristics and investigator effects.

Type of experiments

Lab experiments

A laboratory experiment is an experiment conducted under highly controlled conditions and the IV is deliberately manipulated by the researcher. Participants will typically know they are in the experiment.


  • High control – high internal validity

  • Standardised procedure – reliable methodology



  • Low ecological validity

  • Low mundane realism

  • Chances of demand characteristics


Field experiments

A field experiment is an experiment conducted in a natural environment but the IV is deliberately manipulated by the researcher. Participants typically will not know they are in the experiment.


  • High ecological validity

  • High mundane realism

  • Less chance of demand characteristics



  • Extraneous variables – questionable internal validity

  • Hard to replicate to check reliability of the findings


Natural experiments

A natural experiment is an experiment conducted in a natural environment and the IV is naturally occurring e.g. introduction of the internet to a remote community or a natural disaster


  • High ecological validity

  • Allows research to be carried out that would not normally be ethically viable



  • Confounding variables do not allow for causality to be established

  • Cannot be replicated to check reliability of the findings


Quasi experiments

A quasi experiment is an experiment conducted in a lab or natural environment. The IV is a naturally occurring characteristic of the participant e.g. autism, OCD etc. Random allocation is not possible and therefore it is not a ‘true experiment’.


See the previous types of experiments for strengths and limitations depending on where the study is carried out.


Experimental design

Independent groups design is when participants only take part in one condition.  


  • Less chance of demand characteristics affecting the DV

  • Order effects are eliminated



  • Participant EV are a threat to internal validity of the results

  • More costly and time consuming as more participants are needed


Repeated measures design is when participants take part in both conditions.



  • Participant extraneous variables are reduced

  • Less costly and time consuming as fewer participants are needed



  • Increased chance of demand characteristics affecting the DV

  • Order effects are a threat to internal validity


Matched pairs design is when participants only take part in one condition but they are matched on a key characteristic with another participant. Random allocation is used to place one member of each pair into a condition, and the other member goes into the other condition.



  • Participant extraneous variables are controlled for

  • Less costly and time consuming as fewer participants are needed compared to independent groups design

  • Order effects are eliminated

  • Demand characteristics are reduced



  • Participants can never truly be matched – there will always be some differences

  • It can be time consuming to match participants


Sampling methods

The target population is the group of people the researcher wants to study. They cannot study everyone so they have to select a sample.

A sample is a small group of people who represent the target population and who are studied.

It is important the sample is representative of the target population.

Random sampling

This is a sampling technique in which every member of the target population has an equal chance of being chosen.


How to do it 

1 Obtain a sampling frame, which is a complete list of all members of the target population

2 Assign every name on the list a number

3 Select the sample randomly, for example using a computer-based randomiser or picking names from a hat
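The steps above can be sketched in Python (the names and sample size are illustrative):

```python
import random

# Random sampling sketch: a numbered sampling frame and a computer-based
# randomiser. Here the frame is an illustrative population of 100 people.
sampling_frame = [f"Person {n}" for n in range(1, 101)]  # steps 1 & 2

sample = random.sample(sampling_frame, 10)  # step 3: 10 chosen at random

assert len(sample) == 10
assert len(set(sample)) == 10  # no one is picked twice
```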



  • There is no bias with this method. Every person in the target population has an equal chance of being selected. Therefore the sample is more likely to be (but not definitely) representative.



  • Impractical: it takes more time and effort than other sampling methods because you need to obtain a list of all members of the target population, identify the sample, then contact the people identified and ask if they will take part – and not all may wish to.

  • Not completely representative: unbiased selection does not guarantee an unbiased sample; for example, the random selection may end up generating an all-female sample, making the sample unrepresentative and therefore not generalisable.


Opportunity sampling

A technique that involves recruiting anyone who happens to be available at the time of your study.


How to do it

The researcher will go somewhere where they are likely to find their target population and ask people to take part.



  • Simple, quick and easy and cheap as you are just using the first participants that you find.

  • Useful for naturalistic experiments as the researcher has no control over who is being studied.



  • Unrepresentative: the sample is likely to be biased by excluding certain types of participants, which means that we cannot confidently generalise the findings e.g. an opportunity sample collected in town during the day on a weekday would not include those who are at work or college


Volunteer sampling

This is when people actively volunteer to be in a study by responding to a request which has been advertised by the researcher (they are self-selecting). The researcher may then select only those who are suitable for the study.


How to do it

Participants self-select by responding to an advert.



  • The most convenient and economical method to gather a wide range of people with particular requirements for a study compared to a random sample, as they have already agreed to participate.

  • Can reach a wide audience, especially online.



  • Sampling bias - particular people (with higher levels of motivation and more time) are more likely to volunteer, so findings may be harder to generalise. What is it that has made them decide to take part? This may lead to bias as they may all display similar characteristics.


Systematic sampling


Systematic sampling involves selecting names from the sampling frame at regular intervals. For example, selecting every fifth name in a sampling frame.  


How to do it

  • A sampling frame is produced, which is a list of people in the target population organised into, for example, alphabetical order.

  • A sampling system is nominated (e.g. every 3rd, 6th or 8th person etc.).

  • The researcher then works through the sampling frame until the sample is complete.
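The procedure above can be sketched in Python (an illustrative alphabetical frame, selecting every 5th person):

```python
# Systematic sampling sketch: select every nth name from an ordered frame.
sampling_frame = sorted(f"Student {chr(c)}" for c in range(ord("A"), ord("Z") + 1))

interval = 5
sample = sampling_frame[interval - 1 :: interval]  # every 5th person

print(sample)  # → ['Student E', 'Student J', 'Student O', 'Student T', 'Student Y']
```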



  • Simple to conduct (though a sampling frame is still needed)

  • The population will be evenly sampled using an objective system, which reduces researcher bias and increases the chances of a representative sample.



  • Not truly unbiased/random because not all people have an equal chance of being selected therefore representativeness is not guaranteed.


Stratified sampling

For this method, participants are selected from different subgroups (strata) in the target population in proportion to the subgroup’s frequency in the population.


How to do it

  • If a researcher wants to sample different age groups in a school, they first of all have to identify how many students are in each strata e.g. students aged 10-12, 13-15, 16-18

  • They then need to work out the percentage of the target population that each stratum makes up. If there are 1000 students in the school and 300 of them are 10-12 years, 500 are 13-15 years and 200 are 16-18 years, the frequencies of the subgroups are 30%, 50% and 20%.

  • The researcher then uses random sampling to select the sample. If the researcher wishes to have a sample of 50 participants, then 30% of the 50 should be 10-12 years, 50% should be 13-15 years and 20% should be 16-18 years, selected at random from each stratum.
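The worked example above can be sketched in Python (the student labels are placeholders):

```python
import random

# Stratified sampling sketch: 1000 students in three age strata; a sample
# of 50 is drawn in proportion to each stratum's frequency.
strata = {
    "10-12": [f"10-12 #{n}" for n in range(300)],  # 30% of the school
    "13-15": [f"13-15 #{n}" for n in range(500)],  # 50%
    "16-18": [f"16-18 #{n}" for n in range(200)],  # 20%
}
population = sum(len(members) for members in strata.values())
sample_size = 50

sample = []
for name, members in strata.items():
    n = round(sample_size * len(members) / population)  # proportional share
    sample.extend(random.sample(members, n))            # random within stratum

print(len(sample))  # → 50 (15 + 25 + 10)
```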



  • More representative than other sampling methods because each subgroup is proportionally represented, making generalisation of findings possible.



  • Knowledge of population characteristics required: stratified samples require a detailed knowledge of the population characteristics, which may not be available.

  • Time consuming: dividing the population into strata and then randomly selecting from each takes time and extra administration



Ethical guidelines

Remember the saying ‘Can Do Can’t Do With Participants’ to help you remember the ethical guidelines.


Consent: This means revealing the true aims of the study, or at least letting the participant know what is actually going to happen.

Participants must be aware of what they are needed to do as part of the study in order to give valid consent.

If the study involves children parental consent must be obtained.


Deception: This means deliberately misleading or withholding information. Deceiving participants must be kept to a minimum. Withholding details of the research to avoid influencing behaviour is acceptable; deliberately providing false information is not. If telling the truth will not have an effect on the results, participants must be informed.


Confidentiality: The communication of personal information from one person to another and the trust this will be protected. Psychologists need to be sure the information they publish will not allow their participants to be identified (keeping their identity secret may not be enough).


Debrief: If consent cannot be obtained (such as in a field experiment) participants must be fully debriefed afterwards. This involves telling the participant about the experiment and then giving them the option of withdrawing their information if they wish.


Withdraw: Even after giving consent, participants still have the right to leave the experiment at any point in time. Participants must be made aware of this when they sign the consent form.


Protection from harm: Participants should be no worse off when they leave an experiment than when they arrived. Risk is considered acceptable if it is no greater than what would be experienced in everyday life.

Validity and Reliability


Validity refers to the extent to which something is measuring what it is claiming to measure.


Internal validity refers to the extent to which a study establishes a cause-and-effect relationship between the IV and the DV.


External validity

Ecological validity refers to whether the data is generalisable to the real world.

Population validity refers to whether the data is generalisable to other groups of people.

Temporal validity refers to whether the data is generalisable to other time periods.


Test validities

Construct validity: This refers to the degree to which a test measures what it claims, or purports, to be measuring e.g. How effectively does a mood self-assessment for depression really measure the construct of ‘depression’? How effectively does an IQ test really measure ‘intelligence’?


Concurrent validity: This asks whether a measure is in agreement with a pre-existing measure that is validated to test for the same (or a very similar) concept. This is gauged by correlating measures against each other. For example, does a new test measuring intelligence, agree with an existing measure of intelligence e.g. Stanford-Binet Test?


Predictive validity: This is the degree to which a test accurately predicts a criterion that will occur in the future. For example, a diagnostic test of schizophrenia has low predictive validity as being diagnosed with sz can lead to very different outcomes. Some people continue to live ‘normal’ lives whilst others struggle with homelessness and drug abuse.


Face validity: A superficial form of validity where you apply a superficial and subjective assessment of whether or not your study or test measures what it is supposed to measure.


Reliability refers to the extent to which something is consistent.


Test-retest reliability: This measures test consistency; the reliability of a test measured over time. i.e. if a person completed the same test twice at different times, are the scores the same?

If the results on the two tests achieve a correlation co-efficient of 0.8 or above, we can assume the test is reliable.
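A minimal sketch of this check in Python, using an illustrative pair of score lists and a hand-rolled Pearson's r:

```python
# Test-retest sketch: correlate two administrations of the same test and
# check the coefficient against the 0.8 rule of thumb. Data is illustrative.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

test1 = [12, 15, 9, 20, 18, 11]   # scores at time 1
test2 = [13, 14, 10, 19, 17, 12]  # same participants, time 2

r = pearson_r(test1, test2)
print(r >= 0.8)  # → True, so we can assume the test is reliable
```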


Inter rater/observer reliability is the degree of agreement among raters.

If there is high correlation (0.8+) between the observers/ raters, the measure is reliable.


Observations

A researcher will simply observe the behaviour of a sample and look for patterns. Like all non-experimental methods, an observation cannot establish cause and effect relationships.

Observations are used in psychological research in one of two ways: as a method or as a technique.

Observations can be understood by considering their four main features:

  • The settings: Naturalistic v controlled

  • The data: Structured v unstructured

  • The participants: Overt v covert

  • The observers: Participant v nonparticipant


Naturalistic observations

This refers to the observation of behaviour in its natural setting. The researcher makes no attempt to influence the behaviour of those being observed. It is often done where it would be unethical to carry out a lab experiment.


  • High levels of ecological validity as carried out in a natural setting

  • P’s are less likely to be affected by demand characteristics as they are unaware they are being studied



  • Little control over EVs - hard to establish causality

  • Replication is often not possible - cannot check reliability of the findings


Controlled observations

This refers to an observation taking place in a controlled setting, usually behind a one-way mirror so the observer cannot be seen.



  • There is less risk of extraneous variables affecting the behaviour as it is in a controlled environment



  • The setting is artificial and therefore the results may lack ecological validity


Structured observations

An observation where the researcher creates a behavioural checklist before the observation in order to code the behaviour. Behaviour can be sampled using time or event sampling.

Researchers will use a standardised behaviour checklist to record the frequency of those behaviours (collecting quantitative data). The target behaviour is split up into a set of behavioural categories (behaviour checklist) e.g. aggression may be categorised as punching, kicking, shouting etc.

The behaviours should:

  • Be observable (record explicit actions)

  • Have no need for inferences to be made

  • Cover all possible component behaviours

  • Be mutually exclusive/ not overlap (not having to mark two categories at the same time)


A pilot study is a small scale study carried out before the actual research. It allows the researchers to practise using the behaviour checklist/ observation schedule. 


It is not always possible to watch and record every behaviour so researchers use a systematic method of sampling it. 


Event sampling

Counting each time a particular behaviour is observed.

Strength: Useful when the target behaviour or event happens infrequently and could be missed if time sampling was used. 

Limitation: However, if the situation is too busy and lots of the target behaviour is occurring, the researcher may not be able to record it all.


Time sampling

Recording behaviour at timed intervals

 Strength: The observer has time to record what they have seen.

Limitation: Some behaviours will be missed outside the intervals - observations may not be representative.


Strengths of structured observations

  • The behavioural checklist (coding system) allows objective quantifiable data to be collected which can be statistically analysed

  • Allows for more than one observer (due to checklist) which can increase the reliability (inter-observer reliability)


Limitations of structured observations

  • The pre-existing behavioural categories can be restrictive and do not always explain why the behaviour is happening


Unstructured observations

The observer notes down all the behaviours they can see in a qualitative form over a period of time. No behavioural checklist is used.



  • They can generate in-depth, rich qualitative data that can help explain why the behaviour has occurred

  • Researchers are not limited by prior theoretical expectations



  • The observer can be drawn to eye-catching behaviours that may not be representative of all behaviours occurring

  • More subjective and less comparable across researchers


Overt observations

Participants are aware that their behaviour is being studied, the observer is obvious.



  • It will better fulfil ethical guidelines (compared to covert)



  • Participants know they are being observed and therefore they may change their behaviour (participant reactivity)


Covert observations

Participants are unaware that their behaviour is being studied – the observer is concealed.



  • P’s do not know they are being observed and therefore their behaviour is more likely to be natural (higher validity)



  • It can break many ethical guidelines: deception is used and it may cause the p’s some psychological harm


Participant observations

The observer becomes involved in the participant group and may not be known to other ppts.


  • Being part of the group can allow the researcher to get a deep understanding of the behaviours of the group (increasing validity).


  • The presence of the researcher might influence the behaviour of the group.

  • The researcher may lose objectivity as they are part of the group.


Non-participant observations

The observer is separate from the participant group that are being observed.



  • Researchers’ observations are likely to be more objective as they are not influenced by the group



  • It is harder to produce qualitative data to understand the reasons for the behaviour

Self-report methods

Self-report techniques describe methods of gathering data where participants provide information about themselves e.g. their thoughts, feelings, opinions.

This can be done in written or oral form. The techniques generally used are:

  • Questionnaires

  • Interviews


There are thousands of standardised measures used in psychology for clinical purposes and for research. Examples include:

  • Adverse Childhood Experiences (ACEs) questionnaire

  • Beck Depression Inventory

  • Frost Multidimensional Perfectionism Scale (FMPS)

  • The Holmes and Rahe Stress Scale

  • Adolescent Attachment Questionnaire

  • McGill Pain Questionnaire



Questionnaires are a written self-report technique where participants are given a pre-set number of questions to respond to. They can be administered in person, by post, online, over the telephone, or to a group of participants simultaneously.

A closed question is when there are only a certain amount of choices available to answer. Closed questions give quantitative data and are easier to analyse.

There are many types of closed-style questions, such as Likert scale and rank-style questions.

Open questions produce qualitative data as they allow participants to give a full, detailed answer and there is no restriction on what the participants can say. Open questions could lead to ideas for further investigation. Respondents can find open questions less frustrating than forced choice.

Standardised instructions: These are a set of written or recorded instructions that are given to the participant to ensure that all ppts receive them in the same way. This increases the reliability and validity of the research. It is used as a control to standardise the procedure.


  • Social desirability bias is reduced as no interviewer is present and questionnaires are often anonymous

  • A large amount of data can be collated very quickly which can increase the representativeness and generalisability

  • Data can be analysed more easily than interviews (if mostly quantitative)



  • The options given may not reflect the p’s opinion and they may be forced into answering something which does not fit - lowering the validity of the findings

  • The quantitative data produces less rich data than interviews




Interviews are self-report techniques that involve an experimenter asking participants questions (generally on a one-to-one basis) and recording their responses.



Structured interview has predetermined questions. It is essentially a questionnaire that is delivered face-to-face (or over the phone).


  • Standardised questions mean it can be replicated

  • Reduces differences between interviewers (consistency = higher reliability)

  • Quick to conduct


  • Interviewers cannot deviate from the topic or elaborate points

  • Mainly produces quantitative data which lacks insight

Unstructured interview has less structure. They may start with some predetermined questions and then new questions may develop during the interview depending on the answers given.



  • More flexibility allows for the collection of rich data which offers a deeper insight and for the interviewer to follow up, explore more or seek clarification


  • Difficult to analyse as they produce lots of qualitative data. The researcher should demonstrate reflexivity.

  • Interviewees may not be truthful due to social desirability bias which lowers the validity of the findings


Semi-structured interview is a mix of structured and unstructured and is often the most successful approach.

Most interviews involve an interview schedule, which is the list of questions that the interviewer intends to cover. This should be standardised for each participant to reduce the contaminating effect of interviewer bias.


The interviewer may take notes throughout the interview, although this can interfere with their listening skills. The interview may be audio or video recorded and analysed later. Any audio recordings must be turned into written data which is called an interview transcript. This must protect anonymity.​​

Case studies

A case study involves the detailed study of a single individual or a small group of people. Conducting a case study usually (but not always) involves the production of qualitative data.


Researchers will construct a case history of the individual concerned using interviews, observations, questionnaires or psychological testing to assess what they are (or are not) capable of. Testing may produce quantitative data.

Triangulation means using more than one method to check the validity of the findings.



  • Phineas Gage - the effect of damage to the prefrontal cortex on personality

  • Genie - investigating the effect of abuse/ neglect on development

  • David Reimer - investigating whether gender is biologically determined or socialised

  • HM - investigating the impact of damage to the hippocampus on memory

  • Clive Wearing - investigating the impact of damage to the hippocampus on memory


Case studies are generally longitudinal studies, which means they follow the individual or group over an extended period of time. The strength is that they allow researchers to look at changes over time. However, participants may drop out, which can lead to a small sample size. The proportion of people dropping out is the attrition rate.



  • They offer high levels of validity as they go into depth and give a rich insight.

  • They allow multiple methods to be used (triangulation) = increasing validity.

  • They allow researchers to study events or complex psychological areas they could not practically or ethically manipulate.

  • Efficient as it only takes one case study to refute a theory.



  • Researcher bias: researchers can become too involved and lose their objectivity - misinterpreting or influencing outcomes.

  • Lack of control: there are many confounding variables that can affect the outcome.

  • As they are unique they can be difficult to replicate and therefore lack scientific rigour.

Correlational research

This design looks for a relationship or association between two variables (co-variables).

  • If the two variables increase together then this is a positive correlation

  • If one variable increases and the other decreases then this is a negative correlation

  • If there is no relationship between variables this is called a zero correlation


A correlation can be illustrated using a scattergram. Scores for the two variables are obtained and used to plot one dot for each individual. The scatter of dots indicates the degree of correlation between the co-variables, which is expressed as a correlation co-efficient (or Pearson’s r). A strong correlation needs to be 0.8+.
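A small sketch of how the sign and strength of r map onto these descriptions (the thresholds follow the 0.8 rule of thumb above; the function name is illustrative):

```python
# Interpreting a correlation coefficient: sign gives the direction,
# absolute size gives the strength.
def describe(r):
    if r == 0:
        return "zero correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= 0.8 else "weak/moderate"
    return f"{strength} {direction} correlation"

print(describe(0.85))   # → strong positive correlation
print(describe(-0.25))  # → weak/moderate negative correlation
print(describe(0))      # → zero correlation
```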



  • They can be used when it would be impractical or unethical to manipulate variables using another method

  • It can make use of existing data (secondary), and so can be a quick and easy way to carry out research

  • Often, no manipulation of behaviour is required. Therefore, it is often high in ecological validity because it is real behaviour or experiences.

  • Correlations are very useful as a preliminary research technique, allowing researchers to identify a link that can be further investigated through more controlled research.



  • Correlation does not equal causation. No cause and effect relationships can be inferred as the direction of the relationship is not known.

  • The relationship could be explained by a third intervening variable. Correlations are open to misinterpretation.

Types of data

Qualitative data

It gathers information that is not in numerical form but in words, often describing thoughts, feelings and opinions. It is rich in detail and might include a reason as to why the behaviour occurred. Qualitative researchers are interested in trying to see through other people’s eyes. Qualitative research acknowledges it is subjective, unlike the scientific approach, which is objective.


Examples include answers given in an interview, descriptions of an observation, explanations or opinions in a questionnaire.



  • Provides a rich insight and understanding of an issue

  • It can help explain the why of a phenomenon

  • It is less reductionist than quantitative data



  • More open to researcher bias in the interpretation of the results

  • Harder to analyse


Quantitative data

Quantitative data is numerical data that can be statistically analysed. It does not include a reason or explanation (the why) for the numerical answer given.

Examples include: Numerical data collected in experiments, observations, correlations and closed/rating scale questions from questionnaires all produce quantitative data.



  • Allows for easier analysis and comparison

  • Objective and scientific

  • Less chance of researcher bias



  • The why often cannot be answered

  • Can be viewed as reductionist as complex ideas are reduced to numbers


Many studies collect a mixture of both qualitative and quantitative data e.g. in Milgram - 65% were fully obedient, but he also reported comments from the observations and interviewed the ppts afterwards, providing further insight into the participants. The two types can be complementary and are not mutually exclusive.


Primary data

Primary data is any data that has been collected by the psychologist for the purpose of their own research or investigation. It is of direct relevance to their research aim and hypothesis.


It can include:

  • Answers from a questionnaire

  • Notes from an observation

  • Results from an experiment



  • Gathered for the aim of the study

  • Replicable (check for reliability of results)

  • Taken directly from the population (generalisability)



  • Researcher bias

  • Time and effort

  • Need a large sample to make it generalisable


Secondary data

Secondary data is any data that already exists and was collected for another purpose.


It can include:

  • Government statistics

  • Newspapers

  • Websites



  • Easier to access than primary

  • Large samples may exist e.g. Govt data



  • Data may not fit what the researcher wants to find out

  • May be of poor quality


Levels of data

Nominal level data is data that can be grouped into categories e.g. favourite lessons. There is no logical order to the categories. The most appropriate measure of average for this level of measurement is the mode. This is the lowest level of data.


Ordinal level data is data that is presented in rank order (e.g. places in a beauty contest) but the intervals between values have no fixed meaning e.g. the person who comes 1st is not twice as beautiful as the person who comes 5th (out of ten competitors). The data may be subjective e.g. happiness scores. The most appropriate measure of average for this level of measurement is the median. This is the mid-level of data.


Interval level data is measured in fixed units with equal distance between points on the scale. For example, time measured in seconds. The most appropriate measure of average for this level of measurement is the mean. Psychological tests which are standardised (psychometric test) e.g. IQ scores, are classed as interval. This is the highest level of data.


Ratio level data is like interval but it has a true value of zero (it is not on the specification – so class it as interval).

Measures of central tendency



The mode is the value that occurs most frequently in a data set. It is used with nominal level data.

How it is calculated: Identify the value that occurs most often. If two values are equally most common, the data set is known as bimodal.



  • It is the only average to use when the data is nominal

  • It is easy to calculate



  • It is a very basic method and is not always representative of the data

  • When the data is bimodal or there is no mode it has limited usefulness



The median is the middle score in a data set. It is used with ordinal level data.

How it is calculated: Arrange the scores from lowest to highest and identify the middle value. If there is an even number of scores, add the two middle values together and divide by 2.



  • It is less affected by extreme scores



  • It does not take account of all ppt scores and is therefore not representative



The mean is the mathematical average of a set of scores. It is used with interval and ratio level data.

How it is calculated: Add up all of the scores and divide the total by the number of scores in the data set.



  • The most sensitive as it takes into account all of the data = more representative of the scores of all ppts



  • Easily distorted by anomalous data
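The three averages can be checked quickly with Python's built-in statistics module. The scores below are hypothetical recall scores, purely for illustration:

```python
import statistics

# Hypothetical recall scores from a memory experiment
scores = [3, 5, 5, 6, 7, 8, 8, 8, 10]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle value once sorted
mean = statistics.mean(scores)      # sum of all scores / number of scores

print(mode, median, round(mean, 2))  # -> 8 7 6.67
```

Note how the single low score (3) pulls the mean below the median, showing why the mean is the most sensitive of the three.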


Measures of dispersion


The range

The range is the difference between the highest and lowest values in a data set. It is used with the median.

How it is calculated: Subtract the lowest value from the highest value and add 1. Adding 1 allows for the margin of error created when raw scores are rounded up or down.


  • Quick and easy to calculate



  • It does not take central values of a data set into account, and so it can be skewed by extremely high or low values.


Standard deviation

Standard deviation is a single value that tells us how far scores deviate (move away) from the mean. It is used with the mean.

The larger the SD, the greater the dispersion/spread within a set of data. A large SD suggests that not all ppts were affected by the IV. A low standard deviation is better as it means that the data are tightly clustered around the mean, suggesting ppts responded in a similar way and the results are more reliable.


  • This is a much more precise measure than the range as it includes all values.



  • Extreme values may not be revealed
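Both measures of dispersion can be sketched in a few lines of Python. The data set is hypothetical, and the +1 range correction follows the A-level convention described above:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set (mean = 5)

# Range with the +1 correction used at A level
value_range = max(scores) - min(scores) + 1  # 9 - 2 + 1 = 8

# Standard deviation: how far scores deviate from the mean
sd = statistics.pstdev(scores)  # population SD; statistics.stdev gives the sample SD

print(value_range, sd)  # -> 8 2.0
```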


Graphs and charts


Bar charts are a simple and effective way of presenting and comparing data. They are most useful with nominal data because each bar represents a different category of data. It is important to leave a space between each bar on the graph in order to indicate that the bars represent ‘separate’ data rather than ‘continuous’ data. Different categories of data are known as discrete data. The bars can be drawn in any order.


Histograms are mainly used to present frequency distributions of interval data. The horizontal axis is a continuous scale. There are no spaces between bars, because the bars are not considered separate categories. They are used when the data is interval and is continuous.


A scatter gram (or scatter graph) is a graphical display that shows the correlation or relationship between two sets of data (or co-variables) by plotting dots to represent each pair of scores. A scatter gram indicates the strength and direction of the correlation between the co-variables.


Tables are a way of presenting quantitative data in a clear way that summarises the raw data. They normally show the measures of central tendency and dispersion for each condition, which can then be compared. The distribution of the data can also be assessed if the mode, median and mean are presented.

Distribution of data

Normal distribution

The normal distribution is a bell-shaped probability distribution. It is the distribution predicted for many naturally varying characteristics, for example the height of all the people in your school.

The mean, median and mode are all in the exact midpoint (allow a tolerance of 0.5 in the exam).


Positive skew

A positive skew is where most of the distribution is concentrated towards the left of the graph, resulting in a long tail on the right. Imagine a very difficult test in which most people got low marks, with only a handful of students at the higher end. This would produce a positive skew.

The rule is that the median and mean are higher than the mode. This means that most people got a lower score than the mean.


Negative skew

The opposite occurs in a negative skew. A very easy test would produce a distribution where the bulk of scores are concentrated on the right, resulting in a long tail of anomalous scores on the left. The mean is pulled to the left this time (by the low scorers, who are in the minority), with the mode at the highest peak and the median in between.

The rule is that the median and mean are lower than the mode.
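The negative-skew rule can be verified numerically. The scores below are hypothetical marks on a very easy test (most scores high, one low outlier):

```python
import statistics

# Hypothetical scores on a very easy test: bulk of marks high, long tail to the left
scores = [1, 6, 8, 9, 9, 10, 10, 10]

mean = statistics.mean(scores)      # pulled down (left) by the low outlier
median = statistics.median(scores)
mode = statistics.mode(scores)      # sits at the highest peak

assert mean < median < mode  # negative skew: mean and median are lower than the mode
print(mean, median, mode)    # -> 7.875 9.0 10
```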


Inferential statistics

Inferential statistics allow you to test a hypothesis or assess whether your data is generalisable to the broader population. They determine if the data is statistically significant.


There are three factors which affect which test you use:

1. Whether you’re testing for a difference or a correlation (association)

2. The design of the investigation: independent groups (unrelated) OR repeated measures (related)

3. What type of data you have (nominal/ interval/ordinal)


To help you remember the table, learn the phrase: Carrots Should Come Mashed With Swede Under Roast Potatoes (Chi squared, Sign test, Chi squared, Mann-Whitney, Wilcoxon, Spearman's rho, Unrelated t-test, Related t-test, Pearson's r)


                      Tests of difference                                    Tests of association
                      Unrelated                    Related
                      (independent groups)         (matched pairs and
                                                   repeated measures)

Nominal               Chi squared                  Sign test                 Chi squared

Ordinal               Mann-Whitney                 Wilcoxon                  Spearman's rho

Interval              Unrelated t-test             Related t-test            Pearson's r
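The test table can also be sketched as a simple lookup from (level of data, type of design) to test name, which makes a handy self-quiz. This is just a revision aid, not part of the specification:

```python
# (level of data, design) -> statistical test
STAT_TESTS = {
    ("nominal", "unrelated"):    "Chi squared",
    ("nominal", "related"):      "Sign test",
    ("nominal", "association"):  "Chi squared",
    ("ordinal", "unrelated"):    "Mann-Whitney",
    ("ordinal", "related"):      "Wilcoxon",
    ("ordinal", "association"):  "Spearman's rho",
    ("interval", "unrelated"):   "Unrelated t-test",
    ("interval", "related"):     "Related t-test",
    ("interval", "association"): "Pearson's r",
}

print(STAT_TESTS[("ordinal", "related")])  # -> Wilcoxon
```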

The Sign test

The sign test is a non-parametric statistical test of difference that allows a researcher to determine the significance of their investigation. It is used in studies with a repeated measures design, where the data collected is nominal. You need to know how to calculate this test.



1. Work out whether each participant's score has changed between conditions by putting a + or – in the final column (the size of the change is not needed)

2. Count how many + signs and how many – signs there are (any scores that are the same in both conditions are removed and not counted)

3. The smaller count of + or – signs is your S value, which you then compare against a critical value table to check whether the result is significant.

4. The N value = the number of ppts – any 0 (no change) values
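The steps above can be sketched in Python. The before/after scores here are hypothetical (e.g. mood ratings before and after therapy):

```python
# Hypothetical repeated-measures scores, e.g. mood rating before and after therapy
before = [5, 3, 6, 4, 7, 5, 6, 4, 5, 6]
after  = [7, 5, 6, 6, 8, 4, 8, 6, 7, 8]

signs = []
for b, a in zip(before, after):
    if a > b:
        signs.append('+')
    elif a < b:
        signs.append('-')
    # pairs with no change (0 values) are dropped and not counted

s = min(signs.count('+'), signs.count('-'))  # S = the count of the less frequent sign
n = len(signs)                               # N = number of ppts minus any 0 values

print(s, n)  # -> 1 9
```

S is then compared with the critical value looked up for N = 9 to decide significance.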


Levels of significance

The usual level of significance (alpha level) in psychology is 0.05. This means there is up to a 5% (1 in 20) probability that the observed results happened by chance (a fluke). This is properly written as p ≤ 0.05 (p stands for probability).


There are some instances when a psychologist might choose to use a significance levels of 0.01. This means there is only up to a 1% chance that the observed results happened by chance. This is a stricter test and allows the researcher to be more confident.


Three factors to consider when comparing calculated values and critical values:

  • One-tailed/two-tailed hypothesis

  • Number of participants  (e.g. n=20)

  • Significance level (e.g. 0.05)

Writing up significance

The result is significant/ not significant because the observed value (T = 7) is higher / lower than the critical value of 11 with N = 10 and at the 0.05 level of significance.

For a sign test the N value is number of ppts – the number of ‘0’s.
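The decision rule can be sketched as a comparison. For the sign test and Wilcoxon, the result is significant when the calculated value is equal to or less than the critical value; the numbers below reuse the worked figures from the template sentence above:

```python
# Sign test / Wilcoxon rule: significant when calculated value <= critical value
calculated = 7   # e.g. the observed value T = 7
critical = 11    # looked up using N = 10, the hypothesis type, and the 0.05 level

significant = calculated <= critical
print(significant)  # -> True
```

Note that for some other tests (e.g. chi squared) the rule is reversed: the calculated value must be equal to or greater than the critical value.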


Errors in significance testing

Type 1: False positive: Rejecting the null hypothesis, when there is a possibility that the results were due to chance or other extraneous variables. Often caused by using a significance level that is too lenient e.g. 10%, 0.10, 1 in 10, p≤0.10. Not being cautious enough.


Type 2: False negative: Accepting the null hypothesis, when there is a possibility that the results were significant. Often caused by using a significance level that is too strict e.g. 1%, 0.01, 1 in 100, p≤0.01. Being over cautious.


Qualitative data analysis

Content analysis

Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within qualitative data. It could be analysis of texts, pictures, transcripts, diaries, media etc.  

Data can be placed in categories and counted (quantitative) or be  analysed in themes (qualitative). 


How to carry out content analysis

  • The researcher will read and reread the text  

  • Then devise coding units they are interested in measuring e.g. frequency of sexist words  

  • Reread the text and tally every time that the coding unit occurs  

  • A second researcher is often used to check the consistency of the coding by comparing the outcome (inter-rater reliability)  

  • The correlation co-efficient needs to be 0.8+ 
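The tallying step can be illustrated in a few lines of Python. The transcript and coding units below are hypothetical, purely to show the mechanics:

```python
# A minimal sketch of tallying coding units in a transcript (hypothetical data)
transcript = "she was angry, then calm, then angry again"
coding_units = ["angry", "calm"]

# Count each time a coding unit occurs in the text
tallies = {unit: transcript.count(unit) for unit in coding_units}

print(tallies)  # -> {'angry': 2, 'calm': 1}
```

A second researcher would produce their own tallies from the same transcript, and the two sets would be correlated to check inter-rater reliability.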



  • It is a reliable way to analyse qualitative data as the coding units are not open to interpretation and so are applied in the same way over time and with different researchers (inter-rater reliability)  

  •  It allows a statistical analysis to be conducted and an objective conclusion to be drawn 



  • As it only describes the data (i.e. what is happening) it cannot extract any deeper meaning or explanation (i.e. the why).  

  • Causality cannot be established as it merely describes the data 


Thematic analysis

Thematic analysis is a type of content analysis where you let the themes emerge from your interpretation. The researcher does not have any pre-determined ideas of the themes that may emerge. Thematic analysis turns the data into higher order and subordinate themes. 

How to carry out thematic analysis

  • Make transcriptions if needed  

  • Read and reread the data / transcript and try to understand the meaning communicated  

  • Use coding to initially analyse the transcripts  

  • Review the transcriptions/codes looking for emergent themes 



  • More meaning can be established compared to content analysis as it keeps the data as qualitative data  

  • Content and thematic analysis have high ecological validity because they are based on observations of what people actually say and do. 



  • It is subjective. Bias reduces the objectivity of the findings because different observers may interpret the themes differently.  

  •  Analysis is also likely to be culture biased because the interpretation of verbal or written material will be affected by the language and culture of the observer and the themes identified. 

Scientific report writing

Writing a scientific report

Psychologists need to write up their research for publication in journal articles. They have to use a conventional format which includes the following:


  • Abstract – an overview of the whole report, around 150-300 words, summarising the introduction, method, results and discussion. It appears at the beginning of the report. 

  • Introduction – literature review (theories and studies), broad to specific, aims and hypotheses

  • Method – design, sample (including method), target population, materials, procedure (standardised instructions in a recipe style), briefing, debriefing, ethics

  • Results – descriptive stats, inferential stats (test, critical and calculated values, levels of significance), acceptance/ rejection of hypotheses or qualitative analysis

  • Discussion – Summarise the results, link back to previous research and theories, limitations of the research and future recommendations, real world application 

  • Referencing – cite all the literature that has informed the project


Referencing is an important aspect of psychological reports/journals. The reference section of a journal includes full details of any sources, such as journal articles or books, that are used when writing a report. References ensure that work is not plagiarised. There are many styles, but the Harvard style is formatted as below:

Family name, INITIAL(S). Year. Title. Edition (if not first edition). Place of publication: Publisher.


Peer Review

Peer review is a process that takes place before a study is published to check the quality and validity of the research and to ensure that the research contributes to its field. 

The stages of peer review

  • Research is conducted

  • Research is written up and submitted for publication

  • Research is passed on to the journal editor, who reads the article and judges whether it fits in with the journal

  • Research is sent to a group of experts who evaluate its quality and either accept or reject it

  • If it is accepted, the researcher gets it back with recommendations

  • The editor then decides whether it is published or not

The purpose of peer review

  • Prevents dissemination of unwarranted claims / unacceptable interpretations / personal views and deliberate fraud – improves quality of research

  • Validates the quality of research. Ensures published research is taken seriously because it has been independently scrutinised

  • It suggests amendments or improvements. Increases probability of weaknesses / errors being identified – authors and researchers are less objective about their own work.

  • To allocate research funding – research is paid for by various government and charitable bodies. For example, the government-run Medical Research Council had £605 million to spend in 2008-09 and have a duty to spend this responsibly. They have a vested interest in establishing which research projects are most worthwhile.

Features of Science

Remember THE PROF

Theory construction: Psychologists make theories based on data collected from studies. They then update theories based on new evidence.

Hypothesis testing: Psychologists set hypotheses and collect data to test them. They control variables in experimental methods. The IV and DV should be operationalised (precisely defined and measured). The fewer extraneous variables there are, the less likely they are to become confounding variables, so cause and effect can be established, which leads to high internal validity.

Empirical methods: These are methods that are observable and measurable. Observations must be made based on sensory experiences rather than simply describing ones own thoughts and beliefs. A scientific idea is one that has been subjected to empirical testing e.g. an experiment. Science therefore involves making predictions, tested by scientific empirical observations.

Paradigm: According to Kuhn, for a subject to be scientific there needs to be at least one shared paradigm. It could be argued Psychology does not have one due to the many approaches offering different explanations. However, others may argue that the cognitive approach is a shared paradigm in Psychology.

Replicability: Replication is concerned with the ability to repeat an experiment. This can be ensured through a standardised procedure. This allows the reliability of the results to be checked.

Objectivity: This term relates to judgments, theories, explanations and findings that are based on observable phenomena (facts) and uninfluenced by emotions or personal prejudices. It is important to remain objective so that researcher bias does not threaten the validity of the results.

Falsifiability: Popper proposed falsifiability as an important feature of science. It is the principle that a theory can only be considered scientific if, in principle, it is possible to establish it as false.

Paradigms and paradigm shifts

A paradigm is a set of shared assumptions/beliefs about how behaviour/thought is studied/explained eg a focus on causal explanations of behaviour. A paradigm shift occurs where members of a scientific community change from one established way of explaining/studying a behaviour/thought to a new way, due to new/contradictory evidence. This shift leads to a ‘scientific revolution’ eg the cognitive revolution in the 1970s and the current emphasis on cognitive neuroscience.

Kuhn would argue Psychology is a pre-science as there is a range of views, a range of theories with no consensus and therefore it does not have an agreed paradigm. Not all people would agree with this statement as the cognitive approach is a main way of thinking in modern Psychology.

Theory construction

A theory is a set of general laws or principles which can explain and predict human behaviour. Theory construction enables predictions to be made which can be translated into hypotheses. Theories are tested by empirical methods and are refined in the light of evidence. This knowledge allows theory construction and testing to progress through the scientific cycle of enquiry.

