
Statistics and Probability - Data collection and sampling techniques

Grade 9 IB

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Population and Sample: A population represents the entire group being studied (e.g., all students in a school), while a sample is a subset of that population (e.g., 50 students). Visually, imagine a large circle representing the population, with a smaller circle inside it representing the sample. The goal is for the sample to accurately reflect the characteristics of the larger circle.

Discrete vs. Continuous Data: Numerical data is classified into two types. Discrete data consists of distinct, countable values (e.g., the number of siblings, $x \in \{0, 1, 2, \dots\}$). Visually, this is represented by isolated dots on a number line. Continuous data can take any value within a range (e.g., height, $150 \le h \le 190$ cm). Visually, this is represented by a solid, unbroken line or interval.

Random Sampling: In a simple random sample, every member of the population has an equal chance of being selected. This is often done using a random number generator or by pulling names from a hat. This method reduces selection bias because the researcher's preferences cannot influence who is chosen.
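The "names from a hat" idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the roll of 800 student ID numbers, and the `seed` argument (used only to make the example repeatable) are all assumptions made for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(population, n)

# Hypothetical roll of 800 student ID numbers
students = list(range(1, 801))
sample = simple_random_sample(students, 50, seed=1)
print(len(sample))  # 50
```

Because `random.sample` draws without replacement, no student can appear in the sample twice.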

Systematic Sampling: This technique involves selecting members at regular intervals from an ordered list. For example, selecting every $k^{th}$ person. Visually, if you have a line of individuals, you might pick the $3^{rd}$, $6^{th}$, $9^{th}$, and so on. The interval is calculated by dividing the population size by the desired sample size.
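A short Python sketch of the "every $k^{th}$ person" rule follows. The function name `systematic_sample` and the line of 30 people are hypothetical. The optional random starting position is an assumption added here (a common practice, not something stated in the notes above); passing `start=2` reproduces the 3rd, 6th, 9th pattern exactly.

```python
import random

def systematic_sample(ordered_list, n, start=None, seed=None):
    """Pick every k-th member of ordered_list, where k = N // n."""
    N = len(ordered_list)
    k = N // n  # sampling interval k = N / n, rounded down
    if start is None:
        # Assumption: choose a random starting position within the first interval
        start = random.Random(seed).randrange(k)
    return ordered_list[start::k][:n]

# Hypothetical line of 30 people numbered 1 to 30; sample size 10 gives k = 3
people = list(range(1, 31))
picked = systematic_sample(people, 10, start=2)  # index 2 is the 3rd person
print(picked)  # [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
```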

Stratified Sampling: The population is divided into distinct subgroups, or 'strata', based on shared characteristics (like age or gender). A random sample is then taken from each stratum in proportion to its size in the population. Visually, imagine a bar chart where each bar represents a group; the sample takes a 'slice' from each bar relative to its height.
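As a rough sketch, proportional allocation followed by a random draw inside each stratum might look like this in Python. The function name `stratified_sample` and the three year groups with their sizes are made up for illustration. Rounding each share is a simplification: with awkward numbers the rounded shares may not add up exactly to the requested total.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Randomly sample each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Share of the sample = (stratum size / population size) * total sample size
        share = round(len(members) * total_sample_size / population_size)
        sample[name] = rng.sample(members, share)
    return sample

# Hypothetical school of 800 students split into three year groups
strata = {
    "Grade 9": [f"G9-{i}" for i in range(160)],
    "Grade 10": [f"G10-{i}" for i in range(240)],
    "Grade 11": [f"G11-{i}" for i in range(400)],
}
chosen = stratified_sample(strata, 120, seed=1)
sizes = {name: len(members) for name, members in chosen.items()}
print(sizes)  # {'Grade 9': 24, 'Grade 10': 36, 'Grade 11': 60}
```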

Convenience and Quota Sampling: Convenience sampling involves choosing individuals who are easiest to reach (e.g., surveying friends). Quota sampling involves selecting a specific number of people from different groups until a target is met, but unlike stratified sampling, the selection is not random. These methods are often faster but more prone to bias.

Sampling Bias: Bias occurs when the sample does not accurately represent the population, leading to skewed results. Visually, this looks like a target where all the arrows are clustered in one corner far from the center. Common causes include non-random selection and excluding certain groups within the population. A very small sample also makes results unreliable, although strictly this is random variation rather than bias.
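A tiny made-up data set can show how a biased sample pulls an estimate away from the true value. All numbers below are hypothetical: 12 students report their weekly hours of sport, and the first four happen to be on the school team. Surveying only the team (a convenience sample) is compared with taking every third student from the list.

```python
from statistics import mean

# Hypothetical weekly sport hours; the first four students are on the school team
hours = [9, 8, 10, 9, 2, 3, 1, 4, 2, 3, 1, 2]

population_mean = mean(hours)
convenience_sample = hours[:4]   # only the team members, who are easiest to reach
spread_sample = hours[2::3]      # every third student: 10, 3, 2, 2

print(population_mean)            # 4.5
print(mean(convenience_sample))   # 9
print(mean(spread_sample))        # 4.25
```

The convenience sample doubles the apparent average because it leaves out every student who is not on the team.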

📐Formulae

$\text{Sampling Interval } (k) = \frac{N}{n}$, where $N$ is the population size and $n$ is the sample size.

$\text{Stratified Sample Size for a Group} = \frac{\text{Number in Group}}{\text{Total Population Size}} \times \text{Total Sample Size}$

$\text{Percentage Proportion} = \frac{\text{Sample Size}}{\text{Population Size}} \times 100\%$

💡Examples

Problem 1:

A school has $800$ students. A researcher wants to take a stratified sample of $120$ students based on their year groups. If there are $160$ students in Grade 9, how many Grade 9 students should be included in the sample?

Solution:

Step 1: Identify the total population ($N = 800$), the total sample size ($n = 120$), and the size of the specific stratum (Grade 9 $= 160$).

Step 2: Use the stratified sampling formula: $\text{Sample size for Grade 9} = \frac{160}{800} \times 120$.

Step 3: Simplify the fraction: $\frac{160}{800} = \frac{1}{5} = 0.2$.

Step 4: Calculate the final number: $0.2 \times 120 = 24$.

Explanation:

To ensure the sample is representative, the proportion of Grade 9 students in the sample must match their proportion in the entire school population.
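The arithmetic in Problem 1 can be double-checked with exact fractions in Python. This is only a verification sketch; the helper name `stratum_sample_size` is an assumption for this example. The last line also applies the percentage proportion formula to the whole sample.

```python
from fractions import Fraction

def stratum_sample_size(group_size, population_size, total_sample_size):
    """(Number in group / total population size) * total sample size."""
    return Fraction(group_size, population_size) * total_sample_size

grade9 = stratum_sample_size(160, 800, 120)
sampling_percentage = Fraction(120, 800) * 100  # sample size / population size * 100

print(grade9)               # 24
print(sampling_percentage)  # 15
```

So 24 Grade 9 students are sampled, and the full sample covers 15% of the school.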

Problem 2:

A factory produces $2000$ lightbulbs a day. The quality control manager decides to test every $50^{th}$ bulb for defects. Identify the sampling technique used and determine how many bulbs will be tested in one day.

Solution:

Step 1: Identify the sampling technique. Since the manager is picking every $k^{th}$ item from a sequence, this is Systematic Sampling.

Step 2: Use the formula for the number of items tested: $n = \frac{N}{k}$.

Step 3: Substitute the values: $n = \frac{2000}{50} = 40$.

Explanation:

Systematic sampling is used here because there is a fixed interval ($k = 50$) for selection. Dividing the total daily output by this interval gives the total sample size.
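One way to confirm the count in Problem 2 is to list the positions of the tested bulbs directly. The sketch below assumes the bulbs are numbered 1 to 2000 in production order and that testing starts with bulb 50; the variable names are illustrative.

```python
daily_output = 2000  # N, bulbs produced per day
k = 50               # sampling interval

# Positions of the tested bulbs: 50, 100, ..., 2000
tested = list(range(k, daily_output + 1, k))

print(len(tested))            # 40
print(tested[0], tested[-1])  # 50 2000
```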