A frequency distribution is a simple concept: it categorizes data and displays it in summarized form, grouped into a relatively small number of intervals. Some properties:
- It is a way of presenting unorganized data in organized form, e.g. the results of an election, incomes of people in a certain region, sales of a product within a certain period, student loan amounts of graduates, etc.
- Two major means of summarizing a set of numbers: pictures and summary numbers
- Frequency distributions can be used for both qualitative and quantitative data
- Each entry in the table contains the frequency or count of the occurrences of values within that particular group or interval
Simple Case Study
Thomas obtained the following marks in his 10 Statistics tests during the semester:
24, 26, 18, 21, 27, 27, 30, 44, 32, 38
How can you draw any inference from these numbers?
The first step is to summarize the data and see what it has to say.
So how would you organize and classify this data, form a table and present it as a picture? The simplest approach is to put it in tabular form (a frequency distribution):
| Class | Frequency |
|---|---|
| 15 - 25 | 3 |
| 25 - 35 | 5 |
| 35 - 45 | 2 |
Here the total number of classes is 3. We can clearly see that most of the marks fall in the middle range (5 of the 10 tests), while very low and very high marks occur less often.
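As a quick check, the counts in the table can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `class_frequencies` is made up for illustration, and it assumes each class includes its lower limit and excludes its upper limit (the text does not say which class a mark of exactly 25 or 35 belongs to, and none of Thomas's marks falls on a limit).

```python
# Thomas's marks and the three classes from the table above.
marks = [24, 26, 18, 21, 27, 27, 30, 44, 32, 38]
classes = [(15, 25), (25, 35), (35, 45)]


def class_frequencies(values, classes):
    """Count the values in each class, treated as half-open: lower <= v < upper.

    The half-open convention is an assumption made for this sketch.
    """
    return [
        sum(1 for v in values if lower <= v < upper)
        for lower, upper in classes
    ]


frequencies = class_frequencies(marks, classes)

# Print the table in the same "Class | Frequency" layout.
for (lower, upper), count in zip(classes, frequencies):
    print(f"{lower} - {upper} | {count}")
```

The frequencies add up to 10, the number of tests, which is a useful sanity check for any frequency table.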
So how do you draw the frequency distribution table?
- Find the range of the data: The range is the difference between the largest and the smallest values. In our example it is 44-18=26
- Decide the approximate number of classes: This is the number of classes into which the data are to be grouped. In the case of Thomas, the data set was small, so only 3 intervals were selected.
- Determine the approximate class interval size: The size of class interval is obtained by dividing the range of data by number of classes. In case of fractional results, the next higher whole number is taken as the size of the class interval.
- Decide the starting point: The lower class limit (or class boundary) of the first class should be at or below the smallest value in the raw data, so that the smallest value is covered.
- Determine the remaining class limits (boundaries): Once the lower class boundary of the lowest class has been decided, add the class interval size to it to get the upper class boundary. The remaining lower and upper class limits are determined by adding the class interval size repeatedly until the largest value of the data falls within a class.
- Distribute the data into the respective classes: All observations are marked into their classes using the tally bars (tally marks) method, which is well suited to tabulating observations by class.
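The steps above can be sketched in Python as follows. This is an illustrative sketch, not a canonical procedure: the function name `build_frequency_table` and its parameters are hypothetical, classes are assumed to be half-open (lower limit included, upper limit excluded), and the starting point defaults to the smallest value. Applied to Thomas's marks with 3 classes it yields a class size of 9 (26 / 3 rounded up) starting at 18, whereas the table in the case study uses the rounder size of 10 starting at 15. Both choices follow the steps.

```python
import math


def build_frequency_table(values, num_classes, start=None):
    """Return a list of (lower, upper, frequency) rows.

    Assumptions of this sketch: classes are half-open (lower <= v < upper)
    and the first class starts at min(values) unless `start` is given.
    """
    smallest, largest = min(values), max(values)

    # Step 1: range of the data.
    data_range = largest - smallest

    # Steps 2-3: class size = range / number of classes, rounded up.
    width = max(1, math.ceil(data_range / num_classes))

    # Step 4: the starting point must cover the smallest value.
    lower = smallest if start is None else start
    if lower > smallest:
        raise ValueError("starting point must not exceed the smallest value")

    # Steps 5-6: keep adding classes until the largest value is covered,
    # counting (tallying) the observations that fall in each class.
    table = []
    while True:
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, count))
        if upper > largest:
            break
        lower = upper
    return table


marks = [24, 26, 18, 21, 27, 27, 30, 44, 32, 38]
for lower, upper, count in build_frequency_table(marks, num_classes=3):
    print(f"{lower} - {upper} | {'|' * count} ({count})")
```

Because the loop runs until the largest value is covered, the sketch may produce one more class than requested when the starting point is moved below the smallest value.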
Important Points to note
- There is no right or wrong way to create the distribution table. The only point to note is that the class size should be consistent.
- The frequencies should be relatively evenly spread across the classes. It should not happen that all 10 data points land in one interval while the frequency of every other class is 0.
Frequency Distribution Presentation – Histogram and Frequency Polygon
- A histogram is the graphical equivalent of a frequency distribution; it is a bar chart in which continuous data on a random variable's observations have been grouped into intervals.
- A frequency polygon is the line graph equivalent of a frequency distribution; it joins the frequencies of the intervals, each plotted at the midpoint of its interval.
In our case study,
- If a histogram is drawn, its three bars have heights 3, 5 and 2.
- If the frequency of each class is plotted at the class midpoint and these points are joined, the result is the frequency polygon of the distribution.
Presenting Histogram of the Data
Thomas's marks can be shown as a histogram with one bar per class, with heights 3, 5 and 2.
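A rough text rendering of this histogram can be produced in Python. This is a minimal sketch for illustration only: the helper `text_histogram` is a made-up name, and the bars are drawn sideways with `#` characters rather than as a proper chart.

```python
# Classes and frequencies from the case study table.
classes = [(15, 25), (25, 35), (35, 45)]
frequencies = [3, 5, 2]


def text_histogram(classes, frequencies, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    lines = []
    for (lower, upper), count in zip(classes, frequencies):
        lines.append(f"{lower:>2} - {upper:<2} | {symbol * count} ({count})")
    return "\n".join(lines)


print(text_histogram(classes, frequencies))
```

The longest bar belongs to the 25 - 35 class, matching the observation that most marks fall in the middle range.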
Presenting Frequency Polygon
The midpoints of the tops of the rectangles in a histogram are joined by straight lines. This gives a polygon, i.e. a figure with many angles. Unlike histograms, frequency polygons can be superimposed so as to compare several frequency distributions. For the marks obtained by Thomas, the frequency polygon joins the points (20, 3), (30, 5) and (40, 2).
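The points of the frequency polygon can be computed directly from the table. The sketch below assumes the class limits from the case study, and the function name `polygon_points` is made up for illustration. Each point is (class midpoint, frequency), and joining consecutive points with straight lines gives the polygon.

```python
# Classes and frequencies from the case study table.
classes = [(15, 25), (25, 35), (35, 45)]
frequencies = [3, 5, 2]


def polygon_points(classes, frequencies):
    """Return (midpoint, frequency) pairs, one per class, in class order."""
    return [
        ((lower + upper) / 2, count)
        for (lower, upper), count in zip(classes, frequencies)
    ]


points = polygon_points(classes, frequencies)
for midpoint, count in points:
    print(f"midpoint {midpoint:g}: frequency {count}")
```

To compare several distributions, compute one list of points per distribution and plot them on the same axes.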
Managing and operating on data tabulated by frequency is much simpler than operating on raw data. There are also simple algorithms to calculate the median, mean, standard deviation, etc. from these tables.
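As an illustration, the usual textbook approximations for grouped data can be sketched in Python. The function names are hypothetical and several choices are assumptions of this sketch rather than something the text specifies: every observation is treated as if it sat at its class midpoint, the median is interpolated linearly inside the median class, and the standard deviation is the population form (dividing by n). The results are estimates. For Thomas's marks the grouped mean is 29, while the exact mean of the raw marks is 28.7.

```python
import math

# Classes and frequencies from the case study table.
classes = [(15, 25), (25, 35), (35, 45)]
frequencies = [3, 5, 2]


def grouped_mean(classes, freqs):
    """Mean estimated as sum(frequency * midpoint) / n."""
    n = sum(freqs)
    return sum(f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freqs)) / n


def grouped_median(classes, freqs):
    """Median by linear interpolation: L + ((n/2 - cf) / f) * class size."""
    n = sum(freqs)
    cumulative = 0  # total frequency of the classes below the current one
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= n / 2:
            return lo + (n / 2 - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("frequency table is empty")


def grouped_std(classes, freqs):
    """Population standard deviation estimated from the class midpoints."""
    n = sum(freqs)
    mean = grouped_mean(classes, freqs)
    variance = sum(
        f * ((lo + hi) / 2 - mean) ** 2 for (lo, hi), f in zip(classes, freqs)
    ) / n
    return math.sqrt(variance)


print("mean  :", grouped_mean(classes, frequencies))
print("median:", grouped_median(classes, frequencies))
print("std   :", grouped_std(classes, frequencies))
```

Only the class limits and frequencies are needed, so these estimates can be computed even when the raw data are no longer available.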