|
Frequency distribution. Grouped data and histograms
Date: 2015-10-07
Answers. 1. a) 2.04; b) 2; c) 2; d) 1.09; 1.04; 2. a) 3.375; b) 3; c) 3; d) 1.08; 1.04; 3. a) 1.4; b) 1; c) 0; d) 3.061; 1.75; 4. a) 17.92; b) 18; c) 17; d) 0.89; 0.94; 5. a) 27.7; b) 25; c) 25; d) 41.98; 6.48.
Suppose a researcher wishes to do a study on the monthly earnings of a sample of 50 employees of a large company. The researcher would first have to collect the data by asking each of the 50 employees. When data are collected in their original form, they are called raw data. In this case, the data are as follows:
405 510 520 880 820 780 810 580 555 790
505 610 620 650 680 350 530 495 480 695
610 710 810 525 530 680 705 370 760 590
705 300 590 390 460 590 450 540 690 480
420 410 595 750 620 850 585 690 570 560
Many people do not like to examine a mass of numbers, and many others do not have the time to do so. Therefore, it would be advantageous if the information could somehow be "compressed" so that the distribution of the observations could be seen at a glance. We find, after some searching, that the smallest observation is 300 and the largest observation is 880. Let us group the observations. We could subdivide the range of the data and count the number of values in each subinterval. If the lowest and highest values in a data set are known, the following expression is often helpful in determining both the width of the class interval and the number of classes desired:
$$\text{Number of classes} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Class width}} \qquad (1)$$
Using this formula with a trial class width of 100 shows that
$$\text{Number of classes} = \frac{880 - 300}{100} = 5.8.$$
Rounding up, we find that 6 classes would be required for the data.
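The trial calculation with formula (1) can be checked with a few lines of Python. This is only a minimal sketch; the function name `number_of_classes` is chosen here for illustration and does not come from the text.

```python
import math

def number_of_classes(smallest, largest, class_width):
    """Formula (1): range of the data divided by the class width, rounded up."""
    return math.ceil((largest - smallest) / class_width)

# Smallest and largest monthly earnings from the raw data, trial width 100.
ratio = (880 - 300) / 100
classes = number_of_classes(300, 880, 100)
print(ratio)    # 5.8
print(classes)  # 6
```

Trying another width, for example `number_of_classes(300, 880, 50)`, shows how the choice of class width changes the number of classes.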
Table 1.6
The numbers 301, 400, 401, 500 are known as class limits. To find the midpoint of the upper limit of the first class and the lower limit of the second class in table 1.6, we divide the sum of these two limits by 2. Thus, the midpoint is
$$\frac{400 + 401}{2} = 400.5.$$
The value 400.5 is called the upper boundary of the first class and the lower boundary of the second class. By using this technique, we can convert the class limits of table 1.7 to class boundaries, which are also called real class limits.
Table 1.7
Definition: The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class.
Definition: The difference between the two boundaries of a class is called the class width.
Class width = Upper boundary - Lower boundary
Definition: The class midpoint (or mark) is the average of the two limits (or of the two boundaries) of a class.
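The three definitions above can be illustrated on the earnings classes. The sketch below assumes six classes written as 301-400, 401-500, and so on up to 801-900, and it assumes that the outermost boundaries (300.5 and 900.5) are obtained by extending the same half-unit gap; the function name `class_boundaries` is illustrative only.

```python
# Class limits for the earnings example (assumed: six classes of width 100).
limits = [(301, 400), (401, 500), (501, 600), (601, 700), (701, 800), (801, 900)]

def class_boundaries(limits):
    """Return (lower boundary, upper boundary) for each class.

    Each boundary is the midpoint between the upper limit of one class and
    the lower limit of the next. The gap between consecutive classes is
    assumed to be the same everywhere and is also used for the outer ends.
    """
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]

boundaries = class_boundaries(limits)
for (lower, upper), (lb, ub) in zip(limits, boundaries):
    width = ub - lb                  # class width = upper boundary - lower boundary
    midpoint = (lower + upper) / 2   # class midpoint (mark)
    print(f"{lower}-{upper}: boundaries {lb} to {ub}, width {width}, midpoint {midpoint}")
```

Averaging the two boundaries instead of the two limits gives the same midpoint, since both boundaries are shifted by the same half-gap in opposite directions.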
Remark: Other class widths may be considered in (1); the decision on the class width and the number of classes is up to the user.
Definition: A frequency distribution is a table used to organize data. The left column (called classes or groups) includes numerical intervals of the variable being studied. The right column is a list of the frequencies, or numbers of observations, for each class.
Data presented in the form of a frequency distribution are called grouped data. The subintervals into which the data are broken down are called classes. In this distribution the values 300 and 400 of the first class are called class limits. For any particular class, the cumulative frequency is the total number of observations in that class and all previous classes (Table 1.8).
Table 1.8
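As a rough illustration of how frequencies and cumulative frequencies are obtained, the raw earnings data can be tallied in Python. The sketch assigns each observation to the first class whose upper limit it does not exceed, so the smallest observation, 300, is counted in the first class; that placement is an assumption made here, chosen because it agrees with the fractions 16/50 and 38/50 quoted in the text. The names `frequencies` and `upper_limits` are illustrative.

```python
from itertools import accumulate

earnings = [
    405, 510, 520, 880, 820, 780, 810, 580, 555, 790,
    505, 610, 620, 650, 680, 350, 530, 495, 480, 695,
    610, 710, 810, 525, 530, 680, 705, 370, 760, 590,
    705, 300, 590, 390, 460, 590, 450, 540, 690, 480,
    420, 410, 595, 750, 620, 850, 585, 690, 570, 560,
]

# Upper class limits of the six classes (assumed from the text's 301-400, 401-500, ...).
upper_limits = [400, 500, 600, 700, 800, 900]

def frequencies(data, upper_limits):
    """Count the observations per class; a value goes to the first class
    whose upper limit is greater than or equal to it."""
    counts = [0] * len(upper_limits)
    for value in data:
        for i, upper in enumerate(upper_limits):
            if value <= upper:
                counts[i] += 1
                break
    return counts

freq = frequencies(earnings, upper_limits)
cum_freq = list(accumulate(freq))  # running total of the frequencies
print(freq)      # [4, 8, 16, 10, 7, 5]
print(cum_freq)  # [4, 12, 28, 38, 45, 50]
```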
Definition: Relative frequency is the proportion of observations in each class. It is defined as:
$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Total number of observations}}$$
In addition, we often want to consider the proportion of observations that are either in that class or in one of the earlier classes. These proportions are called cumulative relative frequencies. The example in table 1.9 illustrates how to construct relative frequency and cumulative relative frequency distributions.
Table 1.9
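The construction can be sketched in Python as follows. The class frequencies used here were obtained by tallying the 50 raw earnings into the six classes of width 100 (with the value 300 counted in the first class, which is an assumption); they are not copied from the table itself.

```python
from itertools import accumulate

# Frequencies of the six earnings classes, tallied from the raw data (assumed).
freq = [4, 8, 16, 10, 7, 5]
n = sum(freq)  # total number of observations

# Relative frequency: frequency of the class divided by the total.
rel_freq = [f / n for f in freq]

# Cumulative relative frequency: cumulative frequency divided by the total.
cum_rel_freq = [c / n for c in accumulate(freq)]

for f, r, c in zip(freq, rel_freq, cum_rel_freq):
    print(f"{f:>2}/{n}  relative {r:.2f}  cumulative relative {c:.2f}")
```

The last cumulative relative frequency is always 1, which gives a quick check on the arithmetic.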
Definition: A histogram is a graph in which classes are marked on a horizontal axis and either the frequencies, relative frequencies, or cumulative relative frequencies are marked on the vertical axis. The frequencies, relative frequencies, or cumulative relative frequencies are represented by the heights of the bars. In a histogram, the bars are drawn adjacent to each other.
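To give a feel for the definition, the following sketch prints a crude text histogram, turned on its side so that each class occupies one line and the bar length equals the frequency. The boundaries and frequencies are assumptions derived from the earnings example rather than values read from the figures, and `text_histogram` is a name invented for this illustration.

```python
# Class boundaries and frequencies for the earnings example (assumed values).
boundaries = [300.5, 400.5, 500.5, 600.5, 700.5, 800.5, 900.5]
freq = [4, 8, 16, 10, 7, 5]

def text_histogram(boundaries, freq):
    """Return one line per class: the class boundaries followed by a bar
    of '#' characters whose length equals the class frequency."""
    lines = []
    for lower, upper, f in zip(boundaries, boundaries[1:], freq):
        bar = "#" * f
        lines.append(f"{lower:.1f}-{upper:.1f} | {bar} ({f})")
    return lines

for line in text_histogram(boundaries, freq):
    print(line)
```

Because consecutive classes share a boundary, the bars of a real histogram touch each other; passing relative frequencies scaled to whole numbers instead of `freq` would give the relative frequency version.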
Remark: The symbol "-//-" used on the horizontal axis represents a break, called the truncation, in the horizontal axis. It indicates that the entire horizontal axis is not shown in the figure. As can be seen, the portion of the horizontal axis from zero to 300.5 has been omitted in figure 1.1.
As shown in figure 1.2, we see, for example, that 16/50 of all employees' monthly earnings are between 500.5 and 600.5. The cumulative relative frequencies are the cumulated sums of the relative frequencies. For the first class, the cumulative relative frequency is the same as the relative frequency. For subsequent classes, the cumulative relative frequency is obtained by adding the relative frequency for the class to the cumulative relative frequency of the previous class.
The interpretation of these quantities is very valuable. For example, 38/50 of all employees' monthly earnings are less than 700.5. The information contained in the cumulative relative frequency can also be presented pictorially, as in Fig. 1.3.
1.7.1. Less than method for writing classes
The classes in the frequency distribution given in table 1.9 for the data on the monthly earnings of 50 employees were written as 301-400, 401-500, etc. Alternatively, we can write the classes in a frequency distribution table using the less than method. The technique for writing classes in the previous topic is more commonly used for data sets that do not contain fractional values. The less than method is more appropriate when a data set contains fractional values.
Example: The following data give the hourly wage rates for a sample of 30 employees selected from a population.
12.25 9.20 13.90 8.10 7.30 7.25 8.75 5.20 15.85 11.20
10.20 14.50 10.50 8.25 7.45 10.20 12.20 10.80 9.25 14.35
16.50 6.40 15.20 10.30 11.75 12.45 13.25 10.80 10.35 9.75
Construct a frequency distribution table. Find the relative frequencies and cumulative frequencies.
Solution: The minimum value in the data set is 5.20 and the maximum value is 16.50. Suppose we decide to group these data using six classes of equal width. Then
$$\text{Approximate class width} = \frac{16.50 - 5.20}{6} \approx 1.88.$$
We round this number to a more convenient number, say 2. Then we take 2 as the width of each class. If we start the first class at 5, the classes will be written as 5 to less than 7, 7 to less than 9, and so on, as shown in table 1.10.
Table 1.10
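The less than method for this example can be sketched in Python as below. Each class contains the values from its lower end up to, but not including, its upper end. The helper names `less_than_classes` and `tally` are invented for this illustration, and the printed counts come from tallying the raw data here rather than from the table.

```python
from itertools import accumulate

wages = [
    12.25, 9.20, 13.90, 8.10, 7.30, 7.25, 8.75, 5.20, 15.85, 11.20,
    10.20, 14.50, 10.50, 8.25, 7.45, 10.20, 12.20, 10.80, 9.25, 14.35,
    16.50, 6.40, 15.20, 10.30, 11.75, 12.45, 13.25, 10.80, 10.35, 9.75,
]

def less_than_classes(start, width, count):
    """Classes written as 'start to less than start + width', and so on."""
    return [(start + i * width, start + (i + 1) * width) for i in range(count)]

def tally(data, classes):
    """Count the values x with lower <= x < upper for each class."""
    counts = [0] * len(classes)
    for x in data:
        for i, (lower, upper) in enumerate(classes):
            if lower <= x < upper:
                counts[i] += 1
                break
    return counts

classes = less_than_classes(start=5, width=2, count=6)
freq = tally(wages, classes)
n = len(wages)
cum_freq = list(accumulate(freq))

for (lower, upper), f, c in zip(classes, freq, cum_freq):
    print(f"{lower} to less than {upper}: frequency {f}, "
          f"relative frequency {f / n:.3f}, cumulative frequency {c}")
```

Using a half-open comparison (`lower <= x < upper`) is what makes a value such as 7.00 fall in the class "7 to less than 9" rather than in "5 to less than 7".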
A histogram for frequencies can be drawn in the same way as for the data of table 1.10. (Fig.1.4; Fig.1.5; Fig.1.6)
|