Thursday, 19 January 2012

Data Handling-Frequency distribution and bar graphs

Definitions

Raw Data
Data collected in original form.
Frequency
The number of times a certain value or class of values occurs.
Frequency Distribution
The organization of raw data in table form with classes and frequencies.
Categorical Frequency Distribution
A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution
A frequency distribution of numerical data. The raw data is not grouped.
Grouped Frequency Distribution
A frequency distribution where several numbers are grouped into one class.
Class Limits
Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.
Class Boundaries
Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class Width
The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Mark (Midpoint)
The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
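The definitions of class boundaries, class width and class mark can be checked with a short calculation. The following is a minimal sketch only: the function name `class_summary`, the `unit` parameter and the classes 10-19 and 20-29 are assumptions made for illustration, not part of the definitions above.

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary, width, midpoint) for one class.

    `unit` is the precision of the raw data (1 for whole numbers,
    0.1 for one decimal place). Half a unit is the boundary offset.
    """
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    width = upper_boundary - lower_boundary      # difference of the boundaries
    midpoint = (lower_limit + upper_limit) / 2   # class mark
    return lower_boundary, upper_boundary, width, midpoint


# Hypothetical classes for whole-number data.
print(class_summary(10, 19))  # (9.5, 19.5, 10.0, 14.5)
print(class_summary(20, 29))  # (19.5, 29.5, 10.0, 24.5)
```

Note that the width (10) equals the difference between the lower limits of the two consecutive classes (20 - 10), not the difference between the limits of one class (19 - 10 = 9).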
Data Handling
We know that data is a collection of observations. Visual representation of data will help us to understand it better and remember the facts easily.
The word ‘data’ means collection of information in the form of numerical figures, or a set of given facts.
Examples:
  • The marks obtained by 10 students of a class in a test are:
    76, 83, 95, 100, 56, 32, 80, 67, 75, 46
  • The following table gives the data regarding the favourite game of 100 students of a school:
    Sports               Cricket   Football   Tennis   Badminton
    Number of Students   40        30         25       5
When information is collected and presented in no particular order, it is called raw data. Data that has been classified into groups is called grouped data.
Data in raw form can be represented in the form of pictures and diagrams. It makes the given data attractive to the observer. Also, it is easy to understand and to compare it with other information.
Some commonly used diagrams to represent numerical data are:
  • Pictographs
  • Bar graph
  • Double bar graph
  • Pie-diagrams or Pie-charts
Collection and Presentation of Data
A systematic record of facts or different values of a quantity is called data. Data is of two types - Primary data and Secondary data. The data collected by a researcher with a specific purpose in mind is called primary data. The data gathered from a source where it already exists is called secondary data.
The difference between the highest and lowest values in the given data is called the range of the given data. The number of times a value occurs in the given data is called the frequency of that value.
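As a quick illustration, the range and the frequencies of the test marks listed earlier can be computed as follows. This is a small sketch using Python's standard `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

# Marks of 10 students, taken from the example earlier in this post.
marks = [76, 83, 95, 100, 56, 32, 80, 67, 75, 46]

data_range = max(marks) - min(marks)   # highest value minus lowest value
frequency = Counter(marks)             # value -> number of occurrences

print(data_range)       # 68
print(frequency[76])    # 1
```

In this particular data set every mark occurs exactly once, so each frequency is 1.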

A table that shows the frequency of different values in the given data is called a frequency distribution table. A frequency distribution table that shows the frequency of each individual value in the given data is called an ungrouped frequency distribution table. A table that shows the frequency of groups of values in the given data is called a grouped frequency distribution table.
The groupings used to group the values in given data are called classes or class-intervals. The span of values that each class covers is called the class size or class width. The lowest value that a class can contain is called the lower class limit. The highest value that a class can contain is called the upper class limit.
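A grouped frequency distribution table can be built by counting how many values fall within each class. The sketch below reuses the test marks from earlier; the class limits 31-40, 41-50 and so on, and the helper name `grouped_frequency`, are assumptions chosen for illustration.

```python
# Marks of 10 students, taken from the example earlier in this post.
marks = [76, 83, 95, 100, 56, 32, 80, 67, 75, 46]

# Assumed classes, written as (lower class limit, upper class limit).
classes = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]


def grouped_frequency(values, classes):
    """Count the values that lie between the limits of each class (inclusive)."""
    return {
        (lower, upper): sum(lower <= v <= upper for v in values)
        for lower, upper in classes
    }


table = grouped_frequency(marks, classes)
for (lower, upper), freq in table.items():
    print(f"{lower}-{upper}: {freq}")
```

The frequencies add up to 10, the number of students, which is a useful check that no value was missed or counted twice.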
 

Graphical Representation of data
Graphical representation of data helps in faster and easier interpretation of data.
A bar graph uses bars or rectangles of the same width but different heights to represent different values of data.

In a bar graph:
  1. The bars have equal gaps between them.
  2. The width of the bars does not matter.
  3. The height of the bars represents the different values of the variable.
In a histogram:
  1. The bars do not have gaps between them.
  2. The width of the bars is proportional to the class intervals of data.
  3. The height of each bar represents the frequency of its class when all classes have the same width.
  4. The area of each rectangle is proportional to its corresponding frequency.
The area of a histogram is equal to the area enclosed by its corresponding frequency polygon.
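Because the area of each rectangle is proportional to its frequency, the height of a bar depends on the class width when the widths differ. The sketch below assumes hypothetical class boundaries and frequencies, and takes the height as frequency divided by width (often called frequency density) so that area equals frequency.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 40, 12)]


def bar_heights(classes):
    """Return the height of each histogram bar as frequency / class width."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


heights = bar_heights(classes)
for (lower, upper, freq), height in zip(classes, heights):
    width = upper - lower
    # width * height recovers the frequency of the class
    print(f"{lower}-{upper}: width={width}, height={height:.2f}, area={width * height:.0f}")
```

Here the class 20-40 has the largest frequency but not the tallest bar, because it is twice as wide as the others.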

Pictographs represent data through appropriate pictures. In pictographs, the same type of symbol or picture is used to represent the data. Each symbol is used to represent a certain value, and this is mentioned clearly in the graph. For example, one symbol may represent 25 students.
The following pictograph represents the number of students coming to a college by different means of transport:
[Pictograph: number of students coming to a college by different means of transport]
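The idea of one symbol standing for a fixed number of items can also be sketched in text. The example below uses the favourite-game table from earlier in this post rather than the transport data; the key of one symbol per 5 students and the `*` symbol are assumptions made for illustration.

```python
# Favourite game of 100 students, from the table earlier in this post.
favourite_game = {"Cricket": 40, "Football": 30, "Tennis": 25, "Badminton": 5}

STUDENTS_PER_SYMBOL = 5  # assumed key: one symbol represents 5 students


def pictograph_row(count, per_symbol=STUDENTS_PER_SYMBOL, symbol="*"):
    """Return one row of symbols; assumes count is a multiple of per_symbol."""
    return symbol * (count // per_symbol)


print(f"Key: one * = {STUDENTS_PER_SYMBOL} students")
for game, count in favourite_game.items():
    print(f"{game:<10} {pictograph_row(count)}")
```

A key that divides every value exactly, as 5 does here, avoids having to draw partial symbols.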
A representation of data with the help of bars or rectangles in a diagram is called a bar graph or a bar diagram.
Here, each bar represents only one value of the data, and hence, there are as many bars as the number of values in the data. The length or height of a bar indicates the value of the item. The width of the bars and the gap between the bars are kept uniform to make the diagram look neat.
The following bar graph represents the production of rice in different years:
[Bar graph: production of rice in different years]
Sometimes, organising data value by value becomes a tedious process. In such cases, we group the raw data and write the groups as intervals. Each group is called a class interval. Each class interval has a lower class limit and an upper class limit.
For continuous class intervals such as 0 to 10, 10 to 20 and so on, the difference between the upper class limit and the lower class limit is called the width or size of the class interval. The number of items that fall within a particular class interval is called its frequency.
We fill in the rows with tally marks, one mark for each item, and then count the tally marks in each group. The number of tally marks in each group is listed in the frequency column. The completed table is called the frequency distribution table. With the data in a table, we can draw a graph.
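The tallying step can be sketched as follows. The raw values and class intervals are hypothetical, the upper limit of each interval is assumed to belong to the next class, and `||||/` is used here as a plain-text stand-in for a bundle of five tally marks.

```python
def tally(count):
    """Render a count as tally marks; '||||/' stands for a bundle of five."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


# Hypothetical raw data and class intervals (upper limit excluded, by assumption).
values = [3, 12, 7, 18, 25, 11, 9, 14, 21, 16, 19, 13]
intervals = [(0, 10), (10, 20), (20, 30)]

frequency = {
    (lower, upper): sum(lower <= v < upper for v in values)
    for lower, upper in intervals
}

# Print the class interval, its tally marks and its frequency.
for (lower, upper), count in frequency.items():
    print(f"{lower}-{upper:<4} {tally(count):<10} {count}")
```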
A graph showing two sets of data simultaneously is called a double bar graph. It is useful for comparing two sets of data.
The following graph shows the strength of boys and girls in a school in different years:
[Double bar graph: strength of boys and girls in a school in different years]
A pie diagram or a pie chart is a circle divided into several sectors. The circle represents the total value of the given data, and the sectors represent the proportion of the components of the total.
It is also called an angular diagram or a circular diagram.
The monthly expenditure on various items of a family is given below.
    Item           Food       House Rent   Misc.      School Fees
    Amount Spent   Rs. 2500   Rs. 2700     Rs. 2400   Rs. 1400
Its representation in a pie diagram is as shown.
[Pie diagram: monthly expenditure of the family on various items]
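To draw such a pie diagram, each amount is converted to a central angle in proportion to the total, since the whole circle is 360 degrees. The following sketch works this out for the expenditure table above; the variable names are chosen here for illustration.

```python
# Monthly expenditure in rupees, from the table above.
expenditure = {"Food": 2500, "House Rent": 2700, "Misc.": 2400, "School Fees": 1400}

total = sum(expenditure.values())  # 9000

# Central angle of each sector = (amount / total) * 360 degrees.
angles = {item: amount * 360 / total for item, amount in expenditure.items()}

for item, angle in angles.items():
    print(f"{item:<12} {angle:.0f} degrees")
```

The angles come to 100, 108, 96 and 56 degrees, which add up to 360 degrees as expected.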
Applications
Managing and operating on frequency-tabulated data is much simpler than operating on raw data. There are simple algorithms to calculate the median, mean, standard deviation, etc. from these tables.
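As one possible illustration of such calculations, the sketch below computes the mean, median and population standard deviation directly from an ungrouped frequency table. The table itself and the function names are hypothetical, and the standard deviation shown divides by the total frequency (the population form), which is an assumption about which variant is wanted.

```python
import math

# Hypothetical ungrouped frequency table: value -> frequency.
table = {1: 2, 2: 3, 3: 4, 4: 1}


def mean_from_table(table):
    """Mean = sum of (value * frequency) / total frequency."""
    n = sum(table.values())
    return sum(value * freq for value, freq in table.items()) / n


def value_at(table, position):
    """Return the value at a 1-based position in the sorted data."""
    cumulative = 0
    for value in sorted(table):
        cumulative += table[value]
        if position <= cumulative:
            return value
    raise IndexError("position is larger than the total frequency")


def median_from_table(table):
    """Median found by walking the cumulative frequencies."""
    n = sum(table.values())
    if n % 2:
        return value_at(table, (n + 1) // 2)
    return (value_at(table, n // 2) + value_at(table, n // 2 + 1)) / 2


def std_dev_from_table(table):
    """Population standard deviation from the frequency table."""
    n = sum(table.values())
    mean = mean_from_table(table)
    variance = sum(freq * (value - mean) ** 2 for value, freq in table.items()) / n
    return math.sqrt(variance)


print(mean_from_table(table))               # 2.4
print(median_from_table(table))             # 2.5
print(round(std_dev_from_table(table), 4))  # 0.9165
```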
Statistical hypothesis testing is founded on the assessment of differences and similarities between frequency distributions. This assessment involves measures of central tendency or averages, such as the mean and median, and measures of variability or statistical dispersion, such as the standard deviation or variance.
A frequency distribution is said to be skewed when it is not symmetric, which usually shows up as a difference between its mean and median. The kurtosis of a frequency distribution is the concentration of scores at the mean, or how peaked the distribution appears if depicted graphically, for example, in a histogram. If the distribution is more peaked than the normal distribution it is said to be leptokurtic; if less peaked it is said to be platykurtic.
Letter frequency distributions are also used in frequency analysis to crack codes, where they are compared with the relative frequency of letters in different languages.
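A letter frequency distribution is simply a frequency table whose values are letters. The following is a minimal sketch using Python's standard `collections.Counter`; the function name `letter_frequencies` and the sample text are assumptions for illustration, and real frequency analysis would compare the result against published letter frequencies for a language.

```python
from collections import Counter


def letter_frequencies(text):
    """Return the relative frequency of each letter, ignoring case and non-letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {letter: count / total for letter, count in counts.items()}


freqs = letter_frequencies("Hello")
print(freqs["l"])  # 0.4
```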
