Collection of Statistical Data
Statistical Data:
A sequence of observations made on a set of objects included in a sample drawn from a population is known as statistical data.
(1) Ungrouped Data:
Data which have not been arranged in a systematic order are called raw data or ungrouped data.
(2) Grouped Data:
Data presented in the form of a frequency distribution are called grouped data.
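As an illustration of the step from ungrouped to grouped data, the following minimal Python sketch counts raw observations into class intervals. The function name frequency_distribution, the class width, and the height values are hypothetical and chosen only for the example; the sketch assumes every value is at least as large as the starting class limit.

```python
from collections import Counter

def frequency_distribution(values, class_width, start):
    """Count observations per class [start, start + width), [start + width, ...).

    Assumes every value is >= start.
    """
    # Map each observation to the index of the class it falls in.
    counts = Counter((v - start) // class_width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * class_width
        table.append((lower, lower + class_width, counts.get(k, 0)))
    return table

# Hypothetical ungrouped (raw) data: heights in cm of 10 students.
heights = [152, 167, 158, 171, 163, 155, 169, 160, 174, 162]

for lower, upper, freq in frequency_distribution(heights, class_width=5, start=150):
    print(f"{lower}-{upper - 1}: {freq}")
# 150-154: 1
# 155-159: 2
# 160-164: 3
# 165-169: 2
# 170-174: 2
```

The printed table is the grouped form of the same ten observations; the individual values are no longer visible, only the class frequencies.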
Collection of Data:
The first step in any enquiry (investigation) is the collection of data. The data may be collected for the whole population or for a sample only; they are mostly collected on a sample basis. Collection of data is a very difficult job. The enumerator or investigator is the well-trained person who collects the statistical data. The respondents (informants) are the persons from whom the information is collected.
Types of Data:
There are two types (sources) of data: (1) Primary Data (2) Secondary Data
(1) Primary Data:
The primary data are the first-hand information collected, compiled and published by an organization for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment. Example: Population census reports are primary data because they are collected, compiled and published by the population census organization.
(2) Secondary Data:
The secondary data are the second-hand information which has already been collected by someone (an organization) for some purpose and is available for the present study. The secondary data are not pure in character and have undergone some treatment at least once. Example: The Economic Survey of England is secondary data because the data are collected by more than one organization, such as the Bureau of Statistics, the Board of Revenue, the banks, etc.
Methods of Collecting Primary Data:
Primary data are collected by the following methods:
Methods of Collecting Secondary Data:
The secondary data are collected from the following sources:
Difference between Primary and Secondary Data:
The difference between primary and secondary data is only a change of hand. The primary data are the first-hand information collected directly from the source. They are the most original data in character and have not undergone any sort of statistical treatment, while the secondary data are obtained from some other sources or agencies. They are not pure in character and have undergone some treatment at least once. For example: Suppose we are interested in finding the average age of MS students. We can collect the age data by two methods: either directly from each student personally or from the university record. The data collected by direct personal investigation are called primary data, and the data obtained from the university record are called secondary data.
Editing of Data:
After collecting the data from either a primary or a secondary source, the next step is editing. Editing means examining the collected data to discover any errors and mistakes before presenting them. It has to be decided beforehand what degree of accuracy is wanted and what extent of error can be tolerated in the inquiry. The editing of secondary data is simpler than that of primary data.
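One simple form of editing can be sketched in Python as below. This is only an illustration under stated assumptions: the function edit_records, the record layout (name, age), and the plausible age range are hypothetical, and the range stands in for the tolerance that is decided before the inquiry begins.

```python
def edit_records(records, min_age, max_age):
    """Split (name, age) records into accepted ones and ones flagged for review."""
    accepted, flagged = [], []
    for name, age in records:
        if age is None:
            # A missing answer cannot be used as it stands.
            flagged.append((name, age, "missing value"))
        elif not (min_age <= age <= max_age):
            # Outside the range agreed on beforehand; likely a recording error.
            flagged.append((name, age, "outside plausible range"))
        else:
            accepted.append((name, age))
    return accepted, flagged

# Hypothetical collected data: one missing age and one obvious recording error.
records = [("A", 24), ("B", None), ("C", 240), ("D", 27)]
accepted, flagged = edit_records(records, min_age=18, max_age=70)

print(accepted)  # [('A', 24), ('D', 27)]
print(flagged)   # [('B', None, 'missing value'), ('C', 240, 'outside plausible range')]
```

Flagged records are not discarded automatically; they are set aside so the investigator can go back to the source and correct them.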
Data can be collected using three main types of surveys: censuses, sample surveys, and administrative data. Each has advantages and disadvantages. As students, you may be required to collect data at some time. The method you choose will depend on a number of factors.
Census
A census refers to data collection about every unit in a group or population. If you collected data about the height of everyone in your class, that would be regarded as a class census. There are various reasons why a census may or may not be chosen as the method of data collection:
Advantages (+)
Sampling variance is zero: There is no sampling variability attributed to the statistic because it is calculated using data from the entire population.
Detail: Detailed information about small sub-groups of the population can be made available.
Disadvantages (–)
Cost: In terms of money, conducting a census for a large population can be very expensive.
Time: A census generally takes longer to conduct than a sample survey.
Response burden: Information needs to be received from every member of the target population.
Control: A census of a large population is such a huge undertaking that it is difficult to keep every single operation under the same level of scrutiny and control.
Sample survey
In a sample survey, only part of the total population is approached for data. If you collected data about the height of 10 students in a class of 30, that would be a sample survey of the class rather than a census. Reasons one may or may not choose to use a sample survey include:
Advantages (+)
Cost: A sample survey costs less than a census because data are collected from only part of a group.
Time: Results are obtained far more quickly for a sample survey than for a census. Fewer units are contacted and less data needs to be processed.
Response burden: Fewer people have to respond in the sample.
Control: The smaller scale of this operation allows for better monitoring and quality control.
Disadvantages (–)
Sampling variance is non-zero: The data may not be as precise because they came from a sample of a population instead of the total population.
Detail: The sample may not be large enough to produce information about small population sub-groups or small geographical areas.
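The contrast in sampling variance between a census and a sample survey can be illustrated with a small simulation. The sketch below is a hypothetical example, not a prescribed procedure: it generates made-up heights for a class of 30, computes the mean from the whole class (a census), and then repeatedly draws samples of 10 students to show how the sample mean varies from one sample to the next. The seed and the height range are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: heights in cm of a class of 30 students.
population = [rng.randint(150, 180) for _ in range(30)]

# Census: every unit is measured, so the mean is the same every time.
census_mean = statistics.mean(population)

# Sample survey: 10 of the 30 students, drawn without replacement,
# repeated 1000 times to see how much the estimate moves around.
sample_means = [
    statistics.mean(rng.sample(population, 10)) for _ in range(1000)
]
spread = statistics.pstdev(sample_means)

print("census mean:", census_mean)
print("average of sample means:", statistics.mean(sample_means))
print("standard deviation of sample means:", spread)
```

The census mean has no sampling variability because it uses the entire population, while the sample means scatter around it. That scatter is the non-zero sampling variance listed above as a disadvantage of a sample survey.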
Administrative data
Administrative data are collected as a result of an organization's day-to-day operations. Examples include data on births, deaths, marriages, divorces and car registrations. For example, prior to being issued a marriage license, a couple must provide the registrar with information about their age, sex, birthplace, address and previous marital status. These administrative files can be used later as a substitute for a sample survey or a census.
Advantages (+)
Sampling variance is zero: There is no variability attributed to the statistic because it was calculated using data from the entire population.
Time series: Data are collected on an ongoing basis, allowing for trend analysis.
Simplicity: Administrative data may eliminate the need to design a census or survey and the associated work.
Response burden: Since the data are already collected, there is no additional burden on the respondents.
Disadvantages (–)
Flexibility: Data items may be limited to essential administrative information, unlike a survey.
Population: Data are limited to the population on whom the administrative records are kept.
Change over time: Definitions are created to serve specific purposes, but often change and evolve over time. The statistician must understand that there is a possibility of change to the definitions of these files.
Concepts and definitions: The definitions are established by those who create and manage the file for their own purposes. For example, income definitions may not include everything a user expects to see.
Data quality: The emphasis placed on data quality may differ from organization to organization. This may be evident when someone relies on data collected from another organization.
Data Collection Methods: Pros and Cons
Each method of data collection has advantages and disadvantages.