### Collection of Statistical Data

Statistical Data:
A sequence of observations made on a set of objects included in a sample drawn from a population is known as statistical data.
(1) Ungrouped Data:
Data which have not been arranged in any systematic order are called raw data or ungrouped data.
(2) Grouped Data:
Data presented in the form of frequency distribution is called grouped data.
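The step from ungrouped to grouped data can be sketched in a few lines of Python. This is only an illustration: the marks, the starting value of 40, the class width of 10 and the helper name `group_into_classes` are all assumptions made for the example, not part of any standard method.

```python
from collections import Counter

# Hypothetical raw (ungrouped) observations: marks of 12 students.
raw_marks = [52, 67, 45, 71, 58, 63, 49, 75, 61, 55, 68, 59]

def group_into_classes(values, start, width):
    """Count how many integer values fall in each class of the given width."""
    # Class index 0 covers start .. start + width - 1, index 1 the next class, etc.
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in sorted(counts):
        lower = start + k * width
        # Upper limit is written inclusively because the data are whole numbers.
        table.append((lower, lower + width - 1, counts[k]))
    return table

frequency_table = group_into_classes(raw_marks, start=40, width=10)
for lower, upper, freq in frequency_table:
    print(f"{lower}-{upper}: {freq}")
```

The printed table is the grouped form of the same twelve observations: the individual values are no longer visible, only the number of observations in each class.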
Collection of Data:
The first step in any enquiry (investigation) is the collection of data. The data may be collected for the whole population or for a sample only; in most cases they are collected on a sample basis. Collecting data is a difficult job. The enumerator or investigator is a well-trained person who collects the statistical data. The respondents (informants) are the persons from whom the information is collected.
Types of Data:
There are two types (sources) of data:
(1) Primary Data (2) Secondary Data
(1) Primary Data:
Primary data are first-hand information collected, compiled and published by an organization for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment.
Example: Population census reports are primary data because they are collected, compiled and published by the population census organization.

(2) Secondary Data:
Secondary data are second-hand information that has already been collected by someone (an organization) for some purpose and is available for the present study. Secondary data are not pure in character and have undergone some statistical treatment at least once.
Example: The economic survey of England is secondary data because the data are collected by more than one organization, such as the Bureau of Statistics, the Board of Revenue, the banks, etc.

Methods of Collecting Primary Data:
Primary data are collected by the following methods:
• Personal Investigation: The researcher conducts the survey him/herself and collects the data directly. Data collected in this way are usually accurate and reliable. This method is practical only for small research projects.
• Through Investigators: Trained investigators are employed to collect the data. These investigators contact the individuals and fill in the questionnaire after asking for the required information. Most organizations employ this method.
• Collection through Questionnaire: The researchers get the data from local representatives or agents, who report on the basis of their own experience. This method is quick but gives only a rough estimate.
• Through Telephone: The researchers get the information by telephone. This method is quick and gives accurate information.

Methods of Collecting Secondary Data:
Secondary data are obtained from the following sources:
• Official: e.g. the publications of the Statistical Division, Ministry of Finance, the Federal Bureaus of Statistics, Ministries of Food, Agriculture, Industry, Labor, etc.
• Semi-Official: e.g. State Bank, Railway Board, Central Cotton Committee, Boards of Economic Enquiry, etc.
• Publications of Trade Associations, Chambers of Commerce, etc.
• Technical and Trade Journals and Newspapers.
• Research Organizations such as Universities and other institutions.

Difference between Primary and Secondary Data:
The difference between primary and secondary data is only a change of hand. Primary data are first-hand information collected directly from the source. They are the most original data in character and have not undergone any sort of statistical treatment. Secondary data are obtained from some other source or agency; they are not pure in character and have undergone some treatment at least once.
For Example: Suppose we are interested in finding the average age of MS students. We can collect the age data by two methods: either by asking each student personally or by getting their ages from the university record. The data collected by direct personal investigation are primary data, and the data obtained from the university record are secondary data.

Editing of Data:
After collecting the data from either a primary or a secondary source, the next step is editing. Editing means examining the collected data to discover any errors or mistakes before presenting them. It has to be decided beforehand what degree of accuracy is wanted and what extent of error can be tolerated in the inquiry. The editing of secondary data is simpler than that of primary data.
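As a rough illustration of editing, the Python sketch below screens a small set of collected records for missing values, implausible values and duplicates. The records, the tolerated age range of 15 to 80 and the function name `edit_records` are assumptions chosen for the example; in a real inquiry the checks and tolerances would be decided beforehand, as described above.

```python
# Hypothetical collected records: (student_id, age in years).
collected = [(1, 24), (2, 27), (3, -3), (4, 25), (5, 230), (6, None), (2, 27)]

def edit_records(records, min_age=15, max_age=80):
    """Split records into accepted and rejected ones using simple checks."""
    accepted, rejected, seen_ids = [], [], set()
    for student_id, age in records:
        if age is None:
            rejected.append((student_id, age, "missing value"))
        elif not (min_age <= age <= max_age):
            rejected.append((student_id, age, "out of range"))
        elif student_id in seen_ids:
            rejected.append((student_id, age, "duplicate"))
        else:
            seen_ids.add(student_id)
            accepted.append((student_id, age))
    return accepted, rejected

accepted, rejected = edit_records(collected)
print("accepted:", accepted)
for record in rejected:
    print("rejected:", record)
```

Rejected records are kept together with the reason for rejection so that they can be checked against the source rather than silently dropped.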
Data can be collected using three main types of surveys: censuses, sample surveys, and administrative data. Each has advantages and disadvantages. As students, you may be required to collect data at some time. The method you choose will depend on a number of factors.

## Census

A census refers to data collection about every unit in a group or population. If you collected data about the height of everyone in your class, that would be regarded as a class census. There are various reasons why a census may or may not be chosen as the method of data collection:

Advantages:
• Sampling variance is zero: There is no sampling variability attributed to the statistic because it is calculated using data from the entire population.
• Detail: Detailed information about small sub-groups of the population can be made available.

Disadvantages:
• Cost: In terms of money, conducting a census for a large population can be very expensive.
• Time: A census generally takes longer to conduct than a sample survey.
• Response burden: Information needs to be received from every member of the target population.
• Control: A census of a large population is such a huge undertaking that it is difficult to keep every single operation under the same level of scrutiny and control.

## Sample survey

In a sample survey, only part of the total population is approached for data. If you collected data about the height of 10 students in a class of 30, that would be a sample survey of the class rather than a census. Reasons one may or may not choose to use a sample survey include:

Advantages:
• Cost: A sample survey costs less than a census because data are collected from only part of a group.
• Time: Results are obtained far more quickly from a sample survey than from a census. Fewer units are contacted and less data needs to be processed.
• Response burden: Fewer people have to respond in the sample.
• Control: The smaller scale of the operation allows for better monitoring and quality control.

Disadvantages:
• Sampling variance is non-zero: The results may not be as precise because the data come from a sample of the population instead of the total population.
• Detail: The sample may not be large enough to produce information about small population sub-groups or small geographical areas.
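The contrast between zero and non-zero sampling variance can be sketched with a small simulation based on the class-of-30 example. Everything numeric here is an assumption for illustration: the heights are generated artificially (mean 165 cm, standard deviation 8 cm), the seed is fixed only to make the run repeatable, and 1000 repeated samples is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical heights (cm) for a class of 30 students.
class_heights = [round(rng.gauss(165, 8), 1) for _ in range(30)]

# Census: every student is measured, so the class mean is a single fixed
# number with no sampling variability.
census_mean = statistics.mean(class_heights)

# Sample survey: measure only 10 of the 30 students. Repeating the survey
# with different random samples gives a different mean each time.
sample_means = [
    statistics.mean(rng.sample(class_heights, 10)) for _ in range(1000)
]
spread = statistics.pstdev(sample_means)

print(f"census mean: {census_mean:.2f}")
print(f"average of sample means: {statistics.mean(sample_means):.2f}")
print(f"spread of sample means: {spread:.2f}")
```

The sample means scatter around the census mean; that scatter is the sampling variability a census avoids, bought at the price of measuring all 30 students instead of 10.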

## Administrative data

Administrative data are collected as a result of an organization's day-to-day operations. Examples include data on births, deaths, marriages, divorces and car registrations. For example, prior to being issued a marriage license, a couple must provide the registrar with information about their age, sex, birthplace, address and previous marital status. These administrative files can be used later as a substitute for a sample survey or a census.

Advantages:
• Sampling variance is zero: There is no sampling variability attributed to the statistic because it is calculated using data from the entire population.
• Time series: Data are collected on an ongoing basis, allowing for trend analysis.
• Simplicity: Administrative data may eliminate the need to design a census or survey and the associated work.
• Response burden: Since the data are already collected, there is no additional burden on the respondents.

Disadvantages:
• Flexibility: Data items may be limited to essential administrative information, unlike a survey.
• Population: Data are limited to the population on whom the administrative records are kept.
• Change over time: Definitions are created to serve specific purposes, but often change and evolve over time. The statistician must be aware that the definitions used in these files may change.
• Concepts and definitions: The definitions are established by those who create and manage the file for their own purposes. For example, income definitions may not include everything a user expects to see.
• Data quality: The emphasis placed on data quality may differ from organization to organization. This may become evident when someone relies on data collected by another organization.

## Data Collection Methods: Pros and Cons

Each method of data collection has advantages and disadvantages.
• Resources. When the population is large, a sample survey has a big resource advantage over a census. A well-designed sample survey can provide very precise estimates of population parameters - quicker, cheaper, and with less manpower than a census.
• Generalizability. Generalizability refers to the appropriateness of applying findings from a study to a larger population. Generalizability requires random selection. If participants in a study are randomly selected from a larger population, it is appropriate to generalize study results to the larger population; if not, it is not appropriate to generalize. Observational studies do not feature random selection, so generalizing from the results of an observational study to a larger population can be a problem.
• Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups. Therefore, experiments, which allow the researcher to control assignment of subjects to treatment groups, are the best method for investigating causal relationships.
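Random selection and random assignment are easy to confuse, so the following Python sketch shows both steps side by side. The population of 100 labelled persons, the sample size of 20 and the equal split into two groups are assumptions made purely for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 people.
population = [f"person_{i}" for i in range(1, 101)]

# Random selection: every member of the population has the same chance of
# being chosen for the study. This is what supports generalizing the results.
participants = rng.sample(population, 20)

# Random assignment: the chosen participants are shuffled and split into a
# treatment group and a control group. This is what supports causal inference.
shuffled = participants[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:10], shuffled[10:]

print("treatment:", sorted(treatment))
print("control:", sorted(control))
```

A study can have one of these steps without the other: a survey typically has random selection but no assignment, while an experiment on volunteers has random assignment but no random selection.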