Learning Goal: I’m working on a Python question and need an explanation and answer to help me learn.
Pick one dataset from https://data.sanjoseca.gov/dataset
1. Data Description and Curiosity Questions about the data:
- Background or context of the selected data: sources, how it was collected, the time period it represents, and the context in which it was collected, if available.
- Reason(s) why you selected it.
- Description of the data:
  - How big is it (number of observations and variables)?
  - How many numeric variables?
  - How many categorical variables?
  - Description of the variables, if available.
- Are there any missing values?
- Any duplicate rows?
- Compute summary statistics on continuous variable(s) (mean, median, mode, standard deviation, variance, range).
- Select one categorical variable and compute the same statistics on a numeric variable, grouped by that categorical variable.
- Record your observations. What did you find most fascinating in your descriptive analysis?
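
As a rough illustration of the checks listed in step 1, here is a minimal sketch that uses only the Python standard library. The inline CSV and its column names (`department`, `permit_type`, `fee`) are made up for the example and do not come from any San Jose dataset; a real analysis would read the downloaded file instead, and many people would reach for pandas rather than the `csv` module.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample standing in for a downloaded CSV; column names are invented.
SAMPLE_CSV = """department,permit_type,fee
Building,Residential,120.0
Building,Commercial,300.0
Fire,Residential,80.0
Fire,Commercial,
Building,Residential,120.0
Fire,Residential,100.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
columns = list(rows[0].keys())
print(f"{len(rows)} observations, {len(columns)} variables")


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# A column counts as numeric if every non-empty value parses as a float.
numeric_cols = [c for c in columns
                if all(is_number(r[c]) for r in rows if r[c] != "")]
categorical_cols = [c for c in columns if c not in numeric_cols]

# Missing values: count empty strings per column.
missing = {c: sum(1 for r in rows if r[c] == "") for c in columns}

# Duplicate rows: rows identical to an earlier row.
seen = set()
duplicate_count = 0
for r in rows:
    key = tuple(r[c] for c in columns)
    if key in seen:
        duplicate_count += 1
    seen.add(key)


def summarize(values):
    """Summary statistics requested in the assignment (sample stdev/variance)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
        "variance": statistics.variance(values),
        "range": max(values) - min(values),
    }


# Overall statistics on the numeric column, skipping missing values.
fees = [float(r["fee"]) for r in rows if r["fee"] != ""]
overall = summarize(fees)

# The same statistics grouped by one categorical variable.
by_department = defaultdict(list)
for r in rows:
    if r["fee"] != "":
        by_department[r["department"]].append(float(r["fee"]))
grouped = {dept: summarize(vals) for dept, vals in by_department.items()}

print(missing, duplicate_count)
print(overall)
print(grouped)
```

This sketch keeps the duplicate row in the statistics; whether to drop duplicates first depends on whether they are data-entry errors or genuine repeated records, which is worth stating in the write-up.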
2. Descriptive Statistics and Visualization (choose at least two of the options listed below)
- Relationship between variables
- Trend
- Distribution of the variable(s)
- Spatial data representation
- Comparison of summary statistics across categories
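
For two of the options in step 2, relationship between variables and distribution of a variable, the numbers behind a plot can be sketched without any plotting library. The example below computes a Pearson correlation and histogram bin counts, then prints a text histogram. All values are hypothetical placeholders, and `pearson_r` and `histogram` are helper names invented for this sketch; a plotting library such as matplotlib would normally draw the actual scatter plot and histogram.

```python
import math
from collections import Counter

# Hypothetical paired measurements; replace with two numeric columns from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def histogram(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Relationship between two variables.
r = pearson_r(x, y)
print(f"Pearson r = {r:.3f}")

# Distribution of one variable, shown as a text histogram.
bins = histogram([80.0, 100.0, 120.0, 120.0, 300.0], bin_width=50)
for lower_edge, count in bins.items():
    print(f"{lower_edge:>5.0f}+ | {'#' * count}")
```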
3. Generate at least one hypothesis and perform a hypothesis test.
4. Summarize your observations
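
One possible way to approach step 3 is a permutation test, sketched below with the standard library only. The assignment does not prescribe a particular test, so this is just one choice; a t-test or chi-square test may suit the chosen dataset better. The null hypothesis here is that the two groups have the same mean. The group names and values are hypothetical, and `permutation_test` is a helper written for this example.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical values of one numeric variable for two categories.
building = [120.0, 300.0, 120.0, 250.0, 180.0, 210.0]
fire = [80.0, 100.0, 95.0, 110.0, 90.0, 85.0]

observed_diff, p_value = permutation_test(building, fire)
print(f"difference in means: {observed_diff:.1f}, p = {p_value:.4f}")
```

A small p-value (for example, below a significance level of 0.05 chosen in advance) would count as evidence against the null hypothesis of equal means; the summary in step 4 can then report the observed difference alongside the p-value.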