Wednesday, April 29, 2015

Assignment 5

Part 1

The news claim that the number of children receiving free lunches rises as crime rates increase is supported by the data. A regression analysis of the two variables shows a positive correlation: as crime rates increase, so does the percentage of children receiving free lunches in schools. However, the r squared value for this relationship is .173, meaning that crime rate explains only about 17% of the variation in free lunch percentage, so the relationship is weak and may be spurious. With this analysis one could be 99.5% sure that the two variables correlate, but the relationship is a weak one. Based on this relationship, a crime rate of 79.9 corresponds to an expected free lunch percentage of about 40%.
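As a rough illustration of how the r squared value and the prediction in Part 1 are obtained, the sketch below fits a least-squares line in plain Python. The crime rate and free lunch numbers are made up for illustration and are not the assignment data, and the helper names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up example values (NOT the assignment data).
crime_rate = [20, 40, 60, 80, 100]
free_lunch_pct = [30, 25, 45, 35, 50]

slope, intercept = fit_line(crime_rate, free_lunch_pct)
r2 = r_squared(crime_rate, free_lunch_pct, slope, intercept)

# Predicted free lunch percentage for a crime rate of 79.9,
# using the line fitted to the made-up values above.
predicted = intercept + slope * 79.9
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.3f} predicted={predicted:.1f}")
```

An r squared near 0 means the line explains little of the variation even when the slope itself is statistically significant, which is the situation described in Part 1.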
Part 2

Intro

For part two, the objective was to examine enrollment numbers for UW System universities and determine whether a particular variable leads a student to choose one school over another. The professor provided data on the percent of each county's population with a bachelor's degree, the median household income, and the distance of each county from each school. This data is used to assess how significant each variable is in a student's choice of university. The analysis requires regression analysis and mapping of the resulting residuals: SPSS is used for the regression analysis and ARCMAP for map creation.

Methods
For this analysis, the data for the UW System was provided in an Excel file containing broad information about the universities and Wisconsin counties. The first step was to normalize the county population numbers by the distance from each institution. This reduces the impact that larger counties such as Milwaukee County have on the results and protects against large outliers. Once this column was created, the linear regression analysis could begin. Linear regression is a statistical tool used to evaluate the relationship between two variables; it shows how strongly they are associated but does not by itself prove causation. The analysis was run three separate times to examine whether income, education, or distance was the largest factor in choice. Each time the dependent variable was the number of students attending, and the independent variable was the distance-normalized population, the percent with a bachelor's degree, or the median household income. These outputs show the strength of each relationship and whether it is significant. The outputs also include a residual for each county, that is, how far the observed value for that county lies from the best fit line. With these residuals, choropleth maps could be created to give a visual interpretation of the data.


Results
Figure 1 population distance for Eau Claire


Figure 2 percent bachelor degree for Eau Claire



Figure 3 Median household income for Eau Claire students


Figure 4 Population distance from county for River Falls


Figure 5 percent bachelors degree for River Falls


Figure 6  Median Household income for River Falls
Figure 7 residual map for Median Household income for River Falls

Figure 8 residual map for population distance for Eau Claire

Figure 9 residual map for percent receiving a Bachelor degree for Eau Claire


For this analysis, UW Eau Claire and UW River Falls were chosen as the study schools. Figures one through six are the SPSS outputs for the independent variables analyzed for both schools.

The first variable analyzed was county population normalized by distance from the institution (figures 1 and 4). For UW Eau Claire the regression produces a significance of .000, meaning it is almost one hundred percent certain that there is a correlation between the number of students attending and the distance-normalized population of the county they come from. This is echoed by the R square value of .945, so the two variables describe each other very well. This is in stark contrast to the same variables for UW River Falls, where the significance is .776 and the R square is .001, meaning that the relationship is neither significant nor descriptive.

The second variable examined was the percent of the county population with a bachelor's degree (figures 2 and 5). For Eau Claire the significance was .003 and the R square was .121, meaning that the relationship between the number of students attending and the percent with a bachelor's degree is significant, but the two variables do not describe each other very well. For River Falls the significance was .105, which falls short of the 95% significance level required for this analysis. The R squared value of .037 likewise shows that the two variables do not explain each other very well.

The final variable examined was the median household income of the counties (figures 3 and 6). For Eau Claire the return produces a significance of .104 and an R square value of .037, meaning that the relationship is about 90% significant, which is not enough for this analysis, and the two variables do not explain each other very well. For River Falls, however, this variable was significant, with a significance of .028, although the R squared of .067 is still low.
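To make the 95% decision rule explicit, the short sketch below applies a cutoff of .05 to the significance values reported above. The numbers are the ones quoted in this section; the variable labels and the `ALPHA` name are only for illustration. Note that SPSS prints very small significance values as .000, so 0.000 here stands for "less than .0005".

```python
ALPHA = 0.05  # a 95% significance level means the significance must be below .05

# (school, independent variable, significance, R squared) as reported above
results = [
    ("Eau Claire", "population by distance", 0.000, 0.945),
    ("Eau Claire", "percent bachelor's degree", 0.003, 0.121),
    ("Eau Claire", "median household income", 0.104, 0.037),
    ("River Falls", "population by distance", 0.776, 0.001),
    ("River Falls", "percent bachelor's degree", 0.105, 0.037),
    ("River Falls", "median household income", 0.028, 0.067),
]

significant = [(school, var) for school, var, sig, _ in results if sig < ALPHA]

for school, var, sig, r2 in results:
    verdict = "significant" if sig < ALPHA else "not significant"
    print(f"{school:12s} {var:28s} sig={sig:.3f} R2={r2:.3f} -> {verdict}")
```

Significance and R squared answer different questions: the first says whether a relationship is likely to be real, the second how much of the enrollment pattern it accounts for.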

Conclusions

The goal of this assignment was to determine whether any one variable could be identified that drives a student's choice of university more than the others. The three variables examined were the percent of the county with a bachelor's degree, the median household income of the county, and the total population of the county normalized by the distance from the institution. Percent of the county with a bachelor's degree was significant only for Eau Claire, and even there the low R squared value means it describes enrollment poorly. Median household income was a significant relationship only for UW River Falls. The third variable, county population in relation to distance from the university, was descriptive for UW Eau Claire, with a high R square and a low significance value. Based on this, there is not enough evidence to say that one concrete variable influences a student's decision to choose a university. Although each university had at least one variable that was deemed significant, in the end where to attend is ultimately the choice of the individual.

Friday, April 10, 2015

Assignment 4


Part 1





D. The hypothesis is that as distance from a given point increases, the sound level decreases.
E. Based solely on the data, the hypothesis would be accepted. The data has a correlation of -.896, meaning that the correlation is negative and the trend line slopes downward. Because the correlation is so close to -1, it is a very strong one.
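For reference, a Pearson correlation like the one quoted in E can be computed as in the sketch below. The distance and sound level readings are invented for illustration and are not the assignment data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient; ranges from -1 to 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented readings (NOT the assignment data): sound level falls with distance.
distance = [10, 20, 30, 40, 50]
sound_level = [90, 84, 80, 71, 70]

r = pearson_r(distance, sound_level)
print(f"r = {r:.3f}")  # negative: the trend line slopes downward
```

The sign gives the direction of the trend line, and the closer the value is to -1 or 1 the stronger the relationship.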


2.



This correlation matrix examines the relationships between poverty, ethnicity, and whether people walk or not. In this view, the areas that are predominantly Caucasian have a strong negative relationship with the below poverty column, meaning that for the most part they are not below the poverty line. Areas with a high minority population, by contrast, have a significant positive correlation with being below the poverty line, meaning that these areas also have a high incidence of poverty. In the walking column all races correlate significantly, so no assumption can be made from this matrix and dataset that one race is more prone to walking.




Part 2

Introduction

For this part of the assignment, the task was to analyze patterns in voting and turnout for the state of Texas. The output is intended to be presented to the governor to show whether election patterns have changed over the past 20 years. This analysis required several spatial and statistical programs: ARCMAP, GeoDa, and Excel. The data is analyzed through correlation and spatial autocorrelation (correlation of a variable with itself through space). The data analyzed is voter turnout for the 1980 and 2008 elections, percent Democratic vote for the 1980 and 2008 elections, and percent Hispanic population for 2010 based on the 2010 census.

Methods 

Most of the data was provided in the Assignment 4 folder, but the 2010 Hispanic population data had to be downloaded from the US Census website. The shapefile of Texas counties was also downloaded from the US Census website so the data could be connected to the correct counties. The only problem was that the data downloaded from the census was not in the same table as the data provided in the Assignment 4 folder. To fix this, the percent Hispanic population data was copied from the downloaded data into a new column in the provided table. With the data in one table and the shapefile of Texas counties available, the next step was ARCMAP, where the table was joined to the county shapefile based on the Geo_ID column. Once joined, the shapefile was ready to be put into GeoDa so the data could be analyzed using spatial autocorrelation and a LISA (local indicators of spatial autocorrelation) map could be made.
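The logic of a table join on Geo_ID can be sketched in plain Python. This is not how ARCMAP implements its join tool; it only illustrates the idea of matching rows by a shared key. The IDs, field names other than Geo_ID, and values are all hypothetical.

```python
# Hypothetical rows standing in for the county shapefile attribute table.
county_layer = [
    {"Geo_ID": "A001", "county": "County A"},
    {"Geo_ID": "A002", "county": "County B"},
    {"Geo_ID": "A003", "county": "County C"},
]

# Hypothetical rows standing in for the election and census table.
election_table = [
    {"Geo_ID": "A001", "turnout_2008": 55.0, "pct_hispanic_2010": 18.0},
    {"Geo_ID": "A002", "turnout_2008": 47.5, "pct_hispanic_2010": 62.0},
]

# Index the table by key so each county can look up its matching row.
by_id = {row["Geo_ID"]: row for row in election_table}

joined = []
unmatched = []
for feature in county_layer:
    match = by_id.get(feature["Geo_ID"])
    if match is None:
        # No attribute row for this county; worth checking before mapping.
        unmatched.append(feature["Geo_ID"])
        continue
    joined.append({**feature, **match})

print(f"{len(joined)} counties joined, unmatched: {unmatched}")
```

Checking for unmatched keys before moving on is useful because a county with no attribute row would otherwise show up as empty in the later maps.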

Results

Figure  1 Percent  Democratic vote 1980 
Figure 2 Percent Democratic vote 1980 LISA 
Figure 3 Percent Democratic vote 2008 
Figure 4 Percent Democratic vote 2008 LISA map

Figure 5 Percent Hispanic population 2010

Figure 6 Percent Hispanic population 2010 LISA map 

Figure 7 Voter turnout 1980

Figure 8 Voter turnout 1980 LISA map

Figure 9 Voter turnout 2008

Figure 10 Voter turnout 2008 LISA map



With election data, one of the first things that is helpful to look at is turnout, or where the people voting are actually coming from. This is analyzed in figures seven through ten. Figures seven and eight show voter turnout for the 1980 election. From the Moran's I plot it can be seen that a majority of the data falls in the center of the plot, but there are outlying values. Looking at the areas of high concentration, high voter turnout is generally located in the north of Texas and low voter turnout in the south of the state. The same general pattern holds for the 2008 election, though there is a visible shift: the clusters of high and low turnout appear to be shrinking or moving, with areas in the south of Texas that previously had low turnout beginning to trend toward higher turnout. The opposite can be said for the north, where areas that previously had high turnout are becoming normalized or shifting to low-high clusters.

The second thing to examine is which party these voters are casting their ballots for. Historically Texas is known as a Republican state, and figure one confirms this, with few areas of high Democratic vote and many areas of low Democratic vote. Comparing this to the 2008 data shows something interesting: although the number of counties considered strongly Republican may be about the same, the number of counties voting strongly Democratic is actually increasing. This is important when compared with voter turnout, since the LISA map for the 2008 election shows that the areas beginning to trend toward high turnout are also the counties voting predominantly Democratic.

Through all this analysis there has been a visible shift in voting trends in the state of Texas. Voter turnout has begun shifting, with northern counties decreasing their presence at the polls and southern counties beginning to turn out more. The other big shift is from Texas being a predominantly Republican voting state to a more balanced one with a rising Democratic vote.
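For readers unfamiliar with the statistics behind these maps, the sketch below computes a global Moran's I and the High-High, Low-Low, High-Low, and Low-High labels that a LISA map is built from. It is a simplified illustration: the six "counties" sit in a row with made-up values, neighbors are simply adjacent units with equal weights, and the permutation test GeoDa uses to decide which clusters are significant is left out. The function names are hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary (0/1) neighbor weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Total weight: one unit for every (county, neighbor) pair.
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    cross = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(zi ** 2 for zi in z)


def lisa_quadrants(values, neighbors):
    """Label each unit by its own value and its neighbors' average,
    both measured relative to the overall mean."""
    mean = sum(values) / len(values)
    z = [v - mean for v in values]
    labels = []
    for i in range(len(values)):
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        own = "High" if z[i] >= 0 else "Low"
        around = "High" if lag >= 0 else "Low"
        labels.append(f"{own}-{around}")
    return labels


# Six hypothetical counties in a row; each touches the ones beside it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = [10, 9, 8, 2, 1, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]   # neighbors always differ

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
quadrants = lisa_quadrants(clustered, chain)

print(f"clustered I = {i_clustered:.2f}, alternating I = {i_alternating:.2f}")
print(quadrants)
```

A positive Moran's I means similar values cluster together, as with turnout in the north and south of Texas, while High-Low and Low-High labels mark units that differ from their surroundings.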

Conclusions

For this assignment the requirement was to examine Texas voter data for the 1980 and 2008 elections at the county level. The task was to interpret this data and determine whether any noticeable trends are developing in voter turnout and in which party the vote is cast for. Programs such as ARCMAP and GeoDa were used to compile the data and create LISA maps for the analysis. Through this analysis visible trends emerged. In the 1980 election, high voter turnout was in the north and low turnout was in the south, and Democratic voters were concentrated in the south. In the 2008 election these patterns seemed to start to switch, with voter turnout in the north diminishing and turnout in the south beginning to increase. This means that Texas could potentially become a state with a higher proportion of Democratic voters than Republican voters if these trends were to continue.