Karen Lum :: Activity :: Just Me
Menzies, T., Elrawas, O., Baker, D., Hihn, J., and Lum, K. (Nov. 2007). On the value of stochastic abduction (if you fix everything, you lose fixes for everything else). International Workshop on Living with Uncertainty. Atlanta, Georgia, USA.
Menzies, T., Chen, Z., Hihn, J., & Lum, K. (Nov. 2006). Selecting best practices for effort estimation. IEEE Transactions on Software Engineering, 32(11): 883-95.
IS498: Independent Study with Terry Ryan: Quantitative Research Methods
Summer 2008
I first met with Dr. Ryan in the middle of the summer to work on my independent study. I liked that he was very flexible and worked with me to develop a plan. The plan was intensive, following the textbook[1] and squeezing 9 chapters into the last 6 or 7 weeks of summer. I planned to look for actual data sets to apply what I learned from each chapter. Here is the original independent study plan that we developed on June 23:
| Independent Study Plan - Summer 2008 | ||
| Read by | Readings | Assignment** done by |
| July 11, 2008 | Ch 1, 2 | July 18, 2008 |
| July 18, 2008 | Ch 3 | July 25, 2008 |
| July 25, 2008 | Ch 4, 5 | August 1, 2008 |
| August 1, 2008 | Ch 6 | August 8, 2008 |
| August 8, 2008 | Ch 7 | August 15, 2008 |
| August 15, 2008 | Ch 8 | August 22, 2008 |
| August 22, 2008 | Ch 9 | August 29, 2008 |
| **Assignments: Look for data sets that correspond to the chapter topic. Try to apply the data sets using the techniques and methods discussed in the corresponding chapter. | ||
The textbook was recommended to me by Dr. Ryan. I purchased a newer edition of the book than the one Dr. Ryan had shown me, and I found it very easy to follow, with wonderful examples. One of the most helpful things in the book is a flow chart on pages 14-15 that helps you determine which multivariate technique to select. I don’t have it memorized yet, but I plan on going back to this diagram as an aid in my future research. This chart helped me learn that certain quantitative techniques are for examining dependence relationships, while others are for interdependence. For dependence relationships, whether you have one or more dependent variables affects which technique you choose. Then, whether the dependent variable is metric or non-metric further narrows the choice. For interdependence techniques, the choice revolves around the structure of the relationships: whether they are among variables, cases, or objects.
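The branching described above can be sketched as a small function. This is only an illustration of the kind of decision a flow chart like this encodes. The function name, its parameters, and the specific technique returned on each branch are assumptions based on how these techniques are commonly classified, not a transcription of the textbook's chart.

```python
from typing import Optional


def suggest_technique(
    relationship: str,
    n_dependents: int = 0,
    dependent_is_metric: Optional[bool] = None,
    structure_among: Optional[str] = None,
) -> str:
    """Suggest a multivariate technique (illustrative, simplified mapping)."""
    if relationship == "dependence":
        if n_dependents < 1 or dependent_is_metric is None:
            raise ValueError("dependence needs n_dependents >= 1 and a measurement level")
        if n_dependents == 1:
            # One dependent variable: its measurement level picks the technique.
            if dependent_is_metric:
                return "multiple regression"
            return "multiple discriminant analysis"
        # Several dependent variables (assumed mapping for this branch).
        if dependent_is_metric:
            return "multivariate analysis of variance"
        return "canonical correlation with dummy variables"
    if relationship == "interdependence":
        # The choice depends on what the relationships are among.
        mapping = {
            "variables": "factor analysis",
            "cases": "cluster analysis",
            "objects": "multidimensional scaling or correspondence analysis",
        }
        if structure_among not in mapping:
            raise ValueError("structure_among must be 'variables', 'cases', or 'objects'")
        return mapping[structure_among]
    raise ValueError("relationship must be 'dependence' or 'interdependence'")


# A non-metric dependent variable such as flight vs. ground software type.
print(suggest_technique("dependence", n_dependents=1, dependent_is_metric=False))
# Grouping many variables into a few underlying dimensions.
print(suggest_technique("interdependence", structure_among="variables"))
```

A real chart has more branches than this (for example, the measurement level of the predictors), so the sketch should be read as a memory aid rather than a replacement for the diagram.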
The professor sent me several data files, and I also agreed that as part of my independent study I would search for my own data files. The most valuable thing I learned in this class was how to search for data and how to actually apply the analysis to real data. I was able to apply and practice several multivariate techniques including factor analysis, cluster analysis, multivariate analysis of variance, multiple regression, conjoint analysis, multiple discriminant analysis, correspondence analysis, and multidimensional scaling. Although I was familiar with multiple regression and cluster analysis, many of the other techniques were completely new to me.
My favorite chapter was Chapter 2: Examining Data. While I knew a little about dealing with outliers and missing data, I had never followed a rigorous process. I learned how you can use graphical methods to examine characteristics of the data. I learned how to deal with missing data depending on the missing data process, such as MCAR (Missing Completely at Random) or MAR (Missing at Random). Another thing I learned that had never occurred to me before was to examine effect size. I never realized that the effect size you are trying to detect affects statistical power. My least favorite topic was conjoint analysis. I can see how it could be useful for estimating utility in a marketing setting, but it was not as applicable to any of the work I was actually doing at my company.
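The link between effect size and statistical power can be made concrete with a short calculation. The sketch below uses a normal approximation for a two-sided, two-sample test of a standardized mean difference. The function name, the sample size of 30 per group, and the effect sizes tried are illustrative assumptions, not values from the textbook or from my data.

```python
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference in means
    divided by the common standard deviation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Same sample size and alpha, different effect sizes.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: power = {two_sample_power(d, n_per_group=30):.2f}")
```

With the sample size and significance level held fixed, power rises with the effect size, which is why a small expected effect calls for a larger sample.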
I was able to take some of the techniques I learned during independent study and apply them to a project at work. We were in the middle of developing a state-of-software report for my company and needed to analyze a lot of quantitative data (both metric and non-metric). Here are some examples of things I actually did.
Examining Lines of Code Using Multiple Regression
I had a lot of fun performing multiple regression analysis on some company data. I used a confirmatory approach on different types of lines of code (new lines, lines reused as-is, and inherited lines with major modifications) for different groups of software, ground and flight, to see whether the weights for the different types of lines of code differed. These weights can be used to explain or predict equivalent lines of code. The R-squared values were low for both flight and ground software. However, I found it odd that the tolerance for ground software seemed high (0.98). I couldn’t do much with these results for my state-of-software report, but decided to save them for future exploration.
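Tolerance and R-squared measure different things, so a high tolerance can coexist with a poor fit. Tolerance for a predictor is 1 minus the R-squared obtained by regressing that predictor on the other predictors, so a value near 1 means the predictor shares almost no variance with the others (little multicollinearity). The sketch below computes it in plain Python. The helper names and the line-count figures are made up for illustration and are not the company data.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def tolerances(predictors):
    """Tolerance of each predictor: 1 - R^2 from regressing it on the others."""
    result = {}
    for name, column in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        result[name] = 1 - r_squared(others, column)
    return result


# Hypothetical sizes in thousands of lines of code, made up for illustration.
sloc = {
    "new": [12.0, 30.0, 7.0, 45.0, 22.0, 15.0, 60.0, 9.0],
    "reused": [5.0, 2.0, 20.0, 8.0, 14.0, 1.0, 10.0, 25.0],
    "modified": [3.0, 9.0, 4.0, 1.0, 12.0, 6.0, 2.0, 8.0],
}
for name, tol in tolerances(sloc).items():
    print(f"{name}: tolerance = {tol:.2f}")
```

Statistical packages report tolerance (or its reciprocal, the variance inflation factor) directly, so this is only meant to show what the number stands for.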
Examining Software Characteristics Using Discriminant Analysis
I then performed stepwise discriminant analysis on my COCOMO II[2] data to separate ground software (group 0) from flight software (group 1), and there appeared to be a highly significant discriminant function. 53% of the variance in the dependent variable (flight or ground software type) can be explained by this model of 7 independent variables. The proportional chance criterion was 0.7² + 0.3² = 0.58, and the maximum chance criterion was 70%. All the levels of accuracy exceeded these values by more than 25%. I didn’t have time to look at the casewise results to see why cases might have been misclassified, but I see value in doing that comparison in the future when I find more time. This analysis validated for me and my research team at the company that software characteristics (such as complexity, programmer experience, tool usage, etc.) depend on the type of software being developed (ground software vs. flight software), and that we should continue using this split for our other analyses.
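The two chance criteria are simple arithmetic on the group proportions, which the sketch below reproduces from the 70/30 split quoted above. The function names are hypothetical, and the 1.25 multiplier assumes that "more than 25%" is meant relative to the chance value rather than as 25 percentage points.

```python
def proportional_chance(proportions):
    """Chance accuracy if cases were assigned at random in proportion to group size."""
    return sum(p ** 2 for p in proportions)


def maximum_chance(proportions):
    """Chance accuracy if every case were assigned to the largest group."""
    return max(proportions)


groups = [0.7, 0.3]  # group shares quoted in the text
c_pro = proportional_chance(groups)
c_max = maximum_chance(groups)

# Assumption: the benchmark is 25% above chance in relative terms.
print(f"proportional chance: {c_pro:.2f}, benchmark: {1.25 * c_pro:.3f}")
print(f"maximum chance:      {c_max:.2f}, benchmark: {1.25 * c_max:.3f}")
```

The maximum chance criterion is the stricter of the two whenever the groups are unequal in size, as they are here.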
| Standardized Canonical Discriminant Function Coefficients | |
| | Function 1 |
| time | .412 |
| stor | .467 |
| pvol | .292 |
| acap | .276 |
| plex | .548 |
| site | -.652 |
| docu | .312 |
| Classification Function Coefficients | ||
| | FlightOrGrnd = 0 | FlightOrGrnd = 1 |
| time | 57.916 | 63.680 |
| stor | -16.003 | -8.571 |
| pvol | 77.638 | 84.217 |
| acap | 180.586 | 186.812 |
| plex | 115.151 | 130.100 |
| site | 566.272 | 527.305 |
| docu | 358.726 | 372.582 |
| (Constant) | -638.916 | -660.584 |
| Fisher's linear discriminant functions | ||
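Classification function coefficients like these are normally used by computing one score per group (the constant plus each coefficient times the corresponding variable) and assigning the case to the group with the higher score. The sketch below does that with the coefficients from the table. The function names and the two example projects are hypothetical, and it assumes each cost driver is entered on the same numeric scale as the data used to fit the functions, which is not stated here.

```python
DRIVERS = ("time", "stor", "pvol", "acap", "plex", "site", "docu")

# Coefficients copied from the Classification Function Coefficients table.
COEFFICIENTS = {
    0: {"time": 57.916, "stor": -16.003, "pvol": 77.638, "acap": 180.586,
        "plex": 115.151, "site": 566.272, "docu": 358.726, "(Constant)": -638.916},
    1: {"time": 63.680, "stor": -8.571, "pvol": 84.217, "acap": 186.812,
        "plex": 130.100, "site": 527.305, "docu": 372.582, "(Constant)": -660.584},
}


def classification_scores(project):
    """Return the classification score for each group (0 = ground, 1 = flight)."""
    scores = {}
    for group, coefs in COEFFICIENTS.items():
        scores[group] = coefs["(Constant)"] + sum(
            coefs[name] * project[name] for name in DRIVERS
        )
    return scores


def classify(project):
    """Assign the project to the group with the highest score."""
    scores = classification_scores(project)
    return max(scores, key=scores.get)


# Hypothetical projects: driver values are invented for illustration only.
baseline = {name: 1.0 for name in DRIVERS}
constrained = dict(baseline, time=1.63, stor=1.46)

print(classify(baseline), classification_scores(baseline))
print(classify(constrained), classification_scores(constrained))
```

Comparing the two score columns driver by driver also shows which variables push a project toward the flight group (higher time, stor, and plex) and which push it toward ground (higher site), consistent with the signs of the standardized coefficients.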
Reducing Software Characteristics Using Factor Analysis
After reading all 9 chapters and seeing how the different techniques can be used in different ways, I thought I might try factor analysis again on the COCOMO II[2] data that I had. One of the biggest complaints from software managers at my company is that there are too many cost drivers, so I wanted to see if R-type factor analysis could reduce those 17 cost drivers into smaller, easier-to-understand groups. I was able to figure out which variables had low intercorrelations and should be excluded from the factor analysis. However, no matter which rotation method I used, I got overlapping factors (variables loading on more than one factor). I saw that some of the overlapping variables should be deleted from the factor analysis because there was strong practical significance in keeping them as individual cost drivers (required software reliability and complexity are important and should be estimated for all projects no matter what). However, deleting these variables made the solution worse and created more overlapping factors. So from a practical viewpoint, I stopped at the factor rotation with the fewest overlapping variables and considered the new groupings as a reduced set. The factors that came out of this include people factors (their capabilities), development constraints (memory, storage, documentation level), and predetermined developmental factors (schedule, platform, language). This analysis reduced the 17 cost drivers to 5 factors. These resulting factors will be used in my report as a suggested set to use for quick cost estimation.
| Rotated Component Matrix | | | | |
| | Component | | | |
| | 1 | 2 | 3 | 4 |
| stor | .823 | | | |
| docu | .760 | | | |
| time | .729 | | | .424 |
| apex | | .814 | | |
| pcap | | .791 | | |
| acap | | .736 | | |
| ltex | | | .830 | |
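Overlapping variables can be spotted mechanically by checking which variables load on more than one component. The sketch below applies that check to the rows visible in the table. It assumes that the .424 loading for time belongs to component 4 and that loadings of 0.40 or more count as meaningful. Both are my reading of the output, and the function names are hypothetical.

```python
# Loadings from the visible rows of the rotated component matrix.
# Blank cells (suppressed small loadings) are simply omitted.
LOADINGS = {
    "stor": {1: 0.823},
    "docu": {1: 0.760},
    "time": {1: 0.729, 4: 0.424},  # assumed: .424 falls under component 4
    "apex": {2: 0.814},
    "pcap": {2: 0.791},
    "acap": {2: 0.736},
    "ltex": {3: 0.830},
}


def cross_loadings(loadings, threshold=0.40):
    """Return variables that load at or above the threshold on several components."""
    flagged = {}
    for variable, by_component in loadings.items():
        meaningful = sorted(
            comp for comp, value in by_component.items() if abs(value) >= threshold
        )
        if len(meaningful) > 1:
            flagged[variable] = meaningful
    return flagged


def group_by_primary(loadings):
    """Group each variable under the component where its loading is largest."""
    groups = {}
    for variable, by_component in loadings.items():
        primary = max(by_component, key=lambda comp: abs(by_component[comp]))
        groups.setdefault(primary, []).append(variable)
    return groups


print("overlapping:", cross_loadings(LOADINGS))
print("groupings:", group_by_primary(LOADINGS))
```

Grouping each variable under its largest loading mirrors the practical choice described above: accept the rotation with the fewest overlaps and treat the resulting groupings as the reduced set.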
karenl | page | Dec 15, 2008 - 4:16pm