Adding the Import Method for the 2013 Data #2

Closed
opened 2016-03-11 16:09:45 +00:00 by Llewellyn · 30 comments
Owner

There is a mismatch between the old 2010 data and the new 2013 data.

This means that we will possibly need to rebuild the import method.

At this point we can see that there is far more data than in the 2010 data set; this could also impact the grouping of the causes/risks considerably.

Matthew, please give us as much input as you can in regard to this, thanks!
distantobserver commented 2016-03-31 12:11:04 +00:00 (Migrated from github.com)

Sorry for the delay.

My reading of the 2013 data is as per the attached:

Am I correct in assuming that we need to map the name changes on the tool itself now, or do we need to make other adjustments first?

If a major overhaul is needed, please let me know asap.
distantobserver commented 2016-03-31 12:12:38 +00:00 (Migrated from github.com)
[2013 GBD Data changes.docx](https://github.com/namibia/CBP-Joomla-3-Component/files/197573/2013.GBD.Data.changes.docx)
Author
Owner

Okay, from what I see this is a huge change and will require a major overhaul of the import function.

The old data also had age groups per cause/risk, so this is not new. The import function itself will be a complete rewrite to account for the cause => sub risk concept, since I presume that you would like to use this relationship in some way to possibly replace the "Ref Nr" concept currently in use.

In the current data we have a "Ref Nr" key (A.01.02) to subdivide the data in the selection list. What will be used to do this now? Is it as mentioned above, or is there yet another level of subdivision?
distantobserver commented 2016-04-01 06:14:36 +00:00 (Migrated from github.com)

I think we can use the same referencing system.

If it really is a big job, let's meet Monday and put a ToR together.
distantobserver commented 2016-04-28 11:15:13 +00:00 (Migrated from github.com)

As agreed, we will conduct the following:

  1. Redevelop the data download tool so we can import data
  2. Allow backwards compatibility
  3. Move the modification tool to the backend of the database (from hard code) so that it can be modified in future by an administrator

Please keep me up to date.
Author
Owner

I am looking for the raw data file at http://ghdx.healthdata.org/geography/namibia but I am not having success. Up till now I have been working with the data file you gave me (GBD2013 deaths.xlsx), but I don't have the other one, and I would like to see other countries as well. Please give me links to the download page for Namibia and an explanation of how to get to the other countries' data.

I see that each cause has a number, and so do the risks. Do you think we can use those, or will they change again? I also see that sex and age have IDs as well; this can be very helpful.

To import the data: the first year is 1990, then there are 2000, 2005, 2010 and also 2015. Should we add all the years, or only 2010 and 2015?

The risks are linked to causes almost like a breakdown. Should those also be imported and chained in relation?
distantobserver commented 2016-05-06 09:25:58 +00:00 (Migrated from github.com)

Have a look at the top file at the following link:
[https://cloud.ihme.washington.edu/index.php/s/b89390325f728bbd99de0356d3be6900]

It's almost 1GB so I can't download it now (office wifi on the fritz), but I suspect it may have all data for all countries.
Author
Owner

These files have a different layout than the one you gave me.
[Namibia-Death](http://cloud.ihme.washington.edu/index.php/s/b89390325f728bbd99de0356d3be6900/download?path=%2FIHME%20GBD%202013%20Deaths%20by%20Location%201990-2013&files=IHME_GBD_2013_NAM_DEATHS_BY_CAUSE_1990_2013_Y2014M12D17.zip) & [Namibia-YLD](http://cloud.ihme.washington.edu/index.php/s/b89390325f728bbd99de0356d3be6900/download?path=%2FIHME%20GBD%202013%20YLLs%20by%20Location%201990-2013&files=IHME_GBD_2013_NAM_YLL_BY_CAUSE_1990_2013_Y2014M12D17.zip)
These zip files are not that big (only 2.5mb each), so please just confirm that these are the correct files.

The numbers in the files are under three columns, [mean] [lower] [upper]. Should we use the mean value?

And about the years: should we import all the years or only 2010 and 2013?
distantobserver commented 2016-05-10 10:03:13 +00:00 (Migrated from github.com)

Firstly, the one file is YLL, not YLD, so we need the other one. It seems to me that the risks are not included in these files. I will look in greater detail this afternoon to confirm.

I have asked Patrick to join the conversation to assist.
distantobserver commented 2016-05-10 10:03:46 +00:00 (Migrated from github.com)

Where did you get those zip files from?
Author
Owner

From [your link](https://cloud.ihme.washington.edu/index.php/s/b89390325f728bbd99de0356d3be6900); it is the same place. I just opened [IHME GBD 2013 Deaths by Location 1990-2013](http://cloud.ihme.washington.edu/index.php/s/b89390325f728bbd99de0356d3be6900?path=%2FIHME%20GBD%202013%20Deaths%20by%20Location%201990-2013) & [IHME GBD 2013 YLLs by Location 1990-2013](http://cloud.ihme.washington.edu/index.php/s/b89390325f728bbd99de0356d3be6900?path=%2FIHME%20GBD%202013%20YLLs%20by%20Location%201990-2013) and looked for the country download. If you can give me direct links to the folders in which the correct files are found, that would be great; your initial link was to a folder with many subfolders, so I just opened what seemed most correct.
phswisstph commented 2016-05-10 16:20:07 +00:00 (Migrated from github.com)

Hi from Basel, I'm sorry to join in so late... I will have a look again before our call tomorrow and we can discuss what still needs to be clarified.
distantobserver commented 2016-05-10 16:22:51 +00:00 (Migrated from github.com)

Ah OK. I see. Sorry, my internet was cutting out so I had trouble with the page. So it seems there are no YLDs? I'll chat to Patrick tomorrow morning and get back to you.
phswisstph commented 2016-05-11 06:56:59 +00:00 (Migrated from github.com)

The data are extractable from: http://ghdx.healthdata.org/global-burden-disease-study-2013-gbd-2013-data-downloads-full-results
You need to choose YLDs on the left-hand side and you should get a zip with all the data for each country.
Let me know if this isn't what it should be. The same can be used to extract the mortality data...
phswisstph commented 2016-05-11 07:02:35 +00:00 (Migrated from github.com)

You need to choose the country on the right, then select the deaths from the menu to get a zip file, and then the YLDs to get another zip file.
distantobserver commented 2016-05-11 07:28:53 +00:00 (Migrated from github.com)

OK. Thanks Patrick.

We are in agreement that we only import 2010 and 2013 data.

I'm still getting network failure messages on this site (I think that's why I used the other link). Hopefully Llewellyn will have more luck and we can look into the data.
phswisstph commented 2016-05-11 08:14:24 +00:00 (Migrated from github.com)

You need to choose the country on the LEFT, then select the deaths from the menu to get a zip file, and then the YLDs to get another zip file.
Author
Owner

Hi Patrick, thanks!

I think I have the files now (at last 👍). The file name for the Namibia YLDs is IHME-Data-Namibia-YLDs.csv and for DEATH it is IHME-Data-Namibia-YLDs.csv; you can download them to check that they are the correct files.

Matthew, are these all the files I need for Namibia? I mean, these two kinds of files per country?

Then, in the YLD and Death files, which column in the spreadsheet holds the YLD and Death values? The column names with numbers in both files are as follows:

  1. nm_mean
  2. nm_lower
  3. nm_upper
  4. rt_mean
  5. rt_lower
  6. rt_upper
  7. pc_mean
  8. pc_lower
  9. pc_upper
Author
Owner

Matthew, you said you would see to it that the names are updated on the system. Will this be done now that we have the correct files?

Please note that the name field that needs to be the same as in the spreadsheet is called the Import Name in the Cause/Risk edit view. For the future I will add the given KEY_IDS from the spreadsheet to the cause/risk table for association; this means we will use their ID for each cause/risk instead of the name. Would you agree that this will be a constant ID? The ID is under the column name cause in the files I added above as links to the files on Dropbox.

Another question, since the new data will update the old data: if there is any cause/risk not being updated, must it be removed or ignored?
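
To make the idea of associating by ID rather than by name a bit more concrete, here is a minimal Python sketch, used purely for illustration. The field names `import_id`, `import_name` and `name` follow the fields discussed in this thread, but the function `apply_import`, the record layout and the sample values are hypothetical. Since the question of removing or ignoring records that are not updated is still open, the sketch only reports them and changes nothing.

```python
def apply_import(existing, incoming):
    """Match records on import_id and refresh only the import name.

    existing: list of dicts with import_id, import_name and name (assumed layout).
    incoming: dict mapping import_id to the name found in the spreadsheet.
    Returns (ids in the spreadsheet with no record, ids of records not updated).
    """
    by_id = {record["import_id"]: record for record in existing}
    unknown_ids = []
    for import_id, sheet_name in incoming.items():
        record = by_id.get(import_id)
        if record is None:
            # New in the spreadsheet; left for manual review here.
            unknown_ids.append(import_id)
            continue
        # Only the import name is refreshed; the display name is left alone.
        record["import_name"] = sheet_name
    not_updated = [r["import_id"] for r in existing if r["import_id"] not in incoming]
    return unknown_ids, not_updated


# Made-up records, for illustration only.
existing = [
    {"import_id": 1, "import_name": "Old name A", "name": "Display A"},
    {"import_id": 2, "import_name": "Old name B", "name": "Display B"},
]
incoming = {1: "New name A", 3: "New name C"}
unknown_ids, not_updated = apply_import(existing, incoming)
print(unknown_ids, not_updated)
```

Reporting the two lists instead of deleting anything keeps the remove-or-ignore decision with whoever maintains the cause/risk table.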
distantobserver commented 2016-05-13 07:41:24 +00:00 (Migrated from github.com)

Thanks.

We will use 4. rt_mean

I'll get on to the file names asap and get back to you on the final question.
Author
Owner

Hi Matthew,
The new field, called Import ID, has been added to the [Cause/Risk Edit View](https://www.staffhealthcbp.com/administrator/index.php?option=com_costbenefitprojection&view=Causesrisks).
distantobserver commented 2016-05-16 09:41:43 +00:00 (Migrated from github.com)

Hi,

OK. Then I am going to do the following today:
CBP update

  1. Export cause/risks
  2. Add Import ID to Causes
  3. Add Import ID to risks
  4. Import list back to site

Questions:

  • Do I need to change "import_name" or will the system do this?
  • When you update data, will my changes to “name” be overwritten?
  • How will the risks be imported?

Thanks
Author
Owner

You don't need to change those names; the import will update them automatically, but it will not change the actual name.

How will the risks be imported?
That is a good question; until now I was only focused on the causes... I would think it must be done at the same time. You last said the risks are in the causes, meaning that they're a sort of breakdown of the causes. I then asked if their relationship should be preserved and you never came back to me on that. In fact it seemed like they were no longer as important, and that the causes now cover all that is needed. But now the question is how the risks will be imported...

Okay, so we have these values in one spreadsheet. You said that the causes are found where the risk does not have a name. This must mean that as long as the risk has a name we are dealing with risks. But I see that the risks repeat themselves per cause, though at least the values remain the same for each risk-age-group.

So I can import those, but now we have an overlapping ID issue, right? To solve this, please add the following prefix to all IDs according to their target:

Causes = c123
Risks = r123

This will help avoid conflict.
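
A small Python sketch of the prefix idea, for illustration only. The helper names `to_import_id` and `from_import_id` are hypothetical, and the sketch assumes the raw IDs are plain integers.

```python
# One-letter prefix per target, as proposed above.
PREFIXES = {"cause": "c", "risk": "r"}


def to_import_id(kind, raw_id):
    """Build a prefixed ID such as c123 or r123."""
    return f"{PREFIXES[kind]}{raw_id}"


def from_import_id(import_id):
    """Split a prefixed ID back into its target and raw integer ID."""
    kinds = {prefix: kind for kind, prefix in PREFIXES.items()}
    return kinds[import_id[0]], int(import_id[1:])


print(to_import_id("cause", 123))
print(to_import_id("risk", 123))
print(from_import_id("r123"))
```

With the prefix in place, a cause and a risk that happen to share the same number still end up with different keys.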
Author
Owner

Please note, though, that the relationships will be lost, since the risks are used under multiple causes. For us to show this kind of relationship and make it selectable based on relationship, we will be faced with huge duplication in the selection list.

Not that this can't be done. I can add a field called parents to the cause/risk table and place all the parents in that list to ensure that the risk will show up under its related parent causes.

Then controlling the selection, to ensure that any cause that also contains the given risk is not selected twice, becomes an intense algorithm, but it can be done. What is your take on the matter?
distantobserver commented 2016-05-17 08:18:42 +00:00 (Migrated from github.com)

For the sake of manageability, we will not bring the cause/risk relationships across.

I do not understand how the c/r prefix on the ID works when downloading from GBD. Also, it's unnecessary, as there is no overlap between risk and cause IDs (i.e. if there is a cause with id 123, there will not be a risk with the same id).

The issue is that the risk ID is in a different column to the cause ID, but this should be easy as we can search 2 columns for the ID, right?

What needs to happen is:

  1. The cause value (rt_mean) must be imported when the corresponding risk id (risk) value is "0"
  2. The risk value (rt_mean) must be imported when the corresponding cause id (cause) value is "195" (all causes)
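
The two rules above, together with the earlier agreement to import only 2010 and 2013, could be sketched roughly as follows in Python (for illustration only). The column names `cause`, `risk` and `rt_mean` are the ones mentioned in this thread; the `year` column name, the function `select_values` and the sample rows are assumptions, and the real files also carry age and sex columns that are ignored here.

```python
import csv
import io

NO_RISK_ID = "0"        # risk id on rows that carry a cause value
ALL_CAUSES_ID = "195"   # cause id for "all causes", on rows that carry a risk value
IMPORT_YEARS = {"2010", "2013"}


def select_values(rows):
    """Split rows into cause values and risk values, keyed by their ID."""
    causes, risks = [], []
    for row in rows:
        if row["year"] not in IMPORT_YEARS:
            continue
        value = float(row["rt_mean"])
        if row["risk"] == NO_RISK_ID:
            # Rule 1: risk id is 0, so the value belongs to the cause.
            causes.append({"import_id": row["cause"], "year": row["year"], "value": value})
        elif row["cause"] == ALL_CAUSES_ID:
            # Rule 2: cause id is 195 (all causes), so the value belongs to the risk.
            risks.append({"import_id": row["risk"], "year": row["year"], "value": value})
        # Any other cause/risk combination is skipped, since the
        # relationships are not brought across.
    return causes, risks


# Made-up rows, for illustration only.
SAMPLE = """cause,risk,year,rt_mean
123,0,2013,12.5
195,45,2013,3.25
123,45,2013,1.0
123,0,1990,20.0
"""

causes, risks = select_values(csv.DictReader(io.StringIO(SAMPLE)))
print(causes)
print(risks)
```

Because the cause ID and the risk ID sit in different columns, each rule only needs to look at the other column to decide which kind of row it is dealing with.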
distantobserver commented 2016-05-17 08:42:02 +00:00 (Migrated from github.com)

We have compared the old and new cause/risk lists and there are quite a lot of changes. That means old causes/risks going out of the data as well as new ones coming in.

I guess if we were to re-import all the data afresh it would mess up our hierarchy of causes/risks?

The thing is, the re-classifications are almost all for causes/risks with very little effect on results. So it seems a lot of work for little gain to fix it, but I guess I'm being lazy.
distantobserver commented 2016-05-17 08:59:26 +00:00 (Migrated from github.com)

I think you misunderstood my question:

When you update data, will my changes to “name” be overwritten?

I had changed some of the fields under "name" (which is what is displayed to the user on the site) to be clearer to a lay person (e.g. changing URI to coughs and colds).

From your answer, these changes will be overwritten, correct?

Probably more significantly, and in relation to my previous post about causes coming in or dropping out: what happens to the field "Ref_Nr"? This was entered manually and isn't part of the GBD file. Will it be overwritten with blanks or will the current values remain?
distantobserver commented 2016-05-17 09:18:51 +00:00 (Migrated from github.com)

Thanks for the call. To confirm, I will do the following:

  1. Enter all the import IDs on the Excel sheet downloaded from causes/risks
  2. Enter the new causes and risks on the same sheet
  3. Archive the causes and risks that are no longer used by the GBD programme
  4. Send to you for checking
Author
Owner

The import method is finished; I am just running a few more tests and will have it on https://www.staffhealthcbp.com/ Monday evening.
Author
Owner

Okay, all the new data for both 2010 and 2013 is officially on the https://www.staffhealthcbp.com/ website. Let me know if you find any discrepancies.
Reference: joomla/Cost-Benefit-Projection#2