MATH6011

Your coursework must be submitted electronically via Blackboard by 3pm on Friday

March 25th. Any work handed in after this time will be subject to the following

penalties: 10% of your marks lost per working day up to 5 working days. Do not write

your names anywhere on your work, as marking will be anonymous. Your student

IDs should be included in the filenames but not your name; see further instructions

on file naming and labelling in Section 3 below. An extension, for bona fide reasons,

may be allowed by prior agreement, but only well before the deadline; you can contact

the Student Office if you would like to apply for an extension. Computer crashes or

file losses a day or two before the deadline will not be an acceptable reason for an

extension. It is therefore advisable to keep back-up copies of your work. Components

of the project will receive different weightings in producing your final mark: 40

marks for the exponential smoothing part, 20 for ARIMA, 20 for regression, 10 for

the presentation slide, and 10 marks for the overall organization of your submitted

material, including the description of your codes/files.

You are expected to complete the assessment in groups of 3 students; working alone or

in a group of 2 could be accepted if there is a valid reason to do so. Please email the

lecturers as soon as possible and no later than 2 weeks before the submission deadline

if you are not able to form a group of 3 to complete the assignment. All students in

each group will receive the same mark for their work, and any late-submission penalty

indicated above will apply to the whole group.

1. Background and analysis

In light of the recent United Nations Climate Change Conference that took place in Glasgow,

Scotland, from 31 October to 13 November 2021, the UK government through its new Clean

Green Initiative has employed you as a consultant. Your task is to forecast the behaviour

of a number of key environmental indicators until December 2022, to help support the

decision process for new policies to support the country's efforts to reduce the impact of

climate change. The data is provided by a number of public organizations, including the

Meteorological (Met) Office and the Office for National Statistics (ONS).

1.1. How to get the data. From the four weblinks given below, download the data sets

and save them in xlsx or xls format. The resulting files might have multiple columns or

sheets; follow the corresponding instructions to access the data necessary for your analysis.

Copy the data sets from the required columns as described below; i.e., MSTA, CH4, GMAF,

and ET12, scrolling down, where necessary, to find the monthly observations.

(A) Global Mean Surface Temperature Anomaly (MSTA) in °C:

https://www.metoffice.gov.uk/hadobs/hadcrut5/data/current/download.html

MSTA: to get the data, see the monthly box in the Global row of the first table and select the CSV file type;

the data is located in the Anomaly column.

(Source: The Meteorological Office, abbreviated as the Met Office, which is the United Kingdom's national weather service).

(B) Global Monthly Atmospheric Carbon Dioxide Levels (CH4):

https://gml.noaa.gov/webdata/ccgg/trends/ch4/ch4_mm_gl.txt

CH4: see average monthly values in 4th column.

(Source: Global Monitoring Laboratory of the USA National Oceanic and Atmospheric

Administration, an American scientific and regulatory agency within the United States Department of Commerce).

It is recommended that for time series in (B), you copy the data into text files, using, for

example, Notepad, and then open the text files using Excel, as space delimited. The files can

then be saved as Excel workbooks.

(C) International Passenger Survey, UK visits abroad (GMAF):

https://www.ons.gov.uk/peoplepopulationandcommunity/leisureandtourism/datasets/internationalpassengersurveytimeseriesspreadsheet

GMAF: select the xlsx file; see data in GMAF column, scrolling down to the monthly data.

(Source: UK Office for National Statistics).

(D) UK inland monthly energy consumption (ET12), in million tonnes of oil equivalent. The xls

file can be downloaded by clicking on the corresponding entry at this link:

https://www.gov.uk/government/statistics/total-energy-section-1-energy-trends

ET12: use the data in the Total unadjusted column of the Month worksheet.

(Source: Department for Business, Energy & Industrial Strategy).
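As an alternative to the Notepad and Excel route recommended for series (B), a space-delimited text file can also be read directly in Python. The following is a minimal sketch using only the standard library. It assumes (this is not stated in the brief) that comment lines start with `#` and that the first two columns hold the year and the month; the function name `read_fourth_column` and the sample rows are made up for illustration and are not real measurements.

```python
import io


def read_fourth_column(handle):
    """Return (year, month, value) tuples from a whitespace-delimited text file.

    Assumptions of this sketch: lines starting with '#' are comments, and
    the first two columns hold the year and the month.
    """
    rows = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()  # split on any run of whitespace
        year, month = int(fields[0]), int(fields[1])
        value = float(fields[3])  # 4th column: average monthly value
        rows.append((year, month, value))
    return rows


# Made-up placeholder rows, NOT real measurements.
sample = io.StringIO(
    "# year month decimal average\n"
    "2000 1 2000.042 10.0\n"
    "2000 2 2000.125 11.5\n"
)
ch4_rows = read_fourth_column(sample)
print(ch4_rows)
```

For a downloaded file, the same function could be called on `open(path)` in place of the in-memory sample.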

1.2. Tasks. As it so often happens in the real world, the data sets are of different lengths.

You will have to use your own judgment in inspecting and preparing the data before carrying

out any technical analysis. The analysis is in three parts:

(a) You are asked to take all four series separately and to forecast monthly behaviour until

December 2022, using exponential smoothing-type forecasting methods.

(b) The Clean Green Initiative team have been satisfied in the past with exponential smoothing-type forecasting methods and are happy to see these techniques used in the analysis. However, they are interested in the possible use of the ARIMA methodology to predict MSTA.

You are asked to fit an ARIMA model to MSTA and to compare ARIMA forecasting with

an exponential smoothing method. You should make a recommendation as to the future use of ARIMA on this time series.

(c) The Clean Green Initiative team is interested to know whether global temperatures (that

is, series MSTA) are affected by carbon dioxide levels, international air travel, and the consumption of fuels (as exemplified by series CH4, GMAF, and ET12). Develop a multiple

regression model, use it for prediction of MSTA until December 2022, and report on whether

you think the model is satisfactory or not.

2. What you must produce

You must produce a technical report describing all the analysis done to select the most suitable forecasting methods, as well as the results obtained. The report must be accompanied

by a single-page slide summarizing your main results, and also the codes used to perform

the technical analysis, as well as the resulting graphs. More details on each of the aspects

of the work are given in the next subsections.

2.1. The technical report. The technical report must follow the structure described in

Subsection 2.5. It should address the three parts of the analysis: exponential smoothing,

ARIMA, and regression. For each part, give details of the preliminary analysis, data preparation, models chosen and analysis carried out. Also describe why each model was built and

explain the analysis carried out, including an evaluation of the effectiveness of the models.
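One common way to evaluate the effectiveness of a forecasting model is to hold back the last few observations, forecast them, and measure the errors. The sketch below illustrates this with simple exponential smoothing written in plain Python. It is an illustration under stated assumptions, not the course's prescribed method: the level is initialised with the first observation, the smoothing constant `alpha` is supplied by hand rather than optimised, and the toy series is a placeholder. Series with trend or seasonality would usually call for other members of the exponential smoothing family.

```python
import math


def simple_exp_smoothing(series, alpha):
    """Return the final smoothed level, which is the flat forecast for all horizons."""
    level = series[0]  # assumption: initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def holdout_errors(series, alpha, n_test):
    """Fit on all but the last n_test points; return (MAE, RMSE) on the held-out points."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    train, test = series[:-n_test], series[-n_test:]
    forecast = simple_exp_smoothing(train, alpha)  # same value for every horizon
    errors = [y - forecast for y in test]
    mae = sum(abs(e) for e in errors) / n_test
    rmse = math.sqrt(sum(e * e for e in errors) / n_test)
    return mae, rmse


toy = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder values, not real data
mae, rmse = holdout_errors(toy, alpha=0.5, n_test=2)
print(mae, rmse)
```

Computing the same error measures for each candidate model on the same hold-out period gives a like-for-like basis for comparing them.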

2.2. Single presentation slide. The executive board members of the Clean Green Initiative are particularly interested in knowing how the three methods (exponential smoothing,

ARIMA, and regression) perform on the four variables mentioned above; i.e., MSTA, CH4,

GMAF, and ET12. You are asked to produce a single-page slide summarizing the main

results of your analysis, in order to enable them to quickly grasp the results without necessarily having to read your technical report. Where necessary, attention should be given to

the comparison of the performance of the methods, while highlighting the best results. This

slide will be judged on the suitability of its presentational style, clarity, and quality.

2.3. Python codes. You must also prepare and submit python codes that you use to generate the results that will be included in your technical report. If any preliminary operations

on your data are needed before applying/developing a python code for your analysis, it is

fine to include this in the corresponding excel file containing your data sets. However, you

must complete all the main tasks of your analysis using python. You can use the codes from

the course, use different ones or develop your own. Marking on this aspect of your work will

not be based on how well you can program in python, but rather on the functionality of your

codes and their relevance in the corresponding analysis.

To help us understand each code, you must produce a single-page document,

as Appendix A to your technical report, giving a brief description (one or two sentences) of

what each code does. If you do any preliminary operations on your data in the excel file containing

your data set, a line or two should also be included to describe this.
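Preliminary operations of this kind can also be done in Python. Because the data sets are of different lengths, one such operation is restricting the series to the months they have in common, and another is counting how many months must be forecast to reach December 2022. The sketch below assumes each series is held as a dictionary keyed by `(year, month)`; the function names and the toy values are hypothetical. It does not check for gaps inside the common range.

```python
def align_on_common_months(**series):
    """Keep only the (year, month) keys that are present in every series."""
    common = set.intersection(*(set(s) for s in series.values()))
    months = sorted(common)  # chronological order, since keys are (year, month)
    aligned = {name: [s[m] for m in months] for name, s in series.items()}
    return months, aligned


def months_until(last, target=(2022, 12)):
    """Number of monthly steps from the last observed month to the target month."""
    return (target[0] - last[0]) * 12 + (target[1] - last[1])


# Toy placeholder values, not real data.
msta = {(2021, 1): 0.1, (2021, 2): 0.2, (2021, 3): 0.3}
gmaf = {(2021, 2): 5.0, (2021, 3): 6.0, (2021, 4): 7.0}

months, aligned = align_on_common_months(MSTA=msta, GMAF=gmaf)
horizon = months_until(months[-1])
print(months, aligned, horizon)
```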

2.4. Analysis and forecast graphs. You are expected to produce graphs to illustrate your

analysis in the technical report. Do not include these graphs in the main part of the report

(Sections 1 – 3; see details in next subsection), but rather, put all of them in Appendix B.

You are allowed up to 12 pages for the graphs produced for your analysis. Organize the

graphs in three main parts, each corresponding to one of the main sections of the technical

report. Also number each of your graphs accordingly to be able to easily refer to them, as

necessary, in Sections 1, 2, and 3. You do not need to repeat graphs in Appendix B. For

example, if you want to refer to a graph under the ARIMA section, which was already done

in the section dedicated to exponential smoothing, you are encouraged to instead use the

figure number of that specific graph rather than repeating the graph again.

2.5. Organizing your technical report. The report must be organized as follows:

1. Exponential smoothing (maximal length: 4 pages; total marks: 40)

Marks to be attributed based on how well you articulate the following aspects:

• Describe data preparation (and its effects) prior to the implementation of

exponential smoothing methods.

• Describe preliminary analysis undertaken (and conclusions drawn) prior to

the implementation of exponential smoothing methods.

• Give details of how exponential smoothing models were selected for each of

the time series, and how effective these methods are at forecasting.

• Clarity and quality of presentation.

• Functionality of python codes.

• Quality and suitability of illustrative or forecast result graphs.

2. ARIMA forecasting (maximal length: 2 pages; total marks: 20)

Marks to be attributed based on how well you articulate the following aspects:

• Describe any data preparation prior to ARIMA, and its effects.

• Describe preliminary analysis undertaken prior to ARIMA modelling, and

the conclusions drawn.

• Give details of how an ARIMA model was selected, tested, and its effectiveness evaluated.

• Compare ARIMA and exponential smoothing forecasting, both in general

terms and in this particular instance.

• Clarity and quality of presentation.

• Functionality of python codes.

• Quality and suitability of illustrative or forecast result graphs.

3. Regression prediction (maximal length: 2 pages; total marks: 20)

Marks to be attributed based on how well you articulate the following aspects:

• Describe any data preparation prior to regression.

• Describe any preliminary analysis undertaken prior to regression and the

conclusions drawn.

• Give details of how a regression model has been selected and comment on

its suitability for prediction.

• Clarity and quality of presentation.

• Functionality of python codes.

• Quality and suitability of illustrative or forecast result graphs.

Appendix A: Code descriptions (maximal length: 2 pages; full marks: 10), etc.

Marks here will be attributed based on the overall organization of the material

that you submit, and on the clarity, informativeness, and concision of your description of

what each of your python codes (or excel file, in case any preliminary operations

are carried out there) does.

Appendix B: Analysis and forecast graphs (maximal length: 12 pages)

This appendix should be organised in 3 sections, with the first, second, and third one

dedicated to graphs related to the exponential smoothing, ARIMA, and regression methods, respectively. As you can see above, marks dedicated to this appendix are attributed under the corresponding sections; i.e., Sections 1, 2, and

3, respectively.
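The preliminary analysis asked for in the ARIMA section above often involves inspecting the autocorrelation function (ACF) of a series, before and after differencing. The sketch below computes first differences and the standard sample ACF in plain Python, so the quantities behind an ACF graph are explicit. The function names and the toy series are hypothetical, and whether differencing is appropriate for a given series remains a judgment for the analyst.

```python
def difference(series, lag=1):
    """Return the lag-differenced series y[t] - y[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)  # sum of squared deviations from the mean
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


toy = [1.0, 2.0, 4.0, 7.0, 11.0]  # placeholder values, not real data
diffed = difference(toy)
r = acf(diffed, max_lag=1)
print(diffed, r)
```

A slowly decaying ACF in the original series that drops off quickly after differencing is the usual informal sign that one round of differencing has removed the trend.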

In summary, the following guidelines must be followed while producing the technical report:

• The technical report must be organized as described above, with a maximum of 22 pages

in total: a maximum of 10 pages in total for Sections 1, 2, 3, and Appendix A; and

a maximum of 12 pages dedicated to the analysis and forecast graphs (Appendix B).

• Do not include graphs in Sections 1, 2, and 3. All graphs should be included under

Appendix B with appropriate numbering, in order to easily refer to them in your

discussions under Sections 1, 2, and 3.

• No theory of forecasting or repetition of the material from lectures is required, unless

you have used models not included in the notes.

• Formal English should be used, avoiding abbreviations (such as "doesn't"), slang,

and casual vocabulary.

• In Sections 1, 2, and 3, references to codes developed/used for specific tasks can

be made by using the corresponding code's name. But no other details of python

modelling are needed in those sections.

• At most 2 sentences are needed in Appendix A to explain what each python code (or

excel file, if necessary) does.

• Feel free to include subsections to Sections 1, 2, 3, and Appendices A and B, if they

seem necessary to help make some parts clearer.

• No introduction, table of contents or conclusions should be written for the report.

3. Submission

All submissions should be done under the corresponding assignment tab on Blackboard.

Submit one zipped folder (.zip), not an archived file (.rar), without internal folders, which

contains a pdf copy of the technical report, a pdf copy of the single slide, and four spreadsheets

with the data sets provided for the analysis. You should also include an adequate number of

files with your python codes. Remember not to put your name anywhere on your work, as

marking is anonymous. Include the ID numbers of all the students in your group in

your technical report and single-page slide; use the following naming pattern (involving

the IDs of all the students in your group) for all the files to be submitted via Blackboard:

• 1 pdf file with the technical report:

– TechnicalReport_StudentID1_StudentID2_StudentID3.pdf

• 1 pdf file for the single slide (convert the slide to pdf):

– Slide_StudentID1_StudentID2_StudentID3.pdf

• 4 data files:

– MSTAdata_StudentID1_StudentID2_StudentID3.xlsx

– CH4data_StudentID1_StudentID2_StudentID3.xlsx

– GMAFdata_StudentID1_StudentID2_StudentID3.xlsx

– ET12data_StudentID1_StudentID2_StudentID3.xlsx

• Python codes: each file name should have three components, with the first one related

to the corresponding methodology, the second to the specific task, and the third being

the student IDs. For example, if you produce/use a code to illustrate something

related to the exponential smoothing, ARIMA, or regression methods, you should

respectively apply the following naming pattern to your files:

– ExpSmooth_MSTATimePlot_StudentID1_StudentID2_StudentID3.py

– ARIMA_ACFPlot_StudentID1_StudentID2_StudentID3.py

– Regression_Correlation_StudentID1_StudentID2_StudentID3.py

The middle terms MSTATimePlot, ACFPlot, and Correlation are related to specific

tasks that could be carried out under the corresponding parts. This middle term

should not exceed fifteen characters.
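The naming pattern and the requirement for a zip file without internal folders can be checked with a short script. The following is a sketch only: the helper names `code_filename` and `make_flat_zip`, the archive name `submission.zip`, and the placeholder IDs are hypothetical, and the separator `sep` between name components is an assumption that should be set to match the pattern given above.

```python
import os
import tempfile
import zipfile

MAX_TASK_LEN = 15  # the middle term should not exceed fifteen characters


def code_filename(method, task, student_ids, sep="_"):
    """Build '<method><sep><task><sep><ID1><sep>...<sep><IDn>.py'.

    `sep` is an assumption of this sketch; match it to the required pattern.
    """
    if len(task) > MAX_TASK_LEN:
        raise ValueError(f"task term {task!r} exceeds {MAX_TASK_LEN} characters")
    return sep.join([method, task, *student_ids]) + ".py"


def make_flat_zip(zip_path, file_paths):
    """Write the files at the top level of a .zip, with no internal folders."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            # arcname drops the directory part so the archive stays flat
            archive.write(path, arcname=os.path.basename(path))


ids = ["StudentID1", "StudentID2", "StudentID3"]  # placeholders
name = code_filename("ARIMA", "ACFPlot", ids)

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, name)
    with open(script, "w") as fh:
        fh.write("# placeholder script\n")
    zip_path = os.path.join(tmp, "submission.zip")  # hypothetical archive name
    make_flat_zip(zip_path, [script])
    with zipfile.ZipFile(zip_path) as archive:
        archived = archive.namelist()

print(name)
print(archived)
```

Listing the archive contents before uploading is a quick way to confirm that no entry contains a folder path.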