Data Science
Data Science Hierarchy of Needs
AI
AI and Deep Learning models enable hypotheses to be tested
Predict
Extrapolate graphs or use models to predict what will happen next
Analyse
Visualise the data using graphs
Enables trends to be spotted
Organise
Clean the data and remove unnecessary details
Store in a usable data format
Collect
Collect appropriate data from relevant sources
Data logging, sensors, secondary sources, surveys
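The levels above can be sketched as one small pipeline, read from the bottom of the hierarchy upwards: collect, organise, analyse, predict. This is only an illustrative sketch. The readings are invented for the example, and the function names (organise, fit_line, predict) are hypothetical helpers, not part of any library. The prediction step assumes the trend is a straight line, which real data may not follow.

```python
from statistics import mean

# Collect: hypothetical (year, value) readings, invented purely for illustration.
raw_readings = [
    (2019, "10.0"),
    (2020, "12.0"),
    (2020, "12.0"),   # duplicate record
    (2021, None),     # missing value
    (2022, "16.0"),
    (2023, "18.0"),
]

def organise(readings):
    """Organise: drop missing values and repeated years, convert text to numbers."""
    seen = set()
    cleaned = []
    for year, value in readings:
        # Simplification: any second reading for the same year is treated as a duplicate.
        if value is None or year in seen:
            continue
        seen.add(year)
        cleaned.append((year, float(value)))
    return sorted(cleaned)

def fit_line(points):
    """Analyse: find the least-squares straight line through the (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(slope, intercept, x):
    """Predict: extrapolate the fitted line to a new x value."""
    return slope * x + intercept

clean = organise(raw_readings)
slope, intercept = fit_line(clean)
forecast = predict(slope, intercept, 2025)
print(f"Trend: {slope:+.2f} per year, forecast for 2025: {forecast:.1f}")
```

The further the forecast year is from the collected data, the less the extrapolation should be trusted.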
Questions to answer
Is the UK becoming a greener country?
Define what 'becoming greener' means
What data will you need to answer this question?
Over what time period will you need data to answer this question?
Where will you find valid and appropriate data to find an answer to this question?
Once the data is clean, how will you visualise it to answer the question?
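One possible way to work through these questions, sketched under stated assumptions: here 'becoming greener' is defined as a rising percentage of electricity generated from renewable sources, which is only one of many valid definitions. The figures below are placeholders and are NOT real UK data, and the helper names (renewable_share, is_rising) are hypothetical. The chart is drawn with text characters so that no extra libraries are needed.

```python
# Placeholder figures only: NOT real UK data. Replace with values from a valid source.
generation_by_year = {
    2010: {"renewable": 10, "other": 90},
    2015: {"renewable": 25, "other": 75},
    2020: {"renewable": 40, "other": 60},
}

def renewable_share(mix):
    """Percentage of total generation that came from renewable sources."""
    total = sum(mix.values())
    return 100 * mix["renewable"] / total

def is_rising(values):
    """True if every value is larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))

# Work out the share for each year, in date order.
shares = {year: renewable_share(mix)
          for year, mix in sorted(generation_by_year.items())}

# Visualise: one text bar per year, one '#' for every 2 percentage points.
for year, share in shares.items():
    print(f"{year} {'#' * round(share / 2)} {share:.0f}%")

greener = is_rising(list(shares.values()))
print("Greener under this definition:", greener)
```

A different definition, such as falling carbon emissions per person, would need different data and could give a different answer.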
Data Sources
Website providing up-to-date data about the UK National Grid
Provides data about current country power usage and how the power is being generated
QGIS
Geographical data provided by sources such as The Ordnance Survey
Available on H2 Geography computers
Big Data
'Big Data' is a catch-all term for data that won't fit the usual containers and can be described in terms of:
volume - too big to fit into a single server
velocity - streaming data that needs a response within milliseconds to seconds
variety - data in many forms such as structured, unstructured, text, multimedia
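Volume and velocity both mean the data often cannot be loaded in one go and examined later. A common approach is to process each value as it arrives and keep only a small running summary. The sketch below illustrates that idea with a running mean. The sensor_feed function is a hypothetical stand-in for a live source, and its three readings are invented for the example.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, without storing the values."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

def sensor_feed():
    """Hypothetical stand-in for a live data source; readings are invented."""
    for reading in [4.0, 6.0, 8.0]:
        yield reading

# Only the count and the total are held in memory, however long the stream is.
means = list(running_mean(sensor_feed()))
print(means)
```

Real Big Data systems spread this kind of work across many servers, but the principle of summarising as the data flows past is the same.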
Advantages
Greater information means greater oversight and therefore the potential for better decision making leading to:
more desirable products / services
increased productivity through more efficient processes, saving money and/or creating better services
Drawbacks
As with all database systems, a Big Data system is only as good as the quality of the data it holds
Information is power. With great power comes great responsibility. Can those that have access to this oversight be trusted to do what is best?
Security: What if the data fell into the wrong hands?
Can't see the wood for the trees. Is it possible to have too much data?
Large amounts of data require highly sophisticated database systems running on sophisticated distributed systems. These systems are built and maintained by highly trained engineers, who are in short supply. Highly trained Data Scientists are also needed to extract the data and turn it into information