You have a social enterprise. Over the last few years you’ve seen it grow and develop and you’ve met the people whose lives you’re impacting. You can feel it maturing and you want to help it move forward. To do that, you know that data is important. It helps you show the impact that you’re having, helps you plan out the best use of your funding and helps you communicate to your staff and your donors about your successes and where you want to go next.
How do you manage that data? It needs to be collected and stored, cleaned, and then needs to be analysed. This all needs to happen in the most efficient way, because the less time spent working with data, the more time your enterprise has to grow.
I’d like to show you how I think about data-management, and share a few tips you can use to improve your data-management processes. I’ve worked in data-systems design for the past twelve years, providing support for many different kinds of humanitarian, medical and commercial organisations. These tips will help you reduce the time your enterprise spends working with data, so you can have a greater impact with your time.
The main goal of a data-system is to lighten the work-load for your organisation. It is not to show off a fancy new computer system. So, I favour a balanced discussion about pros and cons, and only swap a pen-and-paper process for a computerised process if it will dramatically improve your organisation’s efficiency. Computers are great when they’re used correctly, but they come with their own challenges and can become a barrier to sustainability if they are not implemented with enough foresight.
Visualizing your data-pipeline
When I’m discussing data-management, I find it helps to visualize the whole system so that we can discuss individual stages. This will help you find the stages that are using up the most of your time, and lets you focus on making improvements to those.
Think of your data-management as a pipeline with a series of stages. On the left, we have the data, swimming out there in the real world waiting to be collected by anyone who wants it. Then, over on the right, we have the end-result of the process. This is the presentation of the data for internal or external stakeholders.
I’m deliberately keeping this diagram very generic. This pipeline can apply to you whether you do all your data-collection manually, or if some stages have been automated. Some of the stages are manual and labour-intensive, some are automated using computers. Generally, manual processes and computer processes are interchangeable, and there are pros and cons to choosing one or the other.
Quick Tip: Computers don’t work any magic. All they do is speed up the processes you already have. If your paper-based data-processes aren’t working correctly, then a computer will only help you do the wrong thing really fast! Fix the manual-process first. Then set up a computer to speed up the process.
Why do we collect data?
Before we start reviewing each individual stage of the data-pipeline, and look for the problems you might be facing, let’s start by first looking at the reasons why you are collecting all this data in the first place. There are two reasons to collect data:
- To give you and your staff information so you can carry out your day-to-day tasks
- To produce reports for internal strategic planning, and for liaising with external stakeholders such as donors or authorities
Internal: Day-to-day tasks
What information do you and your staff need so they can carry out their daily tasks? Make a list of the tasks they do and the information they need while they’re doing the task.
For example, if you run a refugee teacher-training project, your teacher-trainers will need to know the school’s address, the teacher’s name and background, and the notes and comments from the last training-session with that teacher. If you run a medical research centre, your researchers might need a subset of the full research-dataset on which they can quickly run their analysis.
Internal & External: Strategy, planning and reporting
Ask yourself ‘how do we want to show our successes?’. What information best shows the great work that you’re doing, and the positive impact that your organisation is having? What reports do your funders need? What reports did your donors ask for in the past, but no longer need? Make a list of all the reports, both internal and external, that you need.
- List the information that you and your staff need for day-to-day tasks
- List the reports that you need to produce to show your impact and liaise with external stakeholders
Looking at each stage in the pipeline
Let’s take a look at each stage in the data-pipeline. I’ll describe each stage, show some issues that tend to appear at that stage, and share some techniques to help you improve that stage of the pipeline.
Think critically about your current system
As you go through this process, I want you to think critically about the data that’s being collected and how it’s being processed. We’re looking to streamline your data-process so that you get the maximum out of your time.
- Is this data really necessary? How does it support the internal and external needs we saw earlier?
- Are all those external reports really needed?
- Is any data being collected that’s unnecessary?
- Is there data that you don’t collect yet, but it would be very beneficial to you if you did?
- What types of data in your database need to be cleaned? What stage in your pipeline might be allowing in the dirty data?
It takes a lot of time to collect and process data. The less time spent in data-management, the more time you’ll have to carry out the real work of helping people. It’s not always easy to be critical about what’s being collected and what could be collected. Try to stay detached from that for the moment. Just be a passive observer, noting down what you see in your organisation. Then, later, you can make a plan for which changes are worth going after, and which aren’t a priority.
The real world
Out there in the real-world, your users are creating data whenever they change address, buy something new, make a meal or go to school. Every time someone does anything, there’s some potential information about that person.
One of the major problems is that there are so many different pieces of information you can collect. Most of the information out there is completely irrelevant to your organisation. It’s easy to err on the side of caution, and collect as much information as you can, and then worry later about whether it’s needed. This creates problems, because too much data puts unnecessary demands on your time without helping you show your impact.
A lot of the data in the real world is unreliable. In many cultures, a date of birth is not used, and a person’s age is only an estimate.
- Only collect the data that you need. Remember to focus on the internal and external needs that your organisation has. All the data you collect needs to help you in some way.
- You might come across some data that would be valuable for you when showing your impact. Note this down.
- Be aware of the context of your data. For example, is a date of birth a reliable value in the culture?
Getting the raw data inside your organisation
Some of the raw data from the real-world has been collected and is now inside your organisation. This is often stored on paper forms that were hand-written by your users, or during observations by your staff. Those papers might be in a pile on someone’s desk, or in an Excel file sent by a user.
The main issues arise here because of the workflow. This stage can quickly become complex when different types of data are collected from different places by different people. Hand-written forms can be unclear.
If some fields/questions are often answered incorrectly, it’s likely that the question on the form is unclear. If your organisation operates in a multi-lingual environment, this is a common issue you may be facing, as the question may be interpreted differently due to varying levels of understanding of the language.
- If you’re using paper forms, try to limit the number of fields on the form to the data you strictly need.
- Read through the responses in your paper forms. Review the questions in the forms, and the written responses, to see whether they are unclear/ambiguous. If there are things written in the margins (there will be!) then it’s a signal that the fields in the form aren’t adequate. If one of the fields is rarely/never being written in, find out if it’s still necessary.
Raw data entered into a storage system (Data-entry)
During this stage, the raw data is moved into long-term storage. There are many different options for how to store your data. If you’re going with a computer-based solution, you can choose to enter it into a spreadsheet, or into a database-system. Entering the data can be done using a custom application, or an off-the-shelf web-based system.
The user interface is not clear.
There are mismatches between the workflow and the user interface. These can cause the computer-system to get in the way of the workflow, which causes frustration and leads people to find ways to work around the system in order to carry out their tasks.
Computer-based data-entry needs to correctly validate the data. If the upper and lower boundaries of the data aren’t set, then it’s human nature that eventually some data will be entered incorrectly and it’ll be allowed into your database.
- When you’re optimizing your data-processes, you’ll learn a lot by being a passive observer while the staff carry out data-entry. Even if you think the user-interface on your database is crystal-clear, I guarantee you’ll have some surprises when you watch a few people using it!
- Add validation to the system. Check the upper and lower boundaries of the data and make sure that the computer validates the data, so that values outside sensible ranges can’t be stored.
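To make the validation tip a little more concrete, here is a minimal sketch in Python. The field names (`age`, `household_size`) and their limits are hypothetical and only there for illustration; your own fields and sensible ranges will depend on your forms and your context.

```python
# Hypothetical fields and limits, chosen only for illustration.
# Replace them with the fields and sensible ranges from your own forms.
RANGES = {
    "age": (0, 120),
    "household_size": (1, 30),
}


def validate_record(record):
    """Return a list of problems; an empty list means the record can be stored."""
    problems = []
    for field, (lower, upper) in RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field} is missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # Text such as "34" is rejected here rather than silently converted.
            problems.append(f"{field} must be a number")
        elif not lower <= value <= upper:
            problems.append(f"{field} must be between {lower} and {upper}")
    return problems
```

A data-entry screen could call `validate_record` before saving, and show the returned messages to the person typing instead of letting the record into the database.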
Raw data stored
At this stage, the raw data has been entered into long-term storage and it’s sitting there quietly waiting for you to use it. The data can be digital or paper-based.
Data security: The major issue with data at rest is data-security. How easy would it be for someone to get access to the data? Is it possible for someone to accidentally share the data?
Backups: Is the data backed up correctly? One of the major objections I have to paper-based storage of data is that it’s not easy to back it up. Things go wrong with paper-forms. Damp, fire, flooding… even just that the pencil marks fade over time… paper-based forms are volatile and need to be carefully stored. On the other hand, backups are one of the areas where computers win. It’s so easy and quick to do, it’s far more likely that the data will be backed up regularly on a computer system.
- Store your data securely. Find out who has access to the data, and put policies in place to ensure it’s not shared accidentally.
- Make sure that your data is being backed up regularly, and that those backups are stored somewhere safe, in a different building to the main system.
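As a rough illustration of how little work a computer backup can be, here is a small Python sketch that copies a data file into a backup folder and adds the date to the file name. The function name `backup_file` and any paths you pass to it are my own assumptions for this example. Note that it only makes the copy; getting that copy to a different building (an external drive, another office, a cloud service) is still up to you.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_file(source, backup_dir, today=None):
    """Copy `source` into `backup_dir` with the date added to the file name.

    `today` can be passed in for testing; by default the current date is used.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    # Create the backup folder if it doesn't exist yet.
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).isoformat()
    destination = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    # copy2 copies the contents and keeps the file's timestamps where possible.
    shutil.copy2(source, destination)
    return destination
```

For example, calling `backup_file("records.csv", "backups")` (both names hypothetical) on 15 January 2024 would create `backups/records-2024-01-15.csv`. Running it twice on the same day overwrites that day’s copy.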
Data retrieval and analysis
At this stage, you take the raw data that you’re storing and assemble it together to present it. You’ll do some calculations to analyse the data. You might list out all the items in your data. You might get a percentage of all the raw data, or select a subset of the data from a particular month or year.
While this task can be done manually, it is far more efficient to program a computer to do this. Computer-systems aren’t a perfect solution either, and problems can arise when you need to produce a new report. In the real-world, change is inevitable. If a donor has a new reporting-requirement, something new in the M&E reports, a new metric they need to track, your organisation will need to be able to program the computer to carry out the calculation. You need to avoid the situation where you have a system that works great for a few months, but that you can’t adapt. As the reporting requirements change, you need to be able to implement those changes. Otherwise you’ll find that whenever there’s a new report, you’ll have to revert to the slow, tedious task of doing manual calculations.
- Make sure that the system you use to do the calculations is something that your organisation can adapt as needed. Ideally, this would be done by someone within your organisation, possibly with support from someone outside with further expertise.
- Whenever a new report is requested, I find that there’s a knee-jerk reaction to just calculate it manually. This should be avoided. It’s much better for your organisation to have someone spend that time learning how to implement the new calculation as a formula in code or in Excel. Preferably, that person is a staff-member who can be tasked with maintaining the formulas in your data-system.
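To give a feel for what “implementing the calculation in code” can look like, here is a minimal Python sketch of the two calculations mentioned above: selecting the records from a particular month, and working out a percentage. The sample records and the field names (`teacher`, `date`, `completed`) are made up for this example and don’t come from any real system.

```python
from datetime import date

# Made-up sample records, for illustration only.
sessions = [
    {"teacher": "A", "date": date(2024, 3, 4), "completed": True},
    {"teacher": "B", "date": date(2024, 3, 18), "completed": False},
    {"teacher": "C", "date": date(2024, 4, 2), "completed": True},
    {"teacher": "D", "date": date(2024, 4, 9), "completed": True},
]


def in_month(records, year, month):
    """Select the subset of records that fall in the given month."""
    return [
        r for r in records
        if r["date"].year == year and r["date"].month == month
    ]


def percent_completed(records):
    """Percentage of records marked as completed (0.0 if there are none)."""
    if not records:
        return 0.0
    done = sum(1 for r in records if r["completed"])
    return 100 * done / len(records)
```

With this sample data, `percent_completed(sessions)` gives 75.0, and `percent_completed(in_month(sessions, 2024, 3))` gives 50.0. When a donor asks for a new metric, adding another small function like these is usually quicker over time than repeating a manual count every quarter.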
Report produced and used
The results of the analysis are compiled together and presented for use. This can be the information that’s needed internally for the day-to-day operations. It can also be for strategic planning at management level, and for quarterly M&E reports for donors and other external stakeholders.
The main problems I’ve seen here revolve around the number of reports that are being produced. As I said earlier on, the whole purpose of the data-system is to be able to produce information so that you and your staff can work better. Often, there are reports that were needed in the past, but they aren’t used anymore. When you find a report that isn’t used anymore, it’s best to remove it from the system. Maintaining those calculations will take time, so avoid having reports being generated that aren’t needed. If you’re worried that you might need the report in the future, you can keep collecting the data into your database, and just set up the calculations in the future when you need the report again.
- Ask yourself whether all the reports are needed. If you find a report that’s no longer needed, remove it.
- When a new report is needed, encourage your staff to implement it in code or in an Excel formula. Avoid the knee-jerk reaction of doing it manually. It will grow the skills of your staff and will make them more confident in the long-run.
We’ve looked at the data-pipeline, and I hope you’re starting to see how this idea can help you improve your own data-processes. Each stage can be manual or computerized, and the choice between one or the other should be based on a discussion of the pros and cons. Computers won’t always be the best solution for each stage. Sometimes it helps to automate a manual process using a computer. Sometimes it even makes sense to take an automated process and make it manual.
Manual processes tend to work well in the early stages of the pipeline. The early stages of the pipeline are more involved in the real-world where workflows change quickly and where data-collection systems need to be able to adapt to change. As the data moves down the pipeline, it becomes more and more useful for it to be digitized so that it can be easily backed up, analysed and presented as a report. Of course, the more you rely on digitized data, the more skills your organisation will need so that the changes in the real-world can be implemented in the software and the databases.
The key is to get the most out of the time that you and your staff have. If a process is taking a lot of someone’s time, and it’s clear to you that the process is helping you have an impact, then see if you can make it more efficient. Each stage in the pipeline has different things that can be optimized and improved. Look for the bottlenecks, and see if it’s an easy fix. Your enterprise will become more streamlined, and you’ll have more impact with the same amount of time and effort.
What kind of bottlenecks do you see in the data-pipeline at your organisation? What can you do to make them more efficient? Let me know in the comments below.