Flora Salim: Data hunter and gatherer
Dr Flora Salim, from RMIT University, covers a lot of ground in what is the lifeblood of all things new mobility: data. In this interview we cover human mobility, crowdsensing, data and visual analytics, innovative user interfaces, and more.
Could you tell us a little bit about what you’re doing at the moment Flora?
I’m a senior lecturer at RMIT University, in Computer Science and IT. I conduct research in mining human mobility behaviours and understanding contextual signals, with various sources of big data that can come from the Internet of Things (IoT), from sensors … in my work, I really view humans as sensors rather than actually relying on sensors deployed. In monitoring urban infrastructure, and in transport infrastructure, you can use the humans themselves as sensors.
And where do we get the data? We can get the data from location-based social media and social media check-ins, we can get it from smartphone sensors, and we can get it from public Wi-Fi data. So there’s a lot of data captured about all of us that people are not aware of. But we can start aggregating and profiling this, start to build very comprehensive pictures of how people move around the city, and this could be used for a lot of different applications.
I heard the D word, data, in relation to people, so I have to ask this question before we go further into this topic. On a scale of 0 to 100, how anonymous is all the data you generally work with?
In terms of anonymisation, it’s 100%!
Perfect, good answer. And prior to RMIT, what was your path?
Prior to RMIT I did my study, from undergrad to Honours degree to PhD, at Monash University. And between my undergrad and post-grad study I worked as a senior software engineer in a company called Mediaproxy. My responsibility back then was to develop a real-time, 24/7 monitoring server for video streams generated by broadcasting companies such as Channel 7, Channel 10, SBS, TV New Zealand, CBC (Canada). The server solution I developed back then was able to monitor and record all signals 24/7 and digitise them, and I also created some coding mechanisms relating to the quality of the signals.
And so here you are at RMIT doing a lot of research. You’re also teaching … what is it you’re teaching here?
I teach a large undergraduate course for first-year, first-semester students in IT, computer science, and software engineering, and it’s called User-Centred Design. The course introduces students to involving users in designing and developing software, or any solution, or even any interface. Basically, the whole course aims to train students to run user studies: running surveys and interviews, conducting their own user study prior to designing their own app, for example, and getting user feedback all the way from mocking up a prototype to actually testing a prototype.
We also introduce behavioural analytics, for example how you can actually run things like A/B testing, using data and metrics, when you want to understand how users behave when you try out a certain modification to your software or your solution.
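As an editorial illustration (not taken from the interview or the course), one common way to evaluate an A/B test is a two-proportion z-test on conversion counts. The sketch below assumes each variant is summarised by a number of users and a number of "conversions"; the function name and its parameters are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate.

    conv_a / n_a: conversions and users in variant A (the control).
    conv_b / n_b: conversions and users in variant B (the modification).
    Assumes both groups are large enough for the normal approximation
    and that the pooled rate is strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the observed difference between the two variants is unlikely to be chance alone; it does not say whether the difference is large enough to matter.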
But my workload is largely research. I lead and manage multiple research projects, and my group is called the Context Recognition, Urban Sensing and Intelligence (CRUISE) research group. That’s what we call ourselves, and we do like to cruise through problems! Another alias of ours is the IoT analytics group within RMIT’s new Centre for Information Discovery and Data Analytics (CIDDA).
I have seven PhD students and two post docs in CRUISE, and I’ve got another colleague involved in this group. We have multiple projects going on now, including a project funded by Microsoft Research and Cortana, another funded by Northrop Grumman Corporation, and another recently finished project with the City of Melbourne on predicting pedestrian foot traffic and parking utilisation. The project with Northrop Grumman is monitoring and generating situation awareness of airport runways, using a large cloud of GPS trajectories from ground vehicles and airplanes in US airports.
Another project we have going right now is the indoor tracking of workers in the activity-based environment, which is a more sophisticated version of hot desking. We’ve been tasked with trying to understand workers’ choices behind choosing their desk each day, or each part of the day. What factors influence their decision? Is it the project they are on, their mood, or thermal comfort?
A lot of the work that we do is to understand and map resource utilisation, and how we can optimise it. So we apply machine learning, AI, and optimisation techniques for this, across different domains, such as building monitoring, transport and mobility behaviour monitoring, and demand-based prediction and optimisation.
What was the project you did with the City of Melbourne?
Two projects. The first was a pedestrian traffic forecasting system that produces an hour-by-hour prediction of foot traffic volume at several locations in the Melbourne CBD, such as Southern Cross Station, up to two weeks ahead.
In the second project, we looked at how street parking is actually being used. We monitored the utilisation of street parking, and came up with a very accurate model to predict the availability of street parking 5, 10, 15, even 30 minutes ahead of time.
We could also predict parking violations ahead of time. So, in a very resource-constrained environment like street parking in the Melbourne CBD, we have models that help drivers, yes, but also parking officers, who can be directed to the areas with the highest parking violation rates.
I liked it right up until the end of what you just said, helping parking officers! So what was your a-ha moment that got you into your field(s) of research?
When I started my PhD, 13 years ago, I realised we were starting to see a proliferation of sensors. Back then I was only working with Nokia phones, and they only had accelerometers. But even back then, cars already had a hundred sensors. That’s not comparable to cars these days, which have at least 200 sensors, and more than 1,000 if they are autonomous.
But back then cars had approximately 100 sensors, and each of the sensors provided a lot of time series signals that can signify certain patterns if you tried to find them. So, a lot of transport studies back then were done very much in a qualitative manner. They were performed quite manually, with observation obtained through surveys, interviews… it took a really long time to get good data!
My a-ha moment was basically, why don’t we just use them, these sensors that are there already! We don’t have to deploy new sensors; instead we just need to tap into sensors that are already being used in cars and also by the human users, which are their mobile phones.
So that’s what I did. I basically monitored movement behaviours by tapping into smartphone sensors. And one of the first pieces of work I did was trying to understand how smoothly a person drives simply by looking at acceleration patterns. Another project back then at Monash University involved studying drunk driving and sober driving behaviours, using, again, sensors on people’s phones.
Yes, it was quite accurate.
You’ve mentioned a few times now that you obtain data from smartphone sensors, so I’m curious, how do you obtain this data?
We load an app on people’s smartphones, and it collects the data. It must be an app that the user accepts, as in my work we must obtain full ethics approval and user participation for their data. By installing the app, they’re fully aware of what’s happening, and they consent to providing the data.
They’re fully aware based on what you tell them or what they’re signing?
Ethics approval is very strict. We need to tell users what data we’re collecting in the briefing document and before they install the app. They know exactly what data we’re collecting.
Could you tell us some other projects you’ve been involved with?
In my PhD at Monash University (‘Context Aware Framework for Intersection Collision Avoidance’), what I did was basically ask the question, ‘Can we anticipate the intent of a human driver?’ The scenario was a mixed mode environment, one in which there are autonomous vehicles, semi-autonomous vehicles, and non-autonomous vehicles. So, in the case where you have this kind of mixed environment, and there are V2V and V2I communications in place, could we actually mine near-collision patterns? Could we recognise near-collision patterns? The reason I’m interested in the intersection is that intersection collisions make up the highest percentage across a lot of types of collisions. There are a lot of forward or rear-end collisions, but intersection collisions are more difficult to solve.
I learnt that in every single intersection, the patterns of collision are very different because of the geometry of the intersections, the signalling, and a number of other variables. But rather than trying to do it from a theoretical study, why not use sensor data? If we have geolocation data, and acceleration and deceleration patterns, we can learn from what happens.
And not only that, we can also personalise data for the driver. We could see the typical behaviour of a driver as they approach intersections. Do they slow down? Speed up? With data we could quickly anticipate if there’s going to be a collision, which car it will collide with, and generate a real-time warning.
Almost like a black spot warning.
We realise the amount of connectivity between vehicles and roads and buildings, it’s all so tightly connected these days, so it does all ‘count’.
Exactly. And that has, I think, all really emerged in the last three years. So for example, in a recent ARC Linkage program I’m in, we’re considering airports. A trip to the airport can be difficult to plan. There are uncertainties such as delayed flights, the length of the Customs queue, and the length of time baggage pickup takes, all of which compete with the need to be at another destination at a certain time. Should we catch a cab, the airport bus, or the train?
There are a lot of decisions to be made, you’re tired, stressed, and you just don’t want to deal with this kind of decision. It would be incredibly valuable to have some sort of assistance to advise on how best to travel to the next destination from the airport in the smartest, fastest, easiest way.
What we did was build a prototype system for three New York airports. We took into account weather data, Customs booth processing time, flight arrival time, and availability of taxis at pickup and drop off. Aggregating all of this data gives us an accurate system for forecasting how long a person will need to wait in the queue for taxis, for example. If you have just arrived, we can also predict how long it will take you to get to the exit door, and therefore we can advise if you should get a cab if you want to be on time, or if you can still make the train.
That’s for passengers. For taxi drivers, rather than arriving at the airport early and having to wait in the holding area for passengers to arrive, or driving around aimlessly, probably causing congestion somewhere, they could more smartly plan their arrival time.
So for everyone in the airport ecosystem we can create a tailored recommendation system, not only for passengers and taxi drivers, but also for airport managers, who could run a dynamic pricing system. For example, taxis could be priced cheaply to clear any backlogs, so passengers could catch a taxi at a reduced price, plus get to their next destination faster!
Cheap taxi … that’s not a phrase you hear very often. Now, let’s visit hypothetical land. Someone has offered you a large chunk of money and a reasonable time to implement a project, or fix a problem. What would you do?
What a nice scenario! I would like to create a citizen’s sensing and participatory platform, where we get people in a city to participate and offer their data — of course done in a manner in which privacy is ensured. And what’s in it for them is this. First, because I have that unlimited budget, we can provide gamification. The more they participate in the system, the more they allow themselves to be tracked, the more they can get brownie points or dollars in their account, which could be used for goods and services.
If people were part of this large-scale, participatory sensing system, the benefits too are on a large scale. It would help all transport users, plus help transport authorities. Rather than run a transport survey once or twice a year, which is a very expensive exercise, and they only get one data point per year, they could have continuous data in real-time and everyone’s happy. There’s no transport authority in Australia right now equipped to cope well with transport disruption.
So, if we have an unlimited budget, I will also set up a data centre, pulling in all this data: historical data from smart cards, weather data, and data from smartphones as I mentioned before, where we actually build a participatory sensing platform. And we can look at the trends, the touchpoints, and the profiles of commuters; some people travel purely based on habit.
The big upfront costs would be in creating the reward systems, but I would think this would pay for itself in due time as more businesses and services come on-board. Then there are the setup costs of a data centre, research and development, and so forth. And actually getting these systems to work in the first, let’s say, three years, testing them with multiple scenarios of disruption: real-world disruptions, not just simulated.
Part two of this hypothetical exercise. This time your budget is small(er), and it’s a shorter timescale. What would you propose for something to make a big impact quickly?
I would partner with a council, along with transport providers or a certain customer segment, who are interested in understanding mobility behaviour in order to influence certain behaviours.
I’m very much interested in exploring how certain behavioural interventions can be observed as well, even in the data, and what the optimal interventions would be for regional policy planning, strategic planning, dynamic pricing, demand management in a MaaS environment, and so on.
So that would be run and tested in one place, and then could be replicated elsewhere?
Exactly. So, it could be something in a smaller, contained region that could be monitored closely with different kinds of what-if scenarios, resulting in a solution that will be easily scalable.
Now back to you. You’ve done quite a few things, in quite a few places. Of all your work, what have you been most proud of to date?
One project I was involved with last year as part of a larger team was in the area of retail analytics. We used public Wi-Fi data to understand visitor behaviour at a shopping mall. By understanding and analysing the movement behaviours and how they are associated with different public Wi-Fi access points, and also understanding their online behaviours while connected, we could also recognise somebody’s purpose for the visit. Are they shopping? Did they come for lunch? We could even recognise if certain people are travelling in a group, let’s say with colleagues or with their family.
There are a lot of reasons why this was important. Mall operators could generate personalised recommendations to visitors, plus optimise the layout for visitors. They could even tailor certain offerings based on the demography of the visitors. I can see benefits from work like this in transport too. A lot of train stations these days have public Wi-Fi installed, but it is only used to provide a connectivity service for passengers. But what if the train or network operators are able to tap into this intelligence to know and understand the needs of their customers better, beyond just knowing the origin and destination of each journey?
Has the mall acted on findings and recommendations from that work?
They actually drew ideas from our work in developing their in-house big data platform.
I’d like to highlight one more project that I’m proud of, because it generated such good results. It monitored trajectory, using smartphones and a lot of different types of data. This work has generated several key papers and methods in representing trajectory data, and how you can actually use this trajectory data to be able to understand both low-level mobility activities and high-level activities.
I also loved this work as it basically funded four of my PhD students.
Before we go on, trajectory data, could you explain that term?
There are two types of trajectory data. People will be very familiar with one, which is GPS trajectory. In that we have x and y, the latitude and longitude coordinates, and you have time. So that’s a point x, y, t, and when we have them across a series, this becomes a trajectory. It basically represents somebody’s movement, let’s say on a transport network or in a city or within a building.
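As an editorial illustration (not code from the interview), a GPS trajectory can be sketched as a time-ordered list of (latitude, longitude, time) points. The `Point` type and helper names below are hypothetical; distances use the standard haversine formula with an assumed mean Earth radius of 6,371 km.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees
    t: float    # timestamp in seconds


def haversine_m(a, b):
    """Great-circle distance in metres between two points."""
    r = 6371000.0  # assumed mean Earth radius in metres
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def path_length_m(traj):
    """Total distance travelled along a time-ordered list of points."""
    return sum(haversine_m(p, q) for p, q in zip(traj, traj[1:]))


def average_speed_mps(traj):
    """Average speed in metres per second; 0.0 if no time has elapsed."""
    duration = traj[-1].t - traj[0].t
    return path_length_m(traj) / duration if duration > 0 else 0.0
```

Even this minimal (x, y, t) representation is enough to derive quantities such as distance travelled and speed, which are typical starting points for analysing movement.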
Now there’s another type of trajectory data and that’s what we call symbolic trajectory. It’s not necessarily a clear x and y. It could be associated with, for instance, Wi-Fi access points. Imagine you’re in a building, or an underground station, and GPS doesn’t work. But we can still know your movement, your trajectory, indoors even underground by your association with certain access points across time. So we actually can map this trajectory, and model it. We can build what we call a situation awareness framework. It’s a full picture of what’s happening, a full ‘vision’ of movement, but via trajectory behaviours, rather than a camera.
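As an editorial illustration (not code from the interview), a symbolic trajectory can be sketched as a time-ordered sequence of access-point associations instead of coordinates. The helper name and the access-point labels in the usage note are made up; the sketch collapses raw (access point, time) observations into visits with a start and end time.

```python
from itertools import groupby


def to_visits(observations):
    """Collapse (access_point, t) observations into (access_point, t_start, t_end) visits.

    Consecutive observations at the same access point are merged into one
    visit. Observations are sorted by time first, so input order is free.
    """
    ordered = sorted(observations, key=lambda o: o[1])
    visits = []
    for ap, group in groupby(ordered, key=lambda o: o[0]):
        times = [t for _, t in group]
        visits.append((ap, times[0], times[-1]))
    return visits
```

For example, observations at a hypothetical "AP-entrance" at 0 s and 30 s, then "AP-platform" at 60 s and 120 s, become two visits, and the sequence of visited access points is the symbolic trajectory.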
Is there an area of smart mobility in which you haven’t done any work yet, but you would like to?
Autonomous vehicles … they have a lot of implications. One of the things I’m very interested to know is about the interactions between autonomous vehicles and the users. So in an autonomous vehicle environment, the smartest object will be the autonomous vehicle itself, the self-driving car …
Hopefully, yes, in a perfect environment. But there are a lot of other users within the ecosystem, maybe someone who is still riding a bike, a pedestrian, a mum pushing a pram, an elderly person in a wheelchair. The vulnerable road users and also those with very special transport needs. The demand issues need to be addressed. How do you capture an existing demand, and how do you actually then map it into planning infrastructure for autonomous vehicles?
And the demand varies from one type of user to another. It could be someone requiring disability routes. Another could be the need to drop off two or three kids before going to work. Demands can be very specific, and personal as well.
At the same time, there must be awareness of what’s around you. Advanced driving assistance systems these days provide an extra awareness for the human driver. For autonomous vehicles the awareness will be within the vehicle itself.
Now, how can we enable human users to be superhuman users? For instance, pedestrians. How can we actually create a system that will enhance their situational awareness, just like an autonomous vehicle would? There was an article recently talking about how much harder it will be for blind people to navigate a city with electric cars because they don’t hear them.
I did read that, but also have read and written some stories lately about car companies working on specialised sounds for those people.
Exactly. So, will it be sounds built into the cars, or will it be a new type of hearing aid, to enable the blind to safely navigate this world of autonomous vehicles?
Another area that I think is very touchy is equality and fairness and non-biased Artificial Intelligence. And that’s a new area that we’re trying to work on as well. These days algorithms have a lot of bias in them, because it’s all fed by data collected from certain samples, and so the training data is geared towards the bias in that sample. So how should we employ algorithms in a society where governments, transport authorities, etc run the system? In this brave new world of smart mobility, and smart cities, governments will be reliant much more on algorithms. How best to handle the perils of bias and inequality in this? How do we compute fairness?
Education will be crucial in these matters. Our future children, I think, will need to have at least one programming language. We’re all good in certain languages like English and maybe in a second language. But a programming language needs to be in our profile, especially in a world in which algorithms will play such a large role in our lives.
They may already be! It seems to me like these concerns about fairness and equality could be a direct line into your earlier idea of gamifying a sensing and participatory platform.
Exactly. This is not new; there is an old field called game theory, for example, that has been looking at that aspect as well. But fairness itself hasn’t been studied well until more recently. Of course, even with gamification, again who pays for the system? If it’s the authority, or if it’s the city manager, or basically the business, the benefit will be geared more towards them than the end user. If there are users who are not part of the system, how can they gain benefit?
And the bigger our data gets the more we will have to work towards that. So last question for you Flora. What do you think is the most exciting thing ahead of us in the next three to five years in terms of transport technology?
So there are a couple of things happening today. One is mobility as a service; that’s where you need to have all the data already anyway to be able to connect all kinds of services. And you now see a lot of ride sharing services, and not just ride sharing with cars; for instance, there are also shared electric scooters fast coming our way. I’ve just come back from Israel with a Victorian government trade mission on transport technologies. We saw a lot of companies operating from Israel including Waze; actually, Waze started in Israel. And Bird scooters were just everywhere. So, if you think about mobility as a service with a single payment, that is basically, I would say, the low-hanging fruit these days. The user couldn’t care less how they get from A to B; they just want to get to their destination.
Although curiously, if you look at surveys, including one that was one of iMOVE’s projects, respondents don’t care much for scooters and share bikes and so forth.
Yes and that’s why I think something like a MaaS needs to be personalised as well, profiled to a user based on their travel behaviour. For different users, their objectives are different … it could be comfort for some, cheapest for others, or fastest, or a combination of these objectives. I think in navigating these complex mixed modes, even mixed ride sharing and all these newer options that come in, there needs to be a way to monitor user preferences and personalise for them.
Another issue is the division of payments. For example, if there is a single payment system, but one of the modes that was part of the proposed journey just failed, or customers didn’t arrive at the time promised by the MaaS service, who will take the penalty? So there are a lot of things to work out.
There’s so much to work out, it’s so complex, so new, that’s why it’s not here yet.
Yes exactly. I think MaaS will be a new, … there’s a lot of research already. We’ve worked on modelling integrated multi-modal services for journey planning. We’ve already done this, actually engaged with public transport authorities. This was five years ago now, so it was a while ago.
These kinds of models have existed in research, but actually deploying them in the real world will be another challenge, because the policies and regulations need to be in place, and the payment and aggregator services need to be in place.
Policies and regulations could be the biggest MaaS hurdles of all.