Ben Leong
Description: Ben Leong is an Associate Professor of Computer Science at the National University of Singapore, as well as the Director of the Centre for Computing for Social Good and Philanthropy and the AI Centre for Educational Technologies. His research looks into computer communication networks and protocols, which are the foundation of the world wide web. In this episode, we discuss the basics of communication protocols like IP and TCP and the understated role these protocols play in data security and efficiency. As we wrap up, Professor Leong shares his views on AI and how new technologies can be leveraged to create more personalized education for individual students, and encourages students to focus on building skills that will power the future economy.
Show Notes:
[0:00] Introduction and Background
[3:07] Basics of Computer Communication
[9:06] Introduction to IP and TCP Protocols
[18:15] Importance of HTTP vs HTTPS
[20:02] Introduction to HTTP 3 and QUIC
[22:11] Significance of Communication Speeds
[24:15] Evolution of Congestion Control Protocols
[27:50] Traffic Analysis and Website Popularity
[32:37] Exploring BBR Dominance and Equilibria
[39:36] Transition to QUIC Protocol
[48:04] Impact of ChatGPT on Job Market
[51:23] AI's Influence on Future Job Dynamics
[55:52] Language Proficiency Trends
[1:10:37] Automating Grading Processes
[1:15:38] Gamifying Education
[1:18:51] Future Career Advice
Unedited AI Generated Transcript:
Brent:
[0:00] Welcome, Professor Ben Leong. Thank you for coming on today.
Ben:
[0:03] Welcome, Brent. It's great to be here today.
Keller:
[0:06] We'd love to start off by hearing a little bit more about your story, how you got to NUS, and what got you interested in computer science.
Ben:
[0:11] How I got to NUS? Well, I guess I finished grad school, and then I joined NUS as a prof in 2006.
How did I get into computer science? Actually, it's kind of by accident, right? I mean, most of you, you know, went to high school, and then found it difficult to decide what to do for college, right?
So back then, I didn't really know either.
I just knew that in high school, I was pretty good in math and physics and, you know, the sciences.
Then I ended up going to MIT for college, and everyone does electrical engineering or computer science.
I just picked electrical engineering, without really knowing what I was doing, right? And along the way, it turns out that I took many computer science classes, and I liked those better.
So for grad school, I ended up doing computer science. And then the next thing I know, I'm here as a professor.
Brent:
[1:04] Is that transition between electrical engineering and comp sci pretty easy?
Ben:
[1:09] Actually, it's not that easy. Electrical engineering is mostly hardware.
Computer science is mostly software. It turns out that I work in an area which is at the intersection of the two disciplines: networking.
So for me, I guess it wasn't that hard or complicated. In general, I guess most people pick one or the other.
It just so turned out that I was at that intersection, so it was okay for me.
Brent:
[1:34] And then why did you choose to come to Singapore after studying?
Ben:
[1:37] Well, I mean, I was born here, and actually I sort of had to come back, because my undergraduate degree was paid for by the Singapore government. So I had to serve for a certain period of time. I did serve in the government for two years before I kind of quit and got myself transferred to the university. Yeah, so I'm back basically because I signed a contract to serve, and I came back to finish up the contract, but I've been here ever since. This is probably my 19th year at NUS. Yeah, it's been a long time.
Keller:
[2:10] How do computers communicate with each other?
Ben:
[2:13] So, well, computers generally communicate over some kind of medium, right? Some kind of wire. In general, what happens is that, for modern computers, they will take a message, some kind of information, and break it down into little packets, like little bits of electrical signals, right?
And these are kind of transferred across some kind of a medium.
Generally, you know, the internet is wired, but sometimes, like today, you know, when we're here on laptops, it's through Wi-Fi, which is wireless signals.
So generally, that's how computers communicate, right?
They kind of break the message down into little packets, and then those packets are sent on some medium.
Brent:
[2:58] And for like a human analog, what language do they speak to each other?
How do they make sure they can understand their communication?
Ben:
[3:08] That's a little bit complicated. I mean, the underlying everything is electrical signals, but mostly we are dealing with what's called digital communications today.
So they are all zeros and ones, right? And there are many ways to interpret the zeros and ones. And those are called protocols.
Brent:
[3:26] What are some of the most common protocols?
Ben:
[3:30] It's a little bit involved, but the most common protocols for the internet today are these two things called IP, the Internet Protocol, and TCP, the Transmission Control Protocol.
So IP is a lower layer, right? And TCP is the higher layer protocol.
So the protocols are kind of stacked in layers, right?
The lower layers do the slightly lower-level stuff, and the higher layers do slightly more complicated things.
So IP helps us figure out where to send the packets, right?
TCP stands for Transmission Control Protocol. So this protocol regulates the rate at which packets are being sent.
So the reason why we need to do that is because it's amazing that the internet works at all, given that there is no central authority that governs who can send what, when.
So all the different endpoints, the machines you use, laptops, phones, they independently decide when to send packets.
And all these users of the internet share the same underlying transmission medium.
So they have to figure out some way to share access to the medium without choking each other or causing interference.
And that's where TCP comes in.
Brent:
[4:52] Yeah, so if you were to think about it like a shipping company, the IP is like the address you're shipping to, and then the TCP is the logistics person actually moving it around.
Ben:
[5:05] Okay, if you think of ships moving along some kind of shipping channel, right, TCP kind of controls which ship can use the channel at one time, without colliding and without causing problems for other ships.
Brent:
[5:19] Yeah and.
Keller:
[5:21] Then within these networks, what are some of the common areas that you're looking at for improvement? Is it focused on speed?
Is it integrity of the data? What are some of those areas?
Ben:
[5:32] That's a good question. There are many aspects. So the first aspect is really about sharing, right? So the point is that there's no central authority.
And you have many, okay, you know, on internet today, there are like billions of different users trying to use the same channel at the same time.
So the question is, how do we share this channel, right, without causing interference to each other?
So the first basic thing is to ensure that we can share. And also, you want the flows to use the channels efficiently.
Ben:
[6:02] So if it's a big channel, you want to send more data. If it's a small channel, you want to send less data.
And a priori, you have no idea what the channel is going to be, because you could be talking to a machine anywhere in the world.
So TCP sort of figures out how big the channel is, and then it tries to adapt to the right sending rate, to ensure that we use the channel efficiently, right?
At the same time, if there are other flows that use the same channel, then you need to kind of back off and give the other flow space to also use some part of the channel, so that there's some kind of sharing.
So this, these are two of the things that TCP does.
But of course, There are other issues like security, reliability, so TCP also basically tries to, well, it actually ensures reliability.
So the point about the internet communications today is that most things are best effort. So yes, you're shipping these packets down in the channel.
Sometimes bad things can happen. Sometimes you accidentally send too many packets, other flows are interfering with you and then packets are lost.
Then you need to retransmit those packets. So TCP also takes care of reliability in the transmissions.
Brent:
[7:17] Okay.
Keller:
[7:18] And how is that balanced between the different flows?
Because there is no central authority. How do they know, you know, when to back off, when to give a little bit more?
Ben:
[7:28] Right, I mean, so think about this logically, right? What you do is you send at some rate, and you watch whether everything is good, okay?
So the way it works is that you send a packet down, right? The end host, the receiver, would then reply with what's called an ACK packet, an acknowledgment.
So the basic idea is to send packets and then receive the ACKs.
And this whole interaction is kind of regulated. So the idea, I guess, is you send at some rate, right? And if it works well, then you can slowly increase the rate. Now, at some point, you will send too fast, right? And how do you know this? If you overdo it, what happens is that some buffer will overflow somewhere, you lose a packet, then you know, okay, you've screwed up, and you backtrack and send slower. Sometimes you can actually detect that you have sent too fast because the delay between your sent packet and the returning packet starts to increase, right?
Because this means that the packets are building up at some queue somewhere.
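That probe-and-back-off loop can be sketched as a toy simulation. Everything below is invented for illustration (a fixed-capacity bottleneck, made-up numbers); real TCP is far more subtle:

```python
# Toy model of the probing loop: a bottleneck link drains `capacity`
# packets per tick; the sender raises its rate each tick, and halves it
# whenever the queue (and hence the delay) starts to build up.

def probe_rates(capacity=10, ticks=20):
    rate, queue, history = 1, 0, []
    for _ in range(ticks):
        queue = max(0, queue + rate - capacity)  # packets left waiting
        if queue > 0:                  # delay rising: we overshot
            rate = max(1, rate // 2)   # back off multiplicatively
        else:
            rate += 1                  # probe upward additively
        history.append(rate)
    return history

print(probe_rates())  # sawtooth: ramps up to 11, drops to 5, ramps again
```

The sawtooth shape in the output is the signature of this kind of probing: ramp until the delay signal fires, back off, repeat.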
Brent:
[8:29] Okay.
Ben:
[8:29] Yeah, so basically the key idea is that you try to use what's called congestion signals, right, to detect whether your send rate is too high or not.
And then based on that, you can cut back, and essentially you kind of iteratively try to arrive at the right rate.
So the most common congestion signal in the past is this thing called packet drops, right?
Brent:
[8:49] Okay.
Ben:
[8:49] But in recent years, the reason why congestion control has become interesting again is that there are new classes of congestion control protocols that have been developed that try to use the increase in the delay as a congestion
Ben:
[9:05] signal instead of waiting for the drop to happen.
So by the time the drop happens, it's kind of nasty, because that means you need to retransmit the dropped packet.
Brent:
[9:12] Yeah. And just to pull back out to give people a broad picture, this is all happening in the background when we pull up a website or send a text message.
These are the fundamentals, the mechanisms for how we operate on the internet.
Ben:
[9:29] Correct. And this has been the way since the 70s. So I think it's hard for people to appreciate the magic of this whole business, right?
I mean, I'm not sure if you guys are old enough, but there was a time when there were these dial-up modems that squealed over the phone when they transmitted data. Now it's 100-gigabit networks, so essentially the network speeds have increased tens of thousands of times. Yeah. And the underlying machinery just kind of worked over time without anything breaking down.
So I guess a lot of this is taken for granted. But there's a lot of work that's being done in the background.
And people have been quite clever in inventing these algorithms to keep the internet running today.
Brent:
[10:07] Yeah. And then how, since the internet is a decentralized network.
Ben:
[10:11] Yes.
Brent:
[10:13] How do you describe what the internet is? Because I think everyone's kind of so used to using it.
Ben:
[10:19] I agree.
Brent:
[10:20] They don't really understand, like, what is the actual internet? Is it a thing?
Is it just the, like, how would you describe what it actually is?
Ben:
[10:30] Okay, so the way the internet works is this. You as a user, right, you normally subscribe to the internet service, okay, through this entity called ISP, Internet Service Provider, right?
So the way it works is that these internet service providers would then kind of, like, connect to each other, right? So you can imagine it as a big web of all these ISPs that kind of connect to each other, right?
And so there is kind of a central lookup system called the Domain Name System, DNS, right?
That kind of keeps track of where everybody is, right?
So when you want to connect to some foreign website, you figure out where it is, and then you send a packet, and your service provider then decides which of its neighbors to forward the packet to.
So there are these protocols that kind of allow the ISP to share some information as to who is where.
And so for every packet that your ISP receives, it will know which neighbor to forward the packet to.
So that's the high-level kind of routing on the internet.
Then the actual rate at which it's sent is decided by the sender, which is yourself.
And the TCP is the protocol that decides on how fast to send packets. Okay.
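The DNS lookup step just described is a one-liner in most languages. A minimal sketch with Python's standard library; `example.com` is a placeholder for any site, and the function returns an empty list if the resolver is unreachable:

```python
# Ask the system's DNS resolver where a hostname lives.

import socket

def lookup(host):
    """Return the IP addresses the resolver gives for `host`."""
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
        return [sockaddr[0] for _, _, _, _, sockaddr in infos]
    except socket.gaierror:
        return []  # resolver unavailable or name unknown

print(lookup("example.com"))
```

Once the sender has the IP address, routing between ISPs takes over, exactly as described above.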
Brent:
[11:46] So then with the internet service providers, and this is probably a crude oversimplification, but is it just basically a network of their computers all over, and if I'm on my phone, it'd basically be sending a signal to them, and then they'll send it through their computers to where it needs to go?
Ben:
[12:06] Yeah, so each internet service provider would be a network of switches, right?
From the outside, the way you look at it is as kind of a blob, right? And there are several exit points and entry points.
These are the peering points with the neighboring internet service providers.
Then essentially whenever you send a packet to the ISP, it will be routed in this internal network of switches to one of these exits, right? And it goes out.
Similarly, you know, a packet from outside may come in through one of these points.
These are called ingress, egress points. Ingress is when you come in, egress is when you go out, right?
So after these packets come in, then the ISP will figure out where you are within its internal network and route the packet to you.
Brent:
[12:47] Okay. And it knows where you are because of like the different, is that the IP or the domain name?
Ben:
[12:53] The IP, yeah. So there is some registration service that kind of tells, it's a little bit complicated, but there is someone tracking where you are. Yeah.
There is a kind of exchange of messages across all the ISPs to tell them where the other ISPs are, so that they all know where to forward packets.
Brent:
[13:14] Yeah. And all the different ISPs are able to communicate with each other because they agree on certain languages to communicate in?
Ben:
[13:21] Correct, correct. So the way the internet works is that there's some commonly agreed languages or they call them protocols, right? So TCP is one protocol.
The protocol that governs this routing between ISPs is called BGP, the Border Gateway Protocol.
I mean, these names don't really matter too much, but the high-level idea is that we have invented certain protocols that allow these ISPs to communicate and to exchange information efficiently.
Keller:
[13:48] Are cellular communication networks unique from the internet communication, or do they function in the same way?
Ben:
[13:54] They function in the same way. The main difference would be the last mile.
So the way it works is that with cellular networks, the users connect to the ISPs through the wireless medium, right?
So for cellular, it'd be 4G, you know, 5G, right?
And for us now on a laptop, right, we're communicating to the school's network via Wi-Fi, but from the school's network onwards, it's really just the same thing.
So I guess the wireless cellular communications, what really matters is the last mile, the last step. The rest is more or less the same.
Brent:
[14:32] This might be out of scope, but how does wireless work?
I feel like most people don't understand. How does my computer just talk through the air to the router?
Ben:
[14:43] Well, the way it works is this. The communication at the endpoints is really digital, right?
So everything is zeros and ones, bits, right?
So what the computer will do is encode these ones and zeros into a wireless waveform, right?
So on your computer with Wi-Fi, or on your mobile phone through 4G, there's some kind of D-to-A, a digital-to-analog converter, that converts your bits into electrical signals, okay, which are then transmitted through the air on radio frequency waves, right?
There's a receiver somewhere that detects the signals, and then there's an A-to-D, an analog-to-digital converter, that converts those signals back into digital form.
Okay. And that's more or less how it works. That's electrical engineering, okay?
Whereas figuring out how to ship bits would be CS. Figuring out how to convert your bits into electrical signals and vice versa would be signal processing, which is electrical engineering; a lot of it is done in hardware.
Brent:
[15:55] Yeah. So do you think your background in electrical engineering made you a better computer scientist?
Ben:
[16:00] I think for me, it helps because I do networking, right? We operate at the border of both EE and CS.
So clearly it's helpful, but there are also parts of CS that don't require you to know anything about, you know, signals and digital signal processing.
So I think it varies, depends on what you care about, right?
There are some computer scientists who are mostly actually mathematicians, okay?
Who deal with, for example, cryptography, right?
That's mostly number theory. And you can have no idea what's going on in EE and still do that well. So it all depends on what you do for a living.
Keller:
[16:44] Is there cryptography involved with TCP and other communication networks?
Or is that already involved on that first base layer of the stack?
Ben:
[16:54] Good question. Actually, I'm not sure if you know about this, but the main problem, or the main kind of defect, of the internet is that when it was first designed, it was actually a kind of experimental research network.
So people trusted everybody.
So security is actually missing from the original design, and even from the current internet. But of course security is important, because you've got, you know, people stealing stuff and things like that, right?
So security has been kind of slapped onto the internet, and that today is still a problem. It wasn't designed into the original internet.
So the current slapped-on solutions are not perfect.
TCP itself does not have security, but, you know, people have built secure protocols on top of TCP to try to keep things safe.
But ultimately, because security was not originally designed into the internet, that's why you have a lot of issues with DDoS attacks and things like that.
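As an illustration of that layering, here is a minimal sketch in Python of security added on top of TCP: an ordinary TCP connection is opened first, then the `ssl` module wraps it in TLS. `example.com` is a placeholder, and the function needs network access to return anything:

```python
# Security layered on top of TCP: open a plain TCP connection, then
# wrap it in TLS (the encryption layer that was "slapped on" later).

import socket
import ssl

def tls_version(host):
    """Return the negotiated TLS version for `host`, or None if unreachable."""
    ctx = ssl.create_default_context()  # certificate checking on by default
    try:
        with socket.create_connection((host, 443), timeout=5) as tcp_sock:
            # The TLS handshake happens here, on top of the TCP connection.
            with ctx.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
                return tls_sock.version()
    except OSError:
        return None  # no network, or host unreachable

print(tls_version("example.com"))  # e.g. "TLSv1.3", or None offline
```

HTTPS is exactly this pattern: HTTP spoken over the wrapped, encrypted socket instead of the raw TCP one.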
Brent:
[17:49] Yeah. And for the security side of things, is that where we see the HTTP versus HTTPS?
Is the S the security part of it?
Ben:
[17:57] Correct. The S part basically is the kind of encryption part of the protocol.
It's a kind of enhanced protocol.
So HTTP was the original protocol that allowed people to exchange information.
HTTPS actually slaps on this kind of encryption layer. Yeah.
Brent:
[18:12] And that's like, you can see that at the top of like basically every webpage.
Ben:
[18:15] But now, I mean, you shouldn't be doing HTTP; HTTPS is the default, right?
Brent:
[18:20] Yeah, yeah. Cause I don't think I've ever seen HTTP.
Ben:
[18:24] No, it's still there. I mean, so I think most websites still support that. It's a fallback.
You can try, and most of the websites will support that, I suspect.
But now by default, it's HTTPS.
Keller:
[18:35] Could you explain the distinction between HTTP/2 and HTTP/3?
Because some of the papers we saw mentioned that a given protocol was using HTTP/2, and it wasn't really clear to me what those differences were.
Ben:
[18:48] To be honest, I haven't been following the protocol standards per se, but basically, folks look at the current protocols and try to figure out how to improve them, right?
So as you know, QUIC is kind of a new default.
It will be the default transport protocol for HTTP 3.
And what it does is that it tries to improve on the protocol to reduce the handshake overhead.
So the point now is that when you connect to a website, typically you need to kind of say hello and there's an exchange of messages to basically set up the connection.
And the trouble with the existing TCP is that it takes quite a few packets to set up the connection.
And the trouble with HTTP, well, HTTP is used for web pages.
There are many web pages whereby there's a small amount of information, right?
So they don't want to spend too much time setting up the connection.
And so you have QUIC that can, you know, basically reduce the number of steps in the setup process.
I think QUIC also has some security features baked in. You can read up on the stuff that's going on there.
So it's sort of upgrading the underlying protocol. And the way it works is that the websites will support both the older version as well as the new one
Ben:
[20:01] for a period of time, right?
When the later protocol is sufficiently popular and most of the modern browsers support and use it by default, then they'll kind of deprecate the old protocol. But HTTP/3 is the new thing.
Ben:
[20:17] Okay, we actually did a recent kind of measurement study. About 10% of the top 20,000 Alexa sites currently support QUIC.
So it's not widely deployed yet. So I think it's going to be some time before we switch over to HTTP/3.
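A rough, simplified version of such a probe: many sites that support HTTP/3 advertise it in the `Alt-Svc` header of an ordinary HTTPS response. This is only one signal, not the study's actual methodology, and the host below is just an example:

```python
# Check whether a site advertises HTTP/3 (QUIC) via the Alt-Svc header
# of a normal HTTPS response.

import urllib.request

def advertises_h3(url):
    """Fetch the page and look for an 'h3' entry in the Alt-Svc header."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return "h3" in resp.headers.get("Alt-Svc", "")
    except OSError:
        return False  # unreachable: treat as "not advertised"

print(advertises_h3("https://www.cloudflare.com"))
```

Browsers use the same hint: they fetch the first page over TCP, see the `h3` advertisement, and switch subsequent requests to QUIC.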
Brent:
[20:38] Yeah. And when we talk about it being faster and those types of benefits, is that tangible to the end user?
Like, why do we pursue these increases in speed?
Ben:
[20:53] Um, okay, so it all depends on how perceptive you are, right? So today, actually, you go to a site, and you can hardly tell. I mean, most sites load very fast, yeah. Latencies have improved a lot over the last 10, 20 years. So whether it's perceptible or not, I think, depends on the user. But clearly, improving the efficiency, having faster handshakes, can only be better, right? Because you are able to use the channel, the transmission medium, more efficiently.
So it's hard to say whether it's perceptible, but I guess if you have huge files or something, you may have some impact.
But overall, I think the internet today is very good. I mean, you guys are not complaining, right, about websites.
Brent:
[21:36] Depends where we are.
Ben:
So basically, I think for the websites that are well served by CDNs, I suspect you can't tell the difference.
But for the websites that are a little bit less accessible, or where somehow the rates are slower, then some amount of improvement in efficiency might be perceptible. But overall, I don't think it's going to be a massive improvement. I suspect you guys can't really tell. So if some site switched over, I'm not sure it's
Ben:
[22:07] actually very obvious to the users. It's not going to be a 10x improvement, right?
Brent:
[22:11] Yeah, yeah. So in the world of networks and computers communicating with each other,
how much attention is being put on increasing communication speeds in this way?
Or is innovation in the field being driven more in other areas?
Ben:
[22:30] I think it's hard to say exactly what is more important. So what we're talking about is kind of a specialized domain.
So most companies don't have anybody who knows this stuff.
They use whatever default values, default protocols and things they can find.
But the big companies like Facebook and Google, they actually have dedicated teams that look into this particular aspect, especially companies like YouTube that actually care about the speeds and the performance of their apps, because the speed of transmission may impact a few things: the user experience,
and, if you don't do it correctly, you may choke their internal networks, you know. So there are these companies with massive amounts of data transfers that care.
And even those companies, I suspect only a small, you know, kind of part of the company cares. There's probably a specialized team that cares about these things.
So in that sense, it's a little bit esoteric. I think the vast majority just use whatever's available, you know, but what's true is this.
Recently we did a measurement study, right?
We've seen that there's actually some companies that deploy their own, you know, kind of like congestion control protocols.
Keller:
Are those the protocols for congestion control? And do they have to be selected directly by the company, or are they chosen autonomously?
Because one of the papers we saw mentioned Reno, or BBR, Cubic.
Is the decision to use a certain variant based on the computer system, or is it the team that decides, no, we're going to run this?
Ben:
[24:10] Okay, so good point, right? So these protocols like Reno, BBR, Cubic, right?
These are actually available in the Linux kernel. So what happens is that when they set up a machine, most people probably don't care. But if they run a server, then they can configure the server to use one of these underlying protocols. But I think more interesting than that is that some companies have invented their own protocols. So one of the things that my students and I have done over the last five years is really studying how the internet has been evolving, in terms of which protocols are used by which websites, and what's the distribution of the different variants of TCP being used.
Yeah, and that's something that we started about five years ago.
And I think the interesting finding there was about BBR. BBR was sort of released publicly in 2016.
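For the "configure the server to use one of these underlying protocols" step: on Linux, the congestion control variant can be chosen per socket. A sketch; `TCP_CONGESTION` is Linux-only, and only variants loaded in the kernel (Cubic by default, often BBR) will be accepted:

```python
# Selecting a kernel congestion control variant per socket (Linux only).

import socket
import sys

if sys.platform == "linux":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to use Cubic for this socket's congestion control.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")
    name = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    print(name.rstrip(b"\x00"))  # the variant actually in effect
    s.close()
```

The system-wide default can also be set with the `net.ipv4.tcp_congestion_control` sysctl, which is how most server operators would flip every connection to, say, BBR at once.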
Brent:
[25:01] By Google. By Google, right.
Ben:
[25:03] And by 2019, probably like 30% of the traffic on the web, on the internet was actually, you know, using BBR in some form.
I think what's interesting is that, this is actually very surprising because the previous dominant variant called Qubic, right, took probably 10, 15 years to become dominant.
But the PPR, you know, we took up like 20, 30% of the whole internet within like three years. That was really surprising.
But that's it. It's because, you know, they were driven by the big companies that control a lot of traffic anyway.
So I think what's going, moving forward, what is true is that, you know, the internet can change rapidly because a few companies, big companies, maybe Google, TikTok, Apple, whatever, They control actually the most of the traffic that's going on.
And so in 2019, we saw that PPR was actually quite, I wouldn't say dominant, not dominant, but it's actually the second most popular protocol after Qubic.
Uh interestingly we just did uh we should be right now in the midst of a measurement another kind of measurements follow-up study uh we've seen what we've seen is that cubic hasn't really increased much in proportion over the last five years but google has now deployed it's already deployed i think since august of this year bbr v3 so now it's no longer what we saw five years ago vr v1 and even then google did not deploy the one the version that that was found in the Linux kernel.
Google deployed their own version, which they called G1.1.
Brent:
[26:27] Okay.
Ben:
[26:28] Yeah.
Brent:
[26:28] And these are all different variants of congestion controls?
Ben:
[26:32] Yeah, the different variants of congestion controls. But, okay, right now, there are two major classes of this protocol.
The old kind of way of doing it was called AIMD or MIMD.
AIMD stands for Additive Increase, Multiplicative Decrease.
MIMD stands for Multiplicative Increase, Multiplicative Decrease.
So these protocols, what they do is they control what's called congestion window.
So the idea is that a sender would send out a bunch of packets, okay, and it will keep in memory to remember which packets are sent out, right?
And the congestion window kind of like sets a limit as to how many of these unacknowledged packets you can send out at one time, right?
And with this, you can control the rate, right? These are the so-called AIMD-based protocols.
BBR and the more recent, what are called rate-based, variants don't do this.
They actually try to send at some rate that they decide is the right rate to send at.
They do this by probing the network and trying to figure out what's the bottleneck so far.
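The congestion-window mechanics just described can be sketched in a few lines. This is a toy AIMD model with an invented loss pattern, not a real TCP implementation:

```python
# Toy AIMD congestion window: grow by one segment per round trip,
# halve when a round trip sees a packet loss. `loss_rounds` is an
# invented loss pattern, purely to show the sawtooth shape.

def aimd(rounds, loss_rounds):
    cwnd, trace = 1, []                # cwnd caps unacknowledged packets
    for r in range(rounds):
        if r in loss_rounds:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase
        trace.append(cwnd)
    return trace

print(aimd(10, {5, 8}))  # → [2, 3, 4, 5, 6, 3, 4, 5, 2, 3]
```

Rate-based variants like BBR skip the window dance entirely: they estimate the bottleneck bandwidth by probing and pace packets out at that rate instead.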
Brent:
[27:31] And then these congestion controls, they're set up by the sender?
Ben:
[27:35] Yes, the sender decides. Because fundamentally, a congestion control protocol is really kind of an algorithm that determines the send rate, right? How fast you send, when you send, right?
And so the sender actually determines the rate.
Brent:
[27:50] Okay.
Keller:
[27:51] Then you mentioned traffic of different websites.
Could you give us some numbers, if you have any, on how much of the internet's traffic is really being used by some of those top companies, TikTok, Facebook, Netflix? Because I think the paper was saying, okay, BBR is at around 30%, but if one more company, I think it was Netflix or Facebook, had been added to that, they probably would have added 20% or so.
Ben:
[28:13] Okay, to be honest, it's hard. Okay, we're not the ones who actually measured the proportion of the traffic. I think in our paper we cite another kind of survey, so we don't know for sure. But what we do is use this Alexa ranking table.
Alexa tells us what is the kind of popularity of the various websites.
The Alexa 20,000 that we worked on are the 20,000 most popular websites.
So what we do is to try to probe and to kind of deduce what variants of the TCP protocols are used by these websites.
And then, by combining this with the information from other surveys in terms of how much traffic is used by these websites, especially the popular ones, we can deduce roughly the distribution of the different variants on the internet.
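That final deduction step is essentially a weighted sum. A minimal sketch, where every site name and traffic share below is made up for illustration:

```python
# Combine each site's detected TCP variant with its (survey-derived)
# traffic share to estimate the variant mix. All values are invented.

site_variant = {"siteA": "bbr", "siteB": "cubic", "siteC": "cubic"}
traffic_share = {"siteA": 0.5, "siteB": 0.3, "siteC": 0.2}

mix = {}
for site, variant in site_variant.items():
    mix[variant] = mix.get(variant, 0.0) + traffic_share[site]

print(mix)  # → {'bbr': 0.5, 'cubic': 0.5}
```

With real inputs, `site_variant` comes from active probing of the Alexa top sites and `traffic_share` from third-party traffic surveys, which is why the result is only a rough estimate.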
Keller:
[29:05] Then another thing the paper asked was like, okay, could it be possible to have BBR or a given variant hold 100% of the space?
Ben:
[29:14] Correct.
Keller:
[29:15] Could you explain your thoughts on that? Like whether that would even be possible?
Ben:
[29:18] Okay, so that's interesting, right? So five years ago, as I mentioned earlier, we found that BBR had taken up about 30% of the traffic already.
And given that it started from 2016, right?
So in three years, you know, you reach 30%, then it seems kind of plausible that, you know, BBR might dominate, right, in the near future.
But then, subsequently, we did another study, to see whether it's likely for BBR to dominate.
The interesting thing we found in that study was that, okay, if all the flows were all cubic, right, and you added one BBR, then BBR did very well but as you added the number of BBR flows what happens is that the gain that you get as a BBR flow will start to drop okay and the really interesting, thing we found was that if you keep increasing the BBR flows at some point right the new flows would actually get less bandwidth than the existing cubic flows okay so on that basis we hypothesized that there will be some kind of what's called Nash Equilibria which means that as the BBR gets more popular, okay the early adopters actually had some benefit, means they get better performance than the existing users.
But at some point, the later adopters will actually have...
Ben:
[30:34] ...worse performance. So obviously you will not want to switch over to BBR and get lower throughput, right? So you will not switch. And so we kind of hypothesized back in 2021, 2022, that it's going to get choked. And interestingly, our latest study shows that there has not been much of an increase, I think only like a three to five percent increase in the adoption of BBR since 2021.
So, I think we're right that the world will not be dominated by BBR anytime soon.
Brent:
[31:08] Yeah, so like, to an extent, it hit its saturation point.
Ben:
[31:12] It seems like it. And in fact, we found that Cloudflare actually switched back to Cubic.
So, it's really interesting. We found that, okay, Cloudflare is one of the major CDNs.
CDN stands for Content Distribution Networks, right? So in, I think, 2019, when we first did the study, we found that Cloudflare was using BBR already. But in the latest study, the most recent one, which is not published yet, we found that Cloudflare switched back to Cubic. So I think our hypothesis that BBR will not end up dominating seems to be correct, based on what we've seen.
Brent:
[31:43] In more simplistic terms, is that because, I think one of the words used was "aggressive," so it pushes out other people? Like if too many people start using BBR, it'll be competing more than actually sending?
Like competing for bandwidth? Yeah.
Ben:
[32:00] So BBR, there were complaints about BBR because it was competing overly aggressively with the other protocols.
But that has been addressed over time and Google has been doing that.
So there was BBRv2 and now there's BBRv3 that's just been deployed since August.
Brent:
[32:22] And I think we got a pretty good understanding of how the internet works.
Is there anything you think we're leaving out, for the general population to understand a little bit more of the mechanisms behind the internet?
Ben:
[32:38] I actually don't know what it is that you want to know. I mean, for me it's kind of something that we do every day, so it's kind of second nature. Are there any questions that you have for me in terms of what you'd like to know about the internet?
Keller:
[32:50] I had one more question back on BBR, which was, you mentioned that the performance went down for the later adopters.
Ben:
[32:58] Yeah, so we did this experiment whereby you start with 20 flows, all Cubic. Obviously in that case they all have equal bandwidth, more or less, by symmetry, because nobody is better than the rest. When you replace one of the Cubic flows with BBR, the BBR flow gets a significantly larger proportion, like four or five times as much bandwidth. So clearly, if you are one of the original 20 Cubic flows, you want to switch, right? Then if you have one BBR and 19 Cubic, and a second Cubic flow becomes BBR, then the first guy gets a little less bandwidth, but the new guy still gets quite a lot. So the idea is that at the beginning, when there are very few BBR flows, you do get more than your fair share compared to the Cubic flows. But as the proportion of flows that switch from Cubic to BBR grows, this relative advantage will start to drop. It's kind of a gradual curve, but at some point BBR and Cubic will have the same...
Ben:
[34:04] ...bandwidth share, interestingly. And if you go beyond that point, then the new BBR flows will actually have less bandwidth than the existing Cubic flows. So at that point, clearly, no other Cubic flow will switch, because they'd be worse off. At the same time, the BBR flows won't switch to Cubic either, because when they do that, they'd have less bandwidth. So this is when you reach some kind of what's called a Nash equilibrium point.
So basically the original reason for why many of the big tech companies switched over was because it was reported that they had better throughput, which we found in experiments.
But that's because the vast majority of the flows of the internet still use cubic.
So when they switched over, they kind of win a little bit. I suspect that gain has basically withered away over time.
And also, being too aggressive is an issue, because if you compete too aggressively with your own peers, then people start dropping packets, and that kind of degrades performance. Yeah.
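The equilibrium argument can be illustrated with a deliberately toy model: give each BBR flow an aggressiveness advantage over Cubic that shrinks as BBR's share of the flows grows. None of the constants below come from the study; they are made-up numbers chosen only to reproduce the qualitative shape he describes (early adopters win several times their fair share, late adopters end up below it):

```python
def per_flow_bandwidth(n_bbr, n_total=20, capacity=1.0):
    """Toy model of sharing a bottleneck between BBR and Cubic flows.

    Each Cubic flow has weight 1; each BBR flow has a weight that starts
    around 5 when BBR is rare and decays below 1 as BBR flows crowd each
    other out. Purely illustrative, not the measured dynamics.
    """
    n_cubic = n_total - n_bbr
    frac = n_bbr / n_total
    bbr_weight = 0.8 + 4.2 * (1 - frac) ** 2  # >1 when rare, <1 when common
    total_weight = n_cubic * 1.0 + n_bbr * bbr_weight
    cubic_bw = capacity / total_weight
    bbr_bw = bbr_weight * capacity / total_weight
    return bbr_bw, cubic_bw

# One BBR flow among 19 Cubic: the BBR flow gets several times more.
early_bbr, early_cubic = per_flow_bandwidth(n_bbr=1)
# Nearly everyone on BBR: a BBR flow now gets *less* than a Cubic flow.
late_bbr, late_cubic = per_flow_bandwidth(n_bbr=18)
```

Once the two curves cross, neither camp gains by switching, which is exactly the Nash equilibrium point described above.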
Keller:
[35:07] And then, can we transition a little bit from TCP to QUIC?
Ben:
[35:12] QUIC yeah and.
Keller:
Then kind of explain how it's a little bit different, while functionally performing a very similar task, and how it differs implementation-wise.
Ben:
[35:22] Okay, QUIC. QUIC is kind of a higher-level protocol. I mean, it does a lot more things than just transmission control, right? It turns out that QUIC has a kind of congestion control module, so it's actually a...
Ben:
[35:37] QUIC sends packets over UDP, okay, instead of TCP, but it does regulate the rate, right?
So in QUIC, there is a kind of congestion control module that moderates and regulates the rate at which the QUIC application is sending out packets, right?
And what has happened is that, you know, all the major QUIC stacks, they have also implemented the existing algorithms that are popular.
So Cubic, BBR, Reno, they're often implemented in many of the stacks.
Some stacks don't implement everything, but these are the more common stacks.
So the Linux kernel has got, I don't know, seven, eight, nine different congestion control implementations.
I think most QUIC stacks have about three or four different implementations.
And in theory, these stacks have tried to implement the same things that are implemented in the kernel, but as you know, implementations aren't perfect.
Right? So there are differences in the implementations of the congestion control for these QUIC stacks, even though they kind of profess to do the same things as the kernel. So one of the things that we've done is to study how closely the...
Ben:
...implementations of the congestion control for these QUIC stacks comply with the kernel implementations. And what we found was that in many cases they're okay, but we have some instances whereby they are quite different. And so the performance that you get, even though you think you're running BBR or Cubic, isn't quite what you would expect if you were doing it from the kernel.
Keller:
[37:14] And could you explain how UDP is different from TCP in terms of how the data is transferred?
Because I think for UDP, it's not organized in the same way.
Ben:
[37:24] Okay, so at the bottom is IP, which tells you how to get packets to the destination.
Now, so what TCP does is that on top of IP, right, it also does this rate control, it does reordering, you know, it gives you reliability.
UDP just lets you send stuff over IP. So this is kind of the distinction, right? TCP actually does extra stuff. So that's really the key.
So what happens is that because QUIC implements its own congestion control, it has no need to run over TCP. So you implement that over UDP instead. So think of UDP as sort of a TCP without the congestion control, without the reordering guarantees.
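The "UDP just lets you send stuff over IP" point is visible directly in the socket API: a UDP sender just fires datagrams at an address, with no connection setup, no ordering, and no delivery guarantee. A minimal loopback example:

```python
import socket

# A UDP "server" is just a socket bound to a port; there is no accept(),
# no handshake, no connection state to maintain.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))          # let the OS pick a free port
addr = recv_sock.getsockname()

# The sender likewise just calls sendto(). If a datagram is lost or
# arrives out of order, UDP itself does nothing about it; that is the
# gap QUIC fills with its own reliability and congestion control.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"hello over UDP", addr)

data, _ = recv_sock.recvfrom(2048)
send_sock.close()
recv_sock.close()
```

Compare this with TCP, where the same exchange would require `listen()`, `accept()`, and a three-way handshake before any payload moves.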
Keller:
[38:12] And could you communicate, like, if you had a connection running on QUIC, could that communicate with a TCP one?
Ben:
[38:18] No, it can't.
Keller:
[38:20] And so is it possible to run both simultaneously?
Like, is there, in terms of implementation issues, how do you deal with the fact that, like, most computers are running on TCP?
Ben:
[38:31] Okay, so it's not "running on" TCP per se.
Basically, when you receive a connection from another computer, right, the connection comes and tells you what it is, right?
It can be a TCP connection, it can be a UDP connection, right? And depending on what the initial connection tells you, you have this handshake to kind of establish the kind of connection.
So I mentioned earlier that currently about 10% of websites support QUIC.
So what happens is that when the browser makes a connection, let's say a Chrome browser, it connects to a website, it will first try to check whether the server can support QUIC.
If the server supports QUIC, then it will use QUIC to do the exchange of packets, right?
And that means they're running over UDP. Whereas if the QUIC connection fails, then the browser will fall back to TCP and use a normal HTTPS connection, right?
So it takes a different handshake to kind of set up the connection.
Ben:
[39:32] And after that, packets are sent over TCP. That's more or less it.
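The try-QUIC-then-fall-back behavior he describes can be sketched as ordinary fallback logic. `quic_connect` and `tcp_connect` below are hypothetical stand-ins for a browser's connection machinery, not a real API:

```python
class QuicUnsupported(Exception):
    """Raised when the server cannot complete a QUIC handshake."""

def connect(host, quic_connect, tcp_connect):
    """Try QUIC (HTTP/3 over UDP) first; fall back to HTTPS over TCP.

    Roughly the behavior described in the transcript: only a minority
    of websites speak QUIC, so the TCP path is the safety net.
    """
    try:
        return ("quic", quic_connect(host))
    except QuicUnsupported:
        return ("tcp", tcp_connect(host))

# Stubs simulating one server that supports QUIC and one that doesn't.
def quic_ok(host):
    return f"h3 connection to {host}"

def quic_fails(host):
    raise QuicUnsupported(host)

def tcp_ok(host):
    return f"https connection to {host}"

proto_a, _ = connect("modern.example", quic_ok, tcp_ok)
proto_b, _ = connect("legacy.example", quic_fails, tcp_ok)
```

This mirrors the English-then-Mandarin restaurant analogy that follows: try the preferred protocol first, and only fall back when the other side doesn't understand it.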
Brent:
[39:36] And that ability to switch and toggle back and forth is really important for mobile networks, correct? Because if you're on Wi-Fi and you disconnect from that and go to cellular, switching back and forth... I was reading somewhere that the ability to go back and forth to whatever's optimized is really important with mobile networks.
Ben:
[39:54] Not really. I think with mobile networks, if you have, let's say, a mobile phone, you can connect to the network via either Wi-Fi or 5G, 4G, right? Then I guess you can use whichever one; I think it doesn't really matter that much. Now, it turns out that 4G can be faster than Wi-Fi, so I think that matters more, right? So I guess you connect on the faster network. But here it's just about talking to each other. So for example, suppose you are fluent in English, but you also learned some Mandarin. Essentially, you go to a Chinese restaurant, and the first thing you try to do is order your food in English, right? And then if the waiter does not understand English, then you try to order your food in your less-than-fluent Mandarin. So that's sort of how it is. Think of QUIC as English: you connect to the website and you try to talk in English, which is where you are more fluent and a bit faster. If that fails, then you fall back to a kind of default, which in this case would be HTTP over TCP. Yeah.
Brent:
[41:07] That makes sense.
Ben:
[41:08] That makes sense.
Brent:
[41:09] And then, what is the difference between 4G and 5G?
Ben:
[41:13] 4G and 5G, they're different...
Oh, it's a little complicated. So basically it's the wireless channel, right?
There are different modulation techniques; basically, you need to convert your digital signals, 1s and 0s, into analog signals, right?
And 4G and 5G are different ways to convert these digital signals into waveforms, right? And they use different frequencies, right?
And so obviously, as you know, 5G is faster than 4G.
I mean, I actually don't know the details of the modulation and why they're faster per se, but suffice to say that the different hardware often uses different frequencies, and that gives you increased throughput and bandwidth.
Brent:
[42:02] Yeah, that makes sense.
Ben:
[42:03] Yeah, and also I think 5G, generally they have a shorter range, and so there are more smaller base stations.
Brent:
[42:11] Yeah, so you need more towers all over.
Ben:
You need more towers all over. It's a different technology, right? And generally you need more towers because they have a shorter range.
Keller:
[42:25] And then within the handshake, is there a particular point within that interaction where the congestion happens?
Is it in the acknowledgement phase? Is it in the phase when it gets sent back to the initial sender?
When does the congestion happen, or is that not really a factor?
Ben:
[42:45] The handshake and congestion are two separate matters, right?
So the main thing about QUIC is that they have figured out a way to reduce the number of packets required to set up the handshake. That's about it.
Congestion can happen at any time. Congestion basically means that when you send a packet to some destination, somewhere along the path the number of packets that a switch needs to send is larger than the bandwidth it has to send them.
So what happens is that the excess packets get kind of like kept in a buffer, which is like a queue, right?
And obviously, all buffers have a certain finite size, right?
If too many packets build up at some point, then it's going to overflow and then you get packets lost, right?
So the fact that you have packets building up, right, is a sign of congestion.
And generally speaking, in the past, nothing happens until something's lost, which is kind of nasty.
And BBR and the rate-based algorithms try to achieve low latency by preventing this buffer from filling up.
So once you see the congestion through this dilation, this increase in the latency, you kind of moderate the rate and try to reduce the rate so that you don't overflow the buffer.
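His buffer description can be simulated directly: packets arrive faster than the bottleneck can drain them, the queue grows, queueing delay grows with it, and once the finite buffer is full the excess gets dropped. A toy discrete-time sketch (all the rates and sizes are made up):

```python
def simulate_bottleneck(arrival_rate, service_rate, buffer_size, steps):
    """Toy per-tick model of a congested switch.

    Each tick, `arrival_rate` packets arrive and up to `service_rate`
    packets are forwarded. Whatever exceeds the buffer is lost. The
    queue length is what a delay-based algorithm like BBR watches: a
    growing queue means growing latency, a signal to slow down *before*
    loss occurs, whereas loss-based algorithms react only to the drops.
    """
    queue, dropped, queue_history = 0, 0, []
    for _ in range(steps):
        queue += arrival_rate
        if queue > buffer_size:
            dropped += queue - buffer_size   # buffer overflow -> packet loss
            queue = buffer_size
        queue = max(0, queue - service_rate)
        queue_history.append(queue)
    return dropped, queue_history

# Offered load (12/tick) exceeds capacity (10/tick): the queue fills,
# latency climbs, and then packets start getting dropped every tick.
dropped, history = simulate_bottleneck(
    arrival_rate=12, service_rate=10, buffer_size=50, steps=100)
```

The steadily rising values in `history` before any drop occur are exactly the latency inflation a rate-based sender can use as an early congestion signal.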
Brent:
[44:01] Okay.
Keller:
[44:01] Yeah. So I think we're going to try to transition more towards your roles now.
Ben:
[44:06] Sure.
Keller:
[44:06] And kind of go a little bit more broad in terms of student involvement.
So firstly, we wanna talk about the computing for social good and philanthropy.
So could you explain your role as director? And then I think we want to spend a good amount of time just talking about your role mentoring students.
Ben:
[44:23] All right, so I guess the story of this Centre for Computing for Social Good started many years ago.
Actually, in fact, it sort of started before I came back to Singapore, right?
This was when I was a grad student at MIT, in the year 2001, 2002 or so.
Basically, me and a couple of friends, we built a kind of a barcode scanning system for this Salvation Army homeless shelter along Mass Ave in Cambridge, Massachusetts.
So yeah, so basically it was a system that helped the Salvation Army homeless shelter, right? Keep track of the clients.
So basically it's a homeless shelter and there are limited amount of beds.
There are more homeless people than there are beds, right? So they have this kind of, it's like a hotel booking system, but with just beds, right? And then they figure out who gets a bed on which nights.
Ben:
[45:20] The shelter also offers meals, socks, dental services, and they want to keep track of who is consuming what services.
So what we did was we built a barcode system whereby they issue the clients with these cards with barcodes.
And then there's a handheld scanner. Back then, there were these things called Palm Pilots.
I'm not sure if you guys know, but Palm Pilots were the predecessors of smartphones, right?
They were these little, what's called personal digital assistants.
So it's actually a little handheld device, and then we can slap on a kind of barcode scanner.
And with that, we can actually scan these cards and keep track of these clients using the services.
So after coming back, right, my students here suggested we do something similar for the local charities.
That's when we started actually, you know, doing like building software for the local charities every summer. So we've been doing this since 2007 or 2008.
I can't recall exactly, but it's been 15 years or so since then.
So every summer, I have 16 students.
We started off with like 12 of them. But these days, we have about 30 students every summer. So it's quite a large enterprise.
So they go out there, they build software for these local charities.
Ben:
[46:33] And so that was the original project and involvement.
And then in 2021 we had this donation from a donor, and with that we set up this new centre called the Centre for Computing for Social Good and Philanthropy. So the point is that, I think, beyond just...
Ben:
...equipping our students with skills that allow them to get good jobs and high pay, we need to go beyond that. And the goal of this centre is to promote the ethos of doing good and service to the community and the country, as well as promote leadership.
So the centre supports a bunch of programs.
This initiative where we go out there and build software for charities is one that happens every summer.
We also have some programs, some kind of like programs where our students go out to teach programming and computer skills to the underprivileged kids.
That happens every holiday, because the kids have free time during holidays, and our students teach them these things. We also run this leadership program whereby we try to equip our students with leadership skills, which is quite critical.
So these days, software is really a team sport.
The days where you can hide in a garage, just write code, and sell it for a lot of money are over.
You know, to build modern software, you actually need a team,
Ben:
[48:01] right? So working together is really important.
So I think one of the key questions really, right, in the past year since the advent of ChatGPT.
People are wondering whether there are going to be jobs. Do you guys wonder that?
Brent:
[48:16] A little bit. I think I'm more curious about how jobs are changing than if there's going to be them.
Ben:
Well, so one of the challenges, and I can show you why, is that actually many of the students are using ChatGPT to do their homework.
Brent:
[48:29] Oh, yeah.
Ben:
[48:30] Oh, yeah, you know that, right?
Brent:
[48:31] Yeah.
Ben:
[48:32] Yeah, yeah. So we realized that too. And I guess the students are worried, right?
Because what ChatGPT has done is that it has ingested the entirety of this thing called Stack Overflow. You're aware of Stack Overflow, right? Stack Overflow is a site whereby, you know, people post questions on programming. Oh, okay, sure, I think we might have different ones; it's like GitHub. Yeah, it's more like GitHub. Basically, people ask questions about programming, and the answers are kind of crowdsourced. So pretty much Stack Overflow is really a major source of information. And because ChatGPT has ingested all that information, sadly, almost every single homework assignment you can think of for intro programming, for first-year college students or sophomores, you can just cut and paste into ChatGPT and it will give you the right answer. So I guess the students have been lazy and they've done this, so they know for a fact that, yeah, ChatGPT can return the right answers. So obviously they're worried about their jobs. So one of the questions that came to me earlier this year was: Prof, what's going to happen to us? Will software engineers be replaced by AI?
So what do you think my answer was?
Brent:
[49:41] I don't think it can. Why? Because you have to be able to tell if it's producing the correct responses.
I think, especially with, I'm obviously not as familiar with computer coding.
Ben:
[49:55] Sure.
Brent:
[49:55] But if you ask a question about history or a person, you can get wrong answers that sound correct.
And I feel like the ability to analyze and contextualize answers and outputs is pretty important.
Ben:
[50:08] Okay, first of all, you're correct that I think hallucination is a huge problem that we are grappling with. But suppose that's solved.
Suppose ChatGPT can always give you the right answer. Then is your job going to be gone?
Brent:
[50:20] No, you still have to tell it what to do.
Ben:
[50:22] Yes, exactly right. So I tell my students, the fact is this: in school, typically we teach our students how to do the things right.
It means what? It means that we give you a question in the exam, and you give us the right answer.
So the point is that the question has been asked, and your job is to give the answer. So clearly in that kind of context, ChatGPT wins. But let me tell you the reality: when you grow up and go to work, that's not how real life works. In real jobs, you don't have a situation whereby you get questions and you just give the answer. That's not how it works. Actually, when you go out to work, what's more important is to do the right things. Essentially, you need to figure out, given the context of what you're doing, what is the right thing to do. And that, frankly, is not something that AI is any good at.
So if you know what question to ask, then you can have the answers.
So that part is gone, right?
Ben:
But figuring out the right thing to do, or what question to ask, is something that
Ben:
[51:22] I think at this point, AI isn't any good at.
So the real jobs that our students need to do when they grow up, I think there's still some gap in terms of getting there and being replaced.
And I also spoke to actually the MD of Goldman Sachs and asked him, hey, what's going to happen? Are you guys going to fire people, you know, given AI?
He said, no, are you kidding me, right? So what happens, he mentioned, is this.
In most companies, they actually have infinitely many things they want to do, right?
But a lot of these things are kind of kept in the backlog because they don't have enough manpower to do it.
So for them, ChatGPT is like a minion, all right?
So now for every person, every employee they hire, right, they have some of this automated minion that can help this employee kind of do more, right?
So what they will expect in the future, is that for the same people they have today, right, they will do more work.
Ben:
[52:10] So, yeah. So, also, I don't think, so on that basis, I tell my students that, yeah, not to worry. AI will not take away jobs.
It will change the nature of jobs. It may actually cost you more grief because your bosses will now expect you to do more. Yeah.
But it isn't likely to affect the jobs. That's it.
There has been apparently a huge loss of jobs in India, right?
Because the way the jobs pyramid works is that it's some kind of food chain, right?
And essentially what ChatGPT and AI does is kill off the bottom layer of the food chain.
But basically, this doesn't affect advanced economies like US and Singapore.
We're too expensive to host these kinds of jobs. So at some level, I think, yeah, it will not affect us for some time.
Brent:
[52:52] Yeah. Right.
Ben:
[52:53] But if you are in this low-cost outsourcing centers, then I think the jobs are a problem.
Brent:
[52:57] I think in an ideal world, we will hopefully strive to not endlessly increase productivity, but maybe like, okay, this is enough.
Like, we can do 10x the work, but maybe we don't need to work the 80 or 100 hours that the Goldman Sachs analysts do.
Ben:
[53:12] I don't think it's going to happen. I mean, sorry. I mean, what you just said, right, is something that was said like 30 years ago, right?
No, no, it's not sacrilege. It's just structurally, right?
I tell you why it's not going to happen, right? Because employers are not charities.
I mean, fundamentally, they pay you X dollars, right?
I mean, it's in their interest to squeeze out every single ounce, right? I mean, there's no good reason, just because you can do more, that you should do less, right?
So, and then the problem now is, in fact, it's going to get worse.
I tell you why, because of competition.
Brent:
[53:40] Yeah.
Ben:
[53:41] So, you know, the world has moved on, right? And I think the US and Singapore, the advanced economies are actually in deep trouble.
And this is especially true after COVID, right? Because what happened during COVID is that many employers realized that they don't actually need the warm body in the office.
So once you can have remote work, right? Oh my God.
That means there are a lot more kinds of work that people didn't realize could be outsourced before that will now be outsourced.
So in this light, I think the advanced economies like the US and Singapore are actually in trouble.
Because post-COVID, in the next decade or so, you're going to see significant amounts of outsourcing.
And another thing that's happening is that many of the so-called developing nations have moved up the ranks in terms of education.
So, I mean, in the early days, the Chinese students, they were not very proficient in English. But these days, you go to China, you can talk to all of them in English. They're fine.
So in China, English is a second language and many of them are proficient.
Right? Vietnam is another country. So in Vietnam also, the people are moving up in terms of proficiency in English, right? So they can take jobs now.
So, and then Eastern Europeans are also very cheap and they speak, many of them speak very good English.
So that has also increased the likelihood of jobs being outsourced.
Brent:
[54:57] Yeah, that's definitely very interesting. We were in Indonesia speaking with a professor, and her son's first language is English now.
Ben:
[55:05] Yeah.
Brent:
[55:05] It's a very different world.
Ben:
[55:08] So I was just in Lahore over the weekend. Lahore is Pakistan.
Their native language is Urdu.
So interestingly, the professor was telling me, my friend was telling me that the young ones these days, some of them are more proficient in English than Urdu.
Because they watch too many American Hollywood movies and YouTube.
And also, another thing that happens is that many countries, like Singapore or Pakistan, actually use English as the medium of instruction in school. So there is this trend, but for a very practical reason: English is the most common language for business, so it kind of makes sense.
Brent:
[55:49] Even for grad school if you want to go.
Ben:
[55:50] Oh you have no choice yeah yeah.
Keller:
[55:53] Do you see a shift from institutions, especially in academia, given the dynamic that it's going to become more important for students to ask better questions than to answer them, with AI getting stronger?
Do you see institutions shifting the way that they teach?
Ben:
[56:09] Things are hard. I mean, most people don't care enough. We care. At NUS, at least in this department, we've done a lot of work in recent years to try to move things away from just answering questions toward asking questions, the whys, and doing the right things instead of doing things right. Doing things right is important; it's part of what we need to do. But increasingly we've moved into this other regime of doing the right things. We've tried a lot of this in recent years, but I'm not sure how many institutions have actually moved this way.
Brent:
[56:42] Do you think institutions are the medium to influence how people think about doing the right thing?
Ben:
[56:49] I think, well, many of you will spend four years in school, in college. And not just that; in fact, before college there's K-12, so you spend 16 years of your life in school. That's where you spend the vast majority of your life, so it's likely where it's most effective to make any change, right? So if nothing happens there, then how? I mean, the other thing that happens is that after you graduate from college, this thing called the brutal truth comes, right? The real world hits you, and then you learn from mistakes, but that's already a little bit painful. So I guess it's better if the school system, whether college or K-12, would do more toward doing the right things. I think in Singapore we realized that a long time ago, but it's very difficult to change the entire system.
In the CS department here, we have actually realized this for quite a few years, and we've done quite a few things to try to move the kids along.
Brent:
[57:45] I think it's also easier given the connection between the government and the school system here because you guys can operate a lot quicker and in tandem.
Ben:
[57:54] Singapore is a very well-run and efficient country, as you've seen in your time here.
Brent:
[57:59] Yeah, definitely.
Keller:
[58:00] So the AI Centre for Educational Technologies is kind of where we're heading towards.
Is that an organization that's run in tandem with the Ministry of Education here?
Ben:
[58:13] Well, actually, I have another center called the AI Center for Educational Technologies.
But what it really is, is actually a grant. So I have this grant from the government, through my school.
Essentially, we have $10 million over five years to try to solve problems that will improve education.
And given this day and age, given that we're doing CS, everything is about finding algorithms that improve education.
So the way we work is this. One part of my work in the centre in the first two years was to work with the Ministry of Education to deploy what's called an adaptive learning system. That has already been done.
So come next year, the Ministry of Education is going to do a full trial.
I mean, it's not a trial anymore. It's kind of like we did a trial last year for what's called proof of concept.
But from next year, they're going to open it up to all the students.
Keller:
[59:06] That's K-12?
Ben:
Okay. So in theory, it should serve K-12. But right now, the content is limited to, I think, fourth grade or something.
But over time, they want to add more content to power the system.
So that part of my work is done. But in general, what my center does is that we look for problems that are impactful in education and then we figure out how we can build software to kind of solve them efficiently.
So let me just give you some examples.
So one very kind of problem close to home here in CS department here is that our enrollment has increased like four times over the last five years.
So when I was young, you know, and my life was pretty good, right?
We have 200 CS majors a year.
Okay. These days it's like 900. Okay.
Brent:
[59:54] So it's like- And how big is the school?
Ben:
[59:56] I mean, NUS as a school, okay, I think if I'm not wrong, we take about 6,000 undergraduates a year.
Brent:
[1:00:03] Okay.
Ben:
[1:00:04] All right. My faculty takes about 1,500, right? So we have one quarter of the school.
And the CS major intake is 900 students a year, so that's more than half of the faculty. So in any case, the point is that we've had a fourfold increase in the number of students. And what you do realize is that students take exams, right? And right now, for all that's been done, paper exams are still the most convenient way of doing it. I mean, online exams have a lot of problems.
Brent:
[1:00:33] Yeah, Examplify.
Ben:
[1:00:34] Examplify is a disaster.
Brent:
[1:00:36] I'll tell you, my computer is like, can't do a test.
Ben:
[1:00:38] Okay, so anyway, as a prof, I'm not going to use Examplify, because it causes more grief than anything, right?
Paper exams have done for the last 18 years and I'm good at giving exams.
There's also no hard-in cheating because you can't cheat easily, right? So we still issue paper exams.
Well, you must understand that at 900 students, the exam scripts weigh about 80 kg.
I'm not kidding you, right? So now what we do is we print two sets of scripts.
One will be the question scripts, and one will be the answer scripts, right? So essentially we go to the exam with 80 kg of scripts.
You guys take home half, the 40 kg; I ship home the other 40 kg in luggage, right? So that's what's really happening today.
So you laugh, but you can measure, right? I mean, it's really a problem, right?
So it's like 40 kg of scripts and you can't have one prof grade everything, right? So we need to split the scripts among many graders to grade.
And then in the past, it was a real disaster, right? Because you have to lock everyone in the same room over several days to finish the marking. That's a real pain.
So now what we've done is that we've taken all the scripts, we've scanned all of them.
Yeah, and everybody can mark at home and grade the exam scripts at home.
And the system will automatically divide the scripts out by question among the graders.
We use computer vision to figure out which script belongs to which student. We also use computer vision,
Ben:
[1:01:57] To read the MCQ bubbles, figure out the answers, and automatically grade those. Beyond that, if you have exam questions that you answer on foolscap (foolscap is this kind of lined paper, right?), you can actually use computer vision to detect which question is on which page. Because normally, how the teachers grade is that they will grade the same question all in one set, right? But it's really annoying to have to flip around and find the right question.
So now we have automated computer vision algorithms to figure out which question is on which page, and you just click on it, it will show you the page, you just grade, and then it records all the marks.
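As a rough illustration of the bubble-reading step Ben mentions, here is a minimal sketch in Python. It assumes a vision stage has already measured how filled-in each bubble is; the function names, threshold, and answer key are all invented for illustration and are not taken from the actual NUS system.

```python
# Hypothetical sketch of the MCQ auto-grading step: a vision stage has
# already measured a fill ratio (0.0-1.0) for each bubble, so we just
# pick the darkest bubble per question and score against an answer key.

CHOICES = "ABCDE"
FILL_THRESHOLD = 0.5  # below this, treat the question as unanswered

def detect_answer(fill_ratios):
    """Pick the choice whose bubble is most filled in, or None if blank."""
    best = max(range(len(fill_ratios)), key=lambda i: fill_ratios[i])
    if fill_ratios[best] < FILL_THRESHOLD:
        return None
    return CHOICES[best]

def grade_script(script_fills, answer_key):
    """Grade one student's script: per-question fill ratios vs. the key."""
    answers = [detect_answer(fills) for fills in script_fills]
    score = sum(1 for a, k in zip(answers, answer_key) if a == k)
    return answers, score

# One student's script: three questions, five bubbles each.
fills = [
    [0.9, 0.1, 0.0, 0.1, 0.0],  # clearly A
    [0.0, 0.1, 0.8, 0.0, 0.1],  # clearly C
    [0.1, 0.2, 0.1, 0.1, 0.2],  # too faint: unanswered
]
answers, score = grade_script(fills, ["A", "C", "B"])
print(answers, score)  # ['A', 'C', None] 2
```

A real pipeline would compute the fill ratios with image processing (e.g., thresholding the scanned page), but the scoring logic is essentially this simple.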
Ben:
[1:02:32] So this is something that we've done. This is something we've built.
It's already been deployed, right? So this is an online grading system.
Another thing that we have done, okay, another thing that we do here at NUS, at CS is that we teach programming, right?
So programming, it's easy to check whether the students have got the right answers by using test cases, right? So if they're green, they're correct, it's all good.
But where a student makes a mistake and it's red, right? We currently still need the TAs to go in there and actually provide comments, right, to help the students improve. That's a very expensive process.
So what we've done is also we've built a system which is called Codavari.
Essentially, it's a kind of service whereby we ship the wrong answers to a server.
The server would then give you the feedback you need, right, for the students.
So what it does is that it automates the grading to some extent.
Okay, right now, as you rightly pointed out, there's still hallucination because we use these large language models.
In fact, we use GPT-4, right? So there's some hallucination.
So the answer you get back, right, the TA just does a check, right?
And if it's correct, they just publish the thing.
If it's a little bit wrong, then the TA can replace it.
Okay, but what it does is that it significantly reduces the amount of grading time, right? So we use technology to improve the grading.
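The loop Ben describes (the model drafts feedback on failing submissions, and a TA approves or replaces the draft before students see it) could be sketched roughly like this. The function names are hypothetical, and the model call is stubbed out rather than being a real GPT-4 API call.

```python
# Rough sketch of a Codavari-style flow: failing submissions go to a
# language model for draft feedback, and a TA signs off (or overrides)
# before anything is published to students.

def draft_feedback(submission, llm):
    """Ask the (stubbed) model for feedback on a failing submission."""
    prompt = (
        f"The student's code failed these tests: {submission['failures']}\n"
        f"Code:\n{submission['code']}\n"
        "Suggest a fix."
    )
    return llm(prompt)

def ta_review(draft, correction=None):
    """TA checks the draft; publish it as-is, or replace it if wrong."""
    return {"feedback": correction or draft, "reviewed_by_ta": True}

# Stub model: in reality this would be a call to an actual LLM API.
fake_llm = lambda prompt: "Your loop is off by one; iterate to len(xs), not len(xs) - 1."

submission = {"code": "for i in range(len(xs) - 1): ...", "failures": "test_last_element"}
draft = draft_feedback(submission, fake_llm)

published = ta_review(draft)                                      # TA agrees: publish as-is
fixed = ta_review(draft, correction="Check your range bounds.")   # TA overrides a bad draft
print(published["feedback"])
print(fixed["feedback"])
```

The point of the design is that hallucination is contained: the model only ever produces drafts, and the TA's check is the cheap step, so grading time drops without publishing unreviewed output.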
Brent:
[1:03:46] Yeah.
Ben:
[1:03:46] Then the last thing is really this thing called adaptive learning, which is that we want to teach people better.
Let me tell you something, some obvious thing, right, which is that you know how the profs teach, right? The profs teach to the median student.
Brent:
[1:03:59] Yeah.
Ben:
[1:04:00] Do you blame them? You don't, right? Because you can't do otherwise, right? You go too slowly, then they're all bored to tears.
If you teach to the fastest students, then they're all lost, right?
So you kind of have to find the sweet spot, which is the median student, right? But do you realize something, right?
By teaching the median student, I'm actually leaving half the students behind by design.
I mean, it's obvious to you now that I'm telling you, but there's nothing we can do, right?
So ideally, what you want to do is to teach every student at a pace that the student needs to be taught at, okay?
But we couldn't really do it in the past because we ran these classes in person, okay?
So one of the key things that came during COVID was that everything was online, right?
So all the lectures are online, the homework's online.
So essentially, what we can do is that given a learning plan, which is determined by the median student, for the fast students, I can let them learn faster.
So for example, let's say in most classes, lectures are always on Monday, correct?
But you have a student who finishes the homework by Tuesday or Wednesday.
Then between Thursday and Sunday, the fellow is just waiting for things to happen.
So what you can do if you realize that he's finished the homework is that instead of having the next lecture start on Monday, you can move it forward and start on Sunday, right?
Ben:
[1:05:16] And then, okay, so you give the next lecture on Sunday and then he finishes homework by Tuesday, right? Then, oh, this guy is really fast.
I can now give the next lecture on Friday.
So, essentially, by adapting to the pace at which the student is submitting homework and doing his thing, you can actually reduce the time and help him finish the learning faster, okay?
Now, on the other hand, right, you know, you're supposed to submit homework on Sunday but you can't make it and you submit on Monday, right?
Now, if you keep to the normal schedule, then this poor guy is always behind or he's always struggling to keep up, right?
So, that doesn't make sense, right? So in the same way, if you find that some student has homework overdue, he's not coping, the logical thing is to give him more time. So instead of finishing the course in 13 weeks, perhaps give him like 16 weeks, slowly spread it out.
Ben:
[1:06:02] Now, of course, at this point, if you're paying attention, the question is, for the guys who are fast, why don't you give them all the materials and you can do it at your own time?
We tried it also. It doesn't quite work because what happens is that these students would then rush through the first part and they would kind of burn out and then lose interest and then it's a disaster.
So, interestingly, even for the fast students, you want to moderate them.
So, the key idea here is this: given an online system with all the homeworks available, it is possible to estimate the learning rates of students and to let them learn at the correct rates. Now, my long-term goal and vision is this: instead of having these fixed exams, we decouple the learning from the exams. So potentially, think of it as driving tests. Every three months, potentially, I have exams, okay? You learn at your own pace, and then you just book an exam date when you're ready, and you take the exam. And if you fail, it's okay, you take the exam again three months later. So that is something that we are thinking of, that we are working towards, which is a more adaptive system.
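The pacing idea above can be sketched as a toy scheduler: shift the next lecture release by however early or late the homework arrived. The strictly proportional shift is a simplification of what Ben describes, and all the dates are invented.

```python
# Toy version of adaptive pacing: if a student submits homework early,
# release the next lecture earlier; if they submit late, push the
# schedule back by the same amount.

from datetime import date, timedelta

def next_lecture_date(scheduled, due, submitted):
    """Shift the next lecture by however early or late the homework came in."""
    slack = (due - submitted).days      # positive if early, negative if late
    return scheduled - timedelta(days=slack)

lecture = date(2024, 1, 8)   # next lecture normally on a Monday
due = date(2024, 1, 7)       # homework due the Sunday before

# Fast student submits on Wednesday, four days early: lecture moves up.
print(next_lecture_date(lecture, due, date(2024, 1, 3)))  # 2024-01-04

# Struggling student submits a day late: lecture slips by a day.
print(next_lecture_date(lecture, due, date(2024, 1, 8)))  # 2024-01-09
```

A real system would also cap how far ahead the fast students can run, which matches Ben's later point that even fast students need to be moderated so they don't burn out.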
Ben:
[1:07:05] And beyond that, right? The question now is this: if this is adaptive, then how do we manage the students?
So right now, given, you know, OpenAI and these LLMs, right?
We're planning to build some kind of a personal tutor, right?
Now, I will share something, something interesting with you.
I was told actually that the most popular LLM-backed apps are virtual girlfriends. So apparently there are people who like to build these apps. You know this? Do you use one of those?
Keller:
[1:07:35] I don't use one now.
Brent:
[1:07:36] No, Keller loves them.
Keller:
[1:07:37] I've heard about it.
Ben:
[1:07:39] Okay.
Keller:
[1:07:40] We've heard about them on podcasts. They're essentially replacing dating apps as a whole.
Brent:
[1:07:44] Yeah.
Keller:
[1:07:45] People are using them. They're becoming so popular because the retention rates are high.
Brent:
[1:07:50] You can also tailor what type of connection you get. It's a sad future.
Ben:
[1:07:55] I have no idea. Clearly, I'm very married and I don't use apps that much.
But I was told, okay, and you're now confirming that I'm right, that these virtual girlfriends are the most popular kind of apps out there at this point. Quite popular.
So anyway, our idea is to build like a virtual prof, right? So instead of a girlfriend, which is kind of frivolous and random, right?
It's a prof, right? So you have this kind of a learning, adaptive learning system, but on top, we kind of slap on a kind of virtual prof that talks to you.
But I don't think, we don't think that just the automated agent is enough.
So what will happen is that on top of this automated agent that engages the students, we also have human instructors actually look at the interactions and potentially come in to interact as well.
Interestingly, the inputs that the human would kind of make in the system will be captured as further training data for the app.
So over time, because it turns out that the students are meant to learn certain topics, right? And the domain is finite and constrained.
And it turns out that as you teach the same thing year over year, the students ask the same questions all the time.
So I'm quite confident that what happens in this system over time is that we'll be able to answer all the students' queries mostly and capture that information.
And over time, the system will build up into some kind of a learning map that would basically capture the vast majority of the interactions.
Ben:
[1:09:23] So that's the key, which is that we want to achieve a kind of adaptive learning for students and basically have some level of personalization, and have the students feel like they're being taught by a real human, even though sometimes the human comes in and sometimes it's actually the OpenAI or LLM-backed system, yeah.
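The "learning map" idea, where answered questions accumulate over the years and the human only steps in for new ones, might look roughly like this. The naive keyword-overlap matching is a stand-in for whatever retrieval (embeddings, an LLM) a real system would use; all names here are illustrative.

```python
# Minimal sketch of a human-in-the-loop Q&A cache: the bot answers
# from captured knowledge when a close-enough past question exists,
# otherwise it escalates to a human instructor, whose answer is then
# recorded as training data for next time.

def normalize(question):
    """Crude normalization: lowercase bag of words."""
    return set(question.lower().replace("?", "").split())

class LearningMap:
    def __init__(self):
        self.qa = []  # list of (keyword set, answer)

    def ask(self, question, min_overlap=0.6):
        words = normalize(question)
        for keywords, answer in self.qa:
            overlap = len(words & keywords) / max(len(keywords), 1)
            if overlap >= min_overlap:
                return answer          # bot answers from the learning map
        return None                    # escalate to a human instructor

    def capture(self, question, human_answer):
        """Record the instructor's answer for reuse on similar questions."""
        self.qa.append((normalize(question), human_answer))

tutor = LearningMap()
print(tutor.ask("What does TCP stand for?"))   # None: no history yet, escalate
tutor.capture("What does TCP stand for?", "Transmission Control Protocol.")
print(tutor.ask("what does tcp stand for"))    # answered from the map
```

Because the same questions recur year over year in a fixed syllabus, the escalation rate should fall over time, which is exactly the dynamic Ben is counting on.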
Brent:
[1:09:45] Yeah, that seems really good, because on the way over here to NUS, I was using ChatGPT to teach me about some AI processes, and I was asking questions, getting feedback, asking questions. I explained it back to it, and it was like, yes, you're correct here, not correct there. So it's already pretty good. But this version seems like it's a little bit more narrow to whatever the scope of the class is, right?
Ben:
[1:10:08] The thing is, okay, currently what ChatGPT is good for is sort of like a question-answer bot, right? It means you must actually have the question you want to ask before it answers.
But where does the question come from, right? Generally speaking, in a class context, it comes from the prof. The prof has some things he needs to teach you.
And frankly, the fact that you're asking chat GPT tells you one of two things.
Either one, the prof is ineffective and you have no idea what the hell he's talking about.
Ben:
[1:10:34] Or two, you're cheating on homework. You're supposed to do your homework yourself and not, right?
I mean, so the point here is that in the first case, whereby the prof isn't effective and doesn't tell you the thing, I'm going to use my system to inform you. And another observation is that, I don't know what you guys use, but Telegram is really common here. Most of the young people... okay, nobody uses Facebook nowadays, the young ones don't, right? Facebook is for old people. So now the young ones are on Telegram. Telegram is a really good interface, and you can build Telegram bots. So essentially what we're currently trying to do is to build a Telegram bot that does what I said. That's exciting. We'll be done in a couple of months. So this is what my center does, essentially. We look at real problems.
We look at how to teach better, right? And we build a software to actually execute this and we are real users.
So that means that what we do is not just about publishing papers and doing some random experiments.
We actually do essentially deploy the systems for real, right?
And see how they work and we kind of iterate to improve them.
Keller:
[1:11:37] Do you see the future of education being mainly online? Or do you see a continued role for hybrid or in-person classes?
Ben:
[1:11:45] You know, if you ask me this question before COVID, I would have told you no, I would have said no.
But something has changed post-COVID. I'm not sure if you see it in Davis, but many of us don't come to class.
Brent:
[1:11:58] Yeah, we barely showed up this quarter.
Ben:
[1:12:00] I'm sorry?
Brent:
[1:12:00] We barely showed up to class this year.
Ben:
[1:12:02] Well, do you show up enough times to see who else is showing up?
Well, it turns out that, yeah, many students don't show up to class.
So your question is whether the future of education is online.
I think there will always be an offline, I mean, an in-person component.
I mean, there are some kids who will want to come to class, and actually it's good for them to come to class, okay? But given, you know, that the behavior of young people has changed, kids are just not coming to class.
Then if they don't come to class but you still want to teach them, how else could you do it, right?
So I think the circumstances and how things have changed post-COVID, right, will kind of force people, educators and teachers, to think about how to teach better online.
I think this is something that people have been looking at for a long time, right? But prior to COVID, I think most of the students still come to class.
So the online component isn't quite so important. But moving forward, I think it's a matter of how the world's evolved.
I think online is going to be very, very, very, very critical to at least a certain proportion of the population.
Brent:
[1:13:10] Yeah, I think hybrid classes are really good, especially for large lecture halls.
Because especially nowadays, all these kids are scared to ask questions, like raise your hand, get involved.
So everyone just basically walks in, sits in a classroom of two, three hundred people, hears a lecture, and goes back. And the reason I don't show up to a lot of those is I can watch it at two times speed, not waste time traveling, and get done with the class quicker. And then if I have questions, especially if there's discussion groups...
Ben:
[1:13:41] Yeah go.
Brent:
I go in, and that's where you get more of the material. Or it's like, okay, this is the basics, and now I go off on my own and try to figure out the rest.
Ben:
[1:13:49] I think I buy the part whereby you can save time traveling, but I'm not sure it's good at two times the speed, because I think most students have trouble even watching and understanding what's going on at one times speed. And I think two times the speed probably does them more of a disservice than it really helps them, but they don't think so.
Brent:
[1:14:08] Yeah, I feel like I focus way better, because you've got to stay on task.
Ben:
[1:14:11] Well, I guess a lot of it has to do with the prof, right? I mean, yeah, I speak pretty fast, so I guess, you know, you can probably get through my class at normal speed without worrying about two times. But I agree, maybe some profs speak too slowly and they're not very efficient. But keep in mind that we're dealing with not a homogeneous population; we're teaching to the median. So at some level, part of the skill of a prof in a class is to figure out how to manage and bring the whole class along even though there are different abilities. Now, unlike the advantages I mentioned, whereby you can in some cases adapt to the pace of the learner, the challenge here is what happens in practice: the bottom 20%, the poor-performing students, actually have a lot of discipline problems. They can't get their act together, and having them stay at home and not come to class actually makes things much worse. Yeah, definitely, yeah.
Keller:
[1:15:11] And then on one of your sites, you had the phrasing of gamifying e-learning.
Could you explain that concept a little bit and then like how that could serve to benefit online learning, especially with that discipline aspect, how to get people really more involved and enjoy the learning process?
Ben:
[1:15:26] This is old stuff, right? This is something I did more than 10 years ago.
And it's not a new idea. It's this thing about gamifying the class.
Okay, so what we do, okay, you guys are probably used to this idea whereby you
Ben:
[1:15:37] do some homeworks, right?
And then at the end of the whole semester, every piece of homework is worth some weightage, some component, and you have this weighted average of all the homeworks you've done, and that's your final grade for the semester. Okay, that's how every school does it. That's how they did it when I was in school, right? But what I did about 10 years ago is that I invented a new system, and what that system does is that everything is converted into this thing called experience points. So every time you do homework, you get experience points. You make some mistakes, you get a bit less, but you still get some. You participate in class, I give some experience points. You go to the forum and say some smart and clever things, you get experience points. You do some optional homework, you get experience points. I give a survey form, you fill out the form, I give experience points. So basically, everything that I want you to do, you do it, you get rewarded. And interestingly, the modern students are very well trained by games to collect experience points and level up. And what happens at the end of the whole class is that your grade, your continual assessment grade, is the level you reach in the class.
And that works very well. I've used it for the last 10 years and it's actually quite good.
I mean, most students like it. They actually will do homework, right?
Ben:
[1:16:53] And they turn up to class, if you do that. So it's worked out quite well.
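The experience-point scheme could be sketched as follows; the XP awards and level thresholds below are invented for illustration and are not the real system's numbers.

```python
# Small sketch of experience-point grading: every activity awards XP,
# and the continual-assessment grade is read off the level reached.

LEVEL_THRESHOLDS = [0, 100, 250, 450, 700, 1000]  # XP needed for levels 1..6

def level_for(xp):
    """Highest level whose XP threshold the student has met."""
    return max(lvl for lvl, need in enumerate(LEVEL_THRESHOLDS, start=1) if xp >= need)

def total_xp(activities):
    """Sum XP over the semester's activities (homework, forum posts, surveys...)."""
    return sum(points for _, points in activities)

semester = [
    ("homework 1", 120),
    ("forum post", 30),
    ("survey", 20),
    ("optional homework", 80),
]
xp = total_xp(semester)
print(xp, level_for(xp))  # 250 3
```

Widening the gaps between thresholds is one way a curve can still emerge at the end: students who skip the optional work simply stall at lower levels.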
But let me tell you why it's not common because it's actually very hard to execute.
All right, so first of all, I built my own learning management system called Coursemology that supports this right from the start. So basically in my system, in my LMS, which is like Canvas, it will do all your usual things, but on top of that I can slap on a gamified layer whereby I can assign experience points: you watch a video, experience points; I give you surveys and tons of homeworks, and automatically the system will assign these experience points to you when you do your homework and your assignments. So with that, yeah, I run this different system. So in my classes, I don't bother with this weightage of different assignments at the end of the semester. I just read off the experience level of the students at the end of the semester, and that's the final grade.
Brent:
[1:17:45] Is there an issue with adhering to a bell curve with that?
Ben:
[1:17:49] No, there's no issue. I mean, if you set up the experience point leveling system properly, you can still get a bell curve. I mean, the students don't always finish all the homeworks. Yeah, it's okay.
And also the bell curve, I don't actually use the homework component much as part of bell curve because it's not very fair.
Cheating happens, right? So generally speaking, the experience points, the continual assessment part of the course, is mainly used for what's called formative assessment.
So you do the stuff, hopefully you learn something and then you get rewarded for doing work, for hard work.
Basically, in this part of the course, you are punished for not doing homework, right? And then as for the bell curve, normally what I do is set good exams to try to force that curve out, yeah, and that's a fair way of doing it, okay? And normally, you know, if you do your homework well, then it's correlated with doing well on exams. Yeah, definitely.
Keller:
[1:18:47] We've covered a lot of topics in this conversation. Is there anything else you want to touch on, any advice you have for students?
Ben:
[1:18:51] Advice um, I think it's important to explore and to learn, figure out what you guys like and are good at. I think the future is complicated.
It's not just about graduating, getting a degree.
That itself probably will not be the most important thing in terms of career success.
And as we spoke about AI and this new economy, it cannot be that people focus on just rote learning and doing the things that they're taught to do.
I think they need to figure out how to ask the right questions, right?
And actually, you know, doing the right things instead of doing the things right.
I mean, okay, that's wrong. I won't say "instead."
I think it's still important to do the things right, but that's not enough.
Okay? Because I think, you know, in the future economy, you are probably better off, right?
Doing the right things passably well, not very well, rather than doing the wrong things perfectly well.
So I think that's, that's about it. And then, And don't worry too much about, you know, the grades in school, but focus on learning stuff that matters.
Ben:
[1:20:01] And it is true that, you know, soft skills and people skills are much more important.
Communication skills are much more important.
So being able to write well and speak well is really important.
And I think there's something that is often not recognized by the kids when they're in school.
You know, my teachers... to be honest, I'm kind of guilty of this myself.
When I was younger, my high school teachers always told me, you know, learn to write better, speak better. And I was like, oh, okay, but I never really believed them. But later on, as you grow up and you look around and see how life works and how jobs work, then you realize that they were right.
Keller:
[1:20:36] Wonderful. Thank you, thank you.
Ben:
[1:20:38] You're welcome. Thanks, guys.