S2E05 - Understanding and Working With Tech Debt

0:02

Péter (2): Hey everyone, welcome to the retrospective, the engineering

0:06

leadership podcast where we talk about topics for engineering managers that

0:11

they face in their day to day jobs.

0:14

With me here is Jeremy and I'm, I'm Peter.

0:17

Let's get into it.

0:19

Jeremy: Hey Peter, it's great to be here today and I'm really enjoying

0:24

Season 2 and the topics we're covering.

0:26

Péter (2): Same.

0:27

Jeremy: So for the, for the listeners, just the way that we've been planning

0:30

this season is that we've been taking turns bringing a topic each.

0:34

So this week it's, it's your topic.

0:36

So, what are you bringing today for us to talk about?

0:39

Péter (2): I have a nice, beautiful surprise for you,

0:41

which is called tech debt.

0:44

I think it's a favorite of a lot of engineers and engineering managers

0:47

and probably business leaders also, not necessarily in a good way.

0:52

We're not going to solve tech debt in 30 minutes, but what we can do is share

0:58

our combined experience in the field.

1:00

And, uh, yeah, let's talk about tech debt.

1:02

Yeah.

1:06

Jeremy: um, partly because in a lot of companies where there are,

1:09

I think we're in, you know, we, we hear the end of zero interest.

1:13

We hear a lot of focus on, companies that are having to do more with less.

1:18

There's a, there's a high pressure, especially in startup land and a

1:23

lot of pressure to ship more in a very competitive environment,

1:27

certainly that's something that we're, I'm experiencing in my job.

1:31

And, um, at the same time we have to manage this mystical

1:35

thing called tech debt.

1:36

And, uh, so I think it's a very timely topic for the, for the

1:40

period of time we're in right now.

1:42

Péter (2): Yeah, and another aspect why I think it's super

1:45

timely is AI assisted coding.

1:47

I'm pretty sure that in every company there is a lot of code checked into

1:51

the code base that was written by or with the huge assistance of AI.

1:57

If it doesn't work so well, that's going to increase your tech debt

2:01

in a big way, and maybe in not so obvious ways, uh, initially.

2:04

Jeremy: Actually, an interesting anecdote: Google said that 25

2:07

percent of code at Google is now written by AI and, um, where are you?

2:13

Péter (2): like 25 percent of what, like, uh, yeah, sure.

2:17

The line breaks, the prints, the, the, the returns, the functions,

2:22

there is a lot of text in code that's, that could be easily automated.

2:26

Jeremy: I think it's auto complete.

2:27

I think it's just fancy auto complete.

2:29

Um, and we've seen that.

2:31

Péter (2): underplay, it's super

2:31

Jeremy: Yeah,

2:32

Péter (2): but yeah,

2:33

Jeremy: yeah, we've seen that. We're using an AI plugin called Codium

2:37

and it gives us stats and it says it's done 40 percent of coding for us.

2:40

But I think most of it is just a fancy auto complete

2:44

that you're going to be doing anyway. It's kind of like autocomplete in Grammarly

2:48

or in, I don't know, in Gmail when it's completing a sentence that you would

2:51

have written anyway, that's, yeah,

2:54

Péter (2): Yeah.

2:55

Saving time and as a side effect might increase your tech debt.

2:58

Jeremy: exactly because unintended things creep in, but, yeah, so maybe if you can

3:04

help us level set things a little bit, Peter, how would you define, tech debt?

3:08

Péter (2): Yeah.

3:09

Yeah.

3:09

I actually thought a lot about that and I came up with a long, uh, definition,

3:16

long sentence, but, uh, I will unpack it.

3:19

So my definition of tech debt is our past decisions' negative effect

3:24

on the quality of our systems, limiting our options today.

3:28

There are three aspects to that. First, past decisions.

3:31

I wanted to capture that, uh, with tech, you created tech debt.

3:35

Tech debt is, is, is something that you live with today, but it was your past

3:39

decisions that resulted in some tech debt.

3:42

I want to start here by explaining that,

3:45

Jeremy: Um,

4:13

Péter (2): code health in, in, in a lot of different ways.

4:16

And systems can mean anything, technical, your code, your infrastructure, maybe

4:22

even your processes, CI, CD pipelines,

4:24

Jeremy: Um, Um,

4:47

Péter (2): options.

4:48

You can have, uh, you cannot cut costs so easily.

4:52

You cannot pivot to a different feature.

4:55

You cannot re-architect your system to today's needs so easily.

4:59

So it's basically fewer options you have in what you're doing.

5:06

Jeremy: Yeah, I really like that definition, Peter, because yeah, we, we

5:10

do need to on purpose take decisions.

5:13

And in fact, tech debt can accumulate from very sensible decisions where we

5:17

choose to be pragmatic with architectures, not overbuilding for the future.

5:23

And then over time as things scale and grow, we need to constantly adjust

5:28

for that.

5:49

Péter (2): that you don't want to miss and you cannot have, have it all.

5:52

It's always a compromise.

5:53

So you will need to compromise on the quality and increase

5:56

the tech debt of your systems.

5:58

I like the mortgage analogy taking out a big loan to be able to buy a

6:03

house where you can live rent free.

6:06

Oftentimes it's a good decision, arguably, and, it limits your options today.

6:10

You, you cannot, maybe you cannot buy a new car for a year now.

6:14

Maybe you cannot go on a big vacation for a while because you keep having to

6:18

pay back, uh, these monthly installments.

6:20

So yeah, tech debt is not necessarily bad.

6:24

The important thing is it should be conscious,

6:26

the result of conscious decisions and not just happenstance and accidents.

6:33

Jeremy: Yeah, yeah, I'm thinking about this also from explaining

6:36

it to someone in the business.

6:38

I think your mortgage analogy can work really well.

6:41

I think another way to describe it is, you know, you used an example earlier

6:44

where we cut some corners to try and get something out the door fast.

6:51

I was thinking about this one where you have a plumber that, comes and fits a new

6:55

sink and they don't actually have all the pipes, so they fit something in cardboard.

7:00

That's a very weird analogy, but we do do this in tech, um, but if

7:04

it's some cardboard pipes and some stuff taped up, that can work

7:08

potentially temporarily, um, maybe not for too long, but if you increase

7:12

the pressure or if you let it run for too long, then that's going to fail.

7:17

And oftentimes that's the kind of tech debt that we, we do.

7:20

We build something just enough to ship, but then that cardboard needs to be

7:25

replaced, and put in real pipes and so on.

7:27

And that's again, again, the kind of classical approach

7:30

to tech debt that we have.

7:31

You know,

7:31

Péter (2): Yeah.

7:32

That's, that's, that's a good analogy.

7:33

'Cause using cardboard piping, whatever that is, it's, it's still

7:38

better than having no pipes at all and having water coming off

7:42

your sink to your feet and, and

7:44

Jeremy: yeah, or potentially a sink that doesn't work for a bit,

7:47

but you really need it, you know?

7:48

Péter (2): yeah, yeah, exactly.

7:49

Jeremy: Yeah.

7:50

Okay.

7:50

Have you come across any other good ways to explain this to the business?

7:54

Because I think one of the big challenges with tech debt is just

7:58

actually explaining the concept to business stakeholders and even users.

8:02

Honestly, sometimes we have to explain to users why things are taking longer.

8:11

Péter (2): Any approach works that connects the limited options that I

8:18

was talking about in the definition to the business, business dictionary.

8:23

We have limited options in quality, stakeholders will feel the pain when

8:28

it takes two days to get back from an incident as a result of some tech

8:32

debt in your processes or pipelines or infrastructure, for example.

8:38

Stakeholders don't care about the code health, the duplications, the redundancy,

8:43

everything that makes code problematic.

8:46

But they do care about quality.

8:48

They do care about costs.

8:49

They do care about, uh, user satisfaction.

8:54

You as an engineering manager, the listener of this podcast, your

8:57

job is to translate between these stakeholders and engineering, and make

9:03

the connection to the impact of that on the business goals and outcomes.

9:10

Jeremy: Yeah, I really like that.

9:11

Just connect the lack of options you have today and the negative effect that

9:19

past decisions had on the quality and explain that using different analogies.

9:25

Péter (2): Yeah.

9:26

Yeah.

9:27

Jeremy: Yeah.

9:52

Péter (2): If the business is in a big cost cutting exercise,

9:55

you can tie your tech debt initiative to that and focus on the infrastructure.

9:59

Maybe, uh, where you can decommission some old services by moving them to

10:05

newer ones and save some money.

10:07

It's, it's, it's not a fight.

10:09

It's a discussion between, between the business and the engineering side.

10:13

Jeremy: Yeah, I agree.

10:14

So how as a team, when you're a team, how do you tell when

10:18

you have too much tech debt?

10:19

How do you measure, how do you detect it?

10:22

Like, what's, what's your, what are your thoughts on, on that?

10:25

Péter (2): Yeah.

10:26

I really like that you use the word measure because talking about data

10:30

is, uh, much more productive than talking about feelings and beliefs.

10:34

I'm pretty sure that there are a lot of engineers who are frustrated

10:39

about tech debt with the code they work with, but some of those people

10:43

will probably struggle if they have to point to some numbers, what they

10:47

mean by tech debt, uh, exactly.

10:49

So fortunately we have a lot of, good metrics in the industry that

10:52

can be used and satisfy the need to connect this assessment to business.

10:58

What, what we were talking about. DORA metrics can, can come in handy, a pretty big

11:04

industry standard in 2024, when we are

11:08

talking about this, especially metrics like the lead time for changes or

11:13

change failure rate, something that can tell a lot about the system.

11:18

The time to restore service.

11:19

I really like this metric because there are a lot of things that can impact this

11:23

metric and a lot of those are tied to tech debt in various parts of your

11:28

systems: infrastructure, code, or processes.

11:31

So you can use DORA metrics, you can be more direct and measure some

11:37

quality metrics: number of bugs or number of new bugs created in the last

11:41

three months, response times, of your systems, like more performance metrics.

11:46

It really depends on the organization.

11:48

What are the goals of your organization?

11:51

What is the business trying to achieve? Finding those metrics and bringing

11:56

in some debt, uh, aspects can, can help.

12:00

Um,
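
A minimal sketch of how the three metrics named in this answer (lead time for changes, change failure rate, time to restore service) could be computed from deployment records. The `Deployment` record and its field names are hypothetical, invented for illustration; a real team would pull this data from its own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    caused_incident: bool = False           # did this deployment cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _median_delta(deltas: List[timedelta]) -> Optional[timedelta]:
    """Median of a list of durations, or None when the list is empty."""
    if not deltas:
        return None
    return timedelta(seconds=median(d.total_seconds() for d in deltas))


def lead_time_for_changes(deployments: List[Deployment]) -> Optional[timedelta]:
    """Median time from commit to production."""
    return _median_delta([d.deployed_at - d.committed_at for d in deployments])


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Share of deployments that caused an incident (0.0 when there are none)."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.caused_incident)
    return failed / len(deployments)


def time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Median time from a failed deployment to restored service."""
    return _median_delta([
        d.restored_at - d.deployed_at
        for d in deployments
        if d.caused_incident and d.restored_at is not None
    ])
```

Tracked quarter over quarter, a rising lead time or restore time is the kind of number that can be brought to a business conversation instead of a feeling.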

12:01

Jeremy: I think the other part for me is, is toil, not fun stuff that you have to do

12:07

and I have a really great example of this.

12:09

There was an infrastructure engineer in one of the companies I worked

12:13

at, where we had a very manual process to deploy the application.

12:16

And he recorded a video of what the deployment process was, and it was

12:22

over 300 clicks and manual steps taken to get the whole thing live.

12:28

And obviously that was to explain what was needed to be done and to

12:34

show very vividly the tech debt.

12:36

And obviously we invested heavily in automation and, we got to the point

12:40

where it was just click a button.

12:42

The whole thing was done automatically.

12:44

So

12:45

Péter (2): I really love that story.

12:47

Uh, the tool this engineer was using recording themselves doing this actual

12:52

process, it's very powerful and yeah, I don't need to say it, but, uh, those

12:57

are 300 ways this process can break.

12:59

I'm pretty sure.

13:01

There were a few occasions when some of those broke.

13:04

Jeremy: all the time, because how do you remember and that process was

13:07

already an improvement because the original one, when that company was

13:11

acquired, the first time it was all deployed from one person's laptop.

13:15

So at least

13:16

Péter (2): wow.

13:17

Jeremy: the process that he videoed was being able to be

13:20

run from multiple places, but,

13:22

Péter (2): Did you virtualize that guy's laptop?

13:25

Jeremy: No, but, uh, yeah, I mean, you just invest.

13:28

Yeah, I think sometimes it's really, really obvious.

13:32

It's like the elephant, um, and you know, sometimes it's, it's less obvious.

13:36

So depending on the situation you have different kind of ways

13:40

of checking and tracking it.

13:42

Péter (2): It's important.

13:43

You say sometimes it's really obvious.

13:44

Of course it is for an engineer.

13:46

And I would say even for a non-engineer leader seeing that video of

13:49

the 300 steps that it takes to deploy the application, it's very obvious.

13:54

They will say, Oh, my God, I feel your pain.

13:56

I promise you, once we are done with this sales cycle,

13:59

once we are done with the next product launch,

14:01

you can. But they don't actually feel the impact on the business because

14:05

what they are walking away with from the demonstration is that, oh, poor Joe.

14:09

He has to go through 300 steps every day.

14:11

That's not the impact.

14:12

The impact is all the ways this can break and, and all the ways you will

14:17

struggle to release a new feature when, when something is changing this process.

14:22

And that's what the business feels that the users don't get the new feature.

14:25

Features take longer.

14:26

Features are buggy. Incidents are happening.

14:29

So I would say even when it's obvious.

14:32

Think about the users of your systems and the business stakeholders and

14:36

make it obvious for them the pain, what the option limitations are.

14:40

Jeremy: And actually, in this case, the users were being impacted because things

14:46

would go wrong in that process, as you say, and there were regularly fires.

14:51

So that was easy.

14:52

It was an easy sell.

14:53

Péter (2): yeah, yeah.

14:54

Okay.

14:54

Jeremy: Those are the obvious situations, but what about in a situation where

14:58

it's, it's less obvious and there's more of a disagreement about tech debt.

15:01

How do you come to an agreement then inside a team or between a team and

15:06

its stakeholders that you need to invest on tech debt or that it's

15:10

acceptable to leave it as it is.

15:12

Péter (2): That's a really good point because because we were talking about

15:14

convincing the business, but oftentimes you need to convince your engineering

15:18

peers that we should address this tech debt and not that tech debt.

15:22

What you can do is I like to think in teams and not in individuals.

15:26

So in this case, I would suggest a practice, something like a tech

15:31

debt tour, like every other sprint or at some regular occurrence.

15:35

You pick a different engineer and they hold a short 15 minute walkthrough of

15:40

the part of the code base or the part of the process or infrastructure

15:45

that they are most concerned about.

15:46

They explain just like this 300 step video.

15:49

That's a perfect example for

16:12

Péter (2): base.

16:13

Maybe we could address this together.

16:14

So elevating the understanding and the knowledge of the team, with these

16:20

regular tech debt tours, I think it can be a good first step to, to create

16:25

alignment on the engineering side, that this is the most impactful or most

16:30

urgent thing that we want to address

16:32

next, and then you can tie it to the business goals and

16:37

impact.

16:38

Jeremy: no, for me, I think it's really important when we talk about the team here

16:41

that the product manager is involved in that because, , often I think engineers

16:46

can quickly see and agree on on things when it's made more visible like that.

16:52

But the product manager is probably caring less about the tech debt

16:57

and they still have a lot of the concerns of the business.

16:59

And it's super important that they understand the impact on the

17:02

team to be able to protect the team when you take a decision to work

17:08

on tech debt and, um, they need to be part of that group as well.

17:13

Péter (2): I totally agree.

17:14

And thanks for bringing this up.

17:15

I'm a bit biased because I was leading platform engineering teams most recently.

17:19

And I didn't have product managers in my teams.

17:22

I had a very good product partner on the level, but it's super important

17:27

what you're saying, to include the product side in this alignment of

17:31

seeing the problem, because oftentimes when priorities are decided, you're

17:36

not even going to be in that room.

17:37

It's the product manager of the team who's going to have to fight for

17:41

the engineering side also to make the pain visible and get some leadership

17:47

buy-in and stakeholder buy-in to address

17:49

some of the tech debt, so definitely include the PM in these discussions.

17:54

Jeremy: Yeah.

17:54

And I think, um, going back to the point you made earlier about

17:58

data and measuring and so on.

18:00

The data bit is really important.

18:02

In the past, we've done things like measuring the types of work or the

18:06

signals of not having invested on tech debt, like whether it's an increase in

18:10

bugs, and some of the other metrics that you mentioned earlier, but I

18:14

think those are, those are important signals as part of this discussion about

18:19

aligning on, on tech debt and what to do.

18:22

Péter (2): they can be very powerful because they bring the discussion to

18:25

principles to how we want to operate.

18:27

For example, if you manage to track where you're spending your time, is it feature

18:32

development, unplanned work, and then the organization can have an ideal and

18:38

say, okay, I want to spend, in numbers, 80 percent of time on feature development

18:44

and 20 percent on everything else.

18:46

And you come back with metrics and say in the last two quarters, we could only spend

18:50

60 percent of time on feature development.

18:53

And that's the impact of the tech debt we are having.

18:56

So we would like to double down on that in the next quarter so we can get this amount

19:00

of time closer to the ideal 80 percent that we set as a goal as an organization.

19:06

Again, you're bringing the discussion

19:09

to talk about interpreting data and impacting data and not about

19:14

philosophies like code should be beautiful and, uh, I don't know, or, that kind

19:19

of thing, approaches that are not useful in business discussions.
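
As a rough sketch of the time-allocation tracking described here: the 80 percent feature target comes from the conversation, while the category names and the `(category, hours)` entry format are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# The 80 percent feature-development goal mentioned in the conversation.
FEATURE_TARGET = 0.80


def time_shares(entries: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Turn (category, hours) entries into each category's share of total time."""
    totals: Dict[str, float] = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: hours / grand_total for category, hours in totals.items()}


def feature_gap(entries: Iterable[Tuple[str, float]],
                target: float = FEATURE_TARGET) -> float:
    """How far feature-development time falls short of the target share.

    A positive result means less time went to features than the goal.
    The category label "feature" is a hypothetical naming choice.
    """
    return target - time_shares(entries).get("feature", 0.0)
```

With two quarters of entries where features took 60 percent of the hours, `feature_gap` reports a shortfall of about 0.2, which is the gap to the 80 percent goal discussed above.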

19:23

Jeremy: Exactly.

19:23

And I will say the classic thing that a product manager or a business person,

19:28

uh, will do is say, listen, I just need you to deliver these features.

19:33

And, uh,

19:35

I promise you, I just need it for this next six months because it's super

19:39

crucial for our business right now.

19:41

And then we can do some of those other projects that you have that you talked

19:44

about, but no, we can't do those.

19:46

And we need to do this.

19:48

And then six months later,

19:49

the business is still in a crunch and the platform starts to burn

19:53

more, but you need to continue.

19:56

So that starts to make me think about, what are good strategies

20:00

for managing tech debt?

20:02

Are there different approaches?

20:03

What would you recommend, Peter?

20:24

Péter (2): but your, your house is not on fire.

20:26

So this is, this is a critical stage because you need to maintain this

20:31

and maybe gradually improve this.

20:33

One approach I find useful in these situations is pairing tech

20:37

debt work with feature development.

20:39

It's basically, uh, you know, there is this, I think it's from

20:42

the Boy Scouts: leave your camping site cleaner than you found it.

20:45

I think it applies to code and everything an engineering team touches.

20:50

Like, uh, you develop a feature about, I don't know, uh, newsletters, then take a

20:57

look at how email sending is implemented and maybe refactor it slightly

21:01

to, to make it healthier, do some small refactors that are

21:05

adapting it better to today's needs compared to yesterday's needs.

21:09

So pairing this work with feature development has two big benefits.

21:14

One is that you're in the context and mindset of that current feature.

21:19

You don't have

21:20

to dig through documentation.

21:21

What is it that we are doing here?

21:23

Oftentimes

21:25

the code is your documentation and nobody knows why we are doing things the

21:29

way we are doing it, but that's how it's implemented in the code.

21:32

If you're touching a feature like this that has a lot of historical parts

21:37

implemented, you can discuss with your product manager and remove some of those

21:41

while doing the feature development.

21:43

Jeremy: You

21:56

Péter (2): bit longer for, for the completion of the

21:59

feature, because you keep

22:16

Péter (2): a lot of discipline, but it can be very efficient.

22:20

Jeremy: Yeah, I think the discipline is to not let the scope creep of the

22:24

tech debt refactoring overtake the situation too much.

22:28

Um, but I mean, I've, I've seen a really good example of that

22:31

recently where a staff engineer was working in the code base.

22:35

They found

22:36

some part of it that had a lot of toil to update versions.

22:41

And so they improved how we used the Renovate bot to

22:44

better do that in the future.

22:46

A classic fixing-debt-as-you-go approach.

22:49

Péter (2): Yeah, now you mentioned risks and it's very important.

22:53

Getting carried away and refactoring stuff that is

22:55

not necessarily in the scope.

22:57

And, and then you didn't deliver the feature, but you're already

23:00

two weeks late and the other risk I would say is premature optimization.

23:05

It's when you think that, oh, there is this new feature newsletter sending,

23:11

maybe I should have a separate service that handles emails and maybe I should

23:15

have it done with an auto scaling microservices infrastructure and then you

23:19

start working on those and then next week, the product decision is that actually,

23:24

we're just going to use a third party for newsletters because it scales better

23:28

and all that work, all those premature optimizations were a waste of time.

23:32

So

23:33

Jeremy: Exactly.

23:34

Péter (2): Only solve today's problems.

23:35

But still pair the tech debt work with feature development.

23:39

Jeremy: It's really funny because I really like this example

23:41

you just used, the newsletter sending thing, because that's a

23:46

really good pragmatic decision.

23:47

It should be made like that

23:48

in most situations.

23:50

Um, but the funny thing is that might become tech debt later

23:53

when you realize that, um, yeah,

23:55

no, we're not going to use that external service and we are going,

23:58

we're going to need to scale it more.

23:59

And that's perfect.

24:01

You know, that, then that's a good moment to work on, on changing it.

24:05

So that's just a good example of how

24:08

choice now.

24:08

And then it's just a loop, you improve later.

24:11

You'll need to improve it later.

24:12

Péter (2): Yeah, you cannot solve tomorrow's problems because you

24:16

don't know tomorrow's context.

24:17

This is exactly why you need to deliver stuff quickly to get the

24:22

validation and feedback from your users.

24:24

And that information will affect your decisions tomorrow.

24:28

And maybe you will double down on in house built infrastructure.

24:32

Maybe you will pivot to external.

24:34

I don't know.

24:34

Nobody knows.

24:35

That's the point.

24:36

You just solve today's problems.

24:39

Jeremy: Exactly.

24:40

Now those are all the kind of like the small ongoing things.

24:43

Péter (2): the big one.

24:44

Jeremy: Yeah.

24:45

Sometimes you have something really big, like that example of

24:48

the manual deployment process or others, how do you handle those?

24:53

Péter (2): I read it in, I think in one of Will Larson's books is that

24:56

the only solid, reliable way in these situations is a migration to a new system.

25:02

There is a point where your old code base, your old system is just

25:05

so hard to work with, so ridden with tech debt, that it's easier to throw

25:10

everything out and start from scratch.

25:12

Now, this is not how you should do it.

25:13

And we can talk about how you should migrate to a new system.

25:17

But the goal should be this.

25:19

You leave behind the old mess because you save a lot of time trying to

25:23

discover why is it working like that?

25:26

What are the features that are still needed?

25:28

What is not?

25:29

It's, it's easier to move on to a new system from, from scratch.

25:33

Jeremy: So I have two questions when you say that one is how do I

25:36

know that I need to migrate because I've seen cases where people have

25:41

migrated or done a rewrite when they absolutely did not need to do it.

25:45

Okay.

25:55

Péter (2): wrote their ideal to-do application or commenting system

25:59

and everything and you don't, yeah, I don't like the rewrite word, but

26:03

To answer your question, um, this is why we have good engineers.

26:07

And, I hear myself that this is not a satisfactory answer, but, uh, there

26:11

are points where you can see that the, the, the system needs to move on, maybe

26:15

actually, maybe the way I can answer this is the technique of how I think

26:18

you should do these kinds of migrations.

26:21

You know, the strangler fig pattern, uh, those of you who are

26:25

listening might not be familiar.

26:27

The concept is coming from biology, uh, about some kind of plants.

26:32

The point of it is that you isolate parts in your system.

26:36

You have a big, complex, interconnected system and you isolate a part

26:41

in it that has fewer connections, ideally, with the rest of the system.

26:45

You make those connections very clear and, uh, this isolation, within this

26:52

isolated part of the system, you do some refactoring that the rest of

26:57

the system doesn't see happening.

26:59

So if you take this approach, what you get is that, you can

27:04

basically work in a live system.

27:06

You avoid this big boom migration when you say that, okay, on Monday we start a new

27:11

system, we shut down the old one and there are two weeks of fires everywhere.

27:16

You have a much more scalable and reliable migration process.

27:20

You also increase

27:22

confidence in the quality of the system, because isolating those connections

27:28

between the system you selected and the rest of it, the best way to do it is

27:32

to write a lot of tests to ensure that your system is still returning the stuff

27:37

that the rest of it is, is requiring.

27:39

And once you're done with that, you have those tests in your CI CD

27:42

pipelines every time you make a change.

27:45

And then maybe you don't even need to decide if it's just a small

27:48

refactoring, or if it's an entire rewrite, because it's rarely very binary.

27:52

Oftentimes it's a scale.

27:54

So maybe you can do like you isolated one system, you rewrote

27:58

it in the modern, uh, way that you want to, isolated another, rewrote it.

28:03

And you reach a point where 80 percent of your system is, is, uh, called modern.

28:08

I don't want to get deeper into

28:09

what that is.

28:10

But you see what I mean?

28:12

And 20 percent is still the old one.

28:14

And maybe you're fine with that.

28:15

Like maybe that 20%, it's not worth the effort to address. The

28:20

majority of your system is fine.

28:21

And maybe you will decommission it in two years or, or whatever.

28:25

So actually, I think this is a good way to look at this instead of rewrite everything

28:30

from scratch, you take parts, isolate them, and migrate those little parts one by one.

28:35

Keep on evaluating if

28:37

you still need to pay attention to tech debt or you reached the

28:40

point where it's okay to leave it.
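
A toy sketch of the strangler fig approach described in this answer, under stated assumptions: every name here (`StranglerFacade`, the two email handlers, the capability label) is hypothetical. The facade is the isolated, clearly defined connection; callers never learn whether the old or the new implementation answered, and a handler is only switched over after it returns what the legacy one does for a set of contract inputs.

```python
from typing import Callable, Dict, Iterable

Handler = Callable[[str], str]


def legacy_send_email(address: str) -> str:
    """Stand-in for the old implementation."""
    return "sent:" + address.strip().lower()


def modern_send_email(address: str) -> str:
    """Stand-in for the rewritten implementation with the same contract."""
    return f"sent:{address.strip().casefold()}"


class StranglerFacade:
    """Routes each capability to the legacy or the migrated implementation."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._modern: Dict[str, Handler] = {}

    def migrate(self, name: str, handler: Handler,
                contract_inputs: Iterable[str]) -> None:
        """Switch one capability over, but only if the contract tests pass."""
        legacy = self._legacy[name]
        for value in contract_inputs:
            if handler(value) != legacy(value):
                raise ValueError(f"{name}: new handler disagrees on {value!r}")
        self._modern[name] = handler

    def call(self, name: str, value: str) -> str:
        """Callers use this and never see which side handled the request."""
        handler = self._modern.get(name) or self._legacy[name]
        return handler(value)

    def migrated_share(self) -> float:
        """Fraction of capabilities already moved to the new system."""
        if not self._legacy:
            return 0.0
        return len(self._modern) / len(self._legacy)
```

In a real system the contract inputs would live as tests in the CI/CD pipeline, and `migrated_share` is the number to watch when deciding whether the remaining legacy part is worth touching at all.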

28:42

Jeremy: Yeah, I think, I can probably list some bad reasons

28:48

why you shouldn't do it.

28:50

I have some examples.

28:52

There was one stage we had, um, A PHP, um, backend with a,

28:57

a modern JavaScript front end.

28:59

The, the PHP backend was actually very well written,

29:02

structured, a lot of tests, thousands of tests, things

29:05

were fast, CI was really good.

29:07

And the new CTO,

29:09

who knew Python but didn't know PHP, he didn't land and there were challenges

29:14

with that PHP backend, but, uh, he decided that he wanted to rewrite the backend

29:20

in Python and that's a total reskilling of it, and then what was really funny

29:26

was the CTO decided that they didn't like Vue, which the front end was in.

29:29

So we did the front end in React, and, uh, you know, those are the

29:34

wrong reasons to rewrite things.

29:36

I mean, There could be some valid business reasons to do that, but in those cases,

29:41

I really felt like, having known that system and worked in it and then seeing

29:44

those decisions after, um, I lived with things that I felt were suboptimal,

29:49

but I focused on the business value.

29:51

Whereas, those rewrites generated very little value for

29:54

the business, and obviously distracted the team significantly.

29:57

Péter (2): This is why it's very useful, back to our earlier point,

30:02

that when you talk about tech debt you should talk about the options

30:07

it's limiting for you and the business outcomes and how it's painful and

30:11

hurtful for the business to live with this tech debt every day.

30:14

In your example, maybe I'm over-assuming.

30:18

But, uh, I think what your team was optimizing for is the comfort

30:22

of the CTO with the tech stack.

30:24

And that's not something you spend months and months of developers' time

30:30

and engineering time to optimize for.

30:32

And switching both front end and back end, the whole stack, it's

30:39

super risky and super expensive.

30:40

And sometimes that's a good solution.

30:42

Like if your product was pivoting a lot and you're doing things very

30:47

differently, then maybe that's a good opportunity to change that also.

30:50

Jeremy: I do think there has to be a real business value

30:52

justification for changing tech.

30:55

Péter (2): be everything that should drive these decisions.

30:57

Cause ultimately the goal of the company is to make money, everything

31:00

else is just a means to those and

31:03

Jeremy: Exactly.

31:04

So thinking more tactically, our podcast is really targeting

31:09

engineering managers.

31:10

Um, what.

31:11

Péter (2): there.

31:12

Jeremy: Yeah.

31:13

What advice or, and, how would you recommend engineers decide, what

31:17

tech debt to tackle in the sprint they're planning at the moment, or,

31:20

you know, the, the next month ahead, how should they prioritize that?

31:24

Péter (2): yeah, that's, I, I like that there's a lot of stuff that you

31:27

don't mention in this question, which is implied, which is great.

31:30

Like, uh, is a healthy process of every sprint tackling some of the tech debt.

31:36

And this is what I want to make explicit: that's a way toward

31:39

sustainable tech debt management.

31:41

And the other thing that you're implying is that there is an

31:43

engineering understanding of the tasks that we need to tackle.

31:47

So, so this is a great point and now comes the big question.

31:50

Okay.

31:51

Which one of the fires do we turn our attention to?

31:54

And, it's basically a decision making exercise and there are

31:57

a lot of tools that can work.

31:59

One very simple one that I can explain, and I think is adaptable

32:03

well here, is the Eisenhower matrix for decision making. You know, it's a

32:08

two-by-two grid. On the horizontal axis you have urgency, on the vertical

32:15

you have, uh, impact, and then you can group your tasks into these

32:21

four quadrants. You're going to have urgent but not impactful tasks.

32:26

You're going to have impactful but not urgent tasks, and

32:29

then you're going to have urgent

32:31

and impactful, and not urgent, not impactful.

32:34

My point is that you should have a portfolio approach

32:37

because the big risk

32:39

you want to avoid is that you always pay attention to the urgent

32:42

stuff because you feel the pain now, and you always deprioritize

32:45

the impactful, non-urgent ones.

32:48

Jeremy: Okay.

33:04

Péter (2): have some new API commands.

33:06

And we're going to have to

33:07

deprecate some of the current ones that you're using.

33:10

It's not urgent at all.

33:11

You have a year to think about this, but it's going to be super impactful

33:15

if you don't meet these deadlines and you're not ready with your API by then.

33:21

So my advice is a portfolio approach.

33:24

Make sure that, for example, every quarter you tackle some of the

33:28

impactful tasks that you have, and then balance it with the urgent ones.

33:33

Also, this framework helps to realize that there are some tasks that are

33:38

not urgent and not impactful, I don't know, code styling, tabs

33:44

versus spaces, this kind of stuff.

33:47

Maybe you can just defer them to a hackathon or some backlog of things

33:53

we might address one day.

33:54

Uh, not, not put too much effort in it.

33:59

Jeremy: I like that.

33:59

And I think there's this kind of balancing approach that you described,

34:04

between features and tech debt and choosing which ones to do, and

34:08

there's another approach, which is like the kind of Maslow's hierarchy

34:12

of needs pyramid approach, which

34:15

pairs well with kind of a zero bugs approach, which is essentially: at the

34:20

bottom, your foundation is things like security and compliance and so on.

34:24

So anytime that you have something that, you know, is not secure, or

34:29

a library needs updating or whatever, you do all of those things first,

34:33

and then you do things that affect reliability, stability, and performance.

34:40

And it's only when your pyramid is stable that you do the top parts,

34:44

delivering value to the users as new functionality, or functionality

34:49

that helps the business grow.

34:51

And, I think, there's no one right approach.

34:54

I do have a personal preference for the kind of zero bugs, um, pyramid approach.

34:59

I really

35:00

hate carrying stuff in the backlog. Maybe I also hate backlogs, uh,

35:04

that's another topic for another day, managing bugs and so on.

35:08

But I really think, if you say you're going to do it, you shouldn't keep

35:11

it on the long finger, as we say in Ireland. It's like you keep it somewhere

35:16

far away, reminding yourself that you need to do it, but you never get to it.

35:20

Um, so I really like, um, this kind of

35:25

pyramid prioritization approach, but I think

35:27

Péter (2): talk a bit about this, like, you're working with a system that has a

35:30

lot of, uh, ancient stuff, a lot of things that maybe you will decommission later.

35:35

You have 359 bugs in your ticket system.

35:39

How do you implement this system where your foundation is always in a good place,

35:46

and before that's solid, you don't do anything else on the upper levels?

35:51

Jeremy: So the ideal scenario for this is when you're building

35:55

like, in a startup, you start like this and you continue like this

35:59

and you're able to maintain the pyramid in a stable way.

36:03

Right?

36:04

Um, that's the best approach. But then in the situation you're

36:10

describing, which has a lot more legacy and debt, you need to see this

36:18

as the direction you want to end up in, and you need to work your way towards

36:22

that, making those migrations and all the different choices towards that. Like

36:27

the scenario I was in, where I used that example about the manual deployments.

36:31

And so we carved out time

36:33

to get the deployments automated, and this is like an 18-month, two-year

36:38

process to get to a different, more stable place. And different bits of your system

36:43

will arrive at different points in time into that place where you can operate

36:48

with a zero bugs approach in that area.

36:50

Um, you are also going to have to have a tolerance for more known issues,

36:55

potentially, and just have more documented. Like, just the whole

36:59

way that you approach things.

37:00

You know, we talked about gardening and thinking in the future and stuff.

37:04

That's what you want senior people to do is map a route to this place.

37:09

What is not acceptable is staying in that,

37:12

um, rocky tech debt situation with lots of fires constantly happening.

37:16

Péter (2): Yeah, I like that you brought up the startup approach because they

37:20

seem like they have the luxury of a greenfield project and they can set all

37:24

those foundation levels and pay attention.

37:26

But oftentimes startups

37:29

say that our only goal is to survive till we can validate our

37:32

MVP or have product-market fit, and everything else is secondary.

37:37

And I don't care about uptime.

37:39

I don't care about anything.

37:40

So those work against the foundation.

37:44

I think how you can resolve

37:45

this conflict is

37:48

focusing on making the foundation very minimal: what are the handful of

37:53

key differentiators that you don't compromise on? For example, in a security

37:58

company, the protection of your users' data. Whatever product experiments

38:03

you're doing in a desperate attempt to get some funding or validation,

38:08

you don't compromise that.

38:09

That's the foundation.

38:10

And if there are problems in that area, you throw away

38:13

everything and fix those first.

38:15

So I think a good way to implement this pyramid approach, which I really like

38:20

is to spend some time and not get too optimistic about what you put on the

38:26

lower levels: just the key stuff.

38:28

Jeremy: Yeah, obviously, in that startup place, the danger,

38:31

what people do in those startup scenarios, is they build too many things.

38:36

So I think you should outsource a lot to SaaS.

38:43

Péter (2): Yes.

38:43

Jeremy: And not try and build and run on.

38:46

I've written a blog post about not using Kubernetes in those

38:50

situations, unless there's a specific reason why you really need to use it.

38:54

Use, I don't know, Fly.io

38:55

or one of these other platform-as-a-service offerings. Minimize what

39:01

you do, to focus on the value that you can really add. Remove or outsource

39:06

all the undifferentiated heavy lifting. At some point in the future,

39:10

if you're successful, you might need to bring some of those in house to

39:13

actually start to differentiate.

39:16

This is where a lot of tech debt is accumulated in companies

39:19

because they overbuild in the early stages, and they make the

39:22

surface area of their product too big.

39:25

Péter (2): Yeah,

39:25

Jeremy: Um, Uh, Uh,

39:29

Péter (2): founder shared their learnings.

39:32

It was amazing, it was very honest and very good.

39:35

They were saying the same stuff.

39:37

Like, you can focus on your core differentiator, be very

39:43

comfortable with manual work.

39:45

Also, like, it was some, uh, sports race video app, whatever, and the founder

39:51

was manually editing videos for their subscribers, because they didn't want to

39:55

invest in any kind of automation there too early. And the concept they brought

40:00

in, similar to minimum viable product, is the MVB, the minimum viable business.

40:06

And this is the only thing the startup focuses on: to get the validation, try out

40:11

all the experiments and see what sticks.

40:13

Investing in a solid architecture in this period is a waste of

40:18

time, 99 percent of the time.

40:20

Because you don't know what you're

40:21

going to do.

40:22

Jeremy: So this is it.

40:23

Uh,

40:24

Péter (2): we went a bit

40:25

Jeremy: no, but I think it's really interesting because this loops

40:28

back to what you said at the very beginning describing tech debt and,

40:31

as we summarize things, I really like how you said past decisions.

40:36

And we talked about how the earlier and better the decisions you can make,

40:41

the better the quality will be and the more options you'll have in the future.

40:45

And, um, so I'd, I'd love for you to kind of wrap up this whole thing now going

40:51

down to the engineering manager level.

40:52

Péter (2): Okay.

40:53

So, takeaways: don't talk about tech debt in the engineering context, only talk

40:57

about it in the business context, make the business pain visible to stakeholders.

41:03

Understand that tech debt is okay,

41:06

some level of tech debt. Learn how to live with it.

41:09

Have a plan how you're going to address it and how you're going to get out of it.

41:13

Um, data: if you're talking about data, things you measure, time spent in various

41:19

areas, incident numbers, everything, you're arguing about where you want

41:23

your team and your organization to be,

41:25

and not about what you believe is good code or bad code. A healthy culture:

41:30

You should have disagreements in your teams, but you should resolve those

41:34

disagreements on the engineering side about what is really a tech debt and

41:38

what are your plans to address it.

41:40

And finally, some balanced approach to prioritization.

41:44

Don't

41:45

go into half-a-year, "we stop all feature development" kind of

41:49

refactorings, because those very rarely work out well in my experience.

41:54

Jeremy: Yeah, just a tip: a sign you're doing it well is when the product

41:57

manager is the one that prioritizes working on tech debt, because you've

42:02

made it visible enough that the business is recognizing the need to invest in it.

42:09

Péter (2): I love that.

42:09

If you have a good relationship with your PM and you manage to convince

42:13

them that it's going to hurt them if we don't address this like that, or

42:18

you manage to find a way to pair it with the feature work, then you won.

42:23

You can do it.

42:23

Yeah.

42:24

Jeremy: Yeah, brilliant.

42:26

Great topic, Peter.

42:27

Uh, thanks very much.

42:29

And to our listeners, thank you.

42:30

Would love some feedback, and really appreciate some of the messages we've had

42:34

in from some of the last episodes.

42:38

Of course it would be great if you fed the algorithms to help

42:41

us be more visible, but more than anything, we would really

42:44

appreciate it, if you found this valuable,

42:47

that you would share it with others that you think might appreciate

42:50

what we're talking about today.

42:52

Péter (2): Yeah, thank you very much and see you or talk to you in two weeks.

42:57

Jeremy: Awesome.

42:58

Yeah.

42:58

Péter (2): Bye.

S2E05 - Understanding and Working With Tech Debt

Episode description

Persons