S2E05 - Understanding and Working With Tech Debt
S02:E05

Episode description

In this episode, we discuss the topic of technical debt. We start with the basic definition, then share techniques for measuring, arguing about, and addressing tech debt. A key concept is connecting the issues of technical debt to business goals, to show the concrete pain it causes to organizations. We also discuss prioritization tips and various good and bad examples we have seen in our experience.

0:02

Péter (2): Hey everyone, welcome to the retrospective, the engineering

0:06

leadership podcast where we talk about topics for engineering managers that

0:11

they face in their day to day jobs.

0:14

With me here is Jeremy, and I'm Peter.

0:17

Let's get into it.

0:19

Jeremy: Hey Peter, it's great to be here today and I'm really enjoying

0:24

Season 2 and the topics we're covering.

0:26

Péter (2): Same.

0:27

Jeremy: So for the, for the listeners, just the way that we've been planning

0:30

this season is that we've been taking turns bringing a topic each.

0:34

So this week it's, it's your topic.

0:36

So, what are you bringing today for us to talk about?

0:39

Péter (2): I have a nice, beautiful surprise for you,

0:41

which is called tech debt.

0:44

I think it's a favorite of a lot of engineers and engineering managers

0:47

and probably business leaders also, not necessarily in a good way.

0:52

We're not going to solve tech debt in 30 minutes, but what we can do is share

0:58

our combined experience in the field.

1:00

And, uh, yeah, let's talk about tech debt.

1:02

Yeah.

1:06

Jeremy: um, partly because in a lot of companies where there are,

1:09

I think we're at, you know, the end of zero interest rates.

1:13

We hear a lot of focus on, companies that are having to do more with less.

1:18

There's a, there's a high pressure, especially in startup land and a

1:23

lot of pressure to ship more in a very competitive environment,

1:27

certainly that's something that we're, I'm experiencing in my job.

1:31

And, um, at the same time we have to manage this mystical

1:35

thing called tech debt.

1:36

And, uh, so I think it's a very timely topic for the, for the

1:40

period of time we're in right now.

1:42

Péter (2): Yeah, and another aspect why I think it's super

1:45

timely is AI assisted coding.

1:47

I'm pretty sure that in every company there is a lot of code checked into

1:51

the code base that was written by or with the huge assistance of AI.

1:57

If it doesn't work so well, that's going to increase your tech debt

2:01

in a big way, and maybe in not so obvious ways, uh, initially.

2:04

Jeremy: Actually, an interesting anecdote: Google said that 25

2:07

percent of code at Google is now written by AI and, um, where are you?

2:13

Péter (2): like 25 percent of what, like, uh, yeah, sure.

2:17

The line breaks, the prints, the, the, the returns, the functions,

2:22

there is a lot of text in code that could be easily automated.

2:26

Jeremy: I think it's auto complete.

2:27

I think it's just fancy auto complete.

2:29

Um, and we've seen that.

2:31

Péter (2): underplay, it's super

2:31

Jeremy: Yeah,

2:32

Péter (2): but yeah,

2:33

Jeremy: yeah, we've seen that. We're using an AI plugin called Codium

2:37

and it gives us stats and it says it's done 40 percent of coding for us.

2:40

But I think most of it is just a fancy auto complete.

2:44

That you're going to be doing anyway, it's kind of like autocomplete in Grammarly

2:48

or in, I don't know, in Gmail when it's completing sentences that you would

2:51

have written anyway, that's, yeah,

2:54

Péter (2): Yeah.

2:55

Saving time and as a side effect might increase your tech debt.

2:58

Jeremy: exactly because unintended things creep in, but, yeah, so maybe if you can

3:04

help us level set things a little bit, Peter, how would you define, tech debt?

3:08

Péter (2): Yeah.

3:09

Yeah.

3:09

I actually thought a lot about that and I came up with a long, uh, definition,

3:16

long sentence, but, uh, I will unpack it.

3:19

So my definition of tech debt is our past decisions' negative effect

3:24

on the quality of our systems, limiting our options today.

3:28

There's three aspects to that, past decisions.

3:31

I wanted to capture that, uh, with tech, you created tech debt.

3:35

Tech debt is, is, is something that you live with today, but it was your past

3:39

decisions that resulted in some tech debt.

3:42

I want to start here by explaining that,

3:45

Jeremy: Um,

4:13

Péter (2): code health in, in, in a lot of different ways.

4:16

And systems can mean anything, technical, your code, your infrastructure, maybe

4:22

even your processes, CI, CD pipelines,

4:24

Jeremy: Um, Um,

4:47

Péter (2): options.

4:48

You can have, uh, you cannot cut costs so easily.

4:52

You cannot pivot to a different feature.

4:55

You cannot re-architect your system to today's needs so easily.

4:59

So it's basically fewer options you have in what you're doing.

5:06

Jeremy: Yeah, I really like that definition, Peter, because yeah, we, we

5:10

do need to on purpose take decisions.

5:13

And in fact, tech debt can accumulate from very sensible decisions where we

5:17

choose to be pragmatic about architecture, not overbuilding for the future.

5:23

And then over time as things scale and grow, we need to constantly adjust.

5:28

For that.

5:49

Péter (2): that you don't want to miss and you cannot have, have it all.

5:52

It's always a compromise.

5:53

So you will need to compromise on the quality and increase

5:56

the tech debt of your systems.

5:58

I like the mortgage analogy taking out a big loan to be able to buy a

6:03

house where you can live rent free.

6:06

Oftentimes it's a good decision, arguably, and, it limits your options today.

6:10

You, you cannot, maybe you cannot buy a new car for a year now.

6:14

Maybe you cannot go on a big vacation for a while because you have to keep

6:18

paying back, uh, these monthly installments.

6:20

So yeah, tech debt is not necessarily bad.

6:24

The important thing is it should be conscious,

6:26

the result of conscious decisions and not just happenstance and accidents.

6:33

Jeremy: Yeah, yeah, I'm thinking about this also from explaining

6:36

it to someone in the business.

6:38

I think your mortgage analogy can work really well.

6:41

I think another way to describe it is, you know, you used an example earlier

6:44

where we try to cut some corners to get something out the door fast.

6:51

I was thinking about this one where you have a plumber that, comes and fits a new

6:55

sink and they don't actually have all the pipes, so they fit something in cardboard.

7:00

That's a very weird analogy, but we do do this in tech, um, but if

7:04

it's some cardboard pipes and some stuff taped up, that can work

7:08

Potentially temporarily, um, maybe not for too long, but if you increase

7:12

the pressure or if you let it run for too long, then that's going to fail.

7:17

And oftentimes that's the kind of tech debt that we, we do.

7:20

We build something just enough to ship, but then that cardboard needs to be

7:25

replaced, and put in real pipes and so on.

7:27

And that's again, again, the kind of classical approach

7:30

to tech debt that we have.

7:31

You know,

7:31

Péter (2): Yeah.

7:32

That's, that's, that's a good analogy.

7:33

Because using cardboard piping, whatever that is, it's, it's still

7:38

better than having no pipes at all and having water coming off

7:42

your sink to your feet and, and

7:44

Jeremy: yeah, or potentially a sink that doesn't work for a bit,

7:47

but you really need it, you know?

7:48

Péter (2): yeah, yeah, exactly.

7:49

Jeremy: Yeah.

7:50

Okay.

7:50

Have you come across any other good ways to explain this to the business?

7:54

Because I think one of the big challenges with tech debt is just

7:58

actually explaining the concept to business stakeholders and even users.

8:02

Honestly, sometimes we have to explain to users why things are taking longer.

8:11

Péter (2): Any approach works that connects the limited options that I

8:18

was talking about in the definition to the business dictionary.

8:23

We have limited options in quality, stakeholders will feel the pain when

8:28

it takes two days to get back from an incident as a result of some tech

8:32

debt in your processes, pipelines, or infrastructure, for example.

8:38

Stakeholders don't care about, the code health, the duplications, the redundancy,

8:43

everything that makes code problematic.

8:46

But they do care about quality.

8:48

They do care about costs.

8:49

They do care about, uh, user satisfaction.

8:54

You as an engineering manager, the listener of this podcast, your

8:57

job is to translate between these stakeholders and engineering, and make

9:03

the connection to the impact of that on the business goals and outcomes.

9:10

Jeremy: Yeah, I really like that.

9:11

Just connect the lack of options you have today and the negative effect that

9:19

past decisions had on the quality and explain that using different analogies.

9:25

Péter (2): Yeah.

9:26

Yeah.

9:27

Jeremy: Yeah.

9:52

Péter (2): If the business is in a big cost cutting exercise.

9:55

You can tie your tech debt initiative to that and focus on the infrastructure.

9:59

Maybe, uh, where you can decommission some old services by moving them to

10:05

newer ones and save some money.

10:07

It's, it's, it's not a fight.

10:09

It's a discussion between, between the business and the engineering side.

10:13

Jeremy: Yeah, I agree.

10:14

So how as a team, when you're a team, how do you tell when

10:18

you have too much tech debt?

10:19

How do you measure, how do you detect it?

10:22

Like, what's, what's your, what are your thoughts on, on that?

10:25

Péter (2): Yeah.

10:26

I really like that you use the word measure because talking about data

10:30

is, uh, much more productive than talking about feelings and beliefs.

10:34

I'm pretty sure that there are a lot of engineers who are frustrated

10:39

about tech debt with the code they work with, but some of those people

10:43

will probably struggle if they have to point to some numbers, what they

10:47

mean by tech debt, uh, exactly.

10:49

So fortunately we have a lot of, good metrics in the industry that

10:52

can be used and satisfy the need to connect this assessment to business.

10:58

What we were talking about, DORA metrics, can come in handy, a pretty big

11:04

industry standard in 2024, when we are

11:08

talking about this, especially metrics like the lead time for changes or

11:13

change failure rate, something that can tell a lot about the system.

11:18

The time to restore service.

11:19

I really like this metric because there are a lot of things that can impact this

11:23

metric and a lot of those are tied to tech debt in various parts of your

11:28

systems: infrastructure, code, or processes.

11:31

So you can use DORA metrics, you can be more direct and measure some

11:37

quality metrics: number of bugs, or number of new bugs created in the last

11:41

three months, response times, of your systems, like more performance metrics.

11:46

It really depends on the organization.

11:48

What are the goals of your organization?

11:51

What is the business trying to achieve? Finding those metrics and bringing

11:56

in some tech debt, uh, aspects can help.

12:00

Um,
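
The metrics mentioned here can be computed from very little data. A minimal sketch in Python, assuming hypothetical `Deployment` and `Incident` records exported from your own deployment and incident tooling (the record shapes and field names are illustrative assumptions, not taken from any specific product):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:  # hypothetical record shape
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_failure: bool     # whether it led to an incident or rollback


@dataclass
class Incident:  # hypothetical record shape
    started_at: datetime
    restored_at: datetime


def _median_duration(durations):
    """Median of a non-empty list of timedeltas."""
    seconds = [d.total_seconds() for d in durations]
    return timedelta(seconds=median(seconds))


def lead_time_for_changes(deployments):
    """Median time from commit to running in production."""
    return _median_duration([d.deployed_at - d.committed_at for d in deployments])


def change_failure_rate(deployments):
    """Share of deployments that caused a failure (0.0 to 1.0)."""
    return sum(1 for d in deployments if d.caused_failure) / len(deployments)


def time_to_restore(incidents):
    """Median time from the start of an incident to restored service."""
    return _median_duration([i.restored_at - i.started_at for i in incidents])
```

Tracking these three numbers per quarter gives the conversation with stakeholders a trend to point at, rather than a feeling.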

12:01

Jeremy: I think the other part for me is, is toil, Not fun stuff that you have to do

12:07

and I have a really great example of this.

12:09

There was an infrastructure engineer, in one of the companies I worked

12:13

at, where we had a very manual process to deploy the application.

12:16

And he recorded a video of what the deployment process was, and it was

12:22

over 300 clicks and manual steps taken to get the whole thing live.

12:28

And obviously that was to explain what was needed to be done and to

12:34

show very vividly the tech debt.

12:36

And obviously we invested heavily in automation and, we got to the point

12:40

where it was just click a button.

12:42

The whole thing was done automatically.

12:44

So

12:45

Péter (2): I really love that story.

12:47

Uh, the tool this engineer was using recording themselves doing this actual

12:52

process, it's very powerful and yeah, I don't need to say it, but, uh, those

12:57

are 300 ways this process can break.

12:59

I'm pretty sure.

13:01

There were a few occasions when some of those broke.

13:04

Jeremy: all the time, because how do you remember and that process was

13:07

already an improvement because the original one, when that company was

13:11

acquired, the first time it was all deployed from one person's laptop.

13:15

So at least

13:16

Péter (2): wow.

13:17

Jeremy: the process that he videoed was being able to be

13:20

run from multiple places, but,

13:22

Péter (2): Did you virtualize that guy's laptop?

13:25

Jeremy: No, but, uh, yeah, I mean, you just invest.

13:28

Yeah, I think sometimes it's really, really obvious.

13:32

It's like the elephant, um, and you know, sometimes it's, it's less obvious.

13:36

So depending on the situation you have different kind of ways

13:40

of checking and tracking it.

13:42

Péter (2): It's important.

13:43

You say sometimes it's really obvious.

13:44

Of course it is for an engineer.

13:46

And I would say even for a non-engineer leader seeing that video of

13:49

the 300 steps that it takes to deploy the application, it's very obvious.

13:54

They will say, Oh, my God, I feel your pain.

13:56

I promise you, once we are done with this sales cycle,

13:59

once we are done with the next product launch,

14:01

you can. But they don't actually feel the impact on the business because

14:05

what they are walking away from the demonstration is that, Oh, poor Joe.

14:09

He has to go through 300 steps every day.

14:11

That's not the impact.

14:12

The impact is all the ways this can break and, and all the ways you will

14:17

struggle to release a new feature when, when something is changing this process.

14:22

And that's what the business feels that the users don't get the new feature.

14:25

Features take longer.

14:26

Features are buggy, incidents are happening.

14:29

So I would say even when it's obvious.

14:32

Think about the users of your systems and the business stakeholders and

14:36

make it obvious for them the pain, what the option limitations are.

14:40

Jeremy: And actually, in this case, the users were being impacted because things

14:46

would go wrong in that process, as you say, and there were regularly fires.

14:51

So that was easy.

14:52

It was an easy sell.

14:53

Péter (2): yeah, yeah.

14:54

Okay.

14:54

Jeremy: Those are the obvious situations, but what about in a situation where

14:58

it's, it's less obvious and there's more of a disagreement about tech debt.

15:01

How do you come to an agreement then inside a team or between a team and

15:06

its stakeholders that you need to invest on tech debt or that it's

15:10

acceptable to leave it as it is.

15:12

Péter (2): That's a really good point because because we were talking about

15:14

convincing the business, but oftentimes you need to convince your engineering

15:18

peers that we should address this tech debt and not that tech debt.

15:22

What you can do is I like to think in teams and not in individuals.

15:26

So in this case, I would suggest a practice, something like a tech

15:31

debt tour, like every other sprint or at some regular occurrence.

15:35

You pick a different engineer and they hold a short 15 minute walkthrough of

15:40

the part of the code base or the part of the process or infrastructure

15:45

that they are most concerned about.

15:46

They explain just like this 300 step video.

15:49

That's a perfect example for

16:12

Péter (2): base.

16:13

Maybe we could address this together.

16:14

So elevating the understanding and the knowledge of the team, with these

16:20

regular tech debt tours, I think it can be a good first step to, to create

16:25

alignment on the engineering side, that this is the most impactful or most

16:30

urgent thing that we want to address.

16:32

In the next time, and then you can tie to the business goals and

16:37

impact.

16:38

Jeremy: no, for me, I think it's really important when we talk about the team here

16:41

that the product manager is involved in that because, , often I think engineers

16:46

can quickly see and agree on on things when it's made more visible like that.

16:52

But the product manager is probably caring less about the tech debt

16:57

and they still have a lot of the concerns of the business.

16:59

And , it's super important that they understand the impact on the

17:02

team to be able to protect the team when you take a decision to work

17:08

on tech debt and, um, they need to be part of that, group as well.

17:13

Péter (2): I totally agree.

17:14

And thanks for bringing this up.

17:15

I'm a bit biased because I was leading platform engineering teams most recently.

17:19

And I didn't have product managers in my teams.

17:22

I had a very good product partner on the level, but it's super important

17:27

what you're saying, to include the product side in this alignment of

17:31

seeing the problem, because oftentimes when priorities are decided, you're

17:36

not even going to be in that room.

17:37

It's the product manager of the team who's going to have to fight for for

17:41

the engineering side also to make the pain visible and get some leadership

17:47

buy-in and stakeholder buy-in to address

17:49

some of the tech debt, so definitely include the PM in these discussions.

17:54

Jeremy: Yeah.

17:54

And I think, um, going back to the point you made earlier about

17:58

data and measuring and so on.

18:00

The data bit is really important.

18:02

In the past, , we've done things like measuring the types of work or the

18:06

signals of not having invested on tech debt, like whether it's an increase in

18:10

bugs, , and some of the other metrics that you mentioned earlier, but I

18:14

think those are, those are important signals as part of this discussion about

18:19

aligning on, on tech debt and what to do.

18:22

Péter (2): they can be very powerful because they bring the discussion to

18:25

principles to how we want to operate.

18:27

For example, if you manage to track where you're spending your time, is it feature

18:32

development or unplanned work, then the organization can have an ideal and

18:38

say, okay, I want to spend 80 percent of time on feature development

18:44

and 20 percent on everything else.

18:46

And you come back with metrics and say in the last two quarters, we could only spend

18:50

60 percent of time on feature development.

18:53

And that's the impact of the tech debt we are having.

18:56

So we would like to double down on that in the next quarter so we can get this amount

19:00

of time closer to the ideal 80 percent that we set as a goal as an organization.

19:06

Again, you're bringing the discussion.

19:09

to talk about interpreting data and impacting data and not about

19:14

philosophies like code should be beautiful and, uh, don't know, or,

19:19

that kind of thing, approaches that are not useful in business discussions.
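
The 80 percent ideal versus 60 percent actual comparison is simple arithmetic once time is tagged by category. A minimal sketch, assuming time entries are available as hypothetical `(category, hours)` pairs and that feature work is tagged `"feature"` (both are assumptions made for illustration):

```python
from collections import Counter

TARGET_FEATURE_SHARE = 0.80  # the organization's stated ideal


def time_share(entries):
    """Return {category: share of total hours} for (category, hours) pairs."""
    totals = Counter()
    for category, hours in entries:
        totals[category] += hours
    total = sum(totals.values())
    if total == 0:
        raise ValueError("no hours recorded")
    return {category: hours / total for category, hours in totals.items()}


def feature_gap(entries, target=TARGET_FEATURE_SHARE):
    """How far feature work falls short of the target share (positive = short)."""
    return target - time_share(entries).get("feature", 0.0)
```

With 300 hours of feature work out of 500 tracked hours, `time_share` reports a 0.6 feature share and `feature_gap` reports roughly 0.2, which is the number to bring to the planning discussion.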

19:23

Jeremy: Exactly.

19:23

And I will say the classic thing that a product manager or a business person,

19:28

uh, will do is say, listen, I just need you to deliver these features.

19:33

And, uh,

19:35

I promise you, I just need it for this next six months because it's super

19:39

crucial for our business right now.

19:41

And then we can do some of those other projects that you have that you talked

19:44

about, but no, we can't do those.

19:46

And we need to do this.

19:48

And then six months later.

19:49

The business is still in a crunch and the platform starts to burn

19:53

more, but you need to continue.

19:56

So that starts to make me think about, what are good strategies

20:00

for managing tech debt?

20:02

Are there different approaches?

20:03

What would you recommend, Peter?

20:24

Péter (2): but your, your house is not on fire.

20:26

So this is, this is a critical stage because you need to maintain this

20:31

and maybe gradually improve this.

20:33

One approach I find useful in these situations is pairing tech

20:37

debt work with feature development.

20:39

It's basically, uh, you know, there is this, I think it's from

20:42

the Boy Scouts: leave your camping site cleaner than you found it.

20:45

I think it applies to code and everything you as an engineering team touch.

20:50

Like, uh, you develop a feature about, I don't know, uh, newsletters, then take a

20:57

look at how email sending is implemented and maybe refactor it slightly.

21:01

To, to make it healthier, do some small refactors that are

21:05

adapting it better to today's needs compared to yesterday's needs.

21:09

So pairing this work with feature development has two big benefits.

21:14

One is that you're in the context and mindset of that current feature.

21:19

You don't have

21:20

to dig through documentation.

21:21

What is it that we are doing here?

21:23

Oftentimes the code

21:25

is your documentation and nobody knows why we are doing things the

21:29

way we are doing it, but that's how it's implemented in the code.

21:32

If you're touching a feature like this that has a lot of historical parts

21:37

implemented, you can discuss with your product manager and remove some of those

21:41

while doing the feature development.

21:43

Jeremy: You

21:56

Péter (2): bit longer for, for the completion of the

21:59

feature, because you keep

22:16

Péter (2): a lot of discipline, but it can be very efficient.

22:20

Jeremy: Yeah, I think the discipline is to not let the scope creep of the

22:24

tech debt refactoring overtake the situation too much.

22:28

Um, but I mean, I've, I've seen a really good example of that

22:31

recently where a staff engineer was working in the code base.

22:35

They found.

22:36

Some part of it that had a lot of toil to update versions.

22:41

And so they improved how we used the Renovate bot to

22:44

better do that in the future.

22:46

A classic fix-debt-as-you-go approach.

22:49

Péter (2): Yeah, now you mentioned risks and it's very important.

22:53

Getting carried away and refactoring stuff that is

22:55

not necessarily in the scope.

22:57

And, and then you didn't deliver the feature, but you're already

23:00

two weeks late and the other risk I would say is premature optimization.

23:05

It's when you think that, oh, there is this new feature newsletter sending,

23:11

maybe I should have a separate service that handles emails and maybe I should

23:15

have it done with an auto scaling microservices infrastructure and then you

23:19

start working on those and then Next week, the product decision is that actually,

23:24

we're just going to use a third party for newsletters because it scales better

23:28

and all that work, all those premature optimizations, were a waste of time.

23:32

So

23:33

Jeremy: Exactly.

23:34

Péter (2): Only solve today's problems.

23:35

But still pair the tech debt work with feature development.

23:39

Jeremy: It's really funny because I really like this example.

23:41

You just used the newsletter, sending thing because that's a

23:46

really good pragmatic decision.

23:47

It should be made like that

23:48

in most situations.

23:50

Um, but the funny thing is that might become tech debt later

23:53

when you realize that, um, Yeah.

23:55

I know we're not going to use that external service and we are going,

23:58

we're going to need to scale it more.

23:59

And that's perfect.

24:01

You know, that, then that's a good moment to work on, on changing it.

24:05

So that's just a good example of how

24:08

choice now.

24:08

And then it's just a loop, you improve later.

24:11

You'll need to improve it later.

24:12

Péter (2): Yeah, you cannot solve tomorrow's problems because you

24:16

don't know tomorrow's context.

24:17

This is exactly why you need to deliver stuff quickly to get the

24:22

validation and feedback from your users.

24:24

And that information will affect your decisions tomorrow.

24:28

And maybe you will double down on in house built infrastructure.

24:32

Maybe you will pivot to external.

24:34

don't know.

24:34

Nobody knows.

24:35

That's the point.

24:36

You just solve today's problems.

24:39

Jeremy: Exactly.

24:40

Now those are all the kind of like the small ongoing things.

24:43

Péter (2): the big one.

24:44

Jeremy: Yeah.

24:45

Sometimes you have something really big, like that example of

24:48

the manual deployment process or others, how do you handle those?

24:53

Péter (2): I read it in, I think in one of Will Larson's books is that

24:56

the only solid, reliable way in these situations is a migration to a new system.

25:02

There is a point where your old code base, your old system is just

25:05

so hard to work with, so ridden with tech debt, that it's easier to throw

25:10

everything out and start from scratch.

25:12

Now, this is not how you should do it.

25:13

And we can talk about how you should migrate to a new system.

25:17

But the goal should be this.

25:19

You leave behind the old mess because you save a lot of time trying to

25:23

discover why is it working like that?

25:26

What are the features that are still needed?

25:28

What is not?

25:29

It's, it's easier to move on to a new system from, from scratch.

25:33

Jeremy: So I have two questions when you say that one is how do I

25:36

know that I need to migrate because I've seen cases where people have

25:41

migrated or doing a rewrite when they absolutely did not need to do it.

25:45

Okay.

25:55

Péter (2): wrote their ideal to-do application or commenting system

25:59

and everything and you don't, yeah, I don't like the rewrite word, but

26:03

To answer your question, um, this is why we have good engineers.

26:07

And, I hear myself that this is not a satisfactory answer, but, uh, there

26:11

are points where you can see that the, the, the system needs to move on, maybe

26:15

actually, maybe the way I can answer this is the technique of how I think

26:18

you should do this kind of migrations.

26:21

You know, the strangler fig pattern, uh, those of us who are

26:25

listening might not be familiar.

26:27

The concept comes from biology, uh, from a kind of plant.

26:32

The point of it is that you isolate parts in your system.

26:36

You have a big, complex, interconnected system and you isolate a part

26:41

in it that has fewer connections, ideally, with the rest of the system.

26:45

You make those connections very clear and, uh, this isolation, within this

26:52

isolated part of the system, you do some refactoring that the rest of

26:57

the system doesn't see happening.

26:59

So if you take this approach, what you get is that, you can

27:04

basically work in a live system.

27:06

You avoid this big boom migration when you say that, okay, on Monday we start a new

27:11

system, we shut down the old one and there are two weeks of fires everywhere.

27:16

You have a much more scalable and reliable migration process.

27:20

You also increase.

27:22

confidence in the quality of the system, because isolating those connections

27:28

between the system you selected and the rest of it, the best way to do it is

27:32

to write a lot of tests to ensure that your system is still returning the stuff

27:37

that the rest of it is, is requiring.

27:39

And once you're done with that, you have those tests in your CI CD

27:42

pipelines every time you make a change.

27:45

And then maybe you don't even need to decide if it's just a small

27:48

refactoring, or if it's an entire rewrite, because it's rarely very binary.

27:52

Oftentimes it's a scale.

27:54

So maybe you can do like you isolated one system, you rewrote

27:58

it in the modern, uh, way that you want, isolated another, rewrote it.

28:03

And you reach a point where 80 percent of your system is, is, uh, called modern.

28:08

I don't want to get into deeper.

28:09

What is that?

28:10

But you see what I mean?

28:12

And 20 percent is still the old one.

28:14

And maybe you're fine with that.

28:15

Like maybe that 20%, it isn't worth the effort to address; the

28:20

majority of your system is fine.

28:21

And maybe you will decommission it in two years or, or whatever.

28:25

So actually, I think this is a good way to look at this instead of rewrite everything

28:30

from scratch, you take parts, isolate them, and migrate those little parts one by one.

28:35

Keep on evaluating if.

28:37

You still need to pay attention to tech debt or you reach the

28:40

point where it's okay to leave it.
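
The incremental migration described here can be sketched as a thin routing layer in front of the old and new implementations. This is a minimal illustration under stated assumptions, not a definitive implementation: `StranglerFacade`, `same_contract`, and the capability names are hypothetical, and a real system would usually route at a network or module boundary rather than in a single class.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class StranglerFacade:
    """Routes each isolated capability to its legacy or modern implementation."""

    def __init__(self) -> None:
        self._legacy: Dict[str, Callable] = {}
        self._modern: Dict[str, Callable] = {}
        self._migrated: Set[str] = set()

    def register(self, name: str, legacy: Callable,
                 modern: Optional[Callable] = None) -> None:
        self._legacy[name] = legacy
        if modern is not None:
            self._modern[name] = modern

    def migrate(self, name: str) -> None:
        """Switch one capability to the modern implementation."""
        if name not in self._modern:
            raise KeyError(f"no modern implementation registered for {name!r}")
        self._migrated.add(name)

    def roll_back(self, name: str) -> None:
        """Send traffic for one capability back to the legacy implementation."""
        self._migrated.discard(name)

    def call(self, name: str, *args, **kwargs):
        impl = self._modern[name] if name in self._migrated else self._legacy[name]
        return impl(*args, **kwargs)

    def migrated_share(self) -> float:
        """Fraction of registered capabilities already served by modern code."""
        return len(self._migrated) / len(self._legacy) if self._legacy else 0.0


def same_contract(legacy: Callable, modern: Callable,
                  samples: Iterable[Tuple]) -> bool:
    """Contract check: both implementations agree on every sample input."""
    return all(legacy(*args) == modern(*args) for args in samples)
```

The rest of the system only ever calls `facade.call(...)`, so each part can be switched, or rolled back, without a big-bang cutover, and `same_contract` is the kind of test that belongs in the CI pipeline before calling `migrate`. `migrated_share` makes the "80 percent modern, 20 percent old" state visible.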

28:42

Jeremy: Yeah, I think, I can probably list some bad reasons

28:48

why you shouldn't do it.

28:50

I have some examples.

28:52

There was one stage we had, um, A PHP, um, backend with a,

28:57

a modern JavaScript front end.

28:59

The, the PHP backend was actually very well written.

29:02

Structured a lot of tests, thousands of tests, things

29:05

were fast, CI was really good.

29:07

And the new , CTO.

29:09

Who knew Python, but didn't know PHP , he didn't land and there were challenges

29:14

with that PHP backend, but, uh, he decided that he wanted to re write, the backend

29:20

in Python and, that's a total reskilling of it and then, it was really funny

29:26

was the CTO decided that they didn't like Vue, which the front end was in.

29:29

So we redid the front end in React, and, uh, you know, those are the

29:34

wrong reasons to rewrite things.

29:36

I mean, There could be some valid business reasons to do that, but in those cases,

29:41

I really felt like, having known that system and worked in it and then seeing

29:44

those decisions after, um, I lived with things that I felt were suboptimal,

29:49

but I focused on the business value.

29:51

Whereas, those rewrites generated very little value for

29:54

the business, and obviously distracted the team significantly.

29:57

Péter (2): This is why it's very useful to go back to our earlier point,

30:02

that when you talk about tech debt, you should talk about the options

30:07

it's limiting for you and the business outcomes, and how it's painful and

30:11

hurtful for the business to live with this tech debt every day.

30:14

In your example, maybe I'm overassuming.

30:18

But, uh, I think what your team was optimizing for is the comfort

30:22

of the CTO with the tech stack.

30:24

And that's not something you spend months and months of developers' time

30:30

and engineering time to optimize for.

30:32

And switching both front end and back end, the whole stack, it's

30:39

super risky and super expensive.

30:40

And sometimes that's a good solution.

30:42

Like your product was pivoting a lot and you're doing a very

30:47

differently, then maybe that's a good opportunity to change that also.

30:50

Jeremy: I do think there has to be a real business value

30:52

justification for changing tech.

30:55

Péter (2): be everything that should drive these decisions.

30:57

Cause ultimately the goal of the company is to make money, everything

31:00

else is just a means to those and

31:03

Jeremy: Exactly.

31:04

So thinking more tactically, our podcast is really targeting

31:09

engineering managers.

31:10

Um, what.

31:11

Péter (2): there.

31:12

Jeremy: Yeah.

31:13

What advice or, and, how would you recommend engineers decide, what

31:17

tech debt to tackle in the sprint they're planning at the moment, or,

31:20

you know, the, the next month ahead, how should they prioritize that?

31:24

Péter (2): yeah, that's, I, I like that lot of stuff that you

31:27

don't mention in this question, which is implied, which is great.

31:30

Like, uh, there is a healthy process of every sprint tackling some of the tech debt.

31:36

And this is what I want to make explicit that that's a way for a

31:39

sustainable tech debt management.

31:41

And the other thing that you're implying is that there is an

31:43

engineering understanding of tasks that we need to tackle.

31:47

So, so this is a great point and now comes the big question.

31:50

Okay.

31:51

Which one of the fires do we turn our attention to?

31:54

And, it's basically a decision making exercise and there are

31:57

a lot of tools that can work.

31:59

One very simple one that I can explain, and I think adapts

32:03

well here, is the Eisenhower matrix for decision making. You know, it's a

32:08

two by two, um, grid. On the horizontal axis, you have urgency; on the vertical,

32:15

you have, uh, impact, and then you can group your tasks into these

32:21

four quadrants, you're going to have urgent, but not impactful, tasks.

32:26

You're going to have impactful, but not urgent tasks, and

32:29

then you're going to have

32:31

urgent and impactful, and not urgent, not impactful.

32:34

My point is that you should have a portfolio approach

32:37

because the big risk

32:39

you want to avoid is that you always pay attention to the urgent

32:42

stuff because you feel the pain now and you always deprioritize

32:45

the impactful non urgent ones.

32:48

Jeremy: Okay.

33:04

Péter (2): have some new API commands.

33:06

And we're going to have to

33:07

deprecate some of the current ones that you're using.

33:10

It's not urgent at all.

33:11

You have a year to think about this, but it's going to be super impactful

33:15

if you don't meet these deadlines and you're not ready with your API by then.

33:21

So my advice is a portfolio approach.

33:24

Make sure that, for example, every quarter you tackle some of the

33:28

impactful tasks that you have and then balance it with the urgent ones.

33:33

Also, this framework helps to realize that there are some tasks that are

33:38

not urgent and not impactful, I don't know, code styling, tabs

33:44

versus spaces, this kind of stuff.

33:47

Maybe you can just defer them to a hackathon or some backlog of things

33:53

we might address one day.

33:54

Uh, not, not put too much effort in it.
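
The portfolio idea Péter describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DebtItem` fields, the `plan_quarter` function, and the "reserve one slot for impactful work" rule are hypothetical names and choices made for the example, not something prescribed in the episode.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """A tech debt task tagged on the two axes of the 2x2 grid (hypothetical model)."""
    name: str
    urgent: bool
    impactful: bool


def quadrant(item: DebtItem) -> str:
    """Return which of the four quadrants the item falls into."""
    if item.urgent and item.impactful:
        return "urgent+impactful"
    if item.urgent:
        return "urgent"
    if item.impactful:
        return "impactful"
    return "neither"


def plan_quarter(items, slots, reserved_for_impactful=1):
    """Pick up to `slots` items for the quarter.

    Some capacity is reserved up front for impactful-but-not-urgent work,
    so that urgent items cannot crowd it out entirely (an assumed policy).
    Items that are neither urgent nor impactful are never picked; they are
    returned separately as candidates for a hackathon or "someday" list.
    """
    buckets = {"urgent+impactful": [], "urgent": [], "impactful": [], "neither": []}
    for item in items:
        buckets[quadrant(item)].append(item)

    reserved = buckets["impactful"][:min(reserved_for_impactful, slots)]
    remaining = slots - len(reserved)
    rest = (
        buckets["urgent+impactful"]
        + buckets["urgent"]
        + buckets["impactful"][len(reserved):]
    )
    picked = reserved + rest[:remaining]
    return picked, buckets["neither"]


backlog = [
    DebtItem("flaky deploy script", urgent=True, impactful=True),
    DebtItem("noisy alert", urgent=True, impactful=False),
    DebtItem("vendor API deprecation next year", urgent=False, impactful=True),
    DebtItem("tabs versus spaces", urgent=False, impactful=False),
]
picked, deferred = plan_quarter(backlog, slots=2)
```

The point of the reserved slot is the one made above: without it, the urgent column always wins and the long-horizon work never gets scheduled.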

33:59

Jeremy: I like that.

33:59

And I think there's this kind of balancing approach that you described,

34:04

between features and tech debt, and choosing which ones to do, and

34:08

there's another approach, which is, like, the kind of Maslow's hierarchy

34:12

of needs pyramid approach, which

34:15

pairs well with kind of a zero bugs approach, which is essentially, on the

34:20

bottom of your foundation is things like security and compliance and so on.

34:24

So anytime that you have something that, you know, is not secure or

34:29

updating a library or whatever, you do those, all of those things first,

34:33

and then you do things that affect reliability, stability, and performance.

34:40

And then it's only when your pyramid is stable that you do the top parts,

34:44

the, the delivering value to the users, as a new functionality or functionality

34:49

that helps the business grow.
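
The pyramid rule Jeremy describes (finish the lower layers before touching the upper ones) can be sketched as below. The three `Tier` names follow the layers mentioned in the conversation, but the enum, the `next_work` function, and the sample backlog are hypothetical, made up for this illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Pyramid layers, lowest number = foundation (assumed three-layer split)."""
    SECURITY_COMPLIANCE = 0
    RELIABILITY_PERFORMANCE = 1
    NEW_VALUE = 2


def next_work(backlog):
    """Return titles from the lowest tier that still has open items.

    `backlog` is a list of (Tier, title) pairs. Higher tiers are only
    returned once every lower tier is empty, which is the "stable pyramid"
    rule in its strictest form.
    """
    if not backlog:
        return []
    lowest = min(tier for tier, _ in backlog)
    return [title for tier, title in backlog if tier == lowest]


open_items = [
    (Tier.NEW_VALUE, "CSV export feature"),
    (Tier.SECURITY_COMPLIANCE, "update vulnerable library"),
    (Tier.RELIABILITY_PERFORMANCE, "fix request timeouts"),
]
todo_now = next_work(open_items)
```

In a legacy system this strict version is rarely achievable on day one, so it is probably better read as a target state than as a gate.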

34:51

And, I think, there's no one right approach.

34:54

I do have a personal preference for the kind of zero bugs, um, pyramid approach.

34:59

I really

35:00

hate carrying stuff in the backlog. Maybe I also hate backlogs, uh,

35:04

that's another topic for another day, but managing bugs and so on.

35:08

But I really think, if you say you're going to do it, you shouldn't keep

35:11

it on the long finger, as we say in Ireland, it's like you keep it somewhere

35:16

far away, reminding yourself that you need to do it, but you never get to it.

35:20

Um, so I really like, um, this kind of

35:25

pyramid prioritization approach, but I think

35:27

Péter (2): talk a bit about this, like, you're working with a system that has a

35:30

lot of, uh, ancient stuff, a lot of things that maybe you will decommission later,

35:35

you have 359 bugs in your ticket system.

35:39

How do you implement this system where your foundation is always in a good place,

35:46

and before that's solid, you don't do anything else on the upper levels?

35:51

Jeremy: So the ideal scenario for this is when you're building

35:55

like, in a startup, you start like this and you continue like this

35:59

and you're able to maintain the pyramid in a stable way.

36:03

Right?

36:04

Um, that's the best approach, but then in the situation you're

36:10

describing, which has a lot more legacy and debt, you need to see this

36:18

as the direction you want to end up in, and you need to work your way towards

36:22

that, making those migrations and all the different choices, towards that, like

36:27

the scenario I was in where I use that example about the manual deployments.

36:31

And so we carved out time

36:33

to get the deployments to be automated, and this is like an 18-month, two-year

36:38

process to get to a different, more stable part and different bits of your system

36:43

will arrive at different points in time into that place where you can operate

36:48

with a zero bugs approach in that area.

36:50

Um, you also are going to have to have a tolerance for more known issues,

36:55

potentially, and just have more documented, like just the whole

36:59

way that you approach things.

37:00

You know, we talked about gardening and thinking in the future and stuff.

37:04

That's what you want senior people to do is map a route to this place.

37:09

What is not acceptable is staying in that,

37:12

um, rocky tech debt situation with lots of fires constantly happening.

37:16

Péter (2): Yeah, I like that you brought up the startup approach because they

37:20

seem like they have the luxury of a greenfield project and they can set all

37:24

those foundation levels and pay attention.

37:26

But oftentimes startups

37:29

say that our only goal is to survive till we can validate our

37:32

MVP or have a product market fit, and everything else is secondary.

37:37

And I don't care about uptime.

37:39

I don't care about anything.

37:40

So those work against the foundation.

37:44

I think how you can resolve

37:45

this conflict is to focus

37:48

on making the foundation very minimal: what are the handful of

37:53

key differentiators that you don't compromise on, for example, in a security

37:58

company, the protection of your users' data. Whatever product experiments

38:03

you're doing in a desperate attempt to get some funding or validation,

38:08

you don't compromise that.

38:09

That's the foundation.

38:10

And if there are problems in that area, you throw away

38:13

everything and fix those first.

38:15

So I think a good way to implement this pyramid approach, which I really like

38:20

is to spend some time and don't get too optimistic about what you put on the

38:26

lower levels and just the key stuff.

38:28

Jeremy: Yeah, obviously, in that startup place, the danger

38:31

in those startup scenarios is that people build too many things.

38:36

So I think you should outsource a lot to SaaS.

38:43

Péter (2): Yes.

38:43

Jeremy: And not try and build and run on.

38:46

I've written a blog post about that: don't use Kubernetes in those

38:50

situations, unless there's a specific reason why you really need to use it.

38:54

Use, I don't know, Fly.io

38:55

or one of these other platform-as-a-service offerings. Minimize what

39:01

you do, to focus on the value that you can really add, remove, outsource

39:06

all the undifferentiated heavy lifting. At some point in the future,

39:10

if you're successful, you might need to bring some of those in house to

39:13

actually start to differentiate.

39:16

This is where a lot of tech debt is accumulated in companies

39:19

because they overbuild, in the early stages and they make their

39:22

surface area of the product too big.

39:25

Péter (2): Yeah,

39:25

Jeremy: Um, Uh, Uh,

39:29

Péter (2): founder shared their learnings.

39:32

It was amazing; it was very honest and very good.

39:35

They were saying the same stuff.

39:37

Like, you can focus on your core differentiator, be very

39:43

comfortable with manual work.

39:45

Also, like, it was some, uh, sports race video app, whatever. The founder

39:51

was manually editing videos for their subscribers, because they didn't want to

39:55

invest in any kind of automation there too early. And the concept they brought

40:00

in was, similar to minimum viable product, the MVB, the minimum viable business.

40:06

And, and this is what the startup focuses on only to, to get the validation, try out

40:11

all the experiments and see what sticks.

40:13

Investing in a solid architecture, in this period is a waste of

40:18

time, 99 percent of the time.

40:20

Because you don't know what you're

40:21

going to do.

40:22

Jeremy: So this is it.

40:23

Uh,

40:24

Péter (2): we went a bit

40:25

Jeremy: no, but I think it's really interesting because this loops

40:28

back to what you said at the very beginning describing tech debt and,

40:31

as we summarize things, I really like how you said past decisions.

40:36

And we talked about how the earlier and better the decisions you can make

40:41

the quality will be better and you'll have more options in the future.

40:45

And, um, so I'd, I'd love for you to kind of wrap up this whole thing now going

40:51

down to the engineering manager level.

40:52

Péter (2): Okay.

40:53

So, so takeaways, don't talk about tech debt in the engineering context, only talk

40:57

about it in the business context, make the business pain visible to stakeholders.

41:03

Understand that tech debt is okay.

41:06

Some level of tech debt, learn how to live with it.

41:09

Have a plan for how you're going to address it and how you're going to get out of it.

41:13

Um, data: if you're talking about data, things you measure, time spent in various

41:19

areas, incident numbers, everything, you're arguing about where you want

41:23

your team and your organization to be,

41:25

and not about what you believe is good code or bad code. A healthy culture:

41:30

You should have disagreements in your teams, but you should resolve those

41:34

disagreements on the engineering side about what is really a tech debt and

41:38

what are your plans to address it.

41:40

And finally, some balanced approach to prioritization.

41:44

Don't

41:45

go into half-a-year, "we stop every feature development" kind of

41:49

refactorings because those very rarely work out well in my experience.

41:54

Jeremy: Yeah, just a tip: a sign that you're doing it well is when the product

41:57

manager is the one that prioritizes working on tech debt, because you've

42:02

made it visible enough that the business is recognizing the need to invest in it.

42:09

Péter (2): I love that.

42:09

You have a good relationship with your PM and you manage to convince

42:13

them that it's going to hurt them if we don't address this like that, or

42:18

you manage to find a way to pair it with the feature work, then you won.

42:23

You can do it.

42:23

Yeah.

42:24

Jeremy: Yeah, brilliant.

42:26

Great topic, Peter.

42:27

Uh, thanks very much.

42:29

And to our listeners, thank you.

42:30

Would love some feedback, and really appreciate some of the messages we've had

42:34

in from the, some of the last episodes.

42:38

Of course it would be great if you fed the algorithms to help

42:41

us be more visible, but more than anything, we would really

42:44

appreciate, if you found this valuable,

42:47

that you would share it with others that you think might appreciate

42:50

what we're talking about today.

42:52

Péter (2): Yeah, thank you very much and see you or talk to you in two weeks.

42:57

Jeremy: Awesome.

42:58

Yeah.

42:58

Péter (2): Bye.