Picture this: It’s Monday morning, you grab your coffee and pull up the next feature from your sprint backlog — CSV upload functionality. You lean back in your chair and think, “Easy enough, we’ll just batch insert into our Cosmos DB.” Fast forward a few days of implementation, and you’re staring at a progress bar that’s been crawling for 20 minutes while processing 550,000 rows, wondering where it all went wrong.
This is the story of how a seemingly simple feature request led us to fundamentally rethink our database architecture and migrate from Azure Cosmos DB to PostgreSQL. It’s a tale that many modern development teams will recognize: the tension between the flexibility promised by NoSQL and the performance realities of production workloads.
When we first chose Azure Cosmos DB for our application, the decision seemed obvious. The promise of schema flexibility and seamless scaling made it an attractive choice for our initial development phase. Cosmos DB’s document model felt natural for our use case and the serverless pricing model seemed perfect for our application. The flexible schema allowed us to iterate quickly on our data model without the overhead of migration scripts, and the managed nature of Cosmos DB meant one less infrastructure concern.
Then came the CSV upload feature.
The requirements seemed straightforward: allow users to upload CSV files containing sales data — approximately 550,000 rows per file, with uploads happening weekly for about six months. In today’s world, this isn’t particularly large, but for our Cosmos DB setup, it became a performance nightmare.
Our performance analysis revealed some sobering numbers, and the math was brutal. With Cosmos DB Serverless providing only 5,000 RU/s and no manual scaling options, our roughly 6.8 million RU write requirement translated to a theoretical minimum of more than 20 minutes (6,800,000 RU ÷ 5,000 RU/s ≈ 1,360 seconds), and that assumes perfect conditions with no throttling.
To fully understand why Cosmos DB Serverless is not ideal for bulk inserts, it helps to look at how the serverless model works.
Unlike the provisioned throughput model, where you can explicitly allocate and scale your request units (RUs) to match heavy workloads, the serverless model automatically manages throughput behind the scenes. However, this convenience comes with a trade-off: there’s a built-in upper limit to how much throughput a serverless container can handle — currently around 5,000 RU/s per container.
While Cosmos DB Serverless can theoretically scale up to 20,000 RU/s, this capability depends heavily on the volume of data stored in your container. In our use case, with our data volume and partition structure, scaling beyond 5,000 RU/s was not achievable.
Even though Cosmos DB can add more physical partitions as the data volume grows, this does not translate into higher available throughput in serverless mode. That RU/s cap remains in place, meaning your application can easily hit throttling limits during write-heavy operations.
According to Microsoft’s documentation, the serverless tier is intended for light or unpredictable traffic patterns — for example, workloads that see occasional spikes. It’s not designed for sustained, high-throughput scenarios like bulk data ingestion, where you need consistent performance and the ability to scale throughput significantly beyond that cap.
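To make that throttling ceiling concrete, here is a simplified sketch (not our production code) of what a chunked bulk insert against a serverless container looks like with the @azure/cosmos SDK. The database and container names, environment variable, batch size, and back-off delay are all illustrative, and it assumes each row already carries the container’s partition key field.

```typescript
// Simplified sketch: bulk-writing parsed CSV rows into a serverless container.
// Names, batch size and back-off are illustrative, not our actual configuration.
import { BulkOperationType, CosmosClient, OperationInput } from '@azure/cosmos';

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING!);
const container = client.database('sales').container('saleDataLineItems');

export async function ingestRows(rows: Record<string, unknown>[]): Promise<void> {
  const BATCH_SIZE = 100; // items.bulk() accepts at most 100 operations per call

  for (let i = 0; i < rows.length; i += BATCH_SIZE) {
    let operations: OperationInput[] = rows.slice(i, i + BATCH_SIZE).map((row) => ({
      operationType: BulkOperationType.Create,
      resourceBody: row as any, // assumes the partition key field is part of each row
    }));

    // Keep retrying the operations that come back throttled (HTTP 429).
    while (operations.length > 0) {
      const responses = await container.items.bulk(operations);
      const throttled = operations.filter(
        (_, index) => responses[index].statusCode === 429
      );
      if (throttled.length > 0) {
        // With a hard ~5,000 RU/s cap, these waits are what stretch a
        // 550,000-row file into a 20+ minute upload.
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
      operations = throttled;
    }
  }
}
```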
We explored the Autoscale option, but the economics were prohibitive. To handle our CSV processing requirements efficiently, we would need to provision at least 10,000 RU/s as a baseline, pushing our monthly costs above €100 just for the database. Even then, processing would still take around 10 minutes per file (6,800,000 RU ÷ 10,000 RU/s ≈ 680 seconds).
There are several reasons why Azure Database for PostgreSQL is the better choice for our application.
With PostgreSQL, we can handle much larger batch operations efficiently, and the database can leverage optimized bulk insert operations that dramatically improve performance for our CSV processing use case.
The results were dramatic. What took 20 minutes in Cosmos DB now completes in under 30 seconds with PostgreSQL. Without Cosmos DB’s RU-based throttling acting as a speed limit, PostgreSQL simply writes data as fast as the underlying hardware allows, processing hundreds of thousands of rows with incredible efficiency. Another advantage is PostgreSQL’s elegant `ON CONFLICT DO UPDATE` clause, which lets us insert new records or update existing ones in a single statement; an upsert like this is achievable in other SQL databases too, but PostgreSQL makes it remarkably simple and efficient to use.
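As an illustration, here is a rough sketch of what a chunked upsert can look like with TypeORM’s query builder against the SaleDataLineItemPg entity shown later in this post. The chunk size and import path are illustrative, and the exact orUpdate signature differs between TypeORM versions.

```typescript
// A sketch of a chunked PostgreSQL upsert via TypeORM's query builder.
// Chunk size and entity import path are illustrative.
import { DataSource } from 'typeorm';
import { SaleDataLineItemPg } from './sale-data-line-item.entity';

const CHUNK_SIZE = 5000;

export async function upsertSaleRows(
  dataSource: DataSource,
  rows: Partial<SaleDataLineItemPg>[]
): Promise<void> {
  for (let i = 0; i < rows.length; i += CHUNK_SIZE) {
    await dataSource
      .createQueryBuilder()
      .insert()
      .into(SaleDataLineItemPg)
      .values(rows.slice(i, i + CHUNK_SIZE))
      // Generates INSERT ... ON CONFLICT ("id") DO UPDATE SET "saleDate" = ..., "saleValue" = ...
      .orUpdate(['saleDate', 'saleValue'], ['id'])
      .execute();
  }
}
```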
One of the most compelling aspects of migrating to Azure PostgreSQL was cost predictability. Instead of paying for consumed Request Units (which can spike unpredictably during bulk operations), we pay for compute and storage resources that we provision explicitly and scale only when we decide to.
According to our analysis using Azure’s pricing calculator, running a General Purpose PostgreSQL instance capable of handling our workload proved significantly more cost-effective than the equivalent Cosmos DB configuration needed to achieve comparable performance.
PostgreSQL on Azure checks all our security boxes without compromise. Azure Database for PostgreSQL provides built-in Transparent Data Encryption (TDE), support for customer-managed keys configured at the Azure level, and comprehensive backup and restore capabilities with 30-day retention. The service integrates seamlessly with our existing security infrastructure while providing the performance characteristics we need.
Beyond the technical advantages, PostgreSQL brought familiarity. Most developers already know SQL and have experience with relational databases and ORMs like TypeORM. New team members can be productive immediately without learning document database paradigms or RU consumption patterns. When solving problems, we can leverage decades of community knowledge and battle-tested patterns rather than navigating less mature NoSQL ecosystems.
A big challenge in our migration was rethinking our data modeling approach. Moving from a document database to a relational structure meant normalizing our data and establishing proper foreign key relationships.
In Cosmos DB, we stored everything as self-contained documents with user information embedded directly in each sale record. Moving to PostgreSQL meant normalizing this structure — separating users and sales into different tables linked by foreign keys, and making explicit decisions about data types and constraints.
Our original Cosmos DB document structure looked something like this:
// Original Cosmos DB document structure
import { z } from 'zod';
// UserSchema is the embedded user sub-schema defined elsewhere in our codebase.

export const SaleDataLineItem = z.object({
  id: z.string().min(1),
  schemaVersionId: z.number(),
  companyName: z.string(),
  saleDate: z.string().date(),
  saleValue: z.number(),
  createdBy: UserSchema,
  createdAt: z.string().datetime(),
});

export type SaleDataLineItem = z.infer<typeof SaleDataLineItem>;
For PostgreSQL, we needed to normalize this into a proper relational structure:
// PostgreSQL entity with TypeORM
import {
  Column,
  CreateDateColumn,
  Entity,
  JoinColumn,
  ManyToOne,
  PrimaryColumn,
  Relation,
} from 'typeorm';
// CompanyEntityPg, UserEntityPg, AllowedCompanyName and ColumnNumericTransformer
// are defined elsewhere in our codebase.

@Entity({
  name: 'sale_data_line_items',
})
export class SaleDataLineItemPg {
  @PrimaryColumn('text', { primaryKeyConstraintName: 'sale_data_line_id' })
  id!: string;

  // Plain column exposing the foreign key value directly.
  @Column('text', { nullable: true })
  companyName!: AllowedCompanyName;

  // Relation backed by the same companyName column.
  @ManyToOne(() => CompanyEntityPg)
  @JoinColumn({
    name: 'companyName',
    foreignKeyConstraintName: 'FK_company_name',
  })
  company!: Relation<CompanyEntityPg>;

  @Column({ type: 'date', nullable: false })
  saleDate!: string;

  @Column({
    type: 'numeric',
    precision: 12,
    scale: 2,
    transformer: new ColumnNumericTransformer(),
    nullable: false,
  })
  saleValue!: number;

  @ManyToOne(() => UserEntityPg, {
    eager: true,
    deferrable: 'INITIALLY DEFERRED',
  })
  @JoinColumn({
    name: 'created_by_user_id',
    foreignKeyConstraintName: 'FK_base_created_by_user_id',
  })
  createdBy!: Relation<UserEntityPg>;

  @CreateDateColumn({ type: 'timestamp' })
  createdAt!: Date;
}
One of the concerns about moving from a schema-flexible database to a structured one was how we’d handle future changes. TypeORM provides built-in support for standard SQL migrations with up and down scripts, giving us structured change management:
// Example migration: adding a "voucherNumber" column to the "vouchers" table
import { MigrationInterface, QueryRunner } from 'typeorm';

export class AddVoucherId1755855956054 implements MigrationInterface {
  name = 'AddVoucherId1755855956054';

  public async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(
      `ALTER TABLE "vouchers" ADD "voucherNumber" text NOT NULL`
    );
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(
      `ALTER TABLE "vouchers" DROP COLUMN "voucherNumber"`
    );
  }
}
This approach gives us structured change management while maintaining the ability to evolve our schema over time.
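For context, here is a minimal sketch of how such migrations get wired into a TypeORM DataSource. The file globs, environment variable, and flags shown are illustrative rather than our actual configuration.

```typescript
// A minimal DataSource sketch showing where migrations are registered.
// Connection details and paths are illustrative.
import { DataSource } from 'typeorm';

export const AppDataSource = new DataSource({
  type: 'postgres',
  url: process.env.POSTGRES_CONNECTION_STRING,
  entities: ['dist/**/*.entity.js'],
  migrations: ['dist/migrations/*.js'],
  migrationsRun: true, // apply pending migrations on startup
  synchronize: false,  // schema changes go through migrations only
});
```

With synchronize disabled, every schema change has to go through a reviewed migration, and pending migrations can be applied automatically on startup or explicitly via the TypeORM CLI’s migration:run command.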
Our experience reflects a broader industry trend. While NoSQL databases gained significant traction in the early 2010s for their scalability and schema flexibility, recent data shows developers increasingly choosing relational databases like PostgreSQL.
According to Stack Overflow’s Developer Surveys, PostgreSQL has become the most admired and desired database among developers:
“PostgreSQL debuted in the developer survey in 2018 when 33% of developers reported using it, compared with the most popular option that year: MySQL, in use by 59% of developers. Six years later, PostgreSQL is used by 49% of developers and is the most popular database for the second year in a row.”
While some companies have switched from PostgreSQL to NoSQL databases, the broader trend shows increasing adoption of PostgreSQL across the industry. The key insight isn’t that NoSQL is bad or that SQL is always better — it’s about choosing the right tool for your specific use case and scale requirements.
After completing our migration, the results spoke for themselves: CSV processing that used to take more than 20 minutes now finishes in under 30 seconds, at a significantly lower and far more predictable monthly cost.
But perhaps most importantly, we gained confidence in our ability to scale. We now handle weekly CSV uploads without breaking a sweat, and we’re prepared for larger datasets if needed.
Our journey from Cosmos DB to PostgreSQL reinforced an important lesson: validate your database choice against real workload patterns early, not after you’ve built half your application around it. The silver lining is that by catching this mismatch during development rather than after launch, we still saved time and money.
That experience also gave us a broader perspective on database selection. The database landscape continues to evolve rapidly, with new solutions emerging regularly. But sometimes, the best path forward is choosing proven, battle-tested technologies that align with your specific needs. In our case, that meant choosing PostgreSQL — not because it’s the newest technology, but because it solved our actual problems without compromise.