We want to split a PDF file into separate pages, but name each page according to data found on it. Imagine a class of 30 students where each student's data sits on its own page, and each page includes that student's ID number. Can we split out each page and use the student's ID number as the file name?

Splitting a PDF File into Pages

In the first case, we read a PDF file from a path on the local disk and split it into pages, then save the resulting files back to the local disk. For this we use the "iTextSharp" NuGet package.

        protected void Page_Load(object sender, EventArgs e)
        {
            if (!IsPostBack)
            {

                string inputFilePath = "D:\\Yedek\\Class.pdf";
                string outputFolderPath = "D:\\Yedek\\";
                SplitPdfByPage(inputFilePath, outputFolderPath);
            }
        }

When the page is loaded, we set the file paths and run the relevant function. Here the file "Class.pdf" is a 30-page document, and each page contains information about one student.

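        // Requires: using System.IO; using System.Text.RegularExpressions;
        // using iTextSharp.text; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
        public void SplitPdfByPage(string inputFilePath, string outputFolderPath)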
        {
            using (PdfReader reader = new PdfReader(inputFilePath))
            {
                int pageCount = reader.NumberOfPages;

                for (int page = 1; page <= pageCount; page++)
                {
                    // Extract the text of this page so it can be searched for ID numbers.
                    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                    string metin = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                    // 11 digits, with an optional whitespace character between consecutive digits.
                    string desen = @"\b\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\b";
                    MatchCollection eslesmeler = Regex.Matches(metin, desen);
                    foreach (Match eslesme in eslesmeler)
                    {
                        // \s also matches tabs and line breaks, so remove all whitespace, not only spaces.
                        string tcKimlikNo = Regex.Replace(eslesme.Value, @"\s", "");
                        string ilk10Hane = tcKimlikNo.Substring(0, 10);
                        int toplam = 0;
                        foreach (char karakter in ilk10Hane)
                        {
                            if (Char.IsDigit(karakter))
                            {
                                toplam += Convert.ToInt32(karakter.ToString());
                            }
                        }
                        // The ones digit of the sum must equal the 11th digit.
                        char toplamSonKarekter = toplam.ToString()[toplam.ToString().Length - 1];
                        char tcSonKarekter = tcKimlikNo[tcKimlikNo.Length - 1];
                        if (toplamSonKarekter == tcSonKarekter)
                        {
                            string yeniDosyaAdi = tcKimlikNo + ".pdf";
                            string outputFilePath = System.IO.Path.Combine(outputFolderPath, yeniDosyaAdi);
                            // Use a fresh Document per output file, so a second match
                            // on the same page does not reuse one that is already closed.
                            Document document = new Document();
                            using (FileStream output = new FileStream(outputFilePath, FileMode.Create))
                            {
                                PdfCopy pdfCopy = new PdfCopy(document, output);
                                document.Open();
                                pdfCopy.AddPage(pdfCopy.GetImportedPage(reader, page));
                                document.Close();
                                pdfCopy.Close();
                            }
                        }
                    }
                }
            }
        }

Let me briefly explain the code in the function. First, we open the source document and find out how many pages it has. Then we go through the pages with a for loop. For each page, we extract its text and filter it with a regular expression ("Regex") that matches 11-digit numbers, allowing a space between digits. Not every 11-digit number is an ID number, however, so we apply a second filter. There are several ways to validate an ID number; I chose to check the 11th digit against the first 10. I add up the first ten digits and take the ones digit of the sum. If it equals the 11th digit, I treat the number as an ID number.
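
The filtering described above can be tried on its own, without a PDF. The following is a minimal sketch that uses only the .NET base library; the class and method names (IdNumberFinder, PassesSumCheck, FindIdNumbers) are made up for illustration and do not appear in the original code. It applies only the sum rule chosen in this post, so it can still accept some 11-digit numbers that are not real ID numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IdNumberFinder
{
    // Same pattern as in the post: 11 digits, with an optional
    // whitespace character between consecutive digits.
    private static readonly Regex Candidate =
        new Regex(@"\b\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\b");

    // True when the ones digit of the sum of the first ten digits
    // equals the eleventh digit.
    public static bool PassesSumCheck(string digits)
    {
        if (digits == null || digits.Length != 11) return false;
        int sum = 0;
        for (int i = 0; i < 10; i++)
        {
            char c = digits[i];
            if (c < '0' || c > '9') return false;
            sum += c - '0';
        }
        return digits[10] - '0' == sum % 10;
    }

    // Returns every candidate in the text that passes the sum check,
    // with all whitespace removed.
    public static List<string> FindIdNumbers(string pageText)
    {
        var result = new List<string>();
        foreach (Match m in Candidate.Matches(pageText))
        {
            string digits = Regex.Replace(m.Value, @"\s", "");
            if (PassesSumCheck(digits)) result.Add(digits);
        }
        return result;
    }
}
```

For example, in the text "Student 100 000 001 46 scored 12345678901" the first number passes (1+0+0+0+0+0+0+0+1+4 = 6, and the last digit is 6), while the second fails (the first ten digits sum to 45, but the last digit is 1).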

Now that we have found the ID number on the page, we copy the page into a new document and save it. The file name is the ID number followed by ".pdf", and the file goes into the "D:\Yedek" directory specified in Page_Load.
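
As a small illustration of how the output path is put together, here is a sketch using only System.IO. The helper name BuildOutputPath is hypothetical, and creating the folder up front is my own addition; the original code assumes the folder already exists.

```csharp
using System;
using System.IO;

public static class OutputPaths
{
    // Builds "<outputFolderPath>/<idNumber>.pdf" and makes sure the folder exists.
    public static string BuildOutputPath(string outputFolderPath, string idNumber)
    {
        // CreateDirectory does nothing if the folder is already there.
        Directory.CreateDirectory(outputFolderPath);
        return Path.Combine(outputFolderPath, idNumber + ".pdf");
    }
}
```

Calling BuildOutputPath with "D:\\Yedek\\" and an ID number gives a path ending in that ID number plus ".pdf", which can then be passed to the FileStream constructor.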

Splitting PDF Files and Uploading Them to Firebase

Now let's change our code a little. This time the user uploads a multi-page PDF, and we split its pages by ID number as before.

Let's add an upload area for uploading files.

<asp:FileUpload ID="pdfUpload" runat="server" />

Let's do the separation operations on the uploaded document.

if (pdfUpload.HasFile)
{
  foreach (HttpPostedFile file in pdfUpload.PostedFiles)
  {
    var contentType = file.ContentType;
    var fileStream = file.InputStream;
    var PdfURL = PdfYukle((DateTime.Now.Year + "-" + DateTime.Now.Month).ToString(), contentType, fileStream);
    using (PdfReader reader = new PdfReader(PdfURL))
    {
       int pageCount = reader.NumberOfPages;
       for (int page = 1; page <= pageCount; page++)
       {
         // Extract the text of this page so it can be searched for ID numbers.
         ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
         string metin = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
         // 11 digits, with an optional whitespace character between consecutive digits.
         string desen = @"\b\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\s?\d\b";
         MatchCollection eslesmeler = Regex.Matches(metin, desen);
         foreach (Match eslesme in eslesmeler)
         {
            // \s also matches tabs and line breaks, so remove all whitespace, not only spaces.
            string tcKimlikNo = Regex.Replace(eslesme.Value, @"\s", "");
            string ilk10Hane = tcKimlikNo.Substring(0, 10);
            int toplam = 0;
            foreach (char karakter in ilk10Hane)
            {
               if (Char.IsDigit(karakter))
               {
                  toplam += Convert.ToInt32(karakter.ToString());
               }
            }
            // The ones digit of the sum must equal the 11th digit.
            char toplamSonKarekter = toplam.ToString()[toplam.ToString().Length - 1];
            char tcSonKarekter = tcKimlikNo[tcKimlikNo.Length - 1];
            if (toplamSonKarekter == tcSonKarekter)
            {
               string yeniDosyaAdi = tcKimlikNo + ".pdf";
               using (MemoryStream memoryStream = new MemoryStream())
               {
                  // Use a fresh Document per output file, so a second match
                  // on the same page does not reuse one that is already closed.
                  Document document = new Document();
                  PdfCopy pdfCopy = new PdfCopy(document, memoryStream);
                  document.Open();
                  pdfCopy.AddPage(pdfCopy.GetImportedPage(reader, page));
                  document.Close();
                  pdfCopy.Close();
                  // ToArray still returns the written bytes after the stream is closed.
                  byte[] fileBytes = memoryStream.ToArray();
                  var storageClient = StorageClient.Create();
                  var bucketName = "firebaseyolu.appspot.com";
                  var objectName = "class/" + yeniDosyaAdi;
                  storageClient.UploadObject(bucketName, objectName, "application/pdf", new MemoryStream(fileBytes));
               }
            }
         }
       }
    }
  }
}

First, we took the PDF document uploaded by the user and saved it to Firebase with the PdfYukle function, which returns a download URL that PdfReader then opens. The remaining steps closely follow the first example: we find the 11-digit numbers with the regex and check that each one is an ID number. Instead of writing the page to disk, we write it to a MemoryStream and upload the bytes to Firebase Storage.
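
One detail of the upload code is worth a quick illustration: the bytes are read from the MemoryStream after the writer has been closed. The sketch below uses only the .NET base library, with a StreamWriter standing in for the PDF writer; the class name MemoryStreamDemo and the method WriteAndClose are hypothetical. It shows that ToArray still returns the written bytes when the stream itself has been closed by whatever wrote to it.

```csharp
using System;
using System.IO;
using System.Text;

public static class MemoryStreamDemo
{
    // Writes text into a MemoryStream through a writer, lets the writer
    // close the stream, and then reads the bytes back with ToArray.
    public static byte[] WriteAndClose(string text)
    {
        var memoryStream = new MemoryStream();
        using (var writer = new StreamWriter(memoryStream, new UTF8Encoding(false)))
        {
            writer.Write(text);
        } // disposing the writer flushes it and closes memoryStream

        // ToArray is documented to work on a closed MemoryStream.
        return memoryStream.ToArray();
    }
}
```

Because the original stream is closed at that point, the upload call wraps the bytes in a new MemoryStream, which is what the code above does with new MemoryStream(fileBytes).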

The PdfYukle function:

        // Requires: using System.IO; using System.Collections.Generic; using Google.Cloud.Storage.V1;
        private string PdfYukle(string fileName, string contentType, Stream fileStream)
        {
            string newFileName = fileName + ".pdf";

            var storage = StorageClient.Create();

            var bucketName = "firebaseyolu.appspot.com";
            var objectName = "class/" + newFileName;


            // Describe the object to upload, including the download token
            // that is used in the URL built below.
            var newObject = new Google.Apis.Storage.v1.Data.Object()
            {
                Bucket = bucketName,
                Name = objectName,
                ContentType = contentType,
                Metadata = new Dictionary<string, string>()
                {
                    { "firebaseStorageDownloadTokens", Guid.NewGuid().ToString() }
                }
            };

            // Upload with newObject so the metadata (including the token) is stored with the file.
            storage.UploadObject(newObject, fileStream);

            string url = $"https://firebasestorage.googleapis.com/v0/b/{bucketName}/o/{Uri.EscapeDataString(objectName)}?alt=media&token={newObject.Metadata["firebaseStorageDownloadTokens"]}";

            return url;
        }
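
To see what the returned URL looks like, the string building can be separated from the upload. This is a sketch under the same URL layout used in PdfYukle; the class FirebaseUrls and the method BuildDownloadUrl are hypothetical names, and the token value in the example is a placeholder.

```csharp
using System;

public static class FirebaseUrls
{
    // Builds the download URL in the same shape as PdfYukle does.
    // The object name is escaped, so "/" in the name becomes "%2F".
    public static string BuildDownloadUrl(string bucketName, string objectName, string token)
    {
        return "https://firebasestorage.googleapis.com/v0/b/" + bucketName
             + "/o/" + Uri.EscapeDataString(objectName)
             + "?alt=media&token=" + token;
    }
}
```

For instance, an object named "class/2024-5.pdf" appears in the URL as "class%2F2024-5.pdf". Note that PdfYukle names the upload after the current year and month, so two uploads in the same month end up with the same object name.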

That completes the process of splitting a PDF file into pages. See you in the next post.